Strata 2012: Making Data Work

It’s been over a month since Strata 2012. Lest they meet the same fate as scribbles from conferences long past, I’m putting my notes here instead of leaving them in whatever notebook I happened to be toting around that week.

The biggest a-ha moment, and one that I’ll be writing about in the future, came from Ben Goldacre’s keynote, when he compared big data practitioners to drunks looking for car keys only where the light shines. We focus on the data that’s available without asking, “what’s missing?” Plus, it’s fun to hear someone with a British accent say “blobogram.”


Protovis Visualization for Older IE

Two days ago, I posted my Flare visualizations (based on a Flash/ActionScript library), explaining that we can’t yet use the D3 visualization library because it outputs SVG, which isn’t supported by older versions of IE.

The very next day, Hjalmar Gislason of DataMarket gave a talk at O’Reilly’s Strata Conference. DataMarket faced the same problem back in 2010 after reviewing over 100 visualization libraries and choosing Protovis (a predecessor of D3). Not wanting to exclude the 20% of the world still using IE 7/8, they developed protovis-msie, a tool to convert Protovis SVG output to VML, a vector format understood by older browsers.

And…they open sourced it. So Protovis is now on the table for use at National Priorities Project. Thank you, DataMarket!

Like Flare, Protovis is no longer under active development. That said, it still has an active user community (unlike Flare). And the output won’t be Flash, so iOS is back on the table.

DataMarket’s strategy is to continue using Protovis until most IE users are on version 9 (which supports SVG) and then switch over to D3. It was refreshing to hear browser support strategies from people developing visualizations for commercial use; they don’t have the luxury of ignoring IE 8, which is tempting to do but not viable in the real world.

Data Visualizations with Flare

Two weeks ago, the White House released President Obama’s FY 2013 budget request. Using the numbers scrubbed by NPP’s crack research team, I created a few visualizations using the ActionScript/Flash-based Flare data visualization library (h/t Washington Post and Nathan Yau).

Flare was ideal because it includes sample code for a stacked area chart with tooltips–exactly what we wanted. I had some concerns about the Flash output, but many of our website visitors use browsers that don’t support SVG (IE8), so tools like D3 aren’t an option just yet.

Here’s a preview of what we’ll include (not the final version).  The first example is built with normalized data:

[Flare chart, requires Flash: normalized data]

For the second example (total federal spending by category), we wanted to convey the overall size of the budget over time, so we didn’t normalize the data. As a result, the huge numbers caused some formatting issues, but it’s still an interesting story–especially the 2009 spike. Also note the rise in healthcare spending over time: 7% of the budget in 1976 and 25% in 2013.

[Flare chart, requires Flash: total federal spending by category]
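The difference between the two charts comes down to whether each year’s category amounts are divided by that year’s total. Here is a minimal sketch of that normalization in Python (not the ActionScript used for the charts above). The function name, category names, and amounts are made up for illustration and are not actual budget data.

```python
def normalize_by_year(spending):
    """Convert {category: [amount per year]} into percent-of-total per year."""
    categories = list(spending)
    n_years = len(next(iter(spending.values())))
    # Total spending across all categories for each year.
    totals = [sum(spending[c][i] for c in categories) for i in range(n_years)]
    return {
        c: [100.0 * spending[c][i] / totals[i] for i in range(n_years)]
        for c in categories
    }


# Hypothetical amounts for two years; not actual budget figures.
sample = {
    "health": [10.0, 50.0],
    "defense": [40.0, 50.0],
    "other": [50.0, 100.0],
}
shares = normalize_by_year(sample)
print(shares["health"])
```

In the normalized version every year stacks to 100%, which makes shifts in share easy to see but hides the growth in the overall total.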

Flare makes it easy to lay out the data and create the animated transitions, and after making a few tweaks to the Flare library and the stacked area sample code, I’m happy with the way these turned out.

That said, I’d be reluctant to use Flare again. It isn’t being actively developed, and there’s nowhere to turn for help when you get stuck (also, the whole Flash thing). Visualizations are evolving, and the tools to create them–no matter how good they are–evolve too.

No Excuses for Ugly Excel Charts

2/2/2012: Corrected the revised bar chart by setting the horizontal axis minimum to zero. Thanks to Jon Peltier for the catch.

Excel remains the de-facto graphing tool at National Priorities Project. A simple chart is often the best way to convey information about federal spending and budgeting, and Excel is the common language among our researchers and IT team.

Using Excel, however, is no excuse for ignoring style and the best practices of information display. So many organizations put out amazing, well-researched publications and then tack on default Excel graphs as an afterthought. But graphs are often what people look at first, and they deserve to be first-class citizens in the editing process.

I created some Excel chart templates for NPP, drawing on two sources for inspiration and practical advice: the classic Visual Display of Quantitative Information by Edward Tufte and The Wall Street Journal Guide to Information Graphics by Dona Wong.

Tufte is big on eliminating “unnecessary ink” that distracts from the information, and Wong advocates requiring the least amount of work on the reader’s part. With their advice in mind, I modified Excel’s default bar chart from this:

Bar chart: Excel default

To this:

Bar chart: new template

  • Smaller gap between bars
  • Don’t make readers guess the numbers; if possible, label the bars directly
  • Direct labeling means you don’t need the noisy gridlines or even the x-axis
  • Remove the y-axis tick marks for even more noise reduction
  • Get rid of those zeros by showing data in millions or billions
  • Make sure the entire length of the bars is shown (in this case, by setting the horizontal axis minimum to zero). HT Jon Peltier.

The pie chart got a similar treatment. The Excel default:

Pie chart: Excel default

The new template:

Pie chart: new template

  • Label the pie slices directly—don’t make people use a legend to decode
  • Avoid the default Office color palette and develop your own (ours is based on colors from our website)
  • A white line between pie slices emphasizes the boundaries

Excel isn’t perfect, but it’s out there in the world, and you can’t ignore it. Luckily, a little extra effort goes a long way.

Be Accessible to the Dummy Demographic

I’ve always considered writing to be one of my professional strengths, and I’m passionate about making technology accessible to people. So when writing about government data for National Priorities Project, I try to make sure the language is easy to understand and non-wonky (“wonk” is the political equivalent of “nerd”). After all, our mission statement says that NPP “makes complex budget information transparent and accessible.”

So when writing a brief paragraph to introduce a new dataset in the Federal Priorities Database, I thought I nailed it:

Federal tax collections represent the amount of federal taxes paid in a year.  The IRS publishes this information by state, breaking it into ten categories.  Because part of our job is to make government-published data more useful, we condensed these ten categories into three: taxes paid by businesses, taxes paid by individuals, and total taxes collected. We believe that these broad categories better reflect what most people want to know about tax revenue and are less confusing than the original data. In fact, the individual taxes calculation played an integral part in a recent publication, Federal Spending Keeps Iowa, New Hampshire Afloat.

I have a few trusted readers in the office, so I sent the text around for some quick feedback. About five minutes later, one of them appeared at my desk, saying that my language was non-accessible and “made her eyes roll back in her head.” Wow. Hard to hear, but she followed up with some specific suggestions that resulted in a much improved, non-eye-rolling version:

We recently took some data from the IRS and made it even better. We call it Federal Tax Collections, and it shows federal taxes collected in a year. Although the IRS helpfully provides these numbers for each state, they break the amounts into ten categories, not all of which are useful. For example, do any but the wonkiest people need to know how much railroad retirement tax was collected in 2010? To make things easier, we condensed these categories into taxes paid by businesses, taxes paid by individuals, and total taxes paid. We think these broader groupings are less confusing and more practical. Our recent publication, Federal Spending Keeps Iowa, New Hampshire Afloat, shows the individual taxes category in use.

Thank goodness for honest feedback! It seems obvious, but the lessons here are easy to overlook when you’re cranking through a to-do list.

  • No matter what kind of deadline you think you have, it’s always worth the time to get other eyes on something your organization will publish, even if it’s just a small blurb, and even if your colleagues are busy. After all, it’s in everyone’s best interest to show the world your best stuff.
  • Make sure your trusted readers include, as my friend calls herself, the dummy demographic, a term of affection for readers who aren’t experts in your subject matter.

The Lazy Card

The administrator of a blog I follow recently asked the question, “should I force our blog’s authors to use Markdown?”

I then read a completely unrelated post on another blog that advised WordPress users to get rid of the visual text editor because “it makes you lazy” and you should “force yourself to learn some basic HTML.”

When I got into a heated discussion with another developer about the above two items, he used the same word to describe non-Markdown/HTML-writing, WYSIWYG-dependent online content authors: lazy.

This isn’t the first argument I’ve had with a technologist who likes to play the lazy card, and it won’t be the last. I’m not disputing that everyone in today’s workforce should always be learning; the days of doing the same job the same way for thirty years are over. But to imply that a blogging co-worker isn’t holding up her end of the learning bargain because she doesn’t want to learn HTML or Markdown is arrogant.

As technologists, we explore and experiment with new technology. Our content-creating colleagues presumably explore and experiment in their respective areas of expertise. Who are we to dictate that they should increase their cognitive overhead by worrying about valid XHTML markup?

Yes, WYSIWYG editors are pretty terrible. They puke out Microsoft Word detritus and let you change the font color to cyan. Forcing people to write in Markdown, however, shifts a technical problem from the technologist to the user, which is the opposite of ideal.

This rant isn’t about blog authors or Markdown. It’s about acknowledging that technology isn’t everyone’s primary concern, nor should it be.

Of course there are appropriate times to force technology changes. But even in those situations the resisters aren’t lazy; they just have a different job than you. In today’s hectic and stressed workplace, shouldn’t we give colleagues the benefit of the doubt and help them succeed with technology rather than slapping a label on them?

Python, Django, MySQL & Win 7

When starting to learn Python and Django, my goal was to set up a robust development environment similar to what we use at National Priorities Project: isolated virtual environments, MySQL, and tools like pip and iPython. Stubbornly, I resolved to make it all work on Windows.

I achieved the goal, but not without a lot of pain. If you’re a Windows user getting started with Python/Django, you might have an easier time installing a virtual Linux machine.

Here’s a re-cap of the Windows-specific instructions for installing Python, Django, MySQL, and a few necessary packages and tools.

Parting thoughts:
  • I abandoned the Cygwin approach after running into trouble with Cygwin’s Python install vs the Windows Python install.
  • People have good things to say about ActivePython as a tool that helps Python developers avoid headaches.

Python, Django, & MySQL on Windows 7, Part 5: Installing MySQL

This is the fifth and final post in a dummies guide to getting started with Python, Django, & MySQL on Windows 7.

By now, you should have Django installed into a virtual environment.  These tutorials aren’t meant to cover building a Django app, just to point out the quirks involved with getting a project up and running on Windows.  These tutorials also assume you want to construct real applications using a real development environment.

To that end, you’ll want a heftier database than SQLite.  We use MySQL at the office, so these instructions cover installing it and using it with Django.

Install MySQL

  1. Download and install MySQL.
  2. Once MySQL is installed, proceed through the configuration wizard. Check the Include Bin Directory in Windows PATH box.
  3. When prompted, set a password for the MySQL root account.
  4. Once the installation wizard is done, open a command window and log in to MySQL with the root account: mysql -uroot -p (you’ll be prompted for the password).
  5. After logging in, run the following commands to create a database, create a user for your Django project, and grant the user database access.
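The commands referred to in step 5 did not survive on this page. As a rough sketch of what such a setup usually involves (an assumption, not the original commands), the Python helper below builds three standard MySQL statements: create a database, create a user, and grant that user access. The helper name, database name, username, and password are all placeholders, and the helper does no escaping, so use it only with values you choose yourself.

```python
def mysql_setup_statements(db_name, user, password, host="localhost"):
    """Build SQL statements that create a database and a user with access to it.

    Illustration only: values are interpolated without escaping.
    """
    return [
        "CREATE DATABASE {0} CHARACTER SET utf8;".format(db_name),
        "CREATE USER '{0}'@'{1}' IDENTIFIED BY '{2}';".format(user, host, password),
        "GRANT ALL PRIVILEGES ON {0}.* TO '{1}'@'{2}';".format(db_name, user, host),
    ]


# Placeholder names; substitute your own before pasting into the mysql prompt.
for statement in mysql_setup_statements("django_tutorial", "django_user", "change-me"):
    print(statement)
```

Paste the printed statements at the mysql> prompt from step 4.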

Install MySQL-python

You’ll need the MySQL-python package, a Python interface to MySQL.

  1. Download the Windows MySQL-python distribution here.  The author has some instructions about the appropriate version; assuming a 32-bit version of Python 2.7, you’d download this package (.exe).
  2. After downloading, do not run the Windows installer. Doing so will install MySQL-python to your root Python, which virtual environments created via --no-site-packages won’t be able to see.
  3. Instead, install the downloaded package to your virtual environment by using easy_install, which can install from Windows binary installers:
    easy_install file://c:/users/you/downloads/mysql-python-1.2.3.win32-py2.7.exe (modify to reflect the location of the downloaded installer and its name).

Configure Django

Next, you’ll need to update the database-related settings of your Django project.

  1. From the directory of your Django project, open settings.py using your favorite editor.
  2. Update the default key in the DATABASES dictionary.  Set ENGINE to django.db.backends.mysql and set NAME, USER, and PASSWORD to the database name, username, and password you chose when installing MySQL.  See Part I of the Django tutorial for more information about database settings.
  3. Open a command window, activate your virtual environment, and change to the directory of your Django project.
  4. Type python manage.py syncdb. This command creates the underlying tables required for your Django project.
    [Screenshot: syncdb output]
  5. If syncdb worked, you have Python, Django, and MySQL communicating in harmony.  Congratulations!  You can now proceed through the Django tutorial and create your first application.
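For reference, the DATABASES edit described in step 2 might look something like the sketch below. The ENGINE value comes from the step itself; the database name, username, and password are placeholders standing in for whatever you chose when setting up MySQL.

```python
# settings.py (excerpt)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "django_tutorial",  # placeholder: your database name
        "USER": "django_user",      # placeholder: your MySQL user
        "PASSWORD": "change-me",    # placeholder: that user's password
        "HOST": "",                 # empty string means localhost
        "PORT": "",                 # empty string means the default port
    }
}
```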

Python, Django, & MySQL on Windows 7, Part 4: Installing Django

This is the fourth post in a dummies guide to getting started with Python, Django, & MySQL on Windows 7.

We’re finally ready to install Django, a popular Web-development framework. Detailed instructions for building out a Django site are beyond the scope of this humble tutorial; try The Definitive Guide to Django or Django’s online Getting started docs for that.

These directions will simply make sure you can get up and running.

Installing Django

  1. Open a command window.
  2. Go to (or create) the virtual environment you’ll be using for your Django project. For this example, I created a virtualenv called django-tutorial: virtualenv django-tutorial --no-site-packages
  3. Install Django: pip install django
    [Screenshot: install django]
  4. Start an interactive interpreter by typing python (or iPython, if you’ve made it virtual environment-aware).
  5. Test the install by importing the django module and checking its version: https://gist.github.com/1177372
  6. Create a new directory to hold your Django projects and code. Change to it.
  7. Think of a name for your first Django project and create it by running the following command: python -m django-admin startproject [projectname].
    If that doesn’t work, try python -m django-admin startproject [projectname] (thanks JukkaN!)
    Important: most Django docs show django-admin.py startproject [projectname] to start a new project, which can cause import errors and other trouble for Windows users. See this stackoverflow thread for details.
  8. You should now see the project’s folder in your Django directory:
    [Screenshot: django project folder]
  9. Change into the new project folder.
  10. Test the new project by typing python manage.py.  manage.py is Django’s command line utility; you should see a list of its available subcommands.
  11. A further test is to start up Django’s development server: python manage.py runserver. You should see something like this:
    [Screenshot: django runserver]
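The version check in step 5 can be sketched as follows. This is not necessarily what the linked gist contains. The helper name installed_version is made up for this example; django.get_version() is Django’s own function for reporting its version.

```python
import importlib


def installed_version(module_name):
    """Return a module's version string, or None if it can't be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Django exposes get_version(); many other packages use __version__.
    get_version = getattr(module, "get_version", None)
    if callable(get_version):
        return get_version()
    return getattr(module, "__version__", "unknown")


# Prints None if Django isn't visible from the current environment.
print(installed_version("django"))
```

If this prints None inside an activated virtual environment, the pip install in step 3 probably went to a different Python than the one you are running.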

If you’ve made it this far, you’ve successfully installed Django and created your first project.

Next up is Part 5: Installing MySQL.

Python, Django, & MySQL on Windows 7, Part 3: iPython & Virtual Environments

This is the third post in a dummies guide to getting started with Python, Django, & MySQL on Windows 7.

The last installment covered setting up virtual environments.

Part 3: iPython and Virtual Environments

iPython is not virtual environment-aware

iPython is a valuable tool, but you’ll have to tweak it to work with virtual environments. By default, iPython isn’t aware of a virtual environment’s packages.

For example, if you install a package into a virtualenv and try to use it via Python’s built-in shell, everything works:

[Screenshot: import via python]

Try the same thing with iPython, however, and you get an import error:

[Screenshot: import via ipython]

A little intervention is required to make iPython virtual environment-aware. If you’re okay with using Python’s built-in interactive shell instead of iPython, skip ahead to Part 4: Installing Django.

Install iPython to the virtual environment

The easiest and most obvious solution to make iPython work with a virtual environment is to install it to the virtual environment.

However, you will have to do this for each environment you work in.  Furthermore, if you created the virtual environment without the --no-site-packages option (which tells virtualenv not to inherit anything from global site-packages), you may get an “already installed” message.

Modify iPython config file

Another way to ensure iPython behaves with virtual environments is to use its configuration file to check for an active virtual environment and modify the import path accordingly. The directions below assume iPython is installed globally but not in any of your virtual environments.

    1. Open a command window.
    2. Instruct iPython to generate a sample configuration file (called ipython_config.py) by typing ipython profile create
    3. The sample file should now be in your iPython profile folder. On Windows, this is [your user folder]\.ipython\profile_default.
    4. Open ipython_config.py in a text editor and add the following code to the bottom. I relied heavily on code from here and here, making a few tweaks.

      https://gist.github.com/1176035

    5. Activate a virtual environment.
    6. Start up iPython and look for the output confirming the current active virtual environment:
      [Screenshot: ipython & virtualenv]
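As a rough idea of what the configuration code in step 4 needs to do, here is a minimal sketch. It is not the code from the linked gist. It assumes the activate script has set the VIRTUAL_ENV environment variable and that the environment uses the usual virtualenv layout (Lib\site-packages on Windows). Both function names are invented for this example.

```python
import os
import site
import sys


def virtualenv_site_packages(environ=os.environ, platform=sys.platform):
    """Return the active virtualenv's site-packages path, or None if none is active."""
    venv = environ.get("VIRTUAL_ENV")
    if not venv:
        return None
    if platform == "win32":
        return os.path.join(venv, "Lib", "site-packages")
    # Non-Windows virtualenvs nest site-packages under lib/pythonX.Y.
    version = "python%d.%d" % sys.version_info[:2]
    return os.path.join(venv, "lib", version, "site-packages")


def activate_virtualenv_imports():
    """Add the active virtualenv's packages to the import path, if there is one."""
    path = virtualenv_site_packages()
    if path and os.path.isdir(path):
        site.addsitedir(path)
        print("virtualenv site-packages added: %s" % path)
    return path


activate_virtualenv_imports()
```

site.addsitedir is used rather than a plain sys.path.append so that any .pth files in the environment are processed too.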
The next installment is Part 4: Installing Django.