Archive | Programming

Useful Pandas Snippets

Even after almost two years of working with Pandas, the incredibly useful Python data analysis library, I still need to look up syntax for some common tasks. Finally got around to putting everything on a single “useful Pandas snippets” cheat sheet: these are essential tools for munging federal budget data.

IPython, IPython Notebook, Anaconda, and R (rpy2)


IPython and the IPython Notebook have vast potential beyond their traditional use in the Python scientific programming community. Specifically, the Notebook is a great learning tool, and that’s something I plan to highlight in an upcoming talk at the New England Regional Developer (NERD) Summit.

Because the NERD mission is to reduce barriers for people entering IT (as opposed to having them waste years of their lives untangling Python package dependencies), the plan is to demo everything using the Anaconda Python installation. Overkill maybe, since even on Windows installing Python and IPython isn’t too terrible. That said:

  • It’s important for those who aren’t proficient on the command line to jump right in.
  • If people get hooked on Python and want to do more, they’ll have the important packages at the ready (Anaconda includes some important machine learning packages that are missing from the free version of its competitor, Enthought Canopy).

To understand the worst case pain scenario before recommending Anaconda to beginners, I installed it on Windows. So far, so good.

The only snafu I’ve hit so far is trying to get the IPython-to-R integration working. This isn’t really a feature for beginners, but I want to show it because R is heavily used at local universities.

To be clear, this mess isn’t an Anaconda problem. However, if you’re using Anaconda on Windows and want to use IPython’s rmagic extension, here’s how.

  1. Install R.
  2. Add the directory with the R executables to your PATH. It has to be the directory with executables, not the main R folder. On a 64-bit machine, the directory is something like C:\Program Files\R\R-3.1.0\bin\x64.
  3. Add these two environment variables (h/t):
    • R_HOME (path of the main R folder, e.g. C:\Program Files\R\R-3.1.0)
    • R_USER (your Windows username).
  4. Restart Windows.
  5. Modify your Windows Python install registry key to point to your Anaconda Python location instead of the default Python installation (h/t). If you followed the default Anaconda install prompts (which installs for the current user, rather than all users on the machine), you’d change HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\InstallPath.
  6. Download and install Dr. Gohlke’s rpy2 Windows binary (grab version 2.4.0 or higher).
  7. Change the registry key from step 5 back to its original value.
  8. Open an IPython notebook or terminal and load the rmagic extension:
    %load_ext rpy2.ipython
  9. You should be able to test everything out using this sample code.
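Before loading the extension, a quick sanity check of steps 2 and 3 can save some head-scratching. This is only a sketch, not the sample code mentioned above: the function name check_r_environment is made up for illustration, and it assumes the R executables live somewhere under the bin folder of R_HOME, as in the example paths.

```python
import os


def check_r_environment(env=None):
    """Return a list of problems with the R-related environment variables."""
    env = os.environ if env is None else env
    problems = []

    # Step 3: both variables must be defined.
    for name in ("R_HOME", "R_USER"):
        if not env.get(name):
            problems.append(name + " is not set")

    # Step 2: PATH must contain the folder with the R executables,
    # assumed here to sit under R_HOME\bin (for example ...\bin\x64).
    r_home = env.get("R_HOME", "")
    if r_home:
        bin_prefix = (r_home.rstrip("\\/") + "\\bin").lower()
        path_entries = env.get("PATH", "").split(";")  # Windows PATH separator
        if not any(p.strip().lower().startswith(bin_prefix) for p in path_entries):
            problems.append("no PATH entry under " + r_home + "\\bin")

    return problems
```

Calling check_r_environment() with no arguments inspects the current process environment; an empty list means the variables look plausible, not that rpy2 itself is installed correctly.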

For comparison, these are the Ubuntu instructions.

  1. Install R:
    sudo apt-get install r-base r-base-core r-base-html
  2. Install rpy2:
    pip install rpy2

Revisiting Python on Windows

Three years ago I wrote a series of tutorials for setting up Python/Django on Windows.

Despite taking great pains to make it all work and then meticulously documenting the details, I abandoned that idea in favor of an Ubuntu VirtualBox soon after those posts went live. It’s a long story, but at some point you need to cut your losses and stop throwing good time after bad.

But this summer marks a return to Windows. I decided our data intern should learn Python or R, so he can experience a world beyond the proprietary stats packages they use at colleges. We settled on Python, and on Windows as the platform; there’s no need to add Linux to his already long list of things to pick up.

To test things out for him, I crossed my fingers and installed Enthought Canopy, a canned Python environment for data viz and analysis, hoping it would take away the pain of installing Python packages on Windows. For the most part, it did.

Canopy (which has a free version) makes it easy to get up and running quickly. If you’re getting started with Python data analysis, use it, and don’t spend hours of your life installing all the packages yourself. That way lies madness.

That said, some of the latest and greatest Python data viz packages aren’t included in the Canopy distribution. If you want to learn those, you’ll have to install them yourself, which is where things can go awry. For example, if you’re on Windows and the package you’re installing needs a 64-bit C compiler, you have to follow these 6 simple steps to get one:

The Python data ecosystem is extremely compelling, but there are still too many barriers for a beginner to jump right in, especially on Windows.


Running IPython Notebook From Vagrant/VirtualBox

Updated 9/1/2014 to add a few more IPython Notebook dependencies.

Honestly, you’d think it would be easy to remember these four simple steps, but I never seem to. Since IPython notebook is pretty much the greatest thing since sliced bread, here’s how to run it in Vagrant/VirtualBox and access the notebook from the host machine’s browser.

  1. Make sure the prerequisite packages are installed in the virtual machine’s Python environment:*
    • jinja2
    • sphinx
    • pyzmq
    • pygments
    • tornado
    • ipython
  2. Make sure your Vagrant file is forwarding port 8888 to port 8888 (or whatever you’d like to use):
  3. In your virtual machine, run the IPython notebook server: ipython notebook --ip=
  4. View the notebook in the host’s browser: http://localhost:8888

*Alternatively, you can pip install ipython[notebook] to install IPython and all Notebook dependencies. I got errors when doing this via zsh, though it worked after switching to Bash.

Update 11/6/2014: Praful Mathur left a good tip for using the pip install ipython[notebook] syntax with zsh. You have to escape the square brackets: pip install ipython\[all\]. Thanks!
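If the notebook doesn’t load in the host’s browser, it helps to know whether the forwarded port from step 2 is reachable at all. Here’s a minimal sketch using only the standard library, run from the host machine. The helper name port_is_open is hypothetical, and port 8888 is simply the one used in the steps above.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError when nothing is listening
        # or the connection attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result points at the port forwarding or at the server not listening on an address the host can reach; a True result only says something is accepting connections on that port, not that it is the notebook server.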


Strata 2012: Making Data Work

It’s been over a month since Strata 2012. Lest they meet the same fate as scribbles from conferences long past, I’m putting my notes here instead of leaving them in whatever notebook I happened to be toting around that week.

The biggest a-ha moment, and one that I’ll be writing about in the future, came from Ben Goldacre’s keynote, in which he compared big data practitioners to drunks looking for car keys only where the light shines. We focus on the data that’s available without asking, “what’s missing?” Plus, it’s fun to hear someone with a British accent say “blobogram.”


Protovis Visualization for Older IE

Two days ago, I posted my Flare visualizations (built with a Flash/ActionScript library), explaining that we can’t yet use the D3 visualization library because it outputs SVG, which isn’t supported by older versions of IE.

The very next day, Hjalmar Gislason of DataMarket gave a talk at O’Reilly’s Strata Conference. DataMarket faced the same problem back in 2010 after reviewing over 100 visualization libraries and choosing Protovis (a predecessor of D3). Not wanting to exclude the 20% of the world still using IE 7/8, they developed protovis-msie, a tool to convert Protovis SVG output to VML, a vector format understood by older browsers.

And…they open sourced it. So Protovis is now on the table for use at National Priorities Project. Thank you, DataMarket!

Like Flare, Protovis is no longer under active development. That said, it still has an active user community (unlike Flare). And the output won’t be Flash, so iOS is back on the table.

DataMarket’s strategy is to continue using Protovis until most IE users are on version 9 (which supports SVG) and then switch over to D3. It was refreshing to hear browser support strategies from people developing visualizations for commercial use; they don’t have the luxury of ignoring IE 8, which is tempting to do but not viable in the real world.


Data Visualizations with Flare

Two weeks ago, the White House released President Obama’s FY 2013 budget request. Using the numbers scrubbed by NPP’s crack research team, I created a few visualizations using the ActionScript/Flash-based Flare data visualization library (h/t Washington Post and Nathan Yau).

Flare was ideal because it includes sample code for a stacked area chart with tooltips, which is exactly what we wanted. I had some concerns about the Flash output, but many of our website visitors use browsers that don’t support SVG (IE8), so tools like D3 aren’t an option just yet.

Here’s a preview of what we’ll include (not the final version). The first example is built with normalized data:

[Flash visualization: stacked area chart built with normalized data]

For the second example (total federal spending by category), we wanted to convey the overall size of the budget over time, so we didn’t normalize the data. As a result, the huge numbers caused some formatting issues, but it’s still an interesting story, especially the 2009 spike. Also note the rise in healthcare spending over time: 7% of the budget in 1976 and 25% in 2013.

[Flash visualization: stacked area chart of total federal spending by category]

Flare makes it easy to lay out the data and create the animated transitions, and after making a few tweaks to the Flare library and the stacked area sample code, I’m happy with the way these turned out.

That said, I’d be reluctant to use Flare again. It isn’t being actively developed, and there’s nowhere to turn for help when you get stuck (also, the whole Flash thing). Visualizations are evolving, and the tools to create them, no matter how good they are, evolve too.


Python, Django, MySQL & Win 7

When starting to learn Python and Django, my goal was to set up a robust development environment similar to what we use at National Priorities Project: isolated virtual environments, MySQL, and tools like pip and IPython. Stubbornly, I resolved to make it all work on Windows.

I achieved the goal, but not without a lot of pain. If you’re a Windows user getting started with Python/Django, you might have an easier time installing a virtual Linux machine.

Here’s a re-cap of the Windows-specific instructions for installing Python, Django, MySQL, and a few necessary packages and tools.

Parting thoughts:
  • I abandoned the Cygwin approach after running into trouble with Cygwin’s Python install vs the Windows Python install.
  • People have good things to say about ActivePython as a way for Python developers to avoid headaches.

Python, Django, & MySQL on Windows 7, Part 5: Installing MySQL

This is the fifth and final post in a dummies’ guide to getting started with Python, Django, & MySQL on Windows 7.

By now, you should have Django installed into a virtual environment. These tutorials aren’t meant to cover building a Django app, just to point out the quirks involved with getting a project up and running on Windows. These tutorials also assume you want to construct real applications using a real development environment.

To that end, you’ll want a heftier database than SQLite. We use MySQL at the office, so these instructions cover installing it and using it with Django.

Install MySQL

  1. Download and install MySQL.
  2. Once MySQL is installed, proceed through the configuration wizard. Check the Include Bin Directory in Windows PATH box.
  3. When prompted, set a password for the MySQL root account.
  4. Once the installation wizard is done, open a command window and log in to MySQL with the root account: mysql -uroot -p (you’ll be prompted for the password).
  5. After logging in, run the following commands to create a database, create a user for your Django project, and grant the user database access.
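For step 5, the statements follow a standard create-database, create-user, grant pattern. The sketch below just prints them so they can be pasted at the mysql prompt. Everything specific in it is an assumption for illustration: the helper name, the database name django_tutorial, the user django_user, and the placeholder password.

```python
def mysql_setup_statements(database, user, password, host="localhost"):
    """Build SQL to create a database and a user with full access to it.

    No escaping is done, so use only simple alphanumeric names here.
    """
    account = "'{0}'@'{1}'".format(user, host)
    return [
        "CREATE DATABASE {0} CHARACTER SET utf8;".format(database),
        "CREATE USER {0} IDENTIFIED BY '{1}';".format(account, password),
        "GRANT ALL PRIVILEGES ON {0}.* TO {1};".format(database, account),
    ]


# Hypothetical names; substitute your own before running the statements.
for statement in mysql_setup_statements("django_tutorial", "django_user", "change-me"):
    print(statement)
```

Whatever names you pick here are the ones to reuse later in the Django database settings.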

Install MySQL-python

You’ll need the MySQL-python package, a Python interface to MySQL.

  1. Download the Windows MySQL-python distribution here. The author has some instructions about the appropriate version; assuming a 32-bit version of Python 2.7, you’d download this package (.exe).
  2. After downloading, do not run the Windows installer. Doing so will install MySQL-python to your root Python, which virtual environments created via --no-site-packages won’t be able to see.
  3. Instead, install the downloaded package to your virtual environment by using easy_install, which can install from Windows binary installers:
    easy_install file://c:/users/you/downloads/mysql-python-1.2.3.win32-py2.7.exe (modify to reflect the location of the downloaded installer and its name).
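Since the whole point of step 3 is to land the package inside the virtual environment, it can help to confirm that the interpreter you’re running actually belongs to one before installing. A minimal sketch: the helper name in_virtualenv is made up for illustration, and it relies on sys.real_prefix (set by older virtualenv releases) and sys.base_prefix (set by newer ones and by the built-in venv module).

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases add sys.real_prefix to the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and venv make base_prefix differ from prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print(in_virtualenv())
```

If this prints False, the environment hasn’t been activated in the current command window, and easy_install would target the root Python instead.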

Configure Django

Next, you’ll need to update the database-related settings of your Django project.

  1. From the directory of your Django project, open settings.py using your favorite editor.
  2. Update the default key in the DATABASES dictionary. Set ENGINE to django.db.backends.mysql and set NAME, USER, and PASSWORD to the database name, username, and password you chose when installing MySQL. See Part I of the Django tutorial for more information about database settings.
  3. Open a command window, activate your virtual environment, and change to the directory of your Django project.
  4. Type python manage.py syncdb. This command creates the underlying tables required for your Django project.
  5. If the syncdb worked, you have Python, Django, and MySQL communicating in harmony. Congratulations! You can now proceed through the Django tutorial and create your first application.
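For reference, the settings edit from step 2 comes out looking roughly like this. The database name, user, and password are placeholders rather than values from this tutorial; use whatever you created in MySQL.

```python
# settings.py (fragment)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "django_tutorial",  # hypothetical database name
        "USER": "django_user",      # hypothetical MySQL user
        "PASSWORD": "change-me",    # placeholder password
        "HOST": "",                 # empty string means localhost
        "PORT": "",                 # empty string means the default port
    }
}
```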

Python, Django, & MySQL on Windows 7, Part 4: Installing Django

This is the fourth post in a dummies’ guide to getting started with Python, Django, & MySQL on Windows 7.

We’re finally ready to install Django, a popular Web-development framework. Detailed instructions for building out a Django site are beyond the scope of this humble tutorial; try The Definitive Guide to Django or Django’s online Getting started docs for that.

These directions will simply make sure you can get up and running.

Installing Django

  1. Open a command window.
  2. Go to (or create) the virtual environment you’ll be using for your Django project. For this example, I created a virtualenv called django-tutorial: virtualenv django-tutorial --no-site-packages
  3. Install Django: pip install django
  4. Start an interactive interpreter by typing python (or IPython, if you’ve made it virtual environment-aware).
  5. Test the install by importing the django module and checking its version:
  6. Create a new directory to hold your Django projects and code. Change to it.
  7. Think of a name for your first Django project and create it by running the following command: python -m django-admin startproject [projectname].
    If that doesn’t work, try python -m django-admin startproject [projectname] (thanks JukkaN!)
    Important: most Django docs show startproject [projectname] to start a new project, which can cause import errors and other trouble for Windows users. See this stackoverflow thread for details.
  8. You should now see the project’s folder in your Django directory.
  9. Change into the new project folder.
  10. Test the new project by typing python manage.py. manage.py is Django’s command line utility; you should see a list of its available subcommands.
  11. A further test is to start up Django’s development server: python manage.py runserver.

If you’ve made it this far, you’ve successfully installed Django and created your first project.
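The version check from step 5 can be typed at the interpreter as import django followed by django.get_version(). The same check as a small script, in case you want to run it from inside a particular virtual environment, might look like the sketch below; the wrapper function django_version is just for illustration.

```python
import importlib


def django_version():
    """Return the installed Django version string, or None if the import fails."""
    try:
        django = importlib.import_module("django")
    except ImportError:
        return None
    return django.get_version()


print(django_version())
```

A result of None usually means the virtual environment isn’t activated or Django was installed somewhere else.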

Next up is Part 5: Installing MySQL.
