Hello, 18F!

As of next month, I’ll be a proud employee of the U.S. government.

In February I join the ranks of 18F, a digital startup-like team embedded in the federal government. Just shy of a year old, 18F collaborates with federal agencies to make digital products that serve the people. For so many reasons, this is a logical next career step:

  • 18F is mission-driven.
  • After organizing Hack for Western Mass for two years, this is a chance to be a professional civic hacker.
  • Some scary smart people from all over the country work at 18F.
  • I’ll get to work from Western Mass and maintain my outside-the-Beltway perspective.

At National Priorities Project (NPP), I championed the cause of better data about our government’s spending, and 18F is working with the Treasury Department on that. NPP uses several federal APIs in its apps, and 18F is helping to grow these services. NPP is a proponent of government transparency, and 18F works in the open.

In other words, this is an extraordinary opportunity to move from advocacy to execution. Thanks to everyone who encouraged me to take the leap, and hello to 18F. I can’t wait to join you.


Goodbye, NPP

The NPP Dream Team by ED and photographer Doug Hall

Last month I left my role as Director of Data and Technology at National Priorities Project.

For more than 30 years, this small non-profit has made the complex U.S. budget understandable to ordinary people. Since I like breaking down barriers to entry, that mission resonated with me.

I’m grateful for the opportunity to contribute to NPP and proud of the work I did there, from open-sourcing data on the cost of tax breaks to tracking budget dollars in the states to advocating for better federal spending information. And I’m grateful that NPP supported my involvement in Western Mass data community projects like Hack for Western Mass and the Five College DataFest.

I highly recommend working on a small, mission-driven team at least once in your career—it’s life-changing. When the success or failure of your projects has a direct, meaningful impact on your organization, it’s a whole different ballgame. When you not only have to deliver the work but sell it too, you learn some skills. When generous people from around the country send money to support what you do every day, it’s humbling. Oh, and being nominated for a Nobel Peace Prize is very cool, too.

Everything I learned at NPP led to the next logical, exciting step. I haven’t said anything publicly because it won’t be official until the background check is done, but if everything goes as planned, I’ll continue working for a better U.S.

Thanks, NPP, for four great years and for the invaluable work you do. See you at the next anniversary party!


Useful Pandas Snippets

Even after almost two years of working with Pandas, the incredibly useful Python data analysis library, I still need to look up the syntax for some common tasks. I finally got around to putting everything on a single “useful Pandas snippets” cheat sheet: these are essential tools for munging federal budget data.
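
A few snippets in that spirit, as a minimal sketch (the file and column names here are hypothetical, not lifted from the cheat sheet):

    import pandas as pd

    # Load a CSV of spending records (hypothetical file and column names)
    df = pd.read_csv('spending.csv')

    # Drop rows with a missing amount
    df = df[df['amount'].notnull()]

    # Keep only the rows whose agency appears in a list
    subset = df[df['agency'].isin(['Defense', 'Education'])]

    # Total spending by agency
    totals = df.groupby('agency')['amount'].sum()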


IPython, IPython Notebook, Anaconda, and R (rpy2)

IPython and the IPython Notebook have vast potential beyond their traditional use in the Python scientific programming community. Specifically, the Notebook is a great learning tool, and that’s something I plan to highlight in an upcoming talk at the New England Regional Developer (NERD) Summit.

Because the NERD mission is to reduce barriers for people entering IT (as opposed to having them waste years of their lives untangling Python package dependencies), the plan is to demo everything using the Anaconda Python distribution. Maybe overkill, since even on Windows, installing Python and IPython isn’t too terrible. That said:

  • It’s important for those who aren’t proficient on the command line to jump right in.
  • If people get hooked on Python and want to do more, they’ll have the important packages at the ready (Anaconda includes some important machine learning packages that are missing from the free version of its competitor, Enthought Canopy).

To understand the worst-case pain scenario before recommending Anaconda to beginners, I installed it on Windows. So far, so good.

The only snafu I’ve hit so far is getting the IPython-to-R integration working. This isn’t really a feature for beginners, but I want to show it because R is heavily used at local universities.

To be clear, this mess isn’t an Anaconda problem. However, if you’re using Anaconda on Windows and want to use IPython’s rmagic extension, here’s how.

  1. Install R.
  2. Add the directory with the R executables to your PATH. It has to be the directory with executables, not the main R folder. On a 64-bit machine, the directory is something like C:\Program Files\R\R-3.1.0\bin\x64.
  3. Add these two environment variables (h/t):
    • R_HOME (path of the main R folder, e.g. C:\Program Files\R\R-3.1.0)
    • R_USER (your Windows username).
  4. Restart Windows.
  5. Modify your Windows Python install registry key to point to your Anaconda Python location instead of the default Python installation (h/t). If you followed the default Anaconda install prompts (which installs for the current user, rather than all users on the machine), you’d change HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\InstallPath.
  6. Download and install Dr. Gohlke’s rpy2 Windows binary (grab version 2.4.0 or higher): http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2.
  7. Change the registry key from step 5 back to its original value.
  8. Open an IPython notebook or terminal and load the rmagic extension:
    %load_ext rpy2.ipython
  9. You should be able to test everything out using this sample code.
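
If that sample code link ever rots, here’s a minimal smoke test of my own for the round trip, assuming NumPy is installed (it ships with Anaconda):

    # In a notebook cell, after loading the extension in step 8
    import numpy as np
    x = np.array([1, 2, 3, 4, 5])

    %Rpush x          # copy the Python array into R
    %R m <- mean(x)   # compute the mean in R
    %Rpull m          # copy the result back into Python
    print(m)          # -> [ 3.]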

For comparison, these are the Ubuntu instructions.

  1. Install R:
    sudo apt-get install r-base r-base-core r-base-html
  2. Install rpy2:
    pip install rpy2

Revisiting Python on Windows

Three years ago I wrote a series of tutorials for setting up Python/Django on Windows.

Despite taking great pains to make it all work and then meticulously documenting the details, I abandoned that idea in favor of an Ubuntu VirtualBox soon after those posts went live. It’s a long story, but at some point you need to cut your losses and stop throwing good time after bad.

But this summer marks a return to Windows. I decided our data intern should learn Python or R, so he could experience a world beyond the proprietary stats packages colleges rely on. We settled on Python, and Windows was the best option; no need to add Linux to his already long list of things to pick up.

To test things out for him, I crossed my fingers and installed Enthought Canopy, a canned Python environment for data viz and analysis, hoping it would take away the pain of installing Python packages on Windows. For the most part, it did.

Canopy (which has a free version) makes it easy to get up and running quickly. If you’re getting started with Python data analysis, use it, and don’t spend hours of your life installing all the packages yourself. That way lies madness.

That said, some of the latest and greatest Python data viz packages aren’t included in the Canopy distribution. If you want to learn those, you’ll have to install them yourself, which is where things can go awry. For example, if you’re on Windows and the package you’re installing needs a 64-bit C compiler, you have to follow these 6 simple steps to get one: http://springflex.blogspot.com/2014/02/how-to-fix-valueerror-when-trying-to.html

The Python data ecosystem is extremely compelling, but there are still too many barriers for a beginner to jump right in, especially on Windows.


Hack for Western Mass

For the second year in a row, I helped organize Hack for Western Mass, our local National Day of Civic Hacking event.

The Internet is full of people extolling the virtues of civic hacking and people criticizing civic hackers for building new things instead of connecting people to existing resources.

That’s all true, but there’s something more to hackathons—something beyond the actual projects.

It’s easy to take technology for granted and forget its purpose when you’re steeped in it every day. But technology isn’t about algorithms, frameworks, or finding the perfect text editor. It’s about meeting people where they are—wherever that is—and helping them do their best work, be their most creative, or maybe just get their stuff done faster and get on with life.

When a group of people comes together in a shared time and space to solve problems, an employee of a three-person non-profit learns about Google Forms and saves hours of data entry. A community organizer changes the copy on her website for the first time. A group of kids learns that they have the power to make movies.

Say what you want about hackathons, but they’re both a catalyst of technology and a reminder of its true value.


Hans Rosling Bubbles for Mere Mortals

Just a Google Motion Charts experiment to get ready for Hack for Western Mass.

This example is a bit nonsensical, but we’ll be working this weekend on what kind of story we can tell with hunger-related data from the World Bank.


Running IPython Notebook From Vagrant/VirtualBox

Updated 9/1/2014 to add a few more IPython Notebook dependencies.

Honestly, you’d think it would be easy to remember these four simple steps, but I never seem to. Since IPython Notebook is pretty much the greatest thing since sliced bread, here’s how to run it in Vagrant/VirtualBox and access the notebook from the host machine’s browser.

  1. Make sure the prerequisite packages are installed in the virtual machine’s Python environment:*
    • jinja2
    • sphinx
    • pyzmq
    • pygments
    • tornado
    • ipython
  2. Make sure your Vagrantfile is forwarding port 8888 to port 8888 (or whatever you’d like to use), with something like:
    config.vm.network "forwarded_port", guest: 8888, host: 8888
  3. In your virtual machine, run the IPython Notebook server: ipython notebook --ip=0.0.0.0
  4. View the notebook in the host’s browser: http://localhost:8888

*Alternatively, you can pip install ipython[notebook] to install IPython and all the Notebook dependencies. I got errors when doing this via zsh, though it worked after switching to Bash.

Update 11/6/2014: Praful Mathur left a good tip for using the pip install ipython[notebook] syntax with zsh. You have to escape the hard brackets: pip install ipython\[all\]. Thanks!


Transitioning to Open Government Data

Earlier this fall, I was on a panel at the Association of Public Data Users annual conference. I do love going to DC and being in a room full of people who know what the Consolidated Federal Funds Report is.

The point of the presentation was:

  • Open government data is really exciting and has so much potential.
  • But if it’s going to replace traditional sources of “designed” government data, some people will be left behind.

Josh Tauberer’s 2nd Principle of Open Government Data says that data should be provided in its most granular form:

This principle relates to the change in emphasis from providing government information to information consumers to providing information to mediators, including journalists, who will build applications and synthesize ideas that are radically different from what is found in the source material. While information consumers typically require some analysis and simplification, information mediators can achieve more innovative solutions with the most raw form of government data.

As a data person, I support this principle 100%. That said, it’s a huge change for organizations used to getting pre-packaged government information. Congratulations: you’ve just been promoted to mediator!
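
To make “promoted to mediator” concrete: with granular, record-level data, a mediator can rebuild any summary the pre-packaged reports used to provide, plus the ones they never offered. A toy pandas sketch, with hypothetical file and column names:

    import pandas as pd

    # Record-level awards data (hypothetical file and columns)
    awards = pd.read_csv('awards.csv')

    # The consumer-style summary: total dollars by state
    by_state = awards.groupby('state')['amount'].sum()

    # A question no pre-packaged report anticipated:
    # small-business awards by state
    small = awards[awards['recipient_type'] == 'small_business']
    by_state_small = small.groupby('state')['amount'].sum()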



The Demise of Government-Created Statistical Data?

Like most data people, I prefer order and logic. So it was a huge shock when I joined a federal budget research organization and started learning about the orderly and logical process by which the U.S. government creates an annual budget. An orderly and logical process that Congress mostly disregards.

Really, the whole politicized debacle offends my sensibilities as a citizen and as a data professional.

Furthermore, the recent zeal for budget cuts has resulted in budget cuts that affect our ability to make smart budget cuts. Specifically, I’m talking about attacks on government-created statistical data—data that’s* used by lawmakers, social service organizations, and businesses to make decisions and allocate increasingly scarce resources.

Two examples I’ve written about recently:

  • Is Federal Spending Transparency on the Decline?: a guest post for the Sunlight Foundation’s blog about the demise of the Consolidated Federal Funds Report and why that makes it harder to understand federal spending.
  • American Community Survey Under Attack: the House recently passed a spending bill that prohibits the Department of Commerce from funding the American Community Survey (ACS). The yearly ACS replaced the decennial census long-form questionnaire, and its data helps* state and local governments determine how to distribute funds, among other things. See here, here, and here for more information about the widespread usefulness of the ACS.

Of course, order and logic sometimes need to be tempered with a dose of pragmatism. But when our governing body is governed almost entirely by short-term thinking, we should think about not electing them again.

*Language evolves!
