- They are relatively easily acquired, with no need for a 4-year CS degree;
- They provide journalistically relevant and useful results;
- They are reusable in a journalistic context.
In my opinion, there are three skills that meet these criteria:
- Mapping. Most news happens in a place. Maps are ancient precisely because they are such an expressive and powerful form of data visualization.
- Grabbing. Thousands of websites have data gateways called APIs (Application Programming Interfaces) that allow you free access to some or all of that site’s data — as long as you can write a relatively simple program that can grab it and return it to you in a format you can use.
- Scraping. This is how you get data out of all those sites that don’t offer APIs — read “crappy government websites.”
(For a brief and entertaining video on these three, see “Do I Really Have To Learn How To Program?”)
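To make the grabbing-versus-scraping distinction concrete, here is a minimal Python sketch. The "API response" and the scrap of HTML are both made-up stand-ins (there is no real site behind them); the point is only that an API hands you structured data you can use in one line, while a site without one forces you to dig the same facts out of raw HTML.

```python
import json
from html.parser import HTMLParser

# Grabbing: an API returns structured data (here, JSON), so getting the
# value you want is a one-liner. This payload is a hypothetical example.
api_response = '{"city": "Chicago", "inspections": [{"name": "Joe\'s Diner", "result": "Pass"}]}'
data = json.loads(api_response)
print(data["inspections"][0]["result"])  # -> Pass

# Scraping: no API, so you pull the same facts out of the page's markup.
# This snippet stands in for a page on a crappy government website.
page = "<table><tr><td>Joe's Diner</td><td>Pass</td></tr></table>"

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell on the page."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        self._in_td = (tag == "td")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, text):
        if self._in_td:
            self.cells.append(text)

parser = CellCollector()
parser.feed(page)
print(parser.cells)  # the name and result, recovered from the HTML
```

In real scraping you would fetch the page over HTTP and the markup would be far messier, but the shape of the work — walk the tags, keep the text you care about — is the same.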
I’ve done quite a bit of mapping, and some grabbing, but my experiences with scraping have been less successful, primarily because they proved to be less generalizable. I’d be able to pick my way (often slowly and with much frustration) through a scraping tutorial and get results. But at the end I did not feel that I could write a script on my own to scrape other things.
After taking a class on scraping at Journalism Interactive with Michelle Minkoff, I decided to buy Paul Bradshaw’s book “Scraping for Journalists” and take another run at it. If you would like to read along, my notes as I pick through the book are after the jump.

I would also like to thank Michelle and Paul for giving me the inspiration to restart this blog. I have been very busy with my new duties at INN, a network of 90+ investigative and community newsrooms, so I have not been devoting much time to adding to my own store of code knowledge or developing tutorials to pass on what I’ve learned to others. But it’s something that I enjoy and believe gives back something of value to my peers in the field, so I welcome the chance to begin anew.