Not long into my research on information visualization, I came across the announcement for the Strata conference. I’ve been fascinated with “big data” since I was introduced to it (though not the term) in Marc Davis’ keynote at the Mobile Nation conference in 2007. It’s additionally fascinating to me now, because I’m thinking so much about scale. While not exactly pitched this way by O’Reilly, it was widely seen as the first big/broad conference on big data, and people remarked throughout the conference that this was the “first time” that “nearly everyone” was in the same room at the same time.
The first day of the conference was tutorials, and the next two days were split between a series of short keynotes and multiple streams of presentations and panels. The workshops and presentations/panels fell roughly into three categories: technology, business and visualization. Or, perhaps more accurately: technology, business and “other.” It might be said that the attendees could be categorized in the same way.
The visualization stream was the least well-defined, but there were some good presentations. On the first day, I went to the tutorials “Make People Fall in Love with Your Data: A Practical Tutorial for Data Visualization and UI Design” by Ken Hilburn & Zach Gemignani from Juice Analytics, and “Communicating Data Clearly” by Naomi Robbins. Both were good intros to visual communication for non-designers. The highlights in the first were the principles of dashboard design, and in the second, Robbins’ belaboured-but-funny criticism of Tufte. I’d have learned more at the tech or business tutorials, but it was a handy review.
Day 2 started with Hilary Mason from bit.ly, who gave a charming presentation pointing out connections between the use of bit.ly and real-world events. Mark Madsen’s talk, The Mythology of Big Data, was excellent — the kind of reality-check for big data that you’d normally expect on the far edge of the hype cycle, not as it’s finding its feet as a field(?). Werner Vogels from Amazon tastefully pitched AWS by talking about some of what they’ve learned, and customer success stories. Alistair Croll (conference co-chair and Montrealer) gave a quick presentation on the background for Strata, which was visionary and inspiring: it’s worth checking out the slideshare. Zane Adam from Microsoft talked about their data market, and mentioned Dundas Data Visualization, who look pretty cool.
In his presentation, Peter Skomoroch from LinkedIn gave some insights into how their team works, and talked about some of the cool (and open) tools that their team has built. In the MAD Skills presentation, I was introduced to the Google Prediction API (only available in the US), and some other tools.
Sara Farmer from Global Pulse organized a BoF lunch on “Data Philanthropy,” a term that had just come out of the WEF in Davos a few days earlier. We had an amazing group around the table, and the conversation lasted long into the next session. I didn’t take notes, but I remember coming away thinking about the need for, and the challenge of, partial data sharing agreements (a need I’d heard about in several places).
Matt Biddulph and Tom Coates from Nokia talked about prototyping with big data and showed some neat-looking projects. Philip Kromer from Infochimps shared some of his insights: find undervalued talent, fail in parallel, optimize for programmer joy, find the right toolset, and, strikingly, storage is free. In the ‘Visualizing Shared, Distributed Data’ panel I saw OpenHeatMap and Google Fusion Tables in action; both look great!
On day 3, Simon Rogers from the Guardian showed some of the amazing visualizations they’ve made using big datasets, and some of the tools they’ve built to help encourage others to do the same. I was excited about the ‘Posthumans, Big Data and New Interfaces’ keynote panel, but it fell flat — much as the organizers may have hoped otherwise, the audience wasn’t at all open to it. DJ Patil’s keynote, ‘Innovating Data Teams’ was exactly as good as a corporate presentation should be: celebrating LinkedIn’s successes through great stories, and showing the amazing work that they’ve produced. He launched and demoed LinkedIn Skills, which I never would’ve guessed could be as powerful as it is — an excellent example of big data at work. Carol McCall’s presentation, ‘Can Big Data Fix Healthcare?’ was provocative and inspiring; and it was nice to have a keynote from someone from outside of the tech world.
The data journalism panel was one of the highlights of the conference. Jer Thorpe showed the Cascade project (made with Processing) which he’s been developing at the NYTimes, tracking the ways in which their content is shared through social media. (Specifically: a cascade is made up of events: bit.ly encodes, tweets, bit.ly decodes, and pageviews. Tracking and visualizing the flows of these events gives you ‘cascades.’) It was so beautiful, you could hear gasps from the audience. Sadly, the video of the presentation’s not available, and the Cascade isn’t yet particularly well-documented online. Soon, hopefully.
Virginia Carlson from the Metro Chicago Information Centre gave a great presentation about open civic data. I particularly appreciated her critique of the ways in which developers have ended up using open data, often without understanding the provenance of the data they’re working with.
J. J. Toothman’s presentation was a survey of cool ‘data art’ projects — most of which I’d seen, but valuable in the context of the conference because it seemed like many of the other attendees hadn’t.
The final panel, ‘Predicting the Future: Anticipating the World with Data’ was, despite the ambitious title, great. Robert McGrew from Palantir gave an excellent presentation on their work, which includes predicting terrorist attacks. Instead of mystification-through-technology (which is tempting in any business, but especially in his), he explained very simply that computers are bad at predicting human behaviour, while analysts can be quite good at it. Computers, on the other hand, can look for predefined patterns very quickly. Both approaches are required simultaneously.
I felt really lucky to attend; the conference had a great energy, and I met a really wide range of interesting people. I was struck by how many times I heard the phrases, “I’m not a designer, but…” and “There aren’t any designers on our team, so…” There was a clear recognition of the importance of design, but there remains a huge gap. I’m guessing that it’s because of one or more of:
- while the techies have recognized the importance of design, they’re not the ones making the budgeting decisions
- there’s still a lingering belief that good design isn’t worth the time or money
- it’s hard for them to find designers who know how to work on these kinds of projects
- there’s a cultural or language gap between designers and developers which is inhibiting these connections
It’ll be really interesting to watch over the next few months and years how big data embraces design.