As mentioned in an earlier blog post, I've been spending some time at the Alan Turing Institute recently working on The Gamma project. The goal is to make data visualizations on the web more transparent. When you see a visualization online, you should be able to see where the data comes from, how it has been transformed and check that it is not misleading, but you should also be able to modify it and visualize other aspects of the data that interest you.
I gave a talk about my work as part of a talk series at The Alan Turing Institute, which has been recorded and is now available on YouTube. If you prefer to watch talks, this is a good 45 minute overview of what I've been working on with great video quality (the video switches from camera view to screen capture for demos!)
If you prefer text or do not have 45 minutes to watch the talk (right now), I wrote a short summary that highlights the most important ideas from the talk. You can also check out the talk slides, although I'll include the most important ones here.
There were a lot of rumors recently about the death of facts and even the death of statistics. I believe the core of the problem is that working with facts is quite tedious and the results are often not particularly exciting. Social media made it extremely easy to share your own opinions in an engaging way, but what we are missing is a similarly easy and engaging way to share facts backed by data.
This is, in essence, the motivation for The Gamma project that I've been working on recently. After several experiments, including the visualization of Olympic medalists, I'm now happy to share the first reusable component based on the work that you can try and use in your data visualization projects. If you want to get started:
- Check out thegamma-script package on npm
- Minimal example of thegamma-script in action
- How to use thegamma-script in your projects
The package implements a simple scripting language that anyone can use for writing simple data aggregation and data exploration scripts. The tooling for the scripting language makes it super easy to create and modify existing data analyses. Editor auto-complete offers all available operations and a spreadsheet-inspired editor lets you create scripts without writing code - yet, you still get a transparent and reproducible script as the result.
When you start working in the programming language theory business, you'll soon find out that
lambda abstraction and functions break many nice ideas or, at least, make your life very hard.
For example, type inference is easy if you only have
var x = ..., but it gets hard once you
want to infer type of
x in something like
fun x -> ... because we do not know what is
x. Distributed programming is another example - sending around data is easy, but
once you start sending around function values, things become hard.
Every programming language researcher soon learns this trick. When someone tells you about a nice idea, you reply "Interesting... but how does this interact with lambda abstraction?" and the other person replies "Whoa, hmm, let me think more about this." Then they go back and either give up, because it does not work, or produce something that works, in theory, well with lambda abstraction, but is otherwise quite unusable.
When working on The Gamma project and the little scripting language it runs, I recently went through a similar thinking process. Instead of letting lambda abstraction spoil the party again, I think we need to think about different ways of code reuse.
Olympic Games are perfect opportunity to do a fun data visualization project - just like New Year, you can easily predict when they will happen and you can get something interesting ready in advance. I used this year's Games in Rio as a motivation to resume working on The Gamma project. If you did not read my previous article, the idea is to build tooling for open, reproducible and interactive data-driven storytelling. When you see a visualization, not only you should be able to see how it has been created (what data it uses & how), but you should also be able to modify it, without much programming experience, and look at other interesting aspects of the data.
The first version of The Gamma project tries to do some of this, using historical and current data on Olympic medals as a sample dataset. You can play with the project at The Gamma web site:
Without further ado, here are the most important links if you want to explore how it all works on your own. However, continue reading and I'll describe the most important parts!
The project is still in early stages and the code needs more documentation (and ReadMe files, I know!) However, if you would be interested in using it for something or you have some interesting data to visualize, do not hesitate to ping me at @tomaspetricek. Also, thanks to the DNI Innovation Fund for funding the project and to the Alan Turing Institute for providing a place to work on this over the coming months!
Last year, I wrote a bit about my interest in building programming tools for data journalism. Data journalism might sound like a niche field, but that is not the case if we talk about data-driven storytelling more generally,
In programming, your outcome is typically some application that does stuff. In data science, your outcome is very often a report or a story that aims to influence people's behavior or company decisions. No matter whether you are a journalist writing an article about government spending or an analyst producing internal sales reports, you are telling stories with data.
Being able to tell stories with data (but also verify and assess other people's stories that can be backed by data) is becoming a vital skill in the modern world, which is partly why I find this topic extremely important and interesting. But to do this currently, you need to be a skilled programmer, great designer and good storyteller, all at the same time!
I have not written about this topic much over the last year (mainly because I was busy with Coeffects, fsharpConf, FsLab and fsharpWorks), but this will be changing - I'm very happy to announce that my data-journalism related project The Gamma has been awarded funding from the DNI Innovation Fund and I'll be working on it over the next year at the Alan Turing Institute in London.
Just like last year and the year before, I wanted to participate in the #FsAdvent event, where someone writes a blog post about something they did with F# during December. Thanks to Sergey Tihon for the organization of the English version and the Japanese F# community for coming up with the idea a few years ago!
As my blog post ended up on 31 December, I wanted to do something that would fit well with the theme of ending of 2015 and starting of the new year 2016 and so I decided to write a little interactive web site that tracks the "Happy New Year" tweets live across the globe. This is partly inspired by Happy New Year Tweets from Twitter in 2014, but rather than analyzing data in retrospect, you can watch 2016 come live!
There are huge amounts of data around us that we could use to better understand the world. Every company collects large amounts of data about their sales or customers. Governments and international organizations increasingly release interesting data sets to the public through various open government data initiatives (data.gov or data.gov.uk). But raw data does not tell you much.
An interesting recent development is data journalism. Data journalists tell stories using data. A data driven article is based on an interesting observation from the data, it includes (interactive) visualizations that illustrate the point and it often allows the reader to get the raw data.
Adding a chart produced in, say, Excel to an article is easy, but building good interactive visualization is much harder. Ideally, the data driven article should not be just text with static pictures, but a program that links the original data source to the visualization. This lets readers explore how the data is used, update the content when new data is available and change parameters of the visualization if they need to understand different aspect of the topic.
This is in short what I'm trying to build with The Gamma project. If you're interested in building better reports or data driven articles, continue reading!
I did a talk about The Gamma project at the fantastic Future Programming workshop at the StrangeLoop conference last week (thanks for inviting me!) and there is a recording of my 40 minute talk on YouTube, so if you prefer to watch videos, check it out!
Are you a data journalist or data analyst? We're looking for early partners! I joined the EF programme to work on this and if the project sounds like something you'd like to see happen, please get in touch or share your contact details on The Gamma page!
In case you missed my recent official FsLab announcement, FsLab is a data-science package for .NET built around F# that makes it easy to get data using type providers, analyze them interactively (with great R integration) and visualize the results. You can find more on on fslab.org, which also has links to some videos and download page with templates and other instructions.
Last time, I mentioned that we are working on integrating FsLab with the XPlot charting library. XPlot is a wonderful F# library built by Taha Hachana that wraps two powerful HTML5 visualization libraries - Google Charts and plot.ly.
I thought I'd see what interesting visualizations I can built with XPlot, so I opened the World Bank type provider to get some data about the world and Euro area, to make the blog post relevant to what is happening in the world today.