Visualizing interesting world facts with FsLab
In case you missed my recent official FsLab announcement, FsLab is a data-science package for .NET built around F# that makes it easy to get data using type providers, analyze it interactively (with great R integration) and visualize the results. You can find more on fslab.org, which also has links to some videos and a download page with templates and other instructions.
Last time, I mentioned that we are working on integrating FsLab with the XPlot charting library. XPlot is a wonderful F# library built by Taha Hachana that wraps two powerful HTML5 visualization libraries - Google Charts and plot.ly.
I thought I'd see what interesting visualizations I can build with XPlot, so I opened the World Bank type provider to get some data about the world and Euro area, to make the blog post relevant to what is happening in the world today.
With type providers, getting data is amazingly easy. So if you have not seen much F# before, the following 9 lines are all I need to set things up:
Visualization #1: Structure of GDP in the Euro area
In the first visualization, let's look at the countries that contribute the most to the total GDP of the Euro area (that is, the countries within the European Union that use the Euro currency). To get the countries, we can use the World Bank type provider, which exposes the region as wb.Regions.``Euro area``. We first return the data for the root element (the whole Euro area) and then yield data for each of the member countries:
When writing the code in an F#-enabled editor, you get auto-completion support on the indicators, so you can choose ``GDP (current US$)`` among thousands of indicators available from the World Bank.
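The root-then-countries shape described above can be sketched roughly as follows. This is a minimal sketch rather than the original listing: it assumes a script run with `dotnet fsi` that references FSharp.Data from NuGet, it needs network access to reach the World Bank, and the year 2010 is an assumption borrowed from the later visualizations.

```fsharp
// Assumption: an .fsx script run with `dotnet fsi`, pulling FSharp.Data from NuGet
#r "nuget: FSharp.Data"
open FSharp.Data

let wb = WorldBankData.GetDataContext()
let region = wb.Regions.``Euro area``
let year = 2010 // assumed year; the text does not say which one this chart uses

// Rows of (node name, parent node, value): the root first, then one row per country
let euroGDP =
    [ yield "Euro area", "", region.Indicators.``GDP (current US$)``.[year]
      for c in region.Countries do
          yield c.Name, "Euro area", c.Indicators.``GDP (current US$)``.[year] ]
```

Keeping the parent name in every row is what lets a tree map nest the countries under the region.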
Now we can pass the data to Chart.Treemap (which takes a sequence of node name, parent node and value) and set some options to make the visualization nicer:
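As a rough illustration of the call described above, here is a small sketch that feeds made-up rows to Chart.Treemap. The row values, the colour choices and the other option values are assumptions for illustration, not the settings used for the chart in this post, and the script assumes XPlot.GoogleCharts is referenced from NuGet.

```fsharp
// Assumption: an .fsx script run with `dotnet fsi`, pulling XPlot from NuGet
#r "nuget: XPlot.GoogleCharts"
open XPlot.GoogleCharts

// Hypothetical rows of (node name, parent node, value); the numbers are made up
let rows =
    [ "Region", "", 0.0
      "Country A", "Region", 5.0
      "Country B", "Region", 3.0
      "Country C", "Region", 1.0 ]

// Colour scale and header settings; the specific values are illustrative
let options =
    Options(minColor = "#B24590", midColor = "#449AB5", maxColor = "#76B747",
            headerHeight = 0, showScale = true)

// Build the chart; nothing is rendered until it is shown or embedded
let chart =
    rows
    |> Chart.Treemap
    |> Chart.WithOptions options
```

In a script, piping the result into `Chart.Show` should open the rendered chart in a browser.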
The tree map gives us a very nice overview of how different countries in the Euro area contribute to the total GDP of the region. As you can see above, we are showing total GDP (in current US$), so larger countries obviously contribute more, with Germany, France and Italy producing over half of the GDP.
Visualization #2: GDP per capita in the Euro area
Another interesting indicator we can get from the World Bank data is GDP per capita. Using this indicator, we can find smaller countries that have higher GDP per person (which was not visible in the previous visualization). This time, we'll use the Plotly bindings and create a bar chart using the Bar trace. To follow the visual theme of the previous visualization, I also added a bit of code to calculate the colour (which you can find in the full blog post source):
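A colour calculation of this kind usually comes down to a linear blend between colour stops. The following is a minimal sketch under that assumption: it maps a value from a range onto a three-stop gradient, and the helper names, the hex colours and the hand-written hex parsing are illustrative choices made here so that the snippet has no external dependencies.

```fsharp
// Linearly blend two "#RRGGBB" colours; v is expected to lie in [0, 1]
let midColor (clr1: string) (clr2: string) (v: float) =
    // Parse one two-digit hex channel starting at the given offset
    let channel (clr: string) offset =
        System.Convert.ToInt32(clr.Substring(offset, 2), 16)
    // Blend a single channel and truncate to an integer
    let mix a b = int (float a + (float b - float a) * v)
    sprintf "#%02X%02X%02X"
        (mix (channel clr1 1) (channel clr2 1))
        (mix (channel clr1 3) (channel clr2 3))
        (mix (channel clr1 5) (channel clr2 5))

// Map a value from the range [lo, hi] onto a three-stop gradient
let getColor (lo: float) (hi: float) (v: float) =
    let k = (v - lo) / (hi - lo)
    if k < 0.5 then midColor "#B24590" "#449AB5" (2.0 * k)
    else midColor "#449AB5" "#76B747" ((k - 0.5) * 2.0)

getColor 0.0 10.0 0.0   // "#B24590" (lowest value gets the first stop)
getColor 0.0 10.0 5.0   // "#449AB5" (midpoint gets the middle stop)
getColor 0.0 10.0 10.0  // "#76B747" (highest value gets the last stop)
```

Passing the smallest and largest GDP per capita as `lo` and `hi` gives every bar a colour that reflects its position in the range.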
Here, we can see that most countries in the Euro area are fairly balanced, except for Luxembourg, which has twice the GDP per capita of the second country. As one would expect, the new member countries are at the end of the scale and western countries are at the beginning of the scale.
Visualization #3: GDP growth in Euro area
There is yet another interesting indicator in the World Bank that we can look at. This is the GDP growth indicator, which gives us the annual growth rate of the GDP. In this visualization, we'll look at how the growth has been changing over the last 50 years. For most countries, this is very homogeneous, but there are some interesting spikes.
We'll use Plotly again. One nice feature is that we can get data for all countries, but make only a few of them visible by default. Click on the country name on the right to add/remove them from the chart!
Here, we can see (and you can also zoom in!) that there is a big spike in the economy of Malta (20% growth in 1975) and a big drop in Latvia (-32% in 1992, followed by a fairly quick recovery). Most countries also suffered after the 2008 crisis.
Visualization #4: Correlating GDP and life expectancy
So far, I have been looking at GDP, because it is one of the economic indicators that is easy to get from the World Bank. Let's see whether GDP is correlated with some other indicators we can obtain. The following gets the GDP (per capita) and life expectancy (in years) for all countries of the world (looking at Europe would not tell us much).
Now we draw a scatter plot showing life expectancy (on the Y axis) and the logarithm (base 10) of the GDP per capita (on the X axis). The following uses Google Charts and also uses the trendlines parameter to add a linear trendline to the chart:
If we look at the trendline, we can roughly say that countries with life expectancy greater by 10 years have 10 times larger GDP per capita. (That said, we are looking just at the data from 2010 here and we are also not checking any statistical significance, just building an interesting visualization!)
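To make the reading of the trendline concrete: because the X axis is log10 of GDP per capita, a slope of about 10 means ten extra years of life expectancy per tenfold increase in GDP per capita. The sketch below fits a least-squares line to a few made-up points constructed to have exactly that relationship; the sample values are hypothetical and are not World Bank data.

```fsharp
// Ordinary least-squares fit y = intercept + slope * x over (x, y) points
let linearFit (points: (float * float) list) =
    let meanX = points |> List.averageBy fst
    let meanY = points |> List.averageBy snd
    let sxy = points |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let sxx = points |> List.sumBy (fun (x, _) -> (x - meanX) * (x - meanX))
    let slope = sxy / sxx
    slope, meanY - slope * meanX

// Hypothetical (GDP per capita, life expectancy) pairs, made up so that
// life expectancy rises by 10 years for every tenfold increase in GDP
let sample = [ 100.0, 60.0; 1000.0, 70.0; 10000.0, 80.0 ]

// Fit against log10 of GDP, mirroring the X axis of the scatter plot
let slope, intercept =
    sample
    |> List.map (fun (gdp, life) -> log10 gdp, life)
    |> linearFit
// slope comes out at about 10 years per unit of log10(GDP), i.e. per 10x GDP
```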
Visualization #5: EU is getting older
Another interesting fact we can nicely visualize is how the EU is getting older. We look at the entire European Union here (which contains more countries than just the Euro area). In this visualization, we build three overlaying histograms that show the percentage of population over 65 years. To do that, we use Plotly, which lets us nicely compose histograms.
We can see that in 1960, most countries had 10-12% of population over the age of 65 (with just one country having 12-14%). 25 years later, in 1985, most countries had 12-14% of elderly population. In 2010, most countries have 16-18% of population over the age of 65 and for some countries, this is even 20-22%.
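The ranges quoted above are histogram bins that are two percentage points wide. As a small sketch of what the histogram does with the data, the following groups a list of percentages into such bins and counts the countries in each; the input values are made up for illustration and the function name is hypothetical.

```fsharp
// Count how many values fall into each bin of the given width
let binShares (width: float) (shares: float list) =
    shares
    |> List.countBy (fun v -> floor (v / width) * width) // lower edge of the bin
    |> List.sortBy fst
    |> List.map (fun (lo, n) -> sprintf "%g-%g%%" lo (lo + width), n)

// Hypothetical shares of population over 65, in percent
binShares 2.0 [ 10.5; 11.9; 12.0; 13.2; 16.4 ]
// → [("10-12%", 2); ("12-14%", 2); ("16-18%", 1)]
```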
Visualization #6: World's biggest polluters
For the next two visualizations, we'll change the topic and look at green issues rather than Europe. The XPlot wrapper for Google Charts has nice support for creating geo charts, and so we can easily take the CO2 emissions indicator from the World Bank and plot the biggest polluters, based on the values from 2010, on a map:
This shows the expected results. The world's biggest polluter is China, followed by the USA, India and Russia. The next polluters are smaller and include Germany, the UK, Canada and Brazil. This indicator returns the total CO2 emissions in kilotons, so larger countries are bigger polluters. What if we look at emissions per capita?
Visualization #7: CO2 emissions per capita
The World Bank data set does not directly include an indicator for CO2 emissions per capita, but it contains CO2 emissions and total population, so we can do the math ourselves. The following is very similar to the previous snippet, except that we get both of the indicators and return the ratio:
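The ratio itself is simple arithmetic once the two indicators are lined up by country. Here is a minimal sketch that works on plain lists so it can be run without the type provider; the function name, the conversion from kilotons to tonnes and the sample numbers are assumptions made for illustration, and missing values are assumed to appear as NaN.

```fsharp
open System

// Join total emissions (in kilotons) with population by country name and
// return tonnes of CO2 per person, skipping countries with missing values
let emissionsPerCapita (emissions: (string * float) list) (population: (string * float) list) =
    let popByCountry = Map.ofList population
    [ for name, kt in emissions do
        match Map.tryFind name popByCountry with
        | Some pop when not (Double.IsNaN kt) && not (Double.IsNaN pop) && pop > 0.0 ->
            // 1 kiloton = 1000 tonnes
            yield name, kt * 1000.0 / pop
        | _ -> () ]

// Hypothetical inputs: B has no emissions value, C has no population value
emissionsPerCapita
    [ "A", 100.0; "B", nan; "C", 5.0 ]
    [ "A", 50000.0; "B", 10.0 ]
// → [("A", 2.0)]
```

Dividing by population is what moves small, high-emission countries to the top of the ranking.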
Here, we can see quite a different picture from the previous visualization. The biggest polluters per capita are small Persian Gulf states such as Qatar, Kuwait, the UAE and Oman, followed by large developed countries (USA, Canada and Australia). We can still see China among the polluters too, but India almost completely disappears from this picture.
Building your own visualizations
I'm not a journalist or a statistician, so I'm sure many of the readers could build much more interesting visualizations than I did. The main point of this article is to show just how easy it is to put together a data source like the World Bank with the nice visualization libraries available thanks to XPlot. So, if you want to build your own visualizations, here are some links to get you started:
- Go to the FsLab download page and either download a template (the easiest option) or reference the FsLab package from NuGet.
- If you have any questions, ask on StackOverflow with 'fslab' tag or open a GitHub issue.
- The XPlot library has comprehensive documentation with many examples on using the Google Charts wrapper as well as the Plotly wrapper.
- To get data, check out the other F# Data type providers; these give you easy ways to read CSV, XML and call JSON-based REST services. There are also nice libraries for calling Twitter and accessing SQL databases.
Finally, if you are interested in using some of the libraries in a commercial setting and are interested in help, support or training, get in touch with me and my colleagues at fsharpWorks. We'll be happy to help.
Published: Tuesday, 30 June 2015, 5:07 PM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: f#, fslab, data science, data journalism, thegamma