Tomas Petricek

Announcing FsLab Data science package

After over a year of working on FsLab and talking about it at conferences, it is finally time for an official announcement. So, today, I'm excited to announce FsLab - a cross-platform package for doing data science with .NET and Mono.

It is probably not necessary to explain why data science is an important area. We live surrounded by information, but extracting useful knowledge from the vast amounts of data is not an easy task. You have to access data in different formats (JSON-based REST services, XML, CSV files or even HTML tables), you need to deal with missing values, combine and align data from multiple sources and then build visualizations (or reports) to tell the right story.

The goal of FsLab is to make this process easier. FsLab combines the power of F# type providers, the efficiency and robustness of Mono and .NET, and the high-quality engineering of the open-source ecosystem around F# and C#.

FsLab links and resources

  • You can find more information about FsLab at our recently launched web site. The web site is hosted on GitHub, so please send corrections and improvements as pull requests!

  • To get started, check out the downloads page. The easiest option is to use one of the two templates also hosted on GitHub. You can get some inspiration from the getting started tutorial.

  • If you want to help us shape the future of FsLab, then please join the F# Data Science group on Google. For GitHub-related topics (projects, etc.) there is also the FsLab admin repository.

FsLab questions and answers

Rather than writing a long introduction about FsLab, the following section tries to answer the most important questions that you might have about it in a Q & A format.

Why should I choose FsLab over X?
There are a couple of things that FsLab does exceptionally well. With F# Data type providers, you get type-safe access to a wide range of external data sources with tooling that no other data science package can offer. FsLab also runs on Mono and .NET, so it is extremely easy to turn your experiments into production-quality code. For many other tasks, you can easily call other tools such as R using the R type provider.

Is FsLab only for F#?
No. Some of the libraries that are part of FsLab have excellent C# support, most importantly Deedle, which is the core library for working with data frames and data series. The libraries that rely on type providers are F#-only, but you can use them and then expose the functionality to C#, Visual Basic .NET or any other .NET language.

Who is behind FsLab?
FsLab is a community effort with a large number of contributors, both individuals and companies. BlueMountain Capital is funding the development of the R type provider and Deedle; F# Data is maintained by Gustavo Guerra and contributors; Math.NET is maintained by Christoph Rüegg and contributors. Finally, I'm the maintainer of the FsLab package. Commercial support and training for FsLab are available from fsharpWorks.

What is the FsLab roadmap?
There is no official roadmap yet. Please help us shape it by joining the discussion! However, there are a couple of things that are coming to FsLab very soon:

  • We're integrating FsLab with XPlot to provide cross-platform HTML5 charting.
  • We're working on the FsLab Journal template, which lets you generate reports from scripts.
  • We're integrating FsLab with M-Brace, which lets you scale your scripts to the cloud.
  • We're working on BigDeedle, a new backend for Deedle that makes it possible to treat big data as ordinary frames and series.

Demonstrating the FsLab approach

I don't want to turn this announcement into a technical post about FsLab, but since FsLab is very much about technology, I'll give you at least a quick demo. The demo illustrates the two key ideas that FsLab follows:

  • Access, analyze, visualize cycle - when doing data science, you typically follow this cycle a number of times. You get some data, try to explore it, visualize the results and then repeat. FsLab gives you great tools for all three steps.

  • Integrate with leading technologies - FsLab has some great libraries and excels in some areas (like data access). For other tasks, it can integrate with other technologies - it lets you call R packages and visualize data using Google Charts.

To start with FsLab, you need to download the FsLab package or a template. Then you can write an F# script file that references FsLab and opens all necessary namespaces:

#load "packages/FsLab/FsLab.fsx"
open Deedle
open FSharp.Data
open XPlot.GoogleCharts
open XPlot.GoogleCharts.Deedle

The example uses F# Data for data access, Deedle for working with time series and XPlot for producing Google Charts.

We'll use the World Bank type provider to get the population in the largest city of the Czech Republic as a time series. When writing the code in an F#-enabled editor, you'll get auto-completion offering all countries of the world and thousands of indicators:

let wb = WorldBankData.GetDataContext()
let pop = 
  wb.Countries.``Czech Republic``
   .Indicators.``Population in largest city``
  |> series

The |> operator passes the data from the World Bank to the series function to create a Deedle series that gives you a nice way to explore the data. When you run the code in the F# REPL, you'll see a printout showing the first few years and last few years of the time series (Prague had 1,000,830 inhabitants in 1960 and 1,302,883 inhabitants in 2014).
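
If the |> operator is new to you, the following library-free sketch may help. It is not part of the FsLab demo: the observations list and its values are made up for illustration and are not World Bank data. It only shows that x |> f means the same as f x, which is how the data above ends up as the argument of series.

```fsharp
// Hypothetical (year, value) pairs, invented for this illustration
let observations = [ (1960, 1.0); (1961, 2.0); (1962, 3.0) ]

// Ordinary call: the function comes first, the data comes last
let viaCall = List.sumBy snd observations

// Piped call: the data on the left becomes the last argument
let viaPipe = observations |> List.sumBy snd

// Pipes chain naturally, so the steps read top to bottom
let latest =
    observations
    |> List.filter (fun (year, _) -> year >= 1961)  // keep recent years
    |> List.map snd                                 // drop the year, keep the value
    |> List.max                                     // largest remaining value
```

Both viaCall and viaPipe compute the same sum; the piped form simply puts the data first, which tends to read better when several steps follow each other.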

Next, we'll use the R type provider to call the R stats package to calculate linear regression:

open RProvider
open RProvider.stats

let df = frame [ "pop" => pop ]
df?years <- pop.Keys
df?predict <- R.predict(R.lm("pop~years", df)).GetValue<float[]>()

The first two lines reference the R type provider. Again, thanks to the type provider mechanism, you get auto-completion on RProvider. (with all installed R packages) and on R. (with all available R functions).

The code then creates a Deedle data frame df with two columns: pop (from the World Bank data) and years (the keys of the pop series). It then uses R.lm to fit a linear regression model and R.predict to predict values for the same range of years, storing the result in a third column named predict.
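
To make the idea of the fitted line concrete, here is a minimal plain F# sketch of a one-variable ordinary least squares fit. It is only an illustration of the concept and does not claim to reproduce how R computes its result; the fitLine helper and the sample arrays are hypothetical names and values introduced here, not part of FsLab or the demo.

```fsharp
// Fit y = intercept + slope * x by ordinary least squares.
// fitLine is a hypothetical helper written for this illustration.
let fitLine (xs: float[]) (ys: float[]) =
    let meanX = Array.average xs
    let meanY = Array.average ys
    // Sum of products of deviations from the means
    let sxy =
        Array.fold2 (fun acc x y -> acc + (x - meanX) * (y - meanY)) 0.0 xs ys
    // Sum of squared deviations of x
    let sxx = xs |> Array.sumBy (fun x -> (x - meanX) * (x - meanX))
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    intercept, slope

// Made-up data lying exactly on the line y = 1 + 2x
let years = [| 0.0; 1.0; 2.0; 3.0 |]
let values = [| 1.0; 3.0; 5.0; 7.0 |]

let intercept, slope = fitLine years values

// Predicted values for the same x range, analogous to the predict column
let predicted = years |> Array.map (fun x -> intercept + slope * x)
```

In the demo itself, none of this has to be written by hand, which is the point of calling into R through the type provider.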

With three more lines of code, we can build a Google Charts chart comparing the actual data with the data predicted by the linear regression model:

[ df?predict; df?pop ] 
|> Chart.Line
|> Chart.WithOptions (Options(title="Prague Population"))

I embedded the chart below by hand, but you can also use the FsLab Journal template, which produces the HTML automatically from your F# script:

[Chart: "Prague Population", line chart of the predicted and actual values]

Summary

FsLab is a collection of high-quality libraries for doing data science on Mono and .NET. It combines the power of F# type providers for data access with tools that let you easily explore ideas, while writing code for a robust platform that is easy to deploy.

Many of the libraries that are included in FsLab have been around for some time, have been used in production and have a large number of contributors, both from the open-source community and from commercial companies.

Even the FsLab package itself has existed for some time, but with this announcement, the project reaches a new milestone. We've done a lot of work on making FsLab stable, well documented and truly cross-platform over the last few months, and many more things are coming in the near future. So stay tuned, send us feedback, contribute and try FsLab now!

Published: Tuesday, 5 May 2015, 3:55 PM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: f#, fslab, data science

License

Unless explicitly mentioned, all articles on this site are licensed under Creative Commons Attribution Share Alike. All source code samples are licensed under the MIT License.
