Announcing FsLab: Data science package

After over a year of working on FsLab and talking about it at conferences, it is finally time for an official announcement. So, today, I'm excited to announce FsLab - a cross-platform package for doing data science with .NET and Mono.

It is probably not necessary to explain why data science is an important area. We live surrounded by information, but extracting useful knowledge from the vast amounts of data is not an easy task. You have to access data in different formats (JSON-based REST services, XML, CSV files or even HTML tables), you need to deal with missing values, combine and align data from multiple sources and then build visualizations (or reports) to tell the right story.

The goal of FsLab is to make this process easier. FsLab combines the power of F# type providers, the efficiency and robustness of Mono and .NET and the high quality engineering of the open-source ecosystem around F# and C#.

FsLab links and resources

FsLab questions and answers

Rather than writing a long introduction about FsLab, the following tries to answer the most important questions that you might have about FsLab using the Q & A format.

Why should I choose FsLab over X?
There are a couple of things that FsLab does exceptionally well. With F# Data type providers, you get type-safe access to a wide range of external data sources, with tooling that no other data science package can offer. FsLab also runs on Mono and .NET, so it is easy to turn your experiments into production-quality code. For many other tasks, you can call other tools such as R using the R type provider.
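To give a rough idea of what "type-safe access" means here, the following is a minimal sketch using the JSON type provider from F# Data. It assumes an F# script run with a recent `dotnet fsi` (where `#r "nuget: ..."` is available), and the inline sample and the name `Observation` are made up for illustration; the numbers are the Prague figures quoted later in this post.

```fsharp
// Assumption: a recent `dotnet fsi`, which can resolve NuGet references in scripts.
#r "nuget: FSharp.Data"
open FSharp.Data

// The inline sample is illustrative. From it, the provider infers a type
// with integer properties Year and Population.
type Observation = JsonProvider<""" { "year": 2014, "population": 1302883 } """>

// Parse another document of the same shape.
let obs = Observation.Parse(""" { "year": 1960, "population": 1000830 } """)

// Both properties are statically typed, so a typo such as obs.Populaton
// is reported by the compiler instead of failing at run time.
printfn "In %d the population was %d" obs.Year obs.Population
```

The same idea applies to the other providers: the shape of the external data becomes a type that the editor and the compiler can check.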

Is FsLab only for F#?
No. Some of the libraries that are part of FsLab have excellent C# support, most importantly Deedle, the core library for working with data frames and data series. The libraries that rely on type providers are F#-only, but you can use them from F# and then expose the functionality to C#, Visual Basic .NET or any other .NET language.

Who is behind FsLab?
FsLab is a community effort with a large number of contributors, both individuals and companies. BlueMountain Capital is funding the development of the R type provider and Deedle, F# Data is maintained by Gustavo Guerra and contributors, and Math.NET is maintained by Christoph Rüegg and contributors. Finally, I'm the maintainer of the FsLab package. Commercial support and training for FsLab are available from fsharpWorks.

What is the FsLab roadmap?
There is no official roadmap yet. Please help us shape it by joining the discussion! However, there are a couple of things that are coming to FsLab very soon:

Demonstrating the FsLab approach

I don't want to turn this announcement into a technical post about FsLab, but since FsLab is very much about technology, I'll give you at least a quick demo. The demo illustrates the 2 key ideas that FsLab follows:

To start with FsLab, you need to download the FsLab package or a template. Then you can write an F# script file that references FsLab and opens all the necessary namespaces:

#load "packages/FsLab/FsLab.fsx"
open Deedle
open FSharp.Data
open XPlot.GoogleCharts
open XPlot.GoogleCharts.Deedle

The example uses F# Data for data access, Deedle for working with time series and XPlot for producing Google Charts.

We'll use the World Bank type provider to get the population of the largest city in the Czech Republic as a time series. When writing the code in an F#-enabled editor, you'll get auto-completion offering all countries of the world and thousands of indicators:

let wb = WorldBankData.GetDataContext()
let pop = 
  wb.Countries.``Czech Republic``
   .Indicators.``Population in largest city``
  |> series

The |> operator passes the data from the World Bank to the series function, which creates a Deedle series that gives you a convenient way to explore the data. When you run the code in the F# REPL, you'll see a printout showing the first and last few years of the time series (Prague had 1,000,830 inhabitants in 1960 and 1,302,883 inhabitants in 2014).
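If you want to try the series function without calling the World Bank, here is a small sketch that builds a series from hand-written pairs. It assumes a script that references Deedle directly from NuGet rather than through the FsLab load script; the two observations are the Prague figures mentioned above, and the names `observations`, `piped` and `direct` are just for illustration.

```fsharp
// Assumption: a recent `dotnet fsi`, referencing Deedle directly from NuGet.
#r "nuget: Deedle"
open Deedle

// Year/population pairs (the two Prague figures quoted in the text).
let observations = [ 1960, 1000830.0; 2014, 1302883.0 ]

// `x |> f` means the same as `f x`, so both lines build the same series.
let piped = observations |> series
let direct = series observations

// Look up a value by its key and list all keys of the series.
printfn "%.0f" piped.[2014]
printfn "%A" (List.ofSeq piped.Keys)
```

The keys here are plain integers, but the same function works for any key type that supports equality, such as dates or strings.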

Next, we'll use the R type provider to call the R stats package to calculate linear regression:

open RProvider
open RProvider.stats

let df = frame [ "pop" => pop ]
df?years <- pop.Keys
df?predict <- R.predict(R.lm("pop~years", df)).GetValue<float[]>()

The first two lines reference the R type provider. Again, thanks to the type provider mechanism, you get auto-completion on RProvider. (with all installed R packages) and on R. (with all available R functions).

The code then creates a Deedle data frame df with the columns pop (from the World Bank data) and years (the keys of the pop series). It then uses R.lm to fit a linear regression model and R.predict to predict values for the same range of years, storing the result in the predict column.
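For readers who have not used R, the following plain F# sketch shows roughly what fitting and predicting with a one-variable linear model amounts to. It is not what the R type provider does internally; it is a hand-written ordinary least squares fit on made-up toy data, and the names `fitLine`, `xs`, `ys` and `predicted` are hypothetical.

```fsharp
// Ordinary least squares for the model y = intercept + slope * x.
// Returns the pair (slope, intercept).
let fitLine (xs: float[]) (ys: float[]) =
    let meanX = Array.average xs
    let meanY = Array.average ys
    // Sum of products of deviations from the means (numerator of the slope).
    let covXY =
        Array.fold2 (fun acc x y -> acc + (x - meanX) * (y - meanY)) 0.0 xs ys
    // Sum of squared deviations of x (denominator of the slope).
    let varX = xs |> Array.sumBy (fun x -> (x - meanX) * (x - meanX))
    let slope = covXY / varX
    slope, meanY - slope * meanX

// Toy data lying exactly on the line y = 1 + 2x (illustrative only).
let xs = [| 0.0; 1.0; 2.0; 3.0 |]
let ys = [| 1.0; 3.0; 5.0; 7.0 |]

let slope, intercept = fitLine xs ys

// "Predicting" means evaluating the fitted line at the original x values.
let predicted = xs |> Array.map (fun x -> intercept + slope * x)
printfn "slope = %f, intercept = %f" slope intercept
```

In the FsLab version, the x values are the years, the y values are the population figures, and the predicted array is what ends up in the predict column.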

With three more lines of code, we can build a Google Charts chart comparing the actual data with the data predicted by the linear regression model:

[ df?predict; df?pop ] 
|> Chart.Line
|> Chart.WithOptions (Options(title="Prague Population"))

I embedded the chart below by hand, but you can also use the FsLab Journal template, which produces the HTML automatically from your F# script:

Summary

FsLab is a collection of high quality libraries for doing data science on Mono and .NET. It combines the power of F# type providers for data access with the ability to explore ideas easily, while writing code for a robust platform that is easy to deploy.

Many of the libraries that are included in FsLab have been around for some time, have been used in production and have a large number of contributors, both from the open-source community and from commercial companies.

Even the FsLab package itself has existed for some time, but with this announcement the project reaches a new milestone. Over the last few months, we've done a lot of work on making FsLab stable, well documented and truly cross-platform, and many more things are coming in the near future. So stay tuned, send us feedback, contribute and try FsLab now!

Published: Tuesday, 5 May 2015, 4:55 PM
Author: Tomas Petricek
Tags: f#, fslab, data science