Announcing FsLab: Data science package
After over a year of working on FsLab and talking about it at conferences, it is finally time for an official announcement. So, today, I'm excited to announce FsLab - a cross-platform package for doing data science with .NET and Mono.
It is probably not necessary to explain why data science is an important area. We live surrounded by information, but extracting useful knowledge from the vast amounts of data is not an easy task. You have to access data in different formats (JSON-based REST services, XML, CSV files or even HTML tables), you need to deal with missing values, combine and align data from multiple sources and then build visualizations (or reports) to tell the right story.
The goal of FsLab is to make this process easier. FsLab combines the power of F# type providers, the efficiency and robustness of Mono and .NET and the high quality engineering of the open-source ecosystem around F# and C#.
FsLab links and resources
- You can find more information about FsLab at our recently launched web site. The web site is hosted on GitHub, so please send corrections and improvements as pull requests!
- To get started, check out the downloads page. The easiest option is to use one of the two templates also hosted on GitHub. You can get some inspiration from the getting started tutorial.
- If you want to help us shape the future of FsLab, then please join the F# Data Science group on Google. For GitHub-related topics (projects, etc.) there is also the FsLab admin repository.
FsLab questions and answers
Rather than writing a long introduction about FsLab, the following tries to answer the most important questions that you might have about FsLab using the Q & A format.
Why should I choose FsLab over X?
There are a couple of things that FsLab does exceptionally well. With F# Data type providers, you get type-safe access to a wide range of external data sources with tooling that no other data science package can offer. FsLab also runs on Mono and .NET, so it is extremely easy to turn your experiments into production-quality code. For many other tasks, you can easily call other tools such as R using the R type provider.
Is FsLab only for F#?
No. Some of the libraries that are part of FsLab have excellent C# support, most importantly Deedle, which is the core library for working with data frames and data series. The libraries that rely on type providers are F#-only, but you can use them from F# and then expose the functionality to C#, Visual Basic .NET or any other .NET language.
Who is behind FsLab?
FsLab is a community effort with a large number of contributors - both individuals and companies. BlueMountain Capital is funding the development of the R type provider and Deedle, F# Data is maintained by Gustavo Guerra and contributors, and Math.NET is maintained by Christoph Rüegg and contributors. Finally, I'm the maintainer of the FsLab package. Commercial support and training for FsLab are available from fsharpWorks.
What is the FsLab roadmap?
There is no official roadmap yet. Please help us shape it by joining the discussion! However,
there are a couple of things that are coming to FsLab very soon:
- We're integrating FsLab with XPlot to provide cross-platform HTML5 charting.
- We're working on the FsLab Journal template, which lets you generate reports from scripts.
- We're integrating FsLab with M-Brace, which lets you scale your scripts to the cloud.
- We're working on BigDeedle, a new backend for Deedle that makes it possible to treat big data as ordinary frames and series.
Demonstrating the FsLab approach
I don't want to turn this announcement into a technical post about FsLab, but since FsLab is very much about technology, I'll give you at least a quick demo. The demo illustrates the two key ideas that FsLab follows:
- Access, analyze, visualize cycle - when doing data science, you typically follow this cycle a number of times. You get some data, try to explore it, visualize the results and then repeat. FsLab gives you great tools for all three steps.
- Integrate with leading technologies - FsLab has some great libraries and excels in some areas (like data access). For other tasks, it can integrate with other technologies - it lets you call R packages and visualize data using Google Charts.
To start with FsLab, you need to download the FsLab package or a template. Then you can write an F# script file that references FsLab and opens all necessary namespaces:
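A minimal sketch of such a script header. It assumes the FsLab NuGet package has been restored into a `packages` folder next to the script; the path is an assumption, so adjust it to your layout:

```fsharp
// Load the FsLab script that references the bundled libraries
// (assumed path: NuGet package restored into ./packages)
#load "packages/FsLab/FsLab.fsx"

open Deedle                      // data frames and time series
open FSharp.Data                 // type providers for data access
open XPlot.GoogleCharts          // Google Charts wrapper
open XPlot.GoogleCharts.Deedle   // charting of Deedle series and frames
```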
The example uses F# Data for data access, Deedle for working with time series and XPlot for producing Google Charts.
We'll use the World Bank type provider to get the population in the largest city of the Czech Republic as a time series. When writing the code in an F#-enabled editor, you'll get auto-completion offering all countries of the world and thousands of indicators:
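A sketch of this step. It assumes that the FsLab package sits in a `packages` folder next to the script, that the machine can reach the World Bank service, and that the indicator is exposed under the name `Population in largest city` (auto-completion shows the exact names available):

```fsharp
#load "packages/FsLab/FsLab.fsx"
open Deedle
open FSharp.Data

// Connect to the World Bank through the F# Data type provider
let wb = WorldBankData.GetDataContext()

// Pick the country and the indicator (assumed name), then pipe the
// (year, value) pairs into Deedle's series function to get a time
// series keyed by year
let pop =
  wb.Countries.``Czech Republic``.Indicators.``Population in largest city``
  |> series
```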
The |> operator passes the data from the World Bank to the series function to create a Deedle series that gives you a nice way to explore the data. When you run the code in the F# REPL, you'll see a printout showing the first few years and last few years of the time series (Prague had 1,000,830 inhabitants in 1960 and 1,302,883 inhabitants in 2014).
Next, we'll use the R type provider to call the R stats package to calculate a linear regression:
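A minimal sketch of this step, wrapped in a hypothetical helper `fitLinear` so that it stands on its own. It assumes that the FsLab package sits in a `packages` folder next to the script and that R is installed and visible to the R type provider. The formula string `pop~years` is also an assumption:

```fsharp
#load "packages/FsLab/FsLab.fsx"
open Deedle
open RProvider
open RProvider.stats

/// Fits pop ~ years with R's lm and returns one fitted value per year.
/// (fitLinear is a hypothetical helper name used for illustration.)
let fitLinear (pop: Series<int, float>) : float[] =
  // Second column: the keys of the series (the years) as floats
  let years = pop |> Series.map (fun year _ -> float year)
  let df = frame [ "pop" => pop; "years" => years ]
  // Named arguments map onto the arguments of R's lm function
  let model = R.lm(formula = "pop~years", data = df)
  // Without new data, predict returns fitted values for the input rows
  R.predict(model).GetValue<float[]>()
```

Calling `fitLinear pop` on the World Bank series would then give the predicted population for each year in the series.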
The code references the R type provider. Again, thanks to the type provider mechanism, you get auto-completion on RProvider. (with all installed R packages) and on R. (with all available R functions).
The code then creates a Deedle data frame df with the columns pop (from the World Bank data) and years (with the keys of the pop series). It then uses R.lm and R.predict to fit a linear regression model and use it to predict values for the current range of years.
With three more lines of code, we can build a Google Charts chart comparing the actual data with the data predicted by the linear regression model:
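A sketch of such a chart, wrapped in a hypothetical helper `compareChart` so that it stands on its own. It assumes that the FsLab package sits in a `packages` folder next to the script and that the predicted values arrive as a `float[]` in the same order as the years. The chart title and labels are made up for illustration:

```fsharp
#load "packages/FsLab/FsLab.fsx"
open Deedle
open XPlot.GoogleCharts
open XPlot.GoogleCharts.Deedle

/// Builds a line chart comparing the actual series with fitted values.
/// (compareChart is a hypothetical helper name used for illustration.)
let compareChart (actual: Series<int, float>) (fitted: float[]) =
  // Re-attach the years to the fitted values so both lines share keys
  let predicted = series (Seq.zip actual.Keys fitted)
  // Illustrative title; any other chart options could be set here too
  let options = Options(title = "Population in largest city: actual vs. linear fit")
  Chart.Line([ actual; predicted ], Labels = [ "Actual"; "Predicted" ], Options = options)
```

Called with the `pop` series and the values returned by `R.predict`, the helper returns a `GoogleChart` value.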
I embedded the chart below by hand, but you can also use the FsLab Journal template, which produces the HTML automatically from your F# script:
Summary
FsLab is a collection of high quality libraries for doing data science on Mono and .NET. It combines the power of F# type providers for data access with the ability to explore ideas easily, all while writing code for a robust platform that is easy to deploy.
Many of the libraries that are included in FsLab have been around for some time, have been used in production and have a large number of contributors, both from the open-source community and from commercial companies.
Even the FsLab package itself has existed for some time - but with this announcement, the project reaches a new milestone. Over the last few months, we've done a lot of work on making FsLab stable, well documented and truly cross-platform, and many more things are coming in the near future. So stay tuned, send us feedback, contribute and try FsLab now!
Published: Tuesday, 5 May 2015, 4:55 PM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: f#, fslab, data science