
Better F# data science with FsLab and Ionide

At NDC Oslo 2016, I did a talk about some of the recent F# projects that are making data science with F# even nicer than it used to be. The talk covered a wider range of topics, but one of the nice new things I showed was the improved F# Interactive in the Ionide plugin for Atom and the integration with FsLab libraries that it provides.

In particular, with the latest version of Ionide for Atom and the latest version of the FsLab package, you can run code in F# Interactive and see the resulting time series, data frames, matrices, vectors and charts as nicely pretty-printed HTML objects, right in the editor.

In this post, I'll write about how the new Ionide and FsLab integration works, how you can use it with your own libraries and also about some of the future plans. You can also learn more by getting the FsLab package or by watching the NDC talk.

FsLab formatters for Ionide

FsLab is just a NuGet package that references a number of other F# packages for doing data science with F#. The one thing that it adds is an easy-to-use load script that loads all the packages from F# Interactive. This means that when you download the template, the sample script file starts with something like this:

#load "packages/FsLab/Themes/AtomChester.fsx"
#load "packages/FsLab/FsLab.fsx"

open Deedle
open FSharp.Data
open XPlot.GoogleCharts
open XPlot.GoogleCharts.Deedle

The first line loads a default theme that configures how embedded charts and tables will be formatted. It sets things like float formatting options, colours and fonts. You can find and contribute themes in the FsLab.Formatters repository - the current choice covers only one light and one dark theme for Atom. The second line is the more important one: it loads the FsLab dependencies.

The basic template comes with a minimal example that downloads two time series from the World Bank and finds the years when they were the most different:

let wb = WorldBankData.GetDataContext()

let cz = wb.Countries.``Czech Republic``.Indicators
let eu = wb.Countries.``European Union``.Indicators

let czschool = series cz.``Gross enrolment ratio, tertiary, both sexes (%)``
let euschool = series eu.``Gross enrolment ratio, tertiary, both sexes (%)``

// Get 5 years with the largest difference between EU and CZ
abs (czschool - euschool)
|> Series.sort
|> Series.rev
|> Series.take 5

When you run the code in Atom, a formatter for Deedle series should make it easy to see the result of the last expression - make sure to run the last 4 lines of the snippet as a separate interaction. Ionide will only show the formatted object if the formattable object is the result of the snippet. Alternatively, you can select czschool or euschool and press Alt+Enter to see one of the source series.

Aside from Deedle series, the FsLab package registers formatters for the charting libraries that it comes with. This includes F# Charting (Windows-only), XPlot Google charts and also XPlot Plotly charts. The following example plots the two time series using the XPlot wrapper for Google charts:

[ czschool.[1975 .. 2010]; euschool.[1975 .. 2010] ]
|> Chart.Line
|> Chart.WithOptions (Options(legend=Legend(position="bottom")))
|> Chart.WithLabels ["CZ"; "EU"]

The Google chart is formatted according to the theme that we loaded on the first line of the script, so it looks nicely integrated with the F# Interactive window (but as I mentioned, we need your help with adding more than just the two standard Atom themes).

One of the nice aspects of how the FsLab and Ionide integration works is that it is not an ad-hoc integration for just a couple of selected libraries - quite the opposite! All the FsLab formatters live in a separate repository from Ionide and you can create your own formatters that will work in exactly the same way. The following section has more details about the underlying mechanism behind all this.

Creating custom HTML formatters

The latest release of ionide-fsi, which is the F# Interactive plugin for Atom, no longer runs fsi.exe in the background (like Visual Studio or all other editors). Instead, it is based on the brand new FsInteractiveService. This is a light-weight server that wraps the F# Interactive functionality. It can be consumed by any editor via HTTP and it exposes an API for evaluating F# code, but also for getting autocompletion and other hints.

The FsInteractiveService extends the standard F# Interactive functionality with the ability to format objects as HTML. The idea is very simple. You call fsi.AddHtmlPrinter and specify a function that turns your object into an HTML string! When you evaluate an expression that returns a value that has a registered formatter, Ionide will then display it using your provided HTML formatter.

Creating HTML formatter for tables

As a basic example, say you have a type that represents a table:

type Table = Table of string[,]

Now, we want to create an HTML formatter that will render the table as a <table> element. To do this, all you need is to call fsi.AddHtmlPrinter. The FsInteractiveService also defines a symbol HAS_FSI_ADDHTMLPRINTER, so it is a good idea to wrap the following code in a big #if HAS_FSI_ADDHTMLPRINTER block - this way, the code will be compatible with F# Interactive in Visual Studio and other editors that do not support HTML formatters (yet).

fsi.AddHtmlPrinter(fun (Table t) ->
  let body =
    [ yield "<table>"
      for i in 0 .. t.GetLength(0)-1 do
        yield "<tr>"
        for j in 0 .. t.GetLength(1)-1 do
          yield "<td>" + t.[i,j] + "</td>"
        yield "</tr>"
      yield "</table>" ]
    |> String.concat ""
  seq [ "style", "<style>table { background:#f0f0f0; }</style>" ],
  body )

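The result of the formatter function is actually seq<string * string> * string. The tuple consists of two things: a sequence of key-value pairs carrying additional HTML, such as the "style" entry with a <style> element above, and the HTML body that represents the formatted value.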

You can now define a table as follows:

let table =
  [ [ "Test"; "More"]
    [ "1234"; "5678"] ]
  |> array2D |> Table

In the current version, the value is only formatted when Table is returned as a direct result of an expression. This means that you need to evaluate an expression of type Table rather than, for example, a value binding as above:

table

When you run the above in Atom, you will see the table formatted as an HTML <table> element. (Some more styling is needed to actually make this pretty, but this is a good start. Oh and did you know that Atom supports the <marquee> tag?!)
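One thing the formatter above leaves out is escaping: a cell containing < or & would be interpreted as markup rather than shown as text. The following is a minimal sketch of a body builder that encodes each cell using System.Net.WebUtility.HtmlEncode from the .NET base library. The name tableHtml is a hypothetical helper introduced here for illustration; it is not part of FsLab, Ionide or the FsInteractiveService.

```fsharp
open System.Net

/// Renders a 2D array of strings as an HTML <table>, HTML-encoding every cell.
/// NOTE: `tableHtml` is a hypothetical helper name used only for this sketch.
let tableHtml (cells: string[,]) : string =
  [ yield "<table>"
    for i in 0 .. Array2D.length1 cells - 1 do
      yield "<tr>"
      for j in 0 .. Array2D.length2 cells - 1 do
        // HtmlEncode replaces characters such as < and & with HTML entities
        yield "<td>" + WebUtility.HtmlEncode(cells.[i, j]) + "</td>"
      yield "</tr>"
    yield "</table>" ]
  |> String.concat ""

// A sample with characters that would otherwise be read as markup
let sample = array2D [ [ "a<b"; "x&y" ]; [ "1234"; "5678" ] ]
printfn "%s" (tableHtml sample)
```

Because the function is a plain string[,] -> string transformation, you can try it in any F# Interactive, not just in Ionide. Inside the formatter passed to fsi.AddHtmlPrinter, the result of tableHtml t could then be returned as the second element of the tuple in place of the hand-built body.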

Themes, parameters and servers

In practice, there are a few other concerns that make formatting objects as HTML harder. For example, some of the HTML formatters can implement lazy loading where they use a simple web server running in the background to provide data to the view (which calls the server using JavaScript). Also, it is nice if all the HTML formatters can share the same visual theme. To make these possible, the FsInteractiveService also defines fsi.HtmlPrinterParameters which is a global value of type IDictionary<string, obj> that can be used for storing various shared configuration.

For example, the html-standalone-output parameter specifies whether the generated HTML code should be stand-alone, or whether it is allowed to use JavaScript to load data lazily (the latter is used for Deedle frames in the talk and it means you can scroll through the data, but you need to have a server running in the background):

#if HAS_FSI_ADDHTMLPRINTER
let standaloneHtmlOutput =
  fsi.HtmlPrinterParameters.["html-standalone-output"] :?> bool
#endif
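The direct indexer lookup above throws if the key is missing, and the :?> cast throws if the value has a different type. If you prefer a formatter that degrades gracefully, a small lookup helper with a fallback value is one option. The sketch below is written against a plain IDictionary<string, obj>, so it runs in any F# Interactive; getParameter is a hypothetical helper name, and the dictionary created here is only a stand-in for fsi.HtmlPrinterParameters.

```fsharp
open System.Collections.Generic

// Hypothetical helper (not part of the FsInteractiveService): returns the value
// stored under `key` when it exists and has type 'T, and `fallback` otherwise.
let getParameter<'T> (parameters: IDictionary<string, obj>) (key: string) (fallback: 'T) : 'T =
  match parameters.TryGetValue(key) with
  | true, (:? 'T as value) -> value
  | _ -> fallback

// Stand-in for fsi.HtmlPrinterParameters, so the snippet works outside Ionide
let parameters = Dictionary<string, obj>() :> IDictionary<string, obj>
parameters.["html-standalone-output"] <- box true

let standalone = getParameter parameters "html-standalone-output" false
let missing = getParameter parameters "no-such-key" false
printfn "standalone=%b missing=%b" standalone missing
```

In Ionide, the same helper could be called with fsi.HtmlPrinterParameters as its first argument inside an #if HAS_FSI_ADDHTMLPRINTER block. Which fallback makes sense for a given parameter is a choice for the formatter author; the false used here is only an example.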

The standard FsLab formatters contain a couple of examples of how this dictionary can be used.

FsLab Journal and looking ahead

Formatting in FsLab journals

The FsLab downloads page also lets you download an FsLab Journal template. This is something that has been available in FsLab for a longer time, but I never wrote much about it. The summary is:

FsLab Journal lets you turn your F# scripts consisting of F# code snippets and Markdown formatted comments into a nice HTML report.

When you download the template, you can just run build run and your script will be turned into an HTML report in the background. When you change your script, the background runner will update and reload your report. If you want to produce stand-alone HTML (that does not require a background server), you can run build html.

In the latest version of FsLab, the formatting for journals is based on the same fsi.AddHtmlPrinter formatters. This means we get to reuse the code for it, but most importantly, when you write your own formatter, it will work with both Ionide and FsLab journals.

Formatting in Jupyter notebooks

One of the related projects in the F# and data science space is the F# bindings for Jupyter Notebooks. This does not yet use the same model for registering HTML formatters via fsi.AddHtmlPrinter. Instead, it has its own mechanism for registering printers, but I expect that it will be possible to merge the two so that you can just write fsi.AddHtmlPrinter once and use it in Ionide, FsLab Journals as well as Jupyter.

Published: Wednesday, 6 July 2016, 5:03 PM
Author: Tomas Petricek
Tags: f#, fslab, data science