Accessing loosely structured data from F# and C# (GOTO 2011)

About two weeks ago, I gave a talk at the GOTO Conference in Copenhagen, in a very interesting .NET session organized by Mark Seemann. In my talk, I focused on the impedance mismatch between the data structures that are used in programming languages (such as classes in C# or records and discriminated unions in F#) and the data sources that we need to access (such as databases, XML files and REST services).

Clearly, both sides have some structure (otherwise, it wouldn't be possible to write any code against them!). Even an XML file that is returned by a REST service has some structure - although the only way to find out about the structure may be to call the service and then look at the result. In this article, I'll briefly summarize the ideas that I presented in the talk. The slides are available at SlideShare and the source code from the talk is in my GitHub repository.

Accessing data at different scales

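No matter what technology we use for accessing data, someone, somewhere needs to somehow specify how to map data from the source to a structure that can be used in the programming language. I think there are three options: specify the structure locally, at the level of individual expressions (dynamic typing); specify the structure all at once by defining types that the data is matched against; or generate the structure from the data source itself (F# type providers).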

The talk included several examples from every category. Unfortunately, F# type providers are not yet publicly available, so the source code for the talk doesn't include this example. The following three sections give a brief summary of some of the examples:

Using dynamic typing in C# and F#

The first approach is to specify the structure locally at the expression level. In C#, this can be done using the dynamic type. In F#, a similar thing can be achieved using the ? operator. The compiler translates expressions like obj?Foo to an operator call (?) obj "Foo" where the name of the member becomes a string. There are some interesting differences between the two approaches.
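As a minimal sketch of this translation, the following F# snippet defines a ? operator for a made-up PropertyBag type (the type and its Get member are assumptions for illustration, not part of the talk's source code) and shows that the operator syntax and the explicit call are the same thing:

```fsharp
/// Hypothetical example type: a bag of named string properties
type PropertyBag(values: Map<string, string>) =
    member this.Get(name: string) = values.[name]

/// The compiler rewrites 'bag?Foo' to a call of this operator: (?) bag "Foo"
let (?) (bag: PropertyBag) (name: string) : string = bag.Get name

let bag = PropertyBag(Map.ofList [ "Foo", "hello"; "Bar", "world" ])

let viaOperator = bag?Foo          // the member name becomes the string "Foo"
let viaFunction = (?) bag "Foo"    // the explicit form of the same call

printfn "%s %s" viaOperator viaFunction
```

Because the operator is an ordinary function, its return type is known at compile time; here it is always string.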

Accessing World Bank in C#

To demonstrate dynamic typing in C#, I created an example that uses dynamic for accessing data provided by the World Bank. The following snippet gets a list of regions (such as OECD countries, EU countries, Middle East etc.):

dynamic wb = new DynamicWorldBank();
dynamic regions = wb.Region(new { PerPage = 100 });
foreach (var reg in regions.Regions.Region) {
  Console.WriteLine("{0} ({1})", reg.Name.Value, reg.Code.Value);
}

The snippet creates a DynamicWorldBank instance, which supports dynamic invocation of operations. It is assigned to a variable of type dynamic, which allows us to write wb.Region even though the compiler cannot verify that there is such a member. At runtime, the name Region is mapped to a web service request. The example also uses C# anonymous types to specify additional arguments for the call. Anonymous types are handy, because they can be used to specify both the name of the argument and the value. The call will be translated to a web request to a URL like http://api.worldbank.org/regions?per_page=100.

The result of the call is an XML document that can also be accessed using dynamic typing. The expression regions.Regions.Region returns a collection of all <region> elements nested in the root <regions> element. The member access reg.Name.Value gets the textual content of a sub-element named <name>. The dynamic access to XML elements is implemented using a fairly simple dynamic wrapper named DynamicXml that is built on top of the LINQ to XML library.

Accessing Database in F# (First Try)

To demonstrate the F# dynamic operator (?), let's look at an example that accesses a SQL database using stored procedures. This example is based on my earlier blog post about reading data from a SQL database. It is just a first try, because the second method (discussed below) makes this code even nicer. Anyway, the example needs to read the information from the database and store it in the following F# record:

/// Data returned from the model to a view
type PollOption =
  { ID : int
    Votes : int
    Percentage : float
    Title : string }

We can use the dynamic operator for two things. The first is to use a nice member access syntax for calling a SQL stored procedure (db.Query?GetOptions calls a procedure named GetOptions with no arguments) and the second is to access columns from the returned data set (row?Title accesses the Title column):

/// Call 'GetOptions' procedure to load collection of options
let load() = 
  let db = new DynamicDatabase(connectionString)
  let options = db.Query?GetOptions()
  // Loop over the result set and create 'PollOption' values
  [ for row in options do
      yield { ID = row?ID; Votes = row?Votes;
              Title = row?Title; Percentage = row?Percentage } ]

When you perform any operation using the C# dynamic type, the result will always be of type dynamic. The F# dynamic operator works differently - it is statically resolved to some ? operator that has a well-known return type. You can see that by exploring the types in the above snippet (using tooltips). The type of options is seq<Row>, which represents a sequence of rows obtained from the database.

The Row type also provides the ? operator, which can be used to read individual columns. The return type of this operator is a generic type parameter and F# type inference specifies the type argument based on the context. When assigning the result of row?ID to a record field of type int, the operator is called with int as a type argument and so it can cast the column value to the right type. This nicely reduces the boilerplate code that needs to be written, because casting is inserted automatically.
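The effect of the generic return type can be reproduced with a small stand-in. The Row type below is a hypothetical simplification (it keeps boxed column values in a Map rather than reading them from a database) and is not the library's actual implementation:

```fsharp
/// Hypothetical stand-in for the library's Row type: column values are
/// stored as boxed objects keyed by column name
type Row(columns: Map<string, obj>) =
    member this.Columns = columns
    /// The return type 'T is chosen by type inference at each use site
    static member (?) (row: Row, name: string) : 'T =
        unbox<'T> row.Columns.[name]

type PollOption =
    { ID : int
      Votes : int
      Percentage : float
      Title : string }

// Made-up sample row
let row =
    Row(Map.ofList
          [ "ID", box 1
            "Votes", box 12
            "Percentage", box 40.0
            "Title", box "F#" ])

// Each record field fixes 'T, so no explicit casts are written
let pollOption =
    { ID = row?ID; Votes = row?Votes
      Percentage = row?Percentage; Title = row?Title }

printfn "%A" pollOption
```

If a column holds a value of a different type than the field expects, unbox fails at runtime, so the check is only moved, not removed.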

As I mentioned, accessing database data can be even easier if we use the second approach.

Matching data to a structure

In the previous examples, the structure was specified when accessing individual elements such as methods of the World Bank, XML document elements or database columns. However, we can also specify the structure all at once - by defining classes that represent the data and annotating them (e.g. using .NET attributes) to describe how to build the classes using the data.

This approach is used, for example, by LINQ to SQL - the (generated) classes come with attributes that specify how to map database data to .NET objects. However, the same idea can be used for accessing any data source. In my earlier blog post, I used this for calling PHP code from C# using Phalanger. Anyway, I used two other F# examples in my GOTO talk.

Accessing Database in F# (Second Try)

First, let's revisit the database example. Look again at the previous snippet that creates a value of the PollOption type from the row object loaded from the database. The snippet seems quite redundant. It just dynamically assigns columns of a row to fields of a record with the same name. In fact, the record type PollOption fully specifies the structure that we want to get!

Using F# reflection, we can write a library that reads data from the SQL database and loads it into a record type that is specified by the caller. A function to load data and a function to cast a vote can then look like this:

/// Call 'GetOptions' and automatically convert result to a collection
let load() : seq<PollOption> = 
  let db = new DynamicDatabase(connectionString)
  db?GetOptions()

/// Call 'Vote' procedure without returning any value
let vote(id:int) : unit = 
  let db = new DynamicDatabase(connectionString)
  db?Vote(id)

The snippet shows two functions. In both of the declarations, the return type is specified explicitly using type annotations. The first function returns a sequence of PollOption values, while the second one just updates the database and returns unit.

If you look at the type of the ? operator implemented by the DynamicDatabase type used in this example, you'll see that it takes a string and returns a generic function 'T -> 'R. Both type arguments are automatically provided by the F# compiler. The type of the argument is unit in the first function and int in the second. The return type is the same as the return type of the enclosing function (specified using type annotations).

The implementation of the ? operator uses the type argument 'R to decide what it should do. When the type is unit, it simply calls the stored procedure without returning a result. The second case is more interesting - when the return type is seq<SomeRecord>, the operator matches the data obtained from the database to this type. It enumerates over the returned data set and creates SomeRecord values from the data.

In this case, the record type PollOption specifies the structure of the data and the library coerces the result set from the SQL database to this structure.
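The core of such a library can be sketched with the FSharpType and FSharpValue helpers from the F# reflection API. The sketch assumes that a row is available as a Map from column names to boxed values; the names rowToRecord and rowsToRecords are made up for this illustration and the database access itself is left out:

```fsharp
open Microsoft.FSharp.Reflection

type PollOption =
    { ID : int
      Votes : int
      Percentage : float
      Title : string }

/// Build a record of type 'R from one row, matching columns to fields by name
let rowToRecord<'R> (row: Map<string, obj>) : 'R =
    let values =
        FSharpType.GetRecordFields(typeof<'R>)
        |> Array.map (fun field -> row.[field.Name])
    unbox<'R> (FSharpValue.MakeRecord(typeof<'R>, values))

/// Convert a whole result set; the caller picks 'R using a type annotation
let rowsToRecords<'R> (rows: seq<Map<string, obj>>) : seq<'R> =
    rows |> Seq.map rowToRecord<'R>

// Made-up rows standing in for the result of a stored procedure
let rows =
    [ Map.ofList [ "ID", box 1; "Votes", box 3; "Percentage", box 75.0; "Title", box "F#" ]
      Map.ofList [ "ID", box 2; "Votes", box 1; "Percentage", box 25.0; "Title", box "C#" ] ]

let options : seq<PollOption> = rowsToRecords rows

for opt in options do
    printfn "%s: %d votes" opt.Title opt.Votes
```

The record type is the only place where the column names and types are written down, which is the point of this approach.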

Specifying the XML structure

The approach used in the previous example is quite obvious - just fill the fields of an F# record with the data returned from a database. However, the same technique can be used when working with XML data as well. The second example that you can find in the sources uses F# discriminated unions to specify the structure of an XML file. For example, the following types define the structure of an RSS feed:

// Specifies the expected structure of XML document
type Title = Title of string
type Link = Link of string
type Description = Description of string

/// Item contains title, link and description
type Item = Item of Title * Link * Description
/// Channel contains information and a list of (nested) items
type Channel = Channel of Title * Link * Description * list<Item>
/// Root element is 'rss' containing a channel
type Rss = Rss of Channel

The first three declarations specify that title, link and description are simple XML elements (for example <title>) containing text. The name of the discriminated union case corresponds to the name of the XML element. The <item> element contains three other elements, which is expressed using a union case with multiple arguments. The Channel type demonstrates another feature - it is possible to use the F# list type to express the fact that an element can appear repeatedly as a child of some XML element.

The above example didn't answer why we used discriminated unions in the first place. The reason is that some elements may contain one of several different elements. For example, a <div> element in XHTML may contain <p>, <h1>, another <div> or many other different elements. These can be represented as multiple cases (see the sample source code for a complete example).
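To make the multiple-case situation concrete, here is a heavily simplified, hypothetical model of XHTML block content (it is not the complete example from the sample source code). Each case stands for one element that may appear, and Div is recursive:

```fsharp
/// Hypothetical, heavily simplified model of XHTML block content
type Block =
    | P of string            // <p> containing text
    | H1 of string           // <h1> containing text
    | Div of list<Block>     // <div> containing nested blocks

/// Collect the text of all paragraphs, descending into nested <div> elements
let rec paragraphs block =
    match block with
    | P text -> [ text ]
    | H1 _ -> []
    | Div children -> List.collect paragraphs children

let page = Div [ H1 "News"; P "First"; Div [ P "Second" ] ]

printfn "%A" (paragraphs page)   // prints ["First"; "Second"]
```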

However, back to the RSS example. Once we defined the source data structure (how the data is represented in XML), we can also define a target data structure (how we want to pass the data to the view of a web application). This is a simple F# record type:

/// Represents a collection of news (title, 
/// description, url) with a media name and a link
type Listing = 
  { Name : string
    Link : string
    Items : seq<string * string * string> }

Now we can use the F# library that matches XML data to a specified discriminated union type. Then we can process the data and turn the RSS feed into a Listing value:

/// Loads news from the Guardian using RSS feed
let loadGuardian() =
  // Download data and convert them to the 'Rss' structure
  let url = "http://feeds.guardian.co.uk/theguardian/world/rss"
  let doc = StructuralXml.Load(url, LowerCase = true)
  let (Rss(Channel(Title title, Link link, _, items))) = doc.Root 

  // Create F# 'Listing' type from the parsed XML
  let items = seq { 
    for (Item(Title title, Link link, Description descr)) in items do
      yield title, link, stripHtml descr }
  { Name = title; Link = link; Items = items }

The parsing of XML documents into a structure defined using F# discriminated unions is implemented using the StructuralXml.Load method. The method has a type argument that specifies the target structure. We didn't write it explicitly, because the result is then assigned to a pattern that involves the Rss(...) constructor, so F# infers the type from the context.

Once the snippet transforms XML document into the required F# Rss type, it is quite easy to transform the value to the Listing type. The snippet iterates over all the Items (corresponding to XML <item> element) and creates a sequence of triples containing article title, link and a description.
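How a union type can drive the parsing is easiest to see on a reduced case. The following sketch is not the StructuralXml implementation; it is an assumed, minimal matcher (the names elementName and parseElement are made up) that handles only single-case unions whose fields are either text or other single-case unions, with lower-cased case names as element names. Lists, multiple cases and namespaces are left out:

```fsharp
// On .NET Framework scripts, System.Xml.Linq.dll may need to be referenced first
open System
open System.Xml.Linq
open Microsoft.FSharp.Reflection

type Title = Title of string
type Link = Link of string
type Description = Description of string
type Item = Item of Title * Link * Description

/// Name of the XML element that a single-case union stands for
let elementName (unionType: Type) =
    FSharpType.GetUnionCases(unionType).[0].Name.ToLowerInvariant()

/// Build a value of the single-case union 'unionType' from the element 'el'
let rec parseElement (unionType: Type) (el: XElement) : obj =
    let unionCase = FSharpType.GetUnionCases(unionType).[0]
    let args =
        unionCase.GetFields()
        |> Array.map (fun field ->
            if field.PropertyType = typeof<string> then
                box el.Value   // a string field holds the text content
            else
                // any other field is a child element described by its own union
                let name = elementName field.PropertyType
                match el.Element(XName.Get name) with
                | null -> failwithf "Missing <%s> in <%s>" name el.Name.LocalName
                | child -> parseElement field.PropertyType child)
    FSharpValue.MakeUnion(unionCase, args)

// Made-up sample element
let xml =
    XElement.Parse
        "<item><title>Hello</title><link>http://example.com/1</link><description>Text</description></item>"

let (Item(Title title, Link link, Description descr)) =
    unbox<Item> (parseElement typeof<Item> xml)

printfn "%s (%s): %s" title link descr
```

The pattern on the left-hand side of the last binding plays the same role as in loadGuardian: it takes the typed value apart without any string-based lookups.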

Generating structure from the data

The approach discussed in the previous section relies on the fact that the developer defines the expected structure of the data (using some C# or F# types, possibly with .NET attributes or other hints). In some cases (e.g. LINQ to SQL), this structure can be generated by a tool, but it still needs to be there.

There are two problems with this approach. Firstly, explicitly declaring the structure may be a bit tedious. Secondly, some online data sources simply have too many types - a web service dictionary may contain types for thousands of web services and each of them has several types.

The solution that will be available in a future version of F# is called type providers. The idea is simple - instead of declaring the types explicitly, we can create a plugin that tells the F# compiler what a .NET representation of the data source would look like. The Visual Studio IntelliSense can then display these (fake) types as if they were real types and they can be used to write F# programs just like ordinary types. When the program is compiled, the F# compiler calls the plugin again to deal with these types. The plugin can replace their uses with some other F# expression or actually generate real .NET types.

Type provider for World Bank data

The following snippet is an example that I demonstrated during the talk. It uses a type provider for accessing the World Bank API (a simple REST service). The snippet also uses the FSharpChart charting library to generate a chart showing the central government debt for several EU countries (screenshot above):

#r @"WorldBank\Samples.WorldBank.TypeProvider.dll"
#load @"FSharpChart\FSharpChart.fsx"

open Samples.Charting
open System.Drawing
open System.Windows.Forms.DataVisualization.Charting

// Represents properties of a chart grid
let dashGrid = Grid(LineColor = Color.Gainsboro, LineDashStyle = ChartDashStyle.Dash)

// Create a list of countries we're interested in
let countries = 
  [ WorldBank.Countries.Greece; WorldBank.Countries.Ireland; 
    WorldBank.Countries.Denmark; WorldBank.Countries.``United Kingdom``;
    WorldBank.Countries.``Czech Republic`` ]


// Generate line chart with debt data series for every country 
// then combine the lines into a single combined chart 
FSharpChart.Combine
  [ for country in countries do
      let data = country.``Central government debt, total (% of GDP)``|> Seq.sortBy fst 
      yield upcast FSharpChart.Line(data, Name=unbox country) ]
|> FSharpChart.WithLegend(Docking = Docking.Left)
|> FSharpChart.WithArea.AxisY(MajorGrid = dashGrid) 
|> FSharpChart.WithArea.AxisX(MajorGrid = dashGrid)

The type provider is a .NET assembly (Samples.WorldBank.TypeProvider.dll) that contains a plugin for the compiler. Note that it doesn't actually contain any types that are used in the snippet - instead, it generates them based on the data it downloads from the World Bank. The type provider generates types in the WorldBank namespace. The snippet first uses a type WorldBank.Countries that contains all countries known to the World Bank as static members. The snippet creates a list containing some of the countries.

The country value has a large number of properties that represent individual indicators that the World Bank provides. The number of indicators is incredible (about four thousand), so imagine a generated (or even handwritten) class that needs to be included in every assembly that uses the World Bank. The type provider used in this example creates a fake type with all the properties (see screenshot below), but when you actually compile your code, the type will be replaced with some simple representation.

After getting the debt data, the snippet uses the FSharpChart library to generate a chart. It creates a list of line charts using FSharpChart.Line and combines them into a single chart using FSharpChart.Combine. Then it calls a couple of With functions to configure the chart. In particular, it adds a legend and specifies the style of the grid lines.

Summary

In this article, I briefly summarized the key points from my GOTO 2011 talk about accessing loosely structured data from C# and F#. You can find the materials in my GitHub repository and the slides are also available at SlideShare. The talk discussed technologies that can bridge the gap between structure defined in a programming language (classes, records or discriminated unions) and the structure that is present in the data (XML or database schema). I discussed three options: dynamic typing, where the structure is specified locally at the expression level; matching data to a structure that is declared explicitly using types; and generating the structure from the data source using F# type providers.

The source code for the examples that demonstrate the first two techniques is available in the GitHub repository. The example showing how to access World Bank data using type providers will be released as soon as Microsoft releases a beta of F# with type providers.

Published: Thursday, 26 May 2011, 10:51 PM
Author: Tomas Petricek
Tags: c#, presentations, f#