Accessing loosely structured data from F# and C# (GOTO 2011)
About two weeks ago, I gave a talk at GOTO Conference in Copenhagen at a very interesting .NET session organized by Mark Seemann. In my talk, I focused on the impedance mismatch between the data structures that are used in programming languages (such as classes in C# or records and discriminated unions in F#) and the data structures that we need to access (such as databases, XML files and REST services).
Clearly, both sides have some structure (otherwise, it wouldn't be possible to write any code against them!). Even an XML file that is returned by a REST service has some structure - although the only way to find out about the structure may be to call the service and then look at the result. In this article, I'll briefly summarize the ideas that I presented in the talk. Here are links to the slides as well as the source code from the talk:
- You can browse the slides at SlideShare or you can download them from GitHub in the PPT format.
- The source code is in my GitHub repository and can be downloaded as a single ZIP file.
Accessing data at different scales
No matter what technology we use for accessing data, someone, somewhere needs to somehow specify how to map data from the source to a structure that can be used in the programming language. I think there are three options:
- Expression scale - When using dynamic typing, the program specifies that some object is expected to contain a member (such as a database column or an XML node) of a specified name. The program also specifies the expected type of the member - for example a primitive type or another object.
- Program scale - The previous technique is local and is scattered around the code that works with data. When the data source is used in multiple places in the program (e.g. a database), it makes sense to specify the expected structure and the mapping in one place. This is used for example in LINQ to SQL (the structure is given by the generated domain model).
- Internet scale - Describing the expected structure is reasonable if the data source is small. However, what if the program wants to access a data source with thousands of types? In that case, we need some automatic way of translating between the language used by the data source and the programming language. This is what F# type providers do.
The talk included several examples from every category. Unfortunately, F# type providers are not yet publicly available, so the source code for the talk doesn't include this example. The following three sections give a brief summary of some of the examples:
Using dynamic typing in C# and F#
The first approach is to specify the structure locally at the expression level. In C#, this can be done using the dynamic type. In F#, a similar thing can be achieved using the ? operator. The compiler translates expressions like obj?Foo to an operator call (?) obj "Foo" where the name of the member becomes a string. There are some interesting differences between the two approaches.
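To make the translation of the ? operator concrete, here is a minimal sketch. It is not code from the talk: the operator below is a hypothetical one defined over a plain dictionary, chosen only to show that the member-like syntax and the explicit operator call mean the same thing.

```fsharp
open System.Collections.Generic

/// Hypothetical lookup operator (an assumption for illustration):
/// 'bag?Foo' is compiled to '(?) bag "Foo"', so the name arrives as a string
let (?) (bag: IDictionary<string, obj>) (name: string) : obj =
    bag.[name]

// A small string-keyed property bag with boxed values
let bag = dict [ "Foo", box 42; "Bar", box "hello" ]

// Member-like syntax
let viaSyntax = bag?Foo
// The explicit operator call that the syntax stands for
let viaCall = (?) bag "Foo"
```

Both bindings end up holding the same boxed value, because the first form is only syntactic sugar for the second.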
Accessing World Bank in C#
To demonstrate dynamic typing in C#, I created an example that uses dynamic for accessing data provided by the World Bank. The following snippet gets a list of regions (such as OECD countries, EU countries, Middle East etc.):
    dynamic wb = new DynamicWorldBank();
    dynamic regions = wb.Region(new { PerPage = 100 });
    foreach (var reg in regions.Regions.Region) {
        Console.WriteLine("{0} ({1})", reg.Name.Value, reg.Code.Value);
    }
The snippet creates a DynamicWorldBank instance, which supports dynamic invocation of operations. It is assigned to a variable of type dynamic, which allows us to write wb.Region even though the compiler cannot verify that there is such a member. At runtime, the name Region is mapped to a web service request.
The example also uses C# anonymous types to specify additional arguments for the call. Anonymous types are handy, because they can be used to specify both the name of the argument and the value. The call will be translated to a web request to a URL like http://api.worldbank.org/regions?per_page=100.
The result of the call is an XML document that can also be accessed using dynamic typing. The expression regions.Regions.Region returns a collection of all <region> elements nested in the root <regions> element. The member access reg.Name.Value gets the textual content of a sub-element named <name>. The dynamic access to XML elements is implemented using a fairly simple dynamic wrapper named DynamicXml that is built on top of the LINQ to XML library.
Accessing Database in F# (First Try)
To demonstrate the F# dynamic operator (?), let's look at an example that accesses a SQL database using stored procedures. This example is based on my earlier blog post about reading data from a SQL database. It is just a first try, because the second method (discussed below) makes this code even nicer. Anyway, the example needs to read the information from the database and store it in the following F# record:
    /// Data returned from the model to a view
    type PollOption =
      { ID : int
        Votes : int
        Percentage : float
        Title : string }
We can use the dynamic operator for two things. The first is to use a nice member access syntax for calling a SQL stored procedure (db.Query?GetOptions calls a procedure named GetOptions with no arguments) and the second is to access columns from the returned data set (row?Title accesses the Title column):
    /// Call 'GetOptions' procedure to load collection of options
    let load() =
      let db = new DynamicDatabase(connectionString)
      let options = db.Query?GetOptions()
      // Loop over the result set and create 'PollOption' values
      [ for row in options do
          yield { ID = row?ID; Votes = row?Votes;
                  Title = row?Title; Percentage = row?Percentage } ]
When you perform any operation using the C# dynamic type, the result will always be of type dynamic. The F# dynamic operator works differently - it is statically resolved to some ? operator that has a well-known return type. You can see that by exploring the types in the above snippet. The type of options is seq<Row>, which represents a sequence of rows obtained from the database.
The Row type also provides the ? operator, which can be used to read individual columns. The return type of this operator is a generic type parameter and F# type inference specifies the type argument based on the context. When assigning the result of row?ID to a record field of type int, the operator is called with int as a type argument and so it can cast the column value to the right type. This nicely reduces the boilerplate code that needs to be written, because casting is inserted automatically.
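The following sketch illustrates how a generic return type lets the context choose the cast. It is an assumption-laden stand-in, not the Row type from the talk's sources: here a row is just a hypothetical wrapper over an in-memory map of boxed column values.

```fsharp
/// Hypothetical stand-in for a database row (illustration only)
type Row(columns: Map<string, obj>) =
    member x.Columns = columns
    /// The return type 'R is generic, so the caller's context decides
    /// which type the boxed column value is unboxed to
    static member (?) (row: Row, name: string) : 'R =
        unbox<'R> (row.Columns.[name])

/// The record from the article, repeated to keep the sketch self-contained
type PollOption =
    { ID : int
      Votes : int
      Percentage : float
      Title : string }

// One fake row with boxed values of the expected column types
let row =
    Row(Map.ofList [ "ID", box 1; "Votes", box 10
                     "Percentage", box 25.0; "Title", box "F#" ])

// No casts are written here: each field type instantiates 'R
let poll =
    { ID = row?ID; Votes = row?Votes
      Percentage = row?Percentage; Title = row?Title }
```

The type of each record field instantiates the type argument, so row?ID is unboxed as int and row?Title as string. A column whose runtime type does not match the field would only be detected at runtime, when the unboxing fails.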
As I mentioned, accessing database data can be even easier if we use the second approach...
Matching data to a structure
In the previous examples, the structure was specified when accessing individual elements such as methods of the World Bank, XML document elements or database columns. However, we can also specify the structure all at once - by defining classes that represent the data and annotating them (e.g. using .NET attributes) to describe how to build the classes using the data.
This approach is used, for example, by LINQ to SQL - the (generated) classes come with attributes that specify how to map database data to .NET objects. However, the same idea can be used for accessing any data source. In my earlier blog post, I used this for calling PHP code from C# using Phalanger. Anyway, I used two other F# examples in my GOTO talk.
Accessing Database in F# (Second Try)
First, let's revisit the example with accessing databases. Look again at the previous snippet that creates a value of the PollOption type from the row object loaded from the database. The snippet seems quite redundant. It just dynamically assigns columns of a row to fields of a record with the same name. In fact, the record type PollOption fully specifies the structure that we want to get!
Using F# reflection, we can write a library that loads data from the SQL database and loads them into a record type that is specified by the caller. A function to load data and a function to cast a vote can then look like this:
    /// Call 'GetOptions' and automatically convert result to a collection
    let load() : seq<PollOption> =
      let db = new DynamicDatabase(connectionString)
      db?GetOptions()

    /// Call 'Vote' procedure without returning any value
    let vote(id:int) : unit =
      let db = new DynamicDatabase(connectionString)
      db?Vote(id)
The snippet shows two functions. In both of the declarations, the return type is specified explicitly using type annotations. The first function returns a sequence of PollOption values, while the second one just updates the database and returns unit.
If you look at the type of the ? operator implemented by the DynamicDatabase type used in this example, you'll see that it takes string and returns a generic function 'T -> 'R. The type for both of the parameters is automatically provided by the F# compiler. The type of the argument was just unit (in the first example) and int (in the second example). The return type is the same as the return type of the function (specified using type annotations).
The implementation of the ? operator uses the type argument 'R to decide what it should do. When the type is unit, it simply calls the stored procedure without returning a result. The second case is more interesting - when the return type is seq<SomeRecord>, the operator matches the data obtained from the database to this type. It enumerates over the returned data set and creates SomeRecord values from the data.
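A rough sketch of this kind of dispatch on the return type follows. FakeDatabase and its in-memory "procedures" are hypothetical stand-ins, not the DynamicDatabase from the talk: nothing here talks to a real database, and the seq<SomeRecord> case is left out to keep the example short.

```fsharp
open System.Collections.Generic

/// Hypothetical stand-in for a database: each "stored procedure" is an
/// in-memory function from a boxed argument to a boxed result
type FakeDatabase(procedures: Map<string, obj -> obj>) =
    member x.Procedures = procedures
    /// 'db?Name' returns a function 'T -> 'R; the type 'R decides
    /// what happens to the result of the procedure
    static member (?) (db: FakeDatabase, name: string) : 'T -> 'R =
        fun (arg: 'T) ->
            let result = db.Procedures.[name] (box arg)
            if typeof<'R> = typeof<unit> then
                // The caller expects no value: discard the result
                unbox<'R> (box ())
            else
                // Otherwise convert the result to the requested type
                unbox<'R> result

// Mutable vote counts that play the role of a database table
let votes = Dictionary<int, int>()

let voteProc (arg: obj) : obj =
    let id = unbox<int> arg
    let current = if votes.ContainsKey id then votes.[id] else 0
    votes.[id] <- current + 1
    box votes.[id]

let countProc (arg: obj) : obj =
    box votes.[unbox<int> arg]

let db = FakeDatabase(Map.ofList [ "Vote", voteProc; "Count", countProc ])

// The type annotations choose 'R: unit for 'vote', int for 'count'
let vote (id: int) : unit = db?Vote(id)
let count (id: int) : int = db?Count(id)
```

The two wrappers differ only in their return type annotation, and that annotation alone selects the branch taken inside the operator.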
In this case the record type PollOption specifies the structure of the data and the library coerces the result set from the SQL database to this structure.
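The coercion itself can be sketched with F# reflection. FSharpType.GetRecordFields and FSharpValue.MakeRecord are functions from the F# core library; the fillRecord helper and the in-memory rows are assumptions made for this illustration and are not taken from the talk's library.

```fsharp
open Microsoft.FSharp.Reflection

/// Hypothetical helper: build a record of type 'T from a lookup of
/// column name -> boxed value, matching columns to fields by name
let fillRecord<'T> (columns: Map<string, obj>) : 'T =
    let values =
        FSharpType.GetRecordFields(typeof<'T>)
        |> Array.map (fun field -> columns.[field.Name])
    FSharpValue.MakeRecord(typeof<'T>, values) |> unbox<'T>

/// The record from the article, repeated to keep the sketch self-contained
type PollOption =
    { ID : int
      Votes : int
      Percentage : float
      Title : string }

// Two fake rows standing in for a result set returned by the database
let rows =
    [ Map.ofList [ "ID", box 1; "Votes", box 30
                   "Percentage", box 75.0; "Title", box "F#" ]
      Map.ofList [ "ID", box 2; "Votes", box 10
                   "Percentage", box 25.0; "Title", box "C#" ] ]

// The record type alone describes which columns are needed
let options : seq<PollOption> =
    rows |> Seq.map (fun row -> fillRecord<PollOption> row)
```

The sketch assumes that every field has a column of the same name and a compatible runtime type; a missing column or a mismatched type results in a runtime exception.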
Specifying the XML structure
The approach used in the previous example is quite obvious - just fill the fields of an F# record with the data returned from a database. However, the same technique can be used when working with XML data as well. The second example that you can find in the sources uses F# discriminated unions to specify the structure of an XML file. For example, the following types define the structure of an RSS feed:
    // Specifies the expected structure of XML document
    type Title = Title of string
    type Link = Link of string
    type Description = Description of string

    /// Item contains title, link and description
    type Item = Item of Title * Link * Description
    /// Channel contains information and a list of (nested) items
    type Channel = Channel of Title * Link * Description * list<Item>
    /// Root element is 'rss' containing a channel
    type Rss = Rss of Channel
The first three declarations specify that title, link and description are simple XML elements (for example <title>) containing text. The name of the discriminated union case corresponds to the name of the XML element. The <item> element contains three other elements, which is expressed using a union case with multiple arguments. The Channel type demonstrates another feature - it is possible to use the F# list type to express the fact that another item can appear repeatedly as a child element of some XML element.
The above example didn't answer why we used discriminated unions in the first place. The reason is that some elements may contain one of several different elements. For example, a <div> element in XHTML may contain <p>, <h1>, another <div> or many other different elements. These can be represented as multiple cases (see the sample source code for a complete example).
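As a rough illustration of the multiple-case idea, the sketch below models a tiny fragment of XHTML. The Block type is a hypothetical, heavily simplified content model written for this article; it is not the type from the sample source code, and the conventions the real library expects may differ.

```fsharp
/// Hypothetical, heavily simplified XHTML content model (illustration only)
type Block =
    | P of string          // <p> element containing text
    | H1 of string         // <h1> element containing text
    | Div of list<Block>   // <div> containing any of the cases, including <div>

/// Count the paragraphs anywhere inside a block
let rec countParagraphs block =
    match block with
    | P _ -> 1
    | H1 _ -> 0
    | Div children -> children |> List.sumBy countParagraphs

// A heading, a paragraph and a nested <div> with one more paragraph
let page = Div [ H1 "News"; P "First"; Div [ P "Nested" ] ]
let paragraphs = countParagraphs page
```

Each alternative child element becomes one union case, and code that consumes the data uses ordinary pattern matching to handle all of them.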
However, back to the RSS example. Once we defined the source data structure (how the data is represented in XML), we can also define a target data structure (how we want to pass the data to the view of a web application). This is a simple F# record type:
    /// Represents a collection of news (title,
    /// description, url) with a media name and a link
    type Listing =
      { Name : string
        Link : string
        Items : seq<string * string * string> }
Now we can use the F# library that matches XML data to a specified discriminated union type. Then we can process the data and turn the RSS feed into a Listing value:
    /// Loads news from the Guardian using RSS feed
    let loadGuardian() =
      // Download data and convert them to the 'Rss' structure
      let url = "http://feeds.guardian.co.uk/theguardian/world/rss"
      let doc = StructuralXml.Load(url, LowerCase = true)
      let (Rss(Channel(Title title, Link link, _, items))) = doc.Root

      // Create F# 'Listing' type from the parsed XML
      let items = seq {
        for (Item(Title title, Link link, Description descr)) in items do
          yield title, link, stripHtml descr }
      { Name = title; Link = link; Items = items }
The parsing of XML documents into a structure defined using F# discriminated unions is implemented using the StructuralXml.Load method. The method has a type argument that specifies the target structure. We didn't write it explicitly, because the result is then assigned to a pattern that involves the Rss(...) constructor, so F# infers the type from the context.
Once the snippet transforms the XML document into the required F# Rss type, it is quite easy to transform the value to the Listing type. The snippet iterates over all the items (corresponding to XML <item> elements) and creates a sequence of triples containing the article title, link and description.
Generating structure from the data
The approach discussed in the previous section relies on the fact that the developer defines the expected structure of the data (using some C# or F# types, possibly with .NET attributes or other hints). In some cases (e.g. LINQ to SQL), this structure can be generated by a tool, but it still needs to be there.
There are two problems with this approach. Firstly, explicitly declaring the structure may be a bit tedious. Secondly, some online data sources simply have too many types - a web service dictionary may contain types for thousands of web services and each of them has several types.
The solution that will be available in a future version of F# is called type providers. The idea is simple - instead of declaring the types explicitly, we can create a plugin that tells the F# compiler what a .NET representation of the data source would look like. The Visual Studio IntelliSense can then display these (fake) types as if they were real types and they can be used to write F# programs just like ordinary types. When the program is compiled, the F# compiler calls the plugin again to deal with these types. The plugin can replace their uses with some other F# expression or actually generate real .NET types.
Type provider for World Bank data
The following snippet is an example that I demonstrated during the talk. It uses a type provider for accessing the World Bank API (using a simple REST service). The snippet also uses the FSharpChart charting library to generate a chart showing the central government debt for several EU countries (screenshot above):
    #r @"WorldBank\Samples.WorldBank.TypeProvider.dll"
    #load @"FSharpChart\FSharpChart.fsx"
    open Samples.Charting
    open System.Drawing
    open System.Windows.Forms.DataVisualization.Charting

    // Represents properties of a chart grid
    let dashGrid = Grid(LineColor = Color.Gainsboro, LineDashStyle = ChartDashStyle.Dash)

    // Create a list of countries we're interested in
    let countries =
      [ WorldBank.Countries.Greece; WorldBank.Countries.Ireland;
        WorldBank.Countries.Denmark; WorldBank.Countries.``United Kingdom``;
        WorldBank.Countries.``Czech Republic`` ]

    // Generate line chart with debt data series for every country
    // then combine the lines into a single combined chart
    FSharpChart.Combine
      [ for country in countries do
          let data = country.``Central government debt, total (% of GDP)`` |> Seq.sortBy fst
          yield upcast FSharpChart.Line(data, Name=unbox country) ]
    |> FSharpChart.WithLegend(Docking = Docking.Left)
    |> FSharpChart.WithArea.AxisY(MajorGrid = dashGrid)
    |> FSharpChart.WithArea.AxisX(MajorGrid = dashGrid)
The type provider is a .NET assembly (Samples.WorldBank.TypeProvider.dll) that contains a plugin for the compiler. Note that it doesn't actually contain any types that are used in the snippet - instead, it generates them based on the data it downloads from the World Bank.
The type provider generates types in the WorldBank namespace. The snippet first uses a type WorldBank.Countries that contains all countries known to the World Bank as static members. The snippet creates a list containing some of the countries.
The country value has a large number of properties that represent individual indicators that the World Bank provides. The number of indicators is incredible (about 4 thousand), so imagine a generated (or even handwritten) class that needs to be included in every assembly that uses the World Bank. The type provider used in this example creates a fake type with all the properties (see screenshot below), but when you actually compile your code, the type will be replaced with some simple representation.
After getting the debt data, the snippet uses the FSharpChart library to generate a chart. It creates a list of line charts using FSharpChart.Line and combines them into a single chart using FSharpChart.Combine. Then it calls a couple of With functions to configure the chart. In particular, it adds a legend and specifies the color of grid lines.
Summary
In this article, I briefly summarized the key points from my GOTO 2011 talk about accessing loosely structured data from C# and F#. You can find the materials in my GitHub repository and the slides are also available at SlideShare. The talk discussed technologies that can bridge the gap between structure defined in a programming language (classes, records or discriminated unions) and the structure that is present in the data (XML or database schema). I discussed three options:
- Using the C# dynamic type or the F# ? operator to specify the structure locally.
- Describing the structure using classes or F# types and mapping data to the structure using reflection.
- Generating types automatically using F# type providers (from the data or external schema).
The source code for the examples that demonstrate the first two techniques is available in the GitHub repository. The example showing how to access World Bank data using type providers will be released as soon as Microsoft releases a beta of F# with type providers.
Published: Thursday, 26 May 2011, 10:51 PM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: c#, presentations, f#