F# Data: New type provider library
When F# 3.0 type providers were still in beta, I wrote a couple of type providers as examples for talks. These included the WorldBank type provider (now available on Try F#) and also a type provider for XML that inferred the structure from a sample. For some time, these were hosted as part of FSharpX, and the authors of FSharpX also added a number of great features.
When I found some more time earlier this year, I decided to start a new library fully focused on data access in F# and on type providers, and I started working on F# Data. The library has now reached a stable state, and Steffen also announced that the document type providers (JSON, XML and CSV) will no longer be available in FSharpX starting with the next version.
This means that if you're interested in accessing data using F# type providers, you should now go to F# Data. Here are the most important links:
Before looking at the details, I would like to thank Gustavo Guerra, who made some amazing contributions to the library! (More contributors are always welcome, so continue reading if you're interested...)
F# Data Overview
The library contains several type providers, a couple of helper functions and it also comes with comprehensive documentation. Here is a quick summary of the key features:
- Document type providers: providers for JSON, XML and CSV that infer the structure of a file from a provided example and give you typed access to other data in the same format.
- WorldBank and Freebase: these are also hosted as part of the library. They give you access to WorldBank indicators (information about countries) and to the Freebase graph database (this was originally written as a sample by the F# team).
- Comprehensive documentation: the library uses my other project, F# Formatting, to automatically generate nice documentation from *.fsx script files with examples.
- HTTP utility: the library also contains a very easy to use type for making HTTP requests with just a single line (look for Http.Request in the documentation). This is something that I've been missing a lot when working with REST APIs from F#.
F# Data Code Samples
I do not want to spend too much time demonstrating all the awesome features of the F# Data library, but let me include just a few code snippets to demonstrate some interesting features. (You can find more in the documentation).
All the samples assume that we're using an F# Script file, so we start by referencing the F# Data library using #r (in a project file, you would add a reference as usual). I also open two namespaces - FSharp.Data with the data-related API and FSharp.Net with a helper type Http for making HTTP requests:
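A minimal sketch of such a script header. The path in the #r directive is an assumption (it depends on where the FSharp.Data package is installed on your machine); the two namespaces are the ones named above.

```fsharp
// Hypothetical location of FSharp.Data.dll; adjust the path to your setup
#r @"../packages/FSharp.Data/lib/net40/FSharp.Data.dll"

open FSharp.Data   // type providers and the data-related API
open FSharp.Net    // the Http helper type
```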
Now, let's quickly look at a number of examples that demonstrate the F# Data library.
You cannot quite see this in a static code sample on a blog, but note that all data access is done in a typed way. When you type ".", you get completion, and if you make a typo, you get instantaneous feedback about the error.
Getting government debt from WorldBank
The WorldBankData type gives you access to the World Bank data set. For example, we can look at "Czech Republic" and get the government debt for the most recent year (using Seq.maxBy to get the value for the most recent year available):
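A minimal sketch of this lookup. WorldBankData.GetDataContext and Seq.maxBy come from the text above; the #r path, the Countries and Indicators properties, and the exact indicator name are assumptions, so let the editor's completion guide you to the names your version actually exposes.

```fsharp
// Hypothetical location of FSharp.Data.dll; adjust the path to your setup
#r @"../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
open FSharp.Data

let wb = WorldBankData.GetDataContext()

// Assumed indicator name; an indicator is treated here as a sequence
// of (year, value) pairs
let czech = wb.Countries.``Czech Republic``
let debt = czech.Indicators.``Central government debt, total (% of GDP)``

// Pick the pair with the largest year, i.e. the most recent value available
let latest = debt |> Seq.maxBy fst
printfn "%A" latest
```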
Getting religion list from Freebase
The FreebaseData type gives you access to Freebase. You can just type "." and explore the data sources available - for example, to look at a list of religions and print the first 10:
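A minimal sketch of this exploration. FreebaseData.GetDataContext comes from the library; the #r path, the Society.Religion.Religions path and the Name property are assumptions about how the provider exposes the data, so treat them as placeholders for whatever completion offers.

```fsharp
// Hypothetical location of FSharp.Data.dll; adjust the path to your setup
#r @"../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
open FSharp.Data

let fb = FreebaseData.GetDataContext()

// Assumed path to the collection of religions.
// Seq.truncate yields at most 10 items and does not fail on shorter sequences.
for religion in fb.Society.Religion.Religions |> Seq.truncate 10 do
  printfn "%s" religion.Name
```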
Parsing RSS news feed from BBC
If you want to get news using RSS, you can use XmlProvider. All you need is a sample file or a string (marked as Literal) with the RSS data. Then you can pass this string or file to the provider as a static parameter and you'll get nice types for working with RSS feeds. Here, we get news from BBC using the Http.Request helper:
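A minimal sketch of this pattern, using a shortened RSS sample. XmlProvider, Literal and Http.Request are named in the text; the #r path, the feed URL, the Parse method, and the generated members Channel, GetItems and Title are assumptions about the API of the library version in use.

```fsharp
// Hypothetical location of FSharp.Data.dll; adjust the path to your setup
#r @"../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
open FSharp.Data
open FSharp.Net

// Shortened sample; two <item> elements so that items are inferred as a collection
[<Literal>]
let rssSample = """
<rss version="2.0">
  <channel>
    <title>BBC News - Home</title>
    <link>http://www.bbc.co.uk/news/</link>
    <description>The latest stories from the Home section of the BBC News web site.</description>
    <item>
      <title>Government loses Abu Qatada appeal</title>
      <link>http://www.bbc.co.uk/news/uk-21955844</link>
      <pubDate>Wed, 27 Mar 2013 17:37:35 GMT</pubDate>
    </item>
    <item>
      <title>Synchrotron yields 'safer' vaccine</title>
      <link>http://www.bbc.co.uk/news/health-21958361</link>
      <pubDate>Wed, 27 Mar 2013 22:00:04 GMT</pubDate>
    </item>
  </channel>
</rss>"""

// The sample is only used to infer the types
type Rss = XmlProvider<rssSample>

// Assumed feed URL; download the live feed and parse it with the inferred type
let feed = Rss.Parse(Http.Request("http://feeds.bbci.co.uk/news/rss.xml"))

for item in feed.Channel.GetItems() do
  printfn "%s" item.Title
```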
Getting stock prices from Yahoo CSV
Working with CSV files is similar. The CsvProvider takes a static parameter with sample data (either as a file name or as actual data). Here, we use a file and we also specify that we only want to use the first 10 rows for the inference (for performance reasons). The provider infers column names and types. Here is how you calculate the average MSFT stock price over the entire history:
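A minimal sketch of this calculation. CsvProvider, the InferRows parameter and the stream-based Load method appear in the library's own documentation; the #r path, the file data/msft.csv (assumed to be a local copy of the Yahoo data with a Close column), the Data property and the helper averageClose are assumptions made for illustration.

```fsharp
// Hypothetical location of FSharp.Data.dll; adjust the path to your setup
#r @"../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
open System.IO
open FSharp.Data

// Hypothetical sample file; only the first 10 rows are used for inference
type Stocks = CsvProvider<"data/msft.csv", InferRows=10>

// Hypothetical helper: average closing price over all rows of a CSV file
let averageClose (path: string) =
  use stream = File.OpenRead path
  let msft = Stocks.Load(stream)
  // Data is the assumed name of the row collection; the rows are
  // consumed here, before the stream is disposed
  msft.Data |> Seq.averageBy (fun row -> row.Close)

printfn "%A" (averageClose "data/msft.csv")
```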
Getting list of F# snippets using REST API
For our last example, we'll use the REST API provided by F# Snippets. The API returns a JSON data set containing information about snippets. We can easily use it by defining a string Literal with sample JSON and passing it to JsonProvider. To get the data, we use Http.Request, but this time we specify the Content-Type header. Working with the results is, again, done in a nice typed way:
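A minimal sketch of this call, using a single-entry JSON sample. JsonProvider, Literal, Http.Request and its headers parameter come from the library; the #r path, the API URL, the Parse method, and the generated Title and Author properties are assumptions.

```fsharp
// Hypothetical location of FSharp.Data.dll; adjust the path to your setup
#r @"../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
open FSharp.Data
open FSharp.Net

// Sample response with one snippet; it is only used to infer the types
[<Literal>]
let snippetSample = """
[ { "author": "Eirik Tsarpalis",
    "title": "Codomains through Reflection",
    "description": "Any type signature has the form of a curried chain T0 -> T1 -> .... -> Tn, where Tn is not a function type. (...)",
    "likes": 2,
    "link": "http://fssnip.net/cf",
    "published": "5 months ago" } ]"""

type Snippets = JsonProvider<snippetSample>

// Assumed API endpoint; the Content-Type header is passed as a (name, value) pair
let snippets =
  Http.Request
    ( "http://api.fssnip.net/1/snippet",
      headers = [ ("Content-Type", "application/json") ] )
  |> Snippets.Parse

for snippet in snippets do
  printfn "%s (%s)" snippet.Title snippet.Author
```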
Summary
Although I started working on F# Data around Christmas, this is the first blog post about it. The library has had some time to develop and we fixed some of the most important bugs, so if you're interested in data access in F#, F# Data is the right tool for you!
I included a quick overview of some of the type providers that are available in the library - including those for WorldBank, Freebase, CSV, XML and JSON. Of course, I did not cover all the features of the library. You can find more information in the detailed documentation.
Contribute to F# Data
As I mentioned, the library has already had some great contributors. Gustavo Guerra did a great job on making it work with the Portable profile and on Silverlight. However, there is always work to be done :-) and contributors are very welcome. If you're interested, check out the list of issues. I also wrote a page on contributing to F# Data with basic information about the library structure.
Published: Thursday, 28 March 2013, 3:23 AM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: open source, f#, f# data, type providers