Building great open-source libraries

F# documentation tools

The hard part about successful open-source development is not putting the first version of your source code on GitHub. The hard part is what comes next. First of all, there are the community aspects - making sure that the project fits well with other work in the area, engaging the community and contributors, planning future directions for the project and so on. Secondly, there is the infrastructural side - making sure that there is a package (on NuGet in the F# world), useful and easy-to-run tests, and up-to-date documentation and tutorials.

In this article, I want to talk about the infrastructural side, which is the easier of the two, but is nevertheless difficult to get right!

On the technical side, I think that every good open-source library needs to have:

  • Unit tests - at least for non-trivial parts of the code, to prevent regressions
  • Random testing - useful for tricky parts of the code, where it helps check unexpected cases
  • NuGet package - or another up-to-date and easy-to-use release; for F# projects, we might also want an easy-to-download ZIP file for simple interactive scripts
  • Documentation - for the public API, at least when the API is not super simple
  • Tutorials & walkthroughs - showing how to call the API in larger-scale scenarios
  • Automation - when releasing a new version, all of the above should happen with "one click", and the documentation and tutorials must be up-to-date and correct.

Ticking all these boxes is a lot of work, but it is crucial - if you skip them, your project will be difficult to use, making a new release will take time, and the documentation and tutorials will quickly become useless. Fortunately for me, the F# community has made amazing progress in this direction, so let's have a look at some of the tools that make this possible...

Before going further, let me say big thanks to Steffen Forkmann, the author of FAKE, and Gustavo Guerra, who wrote most of the automation for F# Data that I'll use as an example.

Automate everything with FAKE

Let me start from the end of the list. FAKE is an F# build automation system that does a lot more than just building. Of course, FAKE can easily call MSBuild scripts (and build F# projects using an existing fsproj file), but I think the real value is in all the additional tools that it provides.

For example, here is what happens when you run the build script from the F# Data library. It:

  • Parses RELEASE_NOTES.md to get the latest version number and release notes (which are used later to build the NuGet package)
  • Generates AssemblyInfo.fs with the right version and project information
  • Builds the project and tests by calling MSBuild (or xbuild) on the solution (sln) files
  • Runs the NUnit tests (and stops if there is a failure), but more about testing later...
  • While running the tests, it also checks that your documentation does not contain errors - if you don't believe me, keep reading :-)
  • Builds a NuGet package and optionally pushes it to nuget.org
  • Automatically builds the documentation using the F# Formatting tool, which is discussed next
  • As a bonus, it also pushes the documentation to the gh-pages branch and builds a ZIP file with the binaries for easy download.
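To make the first two steps more concrete, here is a minimal sketch of reading the newest release from RELEASE_NOTES.md and turning it into the contents of AssemblyInfo.fs. This is not how FAKE implements it (in the F# Data build script, FAKE's own helpers do this work). The names ReleaseInfo, parseLatestRelease and assemblyInfoSource are hypothetical. The sketch also assumes a particular format: each release starts with a "### <version> - <date>" header, the newest release comes first, and each note is a "* " bullet.

```fsharp
// Hypothetical sketch, not FAKE's implementation.
type ReleaseInfo = { Version : string; Notes : string list }

/// Reads the newest release, assuming "### <version> - <date>" headers
/// (newest first) followed by notes written as "* " bullets.
let parseLatestRelease (text : string) =
  let lines =
    text.Split([| '\n' |])
    |> Array.map (fun l -> l.Trim())
    |> List.ofArray
  // Skip everything before the first release header
  match lines |> List.skipWhile (fun l -> not (l.StartsWith "###")) with
  | header :: rest ->
      // "### 1.1.0 - 2014-01-01" -> "1.1.0"
      let version = header.Substring(3).Trim().Split([| ' ' |]).[0]
      // Notes run until the next release header
      let notes =
        rest
        |> List.takeWhile (fun l -> not (l.StartsWith "###"))
        |> List.filter (fun l -> l.StartsWith "* ")
        |> List.map (fun l -> l.Substring(2))
      Some { Version = version; Notes = notes }
  | [] -> None

/// Produces AssemblyInfo.fs source stamped with the release version
let assemblyInfoSource (product : string) (release : ReleaseInfo) =
  String.concat "\n"
    [ "namespace System"
      "open System.Reflection"
      sprintf "[<assembly: AssemblyTitleAttribute(\"%s\")>]" product
      sprintf "[<assembly: AssemblyVersionAttribute(\"%s\")>]" release.Version
      sprintf "[<assembly: AssemblyFileVersionAttribute(\"%s\")>]" release.Version
      "do ()" ]

// Illustrative, made-up release notes
let sample = """### 1.1.0 - 2014-01-01
* Added feature X
* Fixed bug Y

### 1.0.0 - 2013-12-01
* Initial release"""

match parseLatestRelease sample with
| Some release ->
    printfn "Version %s with %d notes" release.Version release.Notes.Length
    printfn "%s" (assemblyInfoSource "MyLibrary" release)
| None -> printfn "No release found"
```

The same ReleaseInfo value could then feed the NuGet packaging step, so the version number is written down in exactly one place.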

All this means that it is really easy to maintain a project. When you get a pull request (and point the contributor to the right place to add tests and documentation), you can then update everything with just a single command.

And you have a guarantee that your documentation is up-to-date and correct too, which is done using another F# project that I'll discuss next...

Documenting libraries with F# Formatting

F# Formatting is not your good old regular-expression-based syntax highlighter. It calls the F# compiler (which is fully open source, in case you did not know) and uses the actual compiler to colorize code. Aside from that, it also type-checks the code and extracts the tooltip information that you would see in MonoDevelop or Visual Studio. It is used on this blog too, so here is an example. Its tooltips show, for instance, that hello has the inferred type person:string -> unit, along with its /// comment:

/// Say hello to the specified person
let hello person = 
  printfn "Hello %s!" person

hello "Tomas"

For statically typed languages with type inference, this is extremely useful. Just think of the last time you looked at a C# snippet that used var and wondered what the type of a variable was...

To build great documentation for a project with F# Formatting, you can use two features. I'll use the Deedle data manipulation library as an example:

  • Write tutorials - these can be standard F# script files that you can run, with special comments written using (** ... *) that contain Markdown. F# Formatting turns them into nicely formatted tutorials.

  • Generate API reference - if you include /// comments for public functions (written in a simple Markdown style), you can automatically generate an API reference from them, like the FakeLib reference.
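A tutorial script of this kind might look roughly like the following minimal sketch. The Markdown text and the greet function are illustrative and are not taken from Deedle. To the F# compiler, the (** ... *) blocks are ordinary comments, so the script still runs as normal F# code, while the /// comment is what an API reference would be generated from.

```fsharp
(**
Getting started
===============

This paragraph is Markdown. F# Formatting turns it into HTML and
formats the code below it.
*)

/// Greets the specified person (this comment would appear
/// in the generated API reference)
let greet person =
  sprintf "Hello %s!" person

(** Calling the function returns a greeting: *)
let greeting = greet "Tomas"
```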

Does your documentation type-check?

The last thing I mentioned is that the build process checks whether your documentation is correct. Obviously, it does not check that your documentation makes sense :-), but it does make sure that the code samples in your documentation type-check. This is done, for example, in the F# Data documentation tests.

Failing documentation tests, after API change

What does this mean? When you change your API (add or remove parameters, change types, or rename functions or types) without making corresponding changes to your documentation, you'll get a unit test failure!

This is only possible because F# Formatting calls the compiler to do the actual formatting and checking work. It also works beyond fsx files: the same checks run on md files that contain F# code snippets (indented by four spaces).
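The type-checking itself is done by F# Formatting calling the compiler. The sketch below covers only the first half of that process. It is a hypothetical helper, extractSnippets, which is not part of F# Formatting, and it finds the four-space-indented snippets in a Markdown string. In a real documentation test, each snippet would then be handed to the compiler.

```fsharp
/// Hypothetical helper (not part of F# Formatting): collects the code
/// snippets of a Markdown document, i.e. runs of lines indented by four
/// spaces. Blank lines inside a run are kept; trailing ones are trimmed.
let extractSnippets (markdown : string) : string list =
  let snippets = ResizeArray<string>()
  let current = ResizeArray<string>()
  // Finish the snippet collected so far (if any)
  let flush () =
    if current.Count > 0 then
      snippets.Add((String.concat "\n" current).TrimEnd())
      current.Clear()
  for line in markdown.Replace("\r", "").Split([| '\n' |]) do
    if line.StartsWith "    " then current.Add(line.Substring 4)
    elif line.Trim() = "" && current.Count > 0 then current.Add ""
    else flush ()
  flush ()
  List.ofSeq snippets
```

This simplified rule would also pick up other indented content, such as nested list items. That is one reason the real tool relies on a proper Markdown parser and on the compiler rather than on line matching.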

Testing with FsUnit and FsCheck

Speaking of unit tests, there are a few more things worth mentioning. I'm not an expert when it comes to testing (the chapter by Phil Trelford in our upcoming F# book is a better source!), but tests are clearly important - especially for open-source projects with multiple contributors who need to collaborate on the code base.

Less painful writing and running

There are three things that make writing tests less painful. First, FsUnit is a nice DSL for writing tests in a more readable way. Second, the F# ``backtick`` notation lets you use a full description as the test name. And third, you can set up your environment so that tests can be run quickly from the REPL.

Let's look at a sample test for the XML type provider from F# Data:

#if INTERACTIVE
#r "../../../bin/FSharp.Data.dll"
#r "../../../packages/NUnit.2.6.3/lib/nunit.framework.dll"
#r "../../../packages/FsCheck.0.9.1.0/lib/net40-Client/FsCheck.dll"
#load "../../Common/FsUnit.fs"
#else
module FSharp.Data.XmlTests
#endif

open NUnit.Framework
open FsUnit
open FSharp.Data

type PersonXml = XmlProvider<(...)>
let newXml = """
  <authors>
    <author name="Jane" surname="Doe" age="23" />
  </authors>"""

[<Test>]
let ``Jane should have first name of Jane``() = 
    let firstPerson = PersonXml.Parse(newXml).Author
    firstPerson.Name |> should equal "Jane"

The test lives in an fs file in a project that is compiled into a dll, which can be tested with standard NUnit test runners. However, the #if INTERACTIVE block at the top also makes the test runnable in F# Interactive. You can select the entire source code and hit Alt+Enter to load the tests in F# Interactive, then run them line by line and try different inputs interactively. When writing tests, this is much easier than changing your code and recompiling the tests to run them.

The test itself uses the backtick notation to include the whole test description in its name, SoYouDoNotNeedToDecipherThis! The FsUnit library, which is also used here, defines a simple, readable DSL so that you can write tests in the form <value> |> should <property>. For example, you can say "Hello" |> should startWith "H".
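The reason the DSL reads so naturally is that should takes a constraint-building function and its argument first, and the tested value last, so the value can be piped in. The following minimal sketch reproduces that shape with plain functions. It is hypothetical: FsUnit itself builds on NUnit constraints, and the Constraint record here is an illustration, not its API.

```fsharp
/// Hypothetical constraint type (FsUnit uses NUnit constraints instead)
type Constraint<'T> =
  { Description : string
    Check : 'T -> bool }

/// equal expected: satisfied by values equal to expected
let equal (expected : 'T) : Constraint<'T> =
  { Description = sprintf "equal %A" expected
    Check = fun actual -> actual = expected }

/// startWith prefix: satisfied by strings starting with prefix
let startWith (prefix : string) : Constraint<string> =
  { Description = sprintf "start with %A" prefix
    Check = fun (actual : string) ->
      actual.StartsWith(prefix, System.StringComparison.Ordinal) }

/// value |> should f x checks value against the constraint built by f x
let should (makeConstraint : 'A -> Constraint<'T>) (x : 'A) (actual : 'T) =
  let c = makeConstraint x
  if not (c.Check actual) then
    failwithf "Expected value to %s, but got %A" c.Description actual

// Usage, in the same style as the test above
"Jane" |> should equal "Jane"
"Hello" |> should startWith "H"
```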

Testing complex logic

Finally, the last great tool that I want to mention in this article is the random testing framework FsCheck. This is particularly useful if you need to test an algorithm or a more complex function that has some (mathematical) properties.

For example, I wrote a function binarySearchNearestGreater that performs binary search on a sorted array. It returns the index of a specified element or, if the element is not present, the index of the nearest greater element in the array. The function has a property that the value at the returned index is equal to, or greater than, the specified key. When the function does not find any such element, it returns None, which means that all elements are smaller than the key.
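For readers who want to see the algorithm itself, here is a minimal sketch with the same shape (key, comparer, array, returning an int option). It is my own lower-bound binary search written for illustration, not the actual library implementation.

```fsharp
open System.Collections.Generic

/// Sketch (not the library implementation): returns the index of the
/// first element that is greater than or equal to key in a sorted array,
/// or None when all elements are smaller than key.
let binarySearchNearestGreater (key : 'T) (comparer : IComparer<'T>) (arr : 'T[]) =
  // Invariant: indices below lo hold values smaller than key;
  // indices at or above hi hold values greater than or equal to key.
  let mutable lo = 0
  let mutable hi = arr.Length
  while lo < hi do
    let mid = lo + (hi - lo) / 2
    if comparer.Compare(arr.[mid], key) < 0 then lo <- mid + 1
    else hi <- mid
  if lo < arr.Length then Some lo else None

let sorted = [| 1; 3; 3; 7 |]
let cmp = Comparer<int>.Default
let exact = binarySearchNearestGreater 3 cmp sorted    // Some 1 (first 3)
let nearest = binarySearchNearestGreater 4 cmp sorted  // Some 3 (value 7)
let missing = binarySearchNearestGreater 8 cmp sorted  // None
```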

FsCheck can easily verify that the property holds for randomly generated inputs (and it also generates inputs that cover corner cases):

open NUnit.Framework
open FsCheck
open FSharp.DataFrame.Internal

let comparer = System.Collections.Generic.Comparer<int>.Default

[<Test>]
let ``Binary searching for nearest greater value satisfies laws`` () =
  Check.QuickThrowOnFailure(fun (input:int[]) (key:int) -> 
    let input = Array.sort input
    match Array.binarySearchNearestGreater key comparer input with
    | Some idx -> input.[idx] >= key
    | None -> Seq.forall (fun v -> v < key) input )

The operation Check.QuickThrowOnFailure takes a function that specifies the predicate, generates random values for input and key (100 test cases by default), and throws an exception if the predicate fails for any of them. The above sample uses NUnit, but FsCheck also comes with xUnit integration that makes the testing code even simpler (just write a function with the Property attribute).

Random testing is certainly not useful for all tests, but it is great when you have some property that should hold. This is often the case for algorithms, or when you have a pair of functions for converting "there and back again" (then you can just say that the conversion there and back should return the original thing).
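A "there and back again" property can be sketched without any libraries. This sketch deliberately swaps FsCheck for a hypothetical hand-rolled helper, checkRandom, that only generates random inputs. It has none of FsCheck's corner-case generation or shrinking of failing inputs. The property checks that decoding the Base64 encoding of a byte array (using System.Convert) gives back the original array.

```fsharp
open System

/// Hypothetical stand-in for Check.QuickThrowOnFailure: runs the property
/// on count generated inputs and throws on the first counterexample.
/// Unlike FsCheck, it does not shrink failing inputs.
let checkRandom (count : int) (generate : Random -> 'T) (property : 'T -> bool) =
  let rnd = Random(0) // fixed seed, so failures are reproducible
  for i in 1 .. count do
    let input = generate rnd
    if not (property input) then
      failwithf "Property failed on test %d with input %A" i input

/// Generator: random byte arrays with lengths between 0 and 64
let randomBytes (rnd : Random) =
  let bytes = Array.zeroCreate<byte> (rnd.Next(0, 65))
  rnd.NextBytes bytes
  bytes

// There and back again: decoding the encoded bytes returns the original
checkRandom 100 randomBytes (fun bytes ->
  Convert.FromBase64String(Convert.ToBase64String(bytes)) = bytes)
```

Note that F# compares arrays structurally with =, which is what makes the round-trip check a one-liner.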

Summary

Building a great open-source library is difficult, and I certainly do not claim to have a recipe for it. But I contribute to a few F# libraries and I think I have learned a thing or two from my mistakes.

For me, one of the most difficult things (technically) is keeping libraries up-to-date even when I don't have time for it. The best way to solve this is to automate everything so that you can accept a pull request and run a single command that runs the whole build process, including NuGet release, documentation update and as many sanity checks as possible, both for the code itself and for the documentation.

This article gave a quick overview of the tools that make this amazingly easy with F# - including the awesome FAKE build tool, unit testing tools like FsUnit and FsCheck and documentation tools in F# Formatting that can even be integrated with unit tests to make sure your documentation is correct.
