Building great open-source libraries
The hard part about successful open-source development is not putting the first version of your source code on GitHub. The hard part is what comes next. First of all, there are community aspects - making sure that the project fits well with other work in the area, engaging the community and contributors, planning future directions for the project and so on. Secondly, there is an infrastructural side - making sure that there is a package (on NuGet in the F# world), easy-to-run and useful tests, and up-to-date documentation and tutorials.
In this article, I want to talk about the infrastructural side, which is the easier of the two, but nevertheless difficult to get right!
On the technical side, I think that every good open-source library needs to have:
- Unit tests - at least for non-trivial parts of the code, and to prevent regressions
- Random testing - for tricky parts of the code, it helps to check unexpected cases
- NuGet package - or another up-to-date and easy-to-use release; for F# projects, we might also want an easy-to-download ZIP file for simple interactive scripts
- Documentation - for the public API, at least when the API is not super simple
- Tutorials & walkthroughs - showing how to call the API in larger-scale scenarios
- Automation - when releasing a new version, all of the above should happen with "one click", and the documentation and tutorials must be up-to-date and correct.
Ticking all the points is a lot of work, but it is crucial - if you do not have these, your project will be difficult to use, making a new release will take time, and the documentation and tutorials will become useless. Fortunately for me, the F# community has made amazing progress in this direction, so let's have a look at some of the tools that make this possible...
Before going further, let me say a big thanks to Steffen Forkmann, the author of FAKE, and Gustavo Guerra, who wrote most of the automation for F# Data that I'll use as an example.
Automate everything with FAKE
Let me start from the end of the list. FAKE is an F# build automation system that does a lot more than just building. Of course, FAKE can easily call MSBuild scripts (and build F# projects just using an existing fsproj file), but I think the real value is in all the additional tools that it provides.
For example, here is what happens when you run the build script from the F# Data library. It:
- Parses RELEASE_NOTES.md to get the latest version number and release notes (which are used later to build the NuGet package)
- Generates AssemblyInfo.fs with the right version and project information
- Builds the project and tests by calling MSBuild (or xbuild) on the sln files
- Runs the NUnit tests (and stops if there is a failure), but more about testing later...
- While running tests, it also checks that your documentation does not contain errors - if you do not believe me, keep reading :-)
- Builds a NuGet package and optionally pushes it to nuget.org
- Automatically builds documentation using the F# Formatting tool that is discussed next
- As a bonus, it also pushes the documentation to the gh-pages branch and builds a ZIP with the binaries for easy download.
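The first two steps can be sketched in plain F#. This is not FAKE's API (FAKE ships its own helpers for release notes and AssemblyInfo files); it only illustrates the idea, assuming that each entry in RELEASE_NOTES.md starts with a line such as "### 1.1.0 - 31 October 2013" followed by "* ..." bullet lines, newest entry first. The names ReleaseNotes, parseLatest and assemblyInfo are hypothetical.

```fsharp
// Hypothetical sketch, not FAKE's API. Assumes RELEASE_NOTES.md entries look like
// "### 1.1.0 - 31 October 2013" followed by "* note" lines, newest first.
type ReleaseNotes = { Version: string; Notes: string list }

let parseLatest (lines: string list) : ReleaseNotes option =
    let isHeader (line: string) = line.StartsWith("### ")
    // Skip anything before the first (newest) header.
    match List.skipWhile (isHeader >> not) lines with
    | header :: rest ->
        // "### 1.1.0 - 31 October 2013" -> "1.1.0"
        let version = header.Substring(4).Split(' ').[0]
        // Bullet lines up to the next header are the notes for this release.
        let notes =
            rest
            |> List.takeWhile (isHeader >> not)
            |> List.filter (fun line -> line.StartsWith("* "))
            |> List.map (fun line -> line.Substring(2))
        Some { Version = version; Notes = notes }
    | [] -> None

// Produces the text of an AssemblyInfo.fs file for the given product and version.
let assemblyInfo (product: string) (version: string) =
    String.concat "\n"
        [ "namespace System"
          "open System.Reflection"
          sprintf "[<assembly: AssemblyTitleAttribute(\"%s\")>]" product
          sprintf "[<assembly: AssemblyVersionAttribute(\"%s\")>]" version
          sprintf "[<assembly: AssemblyFileVersionAttribute(\"%s\")>]" version
          "do ()" ]

let sample =
    [ "### 1.1.0 - 31 October 2013"
      "* Added documentation tests"
      "* Fixed XML parsing bug"
      "### 1.0.0 - 1 October 2013"
      "* Initial release" ]

match parseLatest sample with
| Some release -> printfn "%s" (assemblyInfo "MyLibrary" release.Version)
| None -> printfn "No release found"
```

Keeping the version in one human-edited file and generating everything else from it is what keeps the NuGet package, the assembly and the release notes in sync.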
All this means that it is really easy to maintain a project. When you get a pull request (and point the contributor to the right place to add tests and documentation), you can then update everything with just a single command.
And you have a guarantee that your documentation is up-to-date and correct too, which is done using another F# project that I'll discuss next...
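The "single command" works because the build is a chain of named steps with dependencies, where a failing step stops everything after it. FAKE has its own syntax for declaring targets and their dependencies; the following is only a hypothetical, library-free model of that idea (the Target type and the runTarget and step functions are made up for illustration).

```fsharp
// Hypothetical model of a build pipeline, not FAKE's API.
// Each target has a name, the targets it depends on, and an action
// that returns false when the step fails.
type Target = { Name: string; DependsOn: string list; Run: unit -> bool }

/// Runs the dependencies of `goal` first (depth-first, each target once),
/// then `goal` itself; throws at the first failing step.
/// Assumes the dependency graph has no cycles.
let runTarget (targets: Target list) (goal: string) : string list =
    let byName = targets |> List.map (fun t -> t.Name, t) |> Map.ofList
    let visited = System.Collections.Generic.HashSet<string>()
    let executed = ResizeArray<string>()
    let rec run name =
        if visited.Add(name) then
            let target = Map.find name byName
            List.iter run target.DependsOn
            executed.Add(name)
            if not (target.Run()) then failwithf "Target '%s' failed" name
    run goal
    List.ofSeq executed

// A step that just prints its name and succeeds.
let step name deps =
    { Name = name
      DependsOn = deps
      Run = fun () -> (printfn "Running %s" name; true) }

let pipeline =
    [ step "Clean" []
      step "AssemblyInfo" [ "Clean" ]
      step "Build" [ "AssemblyInfo" ]
      step "RunTests" [ "Build" ]
      step "NuGet" [ "RunTests" ]
      step "GenerateDocs" [ "RunTests" ]
      step "Release" [ "NuGet"; "GenerateDocs" ] ]

// "One click": asking for Release runs everything it needs, in order.
let order = runTarget pipeline "Release"
```

Here order is Clean, AssemblyInfo, Build, RunTests, NuGet, GenerateDocs, Release, and if RunTests returned false, neither the package nor the documentation would be produced.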
Documenting libraries with F# Formatting
F# Formatting is not your good old regular-expression-based syntax highlighter. It calls the F# compiler (which is fully open-source, in case you did not know) and uses it to colorize code. Aside from that, it also type-checks the code and extracts the tooltip information that you'd see in MonoDevelop or Visual Studio. It is used on this blog too, so here is an example (hover over identifiers with the mouse pointer to see tooltips):
[Code sample not preserved: a 5-line F# snippet with tooltips]
For statically typed languages with type inference, this is extremely useful. Just remember the last time you were looking at a C# snippet using var and wondered what the type of a variable was...
To build great documentation for a project using F# Formatting, you can use two features. I'll use the Deedle data manipulation library as an example:
- Write tutorials - these can be standard F# script files that you can run, with special comments written using (** .. *) that contain Markdown. F# Formatting turns them into nicely formatted tutorials.
- Generate API reference - if you include /// comments for public functions (written in a simple Markdown style), you can automatically generate an API reference from them, for example, like the FakeLib reference.
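A minimal sketch of a tutorial script that uses both features, assuming a hypothetical hello function: the (** ... *) comments hold Markdown text, and the /// comments are the kind that would feed the API reference. Because the file is an ordinary F# script, it can also be loaded and run in F# Interactive.

```fsharp
(**
Getting started
===============

Text inside these special comments is Markdown. F# Formatting turns it
into formatted paragraphs, while the code between the comments is
type-checked and colorized.
*)

/// Returns a greeting for the specified person.
/// The `name` parameter is the name of the person to greet.
let hello (name: string) = sprintf "Hello %s!" name

(**
Calling the function returns the greeting as a string:
*)
let greeting = hello "Jane"
printfn "%s" greeting
```

Since the tutorial text and the sample code live in one compiled file, renaming hello without updating the tutorial turns into a compile error rather than stale documentation.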
Does your documentation type-check?
The last thing I mentioned is that the build process checks if your documentation is correct. Obviously, it does not check that your documentation makes sense :-) but it does make sure that the code samples in your documentation type-check. This is done, for example, in the F# Data documentation tests.
What does this mean? When you change your API (add or remove parameters, change types, or rename functions or types) without making corresponding changes to your documentation, you'll get a unit test failure!
This is only possible because F# Formatting can call the compiler to do the actual formatting and checking work - and it does not only work in fsx files. The same is done on md files that contain F# code snippets (indented by 4 spaces).
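The md case relies on recognizing code by its indentation. A hypothetical helper (extractIndentedSnippets, not part of F# Formatting) that collects the 4-space-indented blocks from a Markdown string might look like this; in a real documentation test, each snippet would then be handed to the F# compiler, a step this sketch leaves out.

```fsharp
// Hypothetical helper, not part of F# Formatting: collects code blocks that are
// indented by four spaces in a Markdown document. Simplified: it ignores other
// Markdown rules such as indented list-item continuations.
let extractIndentedSnippets (markdown: string) : string list =
    let lines = markdown.Replace("\r\n", "\n").Split('\n')
    let snippets = ResizeArray<string>()
    let current = ResizeArray<string>()
    // Finish the snippet being collected, dropping trailing blank lines.
    let flush () =
        if current.Count > 0 then
            snippets.Add((String.concat "\n" current).TrimEnd('\n'))
            current.Clear()
    for line in lines do
        if line.StartsWith("    ") then current.Add(line.Substring(4))
        elif line.Trim() = "" && current.Count > 0 then current.Add("")
        else flush ()
    flush ()
    List.ofSeq snippets

let doc = "Call the function like this:\n\n    let x = hello \"Jane\"\n    printfn \"%s\" x\n\nThat is all."
let snippets = extractIndentedSnippets doc
// Each snippet would next be type-checked by the F# compiler (not shown).
snippets |> List.iter (printfn "%s\n---")
```

Blank lines inside a snippet are kept, so a multi-paragraph code sample stays one unit for the compiler.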
Testing with FsUnit and FsCheck
Speaking of unit tests, there are a few more things to say. I'm not an expert when it comes to testing (the chapter by Phil Trelford in our upcoming F# book is a better source!), but tests are clearly important - especially for open-source projects with multiple contributors who need to collaborate on the code base.
Less painful writing and running
There are three things that make writing tests less painful. First, FsUnit is a nice DSL for writing tests in a more readable way. Second, the F# ``backtick`` notation lets you use a full description as a test name. And third, you can set up your environment so that tests can be run really quickly from the REPL.
Let's look at a sample test for the XML type provider from F# Data:
[Code sample not preserved: a 20-line test named ``Jane should have first name of Jane`` for the XML type provider]
The test is included in an fs file in a project that is compiled into a dll that can be tested with standard NUnit test runners. However, the first 9 lines also make the test runnable in F# Interactive - you can select the entire source code and hit Alt+Enter to load the tests in F# Interactive and run them line-by-line, testing different inputs interactively. When writing tests, this is much easier than changing your code and re-compiling the tests to run them.
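A file header that makes this work is typically a conditional compilation block: F# Interactive defines the INTERACTIVE symbol, so the same file can reference assemblies when loaded there and declare a module when compiled. A minimal sketch, with a hypothetical module name, a commented-out hypothetical #r path, and a stand-in greet function instead of a real library:

```fsharp
#if INTERACTIVE
// Loaded in F# Interactive: reference the library and test frameworks here,
// e.g. #r "../../bin/MyLibrary.dll" (hypothetical path).
#else
module MyLibrary.Tests.Greetings
#endif

// Stand-in for the library function under test (hypothetical).
let greet (name: string) = sprintf "Hello %s!" name

// In the compiled test project this function would carry NUnit's [<Test>]
// attribute; here it simply throws when the expectation is not met.
let ``greeting for Jane mentions Jane`` () =
    let result = greet "Jane"
    if not (result.Contains("Jane")) then
        failwithf "Unexpected greeting: %s" result

#if INTERACTIVE
// In F# Interactive, run the test directly, change inputs and run it again.
``greeting for Jane mentions Jane`` ()
#endif
```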
The test itself uses the backtick notation to include the whole test description in its name, so you do not need to decipher names like SoYouDoNotNeedToDecipherThis! The FsUnit library that is also used here defines a simple, readable DSL, so that you can write your test in the form <value> |> should <property>. For example, you can say "Hello" |> should startWith "H".
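To see why this reads naturally, here is a toy re-implementation of the same shape. It is not FsUnit (whose should, equal and startWith build NUnit constraints and produce richer failure messages); it only shows that taking the property and the expected value first, and the actual value last, is what lets the value be piped in.

```fsharp
open System

// Toy re-implementation of the `value |> should property expected` shape.
// Not FsUnit: the real library builds NUnit constraints instead.
let should (check: 'Expected -> 'Actual -> bool) (expected: 'Expected) (actual: 'Actual) =
    if not (check expected actual) then
        failwithf "Expectation failed: expected %A, actual %A" expected actual

// Properties take the expected value first and the actual value last.
let equal expected actual = expected = actual

let startWith (prefix: string) (actual: string) =
    actual.StartsWith(prefix, StringComparison.Ordinal)

// The actual value is piped in as the last argument of `should`.
"Hello" |> should startWith "H"
"Jane" |> should equal "Jane"
```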
Testing complex logic
Finally, the last great tool that I want to mention in this article is the random testing framework FsCheck. This is particularly useful if you need to test an algorithm or a more complex function that has some (mathematical) properties.
For example, I wrote a function binarySearchNearestGreater that performs binary search on a sorted array and returns the index of a specified element, or the index of the nearest greater element in the array. The function has the property that the value at the returned index is equal to or greater than the specified key (or, if the function does not find any element, all elements are smaller than the key).
FsCheck can easily verify that the property holds for randomly generated inputs (and it also generates inputs that cover corner cases):
[Code sample not preserved: a 7-line test named ``Binary searching for nearest greater value satisfies laws``]
The operation Check.QuickThrowOnFailure takes a function that specifies the predicate and automatically generates 100 (or more) random inputs for input and key.
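The same idea can be sketched without any libraries. Both pieces are assumptions: binarySearchNearestGreater below is a hypothetical re-implementation on int arrays (not the library's code), and quickCheck is a hand-rolled stand-in for Check.QuickThrowOnFailure that runs 100 random cases from a fixed seed, without FsCheck's smarter generators or shrinking of failing inputs. The property checks the law described above, plus the "nearest" part: the element just before the returned index must be smaller than the key.

```fsharp
// Hypothetical re-implementation for illustration (int arrays only):
// returns Some i for the smallest index i with arr.[i] >= key,
// or None when every element is smaller than key.
let binarySearchNearestGreater (key: int) (arr: int[]) : int option =
    // Invariant: elements before lo are < key; elements from hi on are >= key.
    let mutable lo = 0
    let mutable hi = arr.Length
    while lo < hi do
        let mid = lo + (hi - lo) / 2
        if arr.[mid] < key then lo <- mid + 1 else hi <- mid
    if lo < arr.Length then Some lo else None

// The law: the value at the returned index is >= key and the previous value
// (if any) is < key; None means all values are smaller than key.
let satisfiesLaws (input: int[]) (key: int) =
    let sorted = Array.sort input
    match binarySearchNearestGreater key sorted with
    | Some i -> sorted.[i] >= key && (i = 0 || sorted.[i - 1] < key)
    | None -> Array.forall (fun v -> v < key) sorted

// Hand-rolled stand-in for Check.QuickThrowOnFailure: 100 random cases from a
// fixed seed, throwing on the first counterexample (no shrinking).
let quickCheck (property: int[] -> int -> bool) =
    let rnd = System.Random(42)
    for n in 1 .. 100 do
        let input = Array.init (rnd.Next(0, 20)) (fun _ -> rnd.Next(-50, 50))
        let key = rnd.Next(-60, 60)
        if not (property input key) then
            failwithf "Falsifiable after %d tests: input = %A, key = %d" n input key

quickCheck satisfiesLaws
printfn "OK, passed 100 tests."
```

Keys are drawn from a slightly wider range than the array values, so the random cases include keys below and above every element, which are exactly the corner cases where an off-by-one error would show up.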
The above sample uses NUnit, but FsCheck also comes with xUnit integration that makes the testing code even simpler (just write a function with the Property attribute).
Random testing is certainly not useful for all tests, but it is great when you have some property that should hold. This is often the case for algorithms, or when you have a pair of functions for converting "there and back again" (then you can just say that the conversion there and back should return the original thing).
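A "there and back again" property is easy to state. As a sketch, take a hypothetical pair of functions, encode and decode, that turn a list of integers into comma-separated text and back, and check with random lists (again using System.Random as a simple stand-in for FsCheck) that decoding the encoded form returns the original list.

```fsharp
// Hypothetical pair of functions: integers to comma-separated text and back.
let encode (xs: int list) = xs |> List.map string |> String.concat ","

let decode (text: string) : int list =
    // "".Split(',') would yield [| "" |], which cannot be parsed as a number.
    if text = "" then []
    else text.Split(',') |> Array.map int |> List.ofArray

// "There and back again": decoding the encoded form gives the original list.
let roundTrips xs = decode (encode xs) = xs

// Random lists from a fixed seed (a simple stand-in for FsCheck's generators).
let rnd = System.Random(1)
for _ in 1 .. 100 do
    let xs = List.init (rnd.Next(0, 10)) (fun _ -> rnd.Next(-1000, 1000))
    if not (roundTrips xs) then failwithf "Round trip failed for %A" xs
printfn "Round trip holds for 100 random lists"
```

The special case for the empty string is exactly what such a property tends to expose: without it, decoding the encoded empty list would throw, and random lists of length 0 are generated often enough to hit it.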
Summary
Building a great open-source library is a difficult thing and I certainly do not claim that I have a recipe for that. But I'm contributing to a few F# libraries and I think I have learned a thing or two from my mistakes.
For me, one of the most difficult things (technically) is keeping libraries up-to-date even when I don't have time for it. The best way to solve this is to automate everything so that you can accept a pull request and run a single command that runs the whole build process, including NuGet release, documentation update and as many sanity checks as possible, both for the code itself and for the documentation.
This article gave a quick overview of the tools that make this amazingly easy with F# - including the awesome FAKE build tool, unit testing tools like FsUnit and FsCheck and documentation tools in F# Formatting that can even be integrated with unit tests to make sure your documentation is correct.
Published: Thursday, 31 October 2013, 2:29 PM
Author: Tomas Petricek
Tags: open source, f#, fake, f# formatting