Building great open-source libraries
The hard part about successful open-source development is not putting the first version of your source code on GitHub. The hard part is what comes next. First of all, there are community aspects - making sure that the project fits well with other work in the area, engaging the community and contributors, planing future directions for the project and so on. Secondly, there is an infrastructural side - making sure that there is a package (on NuGet in the F# world), easy to run and useful tests and also up-to-date documentation and tutorials.
In this article, I want to talk about the infrastructural side, which is easier of the two, but nevertheless, difficult to get right!
On the technical side, I think that every good open-source library needs to have:
- Unit tests - at least for non-trivial parts of code and to prevent regressions
- Random testing - for tricky parts of code, it is useful and helps checking unexpected cases
- NuGet package - or other up-to-date and easy to use release; for F# projects, we might also want to have an easy to download ZIP file for simple interactive scripts
- Documentation - for public API, at least when the API is not super simple
- Tutorials & walkthroughs - showing how to call the API in a larger-scale scenarios
- Automation - when releasing a new version, all of the above should happen with "one click" and documentation with tutorials must be up-to-date and correct.
Ticking all the points is a lot of work, but it is crucial - if you do not have these, your project will be difficult to use, making a new release will take time and documentation with tutorials will become useless. Fortunately for me, the F# community made an amazing progress in this direction, so let's have a look at some of the tools that make this possible...
Before going further, let me say big thanks to Steffen Forkmann, the author of FAKE, and Gustavo Guerra, who wrote most of the automation for F# Data that I'll use as an example.
Automate everything with FAKE
Let me start from the end of the list. FAKE is a F# build
automation system that does a lot more than just building. In fact, FAKE can easily call
MSBUILD scripts (and build F# projects just using an existing
fsproj file). I think the
real value is in all the additional tools that it provides.
For example, here is what happens when you run the build script from the F# Data library. It:
RELEASE_NOTES.mdto get the information about the last version number and release notes (that will be used later to build NuGet package)
AssemblyInfo.fswith the right version and project information
- Builds the project and tests by calling MSBUILD (or xbuild) on
- Runs the NUnit tests (and stops if there is a failure), but more about testing later...
- While running tests, it also checks that your documentation does not contain errors - if you do not believe, continue reading :-)
- Builds a NuGet package and optionally pushes it to nuget.org
- Automatically builds documentation using F# Formatting tool that is discussed next
- As a bonus, it also pushes the documentation to the gh-pages branch and builds a ZIP with the binaries for easy download.
All this means that it is really easy to maintain a project. When you get a pull request (and point the contributor to the right place to add tests and documentation), you can then update everything with just a single command.
And you have a guarantee that your documentation is up-to-date and correct too, which is done using another F# project that I'll discuss next...
Documenting libraries with F# Formatting
F# Formatting is not your good old regular-expression based syntax highlighter. It calls the F# compiler (which is fully open-source, in case you did not know) and uses the actual compiler to colorize code. Aside from that, it also type-checks the code and extracts tooltip information that you'd see in MonoDevelop or Visual Studio. It is used on this blog too, so here is an example (hover over identifiers with mouse pointer to see tool tips):
1: 2: 3: 4: 5:
For statically typed languages with type inference, this is extremely useful. Just
remember when you were last looking at C# snippet using
var and wondered what
the type of a variable is...
To build a great documentation for a project using F# Formatting, you can use two features. I'll use the Deedle data manipulation library as an example:
Generate API reference - if you include
///comments for public functions (written in a simple Markdown style), you can automatically generate API reference from them, for example, like the FakeLib reference.
Does your documentation type-check?
The last thing I mentioned is that the build process checks if your documentation is correct. Obviously, it does not check that your documentation makes sense :-) but it does make sure that code samples your documentation type check. This is done, for example, in the F# Data documentation tests.
What does this mean? When you change your API (add or remove parameters, change type, or rename function or types) without making corresponding changes to your documentation, you'll get a unit test failure!
This is only possible because F# Formatting can call
the compiler to do the actual formatting and checking work - and it does not only
fsx files. The same is done on
md files that contain F# code snippets
(using 4 spaces before the snippet).
Testing with FsUnit and FsCheck
Speaking of unit tests, there are a few more things to be written. I'm not an expert when it comes to testing (the chapter by Phil Trelford in our upcoming F# book is a better source!), but tests are clearly important - especially for open-source projects with multiple contributors that need to collaborate on the code base.
Less painful writing and running
There are three things that make writing tests less painful. First, FsUnit
is a nice DSL for writing tests in a more readable way. Second, the F#
notation lets you use full description as a test name. And third, you can setup your
environment to make tests runnable really quickly from REPL.
Let's look at a sample test for the XML type provider from F# Data:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20:
The test is included in an
fs file in a project that is compiled into a
dll that can
be tested with standard NUnit test runners. However, the first 9 lines make the test also
runnable in F# Interactive - you can select the entire source code and hit
load the tests in F# Interactive and run them line-by-line, testing different inputs
interactively. When writing tests, this is much easier then changing your code and re-compiling
tests to run them.
The test itself uses the backtic notation to include the whole test description in its
SoYouDoNotNeedToDecipherThis! The FsUnit library that is also used here defines a
simple readable DSL so that you can write your test in the form
<value> |> should <property>.
For example, you can say
"Hello" |> should startWith "H".
Testing complex logic
Finally, the last great tool that I want to mention in this article is a random testing framework FsCheck. This is particularly useful if you need to test some algorithm or more complex function that has some (mathematical) properties.
For example, I wrote a function
binarySearchNearestGreater that performs binary search
on a sorted array and returns the index of a specified element, or index of an element
that is the nearest greater in the array. The function has a property that the value
at the returned index is equal, or greater than the specified key (or, if the function
does not find any element, it means that all are smaller).
FsCheck can easily verify that the property holds for randomly generated inputs (and it also generates inputs that cover corner cases):
1: 2: 3: 4: 5: 6: 7:
Check.QuickThrowOnFailure takes a function that specifies the predicate
and automatically generates 100 (or more) random inputs for
The above sample uses NUnit, but FsCheck also comes with xUnit integration that makes the
testing code even simpler (just write a function with the
Random testing is certainly not useful for all tests, but it is great when you have some property that should hold. This is often the case for algorithms, or when you have a pair of functions for converting "there and back again" (then you can just say that the conversion there and back should return the original thing).
Building a great open-source library is a difficult thing and I certainly do not claim that I have a recipe for that. But I'm contributing to a few F# libraries and I think I have learned a thing or two from my mistakes.
For me, one of the most difficult things (technically) is keeping libraries up-to-date even when I don't have time for it. The best way to solve this is to automate everything so that you can accept a pull request and run a single command that runs the whole build process, including NuGet release, documentation update and as many sanity checks as possible, both for the code itself and for the documentation.
This article gave a quick overview of the tools that make this amazingly easy with F# - including the awesome FAKE build tool, unit testing tools like FsUnit and FsCheck and documentation tools in F# Formatting that can even be integrated with unit tests to make sure your documentation is correct.
Full name: Great-open-source.comparer
member Compare : x:'T * y:'T -> int
static member Default : Comparer<'T>
Full name: System.Collections.Generic.Comparer<_>
val int : value:'T -> int (requires member op_Explicit)
Full name: Microsoft.FSharp.Core.Operators.int
type int = int32
Full name: Microsoft.FSharp.Core.int
type int<'Measure> = int
Full name: Microsoft.FSharp.Core.int<_>
Full name: Great-open-source.hello
Say hello to the specified person
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
Full name: Great-open-source.PersonXml
Full name: FSharp.Data.XmlProvider
<summary>Typed representation of a XML file</summary>
<param name='Sample'>Location of a XML sample file or a string containing a sample XML document</param>
<param name='Global'>If true, the inference unifies all XML elements with the same name</param>
<param name='Culture'>The culture used for parsing numbers and dates.</param>
<param name='SampleList'>If true, the children of the root in the sample document represent individual samples for the inference.</param>
<param name='ResolutionFolder'>A directory that is used when resolving relative file references (at design time and in hosted execution)</param>
Full name: Great-open-source.newXml
type TestAttribute =
new : unit -> TestAttribute
member Description : string with get, set
Full name: NUnit.Framework.TestAttribute
TestAttribute() : unit
Full name: Great-open-source.( Jane should have first name of Jane )
Full name: FsUnit.TopLevelOperators.should
Full name: FsUnit.TopLevelOperators.equal
Full name: Great-open-source.( Binary searching for nearest greater value satisfies laws )
static member All : config:Config -> unit
static member All : config:Config * test:Type -> unit
static member Method : config:Config * methodInfo:MethodInfo * ?target:obj -> unit
static member One : config:Config * property:'Testable -> unit
static member One : name:string * config:Config * property:'Testable -> unit
static member Quick : property:'Testable -> unit
static member Quick : name:string * property:'Testable -> unit
static member QuickAll : unit -> unit
static member QuickAll : test:Type -> unit
static member QuickThrowOnFailure : property:'Testable -> unit
Full name: FsCheck.Check
member Clone : unit -> obj
member CopyTo : array:Array * index:int -> unit + 1 overload
member GetEnumerator : unit -> IEnumerator
member GetLength : dimension:int -> int
member GetLongLength : dimension:int -> int64
member GetLowerBound : dimension:int -> int
member GetUpperBound : dimension:int -> int
member GetValue : [<ParamArray>] indices:int -> obj + 7 overloads
member Initialize : unit -> unit
member IsFixedSize : bool
Full name: System.Array
Full name: Microsoft.FSharp.Collections.Array.sort
Full name: FSharp.DataFrame.Internal.Array.binarySearchNearestGreater
Returns the index of 'key' or the index of immediately following value.
If the specified key is greater than all keys in the array, None is returned.
This module contains additional functions for working with sequences.
`FSharp.DataFrame.Internals` is opened, it extends the standard `Seq` module.
Full name: Microsoft.FSharp.Collections.Seq.forall