Data exploration through dot-driven development
Tomas Petricek
In proceedings of ECOOP 2017
Data literacy is becoming increasingly important in the modern world. While spreadsheets make simple data analytics accessible to a large number of people, creating transparent scripts that can be checked, modified, reproduced and formally analyzed requires expert programming skills. In this paper, we describe the design of a data exploration language that makes the task more accessible by embedding advanced programming concepts into a simple core language.
The core language uses type providers, but we employ them in a novel way -- rather than providing types with members for accessing data, we provide types with members that allow the user to also compose rich and correct queries using just member access ("dot"). This way, we recreate functionality that usually requires complex type systems (row polymorphism, type state and dependent typing) in an extremely simple object-based language.
We formalize our approach using an object-based calculus and prove that programs constructed using the provided types represent valid data transformations. We discuss a case study developed using the language, together with additional editor tooling that bridges some of the gaps between programming and spreadsheets. We believe that this work provides a pathway towards democratizing data science -- our use of type providers significantly reduces the complexity of languages that one needs to understand in order to write scripts for exploring data.
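To make the idea of composing queries through member access more concrete, here is a minimal Python sketch. It is only an illustration and not the language or the type provider described in the paper: the `Query` class, the `drop_` and `sort_by_` member names and the `run` helper are all hypothetical, and the check happens dynamically at member lookup, whereas in the paper the provided types ensure it statically. What it does show is that the members offered after each "dot" depend on the columns still present, so a column that has been dropped can no longer be referenced.

```python
class Query:
    """A query under construction: available columns plus the steps so far.

    Hypothetical illustration only; not the API of The Gamma.
    """

    def __init__(self, columns, steps=()):
        self.columns = tuple(columns)
        self.steps = tuple(steps)

    def members(self):
        """Generate the members offered after 'dot' from the current columns."""
        offered = {}
        for col in self.columns:
            remaining = tuple(c for c in self.columns if c != col)
            # Dropping a column yields a query whose members no longer mention it.
            offered["drop_" + col] = Query(remaining, self.steps + (("drop", col),))
            # Sorting keeps the same set of columns.
            offered["sort_by_" + col] = Query(self.columns, self.steps + (("sort", col),))
        return offered

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("__") or name in ("columns", "steps"):
            raise AttributeError(name)
        offered = self.members()
        if name not in offered:
            raise AttributeError(f"no member '{name}'; available: {sorted(offered)}")
        return offered[name]


def run(query, rows):
    """Apply the recorded steps to a list of dictionaries."""
    for op, col in query.steps:
        if op == "drop":
            rows = [{k: v for k, v in r.items() if k != col} for r in rows]
        elif op == "sort":
            rows = sorted(rows, key=lambda r: r[col])
    return rows


# Made-up sample data, used only to exercise the sketch.
data = [{"athlete": "B", "gold": 5}, {"athlete": "A", "gold": 2}]
query = Query(["athlete", "gold"]).sort_by_gold.drop_athlete
result = run(query, data)
```

After `drop_athlete`, the resulting query still offers `sort_by_gold` but no longer offers `drop_athlete` or `sort_by_athlete`, which loosely mirrors how the provided types keep track of the shape of the data as the query is built.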
Draft and more information
- Download pre-print of the paper (PDF)
- Watch Fellow Short Talk from Alan Turing Institute
Watch the talk
My talk about the paper was pre-recorded at the Alan Turing Institute, so you can watch it below. If you are looking for a more general introduction to The Gamma project, consider watching the Fellow Short Talk, which accompanies a more general paper about the project.
Bibtex
If you want to cite the paper, you can use the following BibTeX information.
If you have any comments, suggestions or related ideas, I'll be happy to hear from you! Send me an email at tomas@tomasp.net or get in touch via Twitter at @tomaspetricek.
Published: Wednesday, 12 April 2017, 12:00 AM
Author: Tomas Petricek
Typos: Send me a pull request!