
CLinq - LINQ support for the C++/CLI language

Introduction

The LINQ project, which will be part of the next version of Visual Studio (codename "Orcas"), is a set of extensions that make it possible to query data sources directly from the C# or VB.NET languages. LINQ extends the .NET Framework with classes that represent queries. It also extends both C# and VB.NET with language features that make these queries easy to write. It also includes libraries for using queries with the most common types of data sources, such as SQL databases, DataSets and XML files. This article requires some basic knowledge of LINQ and C# 3.0, so I recommend reading the LINQ Overview available from the official project web site before reading the article.

LINQ includes extensions for C# and VB.NET, but there are no plans to support LINQ in C++/CLI. The goal of the CLinq project is to make part of the LINQ functionality available from C++/CLI. Thanks to the very powerful operator overloading mechanism in C++/CLI, it is possible to use LINQ to SQL for accessing SQL databases from C++/CLI, as well as some other LINQ features.

I will first demonstrate how the same database query looks in C# 3.0 and in C++/CLI, and then we will look at CLinq in more detail. The following query (written in C# 3.0) uses the Northwind database and returns the contact name and company name of all customers living in London:

// create connection to database
NorthwindData db = new NorthwindData(".. connection string ..");

// declare database query
var q = 
  from cvar in db.Customers
  where cvar.City == "London"
  select cvar.ContactName + ", " + cvar.CompanyName;

// execute query and output results
foreach(string s in q)
  Console.WriteLine(s);

Now, let's look at the same query written in C++/CLI using CLinq. It is a bit more complex, but this is the price for implementing it as a library instead of modifying the language:

// create connection to database
NorthwindData db(".. connection string ..");

// declare database query
Expr<Customers^> cvar = Var<Customers^>("c");
CQuery<String^>^ q = db.QCustomers
  ->Where(clq::fun(cvar, cvar.City == "London"))
  ->Select(clq::fun(cvar, 
      cvar.ContactName + Expr<String^>(", ") + cvar.CompanyName));

// execute query and output results
for each(String^ s in q->Query)
  Console::WriteLine(s);

LINQ & C++/CLI overview

In this section I'll briefly recapitulate a few LINQ and C++/CLI features that are important for understanding how CLinq works. If you're familiar with LINQ and C++/CLI, you can safely skip this section.

Important LINQ features

Probably the most important C# extension that makes LINQ possible is lambda expressions. Lambda expressions are similar to anonymous delegates, but their syntax is even simpler. They can be used for declaring functions inline, and you can pass them as parameters to methods. There is, however, one important difference from anonymous delegates. A lambda expression can be compiled either as executable code (like an anonymous delegate) or as a data structure that represents the source code of the lambda expression. This structure is called an expression tree. An expression tree can also be compiled at runtime, so you can convert this representation to executable code.
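The "code as data" idea behind expression trees can be illustrated with a minimal standard C++ sketch. This is plain C++, not C++/CLI. Node, Param, Const, Add and compile are hypothetical names and not part of LINQ or CLinq. The lambda x => x + 1 is stored as a tree that can be printed or turned into a callable. Here compile only wraps an interpreter, whereas LINQ generates executable code.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal expression tree (not the LINQ or CLinq API).
// A lambda such as x => x + 1 is kept as data, so it can be inspected.
struct Node {
    virtual ~Node() = default;
    virtual std::string show() const = 0;  // print the "source code"
    virtual int eval(int x) const = 0;     // evaluate with parameter value x
};

struct Param : Node {
    std::string name;
    explicit Param(std::string n) : name(std::move(n)) {}
    std::string show() const override { return name; }
    int eval(int x) const override { return x; }
};

struct Const : Node {
    int value;
    explicit Const(int v) : value(v) {}
    std::string show() const override { return std::to_string(value); }
    int eval(int) const override { return value; }
};

struct Add : Node {
    std::shared_ptr<Node> left, right;
    Add(std::shared_ptr<Node> l, std::shared_ptr<Node> r)
        : left(std::move(l)), right(std::move(r)) {}
    std::string show() const override {
        return "Add(" + left->show() + ", " + right->show() + ")";
    }
    int eval(int x) const override { return left->eval(x) + right->eval(x); }
};

// Turns the tree back into something callable. This sketch only wraps an
// interpreter; a real implementation would generate executable code.
std::function<int(int)> compile(std::shared_ptr<Node> body) {
    return [body](int x) { return body->eval(x); };
}
```

The same tree serves both purposes. `body->show()` yields `Add(x, 1)`, and `compile(body)(41)` yields 42.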

LINQ to SQL takes the expression tree representing a query (including the lambda expressions it contains) and converts it to an SQL query, which is sent to SQL Server. LINQ to SQL also contains a tool called sqlmetal.exe. This tool generates objects that represent the database structure. When you write queries, you can work with these type-safe objects instead of having to specify database tables or columns by name.

Important C++/CLI features

Now I'd like to mention a few features from the rich set that C++/CLI offers. LINQ itself is a .NET library, so throughout the project we'll rely heavily on the ability to work with managed classes. We'll also use the ability to combine C++ templates with .NET generics. CLinq benefits from the fact that .NET generics can be compiled and exported from an assembly. C++ templates are interesting because they support template specialization. If you have a SomeClass<T> template, you can write a special version for a specific type argument (for example SomeClass<int>) and modify the behavior of the class, including adding methods.
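Template specialization can be seen in a small standard C++ sketch. This is plain C++ rather than C++/CLI, and Wrapper is a hypothetical name that is not part of CLinq. The specialization for std::string adds an IndexOf member that the primary template does not have. .NET generics have no equivalent mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Primary template: a generic wrapper with no type-specific members.
template <typename T>
struct Wrapper {
    T value;
};

// Full specialization for std::string. It has the same name but adds a
// member function that only makes sense for strings.
template <>
struct Wrapper<std::string> {
    std::string value;
    std::size_t IndexOf(char c) const { return value.find(c); }
};
```

`Wrapper<int>` exposes only `value`, while `Wrapper<std::string>{"Hello world!"}.IndexOf('w')` returns 6. CLinq's Expr<String^> relies on the same mechanism.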

Basic CLinq features

In the previous example we used the Expr<Customers^> and Var<Customers^> classes. These two classes are typed wrappers declared using C++/CLI templates. We use templates instead of generics because templates allow template specialization. There are basic Expr<> and Var<> classes, and these can be specialized so that, for example, Expr<Customers^> can contain some additional properties. Using these properties you can express operations on the Customers class. These template specializations can be generated using the clinqgen.exe tool, which will be described later. CLinq also supports a slightly more complex syntax for manipulating classes that don't have template specializations.

Before we start, I'll explain how the CLinq library is organized. It consists of two parts. The first part is the EeekSoft.CLinq.dll assembly, which contains the core CLinq classes. You'll need to reference this assembly from your project, either using the project settings or with the #using directive. The second part is the clinq.h header file (and 2 other headers), which contains C++/CLI templates. You'll need to include this header in every CLinq project. Header files are needed because CLinq relies on C++/CLI templates. The classes from the core library can be used if you want to share CLinq objects across multiple .NET projects.

I already mentioned the Expr<> class. This class is written using templates and is included (together with Var<>) from the clinq.h file. These two classes inherit from classes in the CLinq assembly, namely Expression<>. The assembly contains some other classes, but this one is the most important. Expression<> is written using .NET generics, so it can be shared among multiple projects. It is recommended to use this class as the parameter type of any public method in your project that can be called from another .NET assembly.

Expr and Var classes

Let's look at some sample code. As the previous paragraphs show, the Expr<> and Var<> classes are the key structures of the CLinq project, so we'll use them in the following example. The example works with two specialized versions of these classes: one for the int type and one for the String^ type:

// Declare variable of type int called 'x'
Expr<int> x = Var<int>("x");
// Declare expression of type String initialized with literal
Expr<String^> str("Hello world!");

// Expression representing addition of the x variable and 
// result of the method call to 'IndexOf' method.
Expr<int> expr = str.IndexOf("w") + x;

If you look at the code, you might think that the IndexOf method and the other operations are executed when the code runs, but this isn't true! This is an important fact: the code only builds internal structures that represent the expression, and the expression itself is not executed. This gives you behavior similar to C# 3.0 lambda expressions, which can also be used to build a representation of the written expression instead of executable code. You can also convert the expression represented by the Expr<> class to the structures used by LINQ, as demonstrated in the following example:
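Operator overloading that records operations instead of performing them can be illustrated in standard C++. This is a deliberately tiny sketch that stores each node as text. IntExpr, StringExpr, Var and Literal are hypothetical names, not CLinq's classes. The sketch only mimics the Add(...) text format of LINQ's string representation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an Expr<int>-like type. It stores the *text* of an
// expression instead of a value, so operators build a description.
struct IntExpr {
    std::string repr;
};

// Declares a variable. Nothing is evaluated; only the name is remembered.
IntExpr Var(const std::string& name) { return IntExpr{name}; }

// Overloaded + returns a new node describing the addition.
IntExpr operator+(const IntExpr& a, const IntExpr& b) {
    return IntExpr{"Add(" + a.repr + ", " + b.repr + ")"};
}

struct StringExpr {
    std::string repr;
    // Describes a call to IndexOf; the string is never actually searched.
    IntExpr IndexOf(const std::string& s) const {
        return IntExpr{repr + ".IndexOf(\"" + s + "\")"};
    }
};

// A string literal node, printed with surrounding quotes.
StringExpr Literal(const std::string& s) { return StringExpr{"\"" + s + "\""}; }
```

`Literal("Hello world!").IndexOf("w") + Var("x")` computes no number. It produces the description `Add("Hello world!".IndexOf("w"), x)`.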

// Convert to LINQ expression
System::Expressions::Expression^ linqExpr = expr.ToLinq();
// Print string representation of LINQ expression
Console::WriteLine(linqExpr);

The result printed to the console window will be:

Add("Hello world!".IndexOf("w"), x)

Lambda expressions

Let's now look at the syntax for writing lambda expressions in CLinq. Lambda expressions are represented by the generic Lambda<> class. The type parameter of this class should be one of the Func delegates declared by LINQ in the System::Query namespace. To declare lambda expressions, you can use the fun function in the EeekSoft::CLinq::clq namespace. Assuming that you included the using namespace EeekSoft::CLinq; directive (which is recommended), the source code looks like this:

// Declare parameter (variable) and method body (expression)
Expr<int> var = Var<int>("x");
Expr<int> expr = Expr<String^>("Hello world!").IndexOf("w") + var;

// First argument for the clq::fun function is lambda expression
// parameter, the last argument is the lambda expression body
Lambda<Func<int, int>^>^ lambda = clq::fun(var, expr);

// Print string representation of lambda..
Console::WriteLine(lambda->ToLinq());

// Compile & execute lambda
Func<int, int>^ compiled = lambda->Compile();
Console::WriteLine(compiled(100));

After executing this example you should see the following output in console window (first line represents the lambda expression and the second line is the result of lambda expression invocation):

x => Add("Hello world!".IndexOf("w"), x)
106

As in LINQ, you can compile a CLinq expression at runtime (in fact, CLinq uses LINQ internally). The previous example did this using the Compile method. The returned value is one of the Func<> delegates, and this delegate can be invoked directly.

As in LINQ, you can use at most 4 parameters in lambda expressions, because of the limitations of the Func<> delegates declared in the LINQ assemblies. Accordingly, the clq::fun function has the same number of overloads. Also note that in most situations you don't have to specify type arguments for this function, because the C++/CLI type inference algorithm can infer the types for you. Let's look at one more example that demonstrates declaring a lambda expression with more than one parameter:

Expr<int> x = Var<int>("x");
Expr<int> y = Var<int>("y");
Lambda<Func<int, int, int>^>^ lambda2 = 
  clq::fun(x, y, 2 * (x + y) );
Console::WriteLine(lambda2->Compile()(12, 9));

In this example the body of the lambda expression isn't declared earlier as a separate variable. Instead, it is composed directly in the call to clq::fun. We also used overloaded operators (namely * and +) in the body of the lambda expression. If you run this code, the result will be (12 + 9) * 2, which is 42.
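The "parameters plus body, then compile" pattern can be sketched in standard C++. This is not how CLinq is implemented. In this version each expression is a closure over an environment that maps parameter names to values. Expr, Var, Const and fun are hypothetical names that only loosely mirror clq::fun(...)->Compile().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch, not CLinq's implementation: an expression is stored as
// a function of an environment (parameter name -> value), so building it with
// operators does not evaluate anything yet.
using Env = std::map<std::string, int>;

struct Expr {
    std::string name;                     // non-empty only for variables
    std::function<int(const Env&)> eval;  // evaluates the expression
};

Expr Var(const std::string& name) {
    return Expr{name, [name](const Env& env) { return env.at(name); }};
}

Expr Const(int v) {
    return Expr{"", [v](const Env&) { return v; }};
}

Expr operator+(Expr a, Expr b) {
    return Expr{"", [a, b](const Env& env) { return a.eval(env) + b.eval(env); }};
}

Expr operator*(Expr a, Expr b) {
    return Expr{"", [a, b](const Env& env) { return a.eval(env) * b.eval(env); }};
}

// Allows writing 2 * (x + y) with an int literal on the left.
Expr operator*(int k, Expr b) { return Const(k) * b; }

// Binds two parameters to a body and returns a callable.
std::function<int(int, int)> fun(const Expr& p1, const Expr& p2, const Expr& body) {
    std::string n1 = p1.name, n2 = p2.name;
    return [n1, n2, body](int a, int b) {
        Env env{{n1, a}, {n2, b}};
        return body.eval(env);
    };
}
```

With `Expr x = Var("x"), y = Var("y");`, the call `fun(x, y, 2 * (x + y))(12, 9)` evaluates to 42, the same result as the CLinq example.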

Supported types and operators

In the previous example I used two overloaded operators. These operators are declared in the Expr<int> template specialization, so you can use them when working with an expression representing an integer. CLinq includes template specializations with overloaded operators for the following standard types:

Type                     Supported operators & methods
bool                     Comparison: !=, ==; Logical: &&, ||, !
int                      Comparison: !=, ==, <, >, <=, >=; Math: +, *, /, -; Modulo: %; Shifts: <<, >>
Other integral types     Comparison: !=, ==, <, >, <=, >=; Math: +, *, /, -; Modulo: %
float, double, Decimal   Comparison: !=, ==, <, >, <=, >=; Math: +, *, /, -
wchar_t                  Comparison: !=, ==
String^                  Comparison: !=, ==; Concatenation: +; Standard string methods (IndexOf, Substring, etc.)

For the complete list of supported types, with their methods and operators, see the generated documentation. The following example demonstrates using overloaded operators with expressions representing double and float. Mixing different types is another interesting problem, which is why we use two different floating point types here:

// Declare 'float' variable and 'double' literal
Expr<float> fv = Var<float>("f");
Expr<double> fc(1.2345678);

// Function taking 'float' and returning 'float'
Lambda<Func<float, float>^>^ foo4 = clq::fun(fv, 
  clq::conv<float>(Expr<Math^>::Sin(fv * 3.14) + fc)  );

You can see that we're using another function from the clq namespace: clq::conv. This function is used for converting types when no implicit conversion is available. In the sample we're using the Sin function, which accepts Expr<double> as a parameter. The float variable is converted implicitly to an expression of type double. Conversion in the opposite direction is not implicit, so the double result has to be converted back to float using the clq::conv function. CLinq allows implicit conversion only from a smaller floating point type to a larger one (float to double) or from a smaller integral type to a larger one (for example short to int). This example also uses the Expr<Math^> class, which is another interesting template specialization. It represents the .NET System::Math class and contains most of the methods of that class.
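The "implicit widening, explicit narrowing" rule can be modeled with SFINAE in standard C++. This is a simplified sketch, not CLinq's code. Expr here holds a plain value in place of an expression tree and handles only the float-to-double case. The conv function is a hypothetical analogue of clq::conv.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch: Expr<T> converts implicitly only when the conversion
// widens (here just float -> double); narrowing requires conv<T>().
template <typename T>
struct Expr {
    T value;  // stands in for the expression tree

    Expr(T v) : value(v) {}

    // Implicit converting constructor, enabled only for float -> double.
    template <typename U,
              typename = typename std::enable_if<
                  std::is_same<T, double>::value &&
                  std::is_same<U, float>::value>::type>
    Expr(const Expr<U>& other) : value(other.value) {}
};

// Explicit conversion for cases with no implicit one (e.g. double -> float).
template <typename To, typename From>
Expr<To> conv(const Expr<From>& e) {
    return Expr<To>(static_cast<To>(e.value));
}
```

`Expr<double> w = Expr<float>(2.5f);` compiles, but assigning an `Expr<double>` to an `Expr<float>` does not. The narrowing direction needs `conv<float>(d)`.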

Working with classes

I already demonstrated how you can work with basic data types like int or float, but I said only a little about working with other classes. There are two possible approaches. You can use a template specialization (if it exists), which includes properties and methods that represent the members of the underlying class. These specializations exist for some standard types (like String^) and can be generated for LINQ to SQL database mappings. If a template specialization isn't available, you have to use general-purpose methods that invoke a method (or property) by its name.

Typed wrappers

Using a class is fairly simple if the corresponding template specialization exists. The following example declares an expression working with the String^ type:

// 'String^' variable
Expr<String^> name = Var<String^>("name");

// Expression that uses 'IndexOf' and 'Substring' methods
Expr<String^> sh("Hello Tomas");
Expr<int> n = sh.IndexOf('T');
Lambda<Func<String^, String^>^>^ foo = 
  clq::fun(name, sh.Substring(0, n) + name);

// Print LINQ representation and execute
Console::WriteLine(foo->ToLinq());
Console::WriteLine(foo->Compile()("world!"));

In this example we used two methods declared in the Expr<String^> class, IndexOf and Substring. They represent calls to the corresponding methods of the String^ type. If you look at the program output, you can see that it contains calls to these two methods. There is also a call to the Concat method, which CLinq generated when we used the + operator for string concatenation:

name => Concat(new [] {"Hello Tomas".
  Substring(0, "Hello Tomas".IndexOf(T)), name})
Hello world!

Indirect member access

To demonstrate the second approach, we'll first define a new class with a sample property, a method and a static method (you can also invoke static properties):

// Sample class that we'll work with
ref class DemoClass
{
  int _number;
public:
  DemoClass(int n) { 
    _number = n; 
  }  
  // Property
  property int Number {
    int get() { return _number;  }
  }
  // Standard method
  int AddNumber(int n) {
    return _number = _number + n;
  }
  // Static method
  static int Square(int number) {
    return number * number;
  }
};

Now let's get to the more interesting part of the example. We will first declare a variable of type DemoClass^. Then we use the Prop method to read a property by its name, Invoke to call a member method, and InvokeStatic to invoke a static method of the class. The AddNumber method is a bit tricky, because it increments the number stored in the object as a side effect. As a result, the value of the expression depends on the order in which its parts are evaluated:

// Construct the lambda expression
Expr<DemoClass^> var = Var<DemoClass^>("var");
Lambda<Func<DemoClass^,int>^>^ foo = clq::fun(var, 
  var.Prop<int>("Number") + 
  var.Invoke<int>("AddNumber", Expr<int>(6)) + 
  Expr<DemoClass^>::InvokeStatic<int>("Square", Expr<int>(100) ) );

// Compile the lambda and pass instance of 'DemoClass' as a parameter
DemoClass^ dcs = gcnew DemoClass(15);
int ret = foo->Compile()(dcs);

Console::WriteLine("{0}\n{1}", foo->ToLinq(), ret);

And the output of this example will be:

var => Add(Add(var.Number, var.AddNumber(6)), Square(100))
10036

I included the output because I wanted to point out one interesting fact: there is no difference in the output whether you use a generated template specialization or invoke members by name. This is because when you invoke by name, the method (or property) is found using reflection before the LINQ expression tree is generated. This also means that when you execute the compiled lambda expression, it calls the method (or property) directly, not by its name.
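Resolving a member by name once, before evaluation, can be sketched in standard C++. C++ has no runtime reflection, so a hypothetical lookup table stands in for it. The names staticMethods, CallNode and InvokeStatic are illustrative only, and CLinq itself uses .NET reflection. In this sketch an unknown name fails when the node is built. The article does not say exactly when CLinq reports such errors.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical static method that the expression will call.
static int Square(int n) { return n * n; }

// Stand-in for reflection metadata: name -> static method.
const std::map<std::string, std::function<int(int)>>& staticMethods() {
    static const std::map<std::string, std::function<int(int)>> table{
        {"Square", Square}};
    return table;
}

// The node stores the resolved target, so evaluation needs no name lookup.
struct CallNode {
    std::string name;                // kept only for printing
    std::function<int(int)> target;  // resolved when the node is built
    int arg;
    int eval() const { return target(arg); }
};

// Looks the name up once; in this sketch a missing method throws here.
CallNode InvokeStatic(const std::string& name, int arg) {
    auto it = staticMethods().find(name);
    if (it == staticMethods().end())
        throw std::invalid_argument("no static method named " + name);
    return CallNode{name, it->second, arg};
}
```

`InvokeStatic("Square", 100).eval()` returns 10000 without consulting the table again. This corresponds to the way the compiled CLinq lambda calls the method directly.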

Calling constructors in projection

So far we have looked at calling methods and reading property values, but there is one more interesting problem that I haven't covered yet. Sometimes you may want to create an instance of a class and return it from a lambda expression. CLinq doesn't support anything like C# 3.0 anonymous types, but you can invoke a class constructor and pass parameters to it using the clq::newobj function. The following sample assumes that you have a class called DemoCtor with a constructor taking String^ and int as parameters:

// Arguments of the lambda expression
Expr<String^> svar = Var<String^>("s");
Expr<int> nvar = Var<int>("n");

DemoCtor^ d = clq::fun(svar, nvar, clq::newobj<DemoCtor^>(svar, nvar) )
  ->Compile()("Hello world!", 42);

After executing this code, the d variable will contain an instance of the DemoCtor class created using the constructor described above. You should be very careful when using the newobj function, because there is no compile-time checking. If the required constructor doesn't exist or has incompatible parameter types, the code will fail with a run-time error.

Using LINQ

You're now familiar with all the CLinq features that you need to start working with data using LINQ in C++/CLI! The key to working with data is the CQuery class, which serves as a CLinq wrapper for the IQueryable interface that represents a query in LINQ. This class has several methods for constructing queries, including Where, Select, Average and others. You can construct this class if you already have an instance of a class implementing the IQueryable interface. For working with a database, it is simpler to generate the code with a tool. The CQuery class also has a property called Query that returns the underlying IQueryable interface. We'll need this property later for accessing the results of the query.

Working with SQL Database

LINQ to SQL: Introduction

We will use two tools to generate a CLinq header file with classes that represent the database structure. The first tool is shipped as part of LINQ and is called sqlmetal. This tool can generate C# or VB.NET code, but it can also generate an XML description of the database structure. We will use this third option. The following example demonstrates how to generate an XML description (northwind.xml) of the northwind database on the SQL server running at localhost:

sqlmetal /server:localhost /database:northwind /xml:northwind.xml

Once we have the XML file, we can use the clinqgen tool, which is part of CLinq. This tool generates a C++/CLI header file with classes that represent the database tables, the corresponding Expr<> template specializations, and a class that represents the entire database. You can customize the name and namespace of this class. If you want to automate this task, you can include the XML file generated by sqlmetal in your project and set its custom build tool to the following command. (Hacker note: you can also use a pipe (|) to get these two tools working together.)

clinqgen /namespace:EeekSoft.CLinq.Demo 
  /class:NorthwindData /out:Northwind.h $(InputPath)

Now you'll need to include the generated header file, and we can start working with the database. We'll first create an instance of the generated NorthwindData class, which represents the database. Note that the example uses C++/CLI stack semantics, but you can also use gcnew if you prefer. Once we have an instance of this class, we can use its properties that represent data tables. The properties with the Q prefix return the CQuery class, so we'll use them instead of the properties without this prefix, which are designed to be used from C# 3.0 or VB.NET. The following example demonstrates some basic CQuery methods:

// Create database context
NorthwindData db(".. connection string ..");

// (1) Count employees
Console::WriteLine("Number of employees: {0}",
  db.QEmployees->Count());

// (2) Calculate average 'UnitPrice' value
Expr<Products^> p = Var<Products^>("p");
Nullable<Decimal> avgPrice = 
  db.QProducts->Average( clq::fun(p, p.UnitPrice) );
Console::WriteLine("Average unit price: {0}", avgPrice);

// (3) Get first employee whose 'ReportsTo' column is NULL
Expr<Employees^> e = Var<Employees^>("e");
Employees^ boss = db.QEmployees->
  Where( clq::fun(e, e.ReportsTo == nullptr) )->First();
Console::WriteLine("The boss: {0} {1}",
  boss->FirstName, boss->LastName);

In the first example we simply called the Count method, which returns the number of rows in the table. In the second example we use the Average method, which requires one argument: a lambda expression that returns some numeric type for every row in the table. Since the UnitPrice column can contain NULL values, we're working with the Nullable<Decimal> type. This type can contain either a real value or NULL, which is represented using nullptr in C++/CLI. The third example uses the Where method to filter only the rows matching the specified predicate (lambda expression). The result of this call is again a CQuery object, so we can easily chain multiple operations. In this example we append a call to the First method, which returns the first row from the result set.
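The NULL handling around Average can be illustrated with an in-memory standard C++ analogue. The function name Average is hypothetical, and std::optional plays the role of Nullable<Decimal>. This sketch follows SQL's AVG, which ignores NULL values: the function skips empty entries and returns an empty result when no value is present.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical in-memory analogue of Average over a nullable column:
// NULLs (std::nullopt) are skipped, and if no non-NULL value exists the
// result itself is NULL.
std::optional<double> Average(const std::vector<std::optional<double>>& column) {
    double sum = 0;
    int count = 0;
    for (const auto& v : column) {
        if (v) {
            sum += *v;
            ++count;
        }
    }
    if (count == 0) return std::nullopt;
    return sum / count;
}
```

For the column `{10.0, NULL, 20.0}` the result is 15.0, not 10.0. For a column containing only NULLs, the result is itself empty.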

LINQ to SQL: Filtering & projection

Let's look at a more interesting sample that covers filtering (the Where method) and projection (the Select method). The result of the query will be a collection containing instances of a custom class called CustomerInfo, so let's first look at this class:

ref class CustomerInfo
{
  String^ _id;
  String^ _name;

public:
  CustomerInfo([PropMap("ID")] String^ id, 
      [PropMap("Name")] String^ name) { 
    _id=id; _name=name; 
  }
  CustomerInfo() { }

  property String^ ID { 
    String^ get() { return _id; }
    void set(String^ value) { _id = value; }
  }

  property String^ Name {
    String^ get() { return _name; }
    void set(String^ value) { _name = value; }
  }
};

The class has two properties (ID and Name), a parameterless constructor, and one constructor that needs further explanation. This constructor takes two parameters, which are used to initialize both fields of the class. A PropMap attribute is attached to every parameter. It describes how the constructor initializes the properties of the class. For example, the attribute [PropMap("ID")] attached to the id parameter means that the ID property will be set to the value of the id parameter in the constructor.

Why is this information important? It is not used in the following query. However, you could write a query that constructs a collection of CustomerInfo objects and later filters this collection using the Where method. The whole query is passed to LINQ for conversion to SQL. If you use the ID property for filtering, LINQ needs to know which value was assigned to this property earlier. For this reason, CLinq has the PropMap attribute, which maps property values to the parameters passed to the constructor. In C# 3.0 the behavior is a bit different, because you can use anonymous types and you don't need to pass values directly to a constructor.
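The rewriting that a PropMap-style mapping makes possible can be shown with a string-based standard C++ sketch. NewObjNode, ResolveProperty and RewriteEquals are hypothetical names, and this is not how CLinq or LINQ represent expressions. A projection remembers which constructor argument feeds which property. A later filter on ID can then be expressed in terms of the original column. For brevity, the Name argument is simplified to a single column.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical projection node: "new CustomerInfo(c.CustomerID, ...)" plus
// the property -> constructor-argument mapping that PropMap would supply.
struct NewObjNode {
    std::string typeName;
    std::map<std::string, std::string> propertyToArgument;
};

// Finds the source expression that was assigned to a property.
std::string ResolveProperty(const NewObjNode& node, const std::string& property) {
    auto it = node.propertyToArgument.find(property);
    if (it == node.propertyToArgument.end())
        throw std::out_of_range("no PropMap entry for " + property);
    return it->second;
}

// Rewrites "<property> == <literal>" on the projected object into a
// predicate over the original row, which could then be translated to SQL.
std::string RewriteEquals(const NewObjNode& node, const std::string& property,
                          const std::string& literal) {
    return ResolveProperty(node, property) + " == " + literal;
}
```

Without the mapping, a filter on ID after the projection could not be traced back to c.CustomerID.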

// DB context & variable.. 
NorthwindData db(".. connection string ..");
Expr<Customers^> cvar = Var<Customers^>("c");

// Query: select some information about customers living
//   in country whose name starts with the letter "U"
CQuery<CustomerInfo^>^ q = db.QCustomers
  ->Where(clq::fun(cvar, cvar.Country.IndexOf("U") == 0))
  ->Select(clq::fun(cvar, clq::newobj<CustomerInfo^>(
      cvar.CustomerID, cvar.ContactName + 
      Expr<String^>(" from ") + cvar.Country)));

// Print SQL command sent to SQL server
Console::WriteLine("\nQuery:\n{0}\n\nResults:", 
  q->Query->ToString());

// Print returned rows
for each(CustomerInfo^ c in q->Query)
  Console::WriteLine(" * {0},  {1}", c->ID, c->Name);

This code is quite similar to the code that you usually write when working with LINQ in C# 3.0. In this sample we first create the database context and declare a variable that will be used in the query. The query takes the QCustomers property, which represents the Customers table in the database. It then filters (the Where method) customers from countries whose names start with the letter "U". Finally, it performs a projection (the Select method) that selects only the information we're interested in and creates a CustomerInfo object.

The sample also prints the SQL command that will be generated from the query. LINQ returns the SQL command if you call the ToString method on the IQueryable representing the query. As I mentioned earlier, the underlying IQueryable of the CQuery class can be accessed using the Query property, so the code q->Query->ToString() returns the SQL command. Finally, the code executes the query and prints information about all returned customers. The query is executed automatically when you start enumerating the collection, which happens in the for each statement.
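Deferred execution can be demonstrated with a small standard C++ sketch. LazyWhere is a hypothetical class, not CLinq's CQuery. Building the query stores only a reference to the source and the predicate. The filtering runs when results are requested, so data added after the query was built still shows up.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of deferred execution (not CLinq's CQuery): building
// the query only stores a reference to the source and the predicate; the
// filtering runs when the results are actually requested.
template <typename T>
class LazyWhere {
    const std::vector<T>& source_;        // referenced, not copied
    std::function<bool(const T&)> pred_;
    mutable int executions_ = 0;          // counts how often the query ran

public:
    LazyWhere(const std::vector<T>& source, std::function<bool(const T&)> pred)
        : source_(source), pred_(std::move(pred)) {}

    int executions() const { return executions_; }

    // Runs the query against the current contents of the source.
    std::vector<T> Execute() const {
        ++executions_;
        std::vector<T> result;
        for (const T& item : source_)
            if (pred_(item)) result.push_back(item);
        return result;
    }
};
```

Constructing the query leaves `executions()` at 0. A value appended to the source before `Execute()` is included in the result, much as a LINQ query reads the database only when for each starts enumerating.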

LINQ to SQL: Joins & tuples

For the last example, I wrote a more complex query. It first performs a GroupJoin operation on customers and orders, which returns a collection of tuples. Each tuple contains a customer and all her orders. After this join, it performs Where filtering and returns only the customers who have at least one order shipped to the USA (the customers are still kept together with their orders). The last operation done by the query is a projection that generates a string with the name of the company and the number of orders associated with it.

This query also demonstrates a few more interesting things that we didn't need earlier. The example starts with two typedefs to make the code more readable. The first just defines a shortcut for the collection of orders. The second uses the Tuple class, which is part of CLinq and which I haven't talked about so far. Tuple is a very simple generic class with two type parameters and two properties (called First and Second) whose types are determined by the type parameters. You can use this class if you want to return two different values from a projection or join without declaring your own class.

The query returns the Tuple type from the GroupJoin operation and later uses the Where operation to filter the customers. This reveals one advantage of using the predefined Tuple class. The co variable, whose type is the expression representing the tuple (Expr<Tuple<>^>), is passed as a parameter to the lambda expression, and in the lambda expression we can directly use its properties (First and Second). Because we're manipulating expressions, we're not working with the Tuple class directly. Instead, we're working with a template specialization of the Expr class, in which Expr<Tuple<>^> is extended to contain these two properties.
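The semantics of the query (GroupJoin into tuples, Where, then Select) can be traced with an in-memory standard C++ analogue. This is a hypothetical sketch over plain vectors with no database. Customer, Order, GroupJoin and Report are illustrative names. Tuple mirrors only the First/Second shape described for CLinq's class. As in the query, the reported count is the total number of orders, not only the USA ones.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Two-property tuple mirroring the First/Second shape described above.
template <typename A, typename B>
struct Tuple {
    A First;
    B Second;
};

struct Customer { std::string CustomerID, CompanyName; };
struct Order { std::string CustomerID, ShipCountry; };

using CustomerOrders = Tuple<Customer, std::vector<Order>>;

// GroupJoin: pair every customer with all orders sharing its key
// (customers without orders get an empty collection).
std::vector<CustomerOrders> GroupJoin(const std::vector<Customer>& customers,
                                      const std::vector<Order>& orders) {
    std::vector<CustomerOrders> result;
    for (const Customer& c : customers) {
        CustomerOrders co{c, {}};
        for (const Order& o : orders)
            if (o.CustomerID == c.CustomerID) co.Second.push_back(o);
        result.push_back(co);
    }
    return result;
}

// Where + Select: keep customers with at least one order shipped to the USA
// and format "<company>, #orders = <total number of orders>".
std::vector<std::string> Report(const std::vector<Customer>& customers,
                                const std::vector<Order>& orders) {
    std::vector<std::string> lines;
    for (const CustomerOrders& co : GroupJoin(customers, orders)) {
        int usa = 0;
        for (const Order& o : co.Second)
            if (o.ShipCountry == "USA") ++usa;
        if (usa > 0)
            lines.push_back(co.First.CompanyName + ", #orders = " +
                            std::to_string(co.Second.size()));
    }
    return lines;
}
```

A customer with one USA order and one UK order appears as `Alpha, #orders = 2`. Customers with no USA orders are dropped.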

I'll comment other interesting features used in this example later, so let's look at the query now:

// First declare type for storing Customer and her Orders
typedef IEnumerable<Orders^> OrdersCollection;
typedef Tuple<Customers^, OrdersCollection^> CustomerOrders;

// Connect to DB and declare variables
NorthwindData db(".. connection string ..");
Expr<Customers^> c = Var<Customers^>("c");
Expr<Orders^> o = Var<Orders^>("o");
Expr<OrdersCollection^> orders
  = Var<OrdersCollection^>("orders");
Expr<CustomerOrders^> co = Var<CustomerOrders^>("co");

// The Query
CQuery<String^>^ q = db.QCustomers
  // Group customers and their orders and 
  // produce collection of 'CustomerOrders'
  ->GroupJoin(db.QOrders,
    clq::fun(c, c.CustomerID),
    clq::fun(o, o.CustomerID),
    clq::fun<Customers^, OrdersCollection^, CustomerOrders^>
      ( c, orders, clq::newobj<CustomerOrders^>(c, orders) ))
  // Filter only customers with order shipped to USA 
  // Note: 'Second' is the collection with orders
  ->Where( clq::fun(co, co.Second.Where( 
      clq::fun(o, o.ShipCountry == "USA" )).Count() > 0) )
  // Projection - string concatenation
  ->Select( clq::fun(co, co.First.CompanyName + Expr<String^>(", #orders = ") + 
      Expr<Convert^>::ToString(co.Second.Count()) ) );

Let's focus on the Where clause. The lambda expression takes a parameter of the Tuple expression type explained earlier, and it accesses its second value (co.Second). The type of this value is an expression representing a collection (Expr<IEnumerable<>^>). This is another specialization of the Expr<> class, and IntelliSense shows that this class has a lot of methods for working with collections! These methods correspond to the methods available in the CQuery class, but they are designed for working with expressions representing queries instead of working with queries directly. In this example we use the Where method, which again returns an expression representing a query, and also the Count method.

The second class that wasn't mentioned earlier is Expr<Convert^>. It is just another template specialization, similar to Expr<Math^>, and contains several methods for type conversions. In this example we use the ToString method to convert the number of orders to a string.

Project summary

Currently the project is in a very early phase, which means it needs more testing and also review from other people. If you find any bugs, or if you think that CLinq is missing some important LINQ functionality, let me know. The project currently uses the May 2006 CTP version of LINQ. It will be updated to support Visual Studio "Orcas" once more stable beta versions become available. The project is available at CodePlex, so you can download the latest version of the source code and binaries from the project site. Because I'm not a C++/CLI expert, I'm very interested in your comments and suggestions, and if you're willing to participate in the project, let me know!

Version & updates

Published: Friday, 2 March 2007, 5:11 PM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: .net, academic, c#