Sunday, 30 October 2011

LINQ in C++


What is LINQ?


LINQ stands for Language Integrated Query - a .NET component which comes with .NET 3.5 and above. It provides facilities for querying data, in a very elegant and programmer-friendly manner. For example, if we have a list of persons, and we want to filter all teenagers and then sort them by their name (alphabetically), then without LINQ we would do this task in the following way probably:
var results = new List<Person>();
foreach(Person p in persons)
{
    if ( p.Age >= 13 && p.Age <= 19 ) //by definition of teenager!
        results.Add(p);
}
results.Sort(new NameComparer());
where NameComparer is a class defined as:
public class NameComparer  : IComparer<Person>
{
    public int Compare(Person x, Person y)
    {
        return String.Compare(x.Name, y.Name);
    }
 }
That is too much of coding which is one of the drawbacks of imperative programming style; the programmer has to tell how the task should be done, rather than what task should be done. Well, I admit that an experienced .NET programmer might make it a bit shorter. Lets give it try: instead of a full-fledged class NameComparer, if we could use lambda to compare the items, then the entire task would reduce to this:
var results = new List<Person>();
foreach(Person p in persons)
{
    if ( p.Age >= 13 && p.Age <= 19 )
        results.Add(p);
}
results.Sort((x,y) => String.Compare(x.Name, y.Name));
That is a lot of improvement, but still not as elegant as the one which uses LINQ. Or can it be even shortened? I've no idea. See how LINQ version looks like:
 var results = persons.Where(p=> p.Age >= 13 && p.Age <= 19).OrderBy(p=> p.Age); 
Cool, isn't it?

Working towards C++ syntax


Now, the question is, can we write this query in C++ in a similar way? First and foremost, although the new Standard C++11 have introduced lambda into the language, it doesn't have any feature like Extension Methods without which LINQ in C# cannot be even imagined. Without extension methods, the above query would look like this:
var results = Enumerable.OrderBy(Enumerable.Where(persons, p=> p.Age >= 13 && p.Age <= 19), p=> p.Age); 
which looks pretty much ugly. Extension methods and lambda expressions are heart of LINQ. So as noted earlier, while C++11 have lamda, it doesn't have extension method, then how are we going to implement LINQ in C++? If we implement where, orderby, select as free functions, and we don't do any workaround for extension methods, then in C++, we can write query of this form only:
auto tmp1 = where(entities, lambda1); //lambda1 is some lambda expression
auto tmp2 = select(tmp1, lambda2); //lambda2 is some lambda expression
auto results = orderby(tmp2, lambda3); //lambda3 is some lambda expression
Not so elegant. In fact it is a shame, even if we make it short as,
auto results = orderby(select(where(entities, lambda1), lambda2), lambda3);
Little bit better, but still not elegant. We need to look for a way so that we can write the query as something which should look like this, or close to it:
auto results = entities.where(lambda1).select(lambda2).orderby(lambda3);
Those who have worked in Linux are probably familiar with pipe. Pipe allows one to combine multiple commands, in such a way that output of one command feeds as input to the next command, and so on. For example,
[snawaz@localhost]$ cat file.txt | wc 
      2     13     102
In this example, pipe combines two commands cat and wc, separated by | which is called pipe-character. The command cat displays the content of the input file, and wc prints the number of line, word and character in the input. Since cat has been combined with wc, the output of  cat feeds as input to wc, and therefore wc prints 2 13 102 as the number of line, word and character respectively, on the console which is final output of the pipeline. If we write this,
[snawaz@localhost]$ cat file.txt | wc | mail -s "Lines, Words, Character" snawaz@example.com
Now, cat outputs the content of file.txt which feeds as input to wc which outputs the count of line, words, and characters, and which in turn feeds as input to mail command, which simply sends a mail to snawaz@example.com, the subject of mail is Lines, Words, Character and the content of the mail is 2 13 102.

Coming back to LINQ, if we see carefully, there is a similarity between a typical linq query and linux pipelining; in both, the output of one command/query feeds as input to the next. That means, if we overload the operator | in C++, then we could write,
auto results = entities | where(lambda1) | select(lambda2) | orderby(lambda3);
That looks quite interesting syntax. Now all we need to implement where, select and orderby in such a way that we could make it work without changing the syntax at all.

Implementation and Examples


So here you go. I've implemented linq in C++, the following code snippet shows some example of its usage.
int main() 
{
     using namespace linq;
 
     //create a vector of persons and print them
     std::vector<person> persons = { 
         {1, "Aleeza"}, {3, "Nayab"}, {13, "Ashraf"}, {16, "Adil"}, 
         {20, "Ahsan"}, {21, "Zartab"}, {17, "Nadim"}, {19, "Nafis"}, 
         {13, "Neha"}, {15, "Zeeshan" }, {11, "Farah"}, {13, "Fauzia"},
         {23, "Nilufer"}, {24, "Talat"}, {7, "Tufail"}, {8, "Sana"} 
     };
     std::cout << "\tOriginal\n" << persons << std::endl;

     //orderby age and print the result 
     auto v1 = persons | orderby([](const person & p) { return p.age; });
     std::cout << "\tOrder by Age\n" << v1 << std::endl;

     //filter all teenagers then orderby name, and print the result
     auto v2 = persons 
               | where([](const person & p) { return p.age >=13 && p.age <= 19 ; }) 
               | orderby([](const person & p) { return p.age; });
     std::cout << "\tFilter Teenagers, and Order by Age\n" << v2 << std::endl;

     //filter all teenagers then orderby name, and print the result
     auto v3 = persons 
               | where([](const person & p) { return p.age >=13 && p.age <= 19 ; }) 
               | orderby([](const person & p) { return p.name; });
     std::cout << "\tFilter Teenagers, and Order by Name\n" << v3 << std::endl;

     //groupby kids, teenagers, adults and print the result
     auto groups = persons | groupby([](const person & p) 
                   {  
                       if ( p.age < 13 ) return "Kids";
                       else if ( p.age < 20 ) return "Teenagers";
                       else return "Adults";
                   });
     std::cout << "\tGroupby Maturity" << std::endl;
     //Note that the type of groups is std::map!
     for(auto i = groups.begin(); i != groups.end(); i++ )
     {
           std::cout << i->first << std::endl;
           std::cout << i->second << std::endl;
     }
}
Operator `<<` has been overloaded for `person` and `vector`, so the output of above code looks like this:
         Original
(1,Aleeza)
(3,Nayab)
(13,Ashraf)
(16,Adil)
(20,Ahsan)
(21,Zartab)
(17,Nadim)
(19,Nafis)
(13,Neha)
(15,Zeeshan)
(11,Farah)
(13,Fauzia)
(23,Nilufer)
(24,Talat)
(7,Tufail)
(8,Sana)

        Order by Age
(1,Aleeza)
(3,Nayab)
(7,Tufail)
(8,Sana)
(11,Farah)
(13,Ashraf)
(13,Neha)
(13,Fauzia)
(15,Zeeshan)
(16,Adil)
(17,Nadim)
(19,Nafis)
(20,Ahsan)
(21,Zartab)
(23,Nilufer)
(24,Talat)

         Filter Teenagers, and Order by Age
(13,Ashraf)
(13,Neha)
(13,Fauzia)
(15,Zeeshan)
(16,Adil)
(17,Nadim)
(19,Nafis)

         Filter Teenagers, and Order by Name
(16,Adil)
(13,Ashraf)
(13,Fauzia)
(17,Nadim)
(19,Nafis)
(13,Neha)
(15,Zeeshan)

          Groupby Maturity
Kids
(1,Aleeza)
(3,Nayab)
(11,Farah)
(7,Tufail)
(8,Sana)

Teenagers
(13,Ashraf)
(16,Adil)
(17,Nadim)
(19,Nafis)
(13,Neha)
(15,Zeeshan)
(13,Fauzia)

Adults
(20,Ahsan)
(21,Zartab)
(23,Nilufer)
(24,Talat)
The complete implementation of linq in C++, along with the test code, is uploaded at github. Have a look at linqcpp. Enjoy!