Create Your Own Query Language

How to bake delicious home-made query languages for in-mmemory data filtering.

Posted by Heikki Kupiainen/Oppikone on 02.07.2017 12:01:05

Creating Your Language Integrated Query (LinQ) Solution

In this article I will demonstrate how to implement your own miniature query language for filtering datasets. If you are familiar with C# and .NET coding, you probably know how wildly useful tool LinQ is. But sadly, for many other programming languages there are no similar native implementations. But no worries, I will demonstrate that creating a home-made LinQ for your favorite programming language is not necessarily difficult at all!

If you have ever used SQL (Structured Query Language) you probably know to respect the leverage it offers. SQL is a great tool for finding information from a big database. However, doing SQL-based queries into datasets usually requires that data to be placed into a database first. This will cause a great deal of complexity if your code maintains some dataset in dynamic memory and you want to filter out that data. Ideally, you will filter your dataset using a flexible query language similar to SQL. But in order to use SQL, you must place the data into a database first - only after that you can take advantage of the SQL implemantation that this database offers. It requires often a lot of configuration - especially if your application does not already use any database. Not to mention the overhead it takes to store your in-memory data into database - just for filtering purpose only! And even more bad news: web-based applications typically cannot directly access any database at all - browsers do not come with a database included!

However, creating a miniature query language is quite easy if you know what to do. Basically, you can create your own query language following same guidelines and principles in most programming languages. Next, I will reveal the black art of implementing home-made in-memory query languages!

Evaluation Function Is The Key

Most scripting languages have an evaluation function, often called eval as in Ruby and JavaScript. Though not often used by developers, this function is greatly useful when creating your own query languages. Consider following code in JavaScript:

var item = { forname: 'John', surname: 'Doe' }
eval("var result = item.forname ")
console.log(result)

With eval function, you can construct executable code on the fly in dynamic fashion and run it! But what is the big deal here? Okay, I'll explain! Let's say that we have an array of items representing contact data. I have a use case that I want to find those contacts who are younger than 40 and whose forname starts with "J". Here is how you implement this in JavaScript:

Quite nice. This code does exactly what I was looking for. But the problem is that the conditions ( name starts with 'J' and younger than 40 years ) are hard coded. This code is useless if I want to find anybody whose name starts with 'E'. Next, I'll be showing how eval function steps in to help us here! I am going to turn our filter into a dynamic function that allows you to pass condition code snippet as parameter:

The change I introduced above allows you to make all kinds of conditions for your search. Let's use our newly generated filter function to find all contacts whose name is either Gabriel or Mikael:

In the example above, we define conditions dynamically in a string and pass them to filter function we just implemented. This certainly is a bold step toward implementing a true query language. But we are not done yet! the conditions we defined in a string are still using JavaScript syntax, which - while certainly logical - definitely does not look and feel as intuitive as similar condition expressed in SQL - a language that was intended to mimic human language as much as plausible.

Why Natural Language Queries?

Now that we have a basic filtering engine, it's time to get down to secret art of baking query languages. I'll show how to convert an innocent natural language query into a bad-ass machine-readable condition expression that JavaScript engine understands. Let's say that I want to find all contacts whose

forname starts with 'J' and surname contains 'nen' and age is less than 40 and zip code is either '88200' or '00610'

- that would be our nearly natural language query that will also be the syntax of our query language. It certainly sounds more intuitive than its translation in JavaScript:

contact.forname.indexOf('J') == 0 && contact.surname.indexOf('nen') > -1 && contact.age < 40 && ( contact.zip_code == '88200' || contact.zip_code == '00610')

And if we want to make sure null values in string attributes won't cause the execution to crash, we need to buffer them with toString() function which will turn the condition even more ugly:

contact.forname.toString().indexOf('J') == 0 && contact.surname.toString().indexOf('nen') > -1 && contact.age < 40 && ( contact.zip_code.toString() == '88200' || contact.zip_code.toString() == '00610')

If you asked before why not forget the headache of using query language alltogether and define the conditions in native programming language right away instead, maybe you are having some second thoughts by now. At least if you are not some hyper geek!

Implementing Natural Language Parser

There are dozens of parser implementations that are intended to turn human-readable queries into executable condition expressions - no new innovations here. For comparison, check some UI libraries with nice filter implementations such as Kendo UI. Putting together new query parsers is therefore rather part of common cultural heritage, such as alphabets or algebra.

Let's analyze the morphology of query. There is a clear pattern featuring [PROPERTY] [RELATIONAL_OPERATOR] [VALUE]. Queries then are combined using AND or OR operator. So we clearly must split combined query into atomic conditions before parsing them. When we split our combined query into conditions, we'll get following instances:

  • forname starts with 'J'
  • surname contains 'nen'
  • age is less than 40
  • zip code is either '88200' or '00610'

How can we get there? It's pretty standard, really! Just split the query by and separator:

var conditions = query.split(' and ')

Now we can proceed to next step - convert each condition into JavaScript language. Let's create a loop for iterating all conditions!

In the code above I iterate given queries and execute the actual conversion function for each of them. The conversion function is wrapped in exception handling block - convertQueryIntoJavaScript will throw an exception if there is a syntax error. Next, I'll show how to process different kinds of conditions!

To be continued...