How to design generic filtering operators in the query string of an API? How to design generic filtering operators in the query string of an API? sql sql

How to design generic filtering operators in the query string of an API?


I might not answer the "which one is best" question, but I can at least give you some insights and other examples to consider.

First, you are talking about "generic API with content and a schema that can be user-defined".

That sound a lot like solr / elasticsearch which are both hi level wrappers over Apache Lucene which basically indexes and aggregates documents.

Those two took totally different approaches to their rest API, I happened to work with both of them.

Elasticsearch :

They made entire JSON based Query DSL, which currently looks like this :

GET /_search{  "query": {     "bool": {       "must": [        { "match": { "title":   "Search"        }},         { "match": { "content": "Elasticsearch" }}        ],      "filter": [         { "term":  { "status": "published" }},         { "range": { "publish_date": { "gte": "2015-01-01" }}}       ]    }  }}

Taken from their current doc. I was surprised that you can actually put data in GET...It actually looks better now, in earlier versions it was much more hierarchical.

From my personal experience, this DSL was powerful, but rather hard to learn and use fluently (especially older versions). And to actually get some result you need more than just play with URL. Starting with the fact that many clients don't even support data in GET request.

SOLR :

They put everything into query params, which basically looks like this (taken from the doc) :

q=*:*&fq={!cache=false cost=5}inStock:true&fq={!frange l=1 u=4 cache=false cost=50}sqrt(popularity)

Working with that was more straightforward. But that's just my personal taste.


Now about my experiences. We were implementing another layer above those two and we took approach number #4. Actually, I think #4 and #5 should be supported at the same time. Why? Because whatever you pick people will be complaining, and since you will be having your own "micro-DSL" anyway, you might as well support few more aliases for your keywords.

Why not #2? Having single filter param and query inside gives you total control over DSL. Half a year after we made our resource, we got "simple" feature request - logical OR and parenthesis (). Query parameters are basically a list of AND operations and logical OR like city=London OR age>25 don't really fit there. On the other hand parenthesis introduced nesting into DSL structure, which would also be a problem in flat query string structure.

Well, those were the problems we stumbled upon, your case might be different. But it is still worth to consider, what future expectations from this API will be.


Matomo Analytics has an other approach to deal with segment filter and its syntaxe seems to be more readable and intuitive, e.g:developer.matomo.org/api-reference/reporting-api-segmentation

OperatorBehaviorExample
==Equals&segment=countryCode==IN Return results where the country is India
!=Not equals&segment=actions!=1 Return results where the number of actions (page views, downloads, etc.) is not 1
<=Less than or equal to&segment=actions<=4 Return results where the number of actions (page views, downloads, etc.) is 4 or less
<Less than&segment=visitServerHour<12 Return results where the Server time (hour) is before midday.
=@Contains&segment=referrerName=@piwik Return results where the Referer name (website domain or search engine name) contains the word "piwik".
!@Does not contain&segment=referrerKeyword!@yourBrand Return results where the keyword used to access the website does not contain word "yourBrand".
=^Starts with&segment=referrerKeyword=^yourBrand Return results where the keyword used to access the website starts with "yourBrand" (requires at least Matomo 2.15.1).
=$Ends with&segment=referrerKeyword=$yourBrand Return results where the keyword used to access the website ends with "yourBrand" (requires at least Matomo 2.15.1).

and you can have a close look at how they parse the segment filter here: https://github.com/matomo-org/matomo/blob/4.x-dev/core/Segment/SegmentExpression.php


#4

I like how Google Analytics filter API looks like, easy to use and easy to understand from a client's point of view.

They use a URL encoded form, for example:

  • Equals: %3D%3D filters=ga:timeOnPage%3D%3D10
  • Not equals: !%3D filters=ga:timeOnPage!%3D10

Although you need to check documentation but it still has its own advantages. IF you think that the users can get accustomed to this then go for it.


#2

Using operators as key suffixes also seems like a good idea (according to your requirements).

However I would recommend to encode the + sign so that it isn't parsed as a space. Also it might be slightly harder to parse as mentioned but I think you can write a custom parser for this one. I stumbled across this gist by jlong some time back. Perhaps you'll find it useful to write your parser.