
How to implement ACL on an ElasticSearch-based system?


I would suggest keeping the ACLs in a separate Elasticsearch index, which should be much smaller than your main document index. This lets you tune the ACL index settings independently, e.g. (1) use fewer shards than your main document index, (2) set auto_expand_replicas to 0-all so that every data node holds a copy, which helps if you want to use a terms query with terms lookup (example: load all documents owned by a user), and (3) enforce different retention/GDPR policies.
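A minimal sketch of the request bodies involved, assuming a hypothetical index name `acls` and field names taken from the rule example in the next paragraph (`userId`, `docId`, `opType`, `pattern`). The terms-lookup helper assumes a different, hypothetical layout in which one ACL document per user lists owned document IDs in an `ownedDocIds` array. The Elasticsearch terms query documentation suggests a single-shard, fully replicated index for lookup data so the lookup can be served from a local shard, which is the reason for `0-all`.

```python
# Hypothetical index name for the ACL rules.
ACL_INDEX = "acls"


def acl_index_body() -> dict:
    """JSON body for `PUT /acls`: a small ACL index copied to every data node."""
    return {
        "settings": {
            "index": {
                # The ACL index is small, so one primary shard is usually enough.
                "number_of_shards": 1,
                # Replicate to all data nodes so lookups can hit a local copy.
                "auto_expand_replicas": "0-all",
            }
        },
        "mappings": {
            "properties": {
                # keyword: exact-match filtering, no full-text analysis.
                "userId": {"type": "keyword"},
                "docId": {"type": "keyword"},
                "opType": {"type": "keyword"},
                "pattern": {"type": "keyword"},
            }
        },
    }


def owned_docs_query(user_id: str) -> dict:
    """Search body for the main index: docs whose _id is listed in the user's ACL doc.

    Assumes a hypothetical per-user ACL document (id = user_id) holding an
    `ownedDocIds` array; Elasticsearch fetches that array via a terms lookup.
    """
    return {
        "query": {
            "terms": {
                "_id": {
                    "index": ACL_INDEX,
                    "id": user_id,
                    "path": "ownedDocIds",
                }
            }
        }
    }
```

Both functions return plain JSON bodies: the first for `PUT /acls`, the second for `GET /<main-index>/_search`.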

The ACL index can then contain one document per ACL rule, e.g. userId=1, docId=123, opType=POST. This approach also lets you define ACL rules for other types of principals and resources in the future. Moreover, it can support ACLs that match new documents dynamically: e.g. userId=1, opType=POST, pattern="*" allows the user with userId=1 to post any document, effectively making them a sysadmin.

Decoupling ACLs from documents and users lets you update ACLs without updating the corresponding documents. This performs better in Elasticsearch, which does not update documents in place but instead marks the old version as deleted and indexes a new one. You can also replace (PUT) an entire document without worrying about preserving its associated ACLs. However, you may want to clean up ACLs when documents or users are deleted, either as part of the deletion or in a separate scheduled cleanup process.
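A sketch of the one-document-per-rule model, assuming string IDs and glob semantics for `pattern` via Python's `fnmatch` (the answer only shows `"*"`, so the matching rules are an assumption). `rules_query` builds an Elasticsearch `bool`/`filter` query that fetches a user's rules for one operation, `is_allowed` evaluates them in the application, and `cleanup_query` is a body for `_delete_by_query` that removes a deleted document's rules. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, Optional


@dataclass(frozen=True)
class AclRule:
    """One ACL document, e.g. userId=1, docId=123, opType=POST."""
    user_id: str
    op_type: str
    doc_id: Optional[str] = None   # an exact resource ...
    pattern: Optional[str] = None  # ... or a glob such as "*" (assumed semantics)


def rules_query(user_id: str, op_type: str) -> dict:
    """Search body for the ACL index: all rules of one user for one operation."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"userId": user_id}},
                    {"term": {"opType": op_type}},
                ]
            }
        }
    }


def is_allowed(rules: Iterable[AclRule], user_id: str, op_type: str, doc_id: str) -> bool:
    """True if any rule grants `op_type` on `doc_id` to `user_id`."""
    for r in rules:
        if r.user_id != user_id or r.op_type != op_type:
            continue
        if r.doc_id == doc_id:
            return True  # rule for this exact document
        if r.pattern is not None and fnmatchcase(doc_id, r.pattern):
            return True  # pattern rule also covers documents created later
    return False


def cleanup_query(doc_id: str) -> dict:
    """Body for `POST /acls/_delete_by_query` after a document is deleted."""
    return {"query": {"term": {"docId": doc_id}}}
```

Filter context is used in `rules_query` because ACL checks need exact matches, not relevance scoring.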

Once the ACLs are separate from the documents, they can be cached in memcached or a Redis cluster without requiring much memory. In a typical OLTP system only a small subset of users is active at any point in time, so you can size your LRU cache accordingly to increase the hit rate. It's hard to give further recommendations without knowing the access patterns characteristic of your system.
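To illustrate the caching idea, here is an in-process LRU cache keyed by user ID. It stands in for memcached or Redis, whose client APIs are not shown. The `loader` callback is hypothetical (e.g. it could run the ACL index query for one user), and `invalidate` should be called whenever a user's ACLs change so stale rules are not served.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LruAclCache(Generic[K, V]):
    """In-process LRU cache standing in for a memcached/Redis tier."""

    def __init__(self, capacity: int, loader: Callable[[K], V]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._capacity = capacity
        self._loader = loader  # hypothetical: loads one user's ACL rules
        self._data: "OrderedDict[K, V]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> V:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self._loader(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used user
        return value

    def invalidate(self, key: K) -> None:
        """Drop a user's entry after their ACLs change."""
        self._data.pop(key, None)
```

Capacity would be sized to the number of concurrently active users rather than to the total user base.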

One last point to consider is what generates the ACLs. If some ACLs are generated automatically, e.g. from a pattern, you might be able to evaluate that pattern directly in your system instead of storing an ACL rule per user per document. For example, if some ACLs come from a directory service, you might be able to cache (and periodically refresh) the LDAP rules in your ACL management system.
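A sketch of pattern-based rules combined with a periodically refreshed directory cache. It assumes rules are expressed per group as `(group, opType, resource pattern)` tuples and that a hypothetical `fetch_all` callback (e.g. wrapping an LDAP search) returns every user's group memberships. One rule then covers all members of a group and all matching documents, instead of one rule per user per document. The TTL and the injectable clock are illustrative choices.

```python
import time
from fnmatch import fnmatchcase
from typing import Callable, Dict, FrozenSet, List, Tuple

# (group, opType, resource pattern); names and shape are hypothetical.
GroupRule = Tuple[str, str, str]


class GroupMembershipCache:
    """Caches user -> groups from a directory service; reloads after `ttl` seconds."""

    def __init__(self, fetch_all: Callable[[], Dict[str, FrozenSet[str]]],
                 ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all  # hypothetical: e.g. an LDAP search for all memberships
        self._ttl = ttl
        self._clock = clock
        self._groups: Dict[str, FrozenSet[str]] = {}
        self._loaded_at = float("-inf")  # forces a load on first use

    def groups_of(self, user_id: str) -> FrozenSet[str]:
        now = self._clock()
        if now - self._loaded_at >= self._ttl:
            self._groups = self._fetch_all()  # periodic full refresh
            self._loaded_at = now
        return self._groups.get(user_id, frozenset())


def group_allows(rules: List[GroupRule], cache: GroupMembershipCache,
                 user_id: str, op_type: str, doc_id: str) -> bool:
    """True if any of the user's groups has a rule matching op_type and doc_id."""
    groups = cache.groups_of(user_id)
    return any(group in groups and op == op_type and fnmatchcase(doc_id, pattern)
               for group, op, pattern in rules)
```

A full refresh keeps the sketch simple; a real directory integration might instead sync incrementally.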


For anyone facing the same problem, here is the conclusion we drew: ACLs in REST microservices that are granular down to individual resources pose challenges similar to those of a multi-tenant system.

ACLs are business logic: each service knows "how" someone owns a resource (and which privileges are possible). Standardizing how the data for these rules is stored would go against precisely this service-specific knowledge.

What we can standardize is the ACL endpoints of each microservice (routes that follow the same contract and signature). And if you really want to keep the ACLs private to each API (service), then, since we have a microservice responsible for users and privileges, the entire architecture can be moved to event sourcing.
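One way to express the shared endpoint contract is a `typing.Protocol` that every service implements, while each implementation keeps its own ownership logic. The `Grant` type, the method names and `InvoiceService` (a stand-in for one ordinary service) are hypothetical; in practice the same contract would be exposed as identical REST routes in each service.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set


@dataclass(frozen=True)
class Grant:
    principal: str
    resource: str
    privilege: str


class AclEndpoint(Protocol):
    """Hypothetical contract shared by all services; only the signature is standard."""

    def grant(self, g: Grant) -> None: ...
    def revoke(self, g: Grant) -> None: ...
    def list_grants(self, principal: str) -> List[Grant]: ...
    def check(self, principal: str, resource: str, privilege: str) -> bool: ...


class InvoiceService:
    """Hypothetical service: invoice owners may always read their own invoices."""

    PRIVILEGES = {"read", "approve"}  # privileges this service understands

    def __init__(self, owners: Dict[str, str]) -> None:
        self._owners = owners  # invoice id -> owner: service-specific knowledge
        self._grants: Set[Grant] = set()

    def grant(self, g: Grant) -> None:
        if g.privilege not in self.PRIVILEGES:
            raise ValueError(f"unknown privilege {g.privilege!r}")
        self._grants.add(g)

    def revoke(self, g: Grant) -> None:
        self._grants.discard(g)

    def list_grants(self, principal: str) -> List[Grant]:
        return sorted((g for g in self._grants if g.principal == principal),
                      key=lambda g: (g.resource, g.privilege))

    def check(self, principal: str, resource: str, privilege: str) -> bool:
        if privilege == "read" and self._owners.get(resource) == principal:
            return True  # ownership rule lives inside the service
        return Grant(principal, resource, privilege) in self._grants
```

The contract fixes only how ACLs are managed from outside; what "owning" an invoice means stays inside the service.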

Example without private API isolation of ACLs:

  1. We have 3 services: "S (A)", which is responsible for managing users and privileges, and "S (B)" and "S (C)", which perform ordinary tasks.

  2. The frontend application must know the endpoints of S (A), S (B) and S (C) and make individual requests to manage the ACL policies of each service.
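In this variant the frontend itself fans out one request per service. The base URLs and the `/acl/grants` route below are hypothetical; the sketch only builds the requests to show that the client must know every service's endpoint and would have to handle partial failures itself.

```python
from typing import Dict, List, Tuple

# Hypothetical base URLs: the frontend has to know every one of them.
SERVICES: Dict[str, str] = {
    "S (A)": "https://users.example.internal",
    "S (B)": "https://service-b.example.internal",
    "S (C)": "https://service-c.example.internal",
}


def policy_requests(targets: List[str], principal: str, resource: str,
                    privilege: str) -> List[Tuple[str, str, dict]]:
    """Build one (method, url, body) request per target service."""
    body = {"principal": principal, "resource": resource, "privilege": privilege}
    # Each service gets its own copy of the body; the caller must send them all.
    return [("POST", f"{SERVICES[name]}/acl/grants", dict(body)) for name in targets]
```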

Example with private API isolation and event-sourcing:

  1. The same microservices are present.

  2. The frontend application makes a request to S (A) to apply an ACL policy to S (B) and S (C).

  3. S (A) records the policy change request and publishes an event to a message broker announcing the change.

  4. S (B) and S (C) consume the event and apply the policy using their own logic.

  5. S (B) and S (C) publish the result of applying the policy (grant or revocation).

  6. S (A) consumes the result events and records the outcome.
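The six steps can be sketched with a synchronous, in-memory broker standing in for a real message broker (e.g. Kafka or RabbitMQ). Event names, fields and outcome strings are assumptions. S (A)'s append-only `log` records both the requests and their results, which is the event-sourcing part of the design; a production version would also need asynchronous delivery, retries and idempotent handlers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class PolicyChangeRequested:
    request_id: int
    principal: str
    resource: str
    privilege: str
    grant: bool               # True = grant, False = revoke
    targets: Tuple[str, ...]  # services the policy applies to


@dataclass(frozen=True)
class PolicyApplied:
    request_id: int
    service: str
    outcome: str              # "granted", "revoked" or "rejected"


class InMemoryBroker:
    """Synchronous stand-in for a real broker: handlers run inside publish()."""

    def __init__(self) -> None:
        self._subs: DefaultDict[type, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event: Any) -> None:
        for handler in list(self._subs[type(event)]):
            handler(event)


class PolicyService:
    """S (A): keeps an append-only log of policy requests and their results."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self.log: List[Any] = []
        self._broker = broker
        self._next_id = 1
        broker.subscribe(PolicyApplied, self.log.append)  # step 6: record results

    def request(self, principal: str, resource: str, privilege: str,
                grant: bool, targets: Iterable[str]) -> int:
        event = PolicyChangeRequested(self._next_id, principal, resource,
                                      privilege, grant, tuple(targets))
        self._next_id += 1
        self.log.append(event)       # step 3: record the request ...
        self._broker.publish(event)  # ... and notify the other services
        return event.request_id


class OrdinaryService:
    """S (B) / S (C): apply policies with their own rules and report the outcome."""

    def __init__(self, name: str, privileges: Set[str], broker: InMemoryBroker) -> None:
        self.name = name
        self._privileges = privileges  # privileges this service understands
        self._broker = broker
        self.grants: Set[Tuple[str, str, str]] = set()
        broker.subscribe(PolicyChangeRequested, self._on_request)  # step 4

    def _on_request(self, ev: PolicyChangeRequested) -> None:
        if self.name not in ev.targets:
            return
        key = (ev.principal, ev.resource, ev.privilege)
        if ev.privilege not in self._privileges:
            outcome = "rejected"
        elif ev.grant:
            self.grants.add(key)
            outcome = "granted"
        else:
            self.grants.discard(key)
            outcome = "revoked"
        self._broker.publish(PolicyApplied(ev.request_id, self.name, outcome))  # step 5
```

To wire it up, create one broker, one `PolicyService` and one `OrdinaryService` per target, then call `request(...)`; with this synchronous broker, S (A)'s log afterwards holds the request followed by one `PolicyApplied` per target service.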

I'm accepting @alecswan's answer as correct, since it was the starting point for reaching this conclusion.

Thanks also to @xeye, who alerted us to the business-logic aspect.