
Android (distributed application) primary key strategy


These are more questions than answers...

It does make things easier if you can auto-generate all your IDs, so you don't have to fetch them from the server and worry about whether you have a connection. You mention that you can't take the common approach (UUID or ANDROID_ID) because you will be using a long "as suggested by the Android platform".

Are you referring to the fact that Android assumes that your SQLite tables will have a long _id primary key?

Are you using a datastore or an SQL database on your server?

If you are using a datastore with hierarchical keys (e.g. Google Datastore), how about using the UUID/ANDROID_ID as the client id and a long as the data item id? On the client you then store just the long, and on the server your entities are stored with a key path of UUID/long.
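
As a rough illustration of the UUID/long key path, here is a minimal sketch in plain Java. The class name EntityKey and the "/" separator are assumptions made for this example; a real hierarchical datastore would use its own key type rather than a string.

```java
import java.util.UUID;

/** Hypothetical server-side key: the client's UUID plus the client's local long id. */
final class EntityKey {
    final UUID clientId;
    final long localId;

    EntityKey(UUID clientId, long localId) {
        this.clientId = clientId;
        this.localId = localId;
    }

    /** Renders the key as "<client UUID>/<local id>". */
    String toPath() {
        return clientId + "/" + localId;
    }

    /** Parses a path previously produced by toPath(). */
    static EntityKey fromPath(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("not a key path: " + path);
        }
        UUID client = UUID.fromString(path.substring(0, slash));
        long local = Long.parseLong(path.substring(slash + 1));
        return new EntityKey(client, local);
    }
}
```

The client only ever persists localId; the server prepends the client's UUID when it stores or looks up the entity.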

Why do you write that the "high id must be a unique value over the database"? Since it is prepended with the client id, perhaps you mean that it must be unique on the local database?

To handle your problem that the user could uninstall and reinstall the app, why not pursue your idea of "save the current high id on the server to be able to restore it on loss or on reinstallation"? Since you already plan to retrieve the client id on first run (and can't assign IDs until you have it), you might as well also ask the server for the next available high id.

Do your entities have some other key material from which you could generate a 32-bit hash for your high id? Assuming that the high id only needs to be unique on a particular client (and assuming you won't have a massive number of entities on a client), collisions should be very unlikely if you have decent key material and use a hash function that minimizes collisions. A hash can never rule them out completely, though, so you would still want to detect a duplicate on insert.
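
To make the hashing idea concrete, here is a minimal sketch that derives a 32-bit value from a string of key material using CRC32 from the Java standard library. The class and method names are hypothetical, and CRC32 is only one possible choice of hash; nothing here removes the need to handle a collision if one does occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper that turns natural key material into a 32-bit high id. */
final class HighIdHasher {

    /** Computes the CRC32 of the UTF-8 bytes of keyMaterial and narrows it to an int. */
    static int highIdFor(String keyMaterial) {
        CRC32 crc = new CRC32();
        crc.update(keyMaterial.getBytes(StandardCharsets.UTF_8));
        // getValue() returns the 32-bit checksum in a long; the cast keeps the same bits.
        return (int) crc.getValue();
    }
}
```

The same key material always yields the same high id, so the value can be recomputed after a reinstall as long as the key material itself is available.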


From my experience: use local IDs on the device and separate IDs on the server. Every time you communicate data over the wire, convert from one to the other. This will actually clarify the process and ease debugging. The conversion routines stay small, are well isolated and represent a natural element in the application architecture. The data travelling over the wire is expected to be relatively small, anyway, and ID conversion will not be a big overhead. Also, the amount of data being kept on the mobile device is, presumably, small (the bulk is on the server).

I propose to do the conversion on the device with a simple table local_ID<->server_ID. The server only needs to provide one procedure: generate a batch of keys, say 444 new keys. The mobile device then assigns these to its local IDs and sends data to the server with server_IDs only. The conversion table can occasionally be purged of unused IDs, and local IDs can be reused, so 32-bit integers will definitely suffice.
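
A minimal in-memory sketch of such a conversion table could look like the following. The class IdMapper and its method names are assumptions made for illustration; on Android the two maps would more likely be a SQLite table, and the batch would come from whatever endpoint the server exposes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the local_ID <-> server_ID table. */
final class IdMapper {
    private final Map<Integer, Long> localToServer = new HashMap<>();
    private final Map<Long, Integer> serverToLocal = new HashMap<>();
    private final Deque<Long> unusedServerIds = new ArrayDeque<>();

    /** Stores a batch of keys that was previously fetched from the server. */
    void addServerKeyBatch(List<Long> batch) {
        unusedServerIds.addAll(batch);
    }

    /** Returns the server ID for a local ID, taking a fresh one from the batch if needed. */
    long toServerId(int localId) {
        Long existing = localToServer.get(localId);
        if (existing != null) {
            return existing;
        }
        Long fresh = unusedServerIds.poll();
        if (fresh == null) {
            throw new IllegalStateException("no server keys left; fetch a new batch");
        }
        localToServer.put(localId, fresh);
        serverToLocal.put(fresh, localId);
        return fresh;
    }

    /** Returns the local ID for a server ID, or null if it is not known locally. */
    Integer toLocalId(long serverId) {
        return serverToLocal.get(serverId);
    }
}
```

Only the synchronization code would call toServerId and toLocalId; the rest of the application keeps working with local IDs.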

Motivation

The tables stay small, the implementation stays well suited to the native device architecture and isolated from unpredictable architectural changes elsewhere, and there is a convenient single point for debugging and tracing through which all data passes.

I once had an application regenerate all IDs on every data file save and load. It was unexpectedly simple and fast, and it opened up other elegant possibilities like ID-space defragmentation and consolidation.

In your case, you can change the server technology with minimal changes to the client application. Since the client can operate offline anyway, it could use only the local IDs in most functions. Only the synchronization module would fetch and convert the server-IDs.


I offered two bounties on this question and didn't find the answer I was looking for. But I spent some time thinking about the best solution, and maybe the question was not open enough and focused too much on the solution I had in mind.

There are a lot of different strategies available, however. Now (after the second bounty) I think the first question to answer is: which data model(s) do you have in your distributed environment? You might have

  1. the same (or a subset) data model on client and server
  2. different client data model and server data model

If you answer with 1), then you can choose your key strategy from

  • using GUID
  • using my approach High/Low
  • mapping keys as @user3603546 suggested

If you answer with 2), then only the following comes to my mind

  • composite id

I never liked composite IDs, but when I think about it (and don't call them composite IDs anyway), they could be a possible solution. Below I want to sketch this solution:

Glossary:

  • <client key> ... primary key generated at the client side, so the client chooses the implementation (long _id for Android)
  • <server key> ... primary key generated at the server side, so the server chooses the implementation
  • <client id> ... ID for identifying the client
  • <device id> ... ID for identifying the device, there is a 1-n relation between client and device

Solution:

  • Use it only if you have a client data model and a server data model
  • The client data model has the fields
    • <client key> primary key
    • <server key> nullable data field
  • The server data model has the fields
    • <server key> as primary key
    • <client key> nullable data field
    • <client id> as mandatory data field to distinguish the client
  • When synchronizing from server to client, generate the missing <client key> on the client and mark the entry as dirty (so that the <client key> reaches the server at the end of the day)
  • When synchronizing from client to server, generate missing <server key> on the server before saving it
  • The mapping between client and server data model can be handled by specialised frameworks like Dozer or Orika; however, the key generation must be integrated when performing the mapping.
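
The two synchronization rules above can be sketched roughly as follows. All class, field and method names are hypothetical, the "server" and "client" are plain in-memory objects, and simple counters stand in for whatever key generation each side really uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Client-side row: <client key> is the primary key, <server key> is nullable. */
final class ClientRecord {
    final long clientKey;
    Long serverKey;   // null until the server has assigned one
    boolean dirty;    // true if the row still has to be sent to the server

    ClientRecord(long clientKey, Long serverKey, boolean dirty) {
        this.clientKey = clientKey;
        this.serverKey = serverKey;
        this.dirty = dirty;
    }
}

/** Server-side row: <server key> is the primary key, <client key> is nullable. */
final class ServerRecord {
    final long serverKey;
    final Long clientKey;   // null if the row was created on the server
    final String clientId;  // mandatory, distinguishes the client

    ServerRecord(long serverKey, Long clientKey, String clientId) {
        this.serverKey = serverKey;
        this.clientKey = clientKey;
        this.clientId = clientId;
    }
}

final class SyncServer {
    private final Map<Long, ServerRecord> rows = new HashMap<>();
    private long nextServerKey = 1;

    /** Client to server: generate a missing <server key> before saving. */
    ServerRecord upload(String clientId, ClientRecord rec) {
        if (rec.serverKey == null) {
            // Stands in for returning the new key to the client in the response.
            rec.serverKey = nextServerKey++;
        }
        ServerRecord row = new ServerRecord(rec.serverKey, rec.clientKey, clientId);
        rows.put(row.serverKey, row);
        rec.dirty = false;
        return row;
    }
}

final class SyncClient {
    private final Map<Long, ClientRecord> rows = new HashMap<>();
    private long nextClientKey = 1;

    /** Server to client: generate a missing <client key> and mark the row dirty. */
    ClientRecord download(ServerRecord row) {
        if (row.clientKey == null) {
            // Dirty, so the new <client key> is sent back on the next upload.
            ClientRecord created = new ClientRecord(nextClientKey++, row.serverKey, true);
            rows.put(created.clientKey, created);
            return created;
        }
        ClientRecord known = rows.get(row.clientKey);
        if (known == null) {
            known = new ClientRecord(row.clientKey, row.serverKey, false);
            rows.put(known.clientKey, known);
        } else {
            known.serverKey = row.serverKey;
        }
        return known;
    }
}
```

Each side only ever generates its own kind of key, which is the point of the approach: neither the client nor the server has to know how the other one implements its primary keys.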

I never liked this solution because I always thought in terms of the server data model. I have entities which live only on the server, and I always wanted to create these entities on the client, which would not be possible. But when I think in terms of the client data model, I might have one entity, e.g. Product, which results in two entities (Product and ClientProduct) on the server.