How to efficiently insert and fetch UUIDs in Core Data


Store them as an ASCII string, and index the field.

EDIT

Egads, I happened to be doing some poking about, and came across this. What a shameful answer. I must have been in a bit of a mood that day. If I could, I'd just delete it and move on. However, that's not possible, so I'll provide a snip of an update.

First, the only way to know what is "efficient" is to measure, considering program time and space as well as source code complexity and programmer effort.

Fortunately, this one is pretty easy.

I wrote a very simple OS X application. The model consists of a single entity with a single attribute: identifier.

None of this matters if you do not mark the attribute as indexed. Indexing makes creating the store take considerably longer, but it makes queries much faster.
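For anyone building the model in code rather than in the Xcode model editor, here is a minimal sketch of an indexed identifier attribute. The entity name "Item" and the index name "byIdentifier" are made up for illustration; the post does not name them. It assumes the fetch index API available from macOS 10.13 / iOS 11.

```objc
#import <CoreData/CoreData.h>

// Builds a one-entity model whose `identifier` attribute is indexed.
// "Item" and "byIdentifier" are hypothetical names chosen for this sketch.
static NSManagedObjectModel *MakeIndexedModel(NSAttributeType identifierType)
{
    NSAttributeDescription *identifier = [[NSAttributeDescription alloc] init];
    identifier.name = @"identifier";
    identifier.attributeType = identifierType; // e.g. NSBinaryDataAttributeType or NSStringAttributeType
    identifier.optional = NO;

    NSEntityDescription *entity = [[NSEntityDescription alloc] init];
    entity.name = @"Item";
    entity.managedObjectClassName = NSStringFromClass([NSManagedObject class]);
    entity.properties = @[identifier];

    // Fetch index API (macOS 10.13 / iOS 11 and later). On older systems the
    // equivalent was the now-deprecated `identifier.indexed = YES`.
    NSFetchIndexElementDescription *element =
        [[NSFetchIndexElementDescription alloc] initWithProperty:identifier
                                                   collationType:NSFetchIndexElementTypeBinary];
    NSFetchIndexDescription *index =
        [[NSFetchIndexDescription alloc] initWithName:@"byIdentifier" elements:@[element]];
    entity.indexes = @[index];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[entity];
    return model;
}
```

The index has to be in place before the store is populated if you want to measure the same creation cost described here.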

Also, note that creating a predicate for a binary attribute is exactly the same as creating one for a string:

    fetchRequest.predicate = [NSPredicate predicateWithFormat:@"identifier == %@", identifier];
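Put into a complete fetch, that might look like the sketch below. The helper names are hypothetical, the entity name is a parameter because the post does not give one, and the fetchLimit of 1 is my assumption that identifiers are unique.

```objc
#import <CoreData/CoreData.h>

// Returns a request for the single object whose `identifier` equals `value`.
// `value` may be an NSData or an NSString; the predicate format is identical.
static NSFetchRequest *FetchRequestForIdentifier(NSString *entityName, id value)
{
    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:entityName];
    fetchRequest.predicate = [NSPredicate predicateWithFormat:@"identifier == %@", value];
    fetchRequest.fetchLimit = 1; // assumption: identifiers are unique
    return fetchRequest;
}

// Executes the request and returns the match, or nil if nothing matched
// or the fetch failed.
static NSManagedObject *FetchObjectWithIdentifier(NSManagedObjectContext *moc,
                                                  NSString *entityName,
                                                  id value)
{
    NSError *error = nil;
    NSArray *results = [moc executeFetchRequest:FetchRequestForIdentifier(entityName, value)
                                          error:&error];
    if (results == nil) {
        NSLog(@"Fetch failed: %@", error);
        return nil;
    }
    return results.firstObject;
}
```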

The application is very simple. First, it creates N objects and assigns a UUID to the identifier attribute of each, saving the managed object context (MOC) every 500 objects. It then stores all the identifiers in an array and randomly shuffles them. Finally, the whole Core Data stack is torn down completely to remove it all from memory.

Next, we build the stack again, then iterate over the shuffled identifiers and do a simple fetch for each one. A fetch request is constructed with a simple predicate that matches that one object. Each fetch is done inside an autoreleasepool to keep it as pristine as possible. (I acknowledge that there will be some interaction with the Core Data caches, but that's not so important, as we are just comparing the different techniques.)
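The creation phase could be sketched roughly as follows. This is not the original test code, which isn't shown; the function name and the entityName parameter are hypothetical, and it covers only the binary variant.

```objc
#import <CoreData/CoreData.h>

// Inserts `count` objects, each with a fresh 16-byte UUID, saving every 500
// inserts. Returns the identifiers in shuffled order, or nil if a save fails.
static NSArray<NSData *> *CreateObjects(NSManagedObjectContext *moc,
                                        NSString *entityName,
                                        NSUInteger count)
{
    NSMutableArray<NSData *> *identifiers = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; ++i) {
        @autoreleasepool {
            uuid_t bytes;
            [[NSUUID UUID] getUUIDBytes:bytes];
            NSData *identifier = [NSData dataWithBytes:bytes length:sizeof(bytes)];

            NSManagedObject *object =
                [NSEntityDescription insertNewObjectForEntityForName:entityName
                                              inManagedObjectContext:moc];
            [object setValue:identifier forKey:@"identifier"];
            [identifiers addObject:identifier];

            // Save in batches of 500.
            NSError *error = nil;
            if ((i + 1) % 500 == 0 && ![moc save:&error]) {
                NSLog(@"Save failed: %@", error);
                return nil;
            }
        }
    }

    // Save whatever is left over from the last partial batch.
    NSError *error = nil;
    if (moc.hasChanges && ![moc save:&error]) {
        NSLog(@"Save failed: %@", error);
        return nil;
    }

    // Fisher-Yates shuffle so the later fetches hit the store in random order.
    for (NSUInteger i = identifiers.count; i > 1; --i) {
        [identifiers exchangeObjectAtIndex:i - 1
                         withObjectAtIndex:arc4random_uniform((uint32_t)i)];
    }
    return identifiers;
}
```

Timing the call with something like CFAbsoluteTimeGetCurrent() before and after gives a creation figure; the query phase is the same idea wrapped around one fetch per identifier.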

Binary is the raw 16 bytes of the UUID.

UUID String is a 36-byte string, the result of calling [uuid UUIDString], and it looks like this (B85E91F3-4A0A-4ABB-A049-83B2A8E6085E).

Base64 String is a 24-byte string, the result of base-64 encoding the 16-byte UUID binary data, and it looks like this (uF6R80oKSrugSYOyqOYIXg==) for the same UUID.

Count is the number of objects for that run.

SQLite size is the size of the actual sqlite file.

WAL size is how big the WAL (write-ahead log) file gets - just FYI...

Create is the number of seconds to create the database, including saving.

Query is the total number of seconds to fetch all N objects, one at a time.

Data Type     | Count (N) | SQLite Size | WAL Size  | Create  | Query
--------------+-----------+-------------+-----------+---------+---------
Binary        |   100,000 |   5,758,976 | 5,055,272 |  2.6013 |  9.2669
Binary        | 1,000,000 |  58,003,456 | 4,783,352 | 59.0179 | 96.1862
UUID String   |   100,000 |  10,481,664 | 4,148,872 |  3.6233 |  9.9160
UUID String   | 1,000,000 | 104,947,712 | 5,792,752 | 68.5746 | 93.7264
Base64 String |   100,000 |   7,741,440 | 5,603,232 |  3.0207 |  9.2446
Base64 String | 1,000,000 |  77,848,576 | 4,931,672 | 63.4510 | 94.5147
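For reference, the three identifier forms compared in the table can be produced from an NSUUID along these lines. The function names are hypothetical, chosen only for this sketch.

```objc
#import <Foundation/Foundation.h>

// Raw 16-byte form of a UUID.
static NSData *BinaryIdentifier(NSUUID *uuid)
{
    uuid_t bytes;
    [uuid getUUIDBytes:bytes];
    return [NSData dataWithBytes:bytes length:sizeof(bytes)];
}

// 24-character base64 form of the same 16 bytes.
static NSString *Base64Identifier(NSUUID *uuid)
{
    return [BinaryIdentifier(uuid) base64EncodedStringWithOptions:0];
}

// Logs the size of each form for the example UUID used in this answer.
static void LogIdentifierForms(void)
{
    NSUUID *uuid = [[NSUUID alloc] initWithUUIDString:@"B85E91F3-4A0A-4ABB-A049-83B2A8E6085E"];
    NSLog(@"binary: %lu bytes", (unsigned long)BinaryIdentifier(uuid).length);
    NSLog(@"string: %@ (%lu characters)", uuid.UUIDString,
          (unsigned long)uuid.UUIDString.length);
    NSLog(@"base64: %@ (%lu characters)", Base64Identifier(uuid),
          (unsigned long)Base64Identifier(uuid).length);
}
```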

The first thing to note here is that the actual database size is much larger than the raw bytes stored (1,600,000 and 16,000,000 for the binary case), which is to be expected for a database. The amount of extra storage will be somewhat relative to the size of your actual objects (this one only stores the identifier, so the percentage of overhead will be higher).

Second, on the speed issue: for reference, doing the same 1,000,000-object query but using the object ID in the fetch took about 82 seconds. Note the stark difference between that and calling existingObjectWithID:error:, which took only 0.3065 seconds.
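To make the two object-ID lookups concrete, here is a rough sketch of each. The function names are hypothetical, and the post does not show the predicate used for the fetch-based version, so `SELF == %@` is my assumption.

```objc
#import <CoreData/CoreData.h>

// Approach 1: a regular fetch request filtered on the object ID.
// Assumption: the predicate form `SELF == %@` (the original is not shown).
static NSManagedObject *FetchByObjectID(NSManagedObjectContext *moc,
                                        NSManagedObjectID *objectID)
{
    NSFetchRequest *request =
        [NSFetchRequest fetchRequestWithEntityName:objectID.entity.name];
    request.predicate = [NSPredicate predicateWithFormat:@"SELF == %@", objectID];
    request.fetchLimit = 1;

    NSError *error = nil;
    NSArray *results = [moc executeFetchRequest:request error:&error];
    if (results == nil) {
        NSLog(@"Fetch failed: %@", error);
        return nil;
    }
    return results.firstObject;
}

// Approach 2: ask the context for the object directly.
static NSManagedObject *LookUpByObjectID(NSManagedObjectContext *moc,
                                         NSManagedObjectID *objectID)
{
    NSError *error = nil;
    NSManagedObject *object = [moc existingObjectWithID:objectID error:&error];
    if (object == nil) {
        NSLog(@"Lookup failed: %@", error);
    }
    return object;
}
```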

You should profile your own database, including judicious use of Instruments on the running code. I imagine the numbers would be somewhat different if I did multiple runs, but they are so close that it's not necessary for this analysis.

However, based on these numbers, let's look at efficiency measurements for the code execution.

  • As expected, storing the raw UUID binary data is more efficient in terms of space.
  • The creation time is pretty close (the difference appearing to be based on the time to create the strings and the extra storage space required).
  • The query times seem almost identical, with the binary data appearing to be a tiny bit slower on the larger run. I think this was the original concern -- doing a query on a binary attribute.

Binary wins space by a lot, and it can be considered a close draw on both creation time and query time. If we just consider those, storing the binary data is the clear winner.

How about source code complexity and programmer time?

Well, if you are using a modern version of iOS or OS X, there is virtually no difference, especially with a simple category on NSUUID.

However, there is one consideration for you, and that's ease of using the data in the database. When you store binary data, it's hard to get a good visual on the data.

So, if for some reason you want the data in the database to be stored in a form that is easier for humans to read, then storing it as a string is a better choice. In that case you may want to consider a base64 encoding (or some other encoding -- though remember it's already in base-256 encoding).

FWIW, here's an example category to provide easier access to the UUID as both NSData and base64 string:

    - (NSData*)data
    {
        uuid_t rawuuid;
        [self getUUIDBytes:rawuuid];
        return [NSData dataWithBytes:rawuuid length:sizeof(rawuuid)];
    }

    - (NSString*)base64String
    {
        uuid_t rawuuid;
        [self getUUIDBytes:rawuuid];
        NSData *data = [NSData dataWithBytesNoCopy:rawuuid length:sizeof(rawuuid) freeWhenDone:NO];
        return [data base64EncodedStringWithOptions:0];
    }

    - (instancetype)initWithBase64String:(NSString*)string
    {
        NSData *data = [[NSData alloc] initWithBase64EncodedString:string options:0];
        if (data.length == sizeof(uuid_t)) {
            return [self initWithUUIDBytes:data.bytes];
        }
        return self = nil;
    }

    - (instancetype)initWithString:(NSString *)string
    {
        // Try base64 first: once initWithUUIDString: fails, self is nil and
        // cannot be used for a second init attempt.
        NSData *data = [[NSData alloc] initWithBase64EncodedString:string options:0];
        if (data.length == sizeof(uuid_t)) {
            return [self initWithUUIDBytes:data.bytes];
        }
        return [self initWithUUIDString:string];
    }


Since this post seems to be fairly popular, it's worth noting that things changed a bit since 2012.

You can now use the NSUUID/UUID attribute type (NSUUIDAttributeType in Objective-C, UUIDAttributeType in Swift), added in iOS 11, instead of mapping it manually to either a string or binary data. The UUID will be stored as binary automatically, which according to the other answer is the most space-efficient way to store UUIDs in Core Data, with creation and query times roughly on par with strings.

WWDC17: What's New in Core Data

[20:21] We've added an NSUUID Attribute Type and a NSURL Attribute type backed by the UUID and URL value classes respectively.
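As a minimal sketch of what that looks like in Objective-C (the helper names are hypothetical), the attribute is declared with NSUUIDAttributeType and the NSUUID is then assigned and matched directly, with no manual conversion:

```objc
#import <CoreData/CoreData.h>

// Describes an `identifier` attribute that stores an NSUUID natively
// (macOS 10.13 / iOS 11 and later).
static NSAttributeDescription *MakeUUIDAttribute(void)
{
    NSAttributeDescription *identifier = [[NSAttributeDescription alloc] init];
    identifier.name = @"identifier";
    identifier.attributeType = NSUUIDAttributeType;
    identifier.optional = NO;
    return identifier;
}

// The NSUUID is set on the object as-is.
static void AssignIdentifier(NSManagedObject *object, NSUUID *uuid)
{
    [object setValue:uuid forKey:@"identifier"];
}

// The predicate has the same shape as for strings or binary data.
static NSPredicate *PredicateForUUID(NSUUID *uuid)
{
    return [NSPredicate predicateWithFormat:@"identifier == %@", uuid];
}
```

In the Xcode model editor the same thing is done by choosing UUID as the attribute's type.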