
dispatch_sync vs. dispatch_async on main queue


This is a common issue related to disk I/O and GCD. Basically, GCD is probably spawning one thread for each file, and at a certain point you've got too many threads for the system to service in a reasonable amount of time.

Every time you call dispatch_async() and in that block you attempt to do any I/O (for example, it looks like you're reading some files here), it's likely that the thread in which that block of code is executing will block (get paused by the OS) while it waits for the data to be read from the filesystem. The way GCD works is such that when it sees that one of its worker threads is blocked on I/O and you're still asking it to do more work concurrently, it'll just spawn a new worker thread. Thus if you try to open 50 files on a concurrent queue, it's likely that you'll end up causing GCD to spawn ~50 threads.

This is too many threads for the system to meaningfully service, and you end up starving your main thread for CPU.
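To make the failure mode concrete, here's a minimal sketch of the problematic pattern (filePaths is just a placeholder for however you enumerate the files): each block parks its worker thread on disk I/O, so GCD keeps creating new workers.

dispatch_queue_t concurrentQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
for (NSString *path in filePaths)   // 'filePaths' is a placeholder
{
    dispatch_async(concurrentQ, ^{
        // This read blocks the worker thread on disk I/O; with many files,
        // GCD responds by spawning roughly one worker thread per file.
        NSData *contents = [NSData dataWithContentsOfFile:path];
        // ... scan 'contents' ...
    });
}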

The way to fix this is to use a serial queue instead of a concurrent queue to do your file-based operations. It's easy to do. You'll want to create a serial queue and store it as an ivar in your object so you don't end up creating multiple serial queues. So remove this call:

dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

Add this in your init method:

taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);

Add this in your dealloc method:

dispatch_release(taskQ);

And add this as an ivar in your class declaration:

dispatch_queue_t taskQ;
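Put together, the pieces above look roughly like this (the class name is just a placeholder, and this assumes manual reference counting, which is why dispatch_release is needed):

@interface LPFileScanner : NSObject
{
    dispatch_queue_t taskQ;   // the serial queue, stored as an ivar
}
@end

@implementation LPFileScanner

- (id)init
{
    if ((self = [super init]))
    {
        taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)dealloc
{
    dispatch_release(taskQ);
    [super dealloc];
}

@end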


I believe Ryan is on the right path: there are simply too many threads being spawned when a project has 1,500 files (the number I decided to test with).

So, I refactored the code above to work like this:

- (void)establishImportLinksForFilesInProject:(LPProject *)aProject
{
    dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(taskQ, ^{
        // Create a new Core Data context on this thread using the same persistent
        // data store as the main thread. Pass the objectID of aProject to access
        // the managed object for that project in this thread's context:
        NSManagedObjectID *projectID = [aProject objectID];

        for (LPFile *fileToCheck in [(LPProject *)[backgroundContext objectWithID:projectID] memberFiles])
        {
            if (/* some condition is met */)
            {
                // Here, we do the scanning for @import statements.
                // When we find a valid one, we put the whole path to the
                // imported file into an array called 'verifiedImports'.

                // Pass this ID to the main thread in the dispatch call below
                // to access the same file in the main thread's context:
                NSManagedObjectID *fileID = [fileToCheck objectID];

                // Go back to the main thread and update the model
                // (Core Data is not thread-safe.)
                dispatch_async(dispatch_get_main_queue(), ^{
                    for (NSString *import in verifiedImports)
                    {
                        LPFile *targetFile = (LPFile *)[mainContext objectWithID:fileID];
                        // Add the relationship to targetFile.
                    }
                }); // end inner block
            }
        }
        // Easy way to tell when we're done processing all files.
        // Could add a dispatch_async(main_queue) call here to do UI updates, etc.
    }); // end outer block
}

So, basically, we're now spawning one thread that reads all the files instead of one thread per file. Also, it turns out that calling dispatch_async() on the main queue is the correct approach: the worker thread will dispatch that block to the main thread and NOT wait for it to return before proceeding to scan the next file.
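For anyone weighing the two calls from the question title, here is a minimal sketch of the difference as seen from the worker thread (the block bodies are placeholders):

// dispatch_async: hand the block to the main queue and return immediately,
// so the worker can move on to scanning the next file.
dispatch_async(dispatch_get_main_queue(), ^{
    // update the model on the main thread
});

// dispatch_sync: the worker thread would stop here until the main thread had
// finished running the block, serializing the whole scan behind the main queue.
dispatch_sync(dispatch_get_main_queue(), ^{
    // update the model on the main thread
});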

This implementation essentially sets up a "serial" queue as Ryan suggested (the for loop is the serial part of it), but with one advantage: when the for loop ends, we're done processing all the files and we can just stick a dispatch_async(main_queue) block there to do whatever we want. It's a very nice way to tell when the concurrent processing task is finished, something my old version didn't have.
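As a sketch of that completion trick ('files' stands in for whatever collection is being processed):

dispatch_async(taskQ, ^{
    for (LPFile *fileToCheck in files)   // 'files' is a placeholder
    {
        // ... scan the file, dispatch_async model updates to the main queue ...
    }
    // The loop has finished, so every file has been scanned.
    dispatch_async(dispatch_get_main_queue(), ^{
        // e.g. refresh the UI or hide a progress indicator
    });
});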

The disadvantage here is that it's a bit more complicated to work with Core Data on multiple threads. But this approach seems to be bulletproof for projects with 5,000 files (the largest I've tested).
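For completeness, the background context the comments above refer to can be set up along these lines (a rough sketch using the thread-confinement pattern of that era; mainContext is assumed to be the main thread's context):

NSManagedObjectContext *backgroundContext = [[NSManagedObjectContext alloc] init];
[backgroundContext setPersistentStoreCoordinator:[mainContext persistentStoreCoordinator]];
// Objects are handed between threads only by NSManagedObjectID,
// never as live NSManagedObject instances.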


I think it's easier to understand with a diagram:

For the situation the author described:

|taskQ|          ***********start|
|dispatch_1      ***********|---------
|dispatch_2      *************|---------
 .
 .
 .
|dispatch_n      ***************************|----------

|main queue (sync)|  **start to dispatch to main|
*************************|--dispatch_1--|--dispatch_2--|--dispatch_3--| ... |--dispatch_n--|

which makes the main queue (fed with dispatch_sync) so busy that the task eventually fails.
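In code, the situation the diagram depicts looks roughly like this ('fileCount' is a placeholder): every concurrently running block stops at dispatch_sync until the main queue has handled its update, so the main queue ends up churning through dispatch_1 ... dispatch_n back to back.

dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
for (NSUInteger i = 0; i < fileCount; i++)   // 'fileCount' is a placeholder
{
    dispatch_async(taskQ, ^{
        // ... read and scan one file ...
        dispatch_sync(dispatch_get_main_queue(), ^{
            // update the model; this worker stays blocked until the main thread finishes
        });
    });
}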