
How can I get breakpoints/logs/increased visibility when my Main Thread blocks?


So I set forth to answer my own question this weekend. For the record, this endeavor turned into something pretty complex, so as Kendall Helmstetter Glen suggested, most folks reading this question should probably just muddle through with Instruments. For the masochists in the crowd, read on!

It was easiest to start by restating the problem. Here's what I came up with:

I want to be alerted to long periods of time spent in syscalls/mach_msg_trap that are not legitimate idle time. "Legitimate idle time" is defined as time spent in mach_msg_trap waiting for the next event from the OS.

Also importantly, I didn't care about user code that takes a long time. That problem is quite easy to diagnose and understand using Instruments' Time Profiler tool. I wanted to know specifically about blocked time. While it's true that you can also diagnose blocked time with Time Profiler, I've found it considerably harder to use for that purpose. Likewise, the System Trace instrument is also useful for investigations like this, but it's extremely fine-grained and complex. I wanted something simpler -- more targeted at this specific task.

It seemed evident from the get-go that the tool of choice here would be DTrace. I started out by using a CFRunLoop observer that fired on kCFRunLoopAfterWaiting and kCFRunLoopBeforeWaiting. A call to my kCFRunLoopBeforeWaiting handler would indicate the beginning of a "legitimate idle time", and the kCFRunLoopAfterWaiting handler would be the signal to me that a legitimate wait had ended. I would use the DTrace pid provider to trap on calls to those functions as a way to sort legitimate idle from blocking idle.
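In DTrace terms, that first approach boils down to something like the sketch below. The handler names (and the MyApp module) are purely illustrative stand-ins for whatever the observer callbacks are actually called in your code; you'd run it against a live process with something like sudo dtrace -s idle.d -p <pid>.

    /* Sketch of the (ultimately abandoned) observer approach.
       idleBeginHandler/idleEndHandler are hypothetical names for the
       kCFRunLoopBeforeWaiting / kCFRunLoopAfterWaiting callbacks. */
    pid$target:MyApp:idleBeginHandler:entry
    {
        self->legit_idle = 1;
        self->idle_start = timestamp;
    }

    pid$target:MyApp:idleEndHandler:entry
    /self->legit_idle/
    {
        @idle_ns = sum(timestamp - self->idle_start);
        self->legit_idle = 0;
    }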

This approach got me started, but in the end proved to be flawed. The biggest problem is that many AppKit operations are synchronous, in that they block event processing in the UI, but actually spin the RunLoop lower in the call stack. Those spins of the RunLoop are not "legitimate" idle time (for my purposes), because the user can't interact with the UI during that time. They're valuable, to be sure -- imagine a runloop on a background thread watching a bunch of RunLoop-oriented I/O -- but the UI is still blocked when this happens on the main thread. For example, I put the following code into an IBAction and triggered it from a button:

    NSMutableURLRequest *req = [NSMutableURLRequest requestWithURL: [NSURL URLWithString: @"http://www.google.com/"]
                                                       cachePolicy: NSURLRequestReloadIgnoringCacheData
                                                   timeoutInterval: 60.0];
    NSURLResponse *response = nil;
    NSError *err = nil;
    [NSURLConnection sendSynchronousRequest: req returningResponse: &response error: &err];

That code doesn't prevent the RunLoop from spinning -- AppKit spins it for you inside the sendSynchronousRequest:... call -- but it does prevent the user from interacting with the UI until it returns. This is not "legitimate idle" to my mind, so I needed a way to sort out which idles were which. (The CFRunLoopObserver approach was also flawed in that it required changes to the code, which my final solution does not.)

I decided that I would model my UI/main thread as a state machine. It was in one of three states at all times: LEGIT_IDLE, RUNNING, or BLOCKED, and would transition back and forth between those states as the program executed. I needed to come up with DTrace probes that would allow me to catch (and therefore measure) those transitions. The final state machine I implemented was quite a bit more complicated than just those three states, but that's the 20,000 ft view.
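For concreteness, the skeleton of such a state machine in D might look like this. The constant and variable names here are my own shorthand, not necessarily what the final script uses; the later sketches build on these definitions.

    /* Three-state model of the main thread, held in D globals. */
    inline int RUNNING    = 0;
    inline int LEGIT_IDLE = 1;
    inline int BLOCKED    = 2;

    int state;              /* current state of the main thread */
    uint64_t idle_start;    /* when the current legit idle began */
    uint64_t block_start;   /* when the current kernel visit began */
    uint64_t blocked_ns;    /* blocked time accumulated this pass */

    BEGIN
    {
        state = RUNNING;
    }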

As described above, sorting out legitimate idle from bad idle was not straightforward, since both cases end up in mach_msg_trap() and __CFRunLoopRun. I couldn't find one simple artifact in the call stack that I could use to reliably tell the difference; it appeared that a simple probe on one function was not going to help me. I ended up using the debugger to look at the state of the stack during various instances of legitimate idle vs. bad idle. I determined that during legitimate idle, I'd (seemingly reliably) see a call stack like this:

    #0  in mach_msg
    #1  in __CFRunLoopServiceMachPort
    #2  in __CFRunLoopRun
    #3  in CFRunLoopRunSpecific
    #4  in RunCurrentEventLoopInMode
    #5  in ReceiveNextEventCommon
    #6  in BlockUntilNextEventMatchingListInMode
    #7  in _DPSNextEvent
    #8  in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
    #9  in -[NSApplication run]
    #10 in NSApplicationMain
    #11 in main

So I endeavored to set up a bunch of nested/chained pid probes that would establish when I had arrived at, and subsequently left, this state. Unfortunately, for whatever reason DTrace's pid provider doesn't seem to be universally able to probe both entry and return on all arbitrary symbols. Specifically, I couldn't get probes on pid000:*:__CFRunLoopServiceMachPort:return or on pid000:*:_DPSNextEvent:return to work. The details aren't important, but by watching various other goings-on, and keeping track of certain state, I was able to establish (again, seemingly reliably) when I had entered and left the legit idle state.
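The flavor of that workaround is roughly the following, building on the state constants sketched above. This is a deliberate simplification: clearing the flags in mach_msg:return stands in for the messier bookkeeping the real script needs, precisely because _DPSNextEvent:return can't be probed.

    /* Treat mach_msg called underneath _DPSNextEvent as legit idle. */
    pid$target::_DPSNextEvent:entry
    {
        self->in_dps = 1;
    }

    pid$target::mach_msg:entry
    /self->in_dps && state == RUNNING/
    {
        state = LEGIT_IDLE;
        idle_start = timestamp;
    }

    pid$target::mach_msg:return
    /self->in_dps && state == LEGIT_IDLE/
    {
        @legit_idle_ns = sum(timestamp - idle_start);
        state = RUNNING;
        self->in_dps = 0;   /* simplification; see note above */
    }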

Then I had to determine probes for telling the difference between RUNNING and BLOCKED. That was a bit easier. In the end, I chose to consider BSD system calls (via DTrace's syscall provider) and calls to mach_msg_trap() (via the pid provider) not occurring in periods of legit idle to be BLOCKED. (I did look at DTrace's mach_trap provider, but it did not seem to do what I wanted, so I fell back to using the pid provider.)
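In sketch form, again reusing the globals from above (and simplified in that a real script would also pin these probes to the main thread, e.g. by recording its tid at startup):

    /* Any BSD syscall outside legit idle counts as BLOCKED time. */
    syscall:::entry
    /pid == $target && state == RUNNING/
    {
        state = BLOCKED;
        block_start = timestamp;
    }

    syscall:::return
    /pid == $target && state == BLOCKED/
    {
        blocked_ns += timestamp - block_start;
        state = RUNNING;
    }

    /* Same idea for mach_msg_trap(), via the pid provider. */
    pid$target::mach_msg_trap:entry
    /state == RUNNING/
    {
        state = BLOCKED;
        block_start = timestamp;
    }

    pid$target::mach_msg_trap:return
    /state == BLOCKED/
    {
        blocked_ns += timestamp - block_start;
        state = RUNNING;
    }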

Initially, I did some extra work with the DTrace sched provider to actually measure real blocked time (i.e. time when my thread had been suspended by the scheduler), but that added considerable complexity, and I ended up thinking to myself, "If I'm in the kernel, what do I care whether the thread is actually asleep or not? It's all the same to the user: it's blocked." So the final approach just measures all time in (syscalls || mach_msg_trap()) && !legit_idle and calls that the blocked time.

At this point, catching a single kernel call of long duration (like a call to sleep(5)) is trivial. However, more often UI thread latency comes from many little latencies accumulating over multiple calls into the kernel (think of hundreds of calls to read() or select()), so I thought it would also be desirable to dump SOME call stack when the overall amount of syscall or mach_msg_trap time in a single pass of the event loop exceeded a certain threshold. I ended up setting up various timers and logging accumulated time spent in each state, scoped to various states in the state machine, and dumping alerts when we happened to be transitioning out of the BLOCKED state and had gone over some threshold. This method will obviously produce data that is subject to misinterpretation, or might be a total red herring (i.e. some random, relatively quick syscall that just happens to push us over the alert threshold), but I feel it's better than nothing.
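The alert itself can hang off the BLOCKED -> RUNNING transition. In the sketch below the 50 ms threshold is an arbitrary number picked for illustration; in D, clauses on the same probe fire in script order, so a clause like this would go before the one above that flips the state back to RUNNING (and blocked_ns gets reset when the next legit idle begins).

    /* Complain, with a stack, when leaving BLOCKED pushes the
       accumulated blocked time for this pass over 50 ms. */
    syscall:::return
    /pid == $target && state == BLOCKED &&
     blocked_ns + timestamp - block_start > 50000000/
    {
        printf("main thread blocked ~%d ms in this pass\n",
            (int)((blocked_ns + timestamp - block_start) / 1000000));
        ustack();
    }

Note that the stack you get is whatever call happened to cross the threshold, which is exactly the red-herring caveat above.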

In the end, the DTrace script keeps the state machine in D variables, uses the described probes to track transitions between the states, and gives me the opportunity to do things (like print alerts) when the state machine transitions, based on certain conditions. I played around a bit with a contrived sample app that does a bunch of disk I/O and network I/O and calls sleep(), and I was able to catch all three of those cases, without distractions from data pertaining to legitimate waits. This was exactly what I was looking for.

This solution is obviously quite fragile, and thoroughly awful in almost every regard. :) It may or may not be useful to me, or anyone else, but it was a fun exercise, so I thought I'd share the story, and the resulting DTrace script. Maybe someone else will find it useful. I also must confess to being a relative n00b with respect to writing DTrace scripts, so I'm sure I've done a million things wrong. Enjoy!

The script was too big to post inline, so it's kindly being hosted by @Catfish_Man over here: MainThreadBlocking.d


Really, this is the kind of job for the Time Profiler instrument. I believe you can see where time is spent in code, per thread, so you'd go see what code was taking a while to perform, and that would tell you what was potentially blocking the UI.