How does a process know that a semaphore is available?


The operating system schedules the waiting process again once another process tells it that it has finished with the semaphore.

Semaphores are just one of the ways of interacting with the OS scheduler.

The kernel doesn't poll the semaphore; it doesn't need to. Every time a process calls sem_post() (or equivalent), that involves interaction with the kernel. What the kernel does during the sem_post() is look up whatever processes have previously called sem_wait() on the same semaphore. If one or more processes have called sem_wait(), it typically picks the one with the highest priority and schedules it. From the waiting process's point of view, its sem_wait() finally returns and it carries on executing.
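As a rough illustration of that hand-off, here is a minimal sketch using POSIX unnamed semaphores and two threads. The names data_ready, shared_value, producer and wait_for_producer are made up for this example; only sem_init(), sem_wait(), sem_post() and sem_destroy() are the real API.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t data_ready;   /* hypothetical name: posted once the data is ready */
static int shared_value;   /* written by the producer before it posts */

static void *producer(void *arg) {
    (void)arg;
    shared_value = 42;         /* prepare the data first */
    sem_post(&data_ready);     /* wakes a thread blocked in sem_wait(), if any */
    return NULL;
}

/* Returns the value handed over by the producer, or -1 on error. */
int wait_for_producer(void) {
    pthread_t tid;

    if (sem_init(&data_ready, 0, 0) != 0)   /* count starts at 0: "not available" */
        return -1;
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;

    /* Blocks (no polling) until the producer posts; retry if a signal interrupts. */
    while (sem_wait(&data_ready) != 0) {
        if (errno != EINTR)
            return -1;
    }

    pthread_join(tid, NULL);
    sem_destroy(&data_ready);
    return shared_value;
}
```

Calling wait_for_producer() from main() (built with -pthread) should hand back the producer's value; the caller never spins, it simply isn't scheduled until the post happens.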

How This is Implemented Under the Hood

Fundamentally the kernel needs to implement something called an "atomic test and set". That is an operation whereby the value of some variable can be tested and, if a certain condition is met (such as value == 0), the variable is altered (e.g. value = 1). If this succeeds, the kernel will do one thing (like schedule a process); if it does not (because the condition value == 0 was false), the kernel will do something different (like put a process on the do-not-schedule list). The 'atomic' part is that this decision is made without anything else being able to look at and change the same variable at the same time.
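This is not kernel code, but the same test-and-set idea can be sketched in user space with C11 atomics. The names lock_word, try_acquire and release are invented for the example; atomic_compare_exchange_strong() is the standard library call that does the "test value == 0, and if so set value = 1" step as one indivisible operation.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_word = 0;   /* hypothetical: 0 = free, 1 = taken */

/* Atomically: if lock_word == 0, set it to 1 and report success.
 * If it was already 1, leave it alone and report failure. */
bool try_acquire(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&lock_word, &expected, 1);
}

/* Mark the variable as free again. */
void release(void) {
    atomic_store(&lock_word, 0);
}
```

A first try_acquire() succeeds, a second one fails until release() is called. What the caller does on failure (spin, or ask the kernel to put it to sleep) is exactly the "do something different" branch described above.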

There are several ways of doing this. One is to suspend all processes (or at least all activity within the kernel) so that nothing else is testing the value of the variable at the same time. That's not very fast.

For example, the Linux kernel once had something called the Big Kernel Lock. I don't know if this was used to process semaphore interactions, but that's the kind of thing that OSes used to have for atomic test & sets.

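These days CPUs have atomic test & set op codes, which is a lot faster. The good ole' Motorola 68000 had one of these (TAS) a long time ago. The x86 has had locked exchange instructions (XCHG, the LOCK prefix) since its early days and gained compare-and-exchange (CMPXCHG) later, while the PowerPC does the same job with a load-reserve / store-conditional pair (lwarx / stwcx.).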

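If you root around inside Linux you'll find mention of futexes. A futex is a "fast userspace mutex": it relies on the CPU's atomic instructions to handle the uncontended case without entering the kernel at all, and only makes a system call when a process actually has to sleep or be woken.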
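To give a feel for how a futex is used, here is a Linux-only sketch of a one-shot "event" built directly on the futex system call. The names flag, futex_call, event_wait, event_post and futex_demo are hypothetical; glibc has no futex() wrapper, so the sketch goes through syscall(). It assumes atomic_int has the same size and layout as a plain int, which holds on mainstream Linux targets. It is deliberately simplistic: the poster always calls into the kernel, whereas a real implementation would skip that when nobody is waiting.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int flag;   /* hypothetical: 0 = not posted, 1 = posted */

/* Thin wrapper around the raw futex system call. */
static long futex_call(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void event_wait(void) {
    /* Fast path: if flag is already 1 we never enter the kernel. */
    while (atomic_load(&flag) == 0) {
        /* The kernel puts us to sleep only if flag is still 0 when it checks. */
        futex_call(&flag, FUTEX_WAIT, 0);
    }
}

void event_post(void) {
    atomic_store(&flag, 1);
    futex_call(&flag, FUTEX_WAKE, 1);   /* wake at most one sleeper */
}

static void *poster(void *arg) {
    (void)arg;
    event_post();
    return NULL;
}

/* Returns 1 once the waiter has seen the post, or -1 on error. */
int futex_demo(void) {
    pthread_t tid;
    atomic_store(&flag, 0);
    if (pthread_create(&tid, NULL, poster, NULL) != 0)
        return -1;
    event_wait();
    pthread_join(tid, NULL);
    return atomic_load(&flag);
}
```

The loop around FUTEX_WAIT matters: the call can return spuriously or because the value had already changed, so the waiter always rechecks the flag itself.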

Post a Semaphore in Hardware

A variation is a mailbox semaphore. This is extremely useful in some types of system where hardware needs to wake up a process at the end of a DMA transfer. A mailbox is a special location in memory which, when written to, causes an interrupt to be raised. The kernel can turn this into a semaphore because, when that interrupt is raised, it goes through the same motions as it would if something had called sem_post().

This is incredibly handy; a device can DMA a large amount of data to some pre-arranged buffer, and top that off with a small DMA transfer to the mailbox. The kernel handles the interrupt, and if a process has previously called sem_wait() on the mailbox semaphore the kernel schedules it. The process, which also knows about this pre-arranged buffer, can then process the data.

On real-time DSP systems this is very useful, because it's very fast and very low latency; it allows a process to receive data from some device with very little delay. The alternative, a full device driver stack that uses read() / write() to transfer data from the device to the process, is incredibly slow by comparison.

Speed

The speed of semaphore interactions depends entirely on the OS.

For OSes like Windows and Linux, the context switch time is fairly slow (in the order of several microseconds, if not tens of microseconds). Basically this means that when a process calls something like sem_post(), the kernel is doing a lot of different things whilst it has the opportunity before finally returning control to the process(es). What it's doing during this time could be, well, almost anything!

If a program makes use of a lot of threads, and they're all rapidly interacting with each other using semaphores, quite a lot of time is lost to sem_post() and sem_wait(). This places an emphasis on doing a decent amount of work once a process has returned from sem_wait() before calling the next sem_post().

However, on OSes like VxWorks the context switch time is lightning fast. That is, there's very little code in the kernel that gets run when sem_post() is called. The result is that a semaphore interaction is a lot more efficient. Moreover, an OS like VxWorks is written in such a way as to guarantee that the time taken to do all this sem_post() / sem_wait() work is constant.

This influences the architecture of one's software on these systems. On VxWorks, where a context switch is cheap, there's very little penalty in having a large number of threads all doing quite small tasks. On Windows / Linux there's more of an emphasis on the opposite.

This is why OSes like VxWorks are excellent for hard real time applications, and Windows / Linux are not.

The Linux PREEMPT_RT patch set in part aims to improve the latency of the Linux kernel during operations like this. For example, it pushes a lot of device interrupt handlers (device drivers) up into kernel threads; these are scheduled almost like any other thread. The idea is to reduce the amount of work that is done by the kernel itself (and have more done by kernel threads), so that the work it still has to do (such as handling sem_post() / sem_wait()) takes less time and takes a more consistent amount of time. It's still not a hard guarantee of latency, but it's a pretty good improvement. This is what we call a soft-realtime kernel. The impact, though, is that the overall throughput of the machine can be lower.

Signals

Signals are nasty, horrible things that really get in the way of using things like sem_post() and sem_wait(). I avoid them like the plague.

If you are on a Linux platform and you do have to use signals, take a serious long look at signalfd (see its man page). This is a far better way of dealing with signals because you can choose to accept them at a convenient time (simply by calling read()), instead of having to handle them as soon as they occur. Certainly if you're using epoll() or select() anywhere at all in a program then signalfd is the way to go.