Is it possible to create threads without system calls in Linux x86 GAS assembly?



The short answer is that you can't. When you write assembly code it runs sequentially (or with branches) on one and only one logical (i.e. hardware) thread. If you want some of the code to execute on another logical thread (whether on the same core, on a different core on the same CPU or even on a different CPU), you need to have the OS set up the other thread's instruction pointer (CS:EIP) to point to the code you want to run. This implies using system calls to get the OS to do what you want.
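For concreteness, here is roughly what that system-call route looks like on 32-bit Linux: a raw clone(2) that starts a second kernel thread sharing the address space. Treat this as a sketch only; the flag constant, the labels and the one-second sleep are illustrative choices of mine, not the only way to do it (build with as --32 and ld -m elf_i386):

    # Start a second kernel thread with the raw clone system call (i386 ABI).
            .data
    msg:    .ascii "hello from the new thread\n"
    msg_len = . - msg
    ts:     .long 1, 0                      # struct timespec { 1 s, 0 ns }

            .section .bss
            .align 16
    child_stack:
            .skip 4096                      # stack for the new thread

            .text
            .globl _start
    _start:
            movl $120, %eax                 # __NR_clone
            movl $0x00010f00, %ebx          # CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD
            leal child_stack+4096, %ecx     # top of the new thread's stack
            xorl %edx, %edx                 # parent_tid / tls / child_tid unused
            xorl %esi, %esi
            xorl %edi, %edi
            int $0x80                       # the kernel creates the thread
            testl %eax, %eax
            jz thread_entry                 # the new thread sees %eax == 0

            # original thread: give the new one a moment, then end the process
            movl $162, %eax                 # __NR_nanosleep
            movl $ts, %ebx
            xorl %ecx, %ecx
            int $0x80
            movl $252, %eax                 # __NR_exit_group
            xorl %ebx, %ebx
            int $0x80

    thread_entry:
            movl $4, %eax                   # __NR_write
            movl $1, %ebx                   # stdout
            movl $msg, %ecx
            movl $msg_len, %edx
            int $0x80
            movl $1, %eax                   # __NR_exit ends only this thread
            xorl %ebx, %ebx
            int $0x80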

User threads won't give you the threading support that you want, because they all run on the same hardware thread.

Edit: Incorporating Ira Baxter's answer about PARLANSE. If you ensure that your program has a thread running on each logical (hardware) thread to begin with, then you can build your own scheduler without relying on the OS. Either way, you need a scheduler to handle hopping from one thread to another. Between calls to the scheduler, there are no special assembly instructions needed to handle multithreading. The scheduler itself doesn't rely on any special assembly either, but rather on conventions between the parts of the scheduler in each thread.

Either way, whether or not you use the OS, you still have to rely on some scheduler to handle cross-thread execution.
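To make those "conventions" concrete, here is a minimal sketch of what the switch itself can look like in 32-bit GAS. The routine name and its two-argument interface are my own invention; it leans on the cdecl rule that only %ebx, %esi, %edi, %ebp (and %esp) have to survive a call:

    # swap_context(&old_sp, new_sp): park the calling thread, resume another.
            .text
            .globl swap_context
    swap_context:
            movl 4(%esp), %eax      # where to record the old thread's %esp
            movl 8(%esp), %edx      # saved %esp of the thread to resume
            pushl %ebp              # callee-saved registers (cdecl)
            pushl %ebx
            pushl %esi
            pushl %edi
            movl %esp, (%eax)       # the old thread is now just this pointer
            movl %edx, %esp         # switch stacks: we are the other thread
            popl %edi
            popl %esi
            popl %ebx
            popl %ebp
            ret                     # resume wherever it last called swap_context

A brand-new thread only needs its stack pre-filled with four dummy register slots and a "return address" pointing at its entry function; the first swap_context into it then "returns" into that function.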


"Doctor, doctor, it hurts when I do this". Doctor: "Don't do that".

The short answer is that you can do multithreaded programming without calling expensive OS task-management primitives. Simply ignore the OS for thread-scheduling operations. This means you have to write your own thread scheduler, and simply never pass control back to the OS. (And you have to be cleverer somehow about your thread overhead than the pretty smart OS guys.) We chose this approach precisely because Windows process/thread/fiber calls were all too expensive to support computation grains of a few hundred instructions.

Our PARLANSE programming language is a parallel programming language: see http://www.semdesigns.com/Products/Parlanse/index.html

PARLANSE runs under Windows, offers parallel "grains" as the abstract parallelism construct, and schedules such grains by a combination of a highly tuned hand-written scheduler and scheduling code generated by the PARLANSE compiler that takes into account the context of the grain to minimize scheduling overhead. For instance, the compiler ensures that the registers of a grain contain no information at the point where scheduling (e.g., "wait") might be required, and thus the scheduler code only has to save the PC and SP. In fact, quite often the scheduler code doesn't get control at all; a forked grain simply stores the forking PC and SP, switches to a compiler-preallocated stack and jumps to the grain code. Completion of the grain will restart the forker.
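The actual generated code isn't shown here, but the store-the-forking-PC-and-SP-then-jump idea can be pictured with a deliberately oversimplified sketch (one forker, no work queues, all symbols invented):

    # Caller does "call fork_grain"; the grain runs on its own stack and, on
    # completion, restarts the forker as if fork_grain had returned normally.
            .data
    forker_pc:      .long 0
    forker_sp:      .long 0

            .section .bss
            .align 16
    grain_stack:    .skip 4096              # the "compiler-preallocated" stack

            .text
    fork_grain:
            movl %esp, forker_sp            # remember how to restart the forker
            movl $resume_forker, forker_pc
            movl $grain_stack+4096, %esp    # run the grain on its own stack
            jmp grain_code

    grain_done:                             # completion of the grain restarts the forker
            movl forker_sp, %esp
            jmp *forker_pc

    resume_forker:
            ret                             # forker continues past its call site

    grain_code:
            # ...a few hundred instructions of grain work would go here...
            jmp grain_done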

Normally there's an interlock to synchronize grains, implemented by the compiler using native LOCK DEC instructions that implement what amounts to counting semaphores. Applications can fork logically millions of grains; the scheduler limits parent grains from generating more work if the work queues are long enough that more work won't be helpful. The scheduler implements work-stealing to allow work-starved CPUs to grab ready grains from neighboring CPUs' work queues. This has been implemented to handle up to 32 CPUs, but we're a bit worried that the x86 vendors may actually swamp us with more than that in the next few years!
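Roughly, such a LOCK DEC interlock amounts to the following (an illustrative sketch of mine rather than PARLANSE's real code; a real grain scheduler would switch to other ready work instead of spinning):

    # A counting semaphore built on lock dec / lock inc.
            .data
    sem_count:      .long 1                 # units currently available

            .text
    sem_wait:
            lock decl sem_count             # atomically claim one unit
            jns 1f                          # result still >= 0: we got one
            lock incl sem_count             # nothing was available: undo
            pause                           # be kind to the sibling hyperthread
            jmp sem_wait                    # (real code: run another grain here)
    1:      ret

    sem_signal:
            lock incl sem_count             # hand a unit back
            ret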

PARLANSE is a mature language; we've been using it since 1997, and have implemented a several-million-line parallel application in it.


Implement user-mode threading.

Historically, threading models are generalised as N:M, which is to say N user-mode threads running on M kernel-mode threads. Modern usage is 1:1, but it wasn't always like that and it doesn't have to be like that.

You are free to maintain an arbitrary number of user-mode threads within a single kernel thread. It's just that it's your responsibility to switch between them sufficiently often that it all looks concurrent. Your threads are of course co-operative rather than pre-emptive; you basically scatter yield() calls throughout your own code to ensure regular switching occurs.
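Spawning such a user-mode thread is then mostly a matter of priming a fresh stack so that the first switch into it "returns" into its entry function. The sketch below (all names invented) assumes a switch routine that pushes %ebp, %ebx, %esi and %edi and then saves %esp, like the one sketched under the first answer; yield() becomes a wrapper that parks the current thread the same way, picks the next runnable entry from your run queue, and switches to its saved stack pointer.

    # thread_init(&thread_sp, stack_top, entry): prime a new user-mode thread.
    # stack_top is the highest address of a region you allocated for it; the
    # entry function must never return (it should switch away or call an exit
    # routine), because nothing sits above it on this stack.
            .text
            .globl thread_init
    thread_init:
            movl 4(%esp), %eax          # where to record the thread's saved %esp
            movl 8(%esp), %edx          # top of its freshly allocated stack
            movl 12(%esp), %ecx         # its entry function
            subl $20, %edx              # 4 dummy register slots + "return address"
            movl %ecx, 16(%edx)         # the switch routine's ret lands on entry
            movl $0, 12(%edx)           # initial %ebp
            movl $0, 8(%edx)            # initial %ebx
            movl $0, 4(%edx)            # initial %esi
            movl $0, (%edx)             # initial %edi
            movl %edx, (%eax)           # first switch into this slot starts the thread
            ret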