What is the difference between fork and thread? What is the difference between fork and thread? multithreading multithreading

What is the difference between fork and thread?


A fork gives you a brand new process, which is a copy of the current process, with the same code segments. As the memory image changes (typically this is due to different behavior of the two processes) you get a separation of the memory images (Copy On Write), however the executable code remains the same. Tasks do not share memory unless they use some Inter Process Communication (IPC) primitive.

One process can have multiple threads, each executing in parallel within the same context of the process. Memory and other resources are shared among threads, therefore shared data must be accessed through some primitive and synchronization objects (like mutexes, condition variables and semaphores) that allow you to avoid data corruption.


Fork

Fork is nothing but a new process that looks exactly like the old or the parent process but still it is a different process with different process ID and having its own memory. The parent process creates a separate address space for the child. Both parent and child process possess the same code segment, but execute independently from each other.

The simplest example of forking is when you run a command on shell in Unix/Linux. Each time a user issues a command, the shell forks a child process and the task is done.

When a fork system call is issued, a copy of all the pages corresponding to the parent process is created, loaded into a separate memory location by the OS for the child process, but in certain cases, this is not needed. Like in ‘exec’ system calls, there is no need to copy the parent process pages, as execv replaces the address space of the parent process itself.

Few things to note about forking are:

  • The child process will be having its own unique process ID.
  • The child process shall have its own copy of the parent’s file descriptor.
  • File locks set by parent process shall not be inherited by child process.
  • Any semaphores that are open in the parent process shall also be open in the child process.
  • Child process shall have its own copy of the parent's message queue descriptors.
  • Child will have its own address space and memory.

Threads

Threads are Light Weight Processes (LWPs). Traditionally, a thread is just a CPU (and some other minimal state) state with the process containing the rest (data, stack, I/O, signals). Threads require less overhead than “forking” or spawning a new process because the system does not initialize a new system virtual memory space and environment for the process. While most effective on a multiprocessor system where the process flow can be scheduled to run on another processor thus gaining speed through parallel or distributed processing, gains are also found on uniprocessor systems which exploit latency in I/O and other system functions which may halt process execution.

Threads in the same process share:

  • process instructions
  • most data
  • open files (descriptors)
  • signals and signal handlers
  • current working directory
  • user and group id

More details can be found here.


Dacav's answer is excellent, I just wanted to add that not all threading models give you true multi-processing.

For example, Ruby's default threading implementation doesn't use true OS / kernel threads. Instead it mimics having multiple threads by switching between the Thread objects within a single kernel thread / process.

This is important on multiprocessor / multi-core systems, because these types of lightweight threads can only run on a single core - you don't get much in the way of performance boost from having multiple threads.

The other place this makes a difference is when one thread blocks (waiting on I/O or calling a driver's IOCTL), all Threads block.

This isn't very common nowadays - most threading implementations use kernel threads which don't suffer from these issues - but its worth mentioining for completeness.

By contrast, fork gives you another process which is runnable simultaneously on another physical CPU while the original process is executing. Some people find IPC more suitable for their app, others prefer threading.

Good luck and have fun! Multi-threading is both challenging and rewarding.