Slides: 04 - Threads
Multi-Threading
Section titled “Multi-Threading”Motivation Thread Creation/Manipulation/Termination Threading models
Traditional Process Model: One Thread per Process
Section titled “Traditional Process Model: One Thread per Process”So far, each process had only one “thread of control” A thread is a “unit of execution” (hardware execution state: PC, general purpose registers, stack and SP)
For major OSs such as Unix and Windows, this has been the process model until mid-1990s This model is not efficient & is difficult to use for many non-trivial tasks!
Stack SP DATA (static data & heap)
PC
Multi-threading: Motivation
Section titled “Multi-threading: Motivation”Assume you need to compute the sum of N numbers in a parallel machine with 4 CPU cores. How to implement this?
Have the main application read the numbers into an array
fork() 4 copies of itself, each computing the sum of N/4 numbers
Merge the results
N/4
N/4
N/4
N/4
data
Fork 4 worker child processes
Main
Application
Worker3
Worker4
Worker1
Worker2
Send the partial sum to the main application (master)
Pseudocode for parallel sum
Section titled “Pseudocode for parallel sum”

Parallel Sum Process & IPC Model
Section titled “Parallel Sum Process & IPC Model”Computer running a POSIX OS
Worker1 Worker2 Worker3 Worker4 Master Process
Stack Stack Stack Stack Stack SP SP SP SP SP DATA DATA DATA DATA DATA PC
PC PC PC PC
Kernel partialSum Pipe
Parallel sum code with a pipe
Section titled “Parallel sum code with a pipe”04-Threads/ex1_pipe.c


Pros & Cons of this Solution
Section titled “Pros & Cons of this Solution”It works (was the model of computation for a long time), BUT:
Creating a new heavy-weight process using fork() is a very time-consuming operation. For each child process, fork() must create:
A new PCB with resources shared with the parent (files, sockets, etc.)
A new copy of the address space (code & data)
A new execution state (PC, registers, stack and stack pointer)
Processes must explicitly share data via shared memory or message passing
Can we do better?
Section titled “Can we do better?”What’s similar in these heavy-weight processes? they all share the same code and data (address space) they all share the same resources (files, sockets, etc.) they all share the same privileges
What’s different? each has its own hardware execution state PC, registers, stack pointer, and stack
Key idea: Separate the concept of a process (code, address space, etc.) from that of a “unit of execution” (hardware execution state: PC, registers, etc.) this “unit of execution” is usually called a thread, or a lightweight process
The Thread Model
Section titled “The Thread Model”
Multi-threaded Parallel Sum
Section titled “Multi-threaded Parallel Sum”DATA partialSum T4 (doSum) T3 (doSum) T1 (doSum) T2 (doSum) Main
PC
PC PC PC PC Stack (T4) Stack (T1) Stack (T2) Stack (T3) Stack (Main) Observe that threads share everything (static & dynamic data) No need for explicit utilities to share data
Recall: A single-threaded process’s address space
Section titled “Recall: A single-threaded process’s address space”0xFFFFFFFF stack (func. params & local vars)
SP
heap (dynamic allocated mem) address space
static data (data segment) code (text segment) PC
0x00000000
A multi-threaded process’s address space
Section titled “A multi-threaded process’s address space”0xFFFFFFFF thread 1 stack SP (T1)
thread 2 stack SP (T2)
thread 3 stack SP (T3)
address space
heap (dynamic allocated mem)
static data (data segment) code (text segment) PC (T2)
PC (T1)
PC (T3)
0x00000000
Process vs. Thread
Section titled “Process vs. Thread”Most modern OS’s (Mach, Chorus, Windows, Solaris, Linux, MacOS) therefore support two entities: the process, which defines the address space and general process resources (such as open files, sockets, etc.) the thread, which defines a sequential unit of execution within a process
A thread is bound to a single process processes, however, can have multiple threads executing within them All threads within a process share the same resources: code, data & heap
A thread is the unit of scheduling processes are just containers in which threads execute
POSIX pthreads
Section titled “POSIX pthreads”Defined by IEEE in 1995 Provides a standardized API (specification) for: thread creation, management, and synchronization
| Function | Description |
|---|---|
| pthread_create | Creates a new thread |
| pthread_exit | Terminates the calling thread |
| pthread_join | Waits for a specific thread to terminate |
| pthread_detach | Marks a thread for automatic resource cleanup upon termination (no need to call pthread_join for cleanup) |
| pthread_self | Returns a handle to the calling thread |
| pthread_cancel | Request the cancelation of another thread |
Thread Example (1)
Section titled “Thread Example (1)”04-Threads/ex2.c

T1 (printMessage) Main
PC PC Stack (Main) Stack (T1)
Thread Example (2)
Section titled “Thread Example (2)”04-Threads/ex3.c

gv
T1 (func1) T2 (func1) T3 (func2) Main
PC
PC PC PC
Stack (T1) Stack (Main) Stack (T2) Stack (T3)
Thread Example (3): Parallel Sum
Section titled “Thread Example (3): Parallel Sum”04-Threads/ex4.c

Thread Termination/Multi-threaded Process Termination
Section titled “Thread Termination/Multi-threaded Process Termination”To terminate, the thread calls pthread_exit()
The thread becomes a Zombie
Only cleaned up after another thread calls pthread_join() on it
The thread can call pthread_detach(), which makes it a detached thread, which means that when the thread terminates, all resources are deleted automatically without the need to call pthread_join() on the thread
A multi-threaded process terminates under two conditions:
All threads of the process terminate by calling pthread_exit()
That is, the process will terminate only after the last thread calls pthread_exit()
One of the threads invokes exit() system call terminating the process
Thread Cancelation
Section titled “Thread Cancelation”You can also call pthread_cancel() to signal a thread to terminate
A thread only responds to a cancellation request when it reaches a cancellation point or explicitly checks for cancellation
Examples of standard cancellation points include: read, write and other blocking I/O operations, pthread_mutex_lock, pthread_cond_wait
Windows Threads
Section titled “Windows Threads”
handle = CreateThread(...)
Create a new thread
WaitForSingleObject(handle)
Wait for thread termination
Why multi-threading even in a uniprocessor?
Section titled “Why multi-threading even in a uniprocessor?”Multi-threading is still very useful in a uniprocessor system even if only one of the threads can be running in the CPU at any given time Handling concurrent events (e.g., web servers, web browsers, editors, etc.) One thread to handle each incoming request (Web server) One thread downloading a page, one thread refreshing the GUI (Web browser) One thread reading user input, one thread refreshing the GUI, one thread saving the file to the disk in the background (Microsoft Word) One thread reading data from the keyboard, another thread reading data from a network socket, one thread updating the GUI (Chat application)
Thus, multi-threading greatly improves and simplifies program structure This is especially true when you have to deal with multiple blocking I/O operations
Concurrency vs Parallelism (1)
Section titled “Concurrency vs Parallelism (1)”| Concurrency | Parallelism |
|---|---|
| Ability of a system to handle multiple tasks or threads, often by interleaving their execution | Ability of the system to execute multiple threads or tasks simultaneously |
| Does NOT necessarily mean that threads are executed at the same time; it can involve a single-core CPU switching between threads quickly | Parallelism requires multiple execution units, such as multi-core CPUs, where each task/thread can run on a different core |
| is about structure (dealing with many things at once) | is about execution (doing many things at once) |
Concurrency vs Parallelism (2)
Section titled “Concurrency vs Parallelism (2)”Concurrent execution of 4 threads T1, T2, T3 and T4 on a single-core system


Thread Design Space
Section titled “Thread Design Space”Legends:
older UNIXes address space
MS/DOS
one thread/process one thread/process
one process many processes thread
Mach, Chorus, Windows, Linux, MacOS
Java
many threads/process many threads/process one process many processes
Threading Models (Windows, Linux): One-to-One
Section titled “Threading Models (Windows, Linux): One-to-One”Each user-level thread maps to a kernel thread Creating a user-level thread creates a kernel thread Thread creation and management requires system calls (clone to create) Adv: When a thread blocks for I/O, the kernel can schedule another thread leading to true concurrency Also: The kernel CAN schedule multiple threads in parallel

Threading Models: Many-to-One
Section titled “Threading Models: Many-to-One”Many user-level threads mapped to single kernel thread A user-level library manages thread creation/management But: When a user-level thread blocks for I/O, all the other user-level threads get blocked with it Also: The kernel can NOT schedule multiple user-level threads in parallel Rarely used (Solaris Green Threads, GNU Portable Threads)

Threading Models: Many-to-Many
Section titled “Threading Models: Many-to-Many”Many user-level threads mapped to many kernel thread OS allocates a sufficient number of kernel threads to run many user-level threads concurrently and in parallel Windows with the ThreadFiber package Difficult to implement and is NOT very common

Representing Threads in the Kernel: Windows
Section titled “Representing Threads in the Kernel: Windows”Threads are implemented as a separate DS (Thread Control Block - TCB), which is then linked to the Process Control Block (PCB)

Representing Threads in the Kernel: Linux
Section titled “Representing Threads in the Kernel: Linux”The basic unit, task, (task_struct) actually represents a thread!
The task structure has pointers to common process resources
Thus, two threads in the same process will point to the same resource structure instance

Semantics of fork() and exec()
Section titled “Semantics of fork() and exec()”Does fork() duplicate only the calling thread or all threads?
In Linux, the forked child process will have only one thread by default, which is the thread that called fork()
It is possible to clone all existing threads of the parent process when creating a new process using the clone system call, but it is very low-level and requires careful handling of synchronization issues
exec() works as expected: replace the running process including all threads and have a single thread that starts executing the “main” function