Slides: 04 - Threads

Multi-Threading

Motivation Thread Creation/Manipulation/Termination Threading models

Traditional Process Model: One Thread per Process

So far, each process had only one “thread of control” A thread is a “unit of execution” (hardware execution state: PC, general purpose registers, stack and SP)

For major OSs such as Unix and Windows, this has been the process model until mid-1990s This model is not efficient & is difficult to use for many non-trivial tasks!

Stack SP DATA (static data & heap)

Multi-threading: Motivation

Assume you need to compute the sum of N numbers in a parallel machine with 4 CPU cores. How to implement this? Have the main application read the numbers into an array fork() 4 copies of itself, each computing the sum of N/4 numbers Merge the results N/4 N/4 N/4 N/4 data Fork 4 worker child processes Main Application Worker3 Worker4 Worker1 Worker2 Send the partial sum to the main application (master)

Pseudocode for parallel sum

Parallel Sum Process & IPC Model

Computer running a POSIX OS

Worker1 Worker2 Worker3 Worker4 Master Process

Stack Stack Stack Stack Stack SP SP SP SP SP DATA DATA DATA DATA DATA PC

PC PC PC PC

Kernel partialSum Pipe

Parallel sum code with a pipe

04-Threads/ex1_pipe.c

Pros & Cons of this Solution

It works (was the model of computation for a long time), BUT:

Creating a new heavy-weight process using fork() is a very time-consuming operation. For each child process, fork() must create: A new PCB with resources shared with the parent (files, sockets, etc.) A new copy of the address space (code & data) A new execution state (PC, registers, stack and stack pointer)

Processes must explicitly share data via shared memory or message passing

Can we do better?

What’s similar in these heavy-weight processes? they all share the same code and data (address space) they all share the same resources (files, sockets, etc.) they all share the same privileges

What’s different? each has its own hardware execution state PC, registers, stack pointer, and stack

Key idea: Separate the concept of a process (code, address space, etc.) from that of a “unit of execution” (hardware execution state: PC, registers, etc.) this “unit of execution” is usually called a thread, or a lightweight process

The Thread Model

2-8 Each thread has a separate hardware execution state: PC, general purpose registers, stack, stack pointer

Multi-threaded Parallel Sum

DATA partialSum T4 (doSum) T3 (doSum) T1 (doSum) T2 (doSum) Main

PC PC PC PC Stack (T4) Stack (T1) Stack (T2) Stack (T3) Stack (Main) Observe that threads share everything (static & dynamic data) No need for explicit utilities to share data

Recall: A single-threaded process’s address space

0xFFFFFFFF stack (func. params & local vars)

heap (dynamic allocated mem) address space

static data (data segment) code (text segment) PC

0x00000000

A multi-threaded process’s address space

0xFFFFFFFF thread 1 stack SP (T1)

thread 2 stack SP (T2)

thread 3 stack SP (T3)

address space

heap (dynamic allocated mem)

static data (data segment) code (text segment) PC (T2)

PC (T1)

PC (T3)

0x00000000

Process vs. Thread

Most modern OS’s (Mach, Chorus, Windows, Solaris, Linux, MacOS) therefore support two entities: the process, which defines the address space and general process resources (such as open files, sockets, etc.) the thread, which defines a sequential unit of execution within a process

A thread is bound to a single process processes, however, can have multiple threads executing within them All threads within a process share the same resources: code, data & heap

A thread is the unit of scheduling processes are just containers in which threads execute

POSIX pthreads

Defined by IEEE in 1995 Provides a standardized API (specification) for: thread creation, management, and synchronization

Function	Description
pthread_create	Creates a new thread
pthread_exit	Terminates the calling thread
pthread_join	Waits for a specific thread to terminate
pthread_detach	Marks a thread for automatic resource cleanup upon termination (no need to call pthread_join for cleanup)
pthread_self	Returns a handle to the calling thread
pthread_cancel	Request the cancelation of another thread

Thread Example (1)

04-Threads/ex2.c

T1 (printMessage) Main

PC PC Stack (Main) Stack (T1)

Thread Example (2)

04-Threads/ex3.c

T1 (func1) T2 (func1) T3 (func2) Main

PC PC PC

Stack (T1) Stack (Main) Stack (T2) Stack (T3)

Thread Example (3): Parallel Sum

04-Threads/ex4.c

Thread Termination/Multi-threaded Process Termination

To terminate, the thread calls pthread_exit() The thread becomes a Zombie Only cleaned up after another thread calls pthread_join() on it The thread can call pthread_detach(), which makes it a detached thread, which means that when the thread terminates, all resources are deleted automatically without the need to call pthread_join() on the thread

A multi-threaded process terminates under two conditions: All threads of the process terminate by calling pthread_exit() That is, the process will terminate only after the last thread calls pthread_exit() One of the threads invokes exit() system call terminating the process

Thread Cancelation

You can also call pthread_cancel() to signal a thread to terminate A thread only responds to a cancellation request when it reaches a cancellation point or explicitly checks for cancellation Examples of standard cancellation points include: read, write and other blocking I/O operations, pthread_mutex_lock, pthread_cond_wait

Windows Threads

handle = CreateThread(...) Create a new thread

WaitForSingleObject(handle) Wait for thread termination

Why multi-threading even in a uniprocessor?

Multi-threading is still very useful in a uniprocessor system even if only one of the threads can be running in the CPU at any given time Handling concurrent events (e.g., web servers, web browsers, editors, etc.) One thread to handle each incoming request (Web server) One thread downloading a page, one thread refreshing the GUI (Web browser) One thread reading user input, one thread refreshing the GUI, one thread saving the file to the disk in the background (Microsoft Word) One thread reading data from the keyboard, another thread reading data from a network socket, one thread updating the GUI (Chat application)

Thus, multi-threading greatly improves and simplifies program structure This is especially true when you have to deal with multiple blocking I/O operations

Concurrency vs Parallelism (1)

Concurrency	Parallelism
Ability of a system to handle multiple tasks or threads, often by interleaving their execution	Ability of the system to execute multiple threads or tasks simultaneously
Does NOT necessarily mean that threads are executed at the same time; it can involve a single-core CPU switching between threads quickly	Parallelism requires multiple execution units, such as multi-core CPUs, where each task/thread can run on a different core
is about structure (dealing with many things at once)	is about execution (doing many things at once)

Concurrency vs Parallelism (2)

Concurrent execution of 4 threads T1, T2, T3 and T4 on a single-core system

Parallel execution of 4 threads T1, T2, T3 and T4 on a dual-core system

Thread Design Space

Legends:

older UNIXes address space

MS/DOS

one thread/process one thread/process

one process many processes thread

Mach, Chorus, Windows, Linux, MacOS

Java

many threads/process many threads/process one process many processes

Threading Models (Windows, Linux): One-to-One

Each user-level thread maps to a kernel thread Creating a user-level thread creates a kernel thread Thread creation and management requires system calls (clone to create) Adv: When a thread blocks for I/O, the kernel can schedule another thread leading to true concurrency Also: The kernel CAN schedule multiple threads in parallel

Threading Models: Many-to-One

Many user-level threads mapped to single kernel thread A user-level library manages thread creation/management But: When a user-level thread blocks for I/O, all the other user-level threads get blocked with it Also: The kernel can NOT schedule multiple user-level threads in parallel Rarely used (Solaris Green Threads, GNU Portable Threads)

Threading Models: Many-to-Many

Many user-level threads mapped to many kernel thread OS allocates a sufficient number of kernel threads to run many user-level threads concurrently and in parallel Windows with the ThreadFiber package Difficult to implement and is NOT very common

Representing Threads in the Kernel: Windows

Threads are implemented as a separate DS (Thread Control Block - TCB), which is then linked to the Process Control Block (PCB)

Representing Threads in the Kernel: Linux

The basic unit, task, (task_struct) actually represents a thread! The task structure has pointers to common process resources Thus, two threads in the same process will point to the same resource structure instance

Semantics of fork() and exec()

Does fork() duplicate only the calling thread or all threads? In Linux, the forked child process will have only one thread by default, which is the thread that called fork()

It is possible to clone all existing threads of the parent process when creating a new process using the clone system call, but it is very low-level and requires careful handling of synchronization issues

exec() works as expected: replace the running process including all threads and have a single thread that starts executing the “main” function