Python: Threads and Multiprocessing

In computer science, a process is a unit of resources, and a thread is a unit of execution and scheduling.

A process can have one/more threads and always have at least one thread.

A process can spawn many threads to take advantage of the multi-cores in the system.

A process as its a unit of resources, has its own address space, while threads don’t have their own address space, it uses the process address space, however, thread has its own stack and registers.

Since the threads in a process share the address space, programmers must be very careful while dealing with shared resources, as it can cause data corruption and wrong results.

Threads get CPU time based on the underlying operating system scheduling algorithm.

In this post we will discuss how to use python threading and multiprocessing module for parallel computing.

Threads

Python has a threading module that allows you to work with threads. The threading module provides a Thread class that you can subclass to create your own threads.

To start a new thread, you create a Thread object and call its start() method. The start() method runs the target function passed to the Thread in a separate thread. For example:

import threading


def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n - 1)


def print_factorial_output(n):
  print(factorial(n))


def main():
  threads = []
  for i in range(5):
    thread = threading.Thread(target=print_factorial_output, args=(i,))
    threads.append(thread)
    thread.start()

  for thread in threads:
    thread.join()


if __name__ == "__main__":
  main()

Some key points about Python threads:

  • Threads run concurrently within the same process and share memory space and resources. This makes communication between threads efficient but also requires synchronization mechanisms like locks to prevent data corruption.
  • The GIL (Global Interpreter Lock) limits a python process to only one thread executing Python bytecode at a time even on multi-core machines. So python threads alone don’t allow true parallelism on multiple CPUs.
  • Threads are useful when some parts of your program can run independently and in parallel without synchronization overhead. Examples include making IO calls in background threads. Threads are well-suited for building responsive graphical user interfaces (GUIs), as they prevent the main application loop from blocking while waiting for I/O operations.

Detailed example walkthrough

Lets take the following program used to download a few files from github.com

import threading
import urllib.request
import time

# URLs to download
urls = [
    "https://github.com/tensorflow/tensorflow/blob/master/README.md",
    "https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md",
    "https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md",
]


def download_file(url):
  print(f"Starting download: {url}")
  urllib.request.urlretrieve(url, f"{url.split('/')[-1]}")
  print(f"Finished download: {url}")


start = time.time()
threads = []

for url in urls:
  thread = threading.Thread(target=download_file, args=(url,))
  thread.start()
  threads.append(thread)

for thread in threads:
  thread.join()

end = time.time()
print(f"Downloaded {len(urls)} files in {end-start} seconds")

The urlretrieve blocks to download each file before returning. By running it in a separate thread, we can download multiple files concurrently.

Now as we know python GIL only let a single thread run at a time, how does the above code achieve parallelism ?

Even though we create multiple threads, because of the GIL only one Python thread can actually run Python bytecode at a time. So the threads are not executing truly in parallel.

However, the GIL only applies to pure Python threads. When a Python thread is waiting on I/O (like the urlretrieve call in the example), it releases the GIL.

Here is a breakdown of how the Python program runs when threads release the GIL:

  1. The main thread starts and acquires the GIL. It creates and starts worker threads (3 in the above example).
  2. Worker thread 1 starts up and acquires the GIL. It executes some Python code like the download function.
  3. Thread 1 reaches the I/O operation (urllib.urlretrieve). This releases the GIL.
  4. Thread 1 is now blocked waiting on the I/O, allowing thread 2 to acquire the GIL.
  5. Thread 2 runs Python code until it also hits an I/O call, releasing the GIL
  6. Thread 3 runs Python code until it also hits an I/O call, releasing the GIL.
  7. Now none of the threads holds the GIL. The Python interpreter can do internal housekeeping, or C Python code can run parallelly on multiple CPUs if needed.
  8. Thread 1’s I/O completes. It tries to reacquire the GIL by seeking it when available.
  9. Once thread 1 reacquires the GIL, it can resume executing Python code after the I/O call.
  10. Meanwhile, thread 2 and thread 3 are still waiting on I/O. The kernel schedules between the threads based on I/O readiness.
  11. This cycle continues with threads releasing the GIL during I/O and acquiring it when they need to run Python code again.

So in summary:

  • Thread releases GIL when it starts waiting on I/O
  • GIL allows another thread to run or interpreter housekeeping
  • Thread reacquires GIL once I/O is done to resume Python execution
  • Kernel handles thread scheduling based on I/O wait/ready status

So while the threads are not parallel for CPU bound work, they can still provide concurrency by allowing both I/O bound threads to progress.

The overall download still happens faster than serially waiting for each one. But performance gains are limited by the GIL releasing and acquiring overhead.

To get true parallelism in Python for CPU bound work requires multiprocessing rather than threads. But for I/O bound tasks, threads can provide good concurrency in Python despite the GIL limitations.

Here are some ways to identify or determine if an operation in Python code is I/O bound:

  • Check the function name or library - Functions related to filesystem, network, web APIs, databases, etc are likely I/O bound. For example, open(), read(), write(), socket(), urllib.request, pymongo queries.
  • Look for sleep or wait - If a function or method explicitly sleeps/waits on something, it is waiting on I/O. time.sleep(), asyncio.sleep(), await, threading.Event.wait() etc.
  • Examine the code - Trace through the code logic to see if it is waiting on external resources to respond. This could be file/socket reads, network communication, user input, accessing hardware devices etc.
  • Profile the code - Use Python profilers like cProfile or libraries like line_profiler to examine where time is spent. Functions that show high external time (not cpu time) are likely I/O bound.
  • Add instrumentation - Can add simple timers or logs around sections of code to measure time spent. High times indicate I/O waits.
  • Measure resource usage - Use OS tools like top, ps to check threads/processes blocked on I/O when running the code. High IOWAIT% indicates I/O blocking.
  • Check for syscalls - Tools like strace can show syscalls like read, write, recv, send indicating I/O operations.

So in summary, identifying I/O bound operations involves looking for interaction with external resources, measurements that show waiting/blocking, examining profiles and traces, and checking OS level resource usage. Combining these can reliably determine if code is CPU or I/O bound.

Multiprocessing

The multiprocessing module allows Python code to run in parallel on multiple processors and cores. It provides a Process class similar to Thread that can be used to create processes.

The key difference from threads is that processes have their own separate memory/address space. This avoids issues with shared state and the GIL locking in threads. For CPU-bound tasks that can run in parallel, multiprocessing allows full utilization of multiple CPUs.

import multiprocessing


def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n - 1)


def print_factorial_output(n):
  print(factorial(n))


def main():
  processes = []
  for i in range(5):
    process = multiprocessing.Process(target=print_factorial_output, args=(i,))
    processes.append(process)
    process.start()

  for process in processes:
    process.join()


if __name__ == "__main__":
  main()

Some key points about multiprocessing:

  • Processes have separate memory so communication between them requires inter-process mechanisms like queues, pipes, shared memory. This adds overhead compared to threads.
  • Multiprocessing avoids GIL limitations and allows true parallellism on multi-core machines. But there is process start up overhead.
  • Multiprocessing is best suited for CPU-bound tasks and parallelizing data processing. It is ideal for tasks that require intense computation, such as mathematical calculations, image processing, or data manipulation. For IO-bound tasks, threads are often better.

Detailed example walkthrough

from multiprocessing import Process, Queue


def worker(q1, q2):
  while True:
    item = q1.get()
    if item is None:
      break
    print(f'Processing {item}')
    # Save worker output in another queue
    q2.put(item + 10)


if __name__ == '__main__':

  q1 = Queue()
  q2 = Queue()
  p1 = Process(target=worker, args=(q1, q2))
  p2 = Process(target=worker, args=(q1, q2))
  p1.start()
  p2.start()

  for i in range(10):
    q1.put(i)

  # Wait for workers to finish
  q1.put(None)
  q1.put(None)

  p1.join()
  p2.join()

  # Print the outputs
  while not q2.empty():
    print(q2.get())

In this example:

  • A Queue is used to share work items between the main process and worker processes.
  • The workers get items from the queue and process them, store outputs in another queue.
  • The main process puts work items into the queue to distribute.
  • None is used as a sentinel value to tell workers to exit once the work is done.
  • The processes coordinate and synchronize via the Queue without needing other shared state.

Some uses cases:

  • Producer-consumer parallelism - main thread produces data, workers consume
  • Asynchronous processing - send work to task queue instead of blocking
  • Multistage pipelines - each stage pulls from previous step, pushes results