Python: Coroutines and asyncio

Coroutines and asyncio are two powerful tools in Python that can be used to write asynchronous code. Asynchronous programming allows multiple tasks to be executed concurrently, improving performance and scalability.

Asynchronous programming is most useful for I/O-bound workloads such as web scraping, network services, and responsive user-facing applications.

Coroutines in Python are functions whose execution can be suspended and resumed. They are the building blocks of asynchronous programming, which allows multiple tasks to make progress concurrently.

Coroutines are defined with the async def syntax. Inside a coroutine, the await keyword suspends execution until the awaited operation completes. While suspended, the coroutine yields control back to its caller (usually the event loop), which can resume it later.
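
A minimal sketch of this (the coroutine name greet and the one-second delay are purely illustrative):

import asyncio

async def greet():
  # Execution suspends here; control returns to the event loop until
  # the one-second sleep completes, then the coroutine is resumed.
  await asyncio.sleep(1)
  return "hello"

# asyncio.run() drives the coroutine to completion.
print(asyncio.run(greet()))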

Asynchronous programming is a programming paradigm that allows multiple tasks to be executed concurrently. This can be useful for applications that respond to multiple events simultaneously, such as web servers and network applications.

asyncio is a Python library that provides support for asynchronous programming. asyncio provides a number of features that make it easy to write asynchronous code, including coroutines, tasks, and event loops.
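
For example, a coroutine can be wrapped in a task with asyncio.create_task(), which schedules it on the running event loop so that several coroutines make progress concurrently (the worker names and delays below are arbitrary):

import asyncio

async def worker(name: str, delay: float) -> None:
  await asyncio.sleep(delay)
  print(f"{name} finished after {delay}s")

async def main():
  # Tasks start running as soon as they are created on the event loop.
  first = asyncio.create_task(worker("first", 1))
  second = asyncio.create_task(worker("second", 2))
  await first
  await second

asyncio.run(main())  # finishes in about 2 seconds, not 3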

Here are some key concepts and usages of coroutines:

  1. Cooperative Multitasking: Coroutines allow you to write code that cooperatively multitasks, meaning it can yield control to other coroutines when it’s waiting for certain operations to complete, without blocking the entire program. This enables efficient utilization of resources and can lead to more responsive and scalable applications.
  2. Asynchronous Programming: Coroutines are commonly used for asynchronous programming, where you can write non-blocking code that performs I/O operations (like reading from files or making network requests) without blocking the main thread. This is especially important in user interfaces, web servers, and other situations where responsiveness is critical.
  3. Concurrency: Coroutines can be used to run multiple tasks concurrently within a single thread, rather than in parallel threads. This is useful for overlapping I/O-bound operations, running background tasks, or managing concurrent access to shared resources.
  4. Generators and Iterators: Coroutines can be used to implement generators and iterators, which allow you to produce and consume values lazily (see the async generator sketch after this list). This can save memory and improve performance when working with large datasets or streams of data.
  5. Stateful Execution: Coroutines can maintain their own state between pauses and resumes, which makes them suitable for scenarios where maintaining context or session information is necessary.
  6. Error Handling: Coroutines typically provide mechanisms for handling errors asynchronously, making it easier to propagate and handle exceptions in asynchronous code.
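
To illustrate points 4 and 5, here is a small sketch of an async generator (the counter name and delays are made up); it produces values lazily and keeps its local state between suspensions:

import asyncio

async def counter(limit: int):
  # Local state (n) is preserved across every suspension point.
  n = 0
  while n < limit:
    await asyncio.sleep(0.1)  # simulate waiting on I/O
    yield n
    n += 1

async def main():
  # Values are produced one at a time, only when requested.
  async for value in counter(3):
    print(value)

asyncio.run(main())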

Parallelism and concurrency

Parallelism is the ability to execute multiple tasks simultaneously. This can be achieved by using multiple processors or cores.

Concurrency is the ability to make progress on multiple tasks during overlapping time periods, even if they are never executing at the same instant. This can be achieved with a single processor or core by switching between tasks quickly.

In Python, parallelism and concurrency can be achieved using different libraries and techniques.

Parallelism can be achieved using the multiprocessing library. This library allows you to create multiple processes, each of which can run a different task.

Concurrency can be achieved using the threading library or the asyncio library. The threading library allows you to create multiple threads, each of which can run a different task. The asyncio library is a more modern concurrency library that provides a number of features that make it easier to write asynchronous code.
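
As a rough sketch (the square function, worker counts, and the download_stub placeholder are illustrative, not taken from any particular library), parallelism with multiprocessing and concurrency with threading look like this:

import multiprocessing
import threading

def square(n: int) -> int:
  return n * n

def download_stub(name: str) -> None:
  # Placeholder for an I/O-bound operation such as a network request.
  print(f"{name} done")

if __name__ == "__main__":
  # Parallelism: a pool of worker processes runs CPU-bound work on multiple cores.
  with multiprocessing.Pool(processes=4) as pool:
    print(pool.map(square, range(10)))

  # Concurrency: threads interleave I/O-bound work, even on a single core.
  threads = [threading.Thread(target=download_stub, args=(f"task-{i}",)) for i in range(3)]
  for t in threads:
    t.start()
  for t in threads:
    t.join()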

IO controller

IO controllers, such as disk controllers and network interface controllers (NICs), are responsible for performing input/output (I/O) tasks. These controllers are specialized hardware components designed to manage and control the transfer of data between the computer’s central processing unit (CPU) and external devices like disks, network devices, and more.

Here’s a brief explanation of how device controllers perform I/O tasks:

  1. Disk Controllers:
    • Disk controllers manage the communication between the CPU and storage devices, such as hard drives and solid-state drives (SSDs).
    • They handle reading data from and writing data to storage devices.
    • Disk controllers perform operations like reading/writing sectors, managing disk caches, and handling error correction.
    • They offload I/O operations from the CPU, allowing it to continue with other tasks while data is transferred between memory and storage.
  2. Network Interface Controllers (NICs):
    • NICs are responsible for handling network communication in a computer.
    • They manage the sending and receiving of data packets over a network connection (e.g., Ethernet or Wi-Fi).
    • NICs handle low-level tasks like packet assembly, collision detection (in the case of Ethernet), and error correction.
    • They provide the necessary hardware and protocols to interface with network cables or wireless networks.
  3. Other Device Controllers:
    • Various other device controllers exist for managing different types of I/O operations, such as USB controllers for managing USB devices, GPU controllers for graphics operations, and sound controllers for audio I/O.
    • These controllers specialize in managing their respective device types and offload I/O operations from the CPU, making the system more efficient.

Device controllers play a crucial role in managing I/O tasks for various hardware components within a computer. They handle the low-level operations required to interact with external devices, ensuring efficient data transfer and freeing up the CPU to focus on higher-level processing tasks.

Common asyncio functions

asyncio.gather

asyncio.gather() is a function in the Python asyncio library that allows you to wait for multiple coroutines to finish executing before returning. It takes one or more awaitables (coroutines, tasks, or futures) as positional arguments and returns a list of their results, in the same order as the inputs.

By default, if any of the awaitables raises an exception, gather() propagates the first raised exception to the caller; the other awaitables are not cancelled and continue to run. Passing return_exceptions=True instead returns exceptions in the result list alongside the successful results.

gather() is a useful function for waiting for multiple asynchronous operations to complete. It can be used to implement a variety of concurrent programming patterns, such as asynchronous I/O and cooperative multitasking.

Here is an example of how to use gather():

import asyncio

async def task1():
  await asyncio.sleep(1)
  return 1

async def task2():
  await asyncio.sleep(2)
  return 2

async def main():
  results = await asyncio.gather(task1(), task2())
  print(results)

if __name__ == "__main__":
  asyncio.run(main())
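
To see how exceptions behave, gather() also accepts return_exceptions=True; here is a small sketch (the failing coroutine is made up for illustration):

import asyncio

async def ok():
  await asyncio.sleep(0.1)
  return "ok"

async def fails():
  await asyncio.sleep(0.1)
  raise ValueError("something went wrong")

async def main():
  # With return_exceptions=True, exceptions are returned in the result list
  # instead of being raised, so every result can be inspected.
  results = await asyncio.gather(ok(), fails(), return_exceptions=True)
  print(results)  # ['ok', ValueError('something went wrong')]

asyncio.run(main())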

asyncio.run

asyncio.run() is a function in the Python asyncio library that runs an asyncio event loop. The event loop is responsible for scheduling and running asyncio coroutines.

When you call asyncio.run(), it creates a new event loop, runs the given coroutine until it completes, and then closes the loop. It cannot be called while another event loop is running in the same thread; doing so raises a RuntimeError.

asyncio.run() is the simplest way to run a top-level coroutine, because it handles all of the details of creating, running, and closing the event loop for you.

Here is an example of how to use asyncio.run():

import asyncio

async def task():
  await asyncio.sleep(1)
  print("Hello world!")

if __name__ == "__main__":
  asyncio.run(task())

asyncio.run() is a powerful tool that makes it easy to write and run asyncio applications.

Here are some of the benefits of using asyncio.run():

  • Simplicity: asyncio.run() is the simplest way to run asyncio coroutines. It handles all of the details of managing the event loop for you.
  • Efficiency: asyncio.run() uses a single event loop to run all of the coroutines in your application, avoiding the overhead of creating and managing a loop by hand.
  • Scalability: a single event loop can manage thousands of concurrent coroutines without blocking other important work, because coroutines are far lighter weight than threads.

asyncio.run() is the recommended way to run asyncio coroutines in Python. It is simple, efficient, and scalable.

Event loop

The event loop in asyncio is the core component responsible for scheduling and running coroutines. It is a single-threaded loop that keeps track of which coroutines are waiting, runs whatever is ready to make progress in the meantime, and wakes an idle coroutine when the operation it is waiting on completes.

The event loop is responsible for a variety of tasks, including:

  • Scheduling and running coroutines
  • Performing network I/O
  • Running subprocesses
  • Handling timers and other events

The event loop is a powerful tool that makes it easy to write efficient and scalable concurrent applications.

Here is a simple example of how to use the event loop:

import asyncio

async def task():
  await asyncio.sleep(1)
  print("Hello world!")

if __name__ == "__main__":
  # Create and manage the event loop manually (asyncio.run() does this for you).
  event_loop = asyncio.new_event_loop()
  try:
    event_loop.run_until_complete(task())
  finally:
    event_loop.close()

Application developers should typically use the high-level asyncio functions, such as asyncio.run(), and should rarely need to reference the loop object or call its methods.

Examples

Coroutines are designed for IO-bound tasks.

Web scraping using asyncio

In this example, we’ll use Python’s asyncio, aiohttp for making HTTP requests, BeautifulSoup for parsing HTML, and aiofiles for asynchronous file I/O. Make sure you have aiohttp, beautifulsoup4, and aiofiles installed before running the code.

import asyncio
import aiohttp
import aiofiles
from bs4 import BeautifulSoup
from typing import List


async def fetch_url(url: str) -> str:
  """Fetch the HTML content of a given URL asynchronously.

  Args:
    url (str): The URL to fetch.

  Returns:
    str: The HTML content of the URL.
  """
  async with aiohttp.ClientSession() as session:
    async with session.get(url) as response:
      return await response.text()


async def scrape_and_write_to_file(url: str, filename: str) -> None:
  """Scrape a website asynchronously and write the output to a file.

  Args:
    url (str): The URL of the website to scrape.
    filename (str): The name of the file to write the output to.
  """
  try:
    content = await fetch_url(url)
    soup = BeautifulSoup(content, "html.parser")
    # Replace this line with your specific scraping logic.
    # Here, we extract the title of the webpage.
    title = soup.title.string.strip()

    async with aiofiles.open(filename, "w") as file:
      await file.write(f"URL: {url}\n")
      await file.write(f"Title: {title}\n")
  except Exception as e:
    print(f"An error occurred while processing {url}: {str(e)}")


async def main(urls: List[str]) -> None:
  """Main coroutine that scrapes multiple websites concurrently and writes output to files.

  Args:
    urls (List[str]): List of URLs to scrape.
  """
  tasks = []

  for url in urls:
    filename = f"{url.split('//')[1].replace('/', '_')}.txt"
    task = scrape_and_write_to_file(url, filename)
    tasks.append(task)

  await asyncio.gather(*tasks)


if __name__ == "__main__":
  urls_to_scrape = ["https://google.com", "https://fast.com", "https://github.com"]

  asyncio.run(main(urls_to_scrape))

In this example:

  1. We define an async function fetch_url to asynchronously fetch the HTML content of a given URL using aiohttp.
  2. The scrape_and_write_to_file coroutine performs the web scraping and writing to a file. You can replace the scraping logic with your specific requirements. Here, we extract the title of the webpage using the BeautifulSoup library.
  3. In the main function, we specify a list of URLs to scrape. For each URL, we create a task that invokes the scrape_and_write_to_file coroutine, passing the URL and a generated filename.
  4. We use asyncio.gather to execute all the tasks concurrently, allowing us to scrape and write data from multiple websites simultaneously.
  5. The scraped data is written to separate text files named after the website’s domain.