Data Structures and Algorithms

Data structures and Algorithms are the foundation of computer science. To solve any problems or write any code efficiently it is essential to understand DSA and the underlying theory.

The term data structure signifies specific ways of organizing data for specific types of operations. Data structures are implemented using algorithms.

The term Algorithm for solving a specific task can be defined as a finite sequence of instructions, each of which has a definite meaning and can be performed within a finite amount of time. An algorithm must be precise enough to be understood.

Any computer program comprised of one/many data structures and algorithms. The choice of data structures and algorithms can adversely affect the performance of a program.

For any particular application the choice of DS can be made based on the following metrics:

  1. Insertion time
  2. Deletion time
  3. Access time
  4. Search time

Examples Usages

Hash Table

Suppose we build an application that involves searching data from memory 90% of the time and 10% of the time inserting and deleting data. For this application’s best choice of data, the structure would be a map or Hash Table as the Hash table has constant average-case time complexity for search, insertion, and deletion. Instead of using HashTable if we use List, the application will be prolonged as the average time complexity of searching an element in the list is linear.

Matrix

Take another example of a machine learning application that involves multiplying input data with intermediate data, for this particular implementation one has to select Matrix data structure as the multiplication between two independent data matrices can be paralleled using a hardware accelerator. One should not select DS like List, Tables, etc for this particular problem.

Complexity analysis is required to select DS and algorithms to solve a particular problem.

Time complexity

It represents the number of times a statement (or a group of statements) is executed. The less the number better is the algorithm. It is not the actual time required to complete the operation, as the actual time will depend on the hardware, the underlying OS, and the libraries used.

Constant time complexity

The time required to execute an algorithm does not depend on the length of the underlying data structure. The set of statements specific to the operation is executed for a specific number of times irrespective of the input length and does not change with input length.

Linear time complexity

The time required to execute an algorithm is directly proportional to the length of the underlying data structure. It scales linearly with the input length.

Basic DS

Some common data structures are:

  1. Arrays
  2. Lists
  3. Stacks
  4. Queues
  5. Heaps
  6. Graphs
  7. Trees
  8. Hash Tables
  9. Matrix

Basic Algorithms

Some common fundamentals algorithms, task-wise:

  1. Sorting
    1.Quick sort
    2.Merge sort
    3.Heap sort

  2. Searching
    1.Binary search
    2.AVL search trees
    3.Red-black Trees
    4.Depth/breadth-first search

  3. Hashing
    1.Secure Hash algorithms
    2.Message-digest algorithm

  4. Graph traversal
    1.Floyd’s algorithm
    2.Dijkstra Algorithm
    3.A* algorithm

Use of DSA in daily life

Some of the common examples of using these data structures and algorithms can be found in the Linux kernel which is the brain of many open-source OS including Android mobile phones OS.

https://github.com/torvalds/linux

Hash Tables are used in the Linux kernel file system implementation. It is used to implement inodes, cgroup, file system integrity checks, etc.

https://github.com/torvalds/linux/blob/master/include/linux/hashtable.h

Binary search is used in handling system interrupt, cache loop registration, etc.

https://github.com/torvalds/linux/blob/master/lib/bsearch.c

Merge sort is used in garbage collection, and file system management of the Linux kernel.

https://github.com/torvalds/linux/blob/master/lib/list_sort.c

Heap is used in the Linux kernel.

https://github.com/torvalds/linux/blob/master/include/linux/min_heap.h

Radix Trees is used in memory management, networking, etc

https://github.com/torvalds/linux/blob/master/include/linux/radix-tree.h

Red-black trees are used in virtual memory management to track file descriptor and directory entries and in the CFS (completely fair scheduler) process scheduler.

https://github.com/torvalds/linux/blob/master/include/linux/rbtree.h

Depth-first search is used in searching a namespace tree.

https://github.com/torvalds/linux/blob/master/drivers/acpi/acpica/nswalk.c