organized way to store, manage and retrieve data efficiently in a computer so that it can be used effectively
- remember when implementing data structures:
- when adding or removing elements, update the data structure's size/length accordingly
- when adding elements, update the references on the previous element
- when removing elements
- update the references on the next element
- check if data structure is empty and raise error if necessary
linear data structure where each element stores a reference of the address of the next element
| element1 | => | element2 | => | element3 | => | element4 |
| element1 | <=> | element2 | <=> | element3 | <=> | element5 |
| element1 | => | element2 | => | element3 | => | element4 | => | element1 |
-
FIFO: First In First Out
-
basic operations:
queue.push(item): adds an item to the tail of the queue (index 0 of list)queue.pop(): removes and returns an item from the head of the queue (last index of list)queue.peek(): returns an item from the head of the queuequeue.size(): returns the number of items in the queue
data structure where each element has a priority value that determines its processing order
-
elements with higher priority are processed before elements with lower priority
-
typically implemented using a heap data structure
-
basic operations:
enqueue(element, priority): add element with given prioritydequeue(): remove and return highest-priority elementpeek(): view highest-priority element without removing itis_empty(): check if queue is empty
-
LIFO: Last In First Out
-
basic operations:
push()/append(): add element to top of stackpop(): remove element from top of stack
- hash function: maps keys to values
- hash collision: when two keys map to the same value
- can be resolved by chaining (linked list) or open addressing (find next available slot)
- node or vertex
- edge: connection between two nodes in a graph can be directed or undirected
- adjacent nodes: two nodes with an edge directly connecting them
- incident edge
- degree: number of edges that enter or leave a vertex
- path: a sequence of edges that leads from one node to another
- cycle: path where the starting node is the same as the ending node, and no other nodes are repeated
- order: the total number of vertices in the graph
- adjacency matrix: 2D array representation of a graph
- size: n x n where n is number of vertices
matrix[i][j]= 1 if there is an edge from vertex i to jmatrix[i][j]= 0 if there is no edge- for undirected graphs, matrix is symmetric
hierarchical data structure composed by nodes, where each node has a value and may have child nodes
- node: stores data and link to children if they exist
- root: topmost node of a tree
- has no parent
- leaf: a node without children
- parent: node that has a child node
- children: node that has a parent node
- root: topmost node of a tree
- edge: connection between two nodes (parent to child)
- subtree: part of a tree, starting at a node and including its descendants
- height of a tree: the number of edges between the root to the deepest leaf
- if a tree has only one element, its height is 1
- depth of a node: number of edges from the root to the node
- level: the 'floor' of a node
- the root has level 0
- direct children of the root have level 1
- grandchildren of the root have level 2, and so on
- degree: the number of children a node has
- binary trees have a maximum degree of 2
- if vertex has no children:
- remove vertex
- update parent's reference to null
- if vertex has one child:
- connect parent to child
- remove vertex
- if vertex has two children:
- find successor (smallest value in right subtree)
- copy successor's value to vertex
- remove successor
Tip
the name of each traversal order describes when the root is visited
-
pre-order: root is visited first
-
in-order: root is visited in the middle-ish
-
post-order: root is visited last
-
pre-order
- visit parent node
- root gets visited first
- visit node on left subtree then visit the rest of the left subtree
- visit node on right subtree then visit the rest of the right subtree
- rightmost node gets visited last
-
in-order
- visit left subtree in in-order
- leftmost leaf gets visited first
- visit parent node
- root gets visited in the middle
- visit right subtree in in-order
- rightmost node gets visited last
-
post-order
- visit left subtree in post-order
- leftmost leaf gets visited first
- visit right subtree in post-order (on the same level as previous step)
- visit parent node
- root gets visited last
tree where each node has at most 2 children
- children don't follow any particular order based on the value of the node
- binary tree where:
- all levels are completely filled except possibly for the last level
- all nodes are as far left as possible
binary tree-based data structure that satisfies the heap property and also is a complete binary tree
- heap property: parent node is either greater than or less than its children
- max heap: parent's value is greater than children
- min heap: parent's value is less than children
key methods:
insert(): add a new element to the heapdelete(): remove the root element from the heappeek(): return the root element of the heapsize(): return the number of elements in the heapheapify(): convert an array into a heap
binary tree where:
- left child is smaller than parent
- right child is bigger than parent
self-balancing binary search tree
if ( | right subtree height - left subtree height | > 1 ):
balance the tree
the tree is balanced via 2 operations: left rotate and right rotate
T1, T2 and T3 are subtrees of the tree, rooted with y (on the left side) or x (on the right side)
y x
/ \ Right Rotation / \
x T3 - - - - - - - > T1 y
/ \ < - - - - - - - / \
T1 T2 Left Rotation T2 T3
Keys in both of the above trees follow the following order keys(T1) < key(x) < keys(T2) < key(y) < keys(T3) So BST property is not violated anywhere
- summary
- tree that self-balances to reduce time complexity of basic operations (insert, remove, etc)
- rotation
- x and y nodes are defined by ordering (x < y)
- x and y have parent-child relationship and switch relationship when rotating
- the input of the operation is the parent node (not the pivot node)
- the pivot node is the initial child node
- after rotating, the pivot node becomes the parent
- keeps all the nodes and subtrees in the correct order
- insertion, deletion and loop-up are guaranteed O(n * log(n))
- because AVL is balanced, lookups in AVL are faster than red-black
- red-black trees are better for insert intensive tasks
self-balancing search tree
- efficiently manages large amounts of sorted data stored on disk
- commonly used in databases and file systems
properties of B-tree of order m:
- each node can have at most m children