Implementations of AVL and 2-3 trees
This is part of a university project at FIIT STU, Bratislava, Slovakia. The goal
is to compare different self-balancing trees with each other, as well as with the
regular Binary Search Tree.
This project comes with a small interactive program, `treeio`. It can be used
to manually insert and search for nodes, as well as pretty-print the whole tree.

The project contains a Makefile to compile the `treeio` program. By default it
is compiled with the AVL tree, but you can change this behavior by passing the
`target=[tree_name]` argument to the `make` command. If you want to compile the
program with a different tree, you also need to change the `#include` statement
inside `treeio.c` to match the tree you want to compile. E.g., if you want to
compile the 2-3 tree, include the `2_3tree.h` header and run:

```shell
$ make target=2_3tree
```
| Tree type | Target name |
|---|---|
| BST | bst |
| AVL | avltree |
| 2-3 | 2_3tree |
| Red-black | rbtree |
Target names are folder names that contain source files of the given tree.
If you wish to run unit tests, you can add scenarios inside the assert file of
the tree and include its header in `treeio.c`. You will also need to pass an
additional argument to the `make` command in the shell.

For example, if you've included the `avltree_assert.h` header into `treeio.c`
and wish to run tests, you can add a function call inside `main`:
```c
/* treeio.c */
int main(void)
{
    run_internal_tests();
    ...
}
```
This will run the scenarios specified in the assert file.
To compile the program with tests enabled, run:

```shell
$ make assert=true
```
AVL tree, where AVL stands for its creators, Adelson-Velsky and Landis, is
a self-balancing binary search tree. It was the first such tree to be invented.
The AVL tree has O(log n) time complexity for its operations. It is often
compared to the red-black tree, as both are height-balanced.
Each node in an AVL tree has a balance factor, which is calculated by subtracting
the height of the left subtree from the height of the right subtree, i.e.:

B(N) = H(R(N)) - H(L(N)),

where B(N) is the balance factor of the node N, R(N) is the right subtree of
the node N, L(N) is the left subtree of the node N, and H(T) is a function
calculating the height of the subtree T.
A binary search tree is an AVL tree if and only if each node of the tree has
a balance factor of 0, 1, or -1. A node N with a balance factor of -1 is
left-heavy, a node N with a balance factor of 1 is right-heavy, and a node N
with a balance factor of 0 is balanced.
This implementation is based on the Wikipedia article as well as the
GeeksforGeeks article about the AVL tree. It includes only search and insert
operations.
For each node we can define a structure:
```c
struct _node {
    struct _node *left;
    struct _node *right;
    char bfactor;
    long key;
};
```
where `bfactor` stands for the balance factor.
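The rotation snippets below call a `set_bfactor` helper that is not shown in the source. Here is a minimal sketch of what it might look like, assuming balance factors are recomputed from subtree heights; the `height` helper and the empty-subtree height of -1 are assumptions, not part of the original:

```c
#include <stddef.h>

struct _node {                /* same layout as above */
    struct _node *left;
    struct _node *right;
    char bfactor;
    long key;
};

/* Assumed helper: height of a subtree, -1 for an empty one. */
static int height(const struct _node *node)
{
    if (!node)
        return -1;
    int lh = height(node->left);
    int rh = height(node->right);
    return (lh > rh ? lh : rh) + 1;
}

/* Recompute B(N) = H(R(N)) - H(L(N)) and return the node, which
 * matches how the rotation routines below use the return value. */
static struct _node *set_bfactor(struct _node *node)
{
    node->bfactor = (char)(height(node->right) - height(node->left));
    return node;
}
```

Recomputing heights recursively on every call is simple but not O(1); the actual implementation may well maintain the factors incrementally instead.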
The search routine is the same as for the regular Binary Search Tree, so we
will skip it and go right to insertion.
The insertion operation of the AVL tree is similar to that of the regular
binary search tree: a simple recursive traversal will do the trick. The
important bit comes after the insertion itself, which is the rebalancing of
the tree. Each time we insert a new node, we recalculate the balance factors
of the nodes on the insertion path, and if any of them is not -1, 0, or 1,
then we need to rebalance the tree. This can be done with node rotation
operations: left rotation and right rotation.
To left-rotate a node is to replace its right subtree with the left subtree of
its right subtree, and to make the node itself the left child of its former
right subtree. I.e.:
```c
struct _node *rotate_left(struct _node *node)
{
    struct _node *right = node->right;
    node->right = right->left;
    right->left = node;
    set_bfactor(node);
    return set_bfactor(right);
}
```
Right rotation mirrors the left rotation: instead of the right subtree, we
replace the node's left subtree with the right subtree of its left subtree,
and then make the node itself the right child of its former left subtree.
I.e.:
```c
struct _node *rotate_right(struct _node *node)
{
    struct _node *left = node->left;
    node->left = left->right;
    left->right = node;
    set_bfactor(node);
    return set_bfactor(left);
}
```
When a new node is inserted into the AVL tree and the tree becomes unbalanced,
there are 4 cases: left-left, left-right, right-right, and right-left,
depending on which grandchild subtree received the new node.
Based on these cases, we can define rebalancing routine as follows:
```c
struct _node *rebalance(struct _node *node, long key)
{
    if (!node)
        return NULL;
    if (node->left && node->bfactor < -1) {      /* left heavy: B(N) < -1 */
        if (key < node->left->key)
            return rotate_right(node);           /* left-left case */
        node->left = rotate_left(node->left);    /* left-right case */
        return rotate_right(node);
    }
    if (node->right && node->bfactor > 1) {      /* right heavy: B(N) > 1 */
        if (key > node->right->key)
            return rotate_left(node);            /* right-right case */
        node->right = rotate_right(node->right); /* right-left case */
        return rotate_left(node);
    }
    return node;
}
```

Note the sign convention: with B(N) = H(R(N)) - H(L(N)), a left-heavy node has a balance factor below -1 and a right-heavy node above 1.
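For completeness, the pieces above can be wired into one self-contained sketch. The `insert` entry point and the `height`/`set_bfactor` helpers are assumptions (the source does not show them); the rotation and rebalancing code follows the snippets above, with balance-factor signs matching B(N) = H(R(N)) - H(L(N)):

```c
#include <stdlib.h>

struct _node {                /* same layout as above */
    struct _node *left;
    struct _node *right;
    char bfactor;
    long key;
};

/* Assumed helper: height of a subtree, -1 for an empty one. */
static int height(const struct _node *n)
{
    if (!n)
        return -1;
    int lh = height(n->left), rh = height(n->right);
    return (lh > rh ? lh : rh) + 1;
}

static struct _node *set_bfactor(struct _node *n)
{
    n->bfactor = (char)(height(n->right) - height(n->left));
    return n;
}

static struct _node *rotate_left(struct _node *node)
{
    struct _node *right = node->right;
    node->right = right->left;
    right->left = node;
    set_bfactor(node);
    return set_bfactor(right);
}

static struct _node *rotate_right(struct _node *node)
{
    struct _node *left = node->left;
    node->left = left->right;
    left->right = node;
    set_bfactor(node);
    return set_bfactor(left);
}

static struct _node *rebalance(struct _node *node, long key)
{
    if (!node)
        return NULL;
    if (node->left && node->bfactor < -1) {      /* left heavy */
        if (key < node->left->key)
            return rotate_right(node);           /* left-left */
        node->left = rotate_left(node->left);    /* left-right */
        return rotate_right(node);
    }
    if (node->right && node->bfactor > 1) {      /* right heavy */
        if (key > node->right->key)
            return rotate_left(node);            /* right-right */
        node->right = rotate_right(node->right); /* right-left */
        return rotate_left(node);
    }
    return node;
}

/* Assumed entry point: recursive BST insert plus rebalancing. */
struct _node *insert(struct _node *node, long key)
{
    if (!node) {
        struct _node *n = calloc(1, sizeof(*n));
        n->key = key;
        return n;
    }
    if (key < node->key)
        node->left = insert(node->left, key);
    else if (key > node->key)
        node->right = insert(node->right, key);
    else
        return node;          /* ignore duplicate keys */
    set_bfactor(node);
    return rebalance(node, key);
}
```

Inserting the keys 1, 2, 3 in ascending order triggers the right-right case once and leaves 2 at the root with 1 and 3 as its children.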
A 2-3 tree is a self-balancing search tree, where each node (often referred to
as an internal node) has either 2 children and 1 key or 3 children and 2 keys.
Leaves of a 2-3 tree are always on the same level, meaning that the tree is
always balanced.
The traversal (search) algorithm of a 2-3 tree goes as follows. Let `T` be
a 2-3 tree and `d` be the data element we want to find:

1. If `T` is empty, then `d` is not in `T` and we're done.
2. Let `r` be the root of `T`.
3. Suppose `r` is a leaf. If `d` is not in `r`, then `d` is not in `T`.
   Otherwise, `d` is in `T` and we're done. In particular, `d` can be found
   at a leaf node.
4. Suppose `r` is a 2-node with left child `L` and right child `R`. Let `e`
   be the data element of `r`. There are three cases: if `d` is equal to `e`,
   then we've found `d` in `T` and we're done; if `d < e`, then set `T` to
   `L`, which by definition is a 2-3 tree, and go back to step 2; if `d > e`,
   then set `T` to `R` and go back to step 2.
5. Suppose `r` is a 3-node with left child `L`, middle child `M`, and right
   child `R`. Let `a` and `b` be the two data elements of `r`, where `a < b`.
   There are four cases: if `d` is equal to `a` or `b`, then `d` is in `T`
   and we're done; if `d < a`, then set `T` to `L` and go back to step 2; if
   `a < d < b`, then set `T` to `M` and go back to step 2; if `d > b`, then
   set `T` to `R` and go back to step 2.

To insert a new node into the tree, we search for the correct position for the
node with the traversal algorithm and insert it there. If the node becomes a
4-node, then we split it into 2 nodes, promoting the middle key to the parent
node. This process repeats recursively until we reach a 2-node, or the root,
in which case we split the root as well, making a new one out of the middle
key.
This implementation is based on course materials that are part of the CS230
Data Structures course at Princeton University. It includes only search and
insert operations.
For each internal node we can define a structure:
```c
struct _node {
    bool isfull;
    int low_key;
    int high_key;
    struct _node *left, *middle, *right;
    struct _node *parent;
};
```
where `isfull` indicates whether the node is a 3-node (has 2 keys and 3 children).
The search routine for the 2-3 tree is similar to the regular BST one, the only
difference being an extra condition for the recursive call when the node is
a 3-node and the search value lies between the keys of the node. I.e.:
```c
struct _node *search(struct _node *node, int key)
{
    if (!node)
        return NULL;
    if (key == node->low_key || (node->isfull && key == node->high_key))
        return node;
    if (key < node->low_key)
        return search(node->left, key);
    if (node->right && key > node->high_key)
        return search(node->right, key);
    return search(node->middle, key);
}
```
Insertion is the most complex part of this implementation. The steps are the
same as described in the Introduction. After traversing the tree and finding
a spot for the new node, we insert the key:
```c
struct _node *_insert_key(struct _node *node, int key)
{
    if (node->isfull) {
        node = _split(node, key);
    } else {
        node = _add_key(node, key);
    }
    return _get_root(node);
}
```
If the node is not full, then we just compare the new key with the existing
key and insert it at the correct position. If the new key is a duplicate, then
we throw `EDUPNODE` (error: duplicate node) and simply return the unchanged
node. ("Throw" is a bit misleading, since it will only print an error
message.)
```c
struct _node *_add_key(struct _node *node, int key)
{
    if (key == node->low_key) {
        throw(EDUPNODE);
        return node;
    }
    if (node->low_key < key) {
        node->high_key = key;
    } else {
        node->high_key = node->low_key;
        node->low_key = key;
    }
    return (node->isfull = true, node);
}
```
But if the node is full, then we are at step 5 of our insertion algorithm,
meaning we have to split the node. To do so, we sort all 3 keys (the two keys
of the node, and the key being inserted) and create 2 new nodes from the
smallest and the largest keys. The middle key gets promoted to the parent:
```c
struct _node *_split(struct _node *node, int key)
{
    if (key == node->low_key || key == node->high_key) {
        throw(EDUPNODE);
        return node;
    }
    int *keys = _sort_keys(node->low_key, node->high_key, key);
    int prom_key = keys[1];
    struct _node *left, *middle;
    left = _node((struct _node) {
        .isfull = false,
        .low_key = keys[0],
        .parent = node->parent
    });
    middle = _node((struct _node) {
        .isfull = false,
        .low_key = keys[2],
        .parent = node->parent
    });
    free(keys);
    return _merge_with_parent((struct _merge_args) {
        .node = node,
        .left = left,
        .middle = middle,
        .prom_key = prom_key
    });
}
```
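`_split` relies on a `_sort_keys` helper that is not shown in the text. Judging by its use, it returns a heap-allocated array of the three keys in ascending order, which the caller frees. A minimal sketch under that assumption:

```c
#include <stdlib.h>

/* Assumed helper: malloc'd array of the three keys in ascending order;
 * the caller is responsible for freeing it. */
static int *_sort_keys(int a, int b, int c)
{
    int *keys = malloc(3 * sizeof(*keys));
    keys[0] = a;
    keys[1] = b;
    keys[2] = c;
    for (int i = 0; i < 2; i++)          /* bubble sort, 3 elements */
        for (int j = 0; j < 2 - i; j++)
            if (keys[j] > keys[j + 1]) {
                int tmp = keys[j];
                keys[j] = keys[j + 1];
                keys[j + 1] = tmp;
            }
    return keys;
}
```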
After that we need to merge our 2 new nodes with the parent node, as well as
add the promoted key to its list of keys. This step can be divided into
3 scenarios: the node has no parent (it is the root), the parent is a 2-node,
or the parent is a 3-node.

In the first case, we simply return a new node, which has a single key (the
promoted key) and two children (the newly made 2-nodes).

In the second case, we add the promoted key to the parent and attach the 2 new
children alongside its remaining child.

In the third case, we split the parent itself and promote its middle key
to its parent.

As expected, this process repeats recursively until we reach the first 2-node,
or the root.
```c
struct _node *_merge_with_parent(struct _merge_args args)
{
    __DESTRUCT_MERGE_ARGS__;
    if (!node->parent) {
        left->parent = middle->parent = node;
        return _update_node(node, (struct _node) {
            .isfull = false,
            .low_key = prom_key,
            .left = left,
            .middle = middle
        });
    }
    struct _node *parent = node->parent;
    free(node);
    if (!parent->isfull) {
        if (parent->low_key < prom_key) {
            parent->high_key = prom_key;
            parent->middle = left;
            parent->right = middle;
        } else {
            parent->high_key = parent->low_key;
            parent->low_key = prom_key;
            parent->left = left;
            parent->right = parent->middle;
            parent->middle = middle;
        }
        return (parent->isfull = true, parent);
    }
    int merge_prom_key;
    struct _node *parent_left, *parent_middle;
    if (prom_key < parent->low_key) {
        parent_left = _node((struct _node) {
            .isfull = false,
            .low_key = prom_key,
            .left = left,
            .middle = middle,
            .parent = parent->parent
        });
        parent_middle = _node((struct _node) {
            .isfull = false,
            .low_key = parent->high_key,
            .left = parent->middle,
            .middle = parent->right,
            .parent = parent->parent
        });
        merge_prom_key = parent->low_key;
    } else if (prom_key > parent->high_key) {
        parent_left = _node((struct _node) {
            .isfull = false,
            .low_key = parent->low_key,
            .left = parent->left,
            .middle = parent->middle,
            .parent = parent->parent
        });
        parent_middle = _node((struct _node) {
            .isfull = false,
            .low_key = prom_key,
            .left = left,
            .middle = middle,
            .parent = parent->parent
        });
        merge_prom_key = parent->high_key;
    } else {
        parent_left = _node((struct _node) {
            .isfull = false,
            .low_key = parent->low_key,
            .left = parent->left,
            .middle = left,
            .parent = parent->parent
        });
        parent_middle = _node((struct _node) {
            .isfull = false,
            .low_key = parent->high_key,
            .left = middle,
            .middle = parent->right,
            .parent = parent->parent
        });
        merge_prom_key = prom_key;
    }
    return _merge_with_parent((struct _merge_args) {
        .node = parent,
        .left = parent_left,
        .middle = parent_middle,
        .prom_key = merge_prom_key
    });
}
```
A Red-black tree is a self-balancing binary search tree, where each node is
colored either red or black.

Like the AVL tree, the Red-black tree is height-balanced. The balancing rule
is: the path from the root to the farthest leaf is no more than twice as long
as the path from the root to the nearest leaf.

This implementation was taken from www.programiz.com and serves only for
comparison purposes. However, several alterations to the source code had to be
made, primarily to adapt the code for testing and comparison. Importantly,
these changes did not affect the underlying logic nor the performance of the
implementation:
- the global `root` variable was removed;
- the `insertion` function was renamed to `insert`, and was tweaked to work
  without the global `root`;
- the `deletion` function was renamed to `delete` and tweaked to work without
  the global `root`;
- a `search` function was implemented;
- a `print_node` function was implemented;
- a `print` function was implemented.

This implementation defines an internal node as a structure:
```c
struct rbNode {
    int data, color;
    struct rbNode *link[2];
};
```
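The two-element `link` array is what makes this style of code compact: `link[0]` is the left child and `link[1]` the right, so a comparison result can index the child directly. As an illustration (this is a sketch, not the code that was added to the project), a search over such nodes could look like:

```c
#include <stddef.h>

struct rbNode {               /* same layout as above */
    int data, color;
    struct rbNode *link[2];
};

/* Iterative search: `key > node->data` evaluates to 0 (go left)
 * or 1 (go right), indexing the link array. */
struct rbNode *rb_search(struct rbNode *root, int key)
{
    struct rbNode *node = root;
    while (node && node->data != key)
        node = node->link[key > node->data];
    return node;
}
```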
All of the implementations were tested for time efficiency during insertion
and search with the set of keys listed in the `randnumbers` file in the `data`
directory of the project. The results were the following:
| Tree type | Average insertion time | Average search time | Node size |
|---|---|---|---|
| BST | ~0.000060s | ~0.000002s | 24 bytes |
| AVL | ~0.000160s | ~0.000001s | 24 bytes |
| 2-3 | ~0.000050s | ~0.000001s | 48 bytes |
| Red-black | ~0.000052s | ~0.000002s | 24 bytes |
As we can see, the fastest insertions were in the 2-3 and Red-black trees,
although the 2-3 tree requires twice as much memory per node compared to the
red-black or AVL trees.

The slowest insertions were in the AVL tree, but in comparison to the
red-black tree, the AVL tree has better height balance, thus lookup
operations inside it are faster.
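Averages like the ones above can be obtained with a simple `clock()`-based harness. This sketch only illustrates the measurement idea and is an assumption; the project's actual measurement code may differ:

```c
#include <time.h>

/* Average CPU seconds per call of op() over `runs` repetitions. */
static double avg_seconds(void (*op)(void), int runs)
{
    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        op();
    return (double)(clock() - start) / CLOCKS_PER_SEC / (double)runs;
}

static int op_calls;                       /* hypothetical demo workload */
static void dummy_op(void) { op_calls++; }
```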
Nearly every function of the implementations was tested with unit tests
(except for the Red-black tree and the regular BST). Test functions, including
some test scenarios, can be found in the same directory as the source code of
the implementation. E.g., tests for the 2-3 tree implementation are located in
`2_3tree/2_3tree_assert.c`.
2020, FIIT STU, Bratislava, Slovakia.