Understanding Scapegoat Trees

What is a scapegoat tree?

Wikipedia: In computer science, a scapegoat tree is a self-balancing binary search tree. It provides worst-case O(log n) lookup time, and O(log n) amortised insertion and deletion time.

This post is a note that attempts to make the original paper by Igal Galperin and Ronald L. Rivest easier to understand.

Notation

$\alpha$-weight-balanced

A binary tree node x is $\alpha$-weight-balanced if, for $\frac{1}{2} \le \alpha < 1$…

$\alpha$-height-balanced

Given a binary tree T, let $h_{\alpha}(size(T)) = \lfloor{\log_{\frac{1}{\alpha}}{size(T)}}\rfloor$.

As an example, if $\alpha = 0.5$ and size(T) = 8, then $h_{0.5}(8) = \lfloor{\log_{2}{8}}\rfloor = 3$,

A binary tree node x is $\alpha$-height-balanced if, for $\frac{1}{2} \le \alpha < 1$…

Lemma: if T is $\alpha$-height-balanced, then T is $\alpha$-weight-balanced

Deep node

A node’s depth, d(N), is the length of the path from the root to the node.

Given a binary tree T and a node N, we define a node N as deep if $d(N) > h_{\alpha}(T)$. The detection of a deep node would trigger a restructuring (or rebalancing) in the scapegoat tree.

Attributes

The node in scapegoat trees only need to store left and right pointers to its child and the key, and the tree itself stores size and max_size attributes. The max_size attribute is the maximal value of the size of the tree since the last time the tree was completely rebuilt. It is used for the deletion method explained later.

Operation - Insert

To insert a node into a scapegoat tree, we start with inserting with the normal BST insertion algorithm and then increment size and max_size by one.

Then, if the inserted node $X_{0}$ is deep, we climb the tree toward the root until we find a node $X_{i}$ that is not $\alpha$-weight-balanced. $X_{i}$ is then our scapegoat node.

We rebuild the subtree rooted at $X_{i}$ by replacing it with a 0.5-weight-balanced subtree.

First we flatten the subtree by add the nodes into an auxiliary array using in-order traversal, and then we rebuild the tree with ‘divide and conquer’, i.e. recursively finding the mid of the array as the root of the subtree and recursively build the left child and right child.

BUILD-TREE(nodeList, start, end):
if end<start:
  return nil

mid = (start+end) /2

nodeList[mid].left = BUILT-TREE(nodeList, start, mid-1)
if nodeList[mid].left != nil
  nodeList[mid].left.parent = nodeList[mid]

nodeList[mid].right = BUILD-TREE(nodeList, mid+1, end)
if nodeList[mid].right != nil
  nodeList[mid].right.parent = nodeList[mid]

return nodeList[mid]

Can we always find a scapegoat node?

Lemma:

Proof by contradiction:

Since $K_{d(\beta)} = \beta$ is a deep node and d($\beta$)>$h_{\alpha}(T)$ = $\lfloor{\log_{2}{size(T)}}\rfloor$. Also note that $\beta$ is the newly inserted node so the size of the tree rooted at $\beta$ is 1. It follows that…