# Understanding Scapegoat Trees

31 May 2019## What is a scapegoat tree?

Wikipedia: In computer science, a scapegoat tree is a self-balancing binary search tree. It provides worst-case O(log n) lookup time, and O(log n) amortised insertion and deletion time.

This post is a note that attempts to make the original paper by Igal Galperin and Ronald L. Rivest easier to understand.

## Notation

**$\alpha$-weight-balanced**

A binary tree node x is $\alpha$-weight-balanced if, for $\frac{1}{2} \le \alpha < 1$…

**$\alpha$-height-balanced**

Given a binary tree T, let $h_{\alpha}(size(T)) = \lfloor{\log_{\frac{1}{\alpha}}{size(T)}}\rfloor$.

As an example, if $\alpha = 0.5$ and size(T) = 8, then $h_{0.5}(8) = \lfloor{\log_{2}{8}}\rfloor = 3$,

A binary tree node x is $\alpha$-height-balanced if, for $\frac{1}{2} \le \alpha < 1$…

Lemma: if T is $\alpha$-height-balanced, then T is $\alpha$-weight-balanced

**Deep node**

A node’s depth, d(N), is the length of the path from the root to the node.

Given a binary tree T and a node N, we define a node N as deep if $d(N) > h_{\alpha}(T)$. The detection of a deep node would trigger a restructuring (or rebalancing) in the scapegoat tree.

**Attributes**

The node in scapegoat trees only need to store `left`

and `right`

pointers to its child and the `key`

, and the tree itself stores `size`

and `max_size`

attributes. The `max_size`

attribute is the maximal value of the size of the tree since the last time the tree was completely rebuilt. It is used for the deletion method explained later.

## Operation - Insert

To insert a node into a scapegoat tree, we start with inserting with the normal BST insertion algorithm and then increment `size`

and `max_size`

by one.

Then, if the inserted node $X_{0}$ is deep, we climb the tree toward the root until we find a node $X_{i}$ that is not $\alpha$-weight-balanced. $X_{i}$ is then our scapegoat node.

We rebuild the subtree rooted at $X_{i}$ by replacing it with a 0.5-weight-balanced subtree.

First we flatten the subtree by add the nodes into an auxiliary array using in-order traversal, and then we rebuild the tree with ‘divide and conquer’, i.e. recursively finding the mid of the array as the root of the subtree and recursively build the left child and right child.

```
BUILD-TREE(nodeList, start, end):
if end<start:
return nil
mid = (start+end) /2
nodeList[mid].left = BUILT-TREE(nodeList, start, mid-1)
if nodeList[mid].left != nil
nodeList[mid].left.parent = nodeList[mid]
nodeList[mid].right = BUILD-TREE(nodeList, mid+1, end)
if nodeList[mid].right != nil
nodeList[mid].right.parent = nodeList[mid]
return nodeList[mid]
```

**Can we always find a scapegoat node?**

Lemma:

Proof by contradiction:

Since $K_{d(\beta)} = \beta$ is a deep node and d($\beta$)>$h_{\alpha}(T)$ = $\lfloor{\log_{2}{size(T)}}\rfloor$. Also note that $\beta$ is the newly inserted node so the size of the tree rooted at $\beta$ is 1. It follows that…