Skip list

Invented by Pugh (1990)
Idea:
  • the problem of a sorted linked list is that it does not support binary search
  • remedy: add another linked list with “shortcuts” (think of local trains vs. express trains which skip some stations)
  • if the second linked list contains \(\approx \sqrt{n}\) elements while skipping every \(\approx \sqrt{n}\) elements, we can search in \(O(\sqrt{n})\) time
  • ideally, we would have \(\log n\) layers where the first layer would skip \(\approx n/2\) elements, the second layer would skip \(\approx n/4\) elements etc., so we could simulate binary search
  • this is hard to maintain deterministically, but easy using randomization
  • after inserting a new element, toss a coin; if it comes up tails, stop; otherwise insert the element also to the layer above and toss again
Advantages:
  • it can be proved that the simple coin tossing method guarantees logarithmic time with high probability
  • it can be relatively easily modified to support concurrency (access from different threads or processes)
Implementation: ConcurrentSkipListSet and ConcurrentSkipListMap in Java's java.util.concurrent package

Idea

Let’s reinvent an ordered dictionary data structure, while thinking out of the box (i.e., without using trees).

Let’s start with a simple sorted linked list and see how we could improve it:

The advantage of linked lists (as opposed to arrays) is that it is easy to add and remove elements at any position once you are there. On the other hand, even though the elements are sorted, we cannot reach them directly - we need to follow the links one by one, which is very slow.

We can think of traversing a linked list as a local train which is slow and needs to stop at every village. How can we reach our destination faster? Well, we could take an express train to a nearby city and then continue with a local train. (Express trains are much faster and only stop in big cities.)

Similarly, we could create an additional “express” linked list, so that we could make “bigger jumps” and speed up traversal.

How sparse should it be? Intuitively, if the jumps are too big, the nearest “big city” might be too far and we spend most of the time on the local train. On the other hand, if the jumps are too small, the express train has to stop at many stations and it becomes slower too. So there is a trade-off.

Ideally, the “express train” stops at every \(s\)-th station, so there are \(\approx n/s\) express stations, and in the worst case, to reach any destination, we need to visit \(\approx n/s + s\) stations. This sum is minimized when \(s \approx \sqrt n\).
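A quick numerical check of this trade-off (a sketch of our own, not part of the original notes): for a fixed \(n\), the worst-case cost \(n/s + s\) is smallest when \(s\) is near \(\sqrt n\).

```python
import math

n = 10_000

# Worst case with express stops every s stations:
# ~ n/s hops on the express line plus up to s hops on the local line.
cost = lambda s: n / s + s

best_s = min(range(1, n + 1), key=cost)
print(best_s, math.isqrt(n))  # the optimal s equals sqrt(n) = 100 here
```

At the optimum the two terms are equal, so the total cost is \(2\sqrt n\).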

Can we improve it further?

Yes, by adding more tracks. With 3 levels, the ideal configuration has \(\approx\sqrt[3]{n}\) stations of the “Shinkansen” bullet train, each skipping \(\approx n^{2/3}\) stations, and \(\approx n^{2/3}\) stations of the express train, each skipping \(\approx\sqrt[3]{n}\) stations. This way there are \(\approx\sqrt[3]{n}\) local stations between every two express stations and \(\approx\sqrt[3]{n}\) express stations between every two Shinkansen stations, so we can reach any destination within \(\approx 3\sqrt[3]{n}\) hops.

Taking this idea of adding more levels even further, with \(\log n\) levels, we could simulate binary search:

The top level skips half of the list, the next level takes us to the right quarter and so on.

Search: The search algorithm is simple: start at the top level and continue RIGHT until the next element is bigger. Then continue with the next level (DOWN).

This is actually reminiscent of a tree search. If you squint a little, rotate the picture 45° and remove unnecessary edges, you get a decision tree (where LEFT vs. RIGHT corresponds to going DOWN to the next level vs. continuing RIGHT in the same list):

Updates and balancing

To delete an element, we need to find it and remove it from every linked list it appears in. To insert an element, we need to find the proper position (in the bottom list) and then insert it into some of the lists. That’s the straightforward part. The tricky question is how we keep the lists “balanced” so that searching remains fast. To which lists do we add a new element? And do we need to change the height of neighbouring elements?

As we showed above, ideally we would have all the elements on the bottom level, every other element on the second level, every 4th element on the third level (that’s half of those on the second level), and so on. This requirement is too strict to maintain efficiently, so we might try to relax it and require, say, that between any two nodes of height \(h\) or higher, there are 1…\(k\) nodes of height \(h-1\).

Although balancing can be done deterministically (see the section below), there is a simple and elegant randomized solution: When inserting a new element, we find the proper position and add it to the bottom list. Then we flip a coin. If it turns TAILS, we are done, but if it turns HEADS, we also add it to the list above and flip the coin again. We keep promoting the element, adding it to higher and higher levels, until the coin flip turns TAILS.
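The insertion procedure above can be sketched as follows. This is a minimal illustrative implementation of our own (names like `SkipList`, `MAX_LEVEL`, and the hard cap on height are our choices, not from the original papers); it supports insertion with coin-flip promotion and the RIGHT/DOWN search described earlier.

```python
import random

MAX_LEVEL = 16  # hard cap on height; plenty for ~2^16 elements

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor on level i

class SkipList:
    def __init__(self):
        self.head = Node(None, MAX_LEVEL)  # sentinel, present on all levels

    def _random_height(self):
        # Keep flipping a fair coin; promote while it comes up HEADS.
        h = 1
        while h < MAX_LEVEL and random.random() < 0.5:
            h += 1
        return h

    def contains(self, key):
        node = self.head
        for level in reversed(range(MAX_LEVEL)):  # start at the top level
            # Go RIGHT while the next key is still smaller...
            while node.next[level] and node.next[level].key < key:
                node = node.next[level]
            # ...then drop DOWN one level.
        node = node.next[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Remember the rightmost node visited on each level (the predecessors).
        update = [None] * MAX_LEVEL
        node = self.head
        for level in reversed(range(MAX_LEVEL)):
            while node.next[level] and node.next[level].key < key:
                node = node.next[level]
            update[level] = node
        new = Node(key, self._random_height())
        for level in range(len(new.next)):  # splice into the lowest h lists
            new.next[level] = update[level].next[level]
            update[level].next[level] = new
```

Deletion would unlink the node from every level it appears on, reusing the same `update` array of predecessors.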

Analysis of randomized skip lists

Note that on average, approximately half of the nodes will get TAILS immediately and approximately half of them will be promoted to the second level. Then approximately half of those will continue to the third level, and so on. The expected number of elements on level \(\ell\) is \(n/2^{\ell-1}\).

Height

Let \(h(x)\), the height of element \(x\), be the number of levels it belongs to above the base level. It has a geometric distribution. The expected height of an element is the same as the expected number of coin flips that turn HEADS before the first TAILS, which is \[E[h(x)] = 1/2 + 1/4 + 1/8 + 1/16 + \cdots = 1.\]
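This expectation is easy to verify empirically (a small simulation of our own):

```python
import random

random.seed(0)

def height():
    """Number of HEADS before the first TAILS (the element's height)."""
    h = 0
    while random.random() < 0.5:
        h += 1
    return h

trials = 100_000
avg = sum(height() for _ in range(trials)) / trials
print(avg)  # close to the expected value 1
```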

Let \(H = \max_x h(x)\) be the maximum “height” of the skip list. This is the same as doing the coin-flipping experiment \(n\) times and taking the longest run of HEADS. Even though \(\approx\) half of the runs have length 0, \(\approx\) half of the runs have length at least one, \(\approx\) a quarter of them have length at least two, and so on… If we do \(n\) experiments, we can still expect \(\approx 1\) run of length \(\lg n\). On the other hand, there will be a run of length much bigger than \(\lg n\) only with negligible probability.

Since \(\Pr[h(x)\geq\ell] = 1/2^{\ell}\), by the union bound \(\Pr[H \geq \ell] \leq n/2^{\ell}\). So \(\Pr[H \geq c\log n] \leq 1/n^{c-1}\) – the maximum height is \(O(\log n)\) with high probability.

The expected height is \[E[H]=\sum_{\ell\geq 1}\Pr[H\geq\ell]=\sum_{\ell=1}^{\lg n} \underbrace{\Pr[H\geq\ell]}_{\leq 1}+\sum_{\ell=\lg n+1}^\infty \underbrace{\Pr[H\geq\ell]}_{\leq n/2^\ell} \leq \lg n + 1.\]
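Empirically, the maximum height indeed stays close to \(\lg n\) (again a simulation of our own):

```python
import math
import random

random.seed(0)

def height():
    # Number of HEADS before the first TAILS.
    h = 0
    while random.random() < 0.5:
        h += 1
    return h

n = 1 << 14  # 16384 elements, lg n = 14
runs = 50
# Average, over several runs, of the maximum height among n elements.
avg_H = sum(max(height() for _ in range(n)) for _ in range(runs)) / runs
print(avg_H, math.log2(n))  # the average maximum height is around lg n
```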

How long does it take to find an element? Recall that when searching for \(x\), we start at the top level and go RIGHT until the next element is larger than \(x\), then go DOWN a level and continue, until we reach the bottom level. A very elegant method to analyse this algorithm is backwards analysis: imagine that we record the algorithm and then run the film backwards. Instead of starting at the beginning of the top list and going RIGHT/DOWN until we arrive at element \(x\) in the bottom list, we see the path in reversed order: starting at element \(x\), we go UP and LEFT until we reach the top left corner. More precisely, notice that whenever there is a choice between going UP and going LEFT, we always go UP! (This is very important; think about why this is the case.)

How many UP and LEFT steps do we make? Well, we go UP when (at the time of adding the given element) the coin toss turned HEADS, i.e., with probability 1/2. We go LEFT when the coin turned TAILS, i.e., also with probability 1/2. So on average, we do 1 step LEFT for each step UP, and as we showed above, there will be a logarithmic number of steps UP, because the expected height is logarithmic. Thus, the expected time to find an element is \(O(\log n)\).
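The backwards analysis can be checked by a direct simulation, under a simplifying model of our own: walk backwards from the target element, flipping a fair coin at every node, going UP on HEADS and LEFT on TAILS, and stop after going UP \(H \approx \lg n\) times. The total number of steps concentrates around \(2\lg n\).

```python
import random

random.seed(0)

def backward_walk(H):
    """Length of the reversed search path: each step is UP with
    probability 1/2 (that node's coin was HEADS) and LEFT otherwise;
    we stop once we have gone UP H times."""
    ups = steps = 0
    while ups < H:
        steps += 1
        if random.random() < 0.5:
            ups += 1  # HEADS -> go UP
        # TAILS -> go LEFT
    return steps

H = 14  # maximum height for n = 2^14 elements
trials = 20_000
avg = sum(backward_walk(H) for _ in range(trials)) / trials
print(avg)  # around 2 * H = 28: one LEFT step per UP step on average
```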

Deterministic skip lists*

There are many ways to balance skip lists deterministically. One possibility is to add an

INVARIANT: Between any two nodes of height \(h\) or higher (\(h>0\)), there is either 1 or 2 nodes of height \(h-1\).

Such skip lists are called 1-2 skip lists and they basically simulate 2-3 trees.

1-2 skip list corresponding to a 2-3 tree

There will be one or two keys at the top level (these correspond to a root 2-node or 3-node), and they split the skip list into two or three parts - these parts are the children (continue recursively). Here is an example:

2-3 trees are balanced by splitting large nodes, joining small nodes, or sharing keys between a small and a big node. These operations can be simulated by increasing or decreasing the heights of elements in a skip list. For instance, if we get 3 elements of height \(h-1\) between two nodes of height \(h\) or higher, this corresponds to a 4-node that we need to split by increasing the height of the middle element.

References

  • W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Communications of the ACM, 33(6):668-676, 1990.
  • W. Pugh. A skip list cookbook. 1998.
  • W. Pugh. Concurrent maintenance of skip lists. Technical Report CS-TR-2222.1, 1990.
  • J. I. Munro, T. Papadakis, and R. Sedgewick. Deterministic skip lists. In Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 367-375. Society for Industrial and Applied Mathematics, 1992.
  • L. Devroye. A limit theory for random skip lists. The Annals of Applied Probability, 2(3):597-609, 1992.