\documentclass{article} \usepackage{listings} \usepackage{fullpage} \usepackage{color} \usepackage{tikz} \newtheorem{definition}{Definition} \newtheorem{theorem}[definition]{Theorem} \begin{document} \lstset{language=java,mathescape=true,numbers=left,numberstyle=\tiny} \tikzstyle{vertex}=[draw,circle,minimum size=0.5cm,inner sep=0pt] \title{Concurrent Red-Black Trees} \author{Franck van Breugel\\ Department of Electrical Engineering and Computer Science, York University\\ 4700 Keele Street, Toronto, Ontario, Canada, M3J 1P3} \date{\today} \maketitle \begin{abstract} Three concurrent implementations of red-black trees are presented. Only the operations \lstinline{Contains} and \lstinline{Add} are considered. The first implementation exploits monitors. The second implementation is based on a solution of the readers-writers problem, where the readers are threads that perform the \lstinline{Contains} operation and the writers are the threads that perform the \lstinline{Add} operation. The third implementation is an adaptation of the concurrent implementation of AVL trees by Ellis to the setting of red-black trees. In this implementation, the threads lock nodes of the tree. A node can be locked in three different ways and different threads can have a lock on a node simultaneously. \end{abstract} \section{Introduction} Data structures such as sets can be efficiently implemented by means of red-black trees. For example, the class \lstinline{TreeSet} of the package \lstinline{java.util} of the Java class library has been implemented by means of a red-black tree. A red-black tree is a special type of binary search tree. Such a tree is approximately balanced by colouring the nodes of the tree and placing certain restrictions on the way the nodes can be coloured. With the arrival of multicore machines, there is a need for concurrent implementations of fundamental data structures such as sets. In this paper, we present a sequential implementation and three different concurrent implementations of red-black trees. We first present the sequential implementation as can be found in \cite{CormenLeisersonRivest89}. The first concurrent implementation is a simple modification of the sequential implementation by representing the red-black tree as a monitor. The second concurrent implementation allows for more concurrency by modifying a solution to the readers-writers problem \cite{CourtoisHeymansParnas71}. The third and final concurrent implementation uses fine grain locking to allow even more concurrency. In this implementation, we adapt the approach proposed by Ellis \cite{Ellis80} for concurrent AVL trees to red-black trees. \section{Red-Black Trees} \label{section:red-black-tree} We assume that the reader is familiar with binary search trees (their definition can be found in, for example, \cite[Section~13.1]{CormenLeisersonRivest89}). A red-black tree is a binary search tree where each node has a colour. A node is either coloured red or black. By restricting the way nodes can be coloured, the tree becomes approximately balanced. These trees were first introduced as symmetric binary B-trees by Bayer \cite{Bayer72}. Guibas and Sedgewick \cite{GuibasSedgewick78} characterized these trees by colouring the nodes red or black, leading to the following definition. \begin{definition} \label{definition:red-black-tree} A {\sl red-black tree} is a binary search tree where each node is either coloured red or black and \begin{itemize} \item the root is black, \item each leaf is black, \item if a node is red, then both its children are black, and \item for every node, every path from that node to a leaf contains the same number of black nodes. \end{itemize} \end{definition} For example, the binary search tree \begin{center} \begin{tikzpicture} \node[vertex] (a) at (2,2) [fill,color=black,text=white] {$3$}; \node[vertex] (b) at (1,1) [fill,color=red, text=white] {$1$}; \node[vertex] (c) at (3,1) [fill,color=black] {$0$}; \node[vertex] (d) at (0,0) [fill,color=black] {$0$}; \node[vertex] (e) at (2,0) [fill,color=black] {$0$}; \draw[->] (a)--(b); \draw[->] (a)--(c); \draw[->] (b)--(d); \draw[->] (b)--(e); \end{tikzpicture} \end{center} is a red-black tree. Note that only the internal nodes contain elements. Red-black trees have the following key property. \begin{theorem} A red-black tree with $n$ internal nodes has height at most $2 \log_2(n + 1)$. \end{theorem} A proof of this result can be found in, for example, \cite[Section~14.1]{CormenLeisersonRivest89}. Since a red-black tree is approximately balanced, the operations \lstinline{Contains} and \lstinline{Add} can be implemented efficiently. More precisely, both \lstinline{Contains} and \lstinline{Add} can be implemented in $O(\log_2(n))$, where $n$ is the number of internal nodes of the red-black tree. \section{Sequential Implementation} The \lstinline{Contains} operation for red-black trees can be implemented in exactly the same way as for binary search trees. Its pseudocode can be found in Appendix~\ref{appendix:sequential}. Also the \lstinline{Add} operation for red-black trees is similar to that for binary search trees. If the element $e$, which is to be added, is not already part of the red-black tree, then the appropriate leaf is replaced with the tree \begin{center} \begin{tikzpicture} \node[vertex] (a) at (1,1) [fill,color=red, text=white] {$e$}; \node[vertex] (b) at (0,0) [fill,color=black] {$0$}; \node[vertex] (c) at (2,0) [fill,color=black] {$0$}; \draw[->] (a)--(b); \draw[->] (a)--(c); \end{tikzpicture} \end{center} This modification does not violate condition~1, 2 and 4 of Definition~\ref{definition:red-black-tree}, but it may violate condition~3. To reestablish this condition, the colour of some of the nodes may have to be changed and the structure of the tree may have to be modified. The details can be found in Appendix~\ref{appendix:sequential}. For example, if we add the element~2 to the red-black tree depicted in Section~\ref{section:red-black-tree}, then we first replace the right child of the node containing the element~1 with the tree \begin{center} \begin{tikzpicture} \node[vertex] (a) at (1,1) [fill,color=red, text=white] {$2$}; \node[vertex] (b) at (0,0) [fill,color=black] {$0$}; \node[vertex] (c) at (2,0) [fill,color=black] {$0$}; \draw[->] (a)--(b); \draw[->] (a)--(c); \end{tikzpicture} \end{center} obtaining the tree \begin{center} \begin{tikzpicture} \node[vertex] (a) at (2,3) [fill,color=black,text=white] {$3$}; \node[vertex] (b) at (1,2) [fill,color=red, text=white] {$1$}; \node[vertex] (c) at (3,2) [fill,color=black] {$0$}; \node[vertex] (d) at (0,1) [fill,color=black] {$0$}; \node[vertex] (e) at (2,1) [fill,color=red, text=white] {$2$}; \node[vertex] (f) at (1,0) [fill,color=black] {$0$}; \node[vertex] (g) at (3,0) [fill,color=black] {$0$}; \draw[->] (a)--(b); \draw[->] (a)--(c); \draw[->] (b)--(d); \draw[->] (b)--(e); \draw[->] (e)--(f); \draw[->] (e)--(g); \end{tikzpicture} \end{center} Note that this is not a red-black tree, since the red node labelled~1 has a red child. After restructuring the tree, we obtain the following red-black tree. \begin{center} \begin{tikzpicture} \node[vertex] (a) at (3,2) [fill,color=black,text=white] {$2$}; \node[vertex] (b) at (1,1) [fill,color=red, text=white] {$1$}; \node[vertex] (c) at (5,1) [fill,color=red, text=white] {$3$}; \node[vertex] (d) at (0,0) [fill,color=black] {$0$}; \node[vertex] (e) at (2,0) [fill,color=black] {$0$}; \node[vertex] (f) at (4,0) [fill,color=black] {$0$}; \node[vertex] (g) at (6,0) [fill,color=black] {$0$}; \draw[->] (a)--(b); \draw[->] (a)--(c); \draw[->] (b)--(d); \draw[->] (b)--(e); \draw[->] (c)--(f); \draw[->] (c)--(g); \end{tikzpicture} \end{center} We believe that multiple threads manipulating a red-black tree concurrently using the operations \lstinline{Contains} and \lstinline{Add} may lead to counter-intuitive results. Consider the following concurrent program. \begin{lstlisting} Add(3) Add(1) (Add(2) $\parallel$ Contains(1)) \end{lstlisting} Starting from an empty red-black tree, we first add the elements~3 and 1. This results in the red-black tree depicted in the previous section. Subsequently, one threads adds the element~2 whereas the other thread checks if the tree contains the element~1. One would expect the \lstinline{Contains} operation to always return true. However, we believe that by interleaving the elementary operations of the operations \lstinline{Add} and \lstinline{Contains} in a particular way, the operation \lstinline{Contains} may return false. When the \lstinline{Add} operation modifies the structure of the tree, the \lstinline{Contains} operation may not be able to find the element~1. We plan to confirm this conjecture by our implementation. In the following sections, we present three ways to rule out this undesirable behaviour. \section{The Monitors Approach} \label{section:monitors} A simple way to ensure that the \lstinline{Add} operation does not interfere with the \lstinline{Contains} operation is to implement the red-black tree as a monitor. Monitors were introduced by Brinch Hansen and Hoare in \cite{BrinchHansen73,Hoare74}. Below we use the syntax as used in \cite{Hoare74}. \begin{lstlisting}[emph={begin,end,result,monitor,int},emphstyle=\bf] RedBlackTree : monitor begin root : node procedure contains (element : int, result contains : boolean) begin $\ldots$ end procedure add (element : int, result added : boolean) begin $\ldots$ end root := black node end \end{lstlisting} Within the body of the procedures \lstinline{Contains} and \lstinline{Add} we place the code presented in Appendix~\ref{appendix:sequential}. Since monitor procedures are always mutually exclusive, the \lstinline{Add} procedure never interferes with the \lstinline{Contains} procedure. \section{The Readers-Writers Approach} \label{section:readers} The solution presented in the previous section ensures that one operation at a time is performed on the red-black tree. However, multiple \lstinline{Contains} operations can be performed concurrently without giving rise to undesirable results. To accomplish this, we can easily modify a solution to the readers-writers problem \cite{CourtoisHeymansParnas71}. In our setting, the threads that perform the \lstinline{Contains} operation are the readers and the threads that perform the \lstinline{Add} operation are the writers. For a solution to the readers-writers problem in which no reader waits because a writer is waiting for other readers to finish, we refer the reader to \cite{CourtoisHeymansParnas71}. \section{Locking Nodes} \label{section:locking} In an attempt to increase concurrency even more, we adapt the approach proposed by Ellis \cite{Ellis80} for concurrent AVL trees to red-black trees. The key idea of this implementation is that individual nodes are locked. A node can be locked in three different ways. A thread that searches for an element (by performing the \lstinline{Contains} operation) $\rho$-locks a node of the tree to ensure that the locked node is not part of a restructuring of the tree. A thread that searches for a leaf to add an element (as part of the \lstinline{Add} operation) $\alpha$-locks a node of the tree to prevent another thread, which also wants to add an element, access to the subtree rooted at the locked node. Just before a thread restructures the tree (as part of the \lstinline{Add} operation), it $\xi$-locks the nodes that are part of the restructuring. This prevents other threads from accessing these nodes. Different threads can hold a lock on the same node at the same time. However, there are some restrictions. The following graph \cite{Ellis80} captures those restrictions. \begin{center} \begin{tikzpicture} \node[vertex] (a) at (0,0) {$\rho$}; \node[vertex] (b) at (1,0) {$\alpha$}; \node[vertex] (c) at (2,0) {$\xi$}; \path (a) edge [loop below] (a); \draw[->] (a)--(b); \end{tikzpicture} \end{center} If there is an edge between two lock types, then two threads can have a lock of the given type on a particular node at the same time. For example, multiple threads can $\rho$-lock a node and a single thread can $\alpha$-lock that node all at the same time. \subsection{The \lstinline{Contains} Operation} While searching for an element in the tree, we $\rho$-lock nodes on the path from the root of the tree to either a leaf (if the element is not stored in the tree) or the node containing the element. We start to $\rho$-lock the root of the red-black tree. Assume that, during the search, we have $\rho$-locked a particular node. Before releasing the lock, we first $\rho$-lock the appropriate child of that node. This lock coupling should avoid deadlock. We will try to confirm this with our implementation. The details of the implementation of the \lstinline{Contains} operation can be found in Section~\ref{section:contains}. \subsection{The \lstinline{Add} Operation} The \lstinline{Add} operation consists of two parts. The first part is very similar to the \lstinline{Contains} operation. In the first part, we locate the leaf where the element is to be inserted. While searching for that leaf, we $\alpha$-lock nodes on the path from the root to the leaf. An $\alpha$-lock on a node prevents other threads, which also want to add elements to the tree, access to the subtree rooted at that node, so that this subtree can be modified. We want to keep this subtree as small as possible to allow as much concurrency as possible. Initially, we $\alpha$-lock the root. If we encounter two consecutive black nodes on the path from the root to the leaf, we know that the potential restructuring will be limited to the subtree rooted at the first black node (the one closest to the root). Hence, we $\alpha$-lock this node and we release the lock on the previously $\alpha$-locked node. In this way, the subtree rooted at the $\alpha$-locked node becomes smaller, therefore, allowing more concurrency. After we have inserted the element at a leaf of the tree, we may have to modify the structure of the tree and change the colour of some of the nodes. These changes will be limited to the subtree rooted at the $\alpha$-locked node. Whenever, we change the structure of the tree, we $\xi$-lock all the nodes involved in the restructuring. We lock them in a top-down fashion to avoid deadlock. The details can be found in Section~\ref{section:add}. \section{Conclusion} A lot of work has been done on the concurrent implementation of data structures. We refer the reader to, for example, \cite{MoirShavit05} for an overview. The concurrent red-black tree implementation described in Section~\ref{section:locking} is an adaptation of the concurrent implementation of AVL trees as introduced by Ellis in \cite{Ellis80}. Hanke \cite{Hanke99} also mentions that the implementation of Ellis can be adapted to red-black trees. Nurmi and Soisalon-Soininen \cite{NurmiSoisalonSoininen96} present a slightly different concurrent implementation of red-black trees. Also their work is based on the original work of Ellis. Although the work of Ellis is more than thirty years old, the quest for efficient concurrent implementations of balanced binary search trees is still ongoing (see, for example, \cite{BronsonCasperChafiOlukotun10,DrachslerVechevYahav14}). We have presented three concurrent implementations of red-black trees. The implementation in Section~\ref{section:readers} allows for more concurrency than the one in Section~\ref{section:monitors}. The implementation in Section~\ref{section:locking} gives rise to even more concurrency. However, as the concurrency increases, so does the complexity of the implementation. There seem to be opportunities to increase the amount of concurrency of the implementation in Section~\ref{section:locking}. First of all, rather than locking nodes, we could lock only ``half a node.'' For example, instead of locking a node, we can lock only its left part. In this way, its right child is still available. Secondly, there seem opportunities to decrease the lock granularity. For example, line~73--87 of Section~\ref{section:add} can be modified as follows. \begin{lstlisting}[literate={:=}{{$\gets$}}1,emph={then},emphstyle=\bf] $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock node Left-Child(node, parent) Right-Child(node, grandparent) root := node $\xi$-unlock node Left-Child(grandparent, right) $\xi$-unlock grandparent Right-Child(parent, left) $\xi$-unlock parent \end{lstlisting} Note that \lstinline{left} and \lstinline{right} are not locked at all. Also notice that \lstinline{grandparent} and \lstinline{node} are locked for a ``smaller amount of time.'' Thirdly, we may attempt to avoid using locks completely by using atomic operations such as ``compare and swap.'' \bibliographystyle{plain} \bibliography{myassignment1} \appendix \section{Pseudocode for the Sequential Implementation} \label{appendix:sequential} \subsection{Pseudocode for the \lstinline{Contains} Operation} The following pseudocode is based on the pseudocode found in \cite[Section~13.2]{CormenLeisersonRivest89}. \begin{lstlisting}[literate={:=}{{$\gets$}}1,emph={then},emphstyle=\bf] Contains(e) found := false node := root while node is not a leaf $\wedge$ $\neg$ found do if e = element of node then found := true else if e < element of node then node := left child of node else node := right child of node return found \end{lstlisting} \subsection{Pseudocode for the \lstinline{Add} Operation} The following pseudocode is based on the pseudocode found in \cite[Section~14.3]{CormenLeisersonRivest89}. Before presenting the pseudocode for \lstinline{Add}, we first present some simple operations that will be used in the pseudocode for \lstinline{Add}. The operation \lstinline{Left-Child(p, c)} ensures that the node~\lstinline{c} becomes the left child of the node~\lstinline{p}. \begin{lstlisting}[literate={:=}{{$\gets$}}1] Left-Child(p, c) parent of c := p left child of p := c \end{lstlisting} Similarly, the operation \lstinline{Right-Child(p, c)} ensures that the node~\lstinline{c} becomes the right child of the node~\lstinline{p}. \begin{lstlisting}[literate={:=}{{$\gets$}}1] Right-Child(p, c) parent of c := p right child of p := c \end{lstlisting} \begin{lstlisting}[literate={:=}{{$\gets$}}1,emph={then},emphstyle=\bf] Add(e) found := false node := root while node is not a leaf $\wedge$ $\neg$ found do if e = element of node then found := true else if e < element of node then node := left child of node else node := right child of node if $\neg$ found then colour of node := red element of node := e left := black node right := black node Left-Child(node, left) Right-Child(node, right) while node $\not=$ root $\wedge$ parent of node is red do parent := parent of node grandparent := parent of parent if parent is left child of grandparent then aunt := right child of grandparent if aunt is red then colour of aunt := black colour of parent := black colour of grandparent := red node := grandparent else if node is left child of parent then colour of parent := black colour of grandparent := red sister := right child of parent Right-Child(parent, grandparent) Left-Child(grandparent, sister) if grandparent = root then root := parent else grandgrandparent := parent of grandparent if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, parent) else Right-Child(grandgrandparent, parent) else (node is right child of parent) colour of node := black colour of grandparent := red left := left child of node right := right child of node Left-Child(node, parent) Right-Child(node, grandparent) Right-Child(parent, left) Left-Child(grandparent, right) if grandparent = root then root := node else grandgrandparent := parent of grandparent if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, node) else Right-Child(grandgrandparent, node) else (parent is right child of grandparent) aunt := left child of grandparent if aunt is red then colour of aunt := black colour of parent := black colour of grandparent := red node := grandparent else if node is right child of parent then colour of parent := black colour of grandparent := red sister := left child of parent Left-Child(parent, grandparent) Right-Child(grandparent, sister) if grandparent = root then root := parent else grandgrandparent := parent of grandparent if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, parent) else Right-Child(grandgrandparent, parent) else (node is left child of parent) colour of node := black colour of grandparent := red left := left child of node right := right child of node Right-Child(node, parent) Left-Child(node, grandparent) Left-Child(parent, right) Right-Child(grandparent, left) if grandparent = root then root := node else grandgrandparent := parent of grandparent if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, node) else Right-Child(grandgrandparent, node) colour of root := black return $\neg$ found \end{lstlisting} Note that line~59--96 is the mirror image of line~21--58. \section{Pseudocode for the Concurrent Implementation} \label{appendix:concurrent} We augment the pseudocode of Appendix~\ref{appendix:sequential} with the locking of nodes. \subsection{Pseudocode for the \lstinline{Contains} Operation} \label{section:contains} We modify the implementation of the \lstinline{Contains} operation as follows. \begin{lstlisting}[literate={:=}{{$\gets$}}1,emph={then},emphstyle=\bf] Contains(e) found := false node := root $\rho$-lock node while node is not a leaf $\wedge$ $\neg$ found do parent := node if e = element of node then found := true else if e < element of node then node := left child of node else node := right child of node $\rho$-lock node $\rho$-unlock parent $\rho$-unlock node return found \end{lstlisting} Note that line~4, 6, 13, 14 and 15 are new. \subsection{Pseudocode for the \lstinline{Add} Operation} \label{section:add} We modify the implementation of the \lstinline{Add} operation as follows. \begin{lstlisting}[literate={:=}{{$\gets$}}1,emph={then},emphstyle=\bf] Add(e) found := false node := root $\alpha$-lock node locked := node while node is not a leaf $\wedge$ $\neg$ found do parent := node if e = element of node then found := true else if e < element of node then node := left child of node else node := right child of node if node and parent are black and parent $\not=$ locked then $\alpha$-lock parent $\alpha$-unlock locked locked := parent if $\neg$ found then $\xi$-lock node colour of node := red element of node := e left := black node right := black node Left-Child(node, left) Right-Child(node, right) $\xi$-unlock node while node $\not=$ root $\wedge$ parent of node is red do parent := parent of node grandparent := parent of parent if parent is left child of grandparent then aunt := right child of grandparent if aunt is red then colour of aunt := black colour of parent := black colour of grandparent := red node := grandparent else if node is left child of parent then colour of parent := black colour of grandparent := red sister := right child of parent if grandparent = root then $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock sister root := parent Right-Child(parent, grandparent) Left-Child(grandparent, sister) $\xi$-unlock sister $\xi$-unlock parent $\xi$-unlock grandparent else grandgrandparent := parent of grandparent $\xi$-lock grandgrandparent $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock sister if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, parent) else Right-Child(grandgrandparent, parent) Right-Child(parent, grandparent) Left-Child(grandparent, sister) $\xi$-unlock sister $\xi$-unlock parent $\xi$-unlock grandparent $\xi$-unlock grandgrandparent else (node is right child of parent) colour of node := black colour of grandparent := red left := left child of node right := right child of node if grandparent = root then $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock node $\xi$-lock left $\xi$-lock right root := node Left-Child(node, parent) Right-Child(node, grandparent) Right-Child(parent, left) Left-Child(grandparent, right) $\xi$-unlock right $\xi$-unlock left $\xi$-unlock node $\xi$-unlock parent $\xi$-unlock grandparent else grandgrandparent := parent of grandparent $\xi$-lock grandgrandparent $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock node $\xi$-lock left $\xi$-lock right if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, node) else Right-Child(grandgrandparent, node) Left-Child(node, parent) Right-Child(node, grandparent) Right-Child(parent, left) Left-Child(grandparent, right) $\xi$-unlock right $\xi$-unlock left $\xi$-unlock node $\xi$-unlock parent $\xi$-unlock grandparent else (parent is right child of grandparent) aunt := left child of grandparent if aunt is red then colour of aunt := black colour of parent := black colour of grandparent := red node := grandparent else if node is right child of parent then colour of parent := black colour of grandparent := red sister := left child of parent if grandparent = root then $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock sister root := parent Left-Child(parent, grandparent) Right-Child(grandparent, sister) $\xi$-unlock sister $\xi$-unlock parent $\xi$-unlock grandparent else grandgrandparent := parent of grandparent $\xi$-lock grandgrandparent $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock sister if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, parent) else Right-Child(grandgrandparent, parent) Left-Child(parent, grandparent) Right-Child(grandparent, sister) $\xi$-unlock sister $\xi$-unlock parent $\xi$-unlock grandparent $\xi$-unlock grandgrandparent else (node is left child of parent) colour of node := black colour of grandparent := red left := left child of node right := right child of node if grandparent = root then $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock node $\xi$-lock left $\xi$-lock right root := node Right-Child(node, parent) Left-Child(node, grandparent) Left-Child(parent, right) Right-Child(grandparent, left) $\xi$-unlock right $\xi$-unlock left $\xi$-unlock node $\xi$-unlock parent $\xi$-unlock grandparent else grandgrandparent := parent of grandparent $\xi$-lock grandgrandparent $\xi$-lock grandparent $\xi$-lock parent $\xi$-lock node $\xi$-lock left $\xi$-lock right if grandparent is a left child of grandgrandparent then Left-Child(grandgrandparent, node) else Right-Child(grandgrandparent, node) Right-Child(node, parent) Left-Child(node, grandparent) Left-Child(parent, right) Right-Child(grandparent, left) $\xi$-unlock right $\xi$-unlock left $\xi$-unlock node $\xi$-unlock parent $\xi$-unlock grandparent $\xi$-unlock grandgrandparent colour of root := black $\alpha$-unlocked locked return $\neg$ found \end{lstlisting} \end{document}