\documentclass[dvips,11pt]{article} \usepackage{listings} \usepackage{fullpage} \usepackage{color} \usepackage[all]{xy} \usepackage{pst-uml} \newtheorem{definition}{Definition} \newtheorem{theorem}[definition]{Theorem} % change codes of < and > \mathcode`<="4268 \mathcode`>="5269 \mathchardef\ls="213C \mathchardef\gr="213E \def\registered{{\ooalign{\hfil\raise .00ex\hbox{\scriptsize R}\hfil\crcr\mathhexbox20D}}} \begin{document} \lstset{mathescape=true} \title{Implementing Concurrent Red-Black Trees in Java} \author{Franck van Breugel\\ Department of Computer Science and Engineering, York University\\ 4700 Keele Street, Toronto, Ontario, Canada, M3J 1P3} \date{\today} \maketitle \begin{abstract} In the previous assignment, we presented three concurrent implementations of red-black trees. In this assignment, we present their implementation in Java. Furthermore, we discuss how we tested our implementations for correctness and performance. \end{abstract} \section{Introduction} In \cite{Breugel10}, we presented three different ways to implement red-black trees concurrently. In all three implementations, we considered only the operations {\sc Contains} and {\sc Add}. Our first implementation uses monitors. Our second implementation is an adaptation of a solution of the readers-writers problem. In our third implementation, we adapt the concurrent implementation of AVL trees by Ellis \cite{Ellis80} to the setting of red-black trees. In this paper, we discuss the implementations of all three in Java. All three Java implementations are based on a Java implementation of the sequential algorithms for the {\sc Contains} and {\sc Add} operations as can be found in \cite{CormenLeisersonRivest89}. We introduce an interface {\tt Set} that contains the methods {\tt contains} and {\tt add}. To avoid name clashes, each implementation resides in a different package. Each package contains a class {\tt RedBlackTree} and an inner class {\tt Node}. The former implements the interface {\tt Set} and the latter represents a node of a red-black tree. The concurrent Java implementation based on monitors is simply obtained from the sequential Java implementation of red-black trees by making the methods {\tt contains} and {\tt add} synchronized. This ensures that no method invocation can interfere with another one. This amounts to the execution of the method invocations one at a time. To implement a variation on a solution to the readers-writers problem is Java, we exploit the class {\tt ReentrantReadWriteLock}. This class implements the interface {\tt ReadWriteLock}. According to the documentation of the Java class library,% \footnote{See {\tt download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html}} ``a {\tt ReadWriteLock} maintains a pair of associated locks, one for read-only operations and one for writing. The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive.'' We use the read lock in the {\tt contains} method and the write lock in the {\tt add} method. The main challenge of the third implementation is the mechanism to lock nodes in different ways. Since the paper by Ellis \cite{Ellis80} does not provide any details, we develop them ourselves. The remainder of the pseudocode can be mapped to Java in a straightforward way. In this paper, we also discuss the tests of all three implementations. Since the three concurrent implementations are based on a sequential implementation, we first test our sequential implementation. In \cite{Breugel10}, we conjectured that multiple threads manipulating the sequential implementation of a red-black tree concurrently may lead to counter-intuitive results. Here, we put that conjecture to the test. Furthermore, we test the correctness of the {\tt contains} and {\tt add} methods in a concurrent settings. Finally, we test the performance of the three concurrent implementations. We focus only on throughput, that is, the number of operations that can be performed on the red-black tree per second. \section{The {\tt Set} Interface} Our interface {\tt Set} is a simplification of the interface {\tt Set} which is part of the package {\tt java.util}. Our interface only contains the methods {\tt contains} and {\tt add}, whereas the one in Java's standard library contains several other methods. In our setting, a {\tt Set} cannot contain {\tt null} whereas a {\tt java.util.Set} can. \begin{verbatim} package collection; /** * A set of elements different from null. * * @author Franck van Breugel */ public interface Set> { /** * Tests whether this tree contains the given element. * * @param element the element for which to search. * @pre. element != null * @return true if this tree contains the given element. */ public boolean contains(T element); /** * Attempts to add the given element to this tree. * The attempt is successful if this tree does not contain * the given element yet. * * @param element the element to be inserted. * @pre. element != null * @return true if the addition is successful, false otherwise. */ public boolean add(T element); } \end{verbatim} \section{The {\tt RedBlackTree} and {\tt Node} Classes} All three implementations consist of two classes: {\tt RedBlackTree} and {\tt Node}. Instances of the innerclass {\tt Node} represent a node of a red-black tree. An instance of the class {\tt RedBlackTree} represents a red-black tree. The UML diagram below contains those attributes and methods that are common to all three implementations. \begin{center} \begin{pspicture}(2,14) \rput(1,13){\rnode{Set}{\umlClass{\emph{Set$<$T$>$}}{+ contains(T) : boolean\\+ add(T) : boolean}}} \rput(1,9){\rnode{RedBlackTree}{\umlClass{RedBlackTree$<$T$>$}{root : Node\\\hline + contains(T) : boolean\\+ add(T) : boolean\\leftRotate(Node)\\rightRotate(Node)\\ + toString() : String}}} \rput(1,3){\rnode{Node}{\umlClass{Node$<$T$>$}{key : T\\left : Node\\right : Node\\parent : Node\\ red : boolean\\\hline + isLeaf() : boolean\\ + isLeft() : boolean\\ + isRight() : boolean\\ + toString() : String}}} \ncS[linestyle=dashed]{Set}{RedBlackTree} \ncputicon{umlHerit} \ncS{RedBlackTree}{Node} \ncputicon{umlAgreg} \nccurve[angleA=0,angleB=45]{Node}{Node} \ncputicon{umlAgreg} \end {pspicture} \end{center} Note that both the {\tt RedBlackTree} class and the {\tt Node} class contain a {\tt toString} method. This method is useful for debugging and testing. The methods {\tt leftRotate} and {\tt rightRotate} are auxiliary methods to the {\tt add} method. \section{The Monitors Approach} The monitors approach presented in \cite{Breugel10} can be mapped to Java in a straightforward way. The only thing that needs to be done to turn a sequential implementation of a red-black tree into a concurrent one is make the methods {\tt contains} and {\tt add} synchronized. \begin{verbatim} public class RedBlackTree> implements Set { ... public synchronized boolean contains(T key) ... public synchronized boolean add(T key) ... } \end{verbatim} \section{The Readers-Writers Approach} The key ingredient of this implementation is the the class {\tt ReentrantReadWriteLock} which implements the interface {\tt ReadWriteLock}. A {\tt ReadWriteLock} has two {\tt Lock}s: a read-lock and a write-lock. The relevant interfaces and classes and their relationships are given in the UML diagram below. \begin{center} \begin{pspicture}(10,8) \rput(2,6){\rnode{ReadWriteLock}{\umlClass{$<<$interface$>>$\\ReadWriteLock}{readLock() : Lock\\writeLock() : Lock}}} \rput(2,2){\rnode{ReentrantReadWriteLock}{\umlClass{ReentrantReadWriteLock}{}}} \rput(10,2){\rnode{Lock}{\umlClass{$<<$interface$>>$\\Lock}{lock()\\unlock()}}} \ncS[]{ReadWriteLock}{ReentrantReadWriteLock}\ncputicon{umlHerit} \ncE[]{ReentrantReadWriteLock}{Lock}\ncputicon{umlAgreg}\naput[npos=0.9]{2} \end{pspicture} \end{center} The read-lock is used in the {\tt contains} method and the write-lock is used in the {\tt add} method as follows. \begin{verbatim} public class RedBlackTree> implements Set { private ReadWriteLock lock; ... public RedBlackTree() { this.lock = new ReentrantReadWriteLock(); ... } public boolean contains(T element) { this.lock.getReadLock().lock(); ... this.lock.getReadLock().unlock(); } public boolean add(T element) { this.lock.getWriteLock().lock(); ... this.lock.getWriteLock().unlock(); } ... } \end{verbatim} \section{The Fine-Grained Locking Approach} Next, we discuss the implementation in Java of our adaptation of the concurrent AVL trees algorithms proposed by Ellis \cite{Ellis80} to red-black trees. Recall that the key idea of this implementation is that individual nodes can be locked in three different ways: $\rho$-locked, $\alpha$-locked and $\xi$-locked. Although threads can hold a lock on the same node, there are some restrictions. The following graph \cite{Ellis80} captures those restrictions. $$ \UseComputerModernTips \xymatrix@R=1ex{ *+<3ex>[o][F**:yellow]{\rho} \ar@{-}@*{[|(2)]}@(ul,dl)[] \ar@{-}@*{[|(2)]}[rr] && *+<3ex>[o][F**:red]{\alpha}\\ & *+<3ex>[o][F**:black][white]{\xi}} $$ If there is an edge between two lock types, then two threads can have a lock of the given type on a particular node at the same time. For example, multiple threads can $\rho$-lock a node and a single thread can $\alpha$-lock that node all at the same time. Most of the pseudocode presented in \cite{Breugel10} can be translated into Java in a straightforward way. The most challenging part of the implementation is the locking and unlocking of the nodes. For that purpose, we add the following methods to the {\tt Node} class. \begin{verbatim} public synchronized void readLock() { ... } public synchronized void readUnlock() { ... } public synchronized void writeLock() { ... } public synchronized void writeUnock() { ... } public synchronized void exclusiveLock() { ... } public synchronized void exclusiveUnlock() { ... } \end{verbatim} The methods {\tt readLock} and {\tt readUnlock} correspond to $\rho$-lock and $\rho$-unlock, respectively. Furthermore, the methods {\tt writeLock} and {\tt writeUnlock} correspond to $\alpha$-lock and $\alpha$-unlock, respectively. Finally, the methods {\tt exclusiveLock} and {\tt exclusiveUnlock} correspond to $\xi$-lock and $\xi$-unlock, respectively. All these methods are synchronized since, as we will see, they manipulate shared data. In order to implement the locking and unlocking, we keep track of the following data: \begin{itemize} \item the number of threads that have $\rho$-locked this node. In order to $\xi$-lock the node, we need to know that no thread has $\rho$-locked it. Hence, we introduce the attribute \begin{verbatim} private int readers; \end{verbatim} which is initialized to zero. \item whether a thread has $\alpha$-locked this node. This allows us to ensure that at most one thread $\alpha$-locks a node. For that purpose we introduce the attribute \begin{verbatim} private boolean write; \end{verbatim} which is initialized to false. \item whether a thread has $\xi$-locked this node. To ensure that a $\xi$-lock is exclusive we introduce the attribute \begin{verbatim} private boolean exclusive; \end{verbatim} which is initialized to false. \end{itemize} The above three attributes capture the state of a node. In the diagram below, we depict how the state of a node changes by performing locking and unlocking. The numbers correspond to the value of the attribute {\tt readers}. In the red states, the attribute {\tt write} has the value true. In the black states, the attribute {\tt exclusive} has the value true. Note that if a thread has $\alpha$-locked a node, that thread can change it into a $\xi$-lock. Once the thread $\xi$-unlocks the node, the node will still be $\alpha$-locked. To distinguish between a node whose $\alpha$-lock was changed into a $\xi$-lock and a node that was unlocked before it was $\xi$-locked, we label the former with an $\alpha$. $$ \UseComputerModernTips \xymatrix@R=4ex@C=6ex{ *+<3ex>[o][F**:black]{\rho} \ar@{<->}@*{[|(2)]}[r]^{\xi} & *+<3ex>[o][F**:white]{0} \ar@{<->}@*{[|(2)]}[r]^{\rho} \ar@{<->}@*{[|(2)]}[dd]^{\alpha} & *+<3ex>[o][F**:yellow]{1} \ar@{<->}@*{[|(2)]}[r]^{\rho} \ar@{<->}@*{[|(2)]}[dd]^{\alpha} & *+<3ex>[o][F**:yellow]{2} \ar@{<->}@*{[|(2)]}[r]^{\rho} \ar@{<->}@*{[|(2)]}[dd]^{\alpha} & \ldots\\ \\ *+<3ex>[o][F**:black][white]{\alpha} \ar@{<->}@*{[|(2)]}[r]_{\xi} & *+<3ex>[o][F**:red]{0} \ar@{<->}@*{[|(2)]}[r]_{\rho} & *+<3ex>[o][F**:red]{1} \ar@{<->}@*{[|(2)]}[r]_{\rho} & *+<3ex>[o][F**:red]{2} \ar@{<->}@*{[|(2)]}[r]_{\rho} & \ldots} $$ The lock and unlock methods are all implemented similarly. Let us only look at the most interesting ones: {\tt exclusiveLock} and {\tt exclusiveUnlock}. \begin{verbatim} public synchronized void exclusiveLock() { while (this.readers != 0) { try { this.wait(); } catch (InterruptedException e) { System.out.println("wait within exclusiveLock was interrupted"); } } this.exclusive = true; } public synchronized void exclusiveUnlock() { this.exclusive = false; this.notifyAll(); } \end{verbatim} \section{Testing} First, we test our sequential implementation. In our tests, we check the invariant that the tree is a red-black tree is maintained. We also check the postcondition of the {\tt contains} and {\tt add} method. To run the tests, we use JUnit% \footnote{{\tt www.junit.org}}. \subsection{Synchronization is Essential} In \cite{Breugel10}, we conjectured that multiple threads manipulating a red-black tree concurrently using the operations {\sc Contains} and {\sc Add} may lead to counter-intuitive results. Consider the following concurrent program. \begin{lstlisting} $\mbox{\sc Add}$(3) $\mbox{\sc Add}$(1) ($\mbox{\sc Add}$(2) $\parallel$ $\mbox{\sc Contains}$(1)) \end{lstlisting} We conjectured that by interleaving the elementary operations of the operations {\sc Add} and {\sc Contains} in a particular way, the operation {\sc Contains} may return false. We ran the Java counterpart of the above code 1,000,000 times on several different machines. In the table below, we present the results. The processor column specifies the number of processors of the machine and the core column describes the number of cores per processor. The columns true and false return the number of times true and false are returned, respectively. \begin{tabular}{rrrr} processor & core & true & false\\ \hline 1 & 1 & 1,000,000 & 0\\ 2 & 1 & 999,997 & 3\\ 8 & 1 & 1,000,000 & 0\\ 2 & 2 & 999,999 & 1\\ 8 & 4 & 999,947 & 53 \end{tabular} The tests confirm that some form of synchronization is essential to avoid counter-intuitive results as described above. Note that some machines did not detect the counter-intuitive results. \subsection{Concurrent Tests} \subsection{Performance Tests} \section{Conclusion} \subsection{Acknowledgements} Thanks to the management, staff, and facilities of the Intel$^\registered$ Manycore Testing Lab% \footnote{{\tt www.intel.com/software/manycoretestinglab}}. \bibliographystyle{plain} \bibliography{assignment2} \end{document}