\documentclass[dvips,11pt]{article}

\usepackage{listings}
\usepackage{fullpage}
\usepackage{color}
\usepackage[all]{xy}
\usepackage{pst-uml}

\newtheorem{definition}{Definition}
\newtheorem{theorem}[definition]{Theorem}

% change codes of < and >
\mathcode`<="4268
\mathcode`>="5269
\mathchardef\ls="213C
\mathchardef\gr="213E

\def\registered{{\ooalign{\hfil\raise .00ex\hbox{\scriptsize R}\hfil\crcr\mathhexbox20D}}}

\begin{document}

\lstset{mathescape=true}

\title{Implementing Concurrent Red-Black Trees in Java} 
\author{Franck van Breugel\\
        Department of Computer Science and Engineering, York University\\
        4700 Keele Street, Toronto, Ontario, Canada, M3J 1P3}
\date{\today}
\maketitle

\begin{abstract}
In the previous assignment, we presented three concurrent implementations 
of red-black trees.  In this assignment, we present their implementation
in Java.  Furthermore, we discuss how we tested our implementations
for correctness and performance.
\end{abstract}

\section{Introduction}

In \cite{Breugel10}, we presented three different ways to implement
red-black trees concurrently.  In all three implementations, we
considered only the operations {\sc Contains} and {\sc Add}.
Our first implementation uses monitors.  Our second implementation is 
an adaptation of a solution of the readers-writers problem.  In our
third implementation, we adapt the concurrent implementation of AVL trees 
by Ellis \cite{Ellis80} to the setting of red-black trees.  

In this paper, we discuss the implementations of all three in Java.
All three Java implementations are based on a Java implementation of
the sequential algorithms for the {\sc Contains} and {\sc Add}
operations as can be found in \cite{CormenLeisersonRivest89}.  
We introduce an interface {\tt Set} that contains the methods
{\tt contains} and {\tt add}.  To avoid name clashes, 
each implementation resides in a different package.  Each package
contains a class {\tt RedBlackTree} and an inner class {\tt Node}.
The former implements the interface {\tt Set} and the latter
represents a node of a red-black tree.

The concurrent Java implementation based on monitors is simply
obtained from the sequential Java implementation of red-black
trees by making the methods {\tt contains} and {\tt add}
synchronized.  This ensures that no method invocation can 
interfere with another one.  This amounts to the execution of the method
invocations one at a time.

To implement a variation on a solution to the readers-writers
problem is Java, we exploit the class {\tt ReentrantReadWriteLock}.
This class implements the interface {\tt ReadWriteLock}.
According to the documentation of the Java class library,%
\footnote{See {\tt download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html}}
``a {\tt ReadWriteLock} maintains a pair of associated locks, one 
for read-only operations and one for writing. The read lock may be held 
simultaneously by multiple reader threads, so long as there are no writers. 
The write lock is exclusive.''  We use the read lock in the 
{\tt contains} method and the write lock in the
{\tt add} method.

The main challenge of the third implementation is the mechanism to lock
nodes in different ways.  Since the paper by Ellis \cite{Ellis80}
does not provide any details, we develop them ourselves.  The
remainder of the pseudocode can be mapped to Java in a straightforward
way.

In this paper, we also discuss the tests of all three implementations.  
Since the three concurrent implementations are based on a sequential 
implementation, we first test our sequential implementation.  
In \cite{Breugel10}, we conjectured that multiple threads manipulating the
sequential implementation of a red-black tree concurrently may lead to 
counter-intuitive results.  Here, we put that conjecture to the test.
Furthermore, we test the correctness of the {\tt contains} and {\tt add}
methods in a concurrent settings.  Finally, we test the performance
of the three concurrent implementations.  We focus only on throughput,
that is, the number of operations that can be performed on the red-black
tree per second.

\section{The {\tt Set} Interface}

Our interface {\tt Set} is a simplification of the interface
{\tt Set} which is part of the package {\tt java.util}.
Our interface only contains the methods {\tt contains} and
{\tt add}, whereas the one in Java's standard library contains
several other methods.  In our setting, a {\tt Set} cannot 
contain {\tt null} whereas a {\tt java.util.Set} can.

\begin{verbatim}
package collection;

/**
 * A set of elements different from null.
 *
 * @author Franck van Breugel
 */
public interface Set<T extends Comparable<T>>
{
  /**
   * Tests whether this tree contains the given element.
   * 
   * @param element the element for which to search.
   * @pre. element != null
   * @return true if this tree contains the given element.
   */
  public boolean contains(T element);
    
  /**
   * Attempts to add the given element to this tree.
   * The attempt is successful if this tree does not contain 
   * the given element yet.
   * 
   * @param element the element to be inserted.
   * @pre. element != null
   * @return true if the addition is successful, false otherwise.
   */
  public boolean add(T element);
}
\end{verbatim}

\section{The {\tt RedBlackTree} and {\tt Node} Classes}

All three implementations consist of two classes: {\tt RedBlackTree} 
and {\tt Node}.  Instances of the innerclass {\tt Node} represent
a node of a red-black tree.  An instance of the class {\tt RedBlackTree}
represents a red-black tree.  The UML diagram below contains those 
attributes and methods that are common to all three implementations.

\begin{center}
\begin{pspicture}(2,14)
\rput(1,13){\rnode{Set}{\umlClass{\emph{Set$<$T$>$}}{+ contains(T) : boolean\\+ add(T) : boolean}}}
\rput(1,9){\rnode{RedBlackTree}{\umlClass{RedBlackTree$<$T$>$}{root : Node<T>\\\hline + contains(T) : boolean\\+ add(T) : boolean\\leftRotate(Node<T>)\\rightRotate(Node<T>)\\ + toString() : String}}}
\rput(1,3){\rnode{Node}{\umlClass{Node$<$T$>$}{key : T\\left : Node<T>\\right : Node<T>\\parent : Node<T>\\ red : boolean\\\hline + isLeaf() : boolean\\ + isLeft() : boolean\\ + isRight() : boolean\\ + toString() : String}}}
\ncS[linestyle=dashed]{Set}{RedBlackTree}
\ncputicon{umlHerit}
\ncS{RedBlackTree}{Node}
\ncputicon{umlAgreg}
\nccurve[angleA=0,angleB=45]{Node}{Node}
\ncputicon{umlAgreg}
\end {pspicture}
\end{center}

Note that both the {\tt RedBlackTree} class and the {\tt Node}
class contain a {\tt toString} method.  This method is useful for
debugging and testing.  The methods {\tt leftRotate} and 
{\tt rightRotate} are auxiliary methods to the {\tt add}
method.

\section{The Monitors Approach}

The monitors approach presented in \cite{Breugel10} can be mapped
to Java in a straightforward way.  The only thing that needs to
be done to turn a sequential implementation of a red-black tree
into a concurrent one is make the methods {\tt contains}
and {\tt add} synchronized.
\begin{verbatim}
public class RedBlackTree<T extends Comparable<T>> implements Set<T>
{
  ...
  public synchronized boolean contains(T key)
  ...
  public synchronized boolean add(T key)
  ...
}
\end{verbatim}

\section{The Readers-Writers Approach}

The key ingredient of this implementation is the 
the class {\tt ReentrantReadWriteLock} which
implements the interface {\tt ReadWriteLock}.
A {\tt ReadWriteLock} has two {\tt Lock}s:
a read-lock and a write-lock.  The relevant interfaces
and classes and their relationships are given in the
UML diagram below.
\begin{center}
\begin{pspicture}(10,8)
\rput(2,6){\rnode{ReadWriteLock}{\umlClass{$<<$interface$>>$\\ReadWriteLock}{readLock() : Lock\\writeLock() : Lock}}}
\rput(2,2){\rnode{ReentrantReadWriteLock}{\umlClass{ReentrantReadWriteLock}{}}}
\rput(10,2){\rnode{Lock}{\umlClass{$<<$interface$>>$\\Lock}{lock()\\unlock()}}}
\ncS[]{ReadWriteLock}{ReentrantReadWriteLock}\ncputicon{umlHerit}
\ncE[]{ReentrantReadWriteLock}{Lock}\ncputicon{umlAgreg}\naput[npos=0.9]{2}
\end{pspicture}
\end{center}
The read-lock is used in the {\tt contains} method and the
write-lock is used in the {\tt add} method as follows.
\begin{verbatim}
public class RedBlackTree<T extends Comparable<T>> implements Set<T>
{
  private ReadWriteLock lock;
  ...

  public RedBlackTree()
  {
    this.lock = new ReentrantReadWriteLock();
    ...
  }

  public boolean contains(T element)
  {
    this.lock.getReadLock().lock();
    ...
    this.lock.getReadLock().unlock();
  }

  public boolean add(T element)
  {
    this.lock.getWriteLock().lock();
    ...
    this.lock.getWriteLock().unlock();
  }

  ...
}
\end{verbatim}

\section{The Fine-Grained Locking Approach}

Next, we discuss the implementation in Java of our adaptation of the
concurrent AVL trees algorithms proposed by Ellis \cite{Ellis80} to 
red-black trees.
Recall that the key idea of this implementation is that individual 
nodes can be locked in three different ways: $\rho$-locked,
$\alpha$-locked and $\xi$-locked.  Although threads can hold a lock 
on the same node, there are some restrictions.  The following graph 
\cite{Ellis80} captures those restrictions.
$$
\UseComputerModernTips
\xymatrix@R=1ex{
*+<3ex>[o][F**:yellow]{\rho} \ar@{-}@*{[|(2)]}@(ul,dl)[] \ar@{-}@*{[|(2)]}[rr] && *+<3ex>[o][F**:red]{\alpha}\\
& *+<3ex>[o][F**:black][white]{\xi}}
$$
If there is an edge between two lock types, then two threads can have
a lock of the given type on a particular node at the same time.  For
example, multiple threads can $\rho$-lock a node and a single thread can
$\alpha$-lock that node all at the same time.

Most of the pseudocode presented in \cite{Breugel10} can be translated
into Java in a straightforward way.  The most challenging part of the
implementation is the locking and unlocking of the nodes.  For that
purpose, we add the following methods to the {\tt Node} class.
\begin{verbatim}
  public synchronized void readLock() { ... }
  public synchronized void readUnlock() { ... }
  public synchronized void writeLock() { ... }
  public synchronized void writeUnock() { ... }
  public synchronized void exclusiveLock() { ... }
  public synchronized void exclusiveUnlock() { ... }
\end{verbatim}
The methods {\tt readLock} and {\tt readUnlock} correspond
to $\rho$-lock and $\rho$-unlock, respectively.  Furthermore, the methods 
{\tt writeLock} and {\tt writeUnlock} correspond
to $\alpha$-lock and $\alpha$-unlock, respectively.  Finally, the methods 
{\tt exclusiveLock} and {\tt exclusiveUnlock} correspond
to $\xi$-lock and $\xi$-unlock, respectively.  All these methods are 
synchronized since, as we will see, they manipulate shared data.

In order to implement the locking and unlocking, we keep track of the
following data:
\begin{itemize}
\item
the number of threads that have $\rho$-locked this node.
In order to $\xi$-lock the node, we need to know that
no thread has $\rho$-locked it.  Hence, we introduce the
attribute
\begin{verbatim}
  private int readers;
\end{verbatim}
which is initialized to zero.
\item
whether a thread has $\alpha$-locked this node.
This allows us to ensure that at most one thread
$\alpha$-locks a node.  For that purpose we introduce
the attribute
\begin{verbatim}
  private boolean write;
\end{verbatim}
which is initialized to false.
\item
whether a thread has $\xi$-locked this node.
To ensure that a $\xi$-lock is exclusive we introduce the
attribute
\begin{verbatim}
  private boolean exclusive;
\end{verbatim}
which is initialized to false.
\end{itemize}
The above three attributes capture the state of a node.
In the diagram below, we depict how the state of a node
changes by performing locking and unlocking.  
The numbers correspond to the value of the attribute
{\tt readers}.  In the red states, the attribute
{\tt write} has the value true.  In the black
states, the attribute {\tt exclusive} has the
value true.  Note that
if a thread has $\alpha$-locked a node, that thread
can change it into a $\xi$-lock.  Once the thread 
$\xi$-unlocks the node, the node will still be 
$\alpha$-locked.  To distinguish between a node whose 
$\alpha$-lock was changed into a $\xi$-lock and a node 
that was unlocked before it was $\xi$-locked, we label 
the former with an $\alpha$.
$$
\UseComputerModernTips
\xymatrix@R=4ex@C=6ex{
*+<3ex>[o][F**:black]{\rho} \ar@{<->}@*{[|(2)]}[r]^{\xi} & *+<3ex>[o][F**:white]{0} \ar@{<->}@*{[|(2)]}[r]^{\rho} \ar@{<->}@*{[|(2)]}[dd]^{\alpha} & *+<3ex>[o][F**:yellow]{1} \ar@{<->}@*{[|(2)]}[r]^{\rho} \ar@{<->}@*{[|(2)]}[dd]^{\alpha} & *+<3ex>[o][F**:yellow]{2} \ar@{<->}@*{[|(2)]}[r]^{\rho} \ar@{<->}@*{[|(2)]}[dd]^{\alpha}  & \ldots\\
\\
*+<3ex>[o][F**:black][white]{\alpha} \ar@{<->}@*{[|(2)]}[r]_{\xi} & *+<3ex>[o][F**:red]{0} \ar@{<->}@*{[|(2)]}[r]_{\rho} & *+<3ex>[o][F**:red]{1} \ar@{<->}@*{[|(2)]}[r]_{\rho} & *+<3ex>[o][F**:red]{2} \ar@{<->}@*{[|(2)]}[r]_{\rho} & \ldots}
$$

The lock and unlock methods are all implemented similarly.
Let us only look at the most interesting ones:
{\tt exclusiveLock} and {\tt exclusiveUnlock}.


\begin{verbatim}
public synchronized void exclusiveLock()
{
   while (this.readers != 0)
   {
     try
     {
       this.wait();
     }
     catch (InterruptedException e) 
     {
        System.out.println("wait within exclusiveLock was interrupted");
     }
   }
   this.exclusive = true;
}

public synchronized void exclusiveUnlock()
{
  this.exclusive = false;
  this.notifyAll();
}
\end{verbatim}

\section{Testing}

First, we test our sequential implementation. In our tests, we check
the invariant that the tree is a red-black tree is maintained.  We
also check the postcondition of the {\tt contains} and {\tt add}
method.  To run the tests, we use JUnit%
\footnote{{\tt www.junit.org}}.

\subsection{Synchronization is Essential}

In \cite{Breugel10}, we conjectured that multiple threads manipulating a 
red-black tree concurrently using the operations {\sc Contains} and 
{\sc Add} may lead to counter-intuitive results.  Consider the following 
concurrent program.
\begin{lstlisting}
$\mbox{\sc Add}$(3)
$\mbox{\sc Add}$(1)
($\mbox{\sc Add}$(2) $\parallel$ $\mbox{\sc Contains}$(1))
\end{lstlisting}
We conjectured that by interleaving the elementary operations of the operations 
{\sc Add} and {\sc Contains} in a particular way, the operation 
{\sc Contains} may return false.  We ran the Java counterpart of the 
above code 1,000,000 times on several different machines.  In the table
below, we present the results.  The processor column specifies the number
of processors of the machine and the core column describes the number of
cores per processor.  The columns true and false return the number of times
true and false are returned, respectively.

\begin{tabular}{rrrr}
processor & core & true & false\\
\hline
1 & 1 & 1,000,000 & 0\\
2 & 1 & 999,997 & 3\\
8 & 1 & 1,000,000 & 0\\
2 & 2 & 999,999 & 1\\
8 & 4 & 999,947 & 53
\end{tabular}

The tests confirm that some form of synchronization is essential to
avoid counter-intuitive results as described above.  Note that some
machines did not detect the counter-intuitive results.

\subsection{Concurrent Tests}

\subsection{Performance Tests}

\section{Conclusion}

\subsection{Acknowledgements}

Thanks to the management, staff, and facilities of the Intel$^\registered$
Manycore Testing Lab%
\footnote{{\tt www.intel.com/software/manycoretestinglab}}.

\bibliographystyle{plain}
\bibliography{assignment2}

\end{document}