Differences

--- projects [2015/08/11 20:43] – jarek
+++ projects [2015/08/26 21:59] (current) – jarek
@@ Line 1: / Line 1: @@
 ====== Proposed Projects for Fall 2015 ======
 \\
+======Clustering High-Dimensional Data Sets======
-===Genome-wide identification of plant micro RNAs===
+**Supervisor:** Suprakash Datta
+Clustering is a basic technique for analyzing data sets: it groups data points so that points within a group are
+more similar to each other than to points in other groups. Many clustering algorithms have been developed over the years. However, no single algorithm works well for all data sets. Further, most clustering algorithms have running times on the order of n^2 or n^3, where n is the number of data points, which makes them infeasible for data sets with hundreds of thousands of points. In this project we will design good clustering algorithms for large real data sets. In particular, we are interested in
+biological data sets.
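As a point of reference only, the classic k-means (Lloyd's) algorithm can be sketched in a few lines of Python. This is a standard textbook baseline, not SWIFT and not the algorithm this project will produce; the function name `kmeans` and its parameters are our own. Each iteration computes about n·k distances in d dimensions, so it scales linearly in n, but it tends to favor clusters of similar size and spread, which makes it a poor fit for clusters with very different sizes and densities.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means (illustrative baseline). Returns (centroids, labels).

    points: list of equal-length tuples of floats (n points in d dimensions).
    One iteration computes n * k point-to-centroid distances.
    """
    rng = random.Random(seed)
    # Initialize the centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)

    def nearest(p):
        # Index of the centroid closest to point p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return centroids, labels
```

For example, on six 2D points forming two well-separated groups, `kmeans(pts, k=2)` places the first three points in one cluster and the last three in the other.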
-**Supervisor: Katalin Hudak**
+Our data sets will include those obtained from Flow Cytometry experiments. Flow Cytometry is a common technique in many areas of Biology, particularly Immunology. Typical usage involves testing a blood sample for 25 attributes on a per-cell basis, so typical data sets are arrays of 500,000 points in a 25-dimensional space. The aim is to identify clusters that correspond to a biologist's notion of a cell "population". Besides the size of the data, the factors that make this problem difficult are the heterogeneity of population sizes and densities, and overlapping populations. With the help of collaborators, we proposed an accurate algorithm called SWIFT (Cytometry A. 2014 May;85(5):408-33) that successfully finds small cell populations of interest to Immunologists. This project attempts to design and implement algorithms that are faster than SWIFT but at least as accurate. The supervisor has collaborators in Immunology who are experts in interpreting and analyzing Flow Cytometry data; their expertise will be used to design and validate effective clustering algorithms that can run on large data sets.
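A back-of-the-envelope calculation shows why n^2 and n^3 running times are impractical at this scale. Assuming each value is stored as an 8-byte double (our assumption), with n = 500,000 points and d = 25 attributes:

```latex
\begin{align*}
\text{raw data: } & n \cdot d \cdot 8\,\text{B} = 5\times10^{5} \cdot 25 \cdot 8\,\text{B} = 10^{8}\,\text{B} \approx 100\,\text{MB} \\
\text{distinct pairs: } & \binom{n}{2} = \frac{n(n-1)}{2} \approx 1.25\times10^{11} \\
\text{pairwise distance matrix: } & 1.25\times10^{11} \cdot 8\,\text{B} = 10^{12}\,\text{B} \approx 1\,\text{TB} \\
n^{3}\ \text{algorithm: } & (5\times10^{5})^{3} = 1.25\times10^{17}\ \text{basic steps}
\end{align*}
```

At an assumed rate of 10^9 basic steps per second, an n^3 algorithm would need about 1.25 × 10^8 seconds, roughly four years, while the input itself fits comfortably in memory.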
+No Biology knowledge is required. The student should be a strong programmer. Knowledge of C/C++ is desirable but not essential. The work involves reading and understanding existing algorithms and working with the supervisor to design and implement improved algorithms and to measure the performance of the proposed algorithm(s).
+For more information, please send email to datta@cse.yorku.ca.
+Required Background: General CSE408x prerequisites
+\\
+======Metaheuristic-based Optimization techniques======
+**Supervisor:** Suprakash Datta
+Optimization is a crucial step in many computational problems. For problems that seem (or are known to be) intractable, metaheuristic-based techniques often work well in practice. These are typically randomized algorithms, often inspired by physical or biological systems. Examples include simulated annealing, genetic algorithms and ant colony optimization. In this project we will focus on particle swarm optimization (PSO), a technique inspired by the way flocks of birds or schools of fish search for food. Briefly, the algorithm maintains a set (or population) of candidate solutions, called particles, at all times. These particles move in the search space using simple rules that make use of the best solution found so far by each particle as well as by the whole swarm. The movement of the particles generates new candidate solutions. The process is repeated until some termination criterion is met, and the best solution found is output by the algorithm. While there is no guarantee of optimality, PSO has been shown to produce good or very good solutions for many practical problems. Many variants of PSO have been proposed. In this project we will study the performance of some PSO variants on both artificial and real optimization problems.
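The loop described above can be sketched as a minimal global-best PSO in Python. This is one common textbook form, not a specific variant the project will study; the function name `pso`, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size and the fixed iteration budget (used here as the termination criterion) are illustrative choices of ours.

```python
import random


def pso(f, dim, bounds, n_particles=30, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization (minimizes f).

    Returns (best_position, best_value). No optimality guarantee.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(iterations):            # fixed budget as termination criterion
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle, clamping it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` minimizes the sphere function, whose optimum is 0 at the origin. A PSO variant typically changes the velocity rule or replaces the single global best with a neighborhood best.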
+The student should be a strong programmer. A good grasp of algorithms and knowledge of C/C++ are desirable but not essential. The work involves reading and understanding existing algorithms and working with the supervisor to design and implement improved algorithms and to measure the performance of the proposed algorithm(s).
+For more information, please send email to datta@cse.yorku.ca.
+Required Background: General CSE408x prerequisites
+\\
+======Data visualization in Skydive======
+**Supervisor:** Jarek Gryz
+Skydive is a prototype system designed for database visualization using the concept of a so-called
+data pyramid. The system is composed of three modules (DB - Database Module, D2I -
+Data-to-Image module, and VC - Visualization Client). Each is designed to use a different type
+of computer memory. The DB module uses disk to store and manage the raw data, and materialized
+data pyramids. The D2I module works with a small subset of the aggregated dataset,
+and stores data in main memory (RAM). The VC module uses the graphics card’s capabilities to
+perform more advanced operations – such as zooming, scaling, panning, and rotation – over the
+graphical representation of the data.
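To make the idea concrete, a data pyramid can be sketched, under our own assumptions, as a stack of progressively coarser aggregations of a 2D grid, much like an image pyramid: level 0 holds the finest cells and each higher level sums 2×2 blocks of the level below. A viewer at a given zoom level then needs only one small level, which matches the D2I module's role of working with a small subset of the aggregated data. The function `build_pyramid` is hypothetical, and Skydive's actual pyramid layout and aggregation functions may differ.

```python
def build_pyramid(grid):
    """Return [level0, level1, ...], each level summing 2x2 blocks of the previous.

    Hypothetical sketch: assumes a square grid whose side is a power of two,
    and uses summation (suitable for counts) as the aggregation function.
    """
    n = len(grid)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a power-of-two side")
    levels = [grid]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        # Each coarse cell aggregates the four fine cells it covers.
        levels.append([
            [prev[2 * r][2 * c] + prev[2 * r][2 * c + 1]
             + prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]
             for c in range(half)]
            for r in range(half)
        ])
    return levels
```

For a 4×4 grid holding the values 1 to 16 row by row, level 1 is `[[14, 22], [46, 54]]` and level 2 is `[[136]]`. Summing suits count data; for other measures, the aggregation could be a mean, minimum or maximum instead.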
+Currently, the system supports three presentation models implemented within the Visualization
+Client (VC), namely:
+• a 2D heat-map;
+• a 2.5D heat-map rendered as a 3D bar chart; and
+• a 2.5D terrain (rendered with a mesh and UV mapping).
+The goal of the project is to implement two additional ways of visualizing data and to extend
+one of the existing ones, namely:
+• implement and test functions for data pyramid-based visualization of time series;
+• implement functions for visualization based on the cross-product of data pyramids; and
+• add support for specular and normal maps to the 2.5D terrain presentation model.
+Required Background: CSE 3421, a Java programming course (a C programming course is a plus)
+\\
+======Genome-wide identification of plant micro RNAs======
+**Supervisor:** Katalin Hudak
@@ Line 50: / Line 118: @@
 \\
+======Dynamic Interface Detection and Control Project======
-===Dynamic Interface Detection and Control Project===
+**Supervisor:** Michael Jenkin
-**Supervisor: Michael Jenkin**
@@ Line 79: / Line 146: @@
 \\
 ====== DDoS Attack using Google-bots ======
-**Supervisor**: Ntalija Vlajic
+**Supervisor:** Natalija Vlajic
 **Recommended Background**: CSE 3213 or CSE 3214, CSE 3482
@@ Line 105: / Line 173: @@
 \\
 ====== Attentive Sensing for Better Two-Way Communication in Remote Learning Environments ======
@@ Line 144: / Line 213: @@
 \\
-====== Hunting for Bugs in Logging: applying JPF to log4j ======
+====== JPF in a Jar ======
 **Supervisor:** Franck van Breugel
 Description:
-Java PathFinder (JPF) is a tool that can detect bugs in Java code.
+JPF, which is short for Java PathFinder, is an open-source
-The Java library Apache log4j allows developers to control which log
+tool developed at NASA's Ames Research Center.
-statements are output.  In the past, Dickey et al. [1] have attempted
+The aim of JPF is to find bugs in Java code.  Instead of
-to detect bugs in log4j by means of JPF with very limited succes.
+using testing to find those bugs, JPF uses model checking.
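The difference can be illustrated with a toy explicit-state model checker in Python. It is not JPF, which works on Java bytecode, but it follows the same basic idea: instead of running one schedule, it systematically explores every interleaving of the threads. In this hypothetical example, each thread performs a non-atomic increment (`tmp = x; x = tmp + 1`); the names `step`, `successors`, `model_check` and `run_one_schedule` are ours.

```python
def step(state, t):
    """Run one instruction of thread t; return the new state, or None if t is done.

    Each (hypothetical) thread executes:  tmp = x;  x = tmp + 1
    A state is (pcs, tmps, x): per-thread program counters, per-thread
    copies of tmp, and the shared x. Tuples keep states hashable.
    """
    pcs, tmps, x = state
    pc = pcs[t]
    if pc == 0:                              # tmp = x
        tmps = tmps[:t] + (x,) + tmps[t + 1:]
    elif pc == 1:                            # x = tmp + 1
        x = tmps[t] + 1
    else:
        return None                          # thread t has finished
    pcs = pcs[:t] + (pc + 1,) + pcs[t + 1:]
    return (pcs, tmps, x)


def successors(state, n_threads):
    """All states reachable in one step by letting any single thread run."""
    result = []
    for t in range(n_threads):
        nxt = step(state, t)
        if nxt is not None:
            result.append(nxt)
    return result


def model_check(n_threads=2):
    """Depth-first search over every reachable state; return all final x values."""
    init = ((0,) * n_threads, (0,) * n_threads, 0)
    seen, stack, finals = {init}, [init], set()
    while stack:
        state = stack.pop()
        nexts = successors(state, n_threads)
        if not nexts:
            finals.add(state[2])             # every thread is done
        for s in nexts:
            if s not in seen:                # skip states already explored
                seen.add(s)
                stack.append(s)
    return finals


def run_one_schedule(n_threads=2):
    """What a single test run might do: run each thread to completion in turn."""
    state = ((0,) * n_threads, (0,) * n_threads, 0)
    for t in range(n_threads):
        nxt = step(state, t)
        while nxt is not None:
            state = nxt
            nxt = step(state, t)
    return state[2]
```

Here `run_one_schedule()` returns 2, so a test that runs the threads one after another passes, while `model_check()` returns `{1, 2}`, revealing the lost update that occurs when both threads read `x` before either writes it.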
+The facts that JPF is downloaded hundreds of times per month
+and that some of the key papers on JPF have been cited more
+than a thousand times reflect the popularity of JPF. In
+fact, it is the most popular model checker for Java.
+A study done by Cambridge University in 2014 found that the
+global cost of debugging code has risen to $312 billion annually.
+Furthermore, on average, software developers spend 50% of their
+programming time finding and fixing bugs.  As a consequence,
+advocating the use of tools such as JPF may have a significant impact.
+Installing JPF is far from trivial.  The tool itself has been
+implemented in Java.  Therefore, it should, in theory, be
+feasible to encapsulate JPF in a Java archive (jar) file.
+This would significantly simplify the installation
+process of JPF and, therefore, make the tool more easily
+accessible to its potential users.
-Recently, in collaboration with Shafiei (NASA) we have developed
+The aim of this project is to attempt to put JPF in a jar.
-an extension of JPF called jpf-nhandler.  The aim of this project
+Since JPF relies on a number of configuration files, so-called
-is to apply this extension to log4j.
+Java properties files, incorporating these properly into the
+jar is one of the challenges.  Setting JPF's classpath is
+another challenge.  Since JPF changes almost daily,
+our modifications to JPF should ideally be limited to only a
+few classes, which poses yet another challenge.
-[1] David A. Dickey, B. Sinem Dorter, J. Michael German, Benjamin D. Madore, Mark W. Piper, Gabriel L. Zenarosa. "Evaluating Java PathFinder on Log4J."  2011.
+In this project you may collaborate with graduate students
+of the DisCoVeri group (discoveri.eecs.yorku.ca) and
+computer scientists at NASA.  For more information, feel
+free to send email to franck@cse.yorku.ca.
 **Required Background:** General CSE408x prerequisites