projects

Differences

This shows you the differences between two versions of the page.

projects [2015/08/17 12:02] jarek
projects [2015/08/26 21:55] jarek
Line 1: Line 1:
 ====== Proposed Projects for Fall 2015 ======
 \\
 +=====Clustering High-Dimensional Data Sets=====
 +
 +**Supervisor: Suprakash Datta**
 +
 +Clustering is a basic technique for analyzing data sets. It is the process of grouping data points so that points within a cluster are more similar to each other than to points in other clusters. Many clustering algorithms have been developed over the years. However, no single algorithm works well for all data sets. Further, most clustering algorithms have running times of the order of n^2 or n^3, so they are not feasible for data sets with hundreds of thousands of points. In this project we will design good clustering algorithms for large real data sets. In particular, we are interested in biological data sets.
 +
 +Our data sets will include those obtained from Flow Cytometry. Flow Cytometry is a common technique in many areas of Biology, particularly Immunology. Typical usage involves testing a blood sample for 25 attributes on a per-cell basis, and thus typical data sets are arrays of 500,000 points in a 25-dimensional space. The aim is to identify clusters that correspond to a biologist's notion of a cell "population". In addition to the size of the data, the factors that make this problem difficult are the heterogeneity of population sizes and densities, and overlapping populations. With the help of collaborators, we proposed an accurate algorithm called SWIFT (Cytometry A. 2014 May;85(5):408-33) that successfully finds small cell populations of interest to Immunologists. This project attempts to design and implement algorithms that are faster than SWIFT but at least as accurate. The supervisor has collaborators in Immunology who are experts in interpreting and analyzing Flow Cytometry data. Their expertise will be used to design and validate effective clustering algorithms that can run on large data sets.
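
To make the scale concrete, the following rough sketch estimates what a quadratic-time method would cost on a data set of the size quoted above. It assumes a naive algorithm that computes every pairwise distance once and stores each as an 8-byte float; the function name `pairwise_cost` and the storage assumption are illustrative only and do not describe SWIFT or any particular algorithm.

```python
def pairwise_cost(n_points, n_dims, bytes_per_value=8):
    """Estimate the cost of computing all pairwise distances once.

    Assumption (illustrative): each unordered pair is visited exactly once,
    each distance touches every coordinate, and each distance is stored
    as one value of `bytes_per_value` bytes.
    """
    n_pairs = n_points * (n_points - 1) // 2      # unordered pairs, grows as n^2
    coord_ops = n_pairs * n_dims                  # per-coordinate operations
    matrix_gib = n_pairs * bytes_per_value / 2**30  # storage for all distances
    return n_pairs, coord_ops, matrix_gib


if __name__ == "__main__":
    pairs, ops, gib = pairwise_cost(500_000, 25)
    print(f"pairs: {pairs:,}")
    print(f"coordinate operations: {ops:,}")
    print(f"distance storage: {gib:,.1f} GiB")
```

Under these assumptions, 500,000 points already give roughly 1.25 × 10^11 pairs, which is why methods that avoid touching all pairs are of interest for data of this size.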
 +
 +No Biology knowledge is required. The student should be a strong programmer. Knowledge of C/C++ is desirable but not essential. The work involves reading and understanding existing algorithms and working with the supervisor to design and implement improved algorithms and to measure the performance of the proposed algorithm(s).
 +
 +For more information, please send email to datta@cse.yorku.ca.
 +
 +Required Background: General CSE408x prerequisites 
 +
 +
 +\\
 +
 +
 +=====Metaheuristic-based Optimization techniques=====
 +
 +**Supervisor: Suprakash Datta**
 +
 +Optimization is a crucial step in many computational problems. For computational problems that seem (or are known to be) intractable, metaheuristic-based techniques often work well in practice. These are typically randomized algorithms, often inspired by physical or biological systems. Examples of such algorithms include simulated annealing, genetic algorithms and ant colony optimization. In this project we will focus on particle swarm optimization (PSO), a technique inspired by the search for food by flocks of birds or schools of fish. Briefly, a set (or population) of candidate solutions (called particles) is maintained at all times by the algorithm. These particles move in the search space using simple rules that make use of the best solution found so far by each particle as well as by the swarm as a whole. The movement of particles results in new particles being generated. The process is repeated until some termination criterion is met, and the best solution found is output by the algorithm. While there is no guarantee of optimality, PSO has been shown to produce good or very good solutions for many practical problems. Many variants of PSO have been proposed. In this project we will study the performance of some PSO variants on both artificial and real optimization problems.
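
The update rule described above can be sketched in a few lines. This is a minimal illustration of one common textbook form of PSO (the global-best variant with an inertia weight), not one of the specific variants the project will study. The parameter values, the fixed iteration count used as the termination criterion, the clamping of positions to the bounds, and the `sphere` test function are all assumptions chosen for the example.

```python
import random


def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        inertia=0.7, c_personal=1.5, c_swarm=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle has a position, a velocity, and a memory of its best point.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    # The swarm also remembers the best point found by any particle.
    g = min(range(n_particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):  # termination criterion: fixed iteration budget
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_swarm * r2 * (swarm_pos[d] - pos[i][d]))
                # Move, keeping the particle inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < swarm_val:
                    swarm_val, swarm_pos = val, pos[i][:]
    return swarm_pos, swarm_val


def sphere(x):
    """Artificial test problem: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = pso(sphere, dim=3, bounds=(-5.0, 5.0))
    print(solution, value)
```

A fixed seed is used so that repeated runs are comparable, which matters when measuring one variant against another; as noted above, the returned point is the best found, with no guarantee that it is optimal.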
 +
 +The student should be a strong programmer. A good grasp of algorithms and knowledge of C/C++ are desirable but not essential. The work involves reading and understanding existing algorithms and working with the supervisor to design and implement improved algorithms and to measure the performance of the proposed algorithm(s).
 +
 +For more information, please send email to datta@cse.yorku.ca.
 +
 +Required Background: General CSE408x prerequisites
  
 =====Data visualization in Skydive=====
projects.txt · Last modified: 2016/01/13 20:05 by stevenc