Supervisor: Suprakash Datta
Clustering is a basic technique for analyzing data sets. It is the process of grouping data points so that points within a group are more similar to each other than to points in other groups. Many clustering algorithms have been developed over the years. However, no single algorithm works well for all data sets. Further, most clustering algorithms have running times of the order of n^2 or n^3, so they are not feasible for data sets with hundreds of thousands of points. In this project we will design good clustering algorithms for large real data sets. In particular, we are interested in Biological data sets.
Our data sets will include those obtained from Flow Cytometry. Flow Cytometry is a common technique in many areas of Biology, particularly Immunology. Typical usage involves testing a blood sample for 25 attributes on a per-cell basis, and thus typical data sets are arrays of 500,000 points in a 25-dimensional space. The aim is to identify clusters that correspond to a biologist's notion of a cell “population”. In addition to the size of the data, the factors that make this problem difficult are the heterogeneity of population sizes and densities, and overlapping populations. With the help of collaborators we proposed an accurate algorithm called SWIFT (Cytometry A. 2014 May;85(5):408-33) that successfully finds small cell populations of interest to Immunologists. This project attempts to design and implement algorithms that are faster than SWIFT but at least as accurate. The supervisor has collaborators in Immunology who are experts in interpreting and analyzing Flow Cytometry data. Their expertise will be used to design and validate effective clustering algorithms that can run on large data sets.
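To make the scaling concern concrete, the following short calculation counts the work implied by n^2 and n^3 behaviour for the 500,000-point data sets quoted above. It is only an illustration; the 8-byte size per stored distance is an assumption (a double-precision value), and the function name is made up for this sketch.

```python
def pairwise_count(n):
    """Number of unordered pairs of points among n points."""
    return n * (n - 1) // 2

n = 500_000                      # typical data set size quoted above
pairs = pairwise_count(n)        # distances an all-pairs method must compute
matrix_bytes = pairs * 8         # assumption: one 8-byte float per stored distance
cubic_steps = n ** 3             # order of magnitude for an n^3 algorithm

print(pairs)         # 124999750000
print(matrix_bytes)  # 999998000000, i.e. roughly 1 TB
print(cubic_steps)   # 125000000000000000
```

Storing all pairwise distances alone would need about a terabyte, which is why algorithms that avoid the full distance matrix are of interest here.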
No Biology knowledge is required. The student should be a strong programmer. Knowledge of C/C++ is desirable but not essential. The work involves reading and understanding existing algorithms and working with the supervisor to design and implement improved algorithms and to measure the performance of the proposed algorithm(s).
For more information, please send email to datta@cse.yorku.ca.
Required Background: General CSE408x prerequisites
Supervisor: Suprakash Datta
Optimization is a crucial step in many computational problems. For computational problems that seem (or are known to be) intractable, metaheuristic-based techniques often work well in practice. These are typically randomized algorithms, often inspired by physical or biological systems. Examples of such algorithms include simulated annealing, genetic algorithms and ant colony optimization. In this project we will focus on particle swarm optimization (PSO), a technique inspired by the search for food by flocks of birds or schools of fish. Briefly, a set (or population) of candidate solutions (called particles) is maintained at all times by the algorithm. These particles move in the search space using simple rules that make use of the best solutions found so far by the particle as well as by the swarm. Movement of particles results in new particles being generated. The process is repeated until some termination criteria are met, and the best solution found is output by the algorithm. While there is no guarantee of optimality, PSO has been shown to produce good or very good solutions for many practical problems. Many variants of PSO have been proposed. In this project we will study the performance of some PSO variants on both artificial and real optimization problems.
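The update rule described above can be sketched in a few lines. This is a minimal illustration of one common "global best" PSO formulation, not any particular variant to be studied in the project; the inertia weight w, the acceleration coefficients c1 and c2, the search bounds, the fixed iteration count used as the termination criterion, and the sphere test function are all assumptions chosen for the example.

```python
import random

def sphere(x):
    """Toy objective (assumed for illustration): minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def pso(objective, dim, n_particles=20, iterations=100,
        bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle has a position, a velocity and a personal best.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):          # termination: fixed iteration budget
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, keeping the particle inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, value = pso(sphere, dim=2)
print(best, value)
```

Variants of PSO typically differ in how these coefficients are set or adapted and in which neighbours a particle consults for the "swarm best" term.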
The student should be a strong programmer. A good grasp of algorithms and knowledge of C/C++ are desirable but not essential. The work involves reading and understanding existing algorithms and working with the supervisor to design and implement improved algorithms and to measure the performance of the proposed algorithm(s).
For more information, please send email to datta@cse.yorku.ca.
Required Background: General CSE408x prerequisites
Supervisor: Jarek Gryz
Skydive is a prototype system designed for database visualization using the concept of a so-called data pyramid. The system is composed of three modules (DB - Database Module, D2I - Data-to-Image Module, and VC - Visualization Client). Each is designed to use a different type of computer memory. The DB module uses disk to store and manage the raw data and the materialized data pyramids. The D2I module works with a small subset of the aggregated dataset and stores data in main memory (RAM). The VC module uses the graphics card’s capabilities to perform more advanced operations (such as zooming, scaling, panning, and rotation) over the graphical representation of the data. Currently the system supports three presentation models implemented within the VC module, namely:
• a 2D heat-map;
• a 2.5 D heat-map (by 3D bar chart); and
• a 2.5 D terrain (by mesh and UV-mapping).
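As a rough illustration of the data pyramid idea mentioned above, the sketch below builds successively coarser aggregates of a 2D grid of values. It assumes that each pyramid level is formed by averaging non-overlapping 2x2 blocks of the level below; the aggregation actually used by Skydive may differ, and the function names are hypothetical.

```python
def downsample(grid):
    """Aggregate non-overlapping 2x2 blocks of a grid by their mean."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [[(grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
              + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4
             for c in range(cols)]
            for r in range(rows)]

def build_pyramid(grid):
    """Return [finest level, ..., coarsest level]."""
    levels = [grid]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

# Toy 4x4 base level (assumed data, power-of-two size for simplicity).
base = [[1, 1, 3, 3],
        [1, 1, 3, 3],
        [5, 5, 7, 7],
        [5, 5, 7, 7]]
levels = build_pyramid(base)
print(levels[1])  # [[1.0, 3.0], [5.0, 7.0]]
print(levels[2])  # [[4.0]]
```

A viewer can then pick the level whose resolution matches the current zoom, so only a small aggregated subset needs to be held in memory at any time.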
The goal of the project is to implement two additional ways of visualizing data, as well as to extend some of the existing ones, that is:
1. Implement and test functions for data pyramid-based visualization of time series.
2. Implement functions for visualization based on cross-product of data pyramids.
3. Add support for specular and normal maps for 2.5 D terrain presentation model.
Required Background: CSE 3421, Java programming course, (C programming course a plus)
Supervisor: Katalin Hudak
The Hudak Lab in the Biology Department has an opening for a fourth-year Honours student to assist with a bioinformatics project. We study the pokeweed plant, Phytolacca americana, which displays broad-spectrum virus resistance. To evaluate pokeweed gene expression, we recently sequenced the plant’s mRNA and small RNA transcriptomes under jasmonic acid (JA) treatment. JA is a plant hormone that mediates defence against pathogens and insect herbivores. We are interested in learning how pokeweed gene expression is regulated by miRNAs during biotic stress. Please note: no previous knowledge of biology is required.
Working with the support of a PhD student, your project will involve:
1) Prediction of micro RNA (miRNA) targets on the basis of complementary sequence matches
2) Correlation of miRNA and mRNA expression changes to identify genes that are regulated by miRNAs
3) Conducting pathway analysis to determine which biological processes are controlled by miRNAs
4) Construction of a miRNA/target interaction network to visualize predictions
This work will contribute to a scientific manuscript on miRNA-mediated gene regulation in pokeweed during response to JA.
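As an illustration of the first task in the list above, target prediction by complementary sequence matching can be sketched as sliding the reverse complement of a miRNA along an mRNA and counting mismatches. This is a deliberately simplified sketch: the sequences are made-up toy data, the mismatch threshold is a hypothetical parameter, and G:U wobble pairing and position-dependent scoring used by real prediction tools are ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Sequence that pairs perfectly with `rna`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(mirna, mrna, max_mismatches=2):
    """Return (start, mismatches) for every mRNA window that pairs
    with the miRNA with at most `max_mismatches` mismatches."""
    site = reverse_complement(mirna)
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Toy sequences (not real pokeweed data).
mirna = "UGGAGCUCC"
mrna = "AAAAGGAGCUCCAUUUU"
print(reverse_complement(mirna))                        # GGAGCUCCA
print(find_target_sites(mirna, mrna, max_mismatches=0)) # [(4, 0)]
```

Candidate sites found this way would then be cross-checked against the expression data, as in the second task.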
Requirements:
1) Pre-requisites as per EECS Calendar
2) Facility with script-writing/modification (in Perl or Python)
3) Preference for students with knowledge of statistics and familiarity with R programming
4) Able to begin in September 2015
Learning outcomes:
1) Manipulate and analyze quantitative biological data
2) Develop and test hypotheses by modifying existing software and writing new scripts
3) Manage a CentOS computer server to store and facilitate ongoing research
No knowledge of biology is required.
For more information, please see: Hudak Lab website- http://hudak.lab.yorku.ca/
RNA sequencing- http://www.illumina.com/applications/sequencing/rna.html
miRNAs- http://en.wikipedia.org/wiki/MicroRNA
Supervisor: Michael Jenkin
Contrary to most industries, fine chemical manufacturing is dominated by batch production methods. Increasing economic, environmental and safety pressures are motivating a turn towards continuous synthesis. Rather than making products in one big flask, continuous synthesis involves performing chemical reactions by flowing reagents through a tube. Working in this way provides more control over the reaction parameters leading to increases in product quality, and process efficiency and safety. The flow chemistry industry for fine chemical production is a relatively new but burgeoning field with a projected market capacity of billions of dollars by 2018.
Extraction of the reaction mixture for purification and/or further processing is an important step in chemical manufacturing. This is a relatively straightforward operation in batch production, but poses several challenges for flowing processes. In order to facilitate continuous liquid extraction we require a sophisticated control system. This project involves designing, constructing and evaluating a solution to a pertinent practical problem in the field.
A key step in the process takes place in a clear tube that is mounted vertically. The tube contains two fluids with a boundary between them. During the process, material flows into and out of the tube from the top and the bottom. Chemical reactions take place within this tube, and it is essential that the position of the boundary be monitored, as it is used to control the flow of materials into the tube.
One way of solving this problem is to float a marker at the boundary between the two liquids and to monitor this boundary using a video camera. Although this approach solves the problem, it requires the introduction of a specific float within the tube. Can we build a system that monitors the boundary without resorting to the use of an artificial float?
Specific goals of the project include:
- Develop a computer vision system that can detect and monitor the interface between two immiscible fluids of different density.
- Evaluate the performance of the system over a range of different (and typical) fluids
- Explore the use of different illuminant/filter choices to simplify the task for specific fluid combinations.
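One very simple starting point for the float-free monitoring described above is to look for the image row where average brightness changes most sharply from one row to the next. The sketch below assumes a grayscale frame of the tube given as a list of pixel rows, and assumes the two fluids differ in mean brightness under the chosen illumination; both are assumptions for illustration, and a real system would need camera capture, noise handling and tracking over time.

```python
def find_interface_row(image):
    """Return (row, jump): the row index at which mean brightness changes
    most compared with the row above, and the size of that change."""
    row_means = [sum(row) / len(row) for row in image]
    best_row, best_jump = 0, 0.0
    for y in range(1, len(row_means)):
        jump = abs(row_means[y] - row_means[y - 1])
        if jump > best_jump:
            best_row, best_jump = y, jump
    return best_row, best_jump

# Synthetic frame: a lighter upper fluid (6 rows) over a darker lower fluid (4 rows).
frame = [[200] * 5 for _ in range(6)] + [[80] * 5 for _ in range(4)]
print(find_interface_row(frame))  # (6, 120.0)
```

Choosing an illuminant or filter that maximizes this brightness difference for a given fluid pair is one way the third goal could simplify the detection step.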
The successful candidate(s) will have the experience of working with a diverse group of scientists and engineers toward the design and implementation of an automated liquid extraction device with applications across many industries. Upon successful prototyping, you will be able to interact with professionals in high-throughput manufacturing and system integration. Based on project success, you may be invited to join the MACOS(TM) team for implementation and process validation, which may involve opportunities in graduate school. You will have the opportunity to interact with the broad audience of MACOS(TM) technology including governmental regulatory agencies and industrial partners. This project will give you a great opportunity to apply your engineering expertise and gain experience in process implementation and technology transfer.
For further information please contact,
Michael Jenkin (jenkin@cse.yorku.ca) or Michal Organ (organ@yorku.ca)
Supervisor: Natalija Vlajic
Recommended Background: CSE 3213 or CSE 3214, CSE 3482
Not long ago, botnets - networks of compromised computers - were seen as the most effective (if not the only) means of conducting Distributed Denial of Service (DDoS) attacks. However, with the growing popularity and prevalence of application-layer over other types of DDoS attacks, the DDoS execution landscape is becoming increasingly diverse. An especially interesting new trend is the execution of application-layer DDoS attacks by means of skillfully manipulated Web-crawlers, such as Google-bots. The goal of this project is to design, implement and test a real-world framework consisting of the following: a) the attacker's web-accessible domain, specially designed to attract Google-bots and then manipulate them into generating attack traffic towards the target/victim site; b) the victim's Web site set up in the Amazon S3 cloud. In addition to the hands-on component, the project will also look into the statistical/numerical estimation of the framework's anticipated 'attack potential' relative to an actual (real-world) target/victim site.
Supervisor: James Elder
Required Background: General CSE408x prerequisites, good programming skills, good math skills, knowledge of C and MATLAB programming languages
One of the challenges in remote learning is to allow students to communicate effectively with the lecturer. For example, when a student asks a question, communication will be more effective if the instructor has a zoomed view of the student’s face, so that s/he can interpret expressions etc.
The goal of this project is to apply attentive sensing technology (www.elderlab.yorku.ca) to this problem. This technology is able to monitor a large environment such as a classroom and direct a high-resolution ‘attentive’ sensor to events of interest.
In particular, working with a senior graduate student or postdoctoral fellow, the successful applicant will:
Supervisor: James Elder
Required Background: Good programming skills; Good math skills; Knowledge of C and MATLAB programming languages
The goal of this project is to adapt York University’s patented attentive sensor technology to the sport video recording market. Specific application domains under investigation include skiing, indoor BMX parks, and horse tracks.
The general problem is to use attentive sensing technology (www.elderlab.yorku.ca) to visually detect and track multiple moving agents (e.g., skiers, riders, horses) and to select specific agents for active high-resolution smooth pursuit.
The student will work with senior graduate students, postdoctoral fellows and research scientists to help modify the attentive sensing technology to operate in these domains. Specific tasks include:
1. Ground-truth available datasets
2. Evaluate current attentive algorithms on these datasets
3. Modify these algorithms to improve performance on these datasets
————
Supervisor: Franck van Breugel
Description: JPF, which is short for Java PathFinder, is an open source tool that has been developed at NASA's Ames Research Center. The aim of JPF is to find bugs in Java code. Instead of using testing to find those bugs, JPF uses model checking. The facts that JPF is downloaded hundreds of times per month and that some of the key papers on JPF have been cited more than a thousand times reflect the popularity of JPF. In fact it is the most popular model checker for Java.
A study done by Cambridge University in 2014 found that the global cost of debugging code has risen to $312 billion annually. Furthermore, on average software developers spend 50% of their programming time on finding and fixing bugs. As a consequence, advocating the use of tools such as JPF may have significant impact.
Installing JPF is far from trivial. The tool itself has been implemented in Java. Therefore, it should, in theory, be feasible to encapsulate JPF in a Java archive (jar) file. This would significantly simplify the installation process of JPF and, therefore, make the tool more easily accessible to its potential users.
The aim of this project is to attempt to put JPF in a jar. Since JPF relies on a number of configuration files, so-called Java properties files, incorporating these properly into the jar is one of the challenges. Setting JPF's classpath is another challenge. Since JPF changes almost on a daily basis, our modifications to JPF should ideally be limited to only a few classes, yet another challenge.
In this project you may collaborate with graduate students of the DisCoVeri group (discoveri.eecs.yorku.ca) and computer scientists of NASA. For more information, feel free to send email to franck@cse.yorku.ca.
Required Background: General CSE408x prerequisites
Supervisor: Zhen Ming (Jack) Jiang (zmjiang at cse dot yorku dot ca)
Required Background: Good programming skills in Java; Good analytical and communication skills; Knowledge in AI and statistics; Interested in large scale software analysis
Short Description: Software engineering data (e.g., source code repositories and bug databases) contains a wealth of information about a project's status and history. The research on Mining Software Repositories (MSR) aims to transform the data from static record-keeping repositories into knowledge, which can guide the software development process. For example, one can derive correct API usage patterns and flag anomalous (and potentially buggy) API usages by mining the source code across many projects in GitHub and Google Code. In this project, the student(s) will research and develop an efficient infrastructure, where MSR researchers and practitioners can share and analyze such data.
Supervisor: Jia Xu
Required Background: At least a B+ in Embedded Systems (CSE3215), MATLAB, C programming skills, solid experience in using a microcontroller such as Arduino.
Project Description:
Model-based design with code generation tools can be used for simulation, rapid prototyping, and hardware-in-the-loop testing of embedded systems. This project explores model-based design and development of embedded systems on various hardware platforms with code generation tools. The selected student will develop and test embedded systems using model-based design and code generation tools such as MathWorks MATLAB/Simulink Coder.
Supervisor: Jia Xu
Required Background: At least a B+ in Embedded Systems (CSE3215), strong C programming skills, solid knowledge of microcontrollers
Description: The C2000 Concerto family of microcontrollers combines two cores on a single-chip with on-chip low latency interprocessor communication between the two cores: a C28x 32-bit control core for real-time control with faster/more loops and small sampling window; and an ARM 32-bit Cortex-M3 host core for communications and general purpose. The selected student will evaluate the capabilities of the C2000 Concerto family of microcontrollers through testing and investigating open source software for real-time control applications that runs on C2000 Concerto Microcontrollers.
Supervisor: Jia Xu
Required Background: At least a B+ in Operating System Fundamentals (CSE3221), strong Ubuntu/Linux, C++ programming, GCC, TCP/IP skills
Description: Real-time bidding (RTB) is a new method of selling and buying online display advertising in real time, one ad impression at a time. Once a bid request has been sent out, all bids must be received within a strict deadline, generally under 100 milliseconds, including network latency. This project explores RTBkit, an open source SDK allowing developers to create customized real-time ad bidding systems (for Media Buyers/Bidders).
Supervisor: Sebastian Magierowski
Description: The project requires the construction of components for a ground penetrating radar. The students would have to design microwave boards for the high-frequency components of this unit, on both the transmitter and the receiver. On the transmitter side the board would take a 5-MHz input clock, run it through a series of off-the-shelf amplifiers and then through a shaping circuit that would convert the input into an outgoing series of pulses (still at 5-MHz repetition rate) less than 400-ps in duration each. The bandwidth of the signal is roughly 2-8 GHz and hence requires very careful board layout. The receiver would be a time-shifted sampler, used to sample the returning pulses in progressive periods. This radar circuit is ultimately intended to be positioned on a rover doing ground analysis.
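The timing figures quoted above can be checked with a short calculation. The repetition rate, pulse width and upper band edge come from the description; the sampling step and the length of the time window to be reconstructed are hypothetical values chosen only to illustrate how a time-shifted sampler, taking one sample per transmitted pulse, builds up a waveform over successive periods.

```python
PS_PER_S = 10**12                 # picoseconds per second

rep_rate_hz = 5_000_000           # 5-MHz pulse repetition rate (from the description)
pulse_ps = 400                    # upper bound on pulse duration (from the description)
f_max_hz = 8_000_000_000          # upper edge of the 2-8 GHz band (from the description)

period_ps = PS_PER_S // rep_rate_hz           # time between pulses: 200000 ps = 200 ns
duty_cycle = pulse_ps / period_ps             # 0.002, i.e. 0.2 %
nyquist_step_ps = PS_PER_S / (2 * f_max_hz)   # 62.5 ps: largest step that still satisfies Nyquist at 8 GHz

step_ps = 50                      # hypothetical delay increment per period
window_ps = 20_000                # hypothetical 20-ns window of the return to reconstruct

samples = window_ps // step_ps                # 400 samples, one per transmitted pulse
acquisition_ps = samples * period_ps          # 80000000 ps = 80 microseconds per trace

print(period_ps, duty_cycle, nyquist_step_ps, samples, acquisition_ps)
```

Under these assumptions the sampler itself only needs to fire at the 5-MHz repetition rate, while the effective time resolution is set by the 50-ps delay step.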
Required Background: A background in undergraduate-level electronics is very important. Experience with board-level implementations and knowledge of microstrip lines would be helpful; otherwise the basics would have to be picked up during the project.
More project proposals may be added here in the first week of the winter term.