CSE4080

Currently offered Projects, Winter 2012

Updated: December 7, 2011

Projects will be added to this page until the beginning of the winter term.

If you have an idea for a project that is not listed here, you are welcome to contact faculty members to find out if they are willing to supervise it. (If you are not sure who to approach as a potential supervisor for a particular project you have in mind, ask the course coordinator, Eric Ruppert.)

Most of the projects listed here are intended for CSE4080. A project is only suitable for CSE4480 if it has a significant security component. The CSE4480 projects are listed at the bottom of the page.

Developing Fast Speech Recognition Engine using GPU

Supervisor: Hui Jang

Required Background: General prerequisites

Description: Recently, Graphics Processing Units (GPUs) have been widely used as an extremely fast computing platform for a variety of real-world applications. Many software programs have been developed for GPUs to take advantage of their multi-core parallel computing architecture (see gpgpu.org). In the past few years, we have developed a state-of-the-art speech recognition engine in ANSI C at York, and it runs very well on a normal CPU-based platform. In this project, you are required to port this engine (the C source code is available) to GPUs using the standard CUDA or OpenCL library. It has been reported that this may lead to a speedup of at least 10 times in many speech recognition tasks [1][2].

In recent years, there has been an increasing demand in the job market for programmers who can use GPUs for general-purpose computing tasks. This project is a good opportunity to learn this cutting-edge programming skill.

References

[1] Kisun You, Jike Chong, Youngmin Yi, Gonina, E., Hughes, C.J., Yen-Kuang Chen, Wonyong Sung, Keutzer, K., “Parallel Scalability in Speech Recognition: inference engines in large vocabulary continuous speech recognition,” IEEE Signal Processing Magazine, pp. 124-135, No. 6, Vol. 26, Nov 2009.

[2] Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer, “A Fully Data Parallel WFST-based Large Vocabulary Continuous Speech Recognition on a Graphics Processing Unit,” Proc. of Interspeech 2009, Brighton, UK, 2009.

YUsend Thermal Vacuum (TVAC) Test Manager

Supervisor: Rob Allison (co-supervised with Hugh Chesser, Space Engineering)

Required Background: General CSE408x prerequisites, familiarity with C++ and Windows software tools

Description: The YUsend (York University Space Engineering Nanosatellite Demonstration) Lab has procured a Windows XP-based industrial computer and temperature acquisition card (as well as other hardware) for performing TVAC testing of nanosatellites in the CSIL Lab (PSE 003). A “TVAC Test Manager” application written using LabView's G programming language will oversee the acquisition of temperatures (thermal test outputs) and control of IR lamps (thermal test inputs) during the rather long periods (4 or more days, 24 hours a day) of a TVAC test.

Specific tasks include:
1. Write drivers for the temperature acquisition card (OMEGA Engineering CIO-DAS-Temp) for LabView. These should be written in Visual C++ or similar and compiled into SubVI format.
2. Write LabView VIs (“Virtual Instruments”) to: (a) perform test set-up activities: checkout of sensors and lamps, assigning mnemonics to temperature sensors, and setting alarm conditions for sensors and lamps; (b) acquire and monitor temperature data and control lamp voltage during the test, raising operator alarms for anomalous temperature or IR lamp conditions as required; (c) store temperature and control data for subsequent analysis and reporting.
3. (Optional) Interface the Test Manager with an orbital simulation tool, which would be used to compute IR lamp inputs based on a simulation of the nanosatellite's orbital position and attitude (e.g., in the sun, lamps on; in eclipse, lamps off). The simulation tool is a package called Satellite Toolkit (STK), which has a TCP/IP-based API.

Three-Dimensional Context from Linear Perspective for Video Surveillance Systems

Supervisor: James Elder

Requirements: Good facility with applied mathematics

Description: To provide visual surveillance over a large environment, many surveillance cameras are typically deployed at widely dispersed locations. Making sense of activities within the monitored space requires security personnel to map multiple events observed on two-dimensional security monitors to the three-dimensional scene under surveillance. The cognitive load entailed rises quickly as the number of cameras, the complexity of the scene and the amount of traffic increase.

This problem can be addressed by automatically pre-mapping two-dimensional surveillance video data into three-dimensional coordinates. Rendering the data directly in three dimensions can potentially lighten the cognitive load of security personnel and make human activities more immediately interpretable.

Mapping surveillance video to three-dimensional coordinates requires construction of a virtual model of the three-dimensional scene. Such a model could be obtained by survey (e.g., using LIDAR), but the cost and time required for each site would severely limit deployment. Wide-baseline uncalibrated stereo methods are developing and have potential utility, but require careful sensor placement, and the difficulty of the correspondence problem limits reliability.

This project will investigate a monocular method for inferring three-dimensional context for video surveillance. The method will make use of the fact that most urban scenes obey the so-called “Manhattan-world” assumption, viz., a large proportion of the major surfaces in the scene are rectangles aligned with a three-dimensional Cartesian grid (Coughlan & Yuille, 2003). This regularity provides strong linear perspective cues that can potentially be used to automatically infer three-dimensional models of the major surfaces in the scene (up to a scale factor). These models can then be used to construct a virtual environment in which to render models of human activities in the scene.

Although the Manhattan-world assumption provides powerful constraints, there are many technical challenges that must be overcome before a working prototype can be demonstrated. The prototype requires six stages of processing: 1) The major lines in each video frame are detected. 2) These lines are grouped into quadrilaterals projecting from the major surface rectangles of the scene. 3) The geometry of linear perspective and the Manhattan-world constraint are exploited to estimate the three-dimensional attitude of the rectangles from which these quadrilaterals project. 4) Trihedral junctions are used to infer three-dimensional surface contact and ordinal depth relationships between these surfaces. 5) The estimated surfaces are rendered in three dimensions. 6) Human activities are tracked and rendered within this virtual three-dimensional world.
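As an illustration of the geometry behind stage 3: the images of parallel scene lines meet at a vanishing point, and in homogeneous coordinates both the line through two image points and the intersection of two lines are cross products. The sketch below is a minimal example under that standard formulation; the function names and the two-point segment representation are assumptions made for illustration, not part of the project's existing code.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points p = (x, y) and q = (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg1, seg2, eps=1e-9):
    """Intersect the lines through two segments (hypothetical helper).

    Each segment is a pair of image points. Returns (x, y), or None when
    the two lines are parallel in the image and so meet only at infinity.
    """
    v = cross(line_through(*seg1), line_through(*seg2))
    if abs(v[2]) < eps:
        return None
    return (v[0] / v[2], v[1] / v[2])
```

In practice many detected segments would vote for each vanishing point, so a robust estimator would replace this pairwise intersection.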

The student will work closely with graduate students and postdoctoral fellows at York University, as well as researchers at other institutions involved in the project. The student will develop skills in using MATLAB, a very useful mathematical programming environment, and develop an understanding of basic topics in image processing and vision.

For more information on the laboratory: http://www.elderlab.yorku.ca.

Estimating Pedestrian and Vehicle Flows from Surveillance Video

Supervisor: James Elder

Requirements: Good facility with applied mathematics

Description: Facilities planning at both city (e.g., Toronto) and institutional (e.g., York University) scales requires accurate data on the flow of people and vehicles throughout the environment. Acquiring these data can require the costly deployment of specialized equipment and people, and this effort must be renewed at regular intervals for the data to be relevant.

The density of permanent urban video surveillance camera installations has increased dramatically over the last several years. These systems provide a potential source of low-cost data from which flows can be estimated for planning purposes.

This project will explore the use of computer vision algorithms for the automatic estimation of pedestrian and vehicle flows from video surveillance data. The ultimate goal is to provide planners with accurate, continuous, up-to-date information on facility usage to help guide planning.

The student will work closely with graduate students and postdoctoral fellows at York University, as well as researchers at other institutions involved in the project. The student will develop skills in using MATLAB, a very useful mathematical programming environment, and develop an understanding of basic topics in image processing and vision.

For more information on the laboratory: http://www.elderlab.yorku.ca.

An Open Source Structural Equation Modelling Path Diagram to Syntax Application

Supervisor: Jeff Edmonds

Required Skills: No knowledge of statistics/structural equation modelling is required! Input/output requirements will be specified for you. Java and GUI development will be required.

Description: See this document.

Network analysis of EEG data: Understanding connections in the brain

Supervisor: Andrew Eckford

Required Background: CSE 3213 (Computer Networks), CSE 3451 (Signals and Systems), and MATH 2030 (Elementary Probability); or equivalents

Preferred: At least a B in all of the above courses

Description: Electroencephalogram (EEG) data indicates electrical activity at particular locations in the brain. Using EEG data from multiple sensors, it is possible to find correlations among the measurements, and identify “networks” of activity in the brain. These networks help researchers to determine exactly how the brain processes various stimuli.

The tools that are used to analyze communication networks can also be used to analyze brain networks. In this interdisciplinary project, you will work with a collection of EEG data to identify correlated measurements, and determine network-type relationships based on those measurements. To do so, you will apply skills you learned in courses on Signals and Systems, Computer Networks, and Probability. Your work may lead to a research publication.
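One simple way to turn correlated measurements into a network is to compute the Pearson correlation between every pair of channels and keep an edge wherever its magnitude exceeds a threshold. The following is a minimal sketch of that idea only; the function names, the dictionary input format and the 0.8 threshold are assumptions for illustration, and the project may well use other measures of association.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sample lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant channel carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_network(channels, threshold=0.8):
    """Return the set of channel pairs whose |correlation| >= threshold.

    channels: dict mapping a channel name to its list of samples
    (hypothetical input format).
    """
    names = sorted(channels)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(channels[a], channels[b])) >= threshold:
                edges.add((a, b))
    return edges
```

The resulting edge set can then be treated like any other graph, for example to look at node degrees or connected components.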

Data structures for Estonian

Supervisor: Eric Ruppert

Background: good grades in CSE2001, CSE2011, good software design and programming skills; interest in languages; having learned a second (human) language is helpful because you will be more familiar with grammatical concepts.

Description: This project will explore some aspect of machine-aided translation for Estonian. One of the challenges of this project is that you probably do not know Estonian and I only know it at a basic level. The idea is to see if programmers can build a system without too much expert knowledge of the language (but with extensive help from grammar books and occasional queries to native speakers). Estonian is a Uralic language with many interesting grammatical features.

For example, the Estonian noun for “water” comes in many different forms (vesi, vee, vett, vees, vette, veest, veeta, veed, …) depending on the role of the word within the sentence. The dictionary entry for this word is alphabetized under its first form (vesi), and may or may not give a couple of other basic forms (vee, vett). All other forms can (usually) be derived from these basic forms by applying rules. Thus, if you see the word veest, you would have to know that it is a form of vesi before being able to look it up in a typical dictionary.

The exact topic to be studied for the project will depend on the student's interests. Possible topics include the following.

Given a dictionary entry for a word (e.g. vesi, vee, vett), compute other forms (e.g. veest) by applying rules or using statistical methods of machine learning. Or, conversely, given one form that appears in a sentence (veest) find its dictionary entry (under vesi).
Design a data structure to represent the meaning of a sentence in a way that would be useful for generating an Estonian sentence with that meaning (or, conversely, extracting the meaning from an Estonian sentence).
Pick some limited subdomain of the language (e.g. phrases involving time: “in two days”, “during September”, “5 years ago”, “next week”) and design a module to translate from English to Estonian or vice versa.
Survey existing work on Estonian computational linguistics, identify existing tools that would be helpful, and see how they could be incorporated into this work.
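To make the first topic concrete, here is a minimal rule-based sketch that generates forms from a dictionary entry and builds a reverse index from each form back to its headword. It uses only the suffixes visible in the vesi forms quoted above (vees, veest, veeta, veed); the suffix list, function names and entry format are assumptions for illustration and are far from a complete description of Estonian inflection.

```python
# Suffixes attached to the second basic form (e.g. vee). Only those seen in
# the quoted examples are listed; a real system would take the full set of
# endings and stem changes from a grammar book.
SUFFIXES = ("s", "st", "ta", "d")


def generate_forms(first, second, third, irregular=()):
    """Generate word forms from the three basic forms of a dictionary entry.

    irregular: extra forms that the suffix rules cannot derive (e.g. vette).
    """
    forms = {first, second, third}
    forms.update(second + suffix for suffix in SUFFIXES)
    forms.update(irregular)
    return forms


def build_index(entries):
    """Map every generated form to the set of headwords it may belong to."""
    index = {}
    for first, second, third, irregular in entries:
        for form in generate_forms(first, second, third, irregular):
            index.setdefault(form, set()).add(first)
    return index


def lookup(index, form):
    """Return the candidate headwords for a form (empty set if unknown)."""
    return index.get(form, set())
```

With the single entry `("vesi", "vee", "vett", ("vette",))`, looking up veest returns the headword vesi. Mapping each form to a set of headwords leaves room for forms that are ambiguous between several words.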

MF7114 Assembler

Supervisor: Zbigniew Stachniak

Required Background: Some knowledge of microprocessor architecture and assembly programming

Description: Every microprocessor is supported by a variety of software tools, such as assemblers, disassemblers, and debuggers to allow the development and testing of application programs destined for that microprocessor. The purpose of an assembler is to translate a program written in the target CPU's assembly language into that CPU's machine language. The objective of this project is to write an assembler for the MF7114 microprocessor and test it on a recently written MF7114 emulator.

Background Information: The MF7114 CPU was the first microprocessor designed and manufactured in Canada (by Microsystems International Ltd, or MIL) and one of the earliest microprocessors ever produced. The microprocessor was used, among other applications, as the CPU of the CPS-1 microcomputer. Although none of the CPS-1 computers (nor any MF7114 software) have survived, technical information about the microprocessor and the CPS-1 has been preserved. This makes the design and implementation of an assembler possible. More information:

http://www.cse.yorku.ca/museum/collections/MIL/MIL.htm

MF7114 Debugger

Supervisor: Zbigniew Stachniak

Required Background: Some knowledge of microprocessor architecture and assembly programming

Description: Every microprocessor is supported by a variety of software tools, such as assemblers, disassemblers, and debuggers to allow the development and testing of application programs destined for that microprocessor. The purpose of an MF7114 debugger is to debug programs written in the assembly language of the MF7114 microprocessor. The objective of this project is to write an MF7114 debugger and test it on a recently written MF7114 emulator.

Background Information: The MF7114 CPU was the first microprocessor designed and manufactured in Canada (by Microsystems International Ltd, or MIL) and one of the earliest microprocessors ever produced. The microprocessor was used, among other applications, as the CPU of the CPS-1 microcomputer. Although none of the CPS-1 computers (nor any MF7114 software) have survived, technical information about the microprocessor and the CPS-1 has been preserved. This makes the design and implementation of a debugger possible. More information:

http://www.cse.yorku.ca/museum/collections/MIL/MIL.htm

Athenians Data Project

Supervisor: Nick Cercone

Required Background: General CSE408x prerequisites

Recommended Background: Data Mining

Description: The Athenians Project is a multi-year, ongoing project of compiling, computerizing and studying data about the persons of ancient Athens. Possible project ideas for this term range from simpler ones, such as how to present data in the best possible way, add spatial characteristics to existing data, add multimedia data, or improve text searching, to more complex ones, such as filling in the missing parts of “broken” words on the existing inscriptions. Filling in text for the broken words has been done in the past using expert knowledge. Those experts have established certain rules and guidelines that it may be possible to capture in some kind of expert system, in IT terminology. Furthermore, any hypothesis on word completion enters the database with some likelihood. Associating probabilities with hypotheses introduces another opportunity for research projects.

Early Breast Cancer Detection based on MRIs

Supervisor: Amir Asif

Required Background: General CSE408x prerequisites

Recommended background: Signal processing, i.e. CSE3451

Description: This research will develop advanced computer-aided signal processing techniques for early detection of breast cancer using the available modalities. In particular, we propose to develop a time reversal beamforming imager, based on our earlier work in time reversal signal processing, for detecting early-stage breast cancer tumours from MRI data. Our preliminary work has illustrated the type of results that are possible for breast cancer detection by applying time reversal signal processing to MRI breast data. In this research, we propose to extend these results to provide a quantitative understanding of the practical gains provided by time reversal in MRI-based breast cancer detection, and of its limitations. This will be accomplished by obtaining MRI datasets from a local hospital and running our algorithms on these datasets. The first step is important to check the validity of our algorithms. The next step is to compare the estimated locations of the tumours (as derived with our algorithms) to their precise locations as identified by the pathologists. The second step will quantify the accuracy of our estimation algorithms.

Touch- and Gesture-based Text Entry With Automatic Error Correction

Supervisor: Scott Mackenzie

Required Background: CSE3461 (or equivalent), CSE3311 (or equivalent), CSE4441 (or equivalent). A student wishing to do this project must be well versed in Java, Eclipse, and developing Java code for the Android operating system.

Recommended Background: Possession of an Android touch-based phone or tablet would be an asset, but is not essential.

Description: This project involves extending a touch-based text entry method to include automatic error correction. The method, as is, uses Graffiti strokes entered via a finger on a touch-based Android tablet. The stroke recognizer works well, but it is not perfect. Some strokes are misrecognized while others are not recognized at all. The fault is sometimes attributable to the recognizer, but often the fault is simply that the user's input was sloppy. The work involves developing, integrating, and testing software. The core software is already written, but automatic error correction is lacking. The primary task of the added software is to receive a sequence of characters representing a word and match the sequence against words in a dictionary. If a match is found, all is well (presumably). If a match is not found, the search is extended to find a set of candidate words that are “close” to the entered sequence. “Close”, here, is measured using a minimum string distance algorithm (provided). The user interface must be modified to present the user with alternative words in the event that an error has occurred. The user selects the desired word by tapping on a word in the list. The project will involve testing the new input method in a small user study and writing up a report describing the work and presenting the results of the user study.
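The dictionary-matching step can be sketched as follows. The project supplies its own minimum string distance routine and targets Java on Android, so this Python sketch is only a stand-in: it assumes the distance is the standard Levenshtein distance (insertions, deletions and substitutions each cost 1), and the function names, the distance cutoff of 2 and the limit of 5 candidates are all hypothetical choices.

```python
def msd(a, b):
    """Minimum string distance between a and b (Levenshtein, unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def suggest(entered, dictionary, max_distance=2, limit=5):
    """Return the entered word if it is in the dictionary, else close words.

    Candidates are ordered by increasing distance, then alphabetically.
    """
    if entered in dictionary:
        return [entered]
    scored = sorted((msd(entered, word), word) for word in dictionary)
    return [word for dist, word in scored if dist <= max_distance][:limit]
```

For example, `suggest("qvick", ["quick", "quack", "brown"])` returns `["quick", "quack"]`, which is the list the modified user interface would offer for tapping. Scanning the whole dictionary is the simplest approach; a large word list on a phone may need an index to stay responsive.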

Tandem repeat detection using spectral methods

Supervisor: Suprakash Datta

Required Background: The student should have completed undergraduate courses in Algorithms and Signals and Systems.

Recommended Background: Some background in Statistics is desirable but not essential.

Description: DNA sequences of organisms have many repeated substrings. These are called repeats in Biology, and include both exact and approximate repeats. Repeats are of two main types: interspersed repeats, which are spread across a genome, and tandem repeats, which occur next to each other. Tandem repeats play important roles in gene regulation and are also used as markers that have several important uses, including human identity testing.

Finding tandem repeats is an important problem in Computational Biology. The techniques that have been proposed for it fall into two classes: string matching algorithms and signal processing techniques. In this project, we will explore fast, accurate algorithms for detecting tandem repeats and evaluate the algorithms studied by comparing their outputs with those of available packages, including mreps, SRF and TRF.
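To give a flavour of the signal processing class of techniques, the sketch below maps a DNA string to four binary indicator sequences (one per base) and measures the Fourier power at the frequency 1/p for each candidate period p; a tandem repeat with period p tends to produce a peak there. This is a generic illustration, not the algorithm used by mreps, SRF or TRF, and the function names are hypothetical. It assumes the whole input is one window and ignores harmonics, so a long repeat unit may also show power at periods that divide its length.

```python
import cmath


def period_power(seq, period):
    """Total Fourier power of the four base-indicator sequences at 1/period."""
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence for this base.
        coeff = sum(cmath.exp(-2j * cmath.pi * n / period)
                    for n, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total


def best_period(seq, max_period):
    """Candidate period in 2..max_period with the largest spectral power."""
    return max(range(2, max_period + 1),
               key=lambda p: period_power(seq, p))
```

Sliding this computation along a genome in fixed-size windows and flagging windows with a strong peak is one simple way to locate candidate repeat regions for closer inspection.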

4480 Project: Localizing nodes and tracking targets in wireless ad hoc networks securely

Supervisor: Suprakash Datta

Required Background: CSE4480 prerequisites

Description: A key infrastructural problem in wireless networks is localization (or the determination of geographical locations) of nodes. A related problem is the tracking of mobile targets as they move through the radio ranges of the wireless nodes.

If security is not a concern, then any of numerous existing algorithms can be implemented to get reasonably accurate location estimates of nodes or targets. These algorithms typically involve nodes sharing locations and assume that there are no malicious nodes and no privacy issues in sharing locations. However, localization or target tracking in the presence of malicious nodes or nodes that do not wish to disclose their locations is much more difficult.
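For the non-secure baseline described above, a common textbook approach is trilateration: given the known positions of several anchor nodes and measured distances to them, subtract the first range equation from the others to obtain a linear system and solve it by least squares. The sketch below is a minimal two-dimensional version; the function name and input format are assumptions, and it trusts every reported position and distance, which is exactly the assumption that fails when malicious nodes are present.

```python
def trilaterate(anchors, distances):
    """Least-squares 2D position from anchor positions and range estimates.

    anchors: list of (x, y) positions of at least three non-collinear nodes.
    distances: measured distance from the unknown node to each anchor.
    """
    (x0, y0), d0 = anchors[0], distances[0]
    # Linearize: 2(xi-x0)x + 2(yi-y0)y = d0^2 - di^2 + xi^2 - x0^2 + yi^2 - y0^2
    rows = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        rows.append((2 * (xi - x0), 2 * (yi - y0),
                     d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2))
    # Solve the 2x2 normal equations (A^T A) p = A^T b.
    s11 = sum(a1 * a1 for a1, a2, b in rows)
    s12 = sum(a1 * a2 for a1, a2, b in rows)
    s22 = sum(a2 * a2 for a1, a2, b in rows)
    t1 = sum(a1 * b for a1, a2, b in rows)
    t2 = sum(a2 * b for a1, a2, b in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("need at least three non-collinear anchors")
    return ((t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det)
```

A single anchor that lies about its position or distance shifts the least-squares estimate, which gives a simple way to experiment with how much damage one malicious node can do in a simulator.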

This project will look at current research on localization algorithms. The student will read papers to learn about existing work and then implement a few algorithms to compare their performance. Then, with assistance from the supervisor, (s)he will attempt to propose improvements and/or combinations of ideas from the papers, and evaluate them in a Java/C/C++/MATLAB simulator.

Expected learning outcomes: Apart from familiarity with the current literature, the project will provide the student an introduction to scientific research and analysis of experimental data.

Skills required: Proficiency with one of Java, C, C++, MATLAB; interest in developing algorithms for distributed systems; interest in experimental approaches to problems.

References:

1. Multiple target localisation in sensor networks with location privacy, Matthew Roughan, Jon Arnold, Proceedings of the 4th European conference on Security and privacy in ad-hoc and sensor networks (ESAS'07), Springer-Verlag, 2007.

2. Defending Wireless Sensor Networks against Adversarial Localization, Neelanjana Dutta, Abhinav Saxena, Sriram Chellappan, Proceedings of the 2010 Eleventh International Conference on Mobile Data Management (MDM '10).

The student will implement existing spectral algorithms based on Fourier transforms and on an autoregressive model. The student will then make changes suggested by the supervisor, and evaluate the effect of the modifications. Throughout the course, the student is required to maintain a course Web site to report any progress and details about the project.

4480 Project: GFI Sandbox Analysis of Malware for DDoS

Supervisor: Natalija Vlajic

Required Background: General prerequisites.

Description: GFI Sandbox is a sophisticated industry-leading tool for quick and safe analysis of malware behaviour. The goals of this project are: 1) familiarize yourself with the operation of GFI Sandbox; 2) using readily available GFI Sandbox Feeds (i.e., ThreatTrack Feeds), build a database of malware designed specifically for the execution of DDoS attacks (the so-called botnet malware); 3) examine the behaviour of the collected malware 'upon execution'; 4) propose and build an environment, comprising the standard freeware security tools, for longer-term (beyond immediate execution) analysis of the collected malware.
