Proposed Projects for Fall 2023

Electric Load Forecasting via Deep Generative Models

Course: EECS4080

Supervisor: Michael Jenkin

Supervisor's email address: jenkin@yorku.ca

Project Description: With the rapid growth of renewable energy generation and electric vehicles, electric load forecasting is becoming increasingly important for power system operation. Based on the forecasting horizon, there are three main types of load forecasting: short-term, medium-term, and long-term. Short-term load forecasting aims to predict the electric load over the next few seconds to the next few hours, which can be very helpful for real-world energy dispatching. In recent years, machine learning, especially deep learning, has shown impressive performance for short-term load forecasting. Generative models, e.g., generative adversarial networks, have shown great potential for computer vision and natural language processing. The potential of such generative models has not been well studied for load forecasting. In this project, we aim to benchmark the performance of different types of deep generative models for short-term load forecasting. We will mainly work on OPEN EI data sets, which consist of electric load consumption data for different buildings in the US.

Required skills or prerequisites: Good Python software skills. Interest in AI systems.

Recommended skills or prerequisites: Interest in GANs. Interest in AI software development.

Instructions: Send CV, (unofficial) transcript, and GitHub repo address (if available) to Prof. Jenkin.

Analyze the Impacts of Ensemble Learning for Anomaly Detection

Course: EECS4080

Supervisor: Michael Jenkin

Supervisor's email address: jenkin@yorku.ca

Project Description: Hacking and false data injection by adversaries can cause significant financial loss. Accurate detection of anomalies is of significant importance for the safe and efficient operation of modern power grids. In recent years, different types of techniques, such as statistical methods, unsupervised learning methods, generative models, and prediction-based methods, have been applied to anomaly detection. However, most current work assumes that the data distribution is stable and ignores distribution drift, which often happens in the real world. In this work, we aim to utilize the benefits of ensemble learning to address real-world anomaly detection problems. Specifically, we plan to dynamically combine different base models via ensemble learning to tackle the challenges of distribution drift. For this project, we will mainly work on two data sets: the Secure Water Treatment (SWaT) Dataset and the ICS Cyber Attack Dataset. Both are frequently used real-world data sets for anomaly detection.
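As a rough illustration of what dynamically weighting base models could look like, the sketch below uses a multiplicative-weights update: each base detector's weight shrinks as its recent error grows, so the ensemble shifts toward whichever model fits the current data. This is only one possible scheme, not the project's chosen method. The function names, the learning rate `eta`, the toy score stream, and the assumption that a label becomes available after each step are all hypothetical.

```python
import math

def update_weights(weights, losses, eta=2.0):
    """Multiplicative-weights update: models with higher recent loss lose weight."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def ensemble_score(weights, scores):
    """Weighted average of per-model anomaly scores (each assumed to lie in [0, 1])."""
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical stream: anomaly scores from two base detectors, plus a label
# (1 = anomaly) that is assumed to arrive after each step.
stream = [
    ((0.1, 0.8), 0),
    ((0.2, 0.9), 0),
    ((0.9, 0.7), 1),
    ((0.1, 0.6), 0),
]

weights = [0.5, 0.5]
combined_scores = []
for scores, label in stream:
    combined_scores.append(ensemble_score(weights, scores))
    losses = [abs(s - label) for s in scores]  # per-model absolute error
    weights = update_weights(weights, losses)
```

In this toy stream the first detector tracks the labels more closely, so its weight ends up larger. Under real drift the ranking of detectors can change over time, which is the situation the weighting is meant to follow.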

Required skills or prerequisites: Interest in AI systems. Interest in AI software development.

Recommended skills or prerequisites: Good Python programming skills. Some course(s) in AI systems.

Instructions: Send CV, (unofficial) transcript, and GitHub repo address (if available) to Prof. Jenkin.

Robot Tutors in Higher Education

Course: EECS4070

Supervisor: Meiying Qin

Supervisor's email address: mqin@yorku.ca

Project Description: In this reading course, you will survey the literature on robot tutoring systems, which lies in the field of human-robot interaction (HRI). In particular, you will read literature on robot tutors for different ages, ranging from elementary school students to university students. You will also learn reinforcement learning techniques relevant to modeling robot tutors. To gain a deeper understanding of the material, you may design a relevant project based on what you have learned, though you will not implement it. You are expected to compile a survey of robot tutoring systems as an outcome of this course. Depending on the quality of the survey, we may publish it, giving you experience with formal publication.

Required skills or prerequisites:

Recommended skills or prerequisites:

Instructions: Please send your c.v. and transcript. Optional: an e-portfolio that demonstrates previous projects you have worked on.

Designing Privacy-preserving Systems

Course: EECS4080 or EECS4480

Supervisor: Yan Shvartzshnaider

Supervisor's email address: rhythm.lab@yorku.ca

Project Description: Modern sociotechnical systems share and collect vast amounts of information. These systems violate users’ privacy by ignoring the context in which the information is shared and failing to incorporate contextual information norms.

Using techniques in natural language processing, machine learning, network, and data analysis, this project is set to explore the privacy implications of mobile apps, online platforms, and other systems in different social contexts/settings.

To tackle this challenge, the project will operationalize a cutting-edge privacy theory and methodologies to conduct an analysis of existing technologies and design privacy-enhancing tools.

Students will help analyze information handling practices of online services and design privacy-enhancing tools.

Specific tasks include: comprehensive literature review of existing methodologies and tools, analysis of privacy policies and regulations, visualization of information collection practices, and design of a web-based interface for analyzing extracted privacy statements to identify vague, misleading, or incomplete privacy statements.

For prior project, see this link

Required skills or prerequisites: Good programming and data analysis skills overall, and experience in using Jupyter and/or R for data analysis. Ability to work independently. Interest in usable privacy and in critical analysis of privacy policies and privacy-related regulation.

Recommended skills or prerequisites: Experience with Machine Learning, Natural Language Processing techniques, HCI design. Students with diverse backgrounds, including in technical fields, social sciences and humanities are encouraged to apply.

Instructions: Please fill in this form

Strengthening the Security of a Python Autograder

Course: EECS4480/EECS4080/EECS4088

Supervisor: Jonatan Schroeder

Supervisor's email address: jonatan@yorku.ca

Project Description: Unit testing platforms like Java's JUnit and Python's unittest provide a simple interface for evaluating the correctness of individual functions in a large project. These platforms can also be used in an academic environment to automatically test student-submitted code in programming assignments and generate a grade based on whether these tests pass or fail. However, because these platforms were originally developed for running trusted code, this practice creates a risk: students may be able to submit code that causes a test to pass without producing the expected value (see https://www.seas.upenn.edu/~hanbangw/blog/hack-gs/). While most modern autograding platforms introduce security practices to prevent this kind of code from receiving a valid grade, some vulnerabilities still exist.

For this project you will strengthen the security of an autograder process for Python code for the PrairieLearn platform. You will start by creating possible attack vectors in the form of code that is expected to cause the autograder to pass a test without actually returning the expected results. Examples of attack vectors include code that saves or outputs well-formatted values that are interpreted by the autograder as a success, code that is able to identify secret information from the autograder code, and/or code that crashes the original autograder process. Then you will implement safeguards that ensure student-submitted code is unable to bypass container sandbox limitations, and that ensure that malicious student code does not result in a successful grade.
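To make the "well-formatted values interpreted as a success" attack concrete, here is a minimal sketch of one generic safeguard: the trusted harness tags each result line with a per-run random nonce and ignores any result line that lacks it. This is not PrairieLearn's actual mechanism. The `RESULT` line format and all function names are hypothetical, and the approach only helps if student code cannot read the nonce (for example, because it runs in a separate process that never receives it).

```python
import hmac
import json
import secrets

PREFIX = "RESULT "  # hypothetical marker for result lines in captured output

def make_nonce():
    """Fresh per-run secret; assumed to be kept out of reach of student code."""
    return secrets.token_hex(16)

def format_result(nonce, passed):
    """Result line written by the trusted harness after a test finishes."""
    return PREFIX + json.dumps({"nonce": nonce, "passed": passed})

def parse_results(output, nonce):
    """Return verdicts only from result lines that carry this run's nonce."""
    verdicts = []
    for line in output.splitlines():
        if not line.startswith(PREFIX):
            continue
        try:
            payload = json.loads(line[len(PREFIX):])
        except json.JSONDecodeError:
            continue  # malformed line: ignore rather than trust
        if not isinstance(payload, dict):
            continue
        candidate = str(payload.get("nonce", "")).encode()
        if hmac.compare_digest(candidate, nonce.encode()):
            verdicts.append(bool(payload.get("passed")))
    return verdicts

nonce = make_nonce()
forged = 'RESULT {"nonce": "guess", "passed": true}'  # printed by malicious code
genuine = format_result(nonce, False)                 # written by the harness
verdicts = parse_results(forged + "\n" + genuine, nonce)
```

Here the forged success line is dropped and only the harness's genuine failing verdict survives. The other attack vectors mentioned above, such as reading secret grader data or crashing the grader process, need separate defences.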

You will work in coordination with the supervisor and the PrairieLearn developer community to brainstorm possible strategies and guidelines. Your final deliverable will be a pull request to the PrairieLearn codebase with the proposed fix.

Required skills or prerequisites: Must have completed EECS 1015 (or a similar course) with an A/A+. Must have solid programming skills in Python, including the use of unit testing. Must be able to work independently and have good communication skills.

Recommended skills or prerequisites: EECS 3221 is highly recommended. Experience with Docker containers is helpful but can be obtained during the project. Git experience is helpful. Experience with open source software development is an asset.

Instructions: Additional information about PrairieLearn can be found here: https://prairielearn.readthedocs.io/en/latest/. A sample PrairieLearn assessment that includes Python autograded questions can be found here: https://us.prairielearn.com/pl/course_instance/136606/assessment/2351069. Please submit a brief description of your experience with the skills listed above.

CTF for Applied Cryptography course

Course: EECS4480/EECS4090/EECS4080/EECS4088

Supervisor: Ruba Al Omari

Supervisor's email address: alomari@yorku.ca

Project Description: Create a CTF for the applied cryptography course with tasks that cover classical, symmetric, and asymmetric techniques: something similar to https://cryptohack.org/ but on a smaller scale. The idea is for this CTF to be created and maintained locally so that the flags can be changed regularly. With publicly available CTFs, students can easily google the flags or read the writeups.
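One way regularly changing flags might be handled is to derive each flag from a server-side secret, the challenge name, and a rotation counter, so that a new term or week yields new flags without editing every challenge by hand. The sketch below is an assumption about how this could be done, not a stated requirement of the project. The `FLAG{...}` format, the function names, and the example secret are all hypothetical.

```python
import hashlib
import hmac

def make_flag(secret: bytes, challenge_id: str, rotation: int) -> str:
    """Derive a flag from the secret, the challenge name, and a rotation counter."""
    message = f"{challenge_id}:{rotation}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return "FLAG{" + digest[:32] + "}"

def check_flag(secret: bytes, challenge_id: str, rotation: int, submitted: str) -> bool:
    """Compare a submission against the current flag in constant time."""
    expected = make_flag(secret, challenge_id, rotation)
    return hmac.compare_digest(expected.encode(), submitted.encode())

# Hypothetical usage: the secret would live only on the CTF server.
secret = b"replace-with-a-random-server-side-secret"
old_flag = make_flag(secret, "caesar-1", rotation=1)
new_flag = make_flag(secret, "caesar-1", rotation=2)
accepted = check_flag(secret, "caesar-1", 2, new_flag)
stale_rejected = not check_flag(secret, "caesar-1", 2, old_flag)
```

Bumping the rotation counter invalidates flags copied from earlier offerings. Each challenge would still need to embed the current flag in its own artifact, such as a ciphertext.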

Required skills or prerequisites: Programming skills, and an understanding of cryptography.

Recommended skills or prerequisites: It is recommended that students have taken EECS3481 before.

Instructions: Please send your CV and transcript and specify whether you are a computer security major. Optional: Link to any previous projects that you have worked on.

Computer Science Education Research - Robot Tutors in First Year Programming Courses

Course: EECS4080

Supervisor: Meiying Qin

Supervisor's email address: mqin@yorku.ca

Project Description: The goal of this project is to implement a robotic tutor to assist first-year programming students in Computer Science Education (CSE) research, while also delving into the realm of Human-Robot Interaction (HRI). Students enrolled in the first-year programming course will go to a designated location to work through exercises together with a robot. While robot tutors have been used in pre-university education, few studies have explored their effectiveness in post-secondary settings. By participating in the 4080 project, students will gain immersive exposure to the entire research project lifecycle, from initial project planning and design to the consideration of ethical issues in real-world applications. The roles and responsibilities of students involved in this project include drafting protocol and consent forms, project design, and implementation of the project.

Required skills or prerequisites: Python

Recommended skills or prerequisites: Data analysis and writing skills

Instructions: Please send your c.v. and transcript

C++ in Embedded Systems: A Reality Check

Course: EECS4070/4080/4088/4090

Supervisor: James Smith

Supervisor's email address: drsmith@yorku.ca

Project Description: While C++ is one of the most common general purpose programming languages, it is not commonly used in resource-poor (“bare metal”) embedded devices. Issues related to complexity, efficiency, memory requirements, determinism, and real-time constraints are often cited as reasons for this reluctance. With the advent of less restrictive 32-bit embedded devices, updates to the C++ language, and the availability of C++ compilers alongside C in many embedded device IDEs, we wish to examine whether the commonly held assumptions about C++ in the embedded systems space still hold true. The learning outcomes are as follows. By the end of this course, the student will be able to:

  1. Articulate how they have applied the knowledge they have gained in other software engineering courses to a real-world system
  2. Implement schedulers (cooperative, with and without ISRs, as well as with and without finite state machines) for a baseline embedded system problem in both contemporary procedural code (C11 or greater and C++17 or greater) and contemporary object-oriented C++ (C++17 or greater)
  3. Illustrate the performance differences between contemporary procedural (C11 or greater and C++17 or greater) and contemporary object oriented (C++17 or greater) programming solutions for baseline, resource-poor bare metal embedded devices.
  4. Articulate the questions that a particular area of research in embedded systems and programming languages attempts to address.
  5. Prepare a professional presentation that outlines the contributions they made to the project and the knowledge they acquired.

Required skills or prerequisites: General knowledge of procedural and object-oriented programming languages. Previous experience with Arduinos or other embedded devices like Raspberry Pis.

Recommended skills or prerequisites: Previous C++ experience.

CiteFair: an online tool to detect and mitigate unfair citation patterns in scientific articles

Course: EECS4080/4088/4090

Supervisor: Alvine Belle

Supervisor's email address: alvine.belle@lassonde.yorku.ca

Project Description: The number of citations of scientific articles has a huge impact on recommendations for funding allocations, recruitment decisions, and rewards, to name just a few. However, researchers belonging to some socio-cultural groups (e.g., women) are usually cited less than researchers from dominant groups. This may be due to the presence of unfair citation patterns in some scientific articles. These citation patterns are tangible examples of bias against researchers from some socio-cultural groups and may cause unfairness and inaccuracy in the assessment of article impact. They may therefore translate into significant disparities in promotion, retention, grant funding, awards, collaborative opportunities, and publications. The project will start by analyzing the existing scientific literature to identify the various unfair citation patterns that may be present in scientific articles. Then, the project will explore existing mitigation solutions and their limitations. The project will then aim at developing an online tool called CiteFair that will be able to:

  1. Automatically analyze scientific articles to detect the potential presence of unfair citation patterns
  2. Rely on existing bibliometric tools to suggest to article authors how to mitigate these citation patterns and increase the citation fairness score of their articles.

The project will also include validating the accuracy of the CiteFair tool by running experiments on a sample of scientific articles published within the last decade in a wide range of venues. Experiments will also evaluate the usability and performance of the CiteFair tool.

Required skills or prerequisites: Solid experience with JavaScript, HTML, and CSS

Recommended skills or prerequisites: Experience with web-development frameworks (e.g., React JS, Spring Boot) and good oral and written skills in English

Large Language Models based Test Case Generation

Course: EECS4080

Supervisor: Song Wang

Supervisor's email address: wangsong@yorku.ca

Project Description: Recently, pre-trained large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making it an area ripe for innovative approaches such as the use of LLMs. Our recent collaboration with Meta also confirms the limitations of existing, widely used testing techniques in test input generation, test oracle generation, and test scenario generation. This project takes a solid initial step towards exploring next-generation software testing techniques powered by LLMs.

Required skills or prerequisites: Familiarity with DL libraries such as TensorFlow and PyTorch.

Instructions: Send your c.v. and transcript to supervisor