Computer Vision and Machine Learning applied to assistive technologies such as gesture-based interfaces, crime detection, navigation for the visually impaired, and machine translation

Your Brilliant Idea

If you have something that you would really like to do, feel free to speak to me and we can see if we can develop it into an honours project for you.

 

Drowsiness Detection

Driver drowsiness is a major cause of night-time road accidents worldwide. A late night and/or a long journey can take a toll on a driver and make the driver sleepy. In this state, even the briefest moment in which the driver temporarily falls asleep can result in a loss of control of the wheel, potentially resulting in an accident that may involve one or more other road users. This project aims to use machine learning coupled with image processing techniques to detect when a driver’s face shows signs of drowsiness. This can eventually be deployed in a vehicle to immediately sound an alert should the driver fall asleep.
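A common low-cost signal for this (not prescribed by the project, just a reasonable starting point) is the eye aspect ratio (EAR), computed from six landmark points around each eye; the EAR drops sharply when the eye closes. A minimal sketch in plain Python, assuming the landmarks are already available from a face landmark detector, with the threshold and frame count as illustrative values:

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmark points around one eye, ordered as in the
    common 68-point face-landmark scheme (p1..p6)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    v1 = dist(eye[1], eye[5])   # first vertical distance
    v2 = dist(eye[2], eye[4])   # second vertical distance
    h = dist(eye[0], eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def is_drowsy(ear_history, threshold=0.2, min_frames=15):
    """A low EAR sustained over many consecutive frames suggests the
    eyes are closed rather than merely blinking."""
    return len(ear_history) >= min_frames and \
        all(e < threshold for e in ear_history[-min_frames:])
```

In a real deployment the landmarks would come from a library such as dlib's 68-point face landmark detector, and the threshold would be tuned on recorded footage.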

 

Weather Prediction

Weather prediction is a big part of the weather service of every country. Historical weather data are used to train one or more models whose goal is to predict the weather over the next N days. In this project, we will specifically be looking at historical images of precipitation over a given region. One or more machine learning techniques (such as deep learning) will be used to predict the next N days of precipitation given the previous M days of precipitation. Speak to me for further detail about this project.
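As a toy illustration of the "predict the next values from the previous ones" idea, here is a first-order autoregressive baseline on a single 1-D series. The actual project would work on 2-D precipitation images and most likely a deep model; this sketch only shows the train-then-forecast loop:

```python
def fit_ar1(series):
    """Least-squares fit of x[t+1] ~ a * x[t] + b over a 1-D series."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

def forecast(series, a, b, n_days):
    """Roll the fitted model forward to predict the next n_days values."""
    preds, cur = [], series[-1]
    for _ in range(n_days):
        cur = a * cur + b
        preds.append(cur)
    return preds
```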

 

Spoiler Detection

The popular IMDb site is a rich resource for film lovers. The site allows viewers to vote on a movie, but also to provide a detailed description of their experience of the movie. This information is intended to give other users a taste of the movie, so they can decide whether or not it is worth a watch. Ideally, users providing detailed feedback mention whether or not their feedback contains spoilers – many users care while some (such as my wife) do not. This project aims to employ text processing coupled with machine learning to predict whether a piece of text may contain spoilers. This can be used to supplement the tireless efforts of site admins in reviewing feedback and warning others when there are *spoilers ahead*.
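As a sketch of the text-classification core, here is a tiny bag-of-words Naive Bayes classifier in plain Python. This is one of several reasonable model choices; a real system would use a proper library and a large labelled corpus of reviews with spoiler tags:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label) pairs, label True meaning 'spoiler'."""
    counts = {True: Counter(), False: Counter()}
    n = {True: 0, False: 0}
    for text, label in docs:
        n[label] += 1
        counts[label].update(text.lower().split())
    vocab = set(counts[True]) | set(counts[False])
    return counts, n, vocab

def predict_nb(model, text):
    """Returns True if the text is more likely a spoiler under the model."""
    counts, n, vocab = model
    total = n[True] + n[False]
    scores = {}
    for label in (True, False):
        score = math.log(n[label] / total)           # class prior
        denom = sum(counts[label].values()) + len(vocab)
        for w in text.lower().split():
            # Laplace smoothing so unseen words do not zero the probability
            score += math.log((counts[label][w] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]
```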

 

Pool Shot Assistant

The goal of this project is to provide an augmented reality experience to someone watching/playing pool. A camera is placed in a fixed pre-determined location e.g. right above the pool table. Then, as the player positions the cue to take a shot, the application augments the video feed in real-time by displaying the potential angles at which one or more of the balls will travel if that shot were to be taken.
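The core geometry can be sketched with the standard "ghost ball" model, which predicts that the struck ball departs along the line of centres at the moment of contact. This ignores spin, throw and friction, and the function and its parameters are illustrative:

```python
import math

def predicted_object_direction(cue, aim, obj, radius):
    """cue, obj: (x, y) ball centres; aim: unit direction of the cue ball;
    radius: ball radius. Returns the unit direction the object ball will
    take, or None if the shot misses it entirely."""
    ax, ay = aim
    dx, dy = obj[0] - cue[0], obj[1] - cue[1]     # cue ball -> object ball
    proj = dx * ax + dy * ay                      # distance along the aim line
    perp2 = (dx * dx + dy * dy) - proj * proj     # squared perpendicular miss
    if proj <= 0 or perp2 > (2 * radius) ** 2:
        return None                               # aimed away, or a clean miss
    # cue-ball centre at the moment of contact (the "ghost ball")
    t = proj - math.sqrt((2 * radius) ** 2 - perp2)
    gx, gy = cue[0] + t * ax, cue[1] + t * ay
    # object ball departs along the line of centres at contact
    ox, oy = obj[0] - gx, obj[1] - gy
    norm = math.hypot(ox, oy)
    return (ox / norm, oy / norm)
```

The augmented overlay would then draw a line from the object ball along the returned direction, or no line at all when the function reports a miss.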

 

Audio Track-Pad Using Two Microphones

Two transducer microphones are attached to the top-left and bottom-left corners of a fixed surface, e.g. a table. The idea is then to train a system that can determine the x- and y-locations of a user-initiated tap. Doing so would mean that a user can turn any arbitrary surface into a touch-pad of sorts using only two very low-cost microphones. The training procedure here could make use of e.g. linear regression (which is taught in the Machine Learning course in Honours, so it’s very important for you to do this course if you intend to take up this project).
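As a sketch of the calibration step, here is ordinary least squares in closed form, fitting a line from a single audio feature (for instance the arrival-time difference between the two microphones, a hypothetical choice of feature) to one tap coordinate, using taps at known positions as training data:

```python
def fit_line(delays, positions):
    """Ordinary least squares for position ~ a * delay + b.
    delays: feature value per calibration tap; positions: known coordinate."""
    n = len(delays)
    md = sum(delays) / n
    mp = sum(positions) / n
    a = sum((d - md) * (p - mp) for d, p in zip(delays, positions)) \
        / sum((d - md) ** 2 for d in delays)
    b = mp - a * md
    return a, b
```

A second regressor trained on a different feature would recover the other coordinate; the Machine Learning course covers the multivariate version of this.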

 

Automatic Attendance Taker

One of the problems that we, as academics, face is reliably determining which students were present in which class. Class registers help, but it is not impossible for one student to sign on behalf of one or more other students. The goal of this project is to automatically determine attendance. The lecturer takes one or more photos of the class. These photos are fed into the proposed system, which automatically locates all the faces and attempts to recognize them. Once the faces are recognized, the system can keep a register of attendance.

 

Motion Detection Using A Moving Camera Update

Motion detection is a key technique in image processing. It can be used to find objects of interest in a camera feed. The vast majority of currently available motion detection methods assume that the camera is stationary. In 2017, a previous student (Ms Williams) successfully created a system that could detect motion in a video feed despite small movements of the camera, by assuming that the moving object makes up a minority of the frame. That work used the Lucas-Kanade optical flow method. This project aims to use a better motion detection method such as the Farneback dense optical flow method, which will be compared against Ms Williams’ work in the hope of achieving better results.
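The "moving object is a minority of the frame" assumption can be sketched as follows: treat the median of the dense flow field as the camera's own motion, subtract it, and threshold the residual. This is a plain-Python sketch over a grid of flow vectors; in practice the flow itself would come from e.g. OpenCV's Farneback routine:

```python
def moving_object_mask(flow, threshold=1.0):
    """flow: 2-D grid of (dx, dy) per-pixel displacement vectors.
    Because the moving object covers a minority of the frame, the median
    flow approximates the camera's motion; subtracting it leaves only the
    object's residual motion."""
    xs = sorted(v[0] for row in flow for v in row)
    ys = sorted(v[1] for row in flow for v in row)
    mx, my = xs[len(xs) // 2], ys[len(ys) // 2]   # per-axis median flow
    return [[((v[0] - mx) ** 2 + (v[1] - my) ** 2) ** 0.5 > threshold
             for v in row]
            for row in flow]
```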

 

Depth Inference for Hand Tracking

Automatically tracking a hand in a (single) camera feed is a very well-established field of research, with a variety of techniques proposed and used to successfully track one or both hands in a video as they move. A single camera provides only a 2D view of the scene, meaning that it is not realistically possible to determine how far the hand(s) are from the camera (their depth) at any time. On the other hand, a large body of research has dealt with the problem of determining the depth of objects using two cameras capturing the same scene at the same time. The problem tackled by this project is to apply two-camera depth estimation to hand tracking. The project would first involve re-implementing one of the documented hand tracking strategies, and then one of the documented object depth estimation strategies. With the position of the hand known in both views, it would then be possible to obtain the depth of the hand.
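Once the hand has been matched in both views, its depth follows from the classic pinhole stereo relation Z = f * B / d (focal length times baseline, divided by disparity), assuming the two cameras are calibrated and rectified:

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Pinhole stereo: Z = f * B / d, where d = x_left - x_right is the
    horizontal disparity (in pixels) of the same point in the two images,
    focal_px is the focal length in pixels and baseline_m the camera
    separation in metres."""
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / d
```

For example, with a 700-pixel focal length and a 10 cm baseline, a 35-pixel disparity places the hand 2 m from the cameras.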

 

Digitised Note-Taking and Patient Management System for Dentists

Most dentists currently use a file-based and paper-based system of organising their patients’ information. This is obviously very inefficient and ineffective. This project will involve creating a comprehensive note-taking and patient management suite for use by dentists. It will involve creating a suitable back-end and a very polished app-based and/or web-based front-end.

 

Facial Expression Recognition for Customer Satisfaction

Customer satisfaction is important to most companies. It can be difficult to accurately gauge satisfaction, with most companies relying on customers to answer questionnaires and/or actively come forward with criticism. This project aims to make use of facial expression recognition techniques to attempt to automatically gauge satisfaction. Facial expression recognition techniques are very well-established, and the student will be required to first familiarize him/herself with these techniques and then apply them.

 

Physical Book Search Assistant

With the introduction of eBooks, searching a book for a desired word/phrase has become very easy. With the touch of a few keys i.e. <Ctrl-F>, <Type in Search Phrase> and <Enter>, the user can quickly find the word/phrase he/she is looking for. This luxury does not extend to physical books. While a glossary goes a long way towards addressing this problem, it is not nearly as effective or convenient as being able to search in an eBook. This project entails the following: a physical book is placed on a viewing surface; a camera (most likely mounted in a fixed face-down position) is pointed towards the pages of the book and continuously captures the scene; the system then analyses the images captured as the user pages through the book, using Optical Character Recognition (OCR) techniques to locate a desired word/phrase in the book. The desired word/phrase can be indicated on the computer screen for the user to see. Since book scanning and OCR are very well-established fields, this would require the student to familiarise him/herself with these techniques and apply them to this task.

 

Automatic Baby Monitor

There is currently a very wide array of baby monitoring tools on the market. Some of these take the form of just a simple camera that points towards the baby and allows the parent to act as the “intelligence” of the system by constantly watching the baby through a remote monitor. Others provide a few extra features such as automatically detecting motion in the room, monitoring for noise, detecting the baby’s cries, monitoring room temperature, etc. In all these cases, these features help the parent detect an event that requires attention. This project entails developing a system using a web camera, ideally attached to a light-weight computing device such as a Raspberry Pi, that captures the scene and transmits it to a custom-made Android application with which the parent can tap into the video feed, as well as receive notifications. The application would also include some motion detection and other such features.

 

Automatic Puzzle Solver

As children, we may have all played with puzzles, pouring the pieces onto the table, and then systematically (and sometimes haphazardly) going about figuring out which piece fits where. This project aims to write a system that does the following: with the pieces of a puzzle (starting with a very simple puzzle of very few pieces at first) placed all face up on the table, the system captures a picture of those pieces and then proceeds to figure out which piece fits where to solve the puzzle.

 

Interactive Sudoku Solver

This project entails (at least in my mind, feel free to mould and mend) creating a game in which a camera is pointed towards a sudoku grid and the computer program “monitors” the game as the user physically puts in numbers and, at the request of the user, can give ‘hints’ i.e. highlight wrong entries, provide the solution to n squares etc.
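The "provide the solution" part reduces to solving the grid the camera has read. A compact backtracking solver (a standard approach, sketched here in plain Python) can both supply hints and, by comparing its solution against the user's entries, highlight wrong ones:

```python
def valid(grid, r, c, v):
    """Can value v be placed at row r, column c without a clash?"""
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[r][j] != v for j in range(9)) and \
           all(grid[i][c] != v for i in range(9)) and \
           all(grid[br + i][bc + j] != v
               for i in range(3) for j in range(3))

def solve(grid):
    """Backtracking solver; 0 marks an empty cell. Mutates grid in place
    and returns True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True
```

A wrong user entry is then simply any physically written digit that differs from the solved grid at that cell.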

 

Interactive Visual Tic-Tac-Toe

This project is somewhat similar to the interactive sudoku solver. A camera is pointed towards a tic-tac-toe grid drawn by the user. Thereafter, the user and computer take turns making moves on the grid. The player’s moves are drawn physically on the grid, whereas the computer’s moves are painted onto the grid virtually by superimposing them on the image on the screen.
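The computer's move can be chosen with plain minimax, which is easily tractable for tic-tac-toe's tiny game tree. A sketch, assuming the board has been read off the camera image into a flat list of nine cells:

```python
def winner(b):
    """Returns 'X' or 'O' if that player has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
    for i, j, k in lines:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Score a position from the computer's ('O') perspective; returns
    (score, best_move). 'O' maximizes, 'X' minimizes."""
    w = winner(b)
    if w == 'O':
        return 1, None
    if w == 'X':
        return -1, None
    moves = [i for i, v in enumerate(b) if v is None]
    if not moves:
        return 0, None          # draw
    best = None
    for m in moves:
        b[m] = player
        score, _ = minimax(b, 'X' if player == 'O' else 'O')
        b[m] = None
        if best is None or (player == 'O' and score > best[0]) \
                        or (player == 'X' and score < best[0]):
            best = (score, m)
    return best

def computer_move(board):
    """board: list of 9 cells, 'X' (human), 'O' (computer) or None."""
    return minimax(board, 'O')[1]
```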

 

Visual Plagiarism Detector

The majority of applications aimed at detecting plagiarism dig deep, performing matching at the text and semantic level. Doing so can be very complex and computationally heavy, especially in the case of code plagiarism. This project aims to investigate whether taking a more general ‘visual’ approach can be just as, if not more, effective. The aim is to determine whether finding matches between a set of documents/code by comparing them as images (rather than text) can help detect similarities and, hence, plagiarism. If shown to be workable, this can be a very helpful tool in undergrad practicals 🙂
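One simple way to compare documents "as images" is a perceptual difference hash (dHash): downsample each rendered page to a small grayscale grid, record whether each pixel is brighter than its right-hand neighbour, and compare the resulting bit strings. A sketch, assuming the rendering and downsampling have already been done:

```python
def dhash(pixels):
    """pixels: 2-D list of grayscale values; each row of width w+1 yields
    w left-vs-right brightness comparisons. Returns a bit string."""
    return ''.join('1' if row[j] > row[j + 1] else '0'
                   for row in pixels for j in range(len(row) - 1))

def similarity(h1, h2):
    """Fraction of matching bits, i.e. 1 - normalised Hamming distance."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)
```

Two near-identical page layouts produce near-identical hashes even when the text has been superficially reworded, which is exactly the hypothesis this project would test.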


Audio Gesture Recognition

Have you ever placed your ear against a wall and then tried to draw shapes on the wall with your fingernails? The sound made by each type of shape can be very distinctive. This project aims to do something similar. Attach one (or more) microphones to the wall. Then, as the user draws gestures on the wall, the audio gesture recognition program detects and recognizes these gestures. This can be used, for example, as a means of remotely carrying out various actions such as starting/stopping/pausing a media player.

 

Leap Motion Gesture Recognition

 
The Leap Motion is a device that can track a user’s hands and provide the pose of the hands in real-time. This project aims to take the information provided by the Leap Motion and train a classifier to recognize various handshapes in various orientations and locations, towards recognizing unique gestures.
 

Visually Impaired Helper (VIH) – Flying Edition

A previous student (Kurt Jacobs) created the VIH system which won the World Citizenship Category at the Microsoft Imagine Cup 2015. Read about it HERE. The project uses the Microsoft Kinect to guide a visually impaired person in navigating his/her surroundings. This project entails using a drone, such as the Parrot AR.Drone or Parrot Bebop, flying in front of the visually impaired person to guide him/her while navigating the surroundings.


e-Conductor

This project will attempt to develop a program that produces music. The program will capture the motions of a conductor’s hand(s) and, based on the speed and direction of the hand(s), produce musical notes. A fast motion will create a loud sound and a slow motion a soft sound; moving the hands upwards will increase the frequency (pitch) of the notes, while moving them downwards will decrease it.
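The mapping from motion to notes could be as simple as the following sketch, where the ranges and the MIDI-style pitch/velocity outputs are all illustrative assumptions:

```python
def note_from_motion(speed, height, speed_max=2.0, h_min=0.0, h_max=2.0):
    """Hypothetical mapping: hand speed (m/s) -> note volume (MIDI-style
    velocity 0-127), hand height (m) -> note pitch (MIDI note C3..C5).
    All ranges are illustrative and would be tuned by experimentation."""
    vel = max(0, min(127, int(127 * speed / speed_max)))
    pitch = 48 + int(24 * (height - h_min) / (h_max - h_min))
    pitch = max(48, min(72, pitch))
    return pitch, vel
```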

 

Forex Forecasting

Foreign exchange rates between currencies vary by the second. Many companies allow one to predict whether a rate will increase or decrease and reward the forecaster for every correct prediction made. This project entails the creation of an intelligent system that can automatically carry out this forecasting with high accuracy.


Phone Reader Update

Phone Reader was developed by honours students. The application allows a user to take a photo of a piece of text, send it to a server, have it automatically recognized, and have the result sent back and read to the user as audio. An update to this application is to allow the user to select and label specific parts of the photo which are to be recognized. Only these parts will be recognized and sent back from the server. Subsequently, the labels can be clicked to read only those parts back. Also, the images should be pre-processed to make recognition easier.


Anaglyph Videos

Anaglyph images provide a stereoscopic 3D effect when viewed with two-color glasses (each lens a chromatically opposite color, usually red and cyan). The images are made up of two color layers, superimposed but offset with respect to each other to produce a depth effect (en.wikipedia.org/wiki/Anaglyphic). Read the Wikipedia page for more details on the glasses required and the processes that images need to be put through to become stereoscopic. Anaglyphs have become widely used: http://mars.jpl.nasa.gov/MPF/mpf/anaglyph-arc.html. The objective is to produce an anaglyphic video of a tour of the Computer Science building. Anaglyph production must be done programmatically and not using third-party tools (like Photoshop or GIMP). The end result of the project is a program that:

  • allows a user to select a video (in a commonly used format such as AVI or MPG),
  • plays the video,
  • converts the video to anaglyph,
  • plays the anaglyph video, and
  • saves the video in the same format it was read.
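The per-pixel conversion itself is simple for the common red-cyan scheme: take the red channel from the left-eye view and the green and blue channels from the right-eye view. A sketch on plain pixel grids (in the video setting, the two "views" would typically be the same frame horizontally offset, and the conversion applied to every decoded frame):

```python
def anaglyph_pixel(left_rgb, right_rgb):
    """Simple red-cyan anaglyph: red channel from the left view,
    green and blue channels from the right view."""
    return (left_rgb[0], right_rgb[1], right_rgb[2])

def anaglyph_frame(left, right):
    """left/right: 2-D grids of (r, g, b) tuples for the two offset views."""
    return [[anaglyph_pixel(l, r) for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]
```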

 

Personal Finance Manager

This project is best suited to a student with a B.Comm background. The student is required to build a system that can be used to manage the finances of an individual, household or small business. The system must be able to report on the financial status of the entity, as well as help the entity deal with matters such as personal income tax. Strong emphasis must be placed on building a system that is robust, stable and secure.