COMP 5704: Parallel Algorithms and Applications in Data Science


School of Computer Science
Carleton University, Ottawa, Canada


Project Title: A GPU-accelerated One Class Support Vector Machine

Name: Richard Moulton, MSc student at the University of Ottawa


Project Outline:

Parallel computing offers exciting possibilities to address problems where processing time has been a major limiting factor. An example is stream learning, characterized in both [1] and [2] by: an inability to store all instances in memory (or even simply in storage); the number of instances precluding multiple passes by the learning algorithm; and a potential desire to extract data from each instance as it arrives. The characteristics of these streams can change over time, a phenomenon known as concept drift, and it is not uncommon for the vast majority of data to be "normal," an example of class imbalance [2]. A concrete application of stream learning is network intrusion detection where an Intrusion Detection System (IDS) monitors a continuous, high-speed data stream of network traffic and labels anomalous network packets for further action.

One-class support vector machines (OCSVMs) are highly capable classifiers that are designed for one-class classification problems and are naturally applied to anomaly detection. Their significant training time, however, is a drawback that is exacerbated in stream learning, where the model must keep pace with incoming data. Open source machine learning libraries, such as [3] and [4], do not have graphics processing unit (GPU)-accelerated implementations of OCSVMs.

This project fills this gap by implementing an OCSVM that uses the parallel processing power of GPUs, incorporating aspects from [1] and [2]. First, using GPUs to tackle computationally expensive calculations, as in [1], allows OCSVMs to be trained faster, as the stream learning domain requires. Second, the use of OCSVMs is modeled on the autoencoders presented in [2]: both require only instances from the "normal" majority class for training, which addresses the class imbalance problem. The end result of this project is a GPU-accelerated implementation of an OCSVM that can be trained quickly enough to be used on continuous streams of data and in the presence of concept drift.
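To illustrate the kind of calculation that lends itself to GPU acceleration, the sketch below builds a kernel (Gram) matrix over a handful of "normal" instances on the CPU. This outline does not state which calculations the project offloads or which kernel it uses; the RBF kernel, the function name rbf_kernel_matrix, the gamma value and the sample data are all assumptions made for illustration only. The point is that every matrix entry depends on just one pair of instances, so the entries can in principle be computed independently and in parallel.

```python
import math


def rbf_kernel_matrix(instances, gamma):
    """Return the Gram matrix K with K[i][j] = exp(-gamma * ||x_i - x_j||^2).

    Hypothetical helper for illustration; not taken from the project itself.
    """
    n = len(instances)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) reads only instances i and j, so no entry depends
            # on any other. On a GPU, each (i, j) pair could be assigned to
            # its own thread (an assumption about one possible mapping).
            sq_dist = sum((a - b) ** 2 for a, b in zip(instances[i], instances[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


# Made-up "normal" instances; a one-class model trains on this class only.
normal_instances = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = rbf_kernel_matrix(normal_instances, gamma=0.5)
```

The nested loop performs work proportional to the square of the number of instances, which is one reason training cost grows quickly with the amount of data.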

Startup Paper(s):

Deliverables:

Relevant References: