COMP 5704: Parallel Algorithms and Applications in Data Science
School of Computer Science
Carleton University, Ottawa, Canada
Project Title: A GPU-accelerated One Class Support Vector Machine
Name: Richard Moulton, MSc student at the University of Ottawa
Project Outline:
Parallel computing offers exciting possibilities to address problems where processing time has been a major limiting factor. An example is stream learning, characterized in both [1] and [2] by: an inability to store all instances in memory (or even simply in storage); the number of instances precluding multiple passes by the learning algorithm; and a potential desire to extract data from each instance as it arrives. The characteristics of these streams can change over time, a phenomenon known as concept drift, and it is not uncommon for the vast majority of data to be "normal," an example of class imbalance [2]. A concrete application of stream learning is network intrusion detection where an Intrusion Detection System (IDS) monitors a continuous, high-speed data stream of network traffic and labels anomalous network packets for further action.
One class support vector machines (OCSVMs) are highly capable classifiers that are designed for one-class classification problems and are naturally applied to anomaly detection. Their significant training time, however, is a drawback, and it is exacerbated when they are applied to stream learning. Open source machine learning libraries, such as [3] and [4], do not have graphics processing unit (GPU)-accelerated implementations of OCSVMs.
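For reference, the standard OCSVM formulation of Schölkopf et al. [22] can be written as below. The notation ($\nu$, $\rho$, $\xi_i$, $\Phi$, $k$, and $n$ training instances) follows that paper and is an assumption here, since this outline does not fix any notation or say which variant the project implements.

```latex
% Primal problem: separate the mapped training data from the origin
\min_{w,\,\xi,\,\rho} \quad \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_{i} - \rho
\qquad \text{subject to} \quad
  \langle w, \Phi(x_{i}) \rangle \ge \rho - \xi_{i}, \quad \xi_{i} \ge 0

% Dual problem, expressed through the kernel k(x_i, x_j) = <Phi(x_i), Phi(x_j)>
\min_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
  \alpha_{i}\alpha_{j}\,k(x_{i}, x_{j})
\qquad \text{subject to} \quad
  0 \le \alpha_{i} \le \frac{1}{\nu n}, \quad \sum_{i=1}^{n}\alpha_{i} = 1

% Decision function: +1 for "normal", -1 for anomalous
f(x) = \operatorname{sgn}\!\left(\sum_{i=1}^{n}\alpha_{i}\,k(x_{i}, x) - \rho\right)
```

The dual depends on the training data only through the $n \times n$ kernel matrix, which is one reason training time grows quickly with the number of instances.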
This project fills this gap and implements an OCSVM using the parallel processing power of GPUs, incorporating aspects from [1] and [2]. Firstly, using GPUs to tackle computationally expensive calculations, as in [1], allows OCSVMs to be trained faster, as required by the stream learning domain. Secondly, the use of OCSVMs is modeled on the autoencoders presented by [2]: both require only instances from the "normal" majority class to be trained, addressing the class imbalance problem. The end result of this project is a GPU-accelerated implementation of an OCSVM capable of being trained quickly enough to be used on continuous streams of data and in the presence of concept drift.
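To illustrate the kind of calculation that lends itself to GPU acceleration, the following is a minimal sequential Python sketch, not the project's implementation. It assumes an RBF kernel (the outline does not name a kernel), and the function names, `gamma`, `alphas`, and `rho` values are hypothetical: the coefficients are chosen by hand for the demonstration rather than produced by training.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(points, gamma):
    """Build the n x n kernel matrix.

    Each entry K[i][j] depends only on points i and j, so the entries can be
    computed independently of one another. On a GPU, one thread per entry is
    a natural mapping; here they are simply computed in a nested loop.
    """
    n = len(points)
    return [[rbf_kernel(points[i], points[j], gamma) for j in range(n)]
            for i in range(n)]


def decision_value(x, support_vectors, alphas, rho, gamma):
    """Evaluate sum_i alpha_i * k(x_i, x) - rho for a new instance x."""
    weighted = sum(a * rbf_kernel(sv, x, gamma)
                   for sv, a in zip(support_vectors, alphas))
    return weighted - rho


def is_normal(x, support_vectors, alphas, rho, gamma):
    """Label x as normal when the decision value is non-negative."""
    return decision_value(x, support_vectors, alphas, rho, gamma) >= 0.0


# Hypothetical "normal" training instances clustered near the origin.
normal_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
gamma = 1.0                      # hypothetical kernel width
alphas = [1.0 / 3.0] * 3         # hand-picked, sums to 1 (not trained)
rho = 0.5                        # hand-picked offset (not trained)

K = kernel_matrix(normal_points, gamma)
near = is_normal((0.05, 0.05), normal_points, alphas, rho, gamma)
far = is_normal((3.0, 3.0), normal_points, alphas, rho, gamma)
```

With these hand-picked values, an instance close to the training cluster receives a positive decision value and a distant one receives a negative value. In a stream setting the same evaluation would be applied to each arriving instance, with the model retrained as the data drifts.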
Startup Paper(s):
- [1] C. HewaNadungodage, Y. Xia and J. J. Lee, "GPU-accelerated Outlier Detection for Continuous Data Streams," in 2016 IEEE International Parallel and Distributed Processing Symposium, Orlando, FL, 2016, pp. 1133-1142. (PDF)
- [2] Y. Dong and N. Japkowicz, "Threaded Ensembles of Supervised and Unsupervised Neural Networks for Stream Learning," in Canadian Conference on Artificial Intelligence, Victoria, BC, 2016, pp. 304-315. (PDF)
Deliverables:
- Code and Data (ZIP file containing source code for the sequential and GPU-accelerated implementations, as well as the test and training data used in the project)
Relevant References:
- [3] Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
- [4] Thorsten Joachims. Making large scale svm learning practical. Technical report, Universität Dortmund, 1999.
- [5] Asghar Ali Shah, Malik Sikander Hayat Khiyal, and Muhammad Daud Awan. Analysis of machine learning techniques for intrusion detection system: A review. International Journal of Computer Applications, 119(3), 2015.
- [6] Mohiuddin Ahmed, Abdun Naser Mahmood, and Jiankun Hu. A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60:19-31, 2016.
- [7] Anna L Buczak and Erhan Guven. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials, 18(2):1153-1176, 2015.
- [8] Vincent Lemaire, Christophe Salperwyck, and Alexis Bondu. A survey on supervised classification on data streams. In Business Intelligence, pages 88-125. Springer, 2015.
- [9] Elaine R Faria, Isabel JCR Gonçalves, André CPLF de Carvalho, and João Gama. Novelty detection in data streams. Artificial Intelligence Review, 45(2):235-269, 2016.
- [10] Jeffrey C Schlimmer and Richard H Granger Jr. Incremental learning from noisy data. Machine learning, 1(3):317-354, 1986.
- [11] Gerhard Widmer and Miroslav Kubat. Learning in the presence of concept drift and hidden contexts. Machine learning, 23(1):69-101, 1996.
- [12] João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4):44, 2014.
- [13] Ryan Elwell and Robi Polikar. Incremental learning of concept drift in nonstationary environments. IEEE Transactions on Neural Networks, 22(10):1517-1531, 2011.
- [14] Shehroz S Khan and Michael G Madden. One-class classification: taxonomy of study and review of techniques. The Knowledge Engineering Review, 29(03):345-374, 2014.
- [15] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):15, 2009.
- [16] Marco AF Pimentel, David A Clifton, Lei Clifton, and Lionel Tarassenko. A review of novelty detection. Signal Processing, 99:215-249, 2014.
- [17] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3):273-297, 1995.
- [18] Liva Ralaivola and Florence d'Alché-Buc. Incremental support vector machine learning: A local approach. In International Conference on Artificial Neural Networks, pages 322-330. Springer, 2001.
- [19] R Ravinder Reddy, B Kavya, and Y Ramadevi. A survey on svm classifiers for intrusion detection. International Journal of Computer Applications, 98(19), 2014.
- [20] Paweł Drozda and Krzysztof Sopyła. Accelerating svm with gpu: The state of the art. In International Conference on Artificial Intelligence and Soft Computing, pages 624-634. Springer, 2016.
- [21] John C Platt. Fast training of support vector machines using sequential minimal optimization. Advances in kernel methods, pages 185-208, 1999.
- [22] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the Support of a High-Dimensional Distribution. Neural Computation, 13(7):1443-1471, 2001.
- [23] Jia Jiong and Zhang Hao-ran. A Fast Learning Algorithm for One-Class Support Vector Machine. In Third International Conference on Natural Computation (ICNC 2007), pages 19-23. IEEE, 2007.
- [24] Qing Song, Wenjie Hu, and Wenfang Xie. Robust support vector machine with bullet hole image classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 32(4):440-448, 2002.
- [25] Katherine A Heller, Krysta M Svore, Angelos D Keromytis, and Salvatore J Stolfo. One class support vector machines for detecting anomalous windows registry accesses. In Proc. of the workshop on Data Mining for Computer Security, volume 9, 2003.
- [26] Leandros A Maglaras and Jianmin Jiang. A real time ocsvm intrusion detection module with low overhead for scada systems. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 3(10), 2014.
- [27] Wenli Shang, Peng Zeng, Ming Wan, Lin Li, and Panfeng An. Intrusion detection algorithm based on ocsvm in industrial control system. Security and Communication Networks, 2015.
- [28] Leandros A Maglaras. A novel distributed intrusion detection system for vehicular ad hoc networks. International Journal of Advanced Computer Science and Applications (IJACSA), 6(4):101-106, 2015.
- [29] Abdulmohsen Almalawi, Adil Fahad, Zahir Tari, Abdullah Alamri, Rayed AlGhamdi, and Albert Y Zomaya. An efficient data-driven clustering technique to detect attacks in scada systems. IEEE Transactions on Information Forensics and Security, 11(5):893-906, 2016.
- [30] R Ravinder, Y Ramadevi, and KVN Sunitha. Anomaly detection using feature selection and svm kernel trick. International Journal of Computer Applications, 129(4):31-35, 2015.
- [31] Ralf Klinkenberg and Thorsten Joachims. Detecting concept drift with support vector machines. In ICML, pages 487-494, 2000.
- [32] Bartosz Krawczyk and Michał Woźniak. One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft Computing, 19(12):3387-3400, 2015.
- [33] Andreas Athanasopoulos, Anastasios Dimou, Vasileios Mezaris, and Ioannis Kompatsiaris. Gpu acceleration for support vector machines. In WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011. TU Delft; EWI; MM; PRB, 2011.
- [34] Robert Hochberg. Matrix multiplication with CUDA: a basic introduction to the CUDA programming model, Aug 2012.
- [35] M. Lichman. UCI machine learning repository, 2013.
- [36] Nathalie Japkowicz and Mohak Shah. Evaluating learning algorithms: a classification perspective. Cambridge University Press, 2011.