Big Data Cyber-attack Detection
Jianwu Wang, Information Systems
Muthukumar Thevar, Information Systems
Neha Jha, Information Systems
Riyaz Habibi, Information Systems
As a core mechanism for cybersecurity, the ability to detect cyber-attacks is increasingly critical. Many network intrusion detection approaches exist, such as flow-based and packet-based analysis, targeting either single-attack or multistage-attack detection. Each approach has its own advantages and disadvantages. In this paper, we combine these approaches into one comprehensive system. Furthermore, to deal with increasing volumes of network traffic and improve the efficiency of full packet analysis, we employ the Spark Streaming platform for parallel detection.
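To illustrate the difference between packet-based and flow-based views of the same traffic, here is a minimal sketch in plain Python. It is not the project's detection system and does not use Spark Streaming; the packet fields, the sample records, and the `aggregate_flows` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical packet records; real captures carry many more fields.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "sport": 40000, "dport": 22, "proto": "tcp", "size": 60},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "sport": 40000, "dport": 22, "proto": "tcp", "size": 120},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "sport": 51000, "dport": 80, "proto": "tcp", "size": 500},
]


def aggregate_flows(packets):
    """Group packets into flows keyed by the usual 5-tuple.

    A packet-based detector inspects each record on its own; a flow-based
    detector works on per-flow summaries such as the counters built here.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["size"]
    return dict(flows)


flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

Flow summaries are much smaller than the raw packet stream, while full packet analysis keeps every record and is therefore the more expensive step to parallelize.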
Multidisciplinary Research and Education on Big Data + HPC + Atmospheric Sciences
Dr. Jianwu Wang, Department of Information Systems; Dr. Matthias K. Gobbert, Department of Mathematics and Statistics; Dr. Zhibo Zhang, Department of Physics; and Dr. Aryya Gangopadhyay, Department of Information Systems.
We will use HPCF for a new NSF-funded initiative in big data applied to atmospheric sciences, with high-performance computing as a vital tool. The research training consists of instruction in the areas of data, computing, and atmospheric sciences, supported by teaching assistants, followed by faculty-guided project research in multidisciplinary teams with participants from each area. Participating graduate students, post-docs, and junior faculty from around the nation will be exposed to multidisciplinary research experiences and have the opportunity for significant career growth. Details of the project can be found at cybertraining.umbc.edu.
Weather Data Clustering Project
Dr. Jianwu Wang, Department of Information Systems.
For this project, the available data was collected and presented in NetCDF4, a file format designed to support the creation, access, and sharing of scientific data. Since we were dealing with climate data that comprises spatial information, time information, and scientific values, NetCDF4 was the best-suited format to hold all of this information conveniently.
The task at hand was to extend an existing clustering algorithm to make it work with a four-dimensional (4D) multivariate weather dataset. Although the end goal was to use the weather data (based on all the available attributes) to group similar days together, an essential first step was to reduce the xarray dataset to a two-dimensional format so that traditional machine learning algorithms can work on it.
Because we do not have any ground-truth labels for our dataset, this is an unsupervised clustering task. We therefore want to apply state-of-the-art deep learning models, because deep learning-based models can represent more complex and nonlinear properties of the dataset and can generate clusters more robustly.
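The reduction to two dimensions can be sketched as follows. This is a minimal illustration with NumPy, not the project's actual code: the dimension order (time, level, latitude, longitude), the variable names, the random values, and the `flatten_to_2d` helper are all assumptions. With an xarray dataset, the underlying arrays of each variable would take the place of the synthetic arrays used here.

```python
import numpy as np

# Assumed grid: 4 days, 3 vertical levels, 5 latitudes, 6 longitudes.
n_time, n_level, n_lat, n_lon = 4, 3, 5, 6
rng = np.random.default_rng(0)

# Two made-up variables defined on the same grid (synthetic values).
variables = {
    "temperature": rng.normal(280.0, 10.0, size=(n_time, n_level, n_lat, n_lon)),
    "humidity": rng.uniform(0.0, 1.0, size=(n_time, n_level, n_lat, n_lon)),
}


def flatten_to_2d(variables):
    """Return an array with one row per day and one column per
    (variable, level, latitude, longitude) combination."""
    blocks = []
    for name in sorted(variables):  # fixed order keeps columns reproducible
        values = variables[name]
        # Keep the time axis and collapse all remaining axes into one.
        blocks.append(values.reshape(values.shape[0], -1))
    return np.hstack(blocks)


features = flatten_to_2d(variables)
print(features.shape)  # (4, 180): 4 days, 2 variables * 3 * 5 * 6 values each
```

Each row of `features` then describes one day, which is the sample unit a conventional clustering algorithm needs in order to group similar days together.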
Reproducible and Portable Big Data Analytics in the Cloud
Dr. Jianwu Wang, Department of Information Systems.
Cloud computing has become a major approach to help reproduce computational experiments. Yet there are still two main difficulties in reproducing batch-based big data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics, including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to reproduce in another cloud, a problem known as vendor lock-in. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility and utilize the ada