Weather Data Clustering Project

Dr. Jianwu Wang, Department of Information Systems.

For this project, the available data was collected and presented in NetCDF4, a file format designed to support the creation, access, and sharing of scientific data. Since we were dealing with climate data that comprises spatial information, time information, and scientific values, NetCDF4 was the best-suited format for holding all of this information conveniently.

The task at hand was to extend an existing clustering algorithm so that it works with a four-dimensional (4D) multivariate weather dataset. Although the end goal was to use the weather data (based on all the available attributes) to group similar days together, an essential first step was to reshape the xarray dataset into a two-dimensional format, with one row per day, so that traditional machine learning algorithms could operate on it.
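The reshaping step can be illustrated with a minimal sketch. It is not the project's actual code: it assumes each variable is stored with the dimension order (time, level, latitude, longitude), uses plain NumPy arrays in place of the xarray dataset, and the variable names, grid sizes, and the helper flatten_days are hypothetical.

```python
import numpy as np


def flatten_days(variables):
    """Stack 4D variables (time, level, lat, lon) into one (time, features) matrix.

    `variables` maps a variable name to a 4D array; the dimension order is an
    assumption made for this sketch.
    """
    n_days = next(iter(variables.values())).shape[0]
    columns = []
    for name, values in variables.items():
        if values.shape[0] != n_days:
            raise ValueError(f"{name} has a different number of time steps")
        # Keep the time axis and collapse every other axis into one feature axis.
        columns.append(values.reshape(n_days, -1))
    # Place the flattened variables side by side, one row per day.
    return np.concatenate(columns, axis=1)


rng = np.random.default_rng(0)
shape = (5, 2, 3, 4)  # hypothetical: 5 days, 2 levels, 3 latitudes, 4 longitudes
variables = {
    "temperature": rng.normal(size=shape),  # synthetic values, not real data
    "humidity": rng.normal(size=shape),
}

X = flatten_days(variables)
print(X.shape)  # (5, 48): 5 days, 2 variables * 2 * 3 * 4 features each
```

Each row of X then describes one day, so any algorithm that expects a samples-by-features matrix can be applied to it. Flattening discards the explicit spatial layout, which is one reason a model that works on the multidimensional structure directly may be preferable.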

Because our dataset has no ground-truth labels, this is an unsupervised clustering task. We therefore want to apply state-of-the-art deep learning models to it, since such models can represent more complex, nonlinear properties of the dataset and can generate clusters more robustly.