Classification for High-Dimensional Data

Dr. Seungchul Baek, Department of Mathematics and Statistics.
Vahid Andalib, Department of Mathematics and Statistics.

Classification is a supervised learning task in which a rule learned from training data assigns (predicts) new observations to one of the classes of the target variable. Several classification methods work well in low-dimensional settings, where the number of features is much smaller than the sample size. However, for high-dimensional data, where the number of observations is much smaller than the number of features, predictions from standard classifiers such as linear discriminant analysis (LDA) become unstable and unreliable, in part because the sample covariance matrix that LDA must invert is singular when the features outnumber the observations. Overfitting is another issue that arises when using LDA in high dimensions. We aim to develop a classifier that works well in high-dimensional settings. In particular, we may focus on a new classifier based on random partitioning and weighted classifiers.
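The difficulty with LDA described above can be seen in a small simulation. The following is a minimal sketch, not the classifier proposed in this project: it assumes NumPy is available, uses pure-noise Gaussian features with arbitrary labels, and replaces the inverse of the pooled covariance with a pseudo-inverse because the true inverse does not exist here. The sizes `n` and `p` and the helper names `fit_lda` and `predict` are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 200  # assumed sizes: far fewer observations than features

# Pure-noise data: the labels carry no information about the features.
X_train = rng.standard_normal((n, p))
y_train = np.repeat([0, 1], n // 2)
X_test = rng.standard_normal((1000, p))
y_test = np.repeat([0, 1], 500)


def fit_lda(X, y):
    """Two-class LDA with a pseudo-inverse of the pooled covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Stack the within-class centered observations.
    Z = np.vstack([X0 - mu0, X1 - mu1])
    # Pooled within-class covariance (p x p), built from only n rows.
    S = Z.T @ Z / (len(X) - 2)
    # S is singular when p > n - 2, so an ordinary inverse is unavailable.
    w = np.linalg.pinv(S, rcond=1e-10) @ (mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b, Z


def predict(X, w, b):
    """Assign class 1 when the linear discriminant score is positive."""
    return (X @ w + b > 0).astype(int)


w, b, Z = fit_lda(X_train, y_train)

# S = Z'Z / (n - 2) has the same rank as Z, which is at most n - 2.
rank_S = int(np.linalg.matrix_rank(Z))
train_acc = float(np.mean(predict(X_train, w, b) == y_train))
test_acc = float(np.mean(predict(X_test, w, b) == y_test))

print(f"features p = {p}, rank of pooled covariance = {rank_S}")
print(f"training accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```

Because the pooled covariance is estimated from only `n` observations, its rank cannot exceed `n - 2`, so most of the `p` directions in feature space have no variance estimate at all. Comparing the printed training and test accuracies on such noise data gives a rough sense of how much of the fitted rule reflects the training sample rather than any real class structure.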