Predictive Analytics on High-Dimensional Big Data using Principal Component Regression (PCR)
Kyi Lai Lai Khine, Thi Thi Soe Nyunt
Cloud Computing Lab, University of Computer Studies; Faculty of Computer Science, University of Computer Studies
Abstract: The increasing volume, format complexity, and delivery speed of "Big Data" from diverse application domains have exceeded the capabilities of traditional data management tools and technologies. Classical data analysis methods and algorithms need to be redesigned for parallel and distributed architectures so that they work well with vast amounts of data, large not only in the number of samples but also in the number of dimensions. Moreover, high-dimensional big datasets raise many issues and challenges, both in handling huge collections of data that are wide (many dimensions) and tall (many samples) and in extracting useful value from them. Principal Component Analysis (PCA) is an important machine learning algorithm for dimensionality reduction of highly correlated large-scale data. In this system, we will apply PCA to select the regressors for a multiple linear regression model, an approach we call Principal Component Regression (PCR), for high-dimensional big data analytics, with the aim of selecting effective and efficient features or dimensions. Additionally, we will develop a parallel and distributed version of PCA as a preliminary machine learning step for the multiple linear regression model, implemented on two widely used scalable distributed platforms, disk-based MapReduce and memory-based Spark, to address the scalability issue of big data. Large-scale OpenStreetMap (OSM) data, which reflects real-world conditions in the GIS market and spatial applications, will be used to evaluate the system.
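The PCR pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' MapReduce or Spark implementation: NumPy array chunks stand in for distributed partitions, and every function name here (`partial_stats`, `combine`, `pca_from_partitions`, `fit_pcr`, `predict_pcr`) as well as the default partition count is a hypothetical choice made for illustration. The sketch assumes the number of dimensions is small enough for the covariance matrix to fit in memory on one node, so that only the per-partition sums need to be computed in parallel.

```python
import numpy as np

def partial_stats(chunk):
    """Map step: row count, column sums, and Gram matrix of one partition."""
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk

def combine(a, b):
    """Reduce step: add the sufficient statistics of two partitions."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def pca_from_partitions(partitions, k):
    """Return the column means and the top-k principal directions."""
    stats = None
    for chunk in partitions:
        s = partial_stats(chunk)
        stats = s if stats is None else combine(stats, s)
    n, sums, gram = stats
    mean = sums / n
    # Sample covariance rebuilt from the aggregated sums.
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]

def fit_pcr(X, y, k, n_partitions=4):
    """Fit a linear regression of y on the first k principal component scores."""
    partitions = np.array_split(X, n_partitions)  # stand-in for data partitions
    mean, components = pca_from_partitions(partitions, k)
    scores = (X - mean) @ components
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mean, components, coef

def predict_pcr(model, X):
    """Project new rows onto the stored components and apply the regression."""
    mean, components, coef = model
    scores = (X - mean) @ components
    return coef[0] + scores @ coef[1:]
```

Because the row count, column sums, and Gram matrix are additive across partitions, the same map and reduce functions could in principle be handed to a distributed framework. The eigendecomposition and the least-squares fit on the k-dimensional scores are kept on a single node in this sketch.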
2019 The 11th International Conference on Future Computer and Communication (ICFCC 2019)