Data Mining and Machine Learning Approach for Air Quality Index Prediction

In recent years, air pollution has increased drastically, with increasingly harmful effects on all living beings. Most countries in the world are battling rising air pollution levels, so it has become necessary to control and predict the Air Quality Index (AQI). In this research project, we implement data mining and machine learning models to predict the AQI and to classify it into buckets. For AQI prediction, we implemented five regression models on AQI data from multiple Indian cities: Principal Component Regression (PCR), Partial Least Squares (PLS), PCR with leave-one-out cross-validation (LOOCV), PLS with LOOCV, and multiple regression. Based on its value, the AQI is further classified into six categories, called buckets: “Good”, “Satisfactory”, “Moderate”, “Poor”, “Very Poor” and “Severe”. To predict the AQI bucket, we developed three classification models: multinomial logistic regression, k-nearest neighbors (KNN), and KNN with repeated cross-validation. On the air quality dataset of Indian cities, the PLS model with LOOCV was the best of all the models at dimension reduction, requiring only five components. In terms of accuracy, the PLS model was also the best, with the lowest RMSE. On the station-wise data of Indian cities, the KNN model with repeated cross-validation and a tune length of 10 performed best in terms of both accuracy and AUC.
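The abstract states that each AQI value is assigned to one of six buckets but does not list the cut-offs. As an illustration only, the sketch below maps a numeric AQI to a bucket name, assuming the breakpoints of India's CPCB National AQI scale (Good 0–50, Satisfactory 51–100, Moderate 101–200, Poor 201–300, Very Poor 301–400, Severe 401–500). The function name `aqi_bucket` is hypothetical, and the thresholds are an assumption rather than values taken from the paper.

```python
from bisect import bisect_left

# Assumed upper bounds of each bucket (CPCB National AQI scale, not from the paper).
# Anything above 400 is treated as "Severe", including values beyond 500.
UPPER_BOUNDS = [50, 100, 200, 300, 400]
BUCKETS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def aqi_bucket(aqi: float) -> str:
    """Map a numeric AQI value to one of six named buckets (hypothetical helper)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left returns the index of the first upper bound >= aqi,
    # so a value sitting exactly on a bound stays in the lower bucket.
    return BUCKETS[bisect_left(UPPER_BOUNDS, aqi)]


if __name__ == "__main__":
    for value in [35, 120, 450]:
        print(value, aqi_bucket(value))  # 35 Good, 120 Moderate, 450 Severe
```

A mapping like this is one way to derive the class label that a classifier such as multinomial logistic regression or KNN would then learn to predict from pollutant concentrations.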

How to Cite

M. Londhe, “Data Mining and Machine Learning Approach for Air Quality Index Prediction”, Int J Eng and Appl Phys, vol. 1, no. 2, pp. 136–153, May 2021.