Data Mining and Machine Learning Approach for Air Quality Index Prediction





In recent years, air pollution has increased drastically, with worsening effects on all living beings. The majority of countries in the world are battling rising air pollution levels, so it has become necessary to monitor and predict the Air Quality Index (AQI). In this research project, we implement data mining and machine learning models to predict the AQI and to classify it into buckets. For AQI prediction, we implemented five regression models on AQI data from multiple Indian cities: principal component regression, partial least squares (PLS) regression, principal component regression with leave-one-out cross-validation (CV), PLS regression with leave-one-out CV, and multiple regression. Based on its value, the AQI is further classified into six categories called buckets: "Good", "Satisfactory", "Moderate", "Poor", "Very Poor" and "Severe". To predict the AQI bucket, we developed three classification models: multinomial logistic regression, K-nearest neighbors (KNN), and KNN with repeated CV. On the air quality dataset of different Indian cities, the PLS model with leave-one-out cross-validation was the best of all the models at dimension reduction, considering only the 5th component. In terms of accuracy, the PLS model was best, with the lowest RMSE. On the station-wise data of Indian cities, the KNN model with repeated CV and a tune length of 10 performed best in terms of accuracy and AUC.
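The bucket labels named in the abstract can be illustrated with a short sketch that maps a numeric AQI value to its bucket. The abstract does not state the cut-off values, so the breakpoints below are an assumption taken from the Indian National Air Quality Index scale (0-50, 51-100, 101-200, 201-300, 301-400, above 400); the names `aqi_bucket`, `BUCKET_UPPER_BOUNDS` and `BUCKET_LABELS` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left

# ASSUMPTION: inclusive upper bounds of the first five buckets, following the
# Indian National AQI scale. The abstract names the buckets but not the cut-offs.
BUCKET_UPPER_BOUNDS = [50, 100, 200, 300, 400]
BUCKET_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def aqi_bucket(aqi: float) -> str:
    """Return the bucket label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # bisect_left returns the index of the first upper bound that is >= aqi,
    # so a value equal to a bound stays in the lower bucket. Values above the
    # last bound yield index 5, which is "Severe".
    return BUCKET_LABELS[bisect_left(BUCKET_UPPER_BOUNDS, aqi)]


if __name__ == "__main__":
    for value in (42, 100, 150, 250, 400, 480):
        print(value, aqi_bucket(value))
```

A lookup of this kind only produces the target labels from a known AQI value. The classification models described above instead predict the bucket from the dataset's features.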








How to Cite

M. Londhe, “Data Mining and Machine Learning Approach for Air Quality Index Prediction”, Int J Eng and Appl Phys, vol. 1, no. 2, pp. 136–153, May 2021.