ABSTRACT
Since air pollution seriously affects human health and daily life, air quality prediction has attracted increasing attention and become an active and important research topic. In this paper, we present AccuAir, our winning solution to the KDD Cup 2018 of Fresh Air, which won 1st place in two tracks and 2nd place in the third. Our solution achieved the best average accuracy over all evaluation days. The task is to accurately predict the air quality (as indicated by the concentration of PM2.5, PM10 or O3) over the next 48 hours for each monitoring station in Beijing and London. Aiming at a cutting-edge solution, we first present an analysis of the air quality data, identifying the fundamental challenges, such as air quality that is stable over long periods but changes suddenly, and complex spatial-temporal correlations across different stations. To address these challenges, we carefully design both global and local air quality features, and develop three prediction models, LightGBM, Gated-DNN and Seq2Seq, each with novel ingredients developed to better solve the problem. Specifically, a spatial-temporal gate is proposed in our Gated-DNN model to effectively capture the spatial-temporal correlations as well as temporal relatedness, making the prediction more sensitive to spatial and temporal signals. In addition, the Seq2Seq model is adapted so that the encoder summarizes useful historical features while the decoder concatenates the weather forecast as input, which significantly improves prediction accuracy. Assembling all these components together, the ensemble of the three models outperforms all competing methods in terms of prediction accuracy for the 31-day average, the 10-day average, and 24-48 hours.
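The abstract does not say how the three models are combined, so the following is only a minimal sketch of one common choice, a weighted average of per-model 48-hour forecasts for a single station and pollutant. The function name `ensemble_forecast`, the model keys, the weights and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

HORIZON = 48  # hours ahead, as stated in the task description


def ensemble_forecast(
    forecasts: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Blend per-model hourly forecasts by weighted average.

    ASSUMPTION: weighted averaging is an illustrative choice only;
    the abstract does not specify the ensembling scheme.
    """
    if not forecasts:
        raise ValueError("at least one model forecast is required")
    if weights is None:
        # Default to equal weights when none are supplied.
        weights = {name: 1.0 for name in forecasts}
    for name, series in forecasts.items():
        if len(series) != HORIZON:
            raise ValueError(
                f"{name}: expected {HORIZON} hourly values, got {len(series)}"
            )
    total = sum(weights[name] for name in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Average hour by hour across the models.
    return [
        sum(weights[name] * series[h] for name, series in forecasts.items()) / total
        for h in range(HORIZON)
    ]


# Toy, made-up PM2.5 forecasts (constant over the horizon for readability).
toy = {
    "lightgbm": [30.0] * HORIZON,
    "gated_dnn": [40.0] * HORIZON,
    "seq2seq": [50.0] * HORIZON,
}
blended = ensemble_forecast(toy)
weighted = ensemble_forecast(
    toy, {"lightgbm": 1.0, "gated_dnn": 1.0, "seq2seq": 2.0}
)
```

In practice the weights would be tuned on held-out days; equal weights are used here only to keep the example easy to follow.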
Index Terms
- AccuAir: Winning Solution to Air Quality Prediction for KDD Cup 2018