DOI: 10.1145/3292500.3330787

AccuAir: Winning Solution to Air Quality Prediction for KDD Cup 2018

Published: 25 July 2019

ABSTRACT

Since air pollution seriously affects human health and daily life, air quality prediction has attracted increasing attention and become an active and important research topic. In this paper, we present AccuAir, our winning solution to the KDD Cup 2018 of Fresh Air, which won 1st place in two tracks and 2nd place in the third. Our solution achieved the best average accuracy across all evaluation days. The task is to accurately predict the air quality (as indicated by the concentration of PM2.5, PM10 or O3) over the next 48 hours for each monitoring station in Beijing and London. Aiming at a cutting-edge solution, we first present an analysis of the air quality data, identifying the fundamental challenges, such as long-term but suddenly changing air quality and complex spatial-temporal correlations across different stations. To address these challenges, we carefully design both global and local air quality features, and develop three prediction models, LightGBM, Gated-DNN and Seq2Seq, each with novel ingredients developed to better solve the problem. Specifically, a spatial-temporal gate is proposed in our Gated-DNN model to effectively capture the spatial-temporal correlations as well as temporal relatedness, making the prediction more sensitive to spatial and temporal signals. In addition, the Seq2Seq model is adapted so that the encoder summarizes useful historical features while the decoder concatenates the weather forecast as input, which significantly improves prediction accuracy. Assembling all these components together, the ensemble of the three models outperforms all competing methods in terms of prediction accuracy on the 31-day average, the 10-day average, and the 24-48 hour horizon.
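
The abstract does not state how the three models are combined. As a minimal sketch only, assuming a simple weighted average of per-model 48-hour forecasts for one station and one pollutant, the ensemble step could look like the following. The function name `ensemble_forecasts`, the model keys, the weights and the example values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

HORIZON = 48  # hours ahead, as given in the task description


def ensemble_forecasts(
    forecasts: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Blend per-model forecasts with a weighted average.

    Weighted averaging is an assumption made for illustration; the
    abstract only says that an ensemble of three models is used.
    """
    if not forecasts:
        raise ValueError("need at least one model forecast")
    for name, series in forecasts.items():
        if len(series) != HORIZON:
            raise ValueError(f"{name}: expected {HORIZON} values, got {len(series)}")
    if weights is None:
        # Default to equal weights when none are supplied.
        weights = {name: 1.0 for name in forecasts}
    total = sum(weights[name] for name in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Average hour by hour across models.
    return [
        sum(weights[name] * forecasts[name][h] for name in forecasts) / total
        for h in range(HORIZON)
    ]


# Hypothetical PM2.5 forecasts for a single station (made-up values).
example = {
    "lightgbm": [30.0] * HORIZON,
    "gated_dnn": [36.0] * HORIZON,
    "seq2seq": [33.0] * HORIZON,
}
blended = ensemble_forecasts(example)
weighted = ensemble_forecasts(
    example, {"lightgbm": 2.0, "gated_dnn": 1.0, "seq2seq": 1.0}
)
```

In practice the weights would be tuned on held-out evaluation days; equal weights are used here only to keep the sketch small.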

    • Published in

      KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2019
      3305 pages
      ISBN: 9781450362016
      DOI: 10.1145/3292500

      Copyright © 2019 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

      KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions, 9%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%