DOI: 10.1145/3397271.3401037
research-article

Domain-Adaptive Neural Automated Essay Scoring

Published: 25 July 2020

ABSTRACT

Automated essay scoring (AES) is a promising yet challenging task. Current state-of-the-art AES models ignore domain differences and cannot effectively leverage data from different domains. In this paper, we propose a domain-adaptive framework to improve the domain adaptability of AES models. We design two domain-independent self-supervised tasks and train them jointly with the AES task. The self-supervised tasks enable the model to capture knowledge shared across domains and act as a regularizer that induces a shared feature space. We further enhance the model's robustness to domain variation with a novel domain adversarial training technique, whose main idea is to train the model with small, well-designed perturbations. We obtain the perturbations via a variation of the Fast Gradient Sign Method (FGSM). Our approach achieves new state-of-the-art performance in both in-domain and cross-domain experiments on the ASAP dataset. We also show that the proposed domain adaptation framework is architecture-free and can be successfully applied to different models.
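To make the perturbation idea concrete, the following is a minimal sketch of the standard FGSM step, in which the perturbation is epsilon times the sign of the loss gradient with respect to the input. It is not the paper's method: the abstract mentions a variation of FGSM without giving details, and the toy linear scorer, the squared-error loss, and all names here (`predict`, `fgsm_perturbation`, `adversarial_loss`, `epsilon`) are assumptions chosen for illustration.

```python
# Illustrative sketch only: standard FGSM on a toy linear scorer.
# The model, loss, and all names are hypothetical, not taken from the paper.

def predict(weights, bias, features):
    """Linear score: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def squared_error(weights, bias, features, target):
    """Squared-error loss between the predicted and the target score."""
    return (predict(weights, bias, features) - target) ** 2


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturbation(weights, bias, features, target, epsilon):
    """Perturbation epsilon * sign(dLoss/dx) for the squared-error loss."""
    residual = predict(weights, bias, features) - target
    # Analytic input gradient: d/dx (w.x + b - y)^2 = 2 * (w.x + b - y) * w
    gradient = [2.0 * residual * w for w in weights]
    return [epsilon * sign(g) for g in gradient]


def adversarial_loss(weights, bias, features, target, epsilon):
    """Loss evaluated on the FGSM-perturbed input."""
    delta = fgsm_perturbation(weights, bias, features, target, epsilon)
    perturbed = [x + d for x, d in zip(features, delta)]
    return squared_error(weights, bias, perturbed, target)


# Toy values, made up for the example.
weights = [0.5, -1.0, 2.0]
bias = 0.1
features = [1.0, 2.0, 0.5]
target = 1.0
epsilon = 0.1

delta = fgsm_perturbation(weights, bias, features, target, epsilon)
clean = squared_error(weights, bias, features, target)
adversarial = adversarial_loss(weights, bias, features, target, epsilon)
print(delta, clean, adversarial)
```

In adversarial training, a loss on the perturbed input is typically added to the loss on the clean input, so the model is penalized when a small input shift changes its prediction. How the paper adapts this step to domain variation is not described in the abstract.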

      • Published in

        SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2020
        2548 pages
        ISBN: 9781450380164
        DOI: 10.1145/3397271

        Copyright © 2020 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
