DOI: 10.1145/3038912.3052671 (WWW Conference Proceedings)
research-article

Adverse Drug Event Detection in Tweets with Semi-Supervised Convolutional Neural Networks

Published: 03 April 2017

ABSTRACT

Current adverse drug event (ADE) surveillance systems often incur a sizable time lag before such events are published. Online social media such as Twitter could describe adverse drug events in real time, before official reporting. Deep learning has significantly improved text classification performance in recent years and can potentially enhance ADE classification in tweets. However, these models typically require large corpora with human expert-derived labels; such resources are very expensive to generate and rarely available. Semi-supervised deep learning models, which offer a plausible alternative to fully supervised models, are trained on a small set of labeled data together with a relatively larger collection of unlabeled data. Traditionally, these models are trained on labeled and unlabeled data from similar topics or domains. In reality, the millions of tweets generated daily often focus on disparate topics, which could present a challenge for building deep learning models for ADE classification with a random Twitter stream as unlabeled training data. In this work, we build several semi-supervised convolutional neural network (CNN) models for ADE classification in tweets, leveraging different types of unlabeled data in developing the models to address this problem. We demonstrate that, with the selective use of a variety of unlabeled data, our semi-supervised CNN models outperform a strong state-of-the-art supervised classification model by +9.9% F1-score. We evaluated our models on the Twitter data set used in the PSB 2016 Social Media Shared Task. Our results establish a new state of the art for this data set.
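For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The sketch below is illustrative only and is not taken from the paper: it assumes binary labels in which 1 marks an ADE tweet, and it assumes F1 is computed for that positive class. The abstract does not specify either detail, and the function names `confusion_counts` and `f1_score` are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(gold: Sequence[int], pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives
    for the class labeled `positive` (assumed here to be the ADE class)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration (not data from the paper).
gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
tp, fp, fn = confusion_counts(gold, pred)
score = f1_score(tp, fp, fn)
```

Because the positive class is typically rare in this kind of tweet classification, F1 on that class is usually more informative than plain accuracy, which a classifier could inflate by predicting "no ADE" for every tweet.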

