ABSTRACT
This paper presents a case study of analyzing and improving intercoder reliability in discourse tagging using statistical techniques. Bias-corrected tags are formulated and successfully used to guide a revision of the coding manual and develop an automatic classifier.
- Development and use of a gold-standard data set for subjectivity classifications