Abstract
Many requirements documents are written in natural language (NL). However, with the flexibility of NL comes the risk of introducing unwanted ambiguities in the requirements and misunderstandings between stakeholders. In this paper, we describe an automated approach to identify potentially nocuous ambiguity, which occurs when text is interpreted differently by different readers. We concentrate on anaphoric ambiguity, which occurs when readers may disagree on how pronouns should be interpreted. We describe a number of heuristics, each of which captures information that may lead a reader to favor a particular interpretation of the text. We use these heuristics to build a classifier, which in turn predicts the degree to which particular interpretations are preferred. We collected multiple human judgements on the interpretation of requirements exhibiting anaphoric ambiguity and showed how the distribution of these judgements can be used to assess whether a particular instance of ambiguity is nocuous. Given a requirements document written in natural language, our approach can identify sentences that contain anaphoric ambiguity, and use the classifier to alert the requirements writer to text that runs the risk of misinterpretation. We report on a series of experiments that we conducted to evaluate the performance of the automated system we developed to support our approach. The results show that the system achieves high recall with a consistent improvement on baseline precision subject to some ambiguity tolerance levels, allowing us to explore and highlight realistic and potentially problematic ambiguities in actual requirements documents.
Notes
As with programming, requirements are inherently an ill-conditioned problem; small changes in their interpretation can lead to hugely different behaviour of the developed systems. We do not address here the issue of estimating the effect of different interpretations caused by ambiguity.
Plural: anaphora.
Our examples are adapted (in many cases abbreviated) from our collection of requirements documents. We render anaphora in bold and underline antecedent candidates.
In requirements documents, the writer does not generally mark which sentences describe requirements. We therefore collect anaphora instances from the whole text of the document rather than from specific requirements sentences only.
Rogue judgments are errors made by judges through carelessness or by accident, rather than judgments that reflect a genuine difference of opinion.
A headword is the main word of a phrase, and the other words in that phrase modify it. For example, for the noun phrase ‘the CCSDS parameter’, ‘parameter’ is the headword, and ‘CCSDS’ is a noun modifier for the headword.
Proper names are names of persons, places, or certain special things. They are typically capitalized nouns, such as ‘London’, ‘John Hunter’.
In fact, as part of our future work, we intend to integrate the technology into the popular DOORS requirements management tool.
Acknowledgments
This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) as part of the MaTREx project (EP/F068859/1), and by the Science Foundation Ireland (SFI grant 03/CE2/I303_1). We are grateful to our research partners at Lancaster University for their input, and to Ian Alexander for his practical insights and guidance. We also thank the anonymous reviewers for their insightful comments and suggestions.
Cite this article
Yang, H., de Roeck, A., Gervasi, V. et al. Analysing anaphoric ambiguity in natural language requirements. Requirements Eng 16, 163–189 (2011). https://doi.org/10.1007/s00766-011-0119-y
DOI: https://doi.org/10.1007/s00766-011-0119-y