The vocabulary problem in human-system communication

Authors:
G. W. Furnas, Bell Communications Research, Inc., Morristown, NJ
T. K. Landauer, Bell Communications Research, Inc., Morristown, NJ
L. M. Gomez, Bell Communications Research, Inc., Morristown, NJ
S. T. Dumais, Bell Communications Research, Inc., Morristown, NJ

Communications of the ACM, Volume 30, Issue 11, Nov. 1987, pp. 964–971. https://doi.org/10.1145/32206.32212

Published: 01 November 1987

Abstract

In almost all computer applications, users must enter correct words for the desired objects or actions. For success without extensive training, or in first-tries for new targets, the system must recognize terms that will be chosen spontaneously. We studied spontaneous word choice for objects in five application-related domains, and found the variability to be surprisingly large. In every case two people favored the same term with probability <0.20. Simulations show how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction. For example, the popular approach in which access is via one designer's favorite single word will result in 80-90 percent failure rates in many common situations. An optimal strategy, unlimited aliasing, is derived and shown to be capable of several-fold improvements.
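The agreement and failure figures above can be illustrated with a small calculation. The sketch below is not taken from the paper: the term counts in `NAMING_COUNTS` are hypothetical, and the two helper functions are illustrative names. It assumes each person names an object independently from the same distribution, so the chance that two people pick the same term is the sum of squared term probabilities, and the chance that a system with k aliases recognizes a user's first try is the combined share of its k most popular terms. It also ignores collisions between objects that share a term.

```python
# Hypothetical data: how many of 100 people spontaneously chose each
# term for one "remove a file" action. Not data from the study.
NAMING_COUNTS = {
    "delete": 25, "remove": 15, "erase": 12, "kill": 10, "discard": 8,
    "trash": 6, "drop": 6, "wipe": 6, "clear": 6, "purge": 6,
}


def agreement_probability(counts):
    """Probability that two independent people choose the same term."""
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def coverage_with_aliases(counts, k):
    """Share of first tries recognized if the k most popular terms are accepted."""
    total = sum(counts.values())
    top_k = sorted(counts.values(), reverse=True)[:k]
    return sum(top_k) / total


if __name__ == "__main__":
    print(f"agreement: {agreement_probability(NAMING_COUNTS):.4f}")
    for k in (1, 3, len(NAMING_COUNTS)):
        print(f"k={k}: coverage {coverage_with_aliases(NAMING_COUNTS, k):.2f}")
```

With these made-up counts, a single accepted term covers only a quarter of first tries, while accepting every observed term covers all of them. That is the direction of the effect the abstract describes, not a reproduction of its numbers.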

Reviews

Reviewer: Richard S. Marcus

This is an excellent paper for anyone interested in interface design where users' choice of words is involved, especially those who believe that vocabulary is not a problem. The authors show that a few aliases (synonymous terms) can markedly improve the success of spontaneous selection, and they suggest that unlimited aliasing is the optimum solution. Three approaches to identifying good alternate terms are suggested: (1) having a few users supply a “fair number” of terms apiece (say, 3–6); (2) extracting words from the text of descriptions of objects (à la full-text indexing of documents); and (3) adaptively, by noting what new terms users attempt to apply in operation of the system.

The authors recognize that there is an imprecision problem, in that one term can be selected by different users to mean different objects. However, they point out that many aliases may be more precise terms than the common terms for which they substitute and thus may actually improve precision. In any case, the authors point out that there are techniques for managing the ambiguities. Their preferred method is to display the candidate choices interactively, ordered by frequency of occurrence, so that the user can disambiguate. The authors note that effective disambiguation may require good system explanations, which is a problem in itself. They also note other possible disambiguation methods (multiterm Boolean expressions, formal query languages, and natural language understanding) but turn away from these as being difficult to implement and not very successful.

While I recognize the cogent analysis of much of this paper, I might question the strong emphasis on spontaneous selection. Perhaps some pre-selection mediation by the system (e.g., via menus) could avoid much post-selection disambiguation. More generally, the authors' apparent aversion to considering a combination of methods in approaching this problem is questionable, although I recognize that any one of the methods denigrated by the authors may be inferior to unlimited aliasing as a single solution.
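The adaptive approach and the frequency-ordered disambiguation described in this review can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the class `AliasIndex`, its methods, and the sample terms and targets are all hypothetical. Each time a user's term is resolved to an object, the pairing is recorded. A later lookup returns every object that term has referred to, most frequent first, so the user can pick from a ranked list.

```python
from collections import Counter, defaultdict


class AliasIndex:
    """Hypothetical alias table: term -> how often it meant each target."""

    def __init__(self):
        self._uses = defaultdict(Counter)

    def record(self, term, target):
        """Note that a user applied `term` to `target` (adaptive aliasing)."""
        self._uses[term.lower()][target] += 1

    def lookup(self, term):
        """Return candidate targets, most frequent first (ties by name)."""
        counts = self._uses.get(term.lower())
        if not counts:
            return []
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [target for target, _ in ranked]


# Made-up usage log: (term the user typed, object they actually wanted).
index = AliasIndex()
for term, target in [
    ("remove", "delete_file"),
    ("remove", "delete_file"),
    ("remove", "uninstall_program"),
    ("erase", "delete_file"),
]:
    index.record(term, target)

candidates = index.lookup("remove")  # ambiguous term: ranked choices
unknown = index.lookup("zap")        # never seen: no candidates yet
```

An unseen term returns an empty list. In an adaptive system that would be the point at which the user's eventual choice is passed to `record`, so the term succeeds next time.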

Published in
Communications of the ACM, Volume 30, Issue 11
Nov. 1987
87 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/32206
Editor: Peter J. Denning, NASA Ames Research Center, Moffett Field, CA
Copyright © 1987 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States