Knowledge and Information Systems

07-12-2016 | Regular Paper

Evaluating intelligent knowledge systems: experiences with a user-adaptive assistant agent

Authors: Pauline M. Berry, Thierry Donneau-Golencer, Khang Duong, Melinda Gervasio, Bart Peintner, Neil Yorke-Smith

Published in: Knowledge and Information Systems | Issue 2/2017

Abstract

This article examines experiences in evaluating a user-adaptive personal assistant agent designed to help a busy knowledge worker with time management. We examine the managerial and technical challenges of designing an adequate evaluation, and the tension inherent in collecting sufficient data without a fully functional, deployed system. The CALO project was a seminal multi-institution effort to develop a personalized cognitive assistant. It included a significant attempt to rigorously quantify learning capability, which this article discusses for the first time, and the project ultimately led to multiple spin-outs, including Siri. A retrospective on the positive and negative experiences over the project’s 6 years underscores best practice in evaluating user-adaptive systems. Lessons for knowledge system evaluation include the interests of multiple stakeholders; early consideration of evaluation and deployment; layered evaluation at system and component levels; the characteristics of technology and domains that determine whether controlled evaluations are appropriate; the implications of ‘in-the-wild’ versus variations of ‘in-the-lab’ evaluation; and the impact of technology-enabled functionality on existing tools and work practices. In the conclusion, we draw on the lessons of this case study in intelligent knowledge system evaluation to discuss how the development and infusion of innovative technology must be supported by adequate evaluation of its efficacy.

While PTIME can be seen as a type of recommender system, evaluating a task-oriented adaptive system such as PTIME differs significantly from evaluating a classical recommender system, due to the generative, incremental, and dynamic nature of the recommendation task.

Publication date: 07-12-2016
Publisher: Springer London
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-016-1011-3

Other articles of this Issue 2/2017

Prequential AUC: properties of the area under the ROC curve for data streams with concept drift

FIU-Miner (a fast, integrated, and user-friendly system for data mining) and its applications

The (black) art of runtime evaluation: Are we comparing algorithms or implementations?

A segment-based approach for large-scale ontology matching

Learning extremely shared middle-level image representation for scene classification

-LGP: an improved version of linear genetic programming evaluated in the Ant Trail problem