ABSTRACT
This paper presents the architecture of Almond, an open, crowdsourced, privacy-preserving and programmable virtual assistant for online services and the Internet of Things (IoT). Included in Almond is Thingpedia, a crowdsourced public knowledge base of natural language interfaces and open APIs. Our proposal addresses four challenges in virtual assistant technology: generality, interoperability, privacy, and usability. Generality is addressed by crowdsourcing Thingpedia, while interoperability is provided by ThingTalk, a high-level domain-specific language that connects multiple devices or services via open APIs. For privacy, user credentials and user data are managed by our open-source ThingSystem, which can be run on personal phones or home servers. Finally, we address usability by providing a natural language interface, whose capability can be extended via training with the help of a menu-driven interface.
We have created a fully working prototype and crowdsourced a set of 187 functions across 45 different kinds of devices. Almond is the first virtual assistant that lets users specify trigger-action tasks in natural language. Despite the lack of real usage data, our experiment suggests that Almond can understand about 40% of complex tasks when they are uttered by a user familiar with its capabilities.
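To make the notion of a trigger-action task concrete, the following is a minimal sketch in Python. It is not ThingTalk syntax and not Almond's implementation; the `Rule` class, the `run_rules` helper, and the event fields (`source`, `temperature`) are hypothetical names chosen only for illustration. It shows how a task such as "when the thermostat reads below 18 degrees, notify me" might be represented as a trigger paired with an action.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# An event is modeled as a plain dictionary (an assumption for this sketch).
Event = Dict[str, Any]


@dataclass
class Rule:
    """A trigger-action rule: when `trigger` matches an event, run `action`."""
    trigger: Callable[[Event], bool]
    action: Callable[[Event], str]


def run_rules(rules: List[Rule], events: List[Event]) -> List[str]:
    """Apply every rule to every event and collect the action results."""
    results: List[str] = []
    for event in events:
        for rule in rules:
            if rule.trigger(event):
                results.append(rule.action(event))
    return results


# Hypothetical rule: "when the thermostat reads below 18 degrees, notify me".
cold_alert = Rule(
    trigger=lambda e: e.get("source") == "thermostat"
    and e.get("temperature", 99) < 18,
    action=lambda e: f"notify: temperature dropped to {e['temperature']}",
)

# Hypothetical event stream from two different devices.
events = [
    {"source": "thermostat", "temperature": 21},
    {"source": "thermostat", "temperature": 16},
    {"source": "doorbell"},
]

outputs = run_rules([cold_alert], events)
print(outputs)
```

In this sketch the trigger and the action are ordinary functions, so a rule can connect one device's events to another service's behavior, which is the kind of cross-device composition the abstract attributes to ThingTalk.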