Analysing interactive voice services

doi:10.1016/j.comnet.2004.03.005

Computer Networks

Volume 45, Issue 5, 5 August 2004, Pages 665-685

Abstract

IVR (Interactive Voice Response) services are increasingly prevalent in automated telephone enquiry systems. VoiceXML (Voice eXtensible Markup Language) has become one of the leading languages for IVR. The nature of IVR services is introduced, along with an explanation of how they are represented in VoiceXML. However a VoiceXML description is at a low level, so it is difficult to gain an overview of the service that is offered. There is also no rigorous way to check the integrity of an IVR application.

Cress (Chisel Representation Employing Systematic Specification) is a graphical notation for describing services in an abstract, language-independent manner. For this paper, IVR services are described with Cress, and translated into Lotos (Language Of Temporal Ordering Specification) for automated analysis. Because of the infinite state space, it is not practicable to formally verify the generated specifications. Instead, the focus is on more practical solutions. The properties of a specification are checked by including observer processes to monitor undesirable situations like repeatedly prompting the user for input. Mustard (Multiple-Use Scenario Test And Refusal Description) is introduced as a language for defining scenario-based tests of services. The approach is illustrated with sample tests of IVR services. It is seen how Mustard helps to build confidence in an IVR application.

The paper also introduces a feature concept for IVR, and discusses feature interaction in this context. General categories of IVR feature interaction are presented. It is shown how Cress and Mustard combine to help discover interactions among IVR features.

Introduction

IVR (Interactive Voice Response) services have been developed during the past decade to provide a more satisfactory alternative to touch-tone systems. Touch-tone enquiry systems ('press 2 for sales') are often disliked by users due to their inflexible and crude interfaces. IVR allows users to do what they expect in a telephone call, namely to speak and to listen. IVR is convenient for users on the move, who may have little more than a mobile telephone. Although WAP (Wireless Application Protocol) is intended to provide Web browsing for mobile users, it has seen only limited use. Some categories of users (e.g., the partially sighted or those without Internet access) are also disadvantaged if information is provided only via the Web.

Although IVR is not new, it was initially supported by a variety of proprietary solutions. VoiceXML (Voice eXtensible Markup Language [30]) has been an important development in the standardisation of IVR. There are competing standards for IVR, but VoiceXML seems to have attracted the most support. The basic idea of VoiceXML is that users 'fill in' fields of forms by speaking in response to prompts. VoiceXML platforms usually include sophisticated support for TTS (Text To Speech, i.e., synthesised speech output) and STT (Speech To Text, i.e., speech recognition). The completed information is then typically submitted to a program or database for further processing. VoiceXML lends itself to a wide variety of applications such as news and sports information, telephone banking, sales enquiries and orders, and travel bookings. For an application such as banking, VoiceXML could provide a voice-based front-end to an existing bank system. There could also be other front-ends to the same system, e.g., for Web browsing or WAP access.

As an application of XML, VoiceXML is textual in form. However, most commercial packages (e.g., Covigo Studio, Nuance V-Builder, Voxeo Designer) provide a graphical representation. VoiceXML has a nested, hierarchical structure that most packages reflect in graphical form. Some representations emphasise the relationship among VoiceXML elements, e.g., the flow of control among the fields of a form. Commercial packages are (not surprisingly) very close to VoiceXML since their aim is direct support of scripting with VoiceXML. As a programming language, VoiceXML focuses on how an IVR service is realised and not what it does. It can therefore be difficult to get a clear overview from VoiceXML of an IVR service.

It is easy, and even common, to write VoiceXML scripts that have implicit loops and complicated logic. To some extent, VoiceXML encourages this because its form interpretation algorithm requires multiple passes through a form. The consequences of certain VoiceXML constructs may not be immediately obvious, e.g., they may cause an indefinite loop.
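The implicit looping can be illustrated with a much simplified sketch of form interpretation, written here in Python rather than VoiceXML. It is not the algorithm of the VoiceXML specification: it only models repeatedly selecting the first unfilled field and collecting input for it. The function names, the `recognise` callback and the `max_passes` bound are assumptions introduced for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional


def interpret_form(fields: List[str],
                   recognise: Callable[[str], Optional[str]],
                   max_passes: int = 10) -> Dict[str, str]:
    """Visit the first unfilled field on each pass until all are filled.

    `recognise` stands in for prompting plus speech recognition
    (hypothetical interface); it returns None when nothing usable is heard.
    """
    values: Dict[str, str] = {}
    for _ in range(max_passes):
        # Select: the first field that has no value yet.
        pending = [name for name in fields if name not in values]
        if not pending:
            return values  # every field is filled, so the form is complete
        field = pending[0]
        # Collect: prompt for the field and listen for a reply.
        answer = recognise(field)
        if answer is not None:
            values[field] = answer
        # With no input, the same field is selected again on the next pass.
        # This is the implicit loop that never appears in the script text.
    raise RuntimeError("form not completed: possible indefinite loop")


def scripted(answers: Iterable[Optional[str]]) -> Callable[[str], Optional[str]]:
    """Return a recogniser that replays fixed answers (None means silence)."""
    replies = iter(answers)
    return lambda field: next(replies)


if __name__ == "__main__":
    print(interpret_form(["account", "pin"], scripted([None, "435671", "5381"])))
```

A caller who never speaks keeps the loop on the same field, which the sketch reports only because of the artificial bound on the number of passes.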

VoiceXML adopts a pragmatic and programmatic approach to development. There is no way to formally check or analyse a VoiceXML script. Instead, VoiceXML must be debugged using traditional software engineering methods.

VoiceXML applications are essentially single scripts, though these can be made up from a number of individual documents (i.e., files). VoiceXML supports unconditional transfers (goto) and subroutine-like calls (subdialog) to other documents. However there is no equivalent of a feature. In fact, VoiceXML does not even use the term service.

In telephony, services are often composed from self-contained features. A feature is an additional function that is triggered automatically (e.g., call diversion or call blocking). From the developer's point of view, a feature is triggered by certain conditions and is not explicitly called at some point in the call processing code. Features can therefore easily add supplementary capabilities to basic call processing. The value of features has been amply demonstrated in the IN (Intelligent Network).
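The difference between a triggered feature and an explicit call can be sketched in Python as follows. This is an illustration only, not how the Intelligent Network or Cress realises features; the class, the event name `incoming` and the call fields are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Call = Dict[str, str]
Trigger = Callable[[str, Call], bool]
Action = Callable[[Call], None]


class CallProcessor:
    """Basic call processing that never names a feature in its own code."""

    def __init__(self) -> None:
        self._features: List[Tuple[str, Trigger, Action]] = []

    def add_feature(self, name: str, trigger: Trigger, action: Action) -> None:
        self._features.append((name, trigger, action))

    def handle(self, event: str, call: Call) -> List[str]:
        """Run every feature whose trigger condition holds for this event."""
        fired: List[str] = []
        for name, trigger, action in self._features:
            if trigger(event, call):
                action(call)
                fired.append(name)
        return fired


def divert_when_busy(call: Call) -> None:
    # Call diversion: redirect the call to the configured alternative number.
    call["callee"] = call["divert_to"]


processor = CallProcessor()
processor.add_feature(
    "call diversion",
    lambda event, call: event == "incoming" and call.get("busy") == "yes",
    divert_when_busy,
)
```

Adding a further feature means registering another trigger and action; the `handle` method itself is left unchanged.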

Cress (Chisel Representation Employing Systematic Specification) is a front-end for defining and formalising services. Cress was initially based on the industrial notation Chisel developed by BellCore [1]. However, Cress has been considerably extended since its beginnings. In particular, it supports the notion of plug-in domains: the vocabulary and concepts required for each application area are defined separately. Cress has been demonstrated on services from the IN (Intelligent Network [24]), Internet telephony [25], [27], and IVR (Interactive Voice Response [27], [28]).

Cress aims to combine the advantages of an accessible graphical notation, analysis via translation to formal languages, and realisation via translation to implementation languages. That is, the same service diagrams can be used for multiple purposes. Cress is neutral with respect to the target language. For formal analysis, Cress diagrams are automatically translated to Lotos (Language Of Temporal Ordering Specification [11]) or to SDL (Specification and Description Language [13]); see [28] and [26], respectively. For implementation, Cress diagrams are automatically translated to Perl (for SIP services) or to VoiceXML (for IVR services); see [25] and [27], [28], respectively.

For IVR services, Cress is intended to complement existing VoiceXML platforms. In particular, Cress offers the following:

• Cress is a platform-independent graphical notation for a substantial (but not complete) proportion of IVR applications. A Cress service is represented at a more abstract level than VoiceXML, making it easier to gain an overview of the service. VoiceXML is merely a target language for Cress, so it should be possible to translate Cress diagrams into other IVR languages.
• Cress supports features and services. These are not directly recognised in IVR, so their addition provides useful extra capabilities. Without features, IVR applications have to explicitly call supplementary capabilities.
• It can be difficult to check whether a realistic IVR application will behave correctly in all circumstances (e.g., will not stop prematurely or loop indefinitely). Through translation to a formal language, Cress supports rigorous analysis of IVR services. Cress is also accompanied by a scenario-based testing language that is used to validate IVR applications. The same approach also contributes to detecting feature interactions.
• VoiceXML is not formally defined. Some concepts are only vaguely described (e.g., event handling) and some are loosely defined (e.g., the semantics of expressions and variables). Through translation to a formal language, Cress contributes to a more precise understanding of VoiceXML.
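The kind of check mentioned in the third point (e.g., that an application does not loop indefinitely) can be pictured as an observer that watches the events of a service and reports an undesirable pattern. In the paper's approach observers are processes within the Lotos specification; the Python sketch below is only an analogy over a recorded trace, and the event names and the `limit` parameter are assumptions.

```python
from typing import Iterable, Optional, Tuple

Event = Tuple[str, str]  # (kind, detail), e.g. ("prompt", "account")


def repeated_prompt(trace: Iterable[Event], limit: int = 3) -> Optional[str]:
    """Return the first prompt issued more than `limit` times in a row.

    Events other than prompts are ignored, as the observer is concerned
    only with what the service says. None means no violation was seen.
    """
    last_prompt: Optional[str] = None
    repeats = 0
    for kind, detail in trace:
        if kind != "prompt":
            continue
        if detail == last_prompt:
            repeats += 1
        else:
            last_prompt, repeats = detail, 1
        if repeats > limit:
            return detail  # undesirable situation detected
    return None
```

For example, a trace in which the account prompt is issued four times in succession (with silence from the user in between) is reported, whereas a trace that moves from the account prompt to the PIN prompt is accepted.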

Graphical notations for services are, of course, fairly common. Although it has a graphical form, SDL (Specification and Description Language [13]) is a general-purpose language that was not designed particularly to represent communications services. MSCs (Message Sequence Charts [12]) are higher-level and more straightforward in their representation of services. UCMs (Use Case Maps [2]) have been used to describe communications services graphically. However none of these approaches has support for specific domains, and they cannot be translated into a range of languages. Perhaps surprisingly, there does not appear to have been other work on graphical or formal specification of IVR services.

As noted earlier, there are a number of commercial tools for VoiceXML. These offer rather more complete support for IVR than Cress. However they are focused on VoiceXML only, and do not offer any kind of formal analysis. Their (graphical) representations of services are very close to VoiceXML, so they are useful only to specialists. Fig. 1 is an example of what VoiceXML looks like in a commercial tool; this corresponds to the Donation service described by Cress in Fig. 2.

Commercial VoiceXML tools do not support rigorous analysis of IVR services. The translation of Cress into Lotos or SDL gives formal meaning to IVR service descriptions. The translation provides access to any analytic technique based on these languages. Among these, the author's own approach [23] is one of several that might be used.

Feature interaction in telephony is a much studied issue (e.g., [7]). The basic problem is that independently designed features can interfere with each other. It has been shown that feature interactions occur in a variety of other domains such as building control [15], email [5], [9], Internet telephony [14], [25], lift control [17], mobile communication [32], multimedia [4], [22], policies [19], and the web [31]. The work reported here shows how feature interaction can arise with IVR.
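As a small illustration of how independently designed features can interfere, the following Python sketch applies two telephony features in every possible order and compares the outcomes. This order-sensitivity check is a simple stand-in technique and not the detection method used in the paper; the two features, the subscriber names and the call fields are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

Call = Dict[str, str]
Feature = Callable[[Call], Call]


def screening(call: Call) -> Call:
    # Call screening: refuse calls that are about to reach subscriber C.
    if call["status"] == "ringing" and call["callee"] == "C":
        return {**call, "status": "blocked"}
    return call


def forwarding(call: Call) -> Call:
    # Call forwarding: subscriber B redirects incoming calls to C.
    if call["status"] == "ringing" and call["callee"] == "B":
        return {**call, "callee": "C"}
    return call


def outcomes(call: Call, features: Sequence[Feature]) -> Dict[Tuple[str, ...], Call]:
    """Apply the features in every order and record each final call state."""
    results: Dict[Tuple[str, ...], Call] = {}
    for order in permutations(features):
        state = dict(call)
        for feature in order:
            state = feature(state)
        results[tuple(f.__name__ for f in order)] = state
    return results


def interacts(call: Call, features: Sequence[Feature]) -> bool:
    """Report an interaction if the outcome depends on the feature order."""
    distinct = {tuple(sorted(s.items())) for s in outcomes(call, features).values()}
    return len(distinct) > 1
```

A call from A to B is blocked if forwarding is applied before screening, but is connected to C if screening is applied first, so the two features interact on that call even though each behaves as designed in isolation.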

The new contributions made by this paper are the application of Cress to IVR services and features, the rigorous analysis of IVR applications, and the analysis of feature interactions in IVR. Section 2 introduces IVR and its realisation using VoiceXML. Section 3 gives an overview of the Cress notation as used to describe IVR services. Section 4 describes how IVR services are analysed, including the use of observer processes and a specialised test notation. Section 5 discusses the nature of feature interaction in IVR, and shows how Cress can be used to discover feature interactions.

Interactive voice response systems

As an example of IVR, the following hypothetical dialogue might occur with a telephone banking system:

System: You have called the Automated Phone Bank. What would you like to do?
User: (silence)
System: You can ask for your balance, request a statement, or close your account
User: My balance please
System: What is your account number?
User: Four eight five six seven one
System: There is no account with this number, please try again
User: Four three five six seven one
System: What is the PIN for this account?
User: Five three eight one
System: Your balance is seven hundred and fifty
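The dialogue above can be replayed as a scenario against a toy model of the service, which gives a flavour of scenario-based testing. The Python sketch below is not Mustard notation and not generated from Cress; the `PhoneBank` class is a hypothetical model, spoken digits are assumed to be recognised as digit strings, and the two replies marked as assumptions do not appear in the dialogue.

```python
from typing import List, Optional, Tuple

# Account number -> (PIN, spoken balance), taken from the sample dialogue.
ACCOUNTS = {"435671": ("5381", "seven hundred and fifty")}


class PhoneBank:
    """Toy model of the banking dialogue as a small state machine."""

    def __init__(self) -> None:
        self.state = "menu"
        self.account = ""

    def hear(self, speech: Optional[str]) -> str:
        """Return the system's reply to one user turn (None means silence)."""
        if self.state == "menu":
            if speech is None:
                return ("You can ask for your balance, request a statement, "
                        "or close your account")
            if "balance" in speech:
                self.state = "account"
                return "What is your account number?"
            return "Sorry, I did not understand"  # assumption: not in the dialogue
        if self.state == "account":
            if speech in ACCOUNTS:
                self.account, self.state = speech, "pin"
                return "What is the PIN for this account?"
            return "There is no account with this number, please try again"
        if self.state == "pin" and speech == ACCOUNTS[self.account][0]:
            self.state = "done"
            return "Your balance is " + ACCOUNTS[self.account][1]
        return "Please try again"  # assumption: not in the dialogue


# Each step pairs what the user says with the reply the scenario expects.
SCENARIO: List[Tuple[Optional[str], str]] = [
    (None, "You can ask for your balance, request a statement, or close your account"),
    ("My balance please", "What is your account number?"),
    ("485671", "There is no account with this number, please try again"),
    ("435671", "What is the PIN for this account?"),
    ("5381", "Your balance is seven hundred and fifty"),
]


def run_scenario(service: PhoneBank,
                 steps: List[Tuple[Optional[str], str]]) -> List[int]:
    """Return the indices of the steps whose reply differs from expectation."""
    return [i for i, (said, expected) in enumerate(steps)
            if service.hear(said) != expected]
```

An empty result from `run_scenario(PhoneBank(), SCENARIO)` means the model follows the dialogue; a non-empty result identifies the turns at which it departs from the scenario.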

The Cress notation

Cress is a graphical notation for describing the possible behaviour of a service. State is intentionally implicit in Cress because this allows more abstract descriptions to be given. Arcs between states may be guarded by event conditions or by value conditions. Cress has explicit support for defining and composing features. Cress also has plug-in vocabularies that adapt it for different application domains. These allow Cress diagrams to be thoroughly checked for syntactic and static semantic

Analysis in general

An IVR application can be executed like any script. Some commercial packages allow VoiceXML to be run in an offline IDE, while others require the script to be run by an online environment. In either case, debugging follows typical programming practice. This is, of course, time-consuming and risks undetected errors. Since Cress diagrams can be translated into Lotos and SDL, this offers new possibilities for automated analysis. For illustration, this paper concentrates on what can be done with L

Categories of IVR feature interaction

It has been seen how the integrity of an IVR application can be checked through the use of observer processes and tests. The term 'feature' is used loosely in the following to mean any addition to the base application, as well as to mean a Cress feature diagram. The addition of further features to an IVR application can lead to interactions in much the same way as for telephony. However the nature of interactions is rather different for IVR. The following categories of feature interactions can

Conclusion

The nature of IVR services and their representation in VoiceXML have been explained. Cress has been introduced as a general graphical notation for services, with particular emphasis on IVR. Cress is formalised through translation to languages like Lotos (the focus of this paper) and SDL. However Cress can also be translated for implementation into languages like VoiceXML (the focus of this paper) and Perl.

Cress offers the following benefits for IVR development:

• platform and language independence;

Acknowledgements

Nuance Corporation kindly provided an academic licence for use of Nuance V-Builder™ in this work.

Kenneth J. Turner graduated in Electrical Engineering from the University of Glasgow in 1970. He was awarded a Ph.D. from the University of Edinburgh in 1974 for his research on Pattern Recognition. Until 1986 he was employed by International Computers Ltd. as a data communications consultant. During this period he specialised in systems architecture, data communications and formal methods. This led to his appointment as Professor of Computing Science at the University of Stirling in 1987. His

References (32)

A. Pnueli, A temporal logic of concurrent programs, Theoretical Computer Science (1981)

A.V. Aho, S. Gallagher, N.D. Griffeth, C.R. Schell, D.F. Swayne, SCF3/Sculptor with Chisel: requirements engineering...

D. Amyot, L. Charfi, N. Gorse, T. Gray, L.M.S. Logrippo, J. Sincennes, B. Stepien, T. Stepien, T. Ware, Feature...

M. Ben-Ari et al., The temporal logic of branching time, Acta Informatica (1983)

L. Blair, J. Pang, Feature interactions – Life beyond traditional telephony, in: M.H. Calder, E.H. Magill (Eds.),...

M. Calder, A. Miller, Generalising feature interactions in email, in: D. Amyot, L. Logrippo (Eds.), Proceedings of the...

M. Calder, C.E. Shankland, A symbolic semantics and bisimulation for full Lotos, in: M. Kim, B. Chin, S. Kang, D. Lee...

E.J. Cameron et al., A feature-interaction benchmark for IN and beyond, IEEE Communications Magazine (1993)

J.-C. Fernández, H. Garavel, A. Kerbrat, R. Mateescu, L. Mounier, M. Sighireanu, CADP (Cæsar Aldébaran Development...

R.J. Hall, Feature interactions in electronic mail, in: M.H. Calder, E.H. Magill (Eds.), Proceedings of the 6th Feature...

G. Holzmann, D. Peled, The state of Spin, in: Proceedings of the 8th International Conference on Computer Aided...

ISO/IEC, Information Processing Systems – Open Systems Interconnection – LOTOS – a formal description technique based on...

ITU, Message Sequence Chart (MSC), ITU-T Z.120, International Telecommunications Union, Geneva, Switzerland,...

ITU, Specification and Description Language, ITU-T Z.100, International Telecommunications Union, Geneva, Switzerland,...

J. Lennox, H. Schulzrinne, Feature interaction in Internet telephony, in: M.H. Calder, E.H. Magill (Eds.), Proceedings...

A. Metzger, C. Webel, Feature interaction detection in building control systems by means of a formal product model, in:...
