Custom-built version of the Calabash-Android
Griebe and Gruhn [44] propose a model-based approach to improve the testing of context-aware mobile applications. Their approach is based on a four-tier process, as follows:
Tier 1: UML activity diagram models are enriched with context information using a UML profile developed for integrating context information into UML models;
Tier 2: Models are then transformed into Petri Nets for analyzing and processing structural model properties (e.g., parallel or cyclic control flows);
Tier 3: From the Petri Net representation, a platform and technology-independent system testing model is generated that includes context information relevant for the test case execution; and
Tier 4: Platform and technology-specific test cases are generated that can be executed using platform-specific automation technology (e.g., JUnit, Calabash-Android/iOS, Robotium).
To assess the proposed approach, Griebe and Gruhn extended the Calabash tool. Calabash is a test automation framework that supports the creation and execution of automated acceptance tests for Android and iOS apps without requiring coding skills [108]. It works by enabling automatic UI interactions within an application, such as pressing buttons, inputting text, and validating responses. Calabash is a completely free and open-source tool. It uses the Gherkin pattern: Gherkin is a writing pattern for executable specifications that, through the keywords Given, When, and Then, maintains a standard for writing execution criteria. To do so, Calabash expresses test cases as Cucumber features [109].
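For illustration, a minimal Cucumber feature in the Gherkin style that Calabash executes might look as follows (the feature and step names here are invented for this example, not taken from any specific Calabash project):

```gherkin
Feature: Login
  Scenario: Valid credentials are accepted
    Given I am on the login screen
    When I enter "alice" into the "username" field
    And I press the "Sign in" button
    Then I should see "Welcome"
```

Each Given/When/Then line is matched against a step definition that drives the corresponding UI interaction on the device.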
A limitation of the Griebe and Gruhn approach is the need to create a model that describes the possible AUT activities. Modeling is not a widely understood activity among testers and developers, and a poorly designed model can lead to false positives or false negatives in test verdicts.
Context simulator
Vieira et al. [37] argue that testing context-aware applications in the lab is difficult because of the number of different scenarios and situations that a user might be involved in. The Android platform does provide simulation tools to support testing at the physical-sensor level; however, testing context-aware applications only at that level is not enough. Considering that, Vieira et al. developed a simulator that reproduces a real environment in the laboratory. The simulator supports the modeling and simulation of context at different levels: physical and logical context, situations, and scenarios.
The simulation is separated into two main components: the desktop application and the mobile component:
The desktop application: it is responsible for context modeling, simulation execution, and context transmission to the mobile device
The mobile component: it receives signals from the desktop application and processes the data. The mobile component is responsible for simulating context data and for examining the reaction of the app under the simulated context
The modeling in the context simulator is done at four different levels:
1. Low-level context: data acquired from hardware sensor measurements (e.g., location, light, movement, or touch), called physical context, or data acquired from software applications or services (e.g., the current activity of an employee determined by his calendar), called virtual context;
2. High-level context (or logical context): the combination and processing of low-level and virtual contexts results in a high-level context. For example, the context “Room 001 at Fraunhofer” is identified through the aggregation of two low-level sources: GPS coordinates for “Fraunhofer” and the Wi-Fi identification of “Room 001”;
3. Situation: the composition of high-level contexts. A situation represents the circumstances in which someone currently is. For example, “Meeting 12–13 at Room 001 at Fraunhofer” is a situation composed of three high-level contexts: “Meeting” (which can be a specific date and time plus an appointment in the user’s calendar), “Room 001”, and “Fraunhofer”; and
4. Scenario: a chain of situations linked by causal relations; in other words, a time-ordered sequence of situations.
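The four levels above form a simple composition hierarchy, which can be sketched as follows. The class names and example values are illustrative assumptions for this sketch, not part of the simulator's actual API:

```python
from dataclasses import dataclass, field

# Low-level context: a raw reading from a physical sensor or a virtual source.
@dataclass(frozen=True)
class LowLevelContext:
    source: str   # e.g., "gps", "wifi", "calendar"
    value: object

# High-level (logical) context: an aggregation of low-level contexts.
@dataclass(frozen=True)
class HighLevelContext:
    name: str
    parts: tuple  # the low-level contexts it is derived from

# Situation: a composition of high-level contexts.
@dataclass(frozen=True)
class Situation:
    name: str
    contexts: tuple

# Scenario: a time-ordered chain of situations.
@dataclass
class Scenario:
    situations: list = field(default_factory=list)

# The "Room 001 at Fraunhofer" example from the text (coordinates invented):
gps = LowLevelContext("gps", (49.44, 7.75))      # position of "Fraunhofer"
wifi = LowLevelContext("wifi", "Room-001-AP")    # Wi-Fi id of "Room 001"
room = HighLevelContext("Room 001 at Fraunhofer", (gps, wifi))

meeting = Situation("Meeting 12-13 at Room 001 at Fraunhofer", (room,))
scenario = Scenario([meeting])
```

A simulated test run would then walk the scenario's situations in order, feeding the underlying low-level values to the app under test.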
The context simulator supports a large variety of context sources: 22 contexts divided into 6 categories, backed by 41 context sources [37].
A limitation of the context simulator is that the tester must model each test case. That is, if the tester wishes to test an AUT under possibly adverse situations, such as a weak GPS signal, receiving a phone call, or changing Internet connection conditions, then the tester must model every scenario he/she wishes to test.
Extended AndroidRipper
Amalfitano et al. [42] analyzed bug reports from open-source applications available on GitHub and Google Code. From the results, they defined some usage scenarios, which they call event-patterns, that represent the use cases with the greatest potential for failure in context-aware applications. Some examples of event-patterns are as follows:
Loss and successive recovery of GPS signal while walking;
Network instability;
The user enables the GPS provider through the settings menu and starts walking; and
Incoming of a phone call after any other event.
Amalfitano et al. carried out an experiment to examine whether the event-patterns do in fact represent scenarios with a greater chance of context-aware application failure. To that end, they extended the AndroidRipper tool [43]. The extended AndroidRipper is able to fire context events such as location changes, enabling/disabling of GPS, changes in orientation, acceleration changes, reception of text messages and phone calls, and the shooting of photos with the camera. Both versions of AndroidRipper explore the application under test looking for crashes, measuring the obtained code coverage, and automatically generating Android JUnit test cases that reproduce the explored executions.
The extended AndroidRipper generates test cases by watching for events that cause a reaction in the application. Once such events are detected, the technique of Amalfitano et al. generates test cases based on the event-patterns identified by the authors. The tool therefore does not focus on testing high-level context variations.
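Context events of the kind fired by the extended AndroidRipper can, on a standard Android emulator, be injected through the emulator console commands (e.g., `geo fix`, `gsm call`). The sketch below only formats such console commands for one event-pattern; it is a hedged illustration of the idea, not the tool's actual implementation, and the walking coordinates are invented:

```python
def geo_fix(longitude, latitude):
    # Android emulator console command that sets the device's GPS position.
    return f"geo fix {longitude} {latitude}"

def gsm_call(number):
    # Android emulator console command that simulates an incoming phone call.
    return f"gsm call {number}"

# Event-pattern: loss and recovery of the GPS signal while walking,
# followed by an incoming phone call.
def walking_with_gps_loss(points, caller="5551234"):
    cmds = []
    for i, (lon, lat) in enumerate(points):
        if i == len(points) // 2:
            # There is no single "GPS off" console command assumed here;
            # the signal loss is represented by skipping the middle fix.
            continue
        cmds.append(geo_fix(lon, lat))
    cmds.append(gsm_call(caller))
    return cmds

commands = walking_with_gps_loss([(7.75, 49.44), (7.76, 49.44), (7.77, 49.45)])
```

In a real setup these strings would be sent to the emulator's console port while the AUT runs.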
By studying the papers of Griebe and Gruhn [44], Vieira et al. [37], and Amalfitano et al. [42], we found two papers related to testing context-aware Android applications: Mirza and Khan [110] and Luo et al. [111]. The corresponding tools are presented next.
ContextDrive
Mirza and Khan [110] argue that testing context-aware applications is a difficult task due to challenges such as developing test adequacy and coverage criteria, context adaptation, context data generation, designing context-aware test cases, developing test oracles, and devising new testing techniques for context-aware applications. In response to these challenges, they argue that context adaptation cannot be modeled using a standard notation such as the UML activity diagram. Therefore, Mirza and Khan extended the UML activity diagram by adding a context-aware activity node for the behavior modeling of context-aware applications.
Mirza and Khan proposed a test automation framework named ContextDrive. The proposed model consists of six phases:
1. First phase: a UML activity diagram is used to model the application under test. In this phase, a new UML activity diagram element is proposed for modeling context-aware applications;
2. Second phase: the UML activity diagram is transformed into a testing model;
3. Third phase: the test model is annotated in order to enhance readability and maintainability;
4. Fourth phase: abstract test cases are generated;
5. Fifth phase: the abstract test cases are converted into platform-specific executable test scripts; and
6. Sixth phase: the test scripts are executed.
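To make phases 4 and 5 concrete, the sketch below derives abstract test cases from a toy test model (one case per transition, a simple all-transitions criterion) and renders each into a platform-specific script stub. All model elements and helper names are illustrative assumptions; ContextDrive's actual test model differs:

```python
# Toy test model: activities and labeled transitions between them.
transitions = [
    ("Start", "ShowMap", "app launched"),
    ("ShowMap", "Navigate", "GPS fix acquired"),
    ("Navigate", "Rerouted", "user leaves route"),
]

# Phase 4: derive abstract test cases, one per transition.
def abstract_test_cases(trans):
    return [{"from": s, "to": t, "event": e} for s, t, e in trans]

# Phase 5: convert each abstract case into an executable script stub
# (fireEvent/assertActivity are hypothetical platform-level helpers).
def to_script(case):
    return (f"// given activity {case['from']}\n"
            f"fireEvent(\"{case['event']}\");\n"
            f"assertActivity(\"{case['to']}\");")

scripts = [to_script(c) for c in abstract_test_cases(transitions)]
```

Phase 6 would then hand each generated script to the platform's test runner.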
Mirza and Khan’s technique is similar to the one implemented in the tool of the “Custom-built version of the Calabash-Android” section. Therefore, it carries the same restriction that the tester must have experience in UML activity diagram modeling. In addition, the tool uses static data to execute test cases, so testing situations that consume a lot of sensor data (e.g., testing a GPS navigator application) becomes infeasible.
TestAWARE
One of the difficulties in testing context-aware applications is the heterogeneity of context information and the difficulty and/or high cost of reproducing contextual settings. As an example, Luo et al. [111] present a real-time fall detection application: it detects when the user drops the mobile phone under different circumstances, such as the phone falling out of a pocket or out of the user's hand, and sends an email to a caregiver every time a fall event is detected. For such an application, testing new versions is very costly. Thus, Luo et al. [111] introduce the TestAWARE tool.
TestAWARE is able to download, replay, and emulate contextual data on either physical devices or emulators. In other words, the tool is able to obtain and replay “context” and thus provide a reliable and repeatable setting for testing context-aware applications.
Luo et al. compare their tool with other available tools. In summary, they state that TestAWARE targets a wide variety of mobile context-aware applications and testing scenarios. This is possible because TestAWARE incorporates heterogeneous data (i.e., sensory data, events, and audio), multiple data sources (i.e., online, local, and manipulated data), black-box and white-box testing, functional/non-functional property examination, and both device and emulator environments.
A limitation of the tool is that test cases cannot be created without executing each one at least once on a real device in the real scenario. Because TestAWARE is a record-and-replay tool, the test cases must first be recorded, which requires submitting the AUT, on a real device, to each of the conditions to be tested.
Among the found studies, Moran et al. [18, 52], Qin et al. [75], Hu and Neamtiu [71], Gomez et al. [81], and Farto et al. [83] present tools that were not intended for context-aware application testing but that nevertheless support the testing of context-aware features.
CrashScope
Moran et al. [18, 52] argue that one of the most difficult and important maintenance tasks is the creation and resolution of bug reports. For this reason, they introduced the CrashScope tool. The tool is capable of generating augmented crash reports with screenshots, crash reproduction steps, and the captured exception stack trace, along with a script to reproduce the crash on a target device. To do so, CrashScope explores the application under test, performing input generation through static and dynamic analyses that include automatic text generation capabilities based on context information such as device orientation, wireless interfaces, and sensor data.
The CrashScope GUI ripping engine systematically executes the application under test using various strategies. The tool first checks for contextual features that should be tested according to the exploration strategy; the GUI ripping engine then checks whether the current activity is suitable for exercising a particular contextual feature under adverse conditions. Testing contextual features under adverse conditions consists of setting sensors (GPS, accelerometer, etc.) to unexpected values that would not typically be possible under normal conditions. For instance, to test the GPS under an adverse contextual condition, CrashScope sets the value to coordinates that do not represent physical GPS coordinates. In other words, for each running Activity, CrashScope checks which contextual features are applicable, decides whether each should be enabled or disabled, and sets the feature values. CrashScope attempts to produce crashes by disabling and enabling sensors as well as by sending unexpected (e.g., highly unfeasible) values. Because of that, some scenarios cannot be tested with CrashScope (e.g., testing whether the application crashes when the user leaves a pre-established route).
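The adverse-value strategy can be imagined as choosing sensor values just outside, and far outside, the physically valid range. The generator below is a hedged sketch of that idea with invented range tables, not CrashScope's code:

```python
# Valid physical ranges for two GPS components, in degrees.
VALID_RANGES = {
    "gps_lat": (-90.0, 90.0),
    "gps_lon": (-180.0, 180.0),
}

def adverse_values(sensor, margin=1000.0):
    """Return values just outside, and far outside, the valid range."""
    lo, hi = VALID_RANGES[sensor]
    return [lo - 1.0, hi + 1.0, lo - margin, hi + margin]

def is_physical(sensor, value):
    # True if the value could occur under normal conditions.
    lo, hi = VALID_RANGES[sensor]
    return lo <= value <= hi

bad_lats = adverse_values("gps_lat")  # none of these is a physical latitude
```

Each generated value would then be injected into the running Activity to see whether the app crashes or handles it gracefully.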
MobiPlay
According to Qin et al. [75], MobiPlay is the first record-and-replay tool able to capture all possible inputs at the application layer; that is, MobiPlay is the first tool capable of recording and replaying, at the application layer, all the interactions between an Android app and both the user and the environment surrounding the mobile phone.
While the user is executing the app, MobiPlay records every input the application receives and the interval between every two consecutive inputs. After that, the tool can re-execute the application under test with the same inputs, and the expected result is that the application behaves exactly as in the original execution. Basically, MobiPlay is composed of two components: a mobile phone and a remote server. Initially, the mobile phone forwards all sensor data and user interactions to the remote server, which stores them. From there, the remote server can reproduce the executed scenario by sending the saved data back to the mobile phone.
The application under test, called the target app, is installed on the remote server, not on the mobile phone. Communication between the mobile phone and the target app is done through the client app, a typical Android app installed on the mobile phone that does not require root privilege and is dedicated to intercepting all the input data for the target app. The basic idea of MobiPlay is that the target app actually runs on the server while the user interacts with the client app on the mobile phone, without being explicitly aware that he is, in effect, using a thin client. The client app shows the GUI of the target app on the mobile phone in real time, just as if the target app were actually running on the phone. While the user interacts with the target app through the client app, the server transparently records all touch-screen gestures (pinch, swipe, click, long click, multi-touch, and so on) and the other inputs provided by sensors such as the gyroscope, compass, and GPS. Once the inputs are recorded, MobiPlay can re-execute the target app with the same inputs and at the same intervals, simulating the interaction between the user and the target app.
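The record-and-replay loop described above — store each input together with the gap to the previous one, then feed the inputs back with the same gaps — can be sketched as follows. This is a simplified model, not MobiPlay's protocol; the clock is injectable so the example runs instantly:

```python
class Recorder:
    """Stores (delay, event) pairs; delay is the time since the previous event."""
    def __init__(self, clock):
        self.clock = clock          # injectable time source, for testability
        self.last = None
        self.log = []

    def record(self, event):
        now = self.clock()
        delay = 0.0 if self.last is None else now - self.last
        self.last = now
        self.log.append((delay, event))

def replay(log, deliver, sleep):
    # Re-deliver every event, preserving the original inter-event intervals.
    for delay, event in log:
        sleep(delay)
        deliver(event)

# Usage with a fake clock (timestamps 0.0, 0.5, 2.0 seconds).
times = iter([0.0, 0.5, 2.0])
rec = Recorder(lambda: next(times))
for ev in ["touch(100,200)", "gps(49.44,7.75)", "swipe-left"]:
    rec.record(ev)

delivered = []
replay(rec.log, delivered.append, sleep=lambda d: None)
```

In a real deployment, `sleep` would be an actual wait and `deliver` would inject the event into the target app.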
Just like the TestAWARE tool (the “TestAWARE” section), MobiPlay first needs to record the test cases that the tester wants to check.
VALERA
VersAtile yet Lightweight rEcord and Replay for Android (VALERA) is a tool capable of recording and replaying Android apps by focusing on sensors and event streams rather than on system calls or the instruction stream. Its approach promises to be effective yet lightweight. VALERA is able to record and replay inputs from the network, GPS, camera, microphone, touchscreen, accelerometer, compass, and other apps via IPC. The authors' main concern is to record and replay Android applications with minimal overhead, and they report a low performance overhead: on average 1.01% for recording and 1.02% for replaying. Timing overhead is very important when replaying an application: varying the original timing of the application's inputs can cause behavior different from that observed while recording. For this reason, VALERA is designed to minimize timing overhead. To evaluate VALERA, the tool was exercised against 50 applications with different sensors. The evaluation consisted of exercising the relevant sensors of each application, e.g., scanning a barcode for the Barcode Scanner, Amazon Mobile, and Walmart apps; playing a song externally so that the Shazam, Tune Wiki, or SoundCloud apps would attempt to recognize it; driving a car to record a navigation route for Waze, GPSNavig.&Maps, and NavFreeUSA; and so on.
VALERA has the same limitations as TestAWARE and MobiPlay.
RERAN
RERAN is a black-box record-and-replay tool capable of capturing the low-level event stream on the phone, which includes both GUI events and sensor events, and of replaying it with microsecond accuracy. RERAN is an earlier record-and-replay system by the authors of VALERA. It is similar to VALERA but has some limitations: RERAN is unable to replay sensors whose events are made available to applications through system services rather than through the low-level event interface (e.g., camera and GPS). When validating the tool, the authors report that RERAN was able to record and replay 86 out of the top 100 Android apps on Google Play and to reproduce bugs in popular apps, e.g., Firefox, Facebook, and Quickoffice.
RERAN has the same limitations as TestAWARE, MobiPlay, and VALERA. Another limitation of RERAN is that it does not support testing the GPS sensor.
MBTS4MA
Farto et al. [83] proposed an MBT approach for modeling mobile apps in which test models are reused to reduce the concretization effort and to verify other characteristics such as device-specific events, unpredictable user interactions, telephony events for GSM/text messages, and sensor and hardware events.
The approach is based on an MBT process with Event Sequence Graph (ESG) models representing the features of a mobile app under test. The models focus on system testing, mainly user and GUI events. Farto et al. implemented the proposed testing approach in a tool called MBTS4MA (Model-Based Test Suite For Mobile Apps).
MBTS4MA provides a GUI for modeling. It supports the design of ESG models integrated with mobile app data such as labels, activity names, and general configurations. Although the models focus on system testing, mainly user and GUI events, it is also possible to test sensor and hardware events. The supported sensor events are changing acceleration data, changing GPS data, disabling Bluetooth, enabling Bluetooth, and updating coordinates. However, the authors argue that the tool's stereotypes can be extended to support more sensor events.
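In an ESG, nodes are events and edges are permitted "followed by" relations, so a test suite is a set of event sequences covering the edges. The sketch below derives such sequences from a toy graph; the event names echo the sensor events listed above but the graph itself is an invented example, not an MBTS4MA model:

```python
# Toy Event Sequence Graph: event -> list of events that may follow it.
esg = {
    "start": ["enableBluetooth", "changeGpsData"],
    "enableBluetooth": ["disableBluetooth"],
    "changeGpsData": ["updateCoordinates"],
    "disableBluetooth": [],
    "updateCoordinates": [],
}

def event_sequences(graph, entry="start"):
    """Enumerate all entry-to-sink paths; in a tree-shaped ESG like this one,
    these paths together cover every edge."""
    sequences, stack = [], [[entry]]
    while stack:
        path = stack.pop()
        followers = graph[path[-1]]
        if not followers:
            sequences.append(path)
        for nxt in followers:
            stack.append(path + [nxt])
    return sequences

suite = event_sequences(esg)
```

Each resulting sequence would be concretized into an executable test case against the AUT.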
Just like the custom-built version of the Calabash-Android tool (the “Custom-built version of the Calabash-Android” section), MBTS4MA requires the creation of a model that represents the features of the mobile app under test.
RQ 2.1: Which research groups are involved in Android context-aware testing research?
In order to answer this research question, we observed the publications of the authors of the Android context-aware testing studies, such as Griebe and Gruhn [44], Vieira et al. [37], and Amalfitano et al. [42].
The authors of Griebe and Gruhn [44] are Tobias Griebe and Volker Gruhn. Both authors have written only two other publications that refer to context-aware applications:
“Towards Automated UI-Tests for Sensor-Based Mobile Applications” [112]: presents an approach that integrates sensor information into UI acceptance testing. The approach uses a sensor simulation engine to execute test cases automatically.
“A Framework for Building and Operating Context-Aware Mobile Applications” [113]: a work-in-progress paper describing a framework architecture designed to address the following context-aware mobile application problems: interoperability, dynamic adaptability, and context handling in a frequently changing environment.
The authors of Vieira et al. [37] are Vaninha Vieira, Konstantin Holl, and Michael Hassel. Vaninha Vieira is a professor of Computer Science at the Federal University of Bahia, Brazil. Her research interests include context-aware computing, mobile and ubiquitous computing, collaborative systems and crowdsourcing, gamification and user engagement, and smart cities (crisis and emergency management, intelligent transportation systems). Among her publications, we note an interest in mobile applications with respect to context modeling, quality assurance, context-sensitive systems development, context management, and so on. Konstantin Holl has published papers related to quality assurance, but nothing can be said about the research interests of Michael Hassel due to his lack of publications.
The authors of Amalfitano et al. [42] are Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, and Nicola Amatucci. Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana are not only professors at the same institution (University of Naples Federico II), but most of their articles were also written together. Their publications concern software engineering, testing, and reverse engineering, and many of their testing publications are about Android app testing. In particular, they have extensive experience with the GUI ripping technique. Most of Nicola Amatucci's publications are about testing Android applications, and most of them were written together with Domenico Amalfitano, Anna Rita Fasolino, or Porfirio Tramontana.
All of these authors have significant publications regarding mobile application testing. Besides them, as mentioned in the “Results” section, we can refer to Iulian Neamtiu, Tanzirul Azim, and Yongjian Hu, who have made great contributions to the research area. However, among the studied authors, Vaninha Vieira is the one who most directly contributed to research on context-aware applications.
RQ 2.2: What are the research gaps addressed in Android context-aware testing?
In this paper, we identified five tools for testing context-aware Android applications and five tools that support the testing of context-aware features, totaling 10 tools.
Context-aware application testing faces challenges such as the wide variety of context data types and the constant variation of context. The most commonly used context data type is location, acquired through the GPS sensor, but there are many other types of data, such as temperature, orientation, brightness, time, and date.
Context-aware applications use context data provided by sensors to deliver services or information. Waze, for example, uses the GPS, the time, and information provided by the cloud to inform the driver about obstacles along the way to the final destination. However, many context-aware applications combine sensor information to infer contexts and, from these inferred contexts, provide services or information. Vieira et al. [37] call low-level context the context information that is collected directly from sensors or from other sources of information such as databases or the cloud, and high-level context the contexts that result from combining low-level contexts.
Many context-aware applications use high-level context to provide their services or information. Samsung has developed an application called Samsung Health [114] that tracks the user's physical activities. Combined with its smartwatch, the application monitors heartbeat, movement, steps, geographical location, time of day, and other information. From this information, the application infers the contexts in which the user is and then concludes whether the user is practicing physical activity or at rest. Taking the Samsung Health application as an example, Table 9 shows some high-level contexts composed from low-level contexts.
Table 9 High-level context examples
Description | Low-level contexts | High-level context
If the user has stopped for more than 1 h and it is not nighttime, the application infers that the user has been at rest for a long time and suggests that the user take a short walk or stretch. | Time, GPS, pedometer, accelerometer, heartbeat | Long rest
If the user is at full rest with a low heart rate, the application infers that the user is sleeping and counts the duration of sleep, as well as inferring the quality of sleep based on the luminance, noise, and amount of movement the user makes while sleeping. | Time, GPS, pedometer, accelerometer, brightness, noise, heartbeat | Sleeping
If the user is walking, the application monitors the distance and speed. From this information and the user's profile (weight and age), the application infers the number of calories burned. | GPS, pedometer, age, weight | Walking
If the user is pedaling, the application infers that the user is riding a bike and then calculates the time, distance, and calories burned. | GPS, pedometer, accelerometer, noise, heartbeat | Riding a bike
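Rules like those in Table 9 amount to classifying a tuple of low-level readings into a high-level context. A hedged sketch of such rule-based inference follows; the thresholds and feature names are invented for illustration and bear no relation to Samsung Health's actual logic:

```python
def infer_activity(steps_per_min, speed_kmh, heart_rate):
    """Classify one snapshot of low-level readings into a high-level context.
    All thresholds are illustrative."""
    if steps_per_min == 0 and heart_rate < 55:
        return "sleeping"          # full rest with a low heart rate
    if steps_per_min == 0:
        return "long rest"         # stopped, but heart rate is normal
    if speed_kmh > 12:
        return "riding a bike"     # moving too fast for walking
    return "walking"

snapshot = infer_activity(steps_per_min=90, speed_kmh=5.0, heart_rate=100)
```

A real inference engine would also consider time windows and context history rather than a single snapshot.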
As we have said, another challenge in testing context-aware applications is the constant variation of context. The context changes asynchronously, and the application must respond correctly and effectively to context variations. Taking Samsung Health as an example, the application must notice when the user changes activities throughout the day and thus provide all information and services correctly. When the user who was sleeping gets up, the application should stop counting the duration and quality of sleep. If the user starts walking, the application must count time, distance, and calories burned. When the user stops walking, gets in the car, and drives home, the application should stop counting the walking information and understand that the user is at rest, even though he is moving.
Considering the difficulties of testing context-aware applications, the 10 tools identified in this work were analyzed and compared according to the following 11 questions:
Q1: What low-level context data does the tool support?
Q2: Does the tool support high-level context data?
Q3: Are context data treated differently?
Q4: Is it possible to test context variations?
Q5: Is it possible to test abnormal context situations?
Q6: What criterion is used to select the context data?
Q7: What is the test stop criterion?
Q8: Does the tool generate test cases?
Q9: Is the tool white box, black box, or gray box?
Q10: Does it need instrumentation in the code?
Q11: Is the tool automatic or semi-automatic?
Table 10 presents the result of analyzing the 10 tools in light of the 11 questions.
Q1 | |
GPS | X | X | X | X | X | X | | X | X | X |
Wi-Fi | X | X | | X | | X | | X | X | X |
Accelerometer | X | X | | X | X | X | X | X | X | X |
Thermometer | X | X | X | X | | | | X | X | X |
Barometer | | X | X | | X | | | X | X | X |
Light-sensor | X | X | | | X | | X | | X | X |
Magnetometer | X | X | X | X | X | X | X | | X | X |
Gyroscope | | X | | | X | | | X | X | X |
Clock | | X | | | | | | | X | |
Calendar | | X | | | | | | | X | |
Other | | Camera, microphone, battery level, call, text message, alarm, etc. | Call, text message, battery level, USB, etc. | | | Microphone | | Bluetooth, Call, text message | | |
Q2 | No | Yes | No | No | No | No | No | No | Yes | No |
Q3 | Yes | Yes | No | No | No | No | No | No | Yes | Yes |
Q4 | Yes | Yes | No | No | No | No | No | No | Yes | No |
Q5 | Yes | Yes | No | Yes | No | No | No | No | Yes | Yes |
Q6 | Manually | Manually | Design patterns | On/off or abnormal values | None | None | None | None | None | Manually or recorded from sensor |
Q7 | All-transition-coverage criterion | All scenarios executed | Code coverage | Top-down or bottom-up GUI hierarchy traversal | No more recorded events left | No more recorded events left | No more recorded events left | All edges | Breadth-first search | No more recorded events left |
Q8 | Yes | No | Yes | Yes | No | No | No | Yes | Yes | No |
Q9 | Black-box | Black-box | White-box | Black-box | Black-box | Black-box | Black-box | Black-box | Black-box | Black-box and white-box |
Q10 | No | No | No | No | No | No | No | No | No | No |
Q11 | Automatic | Semi-automatic | Automatic | Automatic | Semi-automatic | Semi-automatic | Semi-automatic | Semi-automatic | Automatic | Semi-automatic |
Our first observation concerns the types of context data the tools support. With the exception of RERAN, they all support GPS. This result was expected, since location is the context data type most commonly used by context-aware applications. We can also see that Context Simulator and ContextDrive are the only tools that support all low-level context data types; in addition, they are the only tools that support high-level context data.
Mirza and Khan [110] propose an extension of the UML activity diagram for modeling high-level context variation. Thus, their ContextDrive tool can test variations from one context to another. The authors use static data to execute the test cases; therefore, the tool is unable to generate new test cases automatically.
The Context Simulator tool provides a graphical interface that enables the tester to create application usage scenarios, making it possible to simulate high-level contexts. To do so, the tester explicitly describes each test case he/she wants to execute, as well as which sensor values are to be used in the test.
Context variations occur asynchronously, and some of them in a totally unexpected way. While a context-aware application is in use, a phone call can be received and, during the call, the user's context may change. As another example, the GPS signal may drop and then return after a few moments.
Although three tools support context variation testing, none of them is able to automatically generate test cases that use high-level contexts and test variations of high-level contexts while taking into account unexpected scenarios such as the event-patterns described by Amalfitano et al. [42], presented in the “Extended AndroidRipper” section.