We want to make Things able to execute complex tasks that are not predefined at the Things’ deployment time so as to enable developers to use the WoT as a pool of generic resources, without unneeded intermediaries (proxies, gateways, base stations, etc.). The role of such intermediaries is specifically discussed in Section 1.
Usually, a WSAN is composed of (i) several motes equipped with one or more sensors and a wireless interface, and (ii) more powerful devices, typically fixed and continuously powered, that embed actuators [4]. In addition, a WSAN leverages proxies, gateways or base stations for carrying out collection and computation tasks, as well as communication with other networks, such as the Internet. Nowadays, these intermediaries are no longer required for communication between motes and the Internet, thanks to the standardized stack composed of IEEE 802.15.4 and 6LoWPAN, which is intended to replace proprietary communication proxies (application level) with standardized IP routers (network level) [15]. As a benefit, motes have an IPv6 address, or an equivalent made of the network identifier and a small address, and can communicate directly with the Internet.
Regarding data collection, proxies are still needed to enhance the sensor network capabilities, e.g., for heavy computation (offloading), centralized management and task deployment, caching, and security/privacy (access control, key management, etc.). However, offloading data collection and processing to proxies is energy-consuming because of the wireless communication involved, which holds for any wireless device, including smartphones [6],[19]. Similarly, cloud-based stream processing is quite popular today, and there are some attempts to use it with sensor networks and the IoT: cloud of sensors, cloud-based IoT, cloud-assisted remote sensing, etc. [20],[21]. However, the same problems arise regarding communication costs, availability (specifically for mobile Things with sparse connectivity), latency and privacy.
As a solution to the above problems, it has been proposed to let the sensor network perform as much in-network processing as possible before sending anything to a proxy or the cloud, in order to (i) reduce the amount of transferred data and (ii) use the motes to their full potential. For example, structural health monitoring is a case where a huge number of measurements is produced quickly because of the vibration sensors. These sensors are very sensitive and produce many 3-axis acceleration samples, saturating the network and exhausting the sensors’ batteries. In such a case, pre-aggregation, pre-filtering and compression can be performed within the motes instead of at the base station [22].
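To make the idea of mote-side pre-filtering and pre-aggregation concrete, the following is a minimal sketch in Python, not the method of [22]. It assumes a hypothetical sample format (a tuple of three accelerations), a count-based window and a fixed threshold; all function and parameter names are illustrative.

```python
import math
from typing import Iterable, Iterator, List, Tuple

# Hypothetical sample format: (ax, ay, az) accelerations.
Sample = Tuple[float, float, float]


def magnitude(sample: Sample) -> float:
    """Euclidean norm of one 3-axis acceleration sample."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def summarize_windows(samples: Iterable[Sample], window_size: int,
                      threshold: float) -> Iterator[Tuple[float, float]]:
    """Yield (mean, peak) magnitude for each full window whose peak exceeds threshold.

    Windows whose peak stays at or below the threshold are dropped
    (pre-filtering); the others are reduced to two numbers (pre-aggregation).
    A trailing, incomplete window is discarded.
    """
    window: List[float] = []
    for sample in samples:
        window.append(magnitude(sample))
        if len(window) == window_size:
            peak = max(window)
            if peak > threshold:
                yield (sum(window) / window_size, peak)
            window = []


# Usage: eight raw samples are reduced to at most two summaries.
quiet = [(0.0, 0.0, 1.0)] * 4
shaking = [(0.0, 3.0, 4.0)] * 4
summaries = list(summarize_windows(quiet + shaking, window_size=4, threshold=2.0))
```

With these assumed inputs, only the second window would be transmitted, as a single (mean, peak) pair instead of four raw samples.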
Consequently, in our opinion, centralized intermediaries (proxies, surrogates, cloudlets and the cloud) should be leveraged primarily for heavy computation, while in-network processing should be favored for common and simple tasks (filtering, merging, etc.) as well as for complex tasks when sufficiently powerful or specialized Things are available. To this end, Dioptase is intended to avoid reliance on those intermediaries whenever possible, by running on devices that support 6LoWPAN or IPv6 and communicate directly with the Internet. Nevertheless, in cases where intermediaries are needed, Dioptase can be deployed on them and run as a middleware layer for deploying tasks dynamically and managing data streams.
2.2 DSMSs for WSANs
The work most related to ours may then be classified into three major families of DSMSs for WSANs, which are respectively based on: (i) the relational model, (ii) macro-programming and (iii) Web services.
We also identify related work on supporting the construction of mashups in the WoT, although it focuses on the exchange of discrete data, such as Actinium [23], COMPOSE [24], Eywa [25] and the Thin Server architecture [26]. However, these solutions consider Things as passive data providers and shift the computation logic to powerful servers or to the cloud. As stated above, in our opinion, centralization is not suitable for the WoT from a scalability perspective, even in the cloud, as it weakens the entire network and increases the overall energy consumption.
Relational DSMSs extend the relational model by adding concepts that are necessary to handle data streams and persistent queries, together with stream-oriented versions of the relational operators (e.g., selection or union). The sensor network is then managed as a large database that can be queried using an SQL-like language, with some specific operations. The database may further be distributed (each node runs a part of the query), centralized (a powerful node collects all the data and applies queries) or partially centralized (with many powerful nodes) [27]. From a practical perspective, queries are translated into query plans that are distributed in the network. State-of-the-art DSMSs primarily differ with respect to the expressiveness of the query language, the associated algebra, and the assumptions made about the underlying networking architecture. A well-known DSMS is TinyDB [28], which exposes the sensed data as a relation (i.e., a table) on which it is possible to apply queries over the sensed values as well as the metadata associated with the sensors. During the handling of queries, all the nodes execute the queries that are distributed in the network, and the results of each query are aggregated as they traverse the routing tree maintained by the system. In the same vein, Cougar [29] acts as a database of sensors where the query plans are provided to proxies that take care of activating the relevant sensors and applying the operations on the collected data. MaD-WiSe [30] offers a fully distributed runtime system for queries, in which each sensor may directly execute part of a query plan and then deal with sensor-specific tasks. Borealis [31], previously Aurora, uses data stream diagrams, which express the combination of relational operators over the streams received by the system. From a theoretical perspective, various systems propose custom extensions to the relational model as well as custom implementations of the relational operators. For instance, STREAM [32] distinguishes streams from relations, where the latter can be handled by classical relational operators. New operators then deal with translation from streams to relations (typically using windows), and vice versa (using streamers). EQL [33] goes a step further by enabling developers to express composite queries in a very concise way, in order to detect and track complex events that involve various types of sensors (e.g., a gas leak). Other proposals [7]-[9],[34] deal with issues as diverse as blocking and non-blocking operators, windows, stream approximation, and various optimizations.
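The distinction between streams and relations described above can be illustrated with a small Python sketch. It is not the STREAM implementation or its query language: the three functions below are hypothetical stand-ins for a stream-to-relation operator (a count-based sliding window), a classical relational operator (selection), and a relation-to-stream operator that emits only newly appearing rows. The row format is also an assumption.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, List

Row = Dict[str, float]  # hypothetical row format, e.g. {"t": 1.0, "temp": 20.0}


def sliding_window(stream: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Stream-to-relation: after each arrival, yield the last `size` rows."""
    window: Deque[Row] = deque(maxlen=size)
    for row in stream:
        window.append(row)
        yield list(window)


def select(relation: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Classical relational selection applied to a finite relation."""
    return [row for row in relation if predicate(row)]


def emit_new_rows(relations: Iterable[List[Row]]) -> Iterator[Row]:
    """Relation-to-stream: emit rows absent from the previous relation."""
    previous: List[Row] = []
    for relation in relations:
        for row in relation:
            if row not in previous:
                yield row
        previous = relation


# Usage: a continuous query "temperatures above 30 among the last 2 readings".
readings = [
    {"t": 1.0, "temp": 20.0},
    {"t": 2.0, "temp": 31.0},
    {"t": 3.0, "temp": 33.0},
    {"t": 4.0, "temp": 19.0},
]
hot = (select(rel, lambda r: r["temp"] > 30.0)
       for rel in sliding_window(readings, size=2))
alerts = list(emit_new_rows(hot))
```

Because every stage is a generator, each operator handles one arrival at a time and never waits for the end of the stream, which is the property expected of a non-blocking operator.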
State-of-the-art WSAN-based DSMSs suffer from proprietary protocols and technologies specifically designed to handle the characteristics of resource-constrained devices. As a consequence, proxies are often used to collect, process and present sensed data on the Internet, creating (i) an unwanted bottleneck, (ii) a single point of failure and (iii) increased energy consumption if no proper in-network processing technique is used. To alleviate such effects, a DSMS for the WoT should include a middleware layer designed to run directly on Things without any intermediary (except for conversions at the physical and link levels), given that modern device classes are emerging and allow more flexible data stream management based on the use of Web technologies. In addition, such middleware must reuse and extend the rich theoretical background of relational DSMSs, especially the data models proposed to describe streams and the non-blocking operators initially designed for WSANs.
Macroprogramming-based DSMSs enable users to express tasks over the WSAN using a DSL instead of a query language. The resulting tasks, or macroprograms, are compiled into microprograms to be run on the networked nodes, which eases the work of developers, who no longer have to handle the decomposition and distribution of the macroprograms. Macroprogramming-based DSMSs are overall similar to classical macroprogramming approaches aimed at WSANs. However, they feature additional primitives and mechanisms oriented toward stream management. For instance, Regiment [35] introduces a functional language that enables programming the WSAN and manipulating the streams that flow in the network. Semantic Streams [36], in turn, defines a declarative language based on Prolog, which features data structures to handle streams, together with mechanisms to reason about the semantics of sensors. For instance, the system is able to compose or adapt data according to the available sensors and the given request.
As outlined above, existing macroprogramming-based DSMSs follow a static approach where the macroprograms are compiled into microprograms that are deployed once and for all. Specific techniques can be used to dynamically update the network: (i) dynamic reconfiguration and (ii) dynamic deployment. However, the former usually assumes that the tasks are already implemented on the devices [37], while the latter usually supports binary deployment (e.g., Deluge [16]). Instead, a DSMS for the WoT must provide a high level of dynamicity by making it possible to change both the global and the local behaviors of the network at any time. To this end, developers should be provided with a way to represent WoT applications as abstract programs that are distributed dynamically in the actual network. In addition, sandboxes should be used to increase the overall reliability, as an attacker can exploit arbitrary binary deployment to deploy malicious code on any open device.
Service-oriented DSMSs aim to integrate with classical service-oriented architectures, thereby taking advantage of the existing infrastructure (interaction and discovery protocols, registries, service composition based on orchestration or choreography, etc.). Similarly to database-oriented relational DSMSs, the simplest service-oriented DSMSs are centralized with a unique point of data collection [11],[38],[39], or semi-distributed based on a set of data collection points [40],[41]. However, these DSMSs focus mainly on the problem of presenting streams as services, without reusing the existing and valuable theoretical work from WSANs. In practice, these approaches are based on well-known Web service technologies. For RESTful services, some studies use specific mechanisms of the HTTP protocol, like Web hooks, long polling and HTTP streaming [11]. As for SOAP services, some work extends the SOAP architecture by adding new message exchange patterns (MEPs) designed for stream communication (e.g., the capability for a service to receive multiple requests and produce multiple responses in parallel when invoked) [42]. Usually, sensors are presented as Web resources, identified by URIs [11],[38],[41]. The paradigms used to broadcast streams vary from one solution to another. Stream Feeds [38] uses pull requests to gather historical data and push requests to receive new data issued by the sensors. RMS [11] goes a step further by building upon a topic-based pub/sub infrastructure, while WebPlug [41] uses an infrastructure based on pollers that periodically check the state of resources.
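The difference between pulling historical data and being pushed new data through topic-based publish/subscribe can be sketched as follows. This is an in-memory illustration under our own assumptions, not the API of Stream Feeds, RMS or WebPlug: the class `StreamBroker`, its methods and the topic name are hypothetical, and a real deployment would expose these operations over HTTP.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[float], None]


class StreamBroker:
    """Hypothetical in-memory stand-in for a stream service."""

    def __init__(self) -> None:
        self._history: DefaultDict[str, List[float]] = defaultdict(list)
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Push model: register a callback invoked for each new value."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, value: float) -> None:
        """Store the value, then push it to the current subscribers."""
        self._history[topic].append(value)
        for callback in self._subscribers[topic]:
            callback(value)

    def pull(self, topic: str, last_n: int) -> List[float]:
        """Pull model: return up to `last_n` of the most recent stored values."""
        if last_n <= 0:
            return []
        return self._history[topic][-last_n:]


# Usage: a late subscriber misses earlier values but can still pull them.
broker = StreamBroker()
broker.publish("room1/temp", 20.5)
received: List[float] = []
broker.subscribe("room1/temp", received.append)
broker.publish("room1/temp", 21.0)
history = broker.pull("room1/temp", last_n=2)
```

In this sketch the subscriber only receives the value published after it subscribed, while a pull request still returns both stored values. A poller-based design would instead call `pull` periodically and compare the result with the previous one.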
Integrating data stream management into service-oriented architectures is a logical evolution of sensor networks, as Web technologies provide greater flexibility, ease of use and interoperability compared to existing WSN technologies. The proposed solutions, in particular, enable Things to communicate through the Internet and expose their resources as standardized Web services. As simple to use as those of the present Web, these services can be used to build mashups that interact with the physical world. However, existing solutions are limited in scope. Indeed, much research focuses on how to present streams as Web services, and neglects many complex aspects such as continuous processing of streams (merging, filtering, adaptation, approximation, etc.). Reusing the theoretical and practical foundations established by the two other families of DSMSs is a crucial step to enable the IoT to take advantage of WSAN capabilities together with the flexibility, reliability and interoperability of the Web; this observation guided the design of the Dioptase application model and supporting middleware toward the WoT vision.