1 Introduction
-
We propose an approach that relies on a dataspace to support data analysis within a multistore by handling not just data model and schema heterogeneity but also record overlapping. To this end, we introduce a new NRA operator to handle conflict resolution between overlapping records, and we significantly revise and extend both the formalization and the algorithmic logic initially proposed in [10].
-
We formalize and discuss the algorithms to produce an execution plan from a GPSJ query formulated on the dataspace, including a set of applied optimizations.
-
We present a prototypical implementation of the approach, on which we carry out an extensive experimental evaluation in terms of both efficiency and effectiveness.
2 Use cases & case study
-
Analytical data offloading: to reduce costs and optimize performance, the historical depth of databases is kept limited; typically, it is 1-2 years for operational systems and 3-5 years for analytical ones [11]. After these periods, data are offloaded to cheaper and larger storage, such as cloud storage or data lakes. Offloading implies a change of data model, a change of schema, and, obviously, an overlap of instances with the original data. For example, offloading a relational data warehouse could imply turning instances stored in a star schema into a single JSON document including both measures and dimensional attributes; alternatively, a relational flat schema could be adopted. Similarly, invoices stored in an ERP can be offloaded to a key-value repository, where the value stores an object including only the attributes relevant for fiscal purposes. Meanwhile, the in-place data may evolve in terms of structure or values. In this context, unforeseen analyses are often needed, e.g., data enthusiasts asking to compare the offloaded data with the in-place ones.
-
Multi-cloud architecture: this context combines different storage technologies and resources from multiple cloud platforms [12]. It allows application providers to manage the risks associated with technology, vendor lock-in, provider reliability, data security, and privacy; thus, it is an increasingly popular tactic for designing the storage tier of cloud-based applications [13]. The multi-cloud architecture and related frameworks (e.g., data fabric) accelerate digital transformation since they enable the exploitation of data spread across different providers and architectures, all the while overcoming data silos through data virtualization. Multi-cloud architectures are especially valuable in the presence of many company branches. For example, consider a holding or a federation of companies (e.g., hospitals in the health sector). In this case, a lot of data is shared between the branches, but each branch is free to choose its own storage provider (either on cloud or on-premise), data model, and schema. To keep it simple, let us consider the case of ICD-9-CM (International Classification of Diseases) [14], which is often used in OLAP analysis in the healthcare domain. ICD-9-CM changes some of its attributes and values across the years; thus, depending on the ICD-9-CM version adopted by each branch, data overlapping and schema heterogeneity must be resolved when cross-queries are issued over the branches’ databases. Furthermore, every hospital or local health unit can store such data in different data models and schemas, depending on the adopted software.
-
Multiple data models: being a multistore, it comprises databases in four data models: relational, document-based, key-value, and wide-column.
-
Heterogeneous schemas: given the schemaless nature of NoSQL databases, the collections in Cloud 2 are characterized by varying levels of schema heterogeneity; in particular, we have 35 schemas in \(C_5\) and 2 schemas in \(C_6\), while invoices in \(C_7\) are free to be in any schema.
-
Overlapping instances: as the two branches belong to the same holding, both customers and products are partially overlapped in the two cloud environments.
-
Unibench’s customer records are split between \(C_1\) and \(C_5\) in an overlapping fashion: \(90\%\) of the customers are in the relational table and \(30\%\) in the document collection, meaning that \(20\%\) of the customers are replicated.
-
Unibench’s product records are split between \(C_4\) and \(C_6\) in an overlapping fashion: in this case, both the relational table and the wide-column collection contain \(80\%\) of the original records, and \(60\%\) of the products are replicated.
-
Unibench’s order and order line records are split between the two DBMSs, but they do not overlap (i.e., each record exists in one copy only). Orders belonging to overlapping customers are randomly assigned to one of the branches.
-
Missing attributes are introduced in \(C_5\) and \(C_6\): in the former, we remove the values of three attributes (namely browser, locationIP, and place) for some random customer records; in the latter, we remove imgUrl values in \(10\%\) of the product records.
-
Semantic equivalence is introduced in \(C_5\) by renaming the birthdate and gender attributes into dateOfBirth and sex for some random customer records, and by renaming all attributes into a different convention for some random order line records.
-
Different data types are introduced in \(C_1\) and \(C_5\) by storing the value of attribute OrderDate as a date in the former’s records and as a string in the latter’s; also, order lines’ attribute Quantity in \(C_5\) is randomly modeled as a string or a number.
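The replication percentages above follow from inclusion-exclusion over the two splits:

```latex
% Replicated fraction = coverage of first split + coverage of second split - 100%
\[
\underbrace{90\% + 30\%}_{\text{customers in } C_1 \text{ and } C_5} - 100\% = 20\%,
\qquad
\underbrace{80\% + 80\%}_{\text{products in } C_4 \text{ and } C_6} - 100\% = 60\%.
\]
```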
3 The dataspace
3.1 Modeling the existing collections
-
\(S_{blue} = \{ \mathsf{id}, \mathsf{firstName}, \mathsf{gender} \}\)
-
\(S_{green} = \{ \mathsf{id}, \mathsf{orders.orderId}, \mathsf{orders.orderDate}, \mathsf{orders.totalprice} \}\)
-
\(S_{orange} = \{ \mathsf{orders.orderId}, \mathsf{orders.orderLines.quantity}, \mathsf{orders.orderLines.asin}, \mathsf{orders.orderLines.price} \}\)
3.2 Modeling the dataspace
-
If \(key(S_i) \equiv key(S_j)\), then we infer a one-to-one relationship, represented as \(S_i \leftrightarrow S_j\).
-
If \(a_k \equiv key(S_j) : a_k \in \{S_i \setminus key(S_i)\}\), then we infer a many-to-one relationship from \(S_i\) to \(S_j\), represented as \(S_i \xrightarrow {a_k} S_j\).
-
If \(a_k \equiv a_l : (a_k,a_l) \in (\{ S_i \setminus key(S_i)\},\{ S_j \setminus key(S_j)\})\), no direct relationship exists between the two schemas.
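As a concrete illustration, the three rules above can be sketched as a simple procedure over the extracted schemas. This is a minimal Python sketch: the names `schemas`, `key`, and `equivalent` are hypothetical stand-ins for the dataspace's mappings, not the paper's implementation.

```python
def infer_relationships(schemas, key, equivalent):
    """Infer entity-graph edges between schemas (illustrative sketch).

    `schemas`          maps a schema id to its set of attributes,
    `key(s)`           returns the key attribute of schema s,
    `equivalent(a, b)` tells whether two attributes are mapped as equivalent.
    """
    edges = []
    ids = list(schemas)
    for i, si in enumerate(ids):
        for sj in ids[i + 1:]:
            # Rule 1: equivalent keys -> one-to-one relationship S_i <-> S_j
            if equivalent(key(si), key(sj)):
                edges.append((si, "<->", sj, None))
                continue
            # Rule 2: a non-key attribute of S_i equivalent to key(S_j)
            #         -> many-to-one from S_i to S_j (checked symmetrically)
            for a in schemas[si] - {key(si)}:
                if equivalent(a, key(sj)):
                    edges.append((si, "->", sj, a))
            for a in schemas[sj] - {key(sj)}:
                if equivalent(a, key(si)):
                    edges.append((sj, "->", si, a))
            # Rule 3: equivalence between two non-key attributes
            #         -> no direct relationship is inferred
    return edges
```

For instance, a schema whose non-key attribute is equivalent to another schema's key yields a many-to-one edge labeled with that attribute, mirroring \(S_i \xrightarrow {a_k} S_j\).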
The schemas of the collections are grouped into entities as follows: Product \(= E_1 = \{S_1, S_2, S_{10}\}\), Orderline \(= E_2 = \{S_5, S_6, S_9\}\), Order \(= E_3 = \{S_4, S_8\}\), Customer \(= E_4 = \{S_3, S_7\}\), Inv \(= E_5 = \{S_{11}\}\). For each feature f, the table below lists its attributes a, the collection C each attribute belongs to, and its role in the schema (K for key, \(\checkmark \) otherwise).

name(f) | f | a | C | Role
---|---|---|---|---
ProductId | \(f_{1}\) | \(a_{1}\) | \(C_3\) | \(\checkmark \)
 | | \(a_{2}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{3}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{4}\) | \(C_6\) | K
 | | \(a_{5}\) | \(C_6\) | K
 | | \(a_{6}\) | \(C_4\) | K
ProductName | \(f_{2}\) | \(a_{7}\) | \(C_6\) | \(\checkmark \)
 | | \(a_{8}\) | \(C_6\) | \(\checkmark \)
 | | \(a_{9}\) | \(C_4\) | \(\checkmark \)
ImgUrl | \(f_{3}\) | \(a_{10}\) | \(C_6\) | \(\checkmark \)
OrderLineId | \(f_{4}\) | \(a_{11}\) | \(C_3\) | K
 | | \(a_{12}\) | \(C_5\) | K
 | | \(a_{13}\) | \(C_5\) | K
Price | \(f_{5}\) | \(a_{14}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{15}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{16}\) | \(C_6\) | \(\checkmark \)
 | | \(a_{17}\) | \(C_6\) | \(\checkmark \)
Quantity | \(f_{6}\) | \(a_{18}\) | \(C_3\) | \(\checkmark \)
 | | \(a_{19}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{20}\) | \(C_5\) | \(\checkmark \)
OrderId | \(f_{7}\) | \(a_{21}\) | \(C_3\) | \(\checkmark \)
 | | \(a_{22}\) | \(C_3\) | K
 | | \(a_{23}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{24}\) | \(C_5\) | \(\checkmark \)
 | | \(a_{25}\) | \(C_5\) | K
 | | \(a_{26}\) | \(C_7\) | K
TotalPrice | \(f_{8}\) | \(a_{27}\) | \(C_2\) | \(\checkmark \)
 | | \(a_{28}\) | \(C_5\) | \(\checkmark \)
OrderDate | \(f_{9}\) | \(a_{29}\) | \(C_2\) | \(\checkmark \)
 | | \(a_{30}\) | \(C_5\) | \(\checkmark \)
Invoice | \(f_{10}\) | \(a_{31}\) | \(C_7\) | \(\checkmark \)
TaxId | \(f_{11}\) | \(a_{32}\) | \(C_1\) | K
 | | \(a_{33}\) | \(C_2\) | \(\checkmark \)
 | | \(a_{34}\) | \(C_5\) | K
 | | \(a_{35}\) | \(C_5\) | \(\checkmark \)
LastName | \(f_{12}\) | \(a_{36}\) | \(C_1\) | \(\checkmark \)
 | | \(a_{37}\) | \(C_5\) | \(\checkmark \)
Gender | \(f_{13}\) | \(a_{38}\) | \(C_1\) | \(\checkmark \)
 | | \(a_{39}\) | \(C_5\) | \(\checkmark \)
3.3 Obtaining the dataspace
4 Execution plan formulation
4.1 NRA and the merge operator
Operator | Description
---|---
\(\mathsf{CA}_{col}\) | Denotes the access to the records of collection col.
\(\mu _{a}(C)\) | Denotes the unnesting of an array attribute a on collection C.
\(\sigma _{x}(C)\) | Denotes a selection operation on collection C, where \(x = \bigwedge _{T}\) is a conjunction of selection predicates; each selection predicate \(t \in T\) is in the form \((a,\omega ,v)\), where a is a primitive attribute, \(\omega \in \{=,>,<,\ne ,\ge ,\le \}\), and v is a value.
\(\pi _{Y}(C)\) | Denotes a projection operation on collection C, where Y is a set of projection predicates; each projection predicate \(y \in Y\) is in the form \(y = \bigvee _{A} / f\), where A is a set of primitive attributes (of which the first non-null value is taken) and /f indicates that the resulting attribute is named after feature f. It is \(attr(f) \supseteq A\).
\(\gamma _{(F',Z)}(C)\) | Denotes an aggregation operation on collection C, where \(F'\) is the group-by set (i.e., a set of features) and Z is the set of aggregations; each aggregation is in the form (f, op), where f is a feature and op is an aggregation function.
\((C_1)\cup (C_2)\) | Denotes a union operation between collections \(C_1\) and \(C_2\).
 | Denotes a merge operation between collections \(C_1\) and \(C_2\) based on the equivalence \(a_i=a_j\), with \((a_i,a_j) \in (C_1,C_2)\). See Definition 4.2.
-
\(S^{*}_i = \{ a \in S_i : \not \exists ~ a' \in S_j, a \equiv a' \}\)
-
\(S^{*}_j = \{ a' \in S_j : \not \exists ~ a \in S_i, a \equiv a' \}\)
-
\(S^{\cap }_{ij} = \{ rep(a) ~\forall ~ (a,a') \in (S_i,S_j) : a \equiv a' \}\)
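Under these definitions, the behavior of the merge operator can be sketched as a full-outer join with conflict resolution on the shared (equivalent) attributes. This is a simplified Python illustration over records as dictionaries, not the paper's implementation; the default `resolve` keeps the first non-null value, while Definition 4.2 gives the formal semantics.

```python
def merge(c1, c2, on, resolve=lambda u, v: u if u is not None else v):
    """Sketch of the merge operator: records of c1 and c2 matching on the
    attribute `on` are fused into one (conflicts on shared attributes are
    reconciled by `resolve`); unmatched records from either side are kept."""
    index = {}
    for r in c2:
        index.setdefault(r[on], []).append(r)
    out, matched = [], set()
    for r in c1:
        partners = index.get(r[on], [])
        if not partners:
            out.append(dict(r))                  # record only in c1
        for p in partners:
            matched.add(id(p))
            fused = dict(p)                      # attributes only in c2's record
            for k, v in r.items():
                fused[k] = resolve(v, p.get(k))  # conflict resolution on S^cap
            out.append(fused)
    # records only in c2
    out.extend(dict(p) for ps in index.values() for p in ps if id(p) not in matched)
    return out
```

Note how the output schema mirrors the partition above: attributes exclusive to either side (\(S^{*}_i\), \(S^{*}_j\)) are carried over as-is, while shared attributes (\(S^{\cap }_{ij}\)) appear once, with their conflicts resolved.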
4.2 The query plan
4.2.1 Building the query plan
4.2.2 Building an entity plan
4.2.3 Building a collection plan
-
The first operation is the collection access CA to col (Line 2).
-
Unnesting operators are added, if necessary, in Lines 3 to 8, i.e., when one or more schemas are nested within arrays (\(|S^{\mu }| \ge 1\)). A simple check for duplicates is done in Line 6, in case \(\exists ~(S_1,S_2) \in {\mathcal {S}}^q_{col} : S^{\mu }_1 \cap S^{\mu }_2 \ne \varnothing \); unnesting operations are added to \(P_{col}\) in Line 8.
-
The optional selection operation is built in Lines 9 to 14. For each feature f that requires a selection, we build a disjunction of predicates that considers every schema variation of f (Line 12); for instance, a selection \((f_2,=,v)\) (where v is some value) translates to \((a_7,=,v) \vee (a_8,=,v) \vee (a_9,=,v)\). Then, the final selection predicate is the conjunction (\(\wedge \)) of the predicates built for each feature (Line 14).
-
Finally, the projection operation is built in Lines 15 to 20. Let \(F_{\pi }\) (used in Line 10) be the set of features to be projected, including, \(\forall ~l \in L^{{\mathcal {E}}}_q\), the features whose attributes are necessary for merge operations. For each feature \(f \in F_{\pi }\) representing attributes in \({\mathcal {S}}^q_{col}\), we project a single attribute (named after rep(f)) that contains the only non-null value among its schema variations (simplified in Line 19 as a disjunction over each \(a \in A\)). We remark that, at this stage, we also apply the transcoding functions \(\varphi \) to consistently compare record values in the merge operations that will follow.
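To make the selection and projection steps concrete, the following sketch expands per-feature predicates over schema variations. The `attr_variants` mapping is a hypothetical stand-in for the dataspace mappings, and the SQL-like strings are purely illustrative: the actual plans are NRA operator trees, not SQL.

```python
def collection_plan_predicates(selections, projections, attr_variants):
    """Illustrative expansion of per-feature predicates over schema variations.

    `attr_variants[f]` lists the attribute names that feature f takes across
    the schemas of the collection, e.g. {"f2": ["a7", "a8", "a9"]}.
    """
    # Selection: for each (f, op, v), a disjunction over f's variants,
    # then a conjunction across all selected features.
    per_feature = [
        " OR ".join(f"{a} {op} {v!r}" for a in attr_variants[f])
        for (f, op, v) in selections
    ]
    selection = " AND ".join(f"({d})" for d in per_feature)
    # Projection: one output column per feature, coalescing its variants
    # (the first non-null value wins, as in the pi operator).
    projection = {
        f: f"COALESCE({', '.join(attr_variants[f])}) AS {f}" for f in projections
    }
    return selection, projection
```

With `attr_variants = {"f2": ["a7", "a8", "a9"]}`, a selection \((f_2,=,v)\) expands exactly to the disjunction shown above.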
4.3 Optimizations
-
Schema plan grouping. Since we model several schemas within the same collection, the naive way would be to produce a query plan with as many leaves (i.e., collection accesses) as the number of schemas. As described in Sect. 4.2, we optimize it in order to have as many leaves as \(|{\mathcal {E}}^q| \cdot |col(E)|\), where \(|{\mathcal {E}}^q|\) is the number of entities in the query, and |col(E)| is the number of collections for an entity \(E \in {\mathcal {E}}^q\). Thus, the collection plan exploits mappings to query several schemas in a single pass. This is evident in Algorithm 3, where we identify \({\mathcal {S}}^q_{col}\) as the subset of schemas to be considered for the plan of collection col. In particular, \({\mathcal {S}}^q_{col}\) is used in Lines 4, 11, and 17 to, respectively, define unnesting, selection, and projection operations on col.
-
Predicate push-down. This is one of the most basic optimization techniques, which consists of applying selection predicates as close to the source as possible. We apply them in the collection plan (Algorithm 3, Lines 9 to 14) right after unnesting the necessary arrays (i.e., before any projection, merge, and aggregation operation).
-
Merge sequence reordering. When the query involves three or more collections, the order in which collections are merged has an impact on performance. In this work, we rely on a minimum selectivity heuristic [21] to determine the order of merge operations. The basic idea is to start from the collection with the lowest cardinality and progressively merge it with collections of increasing cardinality. This technique is used to decide the join sequence of collection plans within a single entity plan (Line 3 in Algorithm 2) and the join sequence of entity plans within the query plan (Line 1 in Algorithm 1). Notice that the reordering of entity plans within query plans treats the former as atomic blocks of operations, i.e., when a reordering takes place, the inner structure of entity plans remains unchanged; the same principle applies to the reordering of collection plans within entity plans. With reference to Algorithm 1, let \(E_i \in {\mathcal {E}}_q\) be the entity with the smallest cardinality and \({\mathcal {E}}'_q\) the set of entities directly connected to \(E_i\) in \(L^{{\mathcal {E}}}_q\); then \(E_i\) is merged with the entity \(E_j \in {\mathcal {E}}'_q\) whose cardinality is the smallest. The same step is repeated (at the second iteration, \({\mathcal {E}}'_q\) is the set of entities directly connected to either \(E_i\) or \(E_j\)) until all entities in \({\mathcal {E}}_q\) have been merged. To estimate entities’ cardinalities, we take into consideration the selection predicates in \(q_{\sigma }\); in turn, this requires collecting statistics from the databases. The literature on this topic is very broad, and the accuracy of the estimate strictly depends on the collected information and the assumptions made on data distribution. Following several query cost models, in this paper we assume uniformity of attribute values, independence of attribute values, and join containment.
-
Column pruning. This technique consists in extracting from each collection only the attributes corresponding to features that are relevant for the query, i.e., those required by the final projection (or aggregation) operation and those necessary for merge operations. We refer to these features as \(F_{\pi }\) in Algorithm 3, Line 10. By keeping only the minimum set of attributes, we minimize the amount of data that needs to be moved across the network. We finally remark that column pruning is also applied after each merge operation to prune join attributes that are no longer needed (although this is not shown in Algorithms 1 and 2 for simplicity).
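The merge sequence reordering described above can be sketched as a greedy procedure. This is a simplified Python illustration under stated assumptions: cardinalities are pre-estimated, the entity graph is given as a set of edges, and the containment of collection/entity plans is abstracted away.

```python
def merge_order(cardinalities, edges):
    """Greedy minimum-cardinality ordering: start from the entity with the
    smallest estimated cardinality and repeatedly merge in the smallest
    entity directly connected to the already-merged ones."""
    remaining = set(cardinalities)
    start = min(remaining, key=cardinalities.get)
    order, merged = [start], {start}
    remaining.discard(start)
    while remaining:
        # entities directly connected to the already-merged block
        frontier = (
            {b for a, b in edges if a in merged and b in remaining}
            | {a for a, b in edges if b in merged and a in remaining}
        )
        if not frontier:        # disconnected graph: fall back to any entity
            frontier = remaining
        nxt = min(frontier, key=cardinalities.get)
        order.append(nxt)
        merged.add(nxt)
        remaining.discard(nxt)
    return order
```

On the case study's entity graph (Customer-Order-Orderline-Product), starting from the smallest entity yields an ordering that always merges along existing relationships, as required by the heuristic.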
5 Experimental evaluation
5.1 The prototypical setup
Entity | Coll. | SF1 | SF10 | SF100 | SF1000 | Part. key
---|---|---|---|---|---|---
Customer | \(C_1\) | 9K | 90K | 900K | 9M | FirstName
 | \(C_5\) | 3K | 30K | 300K | 3M | FirstName
 | Total | 10K | 100K | 1M | 10M | –
Order | \(C_2\) | 50K | 500K | 5M | 50M | OrderDate
 | \(C_5\) | 15K | 150K | 1.5M | 15M | –
 | Total | 65K | 650K | 6.5M | 65M | –
Orderline | \(C_3\) | 230K | 2.3M | 23M | 230M | OrderId
 | \(C_5\) | 60K | 600K | 6M | 60M | –
 | Total | 290K | 2.9M | 29M | 290M | –
Product | \(C_4\) | 8K | 80K | 800K | 8M | ProductName
 | \(C_6\) | 8K | 80K | 800K | 8M | –
 | Total | 10K | 100K | 1M | 10M | –
Invoice | \(C_7\) | 65K | 650K | 6.5M | 65M | OrderId
-
Schema extraction is run in parallel on every DBMS. Its execution time ranges from a few seconds up to thirty minutes, depending on the considered scale factor. This is compatible with execution times reported in related works on schema extraction [23, 24]. We remark that extracting schemas from non-relational collections requires a full scan, as most NoSQL stores have no schema definition for collections: naively, a schema is generated for each record, but only distinct schemas are kept. The efficiency of this task could be improved by adopting approximation techniques (e.g., sampling to avoid a full scan of every collection) and an incremental strategy (i.e., considering only new/updated records); nonetheless, the optimization of the schema extraction task is out of the scope of this paper.
-
Features and entities are automatically inferred from the mappings and from the one-to-one relationships between schemas, respectively; this step executes almost instantaneously. The recognition of which entities suffer from record overlapping (i.e., setting \(\phi _E\) for each E) is done manually in our case study, but it could be automated by implementing a procedure that compares key values in the schemas and looks for matches revealing an overlap.
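The naive schema extraction strategy mentioned above (one schema per record, keeping only the distinct ones) can be sketched as follows for JSON-like records; this is a simplified Python illustration, not the prototype's code.

```python
def record_schema(record, prefix=""):
    """Flatten one (possibly nested) record into its set of dotted attribute
    paths; arrays of objects contribute their elements' paths."""
    paths = set()
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= record_schema(value, name + ".")
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for v in value:
                paths |= record_schema(v, name + ".")
        else:
            paths.add(name)
    return paths

def extract_schemas(records):
    """Naive schema extraction: derive a schema per record via a full scan,
    then keep only the distinct schemas."""
    return {frozenset(record_schema(r)) for r in records}
```

This is also where the sampling and incremental strategies mentioned above would plug in, by restricting the scanned record set.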
5.2 Scalability under data variety
5.3 Efficiency evaluation
-
The group-by set is either absent (i.e., only a simple projection is carried out, without aggregation), weak (i.e., it involves features with high cardinality, resulting in several groups), or strong (i.e., it involves features with low cardinality, resulting in few groups). This parameter affects the cardinality of the results, which (on average) is below \(10^5\) when the group-by set is absent, \(10^4\) when it is weak, and \(10^2\) when it is strong.
-
The selection predicate is either absent, weak (i.e., its selectivity is low), or strong (i.e., its selectivity is high). This parameter affects the number of records involved in the queries, which is between \(80\%\) and \(40\%\) in weak selections, and between \(5\%\) and \(0.01\%\) in strong selections.
-
We devise 6 different query graphs, varying the number of entities involved in the query (i.e., \(|{\mathcal {E}}_q|\)) from 1 to all 5 of them.
Execution times (s ± RSD):

GB set | SF 1 | SF 10 | SF 100 | SF 1000
---|---|---|---|---
Absent | \(1.2 \pm 42\%\) | \(4.0 \pm 45\%\) | \(18.0 \pm 38\%\) | \(559.2 \pm 79\%\) |
Weak | \(1.5 \pm 40\%\) | \(5.0 \pm 52\%\) | \(28.4 \pm 54\%\) | \(603.7 \pm 90\%\) |
Strong | \(1.5 \pm 40\%\) | \(4.6 \pm 48\%\) | \(26.1 \pm 54\%\) | \(593.8 \pm 84\%\) |
Execution times (s ± RSD):

Selection predicate | SF 1 | SF 10 | SF 100 | SF 1000
---|---|---|---|---
Absent | \(1.5 \pm 40\%\) | \(5.0 \pm 50\%\) | \(31.0 \pm 55\%\) | \(588.1 \pm 88\%\) |
Weak | \(1.6 \pm 38\%\) | \(5.1 \pm 51\%\) | \(29.0 \pm 53\%\) | \(562.7 \pm 87\%\) |
Strong | \(1.3 \pm 38\%\) | \(4.2 \pm 45\%\) | \(20.6 \pm 43\%\) | \(569.2 \pm 82\%\) |
Execution times (s ± RSD):

\(|{\mathcal {E}}_q|\) | SF 1 | SF 10 | SF 100 | SF 1000
---|---|---|---|---
1 | \(0.4 \pm 0\%\) | \(0.7 \pm 14\%\) | \(5.1 \pm 12\%\) | \(18.2 \pm 14\%\) |
2 | \(1.2 \pm 8\%\) | \(3.5 \pm 9\%\) | \(18.2 \pm 15\%\) | \(155.2 \pm 12\%\) |
3 | \(1.5 \pm 13\%\) | \(5.0 \pm 8\%\) | \(27.5 \pm 26\%\) | \(615.2 \pm 24\%\) |
4 | \(1.9 \pm 11\%\) | \(6.1 \pm 13\%\) | \(35.3 \pm 29\%\) | \(1072.1 \pm 17\%\) |
5 | \(2.2 \pm 14\%\) | \(7.8 \pm 19\%\) | \(42.2 \pm 30\%\) | \(1114.4 \pm 16\%\) |
Execution time increase (% ± SD):

Optimizations turned off | SF 1 | SF 10 | SF 100 | SF 1000
---|---|---|---|---
SPG | \(140 \pm 30\) | \(200 \pm 100\) | \(290 \pm 170\) | \(350 \pm 200\)
MSR | \(27 \pm 23\) | \(40 \pm 35\) | \(16 \pm 8\) | \(2 \pm 1\)
CP | \(2.4 \pm 2\) | \(6.1 \pm 5\) | \(3.0 \pm 2\) | \(4.2 \pm 3\)
5.4 Effectiveness evaluation
-
Query density as the percentage of non-null cells in the query results.
-
Query coverage as the percentage of records considered by the query with respect to \({\mathcal {D}}^*\).
-
Aggregation veracity, i.e., whether the aggregation of partial results where mappings are missing is consistent with the results obtained in the ground-truth dataspace.
-
Selection support, i.e., whether the absence of mapping hinders the capability of applying selection predicates.
Dataspace | Removed mappings | Query density | Query coverage | Aggregation veracity | Selection support
---|---|---|---|---|---
\({\mathcal {D}}_1\) | \(a_{29} \not \equiv a_{30}\) | 70.3% | 100% | Yes | Partial
\({\mathcal {D}}_2\) | \(a_{36} \not \equiv a_{37}\) | 80.0% | 100% | No | Partial
\({\mathcal {D}}_3\) | \(a_{32} \not \equiv a_{33}\), \(a_{32} \not \equiv a_{34}\), \(a_{32} \not \equiv a_{35}\) | 91.1% | 64.3% | Yes | Partial
-
In case of projections, each feature returns a null value for every record in which the respective attribute is not defined (e.g., approximately \(50\%\) of null values are returned by both \(f^{'}_{9}\) and \(f^{''}_{9}\)). Notice that actual query densities in Table 8 are higher due to the projection of other features without null values.
-
In case of selection predicates, a disjunction of separate conditions would need to be manually formulated by the user on each feature (e.g., \(f^{'}_{9}< \text {``2020-01-01''} \vee f^{''}_{9} < \text {``2020-01-01''}\)); however, this would not be answerable, as we currently support only conjunctions of selection predicates.
\({\mathcal {D}}_2\) | | | | \({\mathcal {D}}^*\) |
---|---|---|---|---|---
\(f^{'}_{12}\) | \(\{ f_8, sum() \}\) | \(f^{''}_{12}\) | \(\{ f_8, sum() \}\) | \(f_{12}\) | \(\{ f_8, sum() \}\)
Faye | 201542.1 | Faye | 194213.3 | Faye | 366440.2 |
Baloch | 178805.2 | Baloch | 197430.7 | Baloch | 372510.7 |
Francois | 54354.3 | Francois | 54354.3 | ||
Alschitz | 11082.4 | Alschitz | 9030.1 | Alschitz | 20523.0 |
Guelleh | 67471.7 | Guelleh | 67471.7 | ||
Akongo | 186595.7 | Akongo | 186595.7 | ||
Nagy | 118644.1 | Nagy | 136006.7 | Nagy | 289375.9 |
-
A query involving features of either one of the two entities will return only a subset of the records; thus, the queries will mostly have full density, but the coverage will decrease significantly.
-
A query involving both features is not answerable, because they are not linked in the entity graph of \({\mathcal {D}}_3\).