Introduction
Related works
- Data locality,
- Iteration,
- Load balancing, and
- Data compression.
Methods
• Converting input data items to Kavosh format
<Rule-Key, Rule count, Target field count with the specified value>
• Rule extraction
Function | Description |
---|---|
NOC(TableName) | Returns the number of columns |
NOR(TableName) | Returns the number of rows |
NODV(TableName, ColumnName/ColumnID) | Returns the number of distinct values in the specified column |
NaOC(TableName, k) | Returns the name of the kth column in the specified table |
DVC(TableName, ColumnName/ColumnID, k) | Returns the kth distinct value of the specified column |
CT(ColumnsArray[], TableName) | Creates a table with an input column array with a specified table name |
RVT(TableName, ColumnName/ColumnID, RowID) | Returns the value of the specified table name, column name/column ID, and row ID |
EXEC(Query, ResultTable) | Executes a query, creates a results table and puts the results into the results table. The results table can be created in RAM or HDD, according to the node hardware specification. |
& | & is a logical operator (And) |
I2B(i) | Converts the integer value i to a binary value |
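To make the helper functions above concrete, the following is a minimal sketch of four of them (NOC, NOR, NODV, and I2B) over an in-memory table. The list-of-dictionaries `Table` type, the lowercase function names, and the optional `width` argument of `i2b` are assumptions made for illustration; in the method itself these functions operate on DBMS tables.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a DBMS table: one dict per row.
Table = List[Dict[str, str]]


def noc(table: Table) -> int:
    """NOC: number of columns (read from the first row; 0 for an empty table)."""
    return len(table[0]) if table else 0


def nor(table: Table) -> int:
    """NOR: number of rows."""
    return len(table)


def nodv(table: Table, column: str) -> int:
    """NODV: number of distinct values in the given column."""
    return len({row[column] for row in table})


def i2b(i: int, width: int = 0) -> str:
    """I2B: binary string of a non-negative integer, zero-padded to `width`."""
    return format(i, "b").zfill(width)


customers = [
    {"City": "Tehran", "Sex": "Male", "Income": "High"},
    {"City": "Tehran", "Sex": "Female", "Income": "Low"},
    {"City": "Yazd", "Sex": "Male", "Income": "High"},
]

print(noc(customers), nor(customers), nodv(customers, "City"))  # 3 3 2
print(i2b(7), i2b(3, width=3))  # 111 011
```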
City | Sex | Income | New service | Mapper |
---|---|---|---|---|
Tehran | Male | High | Yes | 1 |
Tehran | Female | Low | Yes | |
Yazd | Male | High | No | |
Yazd | Male | High | Yes | 2 |
Tehran | Female | Low | No | |
Yazd | Male | High | No | |
City = Tehran | Sex = Male | Income = High | New service = Yes | Mapper |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
1 | 0 | 0 | 1 | |
0 | 1 | 1 | 0 | |
0 | 1 | 1 | 1 | 2 |
1 | 0 | 0 | 0 | |
0 | 1 | 1 | 0 | |
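The conversion from the categorical table to the binary table can be sketched as follows. The `POSITIVE` mapping, which names the value encoded as 1 in each column, and the function name `binarize` are assumptions for illustration; the sketch also assumes that every column is two-valued, as in the example.

```python
from typing import Dict, List

# Value encoded as 1 in each column; any other value is encoded as 0.
# Assumption: every column is two-valued, as in the example table.
POSITIVE = {"City": "Tehran", "Sex": "Male", "Income": "High", "New service": "Yes"}
COLUMNS = list(POSITIVE)


def binarize(row: Dict[str, str]) -> List[int]:
    """Encode one categorical row as a list of 0/1 flags, in column order."""
    return [1 if row[col] == POSITIVE[col] else 0 for col in COLUMNS]


raw_rows = [
    ("Tehran", "Male", "High", "Yes"),
    ("Tehran", "Female", "Low", "Yes"),
    ("Yazd", "Male", "High", "No"),
    ("Yazd", "Male", "High", "Yes"),
    ("Tehran", "Female", "Low", "No"),
    ("Yazd", "Male", "High", "No"),
]

binary_rows = [binarize(dict(zip(COLUMNS, values))) for values in raw_rows]
for flags in binary_rows:
    print(flags)
```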
Decimal rule | Target field | Rule count | Rule sum | Mapper |
---|---|---|---|---|
7 | 1 | 1 | 1 | 1 |
4 | 1 | 1 | 1 | |
3 | 0 | 1 | 0 | |
3 | 0 | 1 | 0 | 2 |
4 | 1 | 1 | 1 | |
3 | 0 | 1 | 0 | |
Rule | Rule count | Rule sum |
---|---|---|
7 | 1 | 1 |
4 | 2 | 2 |
3 | 3 | 0 |
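The Reducer step amounts to summing the rule counts and rule sums per rule key. A minimal sketch, taking the Mapper records from the preceding table as input (the function name `reduce_records` is an assumption):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[int, int, int]  # (rule key, rule count, rule sum)


def reduce_records(records: Iterable[Record]) -> Dict[int, Tuple[int, int]]:
    """Sum rule count and rule sum for each rule key."""
    totals = defaultdict(lambda: [0, 0])
    for key, count, rule_sum in records:
        totals[key][0] += count
        totals[key][1] += rule_sum
    return {key: (count, rule_sum) for key, (count, rule_sum) in totals.items()}


# Mapper output as listed in the table above (both Mappers combined).
mapper_output = [(7, 1, 1), (4, 1, 1), (3, 1, 0), (3, 1, 0), (4, 1, 1), (3, 1, 0)]
print(reduce_records(mapper_output))  # {7: (1, 1), 4: (2, 2), 3: (3, 0)}
```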
Mapping field | Rule | MHB_Key (Binary) | MHB_Key (Decimal) | Rule count | Rule sum |
---|---|---|---|---|---|
111 | 111 | 111111 | 63 | 1 | 1 |
111 | 100 | 111100 | 60 | 2 | 2 |
111 | 011 | 111011 | 59 | 3 | 0 |
110 | 111 | 110110 | 54 | 1 | 1 |
110 | 100 | 110100 | 52 | 2 | 2 |
110 | 011 | 110010 | 50 | 3 | 0 |
101 | 111 | 101101 | 45 | 1 | 1 |
101 | 100 | 101100 | 44 | 2 | 2 |
101 | 011 | 101001 | 41 | 3 | 0 |
100 | 111 | 100100 | 36 | 1 | 1 |
100 | 100 | 100100 | 36 | 2 | 2 |
100 | 011 | 100000 | 32 | 3 | 0 |
011 | 111 | 011011 | 27 | 1 | 1 |
011 | 100 | 011000 | 24 | 2 | 2 |
011 | 011 | 011011 | 27 | 3 | 0 |
010 | 111 | 010010 | 18 | 1 | 1 |
010 | 100 | 010000 | 16 | 2 | 2 |
010 | 011 | 010010 | 18 | 3 | 0 |
001 | 111 | 001001 | 9 | 1 | 1 |
001 | 100 | 001000 | 8 | 2 | 2 |
001 | 011 | 001001 | 9 | 3 | 0 |
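The rows of this table are consistent with building each MHB_Key by concatenating the mapping field with the bitwise AND of the mapping field and the rule. The sketch below encodes that reading; the construction is inferred from the table values, and the names `mhb_key` and `expand` are assumptions.

```python
from typing import Dict, List, Tuple


def mhb_key(mapping_field: int, rule: int, n_bits: int) -> int:
    """Concatenate the mapping field with (mapping field & rule)."""
    return (mapping_field << n_bits) | (mapping_field & rule)


def expand(rules: Dict[int, Tuple[int, int]], n_bits: int) -> List[Tuple[int, int, int]]:
    """Pair every non-zero mapping field with every reduced rule."""
    rows = []
    for mapping_field in range(2 ** n_bits - 1, 0, -1):
        for rule, (count, rule_sum) in rules.items():
            rows.append((mhb_key(mapping_field, rule, n_bits), count, rule_sum))
    return rows


reduced = {7: (1, 1), 4: (2, 2), 3: (3, 0)}  # Reducer output: rule -> (count, sum)
rows = expand(reduced, n_bits=3)

print(len(rows))  # 21
print(rows[:3])   # [(63, 1, 1), (60, 2, 2), (59, 3, 0)]
print(format(mhb_key(0b110, 0b011, 3), "06b"))  # 110010
```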
MHB_Key (Binary) | MHB_Key (Decimal) | Rule count | Rule sum |
---|---|---|---|
111111 | 63 | 1 | 1 |
111100 | 60 | 2 | 2 |
111011 | 59 | 3 | 0 |
110110 | 54 | 1 | 1 |
110100 | 52 | 2 | 2 |
110010 | 50 | 3 | 0 |
101101 | 45 | 1 | 1 |
101100 | 44 | 2 | 2 |
101001 | 41 | 3 | 0 |
100100 | 36 | 3 | 3 |
100000 | 32 | 3 | 0 |
011011 | 27 | 4 | 1 |
011000 | 24 | 2 | 2 |
010010 | 18 | 4 | 1 |
010000 | 16 | 2 | 2 |
001001 | 9 | 4 | 1 |
001000 | 8 | 2 | 2 |
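Different (mapping field, rule) pairs can yield the same MHB_Key; for example, key 36 arises from rules 111 and 100 under mapping field 100. Merging such duplicates by summing their rule counts and rule sums gives the 17 rows above. A minimal sketch, assuming the key construction inferred from the previous table (mapping field concatenated with mapping field AND rule):

```python
from collections import defaultdict

N_BITS = 3
reduced = {7: (1, 1), 4: (2, 2), 3: (3, 0)}  # Reducer output: rule -> (count, sum)

merged = defaultdict(lambda: [0, 0])
for mapping_field in range(2 ** N_BITS - 1, 0, -1):
    for rule, (count, rule_sum) in reduced.items():
        # Assumed key layout: mapping field bits, then (mapping field & rule) bits.
        key = (mapping_field << N_BITS) | (mapping_field & rule)
        merged[key][0] += count
        merged[key][1] += rule_sum

# Print in the same layout as the table: binary key, decimal key, count, sum.
for key in sorted(merged, reverse=True):
    count, rule_sum = merged[key]
    print(f"{key:06b} | {key} | {count} | {rule_sum}")
```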
Rule | Support (%) | Confidence (one) (%) |
---|---|---|
63 | 2 | 100 |
60 | 5 | 100 |
59 | 7 | 0 |
54 | 2 | 100 |
52 | 5 | 100 |
50 | 7 | 0 |
45 | 2 | 100 |
44 | 5 | 100 |
41 | 7 | 0 |
36 | 7 | 100 |
32 | 7 | 0 |
27 | 10 | 20 |
24 | 5 | 100 |
18 | 10 | 20 |
16 | 5 | 100 |
9 | 10 | 20 |
8 | 5 | 100 |
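The sketch below shows one way to derive these two measures from the merged MHB_Key table. It assumes that support is a rule's count divided by the sum of all rule counts (42 in this example) and that confidence (one) is the rule sum divided by the rule count. Both definitions are inferred from the tables rather than stated in this section; they reproduce the support column and the 0% and 100% confidence entries. The function name is hypothetical.

```python
from typing import Dict, Tuple

# Merged MHB_Key table: key -> (rule count, rule sum).
merged = {
    63: (1, 1), 60: (2, 2), 59: (3, 0), 54: (1, 1), 52: (2, 2), 50: (3, 0),
    45: (1, 1), 44: (2, 2), 41: (3, 0), 36: (3, 3), 32: (3, 0), 27: (4, 1),
    24: (2, 2), 18: (4, 1), 16: (2, 2), 9: (4, 1), 8: (2, 2),
}


def support_and_confidence(
    table: Dict[int, Tuple[int, int]]
) -> Dict[int, Tuple[float, float]]:
    """Return key -> (support %, confidence-of-one %).

    Assumptions: support = count / total count; confidence = sum / count.
    """
    total = sum(count for count, _ in table.values())
    return {
        key: (100 * count / total, 100 * rule_sum / count)
        for key, (count, rule_sum) in table.items()
    }


stats = support_and_confidence(merged)
for key in (63, 60, 59, 36):
    support, confidence = stats[key]
    print(key, round(support), round(confidence))
```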
Rule | Support (%) | Confidence (one) (%) |
---|---|---|
59 | 7 | 0 |
50 | 7 | 0 |
41 | 7 | 0 |
36 | 7 | 100 |
32 | 7 | 0 |
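The final table can be obtained by filtering the previous one on minimum support and minimum confidence. The thresholds used for this small example are not stated here, so the values below (`MIN_SUPPORT = 6`, `MIN_CONFIDENCE = 90`) are hypothetical choices that happen to reproduce the five remaining rules. The sketch also assumes that a rule passes the confidence test when either its confidence for target value one or its complement (target value zero) reaches the threshold.

```python
from typing import Dict, List, Tuple

# Rule -> (support %, confidence-of-one %), copied from the preceding table.
rules: Dict[int, Tuple[int, int]] = {
    63: (2, 100), 60: (5, 100), 59: (7, 0), 54: (2, 100), 52: (5, 100),
    50: (7, 0), 45: (2, 100), 44: (5, 100), 41: (7, 0), 36: (7, 100),
    32: (7, 0), 27: (10, 20), 24: (5, 100), 18: (10, 20), 16: (5, 100),
    9: (10, 20), 8: (5, 100),
}

MIN_SUPPORT = 6      # hypothetical threshold, in percent
MIN_CONFIDENCE = 90  # hypothetical threshold, in percent


def keep(support: float, confidence_one: float) -> bool:
    """Keep a rule that is frequent enough and decisive for target 1 or 0."""
    confidence_zero = 100 - confidence_one
    return support >= MIN_SUPPORT and max(confidence_one, confidence_zero) >= MIN_CONFIDENCE


final_rules: List[int] = [key for key, (s, c) in rules.items() if keep(s, c)]
print(final_rules)  # [59, 50, 41, 36, 32]
```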
- The input data are converted to a binominal format.
- The converted data are distributed among the Mappers.
- The results of the Mappers are sent to the Reducer layer.
- Mapping fields are added to the results of the Reducer layer, and MHB_Keys are generated.
- The Reducer results are sent to the Rule generation layer.
- Confidence and support parameters are applied to the extracted rules to create the final results.
Evaluation
TPC-DS
Node type | DBMS |
---|---|
Mapper | PostgreSQL |
Reducer | Redis |
Rule generation | Redis |
Table name | Number of rows |
---|---|
Items | 502,000 |
web_sales | 71,999,670,164 |
Store_sales | 287,997,818,084 |
Customer_Demographics | 1,920,800 |
CPU | Intel Core i7-3770 Quad-Core Processor 3.4 GHz |
---|---|
HDD | 2 TB |
RAM | 64 GB |
CPU | Intel Core i7-3770 Quad-Core Processor 3.4 GHz |
---|---|
HDD | 500 GB |
RAM | 512 GB |
CPU | Intel Core i7-3770 Quad-Core Processor 3.4 GHz |
---|---|
HDD | 1.5 TB |
RAM | 150 GB |
Parameter name | Parameter value (%) |
---|---|
Confidence | 90 |
Support | 0.05 |
Execution time
Method name | Map (s) | Reduce (s) | Rule generation (s) |
---|---|---|---|
FiDoop | 1265 | 1194 | 0 |
Sequence-Growth | 3500 | 2890 | 0 |
Kavosh | 288 | 265 | 1159 |
Compression
Method name | Table data volume used (TB) |
---|---|
FiDoop | 39.9 |
Sequence-Growth | 70 |
Kavosh | 70 |
Load balancing
Method | Balance_Factor (s) |
---|---|
FiDoop | 800 |
Sequence-Growth | 3230 |
Kavosh | 15 |
Real traffic data of a mobile operator
MobileNumber | ServiceID |
---|---|
09121450111 | 1041 |
09121450111 | 58 |
09121450111 | 971 |
09121450111 | 119 |
09123895004 | 971 |
09191005069 | 113 |
… | … |
Parameter name | Parameter value (%) |
---|---|
Confidence | 85 |
Support | 0.1 |
Execution time
Method name | Map (s) | Reduce (s) | Rule generation (s) |
---|---|---|---|
FiDoop | 206 | 199 | 0 |
Sequence-Growth | 779 | 750 | 0 |
Kavosh | 75 | 71 | 119 |
Compression
Method name | Table data volume used (TB) |
---|---|
FiDoop | 19 |
Sequence-Growth | 40 |
Kavosh | 40 |
Load balancing
Method | Balance_Factor (s) |
---|---|
FiDoop | 683 |
Sequence-Growth | 2800 |
Kavosh | 17 |