Introduction
- Construction of the support social network starting from the training set.
- Construction of the spectrum of each user from the data about her behavior stored in the dataset and from the social network built in the previous step.
- Selection of the classes of interest.
- Construction of the spectrum of each class from the spectra of the corresponding users.
- Definition of a new version of the Eros distance tailored to our scenario.
- For each new user:
  - Computation of the Eros distance between her spectrum and that of each class.
  - Assignment of the user to the nearest class (or to no class, if her spectrum is very far from those of all classes) based on the Eros distances computed in the previous step.
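The last two steps amount to a nearest-class rule with a rejection option. The Python sketch below is illustrative only: `assign_class` and `max_distance` are hypothetical names, the threshold that makes a spectrum "very far" is not specified here, and `distance` is a placeholder for the Eros distance.

```python
import math


def assign_class(user_spectrum, class_spectra, distance, max_distance):
    """Return the name of the class whose spectrum is nearest to `user_spectrum`,
    or None when even the nearest class is farther than `max_distance`.

    `distance` stands in for the (modified) Eros distance; `max_distance` is a
    hypothetical rejection threshold for "very far" spectra.
    """
    best_class, best_dist = None, math.inf
    for name, spectrum in class_spectra.items():
        d = distance(user_spectrum, spectrum)
        if d < best_dist:
            best_class, best_dist = name, d
    return best_class if best_dist <= max_distance else None


# Toy usage: scalar "spectra" and absolute difference as a placeholder distance.
toy_classes = {"Exchange": 10.0, "Bancor": 2.0}
print(assign_class(3.0, toy_classes, lambda a, b: abs(a - b), max_distance=5.0))    # Bancor
print(assign_class(100.0, toy_classes, lambda a, b: abs(a - b), max_distance=5.0))  # None
```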
-
Related literature
Proposed method
Modeling a blockchain as a social network
- In-degree: it represents the number of arcs incoming to \(n_i\) and, therefore, the number of nodes of \({{{\mathcal {G}}}}\) pointing to \(n_i\). It can be determined by computing the cardinality of the set:
  $$IN_i = \{ n_j \mid (n_j, n_i, TrS_{ji}) \in A \}$$
- Out-degree: it denotes the number of arcs outgoing from \(n_i\) and, therefore, the number of nodes of \({{{\mathcal {G}}}}\) which \(n_i\) points to. It can be determined by computing the cardinality of the set:
  $$OUT_i = \{ n_j \mid (n_i, n_j, TrS_{ij}) \in A \}$$
- In-transaction: it indicates the number of transactions towards \(n_i\) made by the nodes of \({{{\mathcal {G}}}}\). It can be computed as:
  $$\sum _{n_j \in IN_i} |TrS_{ji}|$$
  where \(|TrS_{ji}|\) denotes the cardinality of the set \(TrS_{ji}\).
- Out-transaction: it represents the number of transactions towards the nodes of \({{{\mathcal {G}}}}\) made by \(n_i\). It can be computed as:
  $$\sum _{n_j \in OUT_i} |TrS_{ij}|$$
- In-value: it denotes the total amount of Wei received by \(n_i\). It can be computed as:
  $$\sum _{n_j \in IN_i} \sum _{k = 1}^{|TrS_{ji}|} v_{ji_k}$$
- Out-value: it indicates the total amount of Wei sent by \(n_i\). It can be computed as:
  $$\sum _{n_j \in OUT_i} \sum _{k = 1}^{|TrS_{ij}|} v_{ij_k}$$
- Clustering-coefficient: it represents the clustering coefficient of \(n_i\). Recall that, in Social Network Analysis, this parameter is an indicator of the tendency of \(n_i\) and its neighbors to form a cluster.
- PageRank: it denotes the PageRank of \(n_i\). This parameter is an indicator of the number of links received by \(n_i\), the centrality of the neighbors of \(n_i\) and their propensity to link to each other [49].
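The set-based definitions above translate directly into an aggregation over the transaction list. The following Python sketch is illustrative only: it assumes transactions are given as (from_address, to_address, value) tuples, groups them into one arc per ordered address pair (mirroring \(TrS_{ij}\)), and adds a plain power-iteration PageRank on the unweighted arcs with an assumed damping factor of 0.85, since whether arcs are weighted by their number of transactions is not stated here. The names `build_arcs`, `node_features` and `pagerank` are hypothetical, and the clustering coefficient is left out.

```python
from collections import defaultdict

FEATURES = ("In-degree", "Out-degree", "In-transaction",
            "Out-transaction", "In-value", "Out-value")


def build_arcs(transactions):
    """Group (from_address, to_address, value) tuples into arcs
    {(n_i, n_j): [values]}, so that each list plays the role of TrS_ij."""
    arcs = defaultdict(list)
    for src, dst, value in transactions:
        arcs[(src, dst)].append(value)
    return dict(arcs)


def node_features(arcs):
    """Degree, transaction and value features of every node, following the
    definitions of IN_i, OUT_i and the sums over TrS_ji / TrS_ij."""
    feats = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for (src, dst), values in arcs.items():
        feats[dst]["In-degree"] += 1          # src belongs to IN_dst
        feats[src]["Out-degree"] += 1         # dst belongs to OUT_src
        feats[dst]["In-transaction"] += len(values)
        feats[src]["Out-transaction"] += len(values)
        feats[dst]["In-value"] += sum(values)
        feats[src]["Out-value"] += sum(values)
    return dict(feats)


def pagerank(arcs, damping=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank on the unweighted directed graph of the arcs;
    the rank of nodes without successors is spread uniformly over all nodes."""
    succ = defaultdict(set)
    for src, dst in arcs:
        succ[src].add(dst)
    nodes = {v for arc in arcs for v in arc}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not succ.get(v))
        new = dict.fromkeys(nodes, (1.0 - damping) / n + damping * dangling / n)
        for v, targets in succ.items():
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


arcs = build_arcs([("A", "B", 10), ("A", "B", 5), ("C", "B", 1), ("B", "A", 2)])
print(node_features(arcs)["B"])
# {'In-degree': 2, 'Out-degree': 1, 'In-transaction': 3, 'Out-transaction': 1, 'In-value': 16, 'Out-value': 2}
print(pagerank(arcs))
```

In this toy graph, B receives links from both A and C, so it ends up with the highest PageRank (about 0.486), followed by A (about 0.464) and C (0.05).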
Defining the spectrum of a user or a class of users
- Day: its \(h^{th}\) element indicates the date corresponding to \(d_h\).
- In-degree: its \(h^{th}\) element denotes the number of addresses from which \(n_i\) received transactions during the time interval \(\tau _h\) between 12:00 am of \(d_{h-1}\) and 12:00 am of \(d_h\).
- Out-degree: its \(h^{th}\) element indicates the number of addresses to which \(n_i\) made transactions during \(\tau _h\).
- In-transaction: its \(h^{th}\) element denotes the number of transactions received by \(n_i\) during \(\tau _h\).
- Out-transaction: its \(h^{th}\) element indicates the number of transactions made by \(n_i\) during \(\tau _h\).
- In-value: its \(h^{th}\) element denotes the amount of Wei received by \(n_i\) during \(\tau _h\).
- Out-value: its \(h^{th}\) element indicates the amount of Wei sent by \(n_i\) during \(\tau _h\).
- Clustering-coefficient: its \(h^{th}\) element denotes the clustering coefficient of \(n_i\) in the social network \({{{\mathcal {G}}}}(d_{h-1}, d_h)\).
- PageRank: its \(h^{th}\) element indicates the PageRank of \(n_i\) in \({{{\mathcal {G}}}}(d_{h-1}, d_h)\).
Defining the new version of the Eros distance
- The set Cl of the classes of interest; in our case, this set consists of the classes “Token Contract”, “Exchange”, “Bancor” and “Uniswap”.
- The set \({{{\mathcal {S}}}}_{train}\) of the spectra of the training addresses; the element \({{{\mathcal {S}}}}_{train}^i\) represents the set of spectra of the training addresses assigned to the class \(Cl_i\).
- The parameter step, a decimal number in the range [0, 1]. As we will see below, it controls the tradeoff between the accuracy and the computation time of our heuristic: the smaller the step, the more accurate the output of the heuristic, but the longer its computation time.
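For reference, the original Eros (Extended Frobenius norm) distance, which the new version adapts, compares the principal components of two multivariate time series: with \(a_i\) and \(b_i\) the eigenvectors of their covariance matrices sorted by decreasing eigenvalue, \(D(A, B, w) = \sqrt{2 - 2 \sum_i w_i |\langle a_i, b_i \rangle|}\). The NumPy sketch below implements only this original form. The function names are hypothetical, averaging the eigenvalues over the training series is an assumed choice of aggregation, and the sketch does not reproduce how the per-feature weights reported later for each class enter the modified distance.

```python
import numpy as np


def principal_components(series):
    """Eigenvalues and eigenvectors of the covariance matrix of a multivariate time
    series (rows = days, columns = spectrum features), by decreasing eigenvalue."""
    cov = np.cov(series, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]   # column i is the i-th principal component


def eros_weights(training_series):
    """Original-Eros-style weights: per-component eigenvalues aggregated over the
    training series (mean, an assumed choice) and normalized to sum to 1."""
    eigs = np.array([principal_components(s)[0] for s in training_series])
    mean = np.clip(eigs, 0.0, None).mean(axis=0)   # clip tiny negative round-off
    return mean / mean.sum()


def eros_distance(a, b, weights):
    """sqrt(2 - 2 * sum_i w_i * |<a_i, b_i>|), with a_i, b_i the i-th eigenvectors."""
    _, va = principal_components(a)
    _, vb = principal_components(b)
    cosines = np.abs(np.sum(va * vb, axis=0))   # |cos| between matching components
    similarity = float(np.dot(np.asarray(weights, dtype=float), cosines))
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * similarity)))


rng = np.random.default_rng(0)
x = rng.normal(size=(30, 8))   # synthetic 30-day, 8-feature spectra
y = rng.normal(size=(30, 8))
w = eros_weights([x, y])
print(eros_distance(x, x, w), eros_distance(x, y, w))
```

Taking the absolute value of each inner product makes the distance insensitive to the arbitrary sign of eigenvectors, and the distance of a series from itself is 0.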
Classifying users based on their spectra
Experiments
Dataset
Parameter | Value |
---|---|
Number of transactions | 41,420,435 |
Total number of addresses | 5,553,645 |
Total number of from_addresses | 4,980,691 |
Total number of to_addresses | 4,471,985 |
Cardinality of the intersection between from_addresses and to_addresses | 3,899,031 |
Number of null from_address values | 1 |
Number of null to_address values | 2 |
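The address counts in this table are mutually consistent: by inclusion-exclusion, the number of distinct addresses equals the number of sender addresses plus the number of receiver addresses minus the addresses appearing in both roles. Writing F and T (labels introduced only for this check) for the sets of from_address and to_address values:

$$|F \cup T| = |F| + |T| - |F \cap T| = 4{,}980{,}691 + 4{,}471{,}985 - 3{,}899{,}031 = 5{,}553{,}645$$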
Each transaction is described by the following fields: (i) from_address, representing the blockchain address starting the transaction; (ii) to_address, denoting the blockchain address receiving the transaction; (iii) timestamp, indicating the transaction timestamp; (iv) value, representing the amount of Wei transferred during the transaction.
An example of user spectrum
The following table reports the spectrum corresponding to the address 0xf0ee6b27b759c9893ce4f094b49ad28fd15a23e4 and to the time interval T ranging from September \(1^{st}\), 2019 to September \(30^{th}\), 2019.
Day | In-degree | Out-degree | In-transaction | Out-transaction | In-value | Out-value | Clustering-coefficient | PageRank |
---|---|---|---|---|---|---|---|---|
2019-09-01 | 14 | 0 | 36 | 0 | 36 | 0 | 0.000020 | 0.021978 |
2019-09-02 | 11 | 0 | 24 | 0 | 24 | 0 | 0.000014 | 0.010526 |
2019-09-03 | 30 | 0 | 45 | 0 | 45 | 0 | 0.000019 | 0.003171 |
2019-09-04 | 21 | 0 | 36 | 0 | 36 | 0 | 0.000015 | 0.003025 |
2019-09-05 | 16 | 0 | 28 | 0 | 28 | 0 | 0.000013 | 0.002261 |
2019-09-06 | 22 | 0 | 46 | 0 | 46 | 0 | 0.000013 | 0.002272 |
2019-09-07 | 25 | 0 | 54 | 0 | 54 | 0 | 0.000014 | 0.002922 |
2019-09-08 | 18 | 0 | 46 | 0 | 46 | 0 | 0.000026 | 0.002871 |
2019-09-09 | 15 | 0 | 45 | 0 | 45 | 0 | 0.000026 | 0.002669 |
2019-09-10 | 22 | 0 | 63 | 0 | 63 | 0 | 0.000028 | 0.002312 |
2019-09-11 | 24 | 0 | 78 | 0 | 78 | 0 | 0.000031 | 0.002150 |
2019-09-12 | 25 | 0 | 85 | 0 | 85 | 0 | 0.000031 | 0.002070 |
2019-09-13 | 18 | 0 | 49 | 0 | 49 | 0 | 0.000031 | 0.002020 |
2019-09-14 | 8 | 0 | 22 | 0 | 22 | 0 | 0.000030 | 0.001925 |
2019-09-15 | 10 | 0 | 12 | 0 | 12 | 0 | 0.000029 | 0.001733 |
2019-09-16 | 24 | 0 | 34 | 0 | 34 | 0 | 0.000031 | 0.001689 |
2019-09-17 | 12 | 0 | 18 | 0 | 18 | 0 | 0.000030 | 0.001578 |
2019-09-18 | 24 | 0 | 34 | 0 | 34 | 0 | 0.000031 | 0.001543 |
2019-09-19 | 13 | 0 | 16 | 0 | 16 | 0 | 0.000031 | 0.001587 |
2019-09-20 | 24 | 0 | 35 | 0 | 35 | 0 | 0.000031 | 0.001542 |
2019-09-21 | 23 | 0 | 29 | 0 | 29 | 0 | 0.000031 | 0.001501 |
2019-09-22 | 12 | 0 | 20 | 0 | 20 | 0 | 0.000032 | 0.001494 |
2019-09-23 | 15 | 0 | 29 | 0 | 29 | 0 | 0.000032 | 0.001462 |
2019-09-24 | 19 | 0 | 43 | 0 | 43 | 0 | 0.000031 | 0.001436 |
2019-09-25 | 28 | 0 | 55 | 0 | 55 | 0 | 0.000032 | 0.001481 |
2019-09-26 | 20 | 0 | 31 | 0 | 31 | 0 | 0.000031 | 0.001436 |
2019-09-27 | 15 | 0 | 33 | 0 | 33 | 0 | 0.000031 | 0.001440 |
2019-09-28 | 17 | 0 | 29 | 0 | 29 | 0 | 0.000032 | 0.001339 |
2019-09-29 | 27 | 0 | 57 | 0 | 57 | 0 | 0.000033 | 0.001308 |
2019-09-30 | 19 | 0 | 27 | 0 | 27 | 0 | 0.000033 | 0.001308 |
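The transaction-based columns of such a table can be derived from the raw transaction records. The sketch below is a hypothetical implementation for a single address: it assumes Unix timestamps and day boundaries at 00:00 UTC, and it follows the definition of \(\tau _h\) literally, so the row of \(d_h\) aggregates the 24 hours ending at 12:00 am of \(d_h\) (removing the one-day shift labels rows by calendar day instead). Clustering-coefficient and PageRank are omitted, because they require building the daily network \({{{\mathcal {G}}}}(d_{h-1}, d_h)\).

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone


def daily_spectrum(transactions, address, first_day, last_day):
    """Rows (Day, In-degree, Out-degree, In-transaction, Out-transaction,
    In-value, Out-value) for `address`, one per day from first_day to last_day.

    `transactions` holds (from_address, to_address, timestamp, value) tuples with
    Unix timestamps; days are cut at 00:00 UTC (an assumption). Following the
    definition of tau_h, the row of d_h covers 00:00 of d_{h-1} to 00:00 of d_h.
    """
    per_day = defaultdict(lambda: {"senders": set(), "receivers": set(),
                                   "in_tx": 0, "out_tx": 0, "in_val": 0, "out_val": 0})
    for src, dst, ts, value in transactions:
        d_h = datetime.fromtimestamp(ts, tz=timezone.utc).date() + timedelta(days=1)
        if dst == address:
            acc = per_day[d_h]
            acc["senders"].add(src)
            acc["in_tx"] += 1
            acc["in_val"] += value
        if src == address:
            acc = per_day[d_h]
            acc["receivers"].add(dst)
            acc["out_tx"] += 1
            acc["out_val"] += value
    rows, day = [], first_day
    while day <= last_day:
        acc = per_day.get(day)   # .get does not create empty entries
        if acc is None:
            rows.append((day.isoformat(), 0, 0, 0, 0, 0, 0))
        else:
            rows.append((day.isoformat(), len(acc["senders"]), len(acc["receivers"]),
                         acc["in_tx"], acc["out_tx"], acc["in_val"], acc["out_val"]))
        day += timedelta(days=1)
    return rows


t0 = datetime(2019, 9, 1, 10, tzinfo=timezone.utc).timestamp()
txs = [("A", "X", t0, 5), ("B", "X", t0 + 60, 7), ("X", "C", t0, 3),
       ("A", "X", t0 + 86400, 1)]
for row in daily_spectrum(txs, "X", date(2019, 9, 2), date(2019, 9, 3)):
    print(row)
# ('2019-09-02', 2, 1, 2, 1, 12, 3)
# ('2019-09-03', 1, 0, 1, 0, 1, 0)
```

Days without activity produce all-zero rows, so every user spectrum covers the same days and can be compared day by day.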
Defining the classes of interest
- The “Token Contract” class includes addresses using tokens instead of Ether. Tokens are an alternative currency to Ether, used to speed up and simplify processes.
- The “Exchange” class includes addresses acting as money changers; these allow clients to buy and sell cryptocurrencies.
- The “Bancor” class includes addresses acting as banks. A bancor allows clients to deposit and convert each available token in the network, without counterparts, automatically at a given price, using a simple web wallet.
- The “Uniswap” class includes addresses using the “Uniswap” protocol for the automatic exchange of tokens in Ethereum.
Class | Number of addresses |
---|---|
Token Contract | 1866 |
Exchange | 1021 |
Bancor | 666 |
Uniswap | 577 |
Defining class spectra
- In-transaction is perfectly correlated with In-value, while Out-transaction is perfectly correlated with Out-value. There are also other strong correlations; for instance, In-degree is strongly correlated with In-transaction and In-value, while Out-degree is strongly correlated with Out-transaction and Out-value.
- In principle, we could remove from the spectrum one feature between In-transaction and In-value and one feature between Out-transaction and Out-value. We decided not to do so because this result refers to a specific time interval. It plausibly holds for other time intervals as well but, since this cannot be formally proven, we preferred to keep all features. As a consequence, some spectrum features will show perfectly coincident trends in the following.
- There are strong correlations between several spectrum features. Consequently, these features cannot be considered independent of each other, and the spectrum of an address in a time interval must be analyzed as a multivariate time series.
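The correlations mentioned above can be quantified with a correlation matrix over the spectrum features. Assuming the Pearson coefficient (the type of correlation is not stated here), a minimal pure-Python sketch follows, applied to the first five days of the example spectrum shown earlier; `pearson` and `correlation_matrix` are hypothetical names. A constant feature, such as an always-zero Out-degree, yields an undefined (NaN) coefficient.

```python
from math import isnan, nan, sqrt


def pearson(x, y):
    """Pearson correlation of two equally long series; NaN if either is constant
    (e.g., an Out-degree that is always 0), since the coefficient is undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def correlation_matrix(spectrum):
    """{(feature_a, feature_b): r} for every pair of spectrum features."""
    return {(a, b): pearson(spectrum[a], spectrum[b]) for a in spectrum for b in spectrum}


# First five days (2019-09-01 to 2019-09-05) of the example spectrum.
example = {
    "In-degree": [14, 11, 30, 21, 16],
    "Out-degree": [0, 0, 0, 0, 0],
    "In-transaction": [36, 24, 45, 36, 28],
    "In-value": [36, 24, 45, 36, 28],
}
corr = correlation_matrix(example)
print(round(corr[("In-transaction", "In-value")], 6))   # 1.0
print(round(corr[("In-degree", "In-transaction")], 2))  # 0.88
print(isnan(corr[("Out-degree", "In-value")]))          # True
```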
Spectrum of the class “Token Contract”
Feature | Minimum Value | Maximum Value | Mean Value | Standard Deviation |
---|---|---|---|---|
In-degree | 4.65 | 91.40 | 20.52 | 18.60 |
Out-degree | 0 | 0 | 0 | 0 |
In-transaction | 10.80 | 354.44 | 59.24 | 70.76 |
Out-transaction | 0 | 0 | 0 | 0 |
In-value | 10.81 | 314.44 | 59.24 | 70.76 |
Out-value | 0 | 0 | 0 | 0 |
Clustering-coefficient | \(5.80 \cdot 10^{-4}\) | \(2.90 \cdot 10^{-2}\) | \(8.40 \cdot 10^{-3}\) | \(7.30 \cdot 10^{-3}\) |
PageRank | \(1.61 \cdot 10^{-5}\) | \(9.41 \cdot 10^{-5}\) | \(5.97 \cdot 10^{-5}\) | \(2.24 \cdot 10^{-5}\) |
In-transaction and In-degree, on the one hand, and Out-transaction, Out-degree and Out-value, on the other hand. In addition, there are strong similarities between the trend of In-degree, on the one hand, and those of In-transaction and In-value, on the other hand. To quantify these similarities, we computed the correlation matrix of the spectrum features of this class, shown in Figure 5. This figure also reveals another interesting correlation, i.e., a strong inverse correlation between Clustering-coefficient and PageRank.
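The class statistics in the table above take fractional values, which is consistent with a class spectrum that aggregates the spectra of its users day by day. Assuming a plain day-by-day mean (the aggregation is not spelled out in this section) and a population standard deviation, the following sketch builds a class spectrum and its per-feature minimum, maximum, mean and standard deviation; `class_spectrum` and `summarize` are hypothetical names.

```python
from statistics import fmean, pstdev

FEATURES = ("In-degree", "Out-degree", "In-transaction", "Out-transaction",
            "In-value", "Out-value", "Clustering-coefficient", "PageRank")


def class_spectrum(user_spectra):
    """Day-by-day mean of the spectra of the users of a class. Each spectrum maps
    a feature name to its list of daily values, all aligned on the same days."""
    return {f: [fmean(day) for day in zip(*(s[f] for s in user_spectra))]
            for f in FEATURES}


def summarize(spectrum):
    """(minimum, maximum, mean, standard deviation) of each feature over the days,
    as in the per-class tables; population standard deviation is assumed."""
    return {f: (min(v), max(v), fmean(v), pstdev(v)) for f, v in spectrum.items()}


# Two hypothetical users observed over two days; all other features are zero.
u1 = {f: [0.0, 0.0] for f in FEATURES}
u2 = {f: [0.0, 0.0] for f in FEATURES}
u1["In-degree"], u2["In-degree"] = [2, 4], [4, 8]
stats = summarize(class_spectrum([u1, u2]))
print(stats["In-degree"])   # (3.0, 6.0, 4.5, 1.5)
print(stats["Out-degree"])  # (0.0, 0.0, 0.0, 0.0)
```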
Spectrum of the class “Exchange”
Feature | Minimum Value | Maximum Value | Mean Value | Standard Deviation |
---|---|---|---|---|
In-degree | 73.00 | 322.05 | 145.22 | 96.60 |
Out-degree | 21.40 | 190.13 | 83.78 | 55.43 |
In-transaction | 84.56 | 387.67 | 173.85 | 81.61 |
Out-transaction | 76.37 | 417.83 | 185.35 | 93.10 |
In-value | 84.56 | 387.67 | 173.85 | 81.61 |
Out-value | 76.37 | 417.83 | 185.33 | 93.10 |
Clustering-coefficient | \(5.26 \cdot 10^{-4}\) | \(1.99 \cdot 10^{-2}\) | \(4.99 \cdot 10^{-3}\) | \(5.02 \cdot 10^{-3}\) |
PageRank | \(2.76 \cdot 10^{-4}\) | \(5.68 \cdot 10^{-4}\) | \(4.43 \cdot 10^{-4}\) | \(8.00 \cdot 10^{-5}\) |
In-transaction, In-degree and In-value are identical. Similarly, the trends of Out-transaction and Out-value are identical, and both are strongly correlated with the trend of Out-degree.
In-degree, In-transaction and In-value, as well as between Out-transaction and Out-value. There is also a very high correlation, equal to 0.92, between Out-degree and Out-transaction and between Out-degree and Out-value. All these values confirm what we deduced above from direct observation of the trends in Figure 6.
Spectrum of the class “Bancor”
Feature | Minimum Value | Maximum Value | Mean Value | Standard Deviation |
---|---|---|---|---|
In-degree | 0.42 | 9.63 | 3.10 | 2.23 |
Out-degree | 0 | 0 | 0 | 0 |
In-transaction | 1.57 | 37.40 | 9.47 | 8.04 |
Out-transaction | 0 | 0 | 0 | 0 |
In-value | 1.57 | 37.47 | 9.47 | 8.04 |
Out-value | 0 | 0 | 0 | 0 |
Clustering-coefficient | \(1.87 \cdot 10^{-4}\) | \(4.27 \cdot 10^{-3}\) | \(1.32 \cdot 10^{-3}\) | \(1.01 \cdot 10^{-3}\) |
PageRank | \(8.99 \cdot 10^{-7}\) | \(3.57 \cdot 10^{-6}\) | \(1.49 \cdot 10^{-6}\) | \(6.21 \cdot 10^{-7}\) |
Out-transaction, Out-degree and Out-value are identical. The same holds for the trends of In-transaction and In-value, which, in turn, are strongly correlated with the trend of In-degree.
Clustering-coefficient and In-degree. It also reveals a strong correlation between In-transaction, In-value and In-degree, on the one hand, and Out-transaction, Out-value and Out-degree, on the other hand. This is typical of this class, whose addresses act as banks.
Spectrum of the class “Uniswap”
Feature | Minimum Value | Maximum Value | Mean Value | Standard Deviation |
---|---|---|---|---|
In-degree | 0.42 | 9.63 | 3.10 | 2.23 |
Out-degree | 0 | 0 | 0 | 0 |
In-transaction | 1.57 | 37.40 | 9.47 | 8.04 |
Out-transaction | 0 | 0 | 0 | 0 |
In-value | 1.57 | 37.47 | 9.47 | 8.04 |
Out-value | 0 | 0 | 0 | 0 |
Clustering-coefficient | \(1.87 \cdot 10^{-4}\) | \(4.27 \cdot 10^{-3}\) | \(1.32 \cdot 10^{-3}\) | \(1.01 \cdot 10^{-3}\) |
PageRank | \(8.99 \cdot 10^{-7}\) | \(3.57 \cdot 10^{-6}\) | \(1.49 \cdot 10^{-6}\) | \(6.21 \cdot 10^{-7}\) |
Out-transaction, Out-degree and Out-value are identical. The same conclusion applies to the trends of In-transaction and In-value. In addition, the trend of In-degree is strongly correlated with those of In-value and In-transaction.
Clustering-coefficient and PageRank, and a good correlation between PageRank and In-degree.
Weights of the Eros distance
Class | In-degree | Out-degree | In-transactions | Out-transactions | In-value | Out-value | PageRank | Clustering-coefficient |
---|---|---|---|---|---|---|---|---|
Token Contract | 0.18 | 0 | 0.26 | 0 | 0.26 | 0 | 0.14 | 0.16 |
Exchange | 0.13 | 0.15 | 0.13 | 0.15 | 0.13 | 0.15 | 0.10 | 0.06 |
Bancor | 0.27 | 0 | 0.27 | 0 | 0.27 | 0 | 0.10 | 0.09 |
Uniswap | 0.12 | 0 | 0.12 | 0 | 0.12 | 0 | 0.31 | 0.33 |
- For the class “Token Contract”, the most important features are In-transactions and In-value. An intermediate weight is assigned to In-degree, Clustering-coefficient and PageRank. Finally, Out-degree, Out-transactions and Out-value have no weight.
- For the class “Exchange”, all features have roughly similar weights.
- For the class “Bancor”, the most important features are In-transactions, In-value and In-degree. A fairly small weight is assigned to PageRank and Clustering-coefficient, while the other features have no weight.
- For the class “Uniswap”, the most important features are PageRank and Clustering-coefficient. A small to medium weight is assigned to In-degree, In-transactions and In-value, while the other features have no weight.
Evaluation
Class | Number of addresses |
---|---|
Token Contract | 1954 |
Exchange | 1052 |
Bancor | 684 |
Uniswap | 592 |
Evaluating our approach with the original Eros distance
True / Predicted | Token Contract | Exchange | Bancor | Uniswap |
---|---|---|---|---|
Token Contract | 1632 | 88 | 224 | 10 |
Exchange | 62 | 964 | 54 | 72 |
Bancor | 124 | 20 | 523 | 17 |
Uniswap | 18 | 70 | 20 | 484 |
Metric | Value |
---|---|
Accuracy | 0.75 |
Micro Average Precision | 0.74 |
Macro Average Precision | 0.62 |
Micro Average Recall | 0.74 |
Macro Average Recall | 0.63 |
Micro Average F1-Score | 0.76 |
Macro Average F1-Score | 0.76 |
Evaluating our approach with an exhaustive examination of all weight combinations for the Eros distance
Class | In-degree | Out-degree | In-transactions | Out-transactions | In-value | Out-value | PageRank | Clustering-coefficient |
---|---|---|---|---|---|---|---|---|
Token Contract | 0.15 | 0 | 0.30 | 0 | 0.30 | 0 | 0.12 | 0.13 |
Exchange | 0.12 | 0.16 | 0.12 | 0.16 | 0.12 | 0.16 | 0.11 | 0.09 |
Bancor | 0.30 | 0 | 0.30 | 0 | 0.30 | 0 | 0.06 | 0.04 |
Uniswap | 0.10 | 0 | 0.10 | 0 | 0.10 | 0 | 0.34 | 0.36 |
True / Predicted | Token Contract | Exchange | Bancor | Uniswap |
---|---|---|---|---|
Token Contract | 1896 | 18 | 32 | 8 |
Exchange | 21 | 984 | 24 | 23 |
Bancor | 36 | 15 | 621 | 12 |
Uniswap | 12 | 32 | 16 | 532 |
Metric | Value |
---|---|
Accuracy | 0.97 |
Micro Average Precision | 0.94 |
Macro Average Precision | 0.93 |
Micro Average Recall | 0.94 |
Macro Average Recall | 0.93 |
Micro Average F1-Score | 0.94 |
Macro Average F1-Score | 0.93 |
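Assuming that "all weight combinations" means every weight vector whose entries are multiples of step and sum to 1 (an assumption about how the grid is built), the number of candidates for m features is the stars-and-bars count \(\binom{1/step + m - 1}{m - 1}\). The sketch below enumerates such a grid and computes its size without enumerating it; `weight_grid` and `grid_size` are hypothetical names.

```python
from itertools import combinations
from math import comb, isclose


def weight_grid(num_features, step):
    """Yield every weight vector whose entries are multiples of `step` and sum to 1,
    enumerated with the stars-and-bars construction."""
    units = round(1 / step)
    if not isclose(units * step, 1.0):
        raise ValueError("1/step must be an integer")
    slots = units + num_features - 1
    for bars in combinations(range(slots), num_features - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)   # units between two consecutive bars
            prev = b
        counts.append(slots - prev - 1)   # units after the last bar
        yield tuple(c / units for c in counts)


def grid_size(num_features, step):
    """Number of vectors produced by weight_grid, without enumerating them."""
    units = round(1 / step)
    return comb(units + num_features - 1, num_features - 1)


print(list(weight_grid(2, 0.5)))   # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(grid_size(8, 0.1))           # 19448
print(grid_size(8, 0.01))          # 26075972546
```

With the eight spectrum features, step = 0.1 already yields 19,448 candidate weight vectors and step = 0.01 about \(2.6 \cdot 10^{10}\), each to be evaluated separately, which illustrates why a smaller step costs more computation time.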
Evaluating our approach with our version of the Eros distance
Class | In-degree | Out-degree | In-transactions | Out-transactions | In-value | Out-value | PageRank | Clustering-coefficient |
---|---|---|---|---|---|---|---|---|
Token Contract | 0.17 | 0 | 0.28 | 0 | 0.28 | 0 | 0.14 | 0.13 |
Exchange | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.12 | 0.10 |
Bancor | 0.29 | 0 | 0.29 | 0 | 0.20 | 0 | 0.08 | 0.05 |
Uniswap | 0.12 | 0 | 0.12 | 0 | 0.12 | 0 | 0.31 | 0.33 |
True / Predicted | Token Contract | Exchange | Bancor | Uniswap |
---|---|---|---|---|
Token Contract | 1838 | 44 | 54 | 18 |
Exchange | 33 | 956 | 31 | 33 |
Bancor | 42 | 18 | 608 | 16 |
Uniswap | 14 | 46 | 18 | 514 |
Metric | Value |
---|---|
Accuracy | 0.91 |
Micro Average Precision | 0.91 |
Macro Average Precision | 0.90 |
Micro Average Recall | 0.91 |
Macro Average Recall | 0.89 |
Micro Average F1-Score | 0.91 |
Macro Average F1-Score | 0.89 |
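The metrics in these tables can be recomputed from the confusion matrices. The sketch below assumes that rows are true classes and columns predicted classes (the row totals match the test-set class sizes) and takes macro F1 as the mean of per-class F1 scores, one of the common definitions; `classification_metrics` is a hypothetical name. With a single predicted label per address, micro-averaged precision, recall and F1 all coincide with accuracy.

```python
def classification_metrics(cm):
    """Accuracy and micro/macro precision, recall and F1 from a square confusion
    matrix; rows are assumed to be true classes and columns predicted classes."""
    k = len(cm)
    tp = [cm[i][i] for i in range(k)]
    true_totals = [sum(row) for row in cm]
    pred_totals = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    precision = [t / p if p else 0.0 for t, p in zip(tp, pred_totals)]
    recall = [t / n if n else 0.0 for t, n in zip(tp, true_totals)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]
    micro_p = sum(tp) / sum(pred_totals)
    micro_r = sum(tp) / sum(true_totals)
    return {
        "accuracy": sum(tp) / sum(true_totals),
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": 2 * micro_p * micro_r / (micro_p + micro_r),
        "macro_precision": sum(precision) / k,
        "macro_recall": sum(recall) / k,
        "macro_f1": sum(f1) / k,   # mean of per-class F1 (one common definition)
    }


# Confusion matrix obtained with our version of the Eros distance
# (order: Token Contract, Exchange, Bancor, Uniswap).
ours = [[1838, 44, 54, 18],
        [33, 956, 31, 33],
        [42, 18, 608, 16],
        [14, 46, 18, 514]]
m = classification_metrics(ours)
print(round(m["accuracy"], 2), round(m["macro_precision"], 2))  # 0.91 0.9
```

On the matrix above, the sketch gives an accuracy of about 0.914 and a macro-averaged precision of about 0.898.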
Computation time analysis
- The time required to build the training (resp., testing) network was 2,522 (resp., 2,734) seconds.
- The time needed to compute the spectra of the training (resp., testing) users was 9,234 (resp., 9,624) seconds. This is the largest component of the computation time, because computing the spectrum of a user requires the clustering coefficient of the corresponding network node, and this computation accounts for most of the time indicated above.
- The time required to compute the spectra of the training and testing classes from those of the corresponding users is negligible.
- The time required to classify the training (resp., testing) users using our version of the Eros distance was 1,242 (resp., 1,410) seconds.
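The cost of the clustering coefficient follows from its definition: for a node with k neighbors, each of the k(k-1)/2 neighbor pairs must be checked for a link. Which variant of the coefficient is used is not stated here; the sketch below assumes the standard local coefficient on the undirected version of the network, and `undirected_adjacency` and `local_clustering` are hypothetical names.

```python
from itertools import combinations


def undirected_adjacency(arcs):
    """Undirected neighbor sets from directed (src, dst) arcs, ignoring self-loops."""
    adj = {}
    for src, dst in arcs:
        if src != dst:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked.

    All k*(k-1)/2 neighbor pairs are checked, so a node of degree k costs O(k^2)
    lookups; nodes with many neighbors are the expensive ones."""
    neighbors = adj.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * linked / (k * (k - 1))


triangle = undirected_adjacency([("a", "b"), ("b", "c"), ("c", "a")])
star = undirected_adjacency([("h", "x"), ("h", "y"), ("h", "z")])
print(local_clustering(triangle, "a"), local_clustering(star, "h"))  # 1.0 0.0
```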