On an average day, the Ethereum mainnet will handle about 15 transactions per second. How are these transactions distributed? Are some accounts more likely to receive a transaction than others? If so, by how much?

Preferential attachment is the idea that the more connected a node is, the more likely it is to receive new connections. We see this often in the real world, where networks form a small number of hubs that account for most of the activity. This behavior tends to generate power-law probability distributions with the scale-free property.

Transactions can be viewed as a directed graph where an edge from node \(i\) to node \(j\) denotes a transaction sent from one account to another.

- Let \(k_i\) denote the in-degree of node \(i\) (the number of transactions sent to account \(i\))
- Let \(n_k\) denote the number of nodes with degree \(k\)

Note that we’re not interested in the structure of the network, only the frequency of in-degree. The degree distribution is then just the normalized in-degree

\[P(k_i = k) = P_k = \frac{n_k}{\sum_{k} n_k}\]

Here’s what the network looked like yesterday (logarithmic scale)

This is exactly what we’d expect in the presence of preferential attachment. Most accounts aren’t likely to receive many transactions. On the other hand, a few accounts are likely to receive most of the transactions.

The power law density function is \(p(k) = Ck^{-\alpha}\) with normalization factor \(C = \frac{\alpha - 1}{k_{\text{min}}^{1 - \alpha}}\). Our goal is to model the transactions by estimating \(\alpha\) and \(k_\text{min}\) appropriately.

We compute the scaling parameter \(\alpha\) by maximizing the log-likelihood of the node degrees given the model. The maximum likelihood estimate is:

\[\hat{\alpha} = 1 + n (\sum_{i = 1}^{n} \ln \frac{k_i}{k_{\text{min}}})^{-1}\]

With error estimate \(\sigma = \frac{\alpha - 1}{\sqrt{n}}\).

It isn’t exactly clear where the power-law behavior begins in our data. We’ll start with the minimum node degree (1) and calculate \(\hat{\alpha}\) for each value of \(x_\text{min}\), choosing the combination that minimizes the Kolmogorov-Smirnov statistic:

\[ D = \max_{x \geq x_{\text{min}}} | S(x) - P(x) | \]

where \(S(x)\) and \(P(x)\) are the observed and expected CDFs respectively.

The degree distribution here is best modeled with scaling parameter \(\hat{\alpha} = 2.29\) for values greater than \(x_{\text{min}} = 11\). A scaling parameter of \(2 < \alpha < 3\) indicates that the transaction network is scale-free.