Methodology description: construction of Bitcoin users graph and derivation of time series to regression model


Academic year: 2021



appendix to the article “Hedging Capabilities of Bitcoin for Central and East European Markets”

Jacek Mizerka, Agnieszka Stróżyńska-Szajek, Piotr Mizerka

1. Construction of the users graph

We organize the blockchain data into a transaction hypergraph. The vertices of this hypergraph are the Bitcoin addresses occurring in the blockchain, and the hyperedges correspond to transactions. Each hyperedge therefore carries the additional structure of a bipartite directed graph (Bondy, Murty 1976), $\tau = (A_{in}, A_{out}, E)$, where $A_{in}$ and $A_{out}$ are the sets of input and output addresses, respectively, and every edge in $E$ begins in $A_{in}$ and ends in $A_{out}$ (this structure follows naturally from the architecture of Bitcoin transactions; see (Antonopoulos 2017)). In addition, every edge records the amount of Bitcoin sent (in Satoshis) and the time at which the transaction took place. We call the edges in $E$ elementary transactions. We denote by $A$ and $H$, respectively, the set of all Bitcoin addresses and the set of all hyperedges (transactions) in the hypergraph.
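As an illustration only, one hyperedge of the transaction hypergraph could be represented as below. The class and field names, the sample addresses and the amounts are hypothetical and are not taken from the authors' software; the sketch also assumes that every input and output address occurs in at least one elementary transaction, so that both address sets can be read off the edges.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ElementaryTransaction:
    """One edge of the bipartite graph attached to a transaction."""
    sender: str     # input address (begin of the edge)
    receiver: str   # output address (end of the edge)
    satoshis: int   # amount of Bitcoin sent, in Satoshis
    timestamp: int  # time of the enclosing transaction (Unix time assumed)


@dataclass
class Transaction:
    """A hyperedge tau = (A_in, A_out, E); only E is stored explicitly."""
    edges: List[ElementaryTransaction]

    @property
    def inputs(self) -> Set[str]:
        # A_in: every address at which some edge begins
        return {e.sender for e in self.edges}

    @property
    def outputs(self) -> Set[str]:
        # A_out: every address at which some edge ends
        return {e.receiver for e in self.edges}


# Hypothetical transaction with two input addresses and one output address.
tau = Transaction([
    ElementaryTransaction("addr_a", "addr_c", 50_000, 1367193600),
    ElementaryTransaction("addr_b", "addr_c", 20_000, 1367193600),
])
```

Here `tau.inputs` plays the role of $A_{in}(\tau)$ and `tau.outputs` the role of $A_{out}(\tau)$.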

We derive the users graph $G = (V, E)$ from the transaction hypergraph. The vertices $V$ correspond to users. Following a widely accepted heuristic (Ron, Shamir 2013; Reid, Harrigan 2011), we assume that addresses appearing together among the inputs of one transaction belong to the same user. Hence two Bitcoin addresses $a$ and $a'$ belong to the same user if and only if there is a finite sequence of transactions $\tau_1, \dots, \tau_n$ and a sequence of Bitcoin addresses $a_1, \dots, a_{n+1}$ such that $a_1 = a$, $a_{n+1} = a'$ and $a_i, a_{i+1} \in A_{in}(\tau_i)$ for $i = 1, \dots, n$ (here $A_{in}(\tau)$ denotes the set of input addresses of the transaction $\tau$). This partitions $A$ into pairwise disjoint subsets whose union is $A$, each containing the addresses of a single user. We call these subsets clusters and identify them with the vertices $V$ of $G$. Every elementary transaction $t$ from any $\tau \in H$ defines an edge $e \in E$ of $G$ as follows: the begin and the end of $e$ are the vertices corresponding to the clusters containing the begin and the end of $t$, respectively, and the amount of Bitcoin and the timestamp are those of $t$.
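The clustering described above amounts to computing connected components of the "shares an input set" relation. A minimal sketch using a union-find structure follows; the function names and the tuple layout `(sender, receiver, satoshis, timestamp)` are assumptions made for this example, and an address that never occurs as an input is treated as its own one-element cluster.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, int, int]  # (sender, receiver, satoshis, timestamp)


def cluster_addresses(input_sets: Iterable[Set[str]]) -> Dict[str, str]:
    """Map every input address to a representative address of its cluster."""
    parent: Dict[str, str] = {}

    def find(a: str) -> str:
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for inputs in input_sets:
        members = sorted(inputs)
        for address in members:
            find(address)  # register the address, also for single-input transactions
        for other in members[1:]:
            root_a, root_b = find(members[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a  # merge: both are inputs of one transaction
    return {a: find(a) for a in list(parent)}


def user_edges(elementary: Iterable[Edge], user: Dict[str, str]) -> List[Edge]:
    """Replace the addresses of each elementary transaction by their clusters."""
    return [(user.get(s, s), user.get(r, r), amount, ts)
            for s, r, amount, ts in elementary]


# Hypothetical input sets of three transactions: a, b and c end up in one cluster.
user = cluster_addresses([{"a", "b"}, {"b", "c"}, {"d"}])
edges = user_edges([("c", "d", 10, 0), ("b", "x", 5, 1)], user)
```

In this toy run `a` and `c` are never inputs of the same transaction, yet they are merged through `b`, which is exactly the chain condition in the definition.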


We restrict the analysis to the subgraph of the whole users graph defined by users represented by at least 10 Bitcoin addresses. All data presented in the remainder of the paper concern this subgraph.

2. Obtaining values of variables from the users graph

We believe that some features of the users graph may help explain the Bitcoin rate of return. When examining the Bitcoin users graph, we took into account the number and the value of transactions made by the most active users of this cryptocurrency. Because of the large number of low-activity users, we restrict our attention to the subgraph $S$ of the users graph induced by users who were active for at least 1200 days (i.e. the time between their first and last observed transactions was at least 1200 days) and who took part in at least 200 elementary transactions. There are $n = 3967$ users satisfying these conditions. We call $S$ the long-term subgraph.
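The selection of long-term users can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: timestamps are taken to be Unix seconds, a user "takes part" in an elementary transaction as either sender or receiver, and the function names are hypothetical. The thresholds default to the values given in the text.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def long_term_users(edges, min_days=1200, min_transactions=200):
    """Return users active for >= min_days with >= min_transactions edges.

    `edges` is an iterable of (sender, receiver, amount, timestamp) tuples.
    """
    first, last, count = {}, {}, defaultdict(int)
    for sender, receiver, _amount, ts in edges:
        for u in {sender, receiver}:  # a self-transfer is counted once
            first[u] = min(first.get(u, ts), ts)
            last[u] = max(last.get(u, ts), ts)
            count[u] += 1
    return {u for u in count
            if last[u] - first[u] >= min_days * SECONDS_PER_DAY
            and count[u] >= min_transactions}


def induced_edges(edges, users):
    """Keep only the edges whose both endpoints are selected users."""
    return [e for e in edges if e[0] in users and e[1] in users]


# Toy data with small thresholds so that the effect is visible.
toy = [("u1", "u2", 10, 0),
       ("u1", "u3", 5, 3 * SECONDS_PER_DAY),
       ("u2", "u3", 1, 1 * SECONDS_PER_DAY)]
selected = long_term_users(toy, min_days=2, min_transactions=2)
subgraph = induced_edges(toy, selected)
```

In the toy data `u2` is dropped because its activity spans only one day, so only the edge between `u1` and `u3` survives in the induced subgraph.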

Using $S$, we build the underlying graph $G = (V, E)$ for the base graphs (principal components) as follows: $V = \{u_1, \dots, u_n\}$ is the set of the 3967 users from $S$, and $E = \{e_1, \dots, e_L\}$ is the set of all directed edges $u_i \to u_j$ such that there was at least one elementary transaction from $u_i$ to $u_j$ (there are $L = 74175$ such edges). Note that even if several elementary transactions were made from $u_i$ to $u_j$, we create only one edge $u_i \to u_j$ in $G$. The snapshot series starts on 2013-04-29. We split $S$ into weekly snapshots $S_1, \dots, S_T$ ($T = 249$ is the number of weeks between 2013-04-29 and 2018-01-29) according to the parameter under consideration (transaction value or transaction number): $S_t$ is a weighted graph whose set of edges is $E$, and the weight of an edge $u_i \to u_j$ from $E$ is the amount of the given parameter from $u_i$ to $u_j$ in week $t$. From these snapshots we form a $T \times L$ time-series matrix $X$ whose rows correspond to the snapshots $S_t$:

\[
X = \begin{pmatrix}
x_{1,1} & \cdots & x_{1,L} \\
\vdots & \ddots & \vdots \\
x_{T,1} & \cdots & x_{T,L}
\end{pmatrix}
\]

where $x_{t,l}$ is the value of the chosen parameter on the edge $e_l$ in week $t$, and $l = 1, \dots, 74175$.
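The assembly of the matrix can be sketched as below. The tuple layout, the function name and the `parameter` switch are assumptions made for this example; weeks are indexed from 0 in the code, whereas the text indexes them from 1, and weekly weights are taken to be sums (for value) or counts (for number) of elementary transactions.

```python
from datetime import date

START = date(2013, 4, 29)  # starting point of the snapshot series


def snapshot_matrix(transfers, num_weeks, parameter="value"):
    """Build the num_weeks x L matrix of weekly edge weights.

    `transfers` is a list of (sender, receiver, amount, day) tuples, where
    `day` is a datetime.date. `parameter` is "value" (sum of amounts) or
    "number" (count of elementary transactions).
    """
    edge_index = {}
    for sender, receiver, _amount, _day in transfers:
        # one column per directed edge, however many transfers it carries
        edge_index.setdefault((sender, receiver), len(edge_index))
    X = [[0.0] * len(edge_index) for _ in range(num_weeks)]
    for sender, receiver, amount, day in transfers:
        t = (day - START).days // 7  # 0-based week index
        if 0 <= t < num_weeks:
            X[t][edge_index[(sender, receiver)]] += amount if parameter == "value" else 1.0
    return X, edge_index


# Toy data: three transfers on one edge, one on the reverse edge.
toy = [("u1", "u2", 5.0, date(2013, 4, 29)),
       ("u1", "u2", 2.0, date(2013, 5, 5)),
       ("u1", "u2", 1.0, date(2013, 5, 6)),
       ("u2", "u1", 4.0, date(2013, 5, 7))]
X_value, index = snapshot_matrix(toy, num_weeks=2, parameter="value")
X_number, _ = snapshot_matrix(toy, num_weeks=2, parameter="number")
```

Running the function once per parameter yields the two matrices (transaction value and transaction number) on which the rest of the procedure operates.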

We define the base graphs as weighted graphs on the underlying graph $G$. For this purpose we use Principal Component Analysis (PCA) (Izenman 2008).


In the first step, because the parameters vary widely, we normalize each row of $X$ and subtract the column average from each column; this is the standard procedure in PCA (Kondor et al. 2014). We denote the resulting matrix by $X'$.
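The preprocessing step can be sketched as follows. The text does not say which norm is used for the rows, so the Euclidean norm below is an assumption, as is the choice to leave all-zero rows unchanged; the function name is hypothetical.

```python
import math


def normalize_and_center(X):
    """Scale each row to unit Euclidean norm, then subtract column means."""
    rows = []
    for row in X:
        norm = math.sqrt(sum(x * x for x in row))  # assumed: Euclidean norm
        rows.append([x / norm for x in row] if norm > 0 else list(row))
    T, L = len(rows), len(rows[0])
    means = [sum(rows[t][l] for t in range(T)) / T for l in range(L)]
    return [[rows[t][l] - means[l] for l in range(L)] for t in range(T)]


# Toy 2 x 2 matrix: the rows become (0.6, 0.8) and (0, 1) before centering.
X_prime = normalize_and_center([[3.0, 4.0], [0.0, 2.0]])
column_sums = [sum(row[l] for row in X_prime) for l in range(2)]
```

After this step every column of the result sums to zero, which is the property the later PCA argument relies on.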

Next, we perform the Singular Value Decomposition (SVD) of $X'$: $X' = U \Sigma V^{\top}$. The matrix $\Sigma$ is zero except on the diagonal of its leading $T \times T$ submatrix, and the entries of this diagonal are the singular values $\lambda_1, \dots, \lambda_T$ of the principal components, sorted in decreasing order. The columns of $V$ correspond to the principal components. In our case they are weighted graphs with the underlying graph $G$,

\[
V = (v_1 \ \cdots \ v_T, *, \dots, *), \qquad
v_i = \begin{pmatrix} v_{1,i} \\ \vdots \\ v_{L,i} \end{pmatrix},
\]

and we can think of $v_i$ as the graph $G$ whose $j$-th edge has weight $v_{j,i}$ (we disregard the columns marked by $*$, since there are only $T$ relevant principal components). We call these principal components base graphs. Since the singular values are sorted in decreasing order, the base graph accounting for the largest share of the changes in the long-term subgraph (i.e. the one with the highest singular value, and hence the highest variance) is $v_1$, followed by $v_2$, and so on. For each base graph $v_i$ we compute its associated time series $s_i = (s_{i,1}, \dots, s_{i,T})$ by the formula $s_{i,t} = \sum_{l=1}^{L} v_{l,i} x_{t,l}$. PCA guarantees that the time series $s_i'$, given by $s_{i,t}' = \sum_{l=1}^{L} v_{l,i} x_{t,l}'$ and defined analogously from the normalized matrix $X'$, are uncorrelated. In the next step we use the three time series $s_1$, $s_2$ and $s_3$ corresponding to the three normalized time series with the highest variance, that is $s_1'$, $s_2'$ and $s_3'$. Since the parameter is either the transaction number or the transaction value, we consider six time series altogether.
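To make the projection formula concrete, the sketch below computes the leading base graph and its time series on a toy matrix. It deliberately swaps in power iteration for a full SVD so that it needs no external library; this is not claimed to be the authors' implementation, and in practice a library routine such as `numpy.linalg.svd` would return all components at once. The function names, the start vector and the toy matrices are assumptions.

```python
import math


def project(X, v):
    """Time series of a base graph: s_t = sum over l of v_l * x_{t,l}."""
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def leading_component(Xp, iterations=500):
    """Approximate (lambda_1, v_1) of Xp by power iteration on Xp^T Xp."""
    L = len(Xp[0])
    v = [1.0 / math.sqrt(L)] * L  # assumed start; must not be orthogonal to v_1
    for _ in range(iterations):
        s = project(Xp, v)  # s = Xp v
        w = [sum(row[l] * s_t for row, s_t in zip(Xp, s)) for l in range(L)]  # Xp^T s
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    singular_value = math.sqrt(sum(x * x for x in project(Xp, v)))
    return singular_value, v


# Toy raw matrix X and its normalized, centered version Xp.
X = [[3.0, 4.0], [0.0, 2.0]]
Xp = [[0.3, -0.1], [-0.3, 0.1]]
lambda1, v1 = leading_component(Xp)
s1 = project(X, v1)              # series used in the next step
s1_normalized = project(Xp, v1)  # series whose variance orders the components
```

Since $X' v_i = \lambda_i u_i$ and the columns of $U$ are orthonormal, the normalized series of different components are mutually orthogonal; together with the zero column means of $X'$ this gives the stated lack of correlation, with the variance of $s_i'$ proportional to $\lambda_i^2$.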

Figure 1. Bitcoin and CEE Index Rates of Return, Time series defined by principal components

[Figure panels: CEE index rate of return; Bitcoin rate of return. Horizontal axis: dates from 2013-05-06 to 2017-11-06. Vertical axis: rate of return, from -30% to 70%.]


The software that constructs the users graph from the data obtained by the Hungarian research group (Kondor et al.) is available at (Mizerka 2019).

[Figure 1, continued. Panels: s1, s2 and s3, the time series of the 1st, 2nd and 3rd component (transaction value); s1, s2 and s3, the time series of the 1st, 2nd and 3rd component (transaction number). Horizontal axis: dates from 2013-05-06 to 2017-11-06.]


References:

Antonopoulos, A. M. (2017): Mastering Bitcoin. Programming the Open Blockchain. 2nd ed. Beijing, Cambridge, Farnham, Köln, Sebastopol, Tokyo: O'Reilly.

Bondy, J. A., Murty, U. S. R. (1976): Graph Theory with Applications. New York, Amsterdam, Oxford: North Holland.

Izenman, A. J. (2008): Modern Multivariate Statistical Techniques. Regression, Classification and Manifold Learning. Springer.

Kondor, D., Csabai, I., Szüle, J., Pósfai, M., Vattay, G.: ELTE Bitcoin Project website and resources. Project of Hungarian researchers, available at: http://www.vo.elte.hu/Bitcoin/.

Kondor, D., Csabai, I., Szüle, J., Pósfai, M., Vattay, G. (2014): Inferring the interplay between network structure and market effects in bitcoin. In: New Journal of Physics, (16).

Mizerka, P. (2019): Blockchain software, available at: https://github.com/piotrmizerka/blockchain_software.

Reid, F., Harrigan, M. (2011): An Analysis of Anonymity in the Bitcoin System. 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust. Boston, USA, 2011.

Ron, D., Shamir, A. (2013): Quantitative Analysis of the Full Bitcoin Transaction Graph. Financial Cryptography and Data Security, 17th International Conference. Okinawa, Japan, 2013.
