Papers on Blockchain and Bitcoin: Student notes

Lots of time devoted to this post. The reader will find it useful to get an initial idea of both concepts: Bitcoin and blockchain.

As usual, a disclaimer note: This summary is not comprehensive and it reflects mostly literal extracts from the mentioned papers and article.

Let's start summarising a classic paper, the paper on bitcoin, written by Satoshi Nakamoto.

Paper 1
Bitcoin: A Peer-to-Peer Electronic Cash System by Satoshi Nakamoto

This paper appeared on 2008. The first Bitcoins were exchanged in 2009. By June 2011 there were around 10000 users and 6.5 million Bitcoins [Info coming from this paper]. 

Bitcoins (BTC) are generated at a predictable rate. The eventual total number of BTC will 21 million [Info coming from this paper].

This is a proposal of a peer-to-peer electronic cash version. The novelty here is that there is no need for a trusted third party to tackle the double spending challenge. The key for the functioning of this peer-to-peer network is that the majority of CPU power is not controlled by an attacker.

The proposal is to replace the need of trust, an element that make transaction costs higher by a cryptographic (and distributed) proof. More specifically, via a peer-to-peer distributed timestamp server that generates computational proof of the chronological order of transactions.

Interestingly, an electronic coin is a chain of digital signatures. The main challenge in an electronic payment system is double spending avoidance. A digital mint would solve this, however this means that the entire scheme would depend on the mint.

The participants of the peer-to-peer network form a collective consensus regarding the validity of this transaction by appending it to the public history of previously agreed transactions (the blockchain) using a hashing function and their keypair. A transaction can have multiple inputs and multiple outputs [Info coming from this paper].

The only way to confirm the absence of a transaction is to be aware of all transactions. Without the role of a central party, this translates into two requirements:
- All transactions must be publicly announced.
- All market participants should agree on a single payment history. In practical terms, this means the majority of market participants.

The first technical element requiring this electronic payment proposal is a timestamp server. Each timestamp includes all previous timestamps. A timestamp consists of a published hash of a block of items.

How do we consider the distributed nature of the system in the case of the timestamp servers? The authors (or author) propose a proof-of-work system. In this case, the proof-of-work means one CPU-one vote. The majority decision is represented by the longest chain. This longest chain will have the greatest proof of work effort invested in it.

Technically, the proof-of-work involves scanning for a value that when hashed, the hash begins with a number of zero bits. The average work required is exponential in the number of zero bits required and can be verified by executing a single hash.

The reader can start grasping the great degree of CPU-intensity that this electronic payment system requires.

Nodes always consider the longest chain to be the correct one and they will keep working on extending it.

Incentives come both from the creation of a new coin and from transaction fees. The first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. Potential attackers will find much more benefitial to them to devote their CPU cycles and electricity to create new coins rather than to re-do existing blocks and to control enough consensus for theirs.

Transactions are hashed in a Merkle Tree way. This makes payment verification possible without running a full network node. This verification helps as long as honest nodes control the network.

The authors state that there is never the need to extract a complete standalone copy of a transaction history.

As all transactions need to be published, the way to obtain privacy in this system is to de-couple identities from public keys. However, privacy is only partially guaranteed. An additional recommendation is to use a new keypair for each new transaction.

An attacker can only try to change one of their own transactions to take back money that they recently spent.




Paper 2
An Analysis of Anonymity in the Bitcoin System by Fergal Reid and Martin Harrigan

This paper provides further input, probably the first paper, on the topic we mentioned earlier in this post regarding Bitcoin and the limits of user anonymity.

The decentralised nature of the BTC system and the lack of a central authority brings along the need to make all transaction history publicly available.

Users in Bitcoin are identified by public keys. Bitcoin maps a public key with a user only in the user's node and it allows users to issue as many public keys as wished. Users can also make use of third-party mixers (or laundry).

The interesting element of this paper is the description of the topological structure of two networks derived from Bitcoin's public transaction history: The transaction network and the user network. This is doable thanks to the transaction history being publicly available. After joining Bitcoin's peer-to-peer network, a client can freely request the entire history of Bitcoin transactions. This enables the possibility of performing passive identification analysis.

The authors of this paper studied BTC transactions from January 2009 to July 2011. 1019486 transactions and 12530564 public keys.

The authors of this paper are not aware of network structure studies of electronic currencies. However, this was done in a physical currency based on gift certificates named Tomamae-cho that existed in Japan during 3 months in 2004-2005.

The flow of that currency showed that the cumulative degree distribution followed a power-law distribution and the network showed small-world properties (high average clustering coefficient and low average path length.

Many papers maintain the difficulty to keep anonymity in networks in which user behaviour data is available. The main postulate of the authors of this paper is that Bitcoin does not anonymise user activity.

The transaction network
It represents the flow of BTCs between transactions over time. Each node represents a transaction and each directed edge represents an output of the transaction corresponding to the source node that is an input to the transaction corresponding to the target node. Each edge includes a timestamp and a BTC value.

There is no preferential attachment in this network.

The user network
It represents the flow of BTCs between users over time. Each node represents a user and each directed edge represents an input-output pair of a single transaction. Each directed edge also includes a timestamp and a BTC value.

As a user can use many different public keys, the authors of the paper construct an ancillary network in which each vertex represents a public key. They connect nodes with undirected edges where each edge joins a pair of public keys that are both inputs to the same transaction and then are controlled by the same user.

The contraction of public keys into users generates a network that is a proxy for the social network of BTC users.

Disassembling anonymity
A first source to decrease anonymity consists of integrating off-network information. Some BTC related organisations relate public keys with personally identifiable information. SOme BTC users disclose voluntarily  their public keys in fora.

Bitcoin public keys are strings with about 33 characters in length and starting with the digit one.


A second source of information is IP addressing. Unless they are using anonymising proxy technology such as Tor, it is relatively true that the first IP address informing of a transaction is the source of it.

A third source is based in egocentric analysis and visualisation e.g. WIkileaks published its public key to request donations. The analysis of transactions having as destination that particular public key can also provide input on identities.

A fourth source will be context discovery e.g. identifying nodes that correspond to BTC brokers.

These techniques help investigating BTC thefts. For example, a very quick transfer of BTCs between public keys (most of them not yet known to the network of already done transactions)  can be an indication to generate a theft hypothesis.

There are other analysis paths involving tainted BTCs, order books from BTC exchanges, client implementations, time analysis and the like.

Mitigation strategies
The official BTC client could be patched to prevent the linking of public keys with user information, a service that would use dummy public keys could be implemented (certainly, this would increase transaction fees). Even the BTC protocol could be modified to allow for BTC mixing at protocol level.

For the time being, the authors of this paper state that physical cash payments still represent a competitive and anonymous payment system.

The final statement from the authors of this paper: "Strong anonymity is not a prominent design goal of the BTC system".




Paper 3
Bitcoin: Economics, Technology and Governance by Rainer Boehme, Nicolas Christin, Benjamin Edelman and Tyler Moore

This paper defines Bitcoin (BTC) as an online communication protocol facilitating the use of a virtual currency. It states that BTc is the first widely adopted mechanism to provide absolute scarcity of a money supply. Inflation does not have a place in this system.

Public keys serve as account numbers. Every new transaction published to the BTC network is periodically grouped in a block of recent transactions. A new block is added to the chain of blocks every ten minutes.

In some cases, a transaction batch will be added to the block chain but then a few minutes later it will be altered because a majority of miners reached a different solution.

When listing a transaction, the buyer and the seller can also offer to pay a "transaction fee", normally 0.0001 which is a bonus payment to whatever miner solves the computationally difficult puzzle that verifies the transaction.

The paper reviews four key categories of intermediaries: Currency exchanges, digital wallet services, mixers and mining pools.

Currency exchanges
They exchange BTCs for traditional currencies or other virtual currencies. Most operate double auctions with bids and asks and charge a commission (from to 0.2. to 2 percent). Today BTC resembles more a payment platform rather than a real currency.

There are significant regulatory requirements (including expensive certification fees) to establish a exchange. In addition to that, they require considerable security measures. So the number of them is relatively limited.

Digital wallet services
They are data files that include BTC accounts, recorded transactions and keys necessary to spend or transfer the stored value. In practice, digital wallet services tend to increase centralisation (and online availability with high security requirements also).

The loss of a private key, if not backed up, would mean the loss of the possibility to trade with those owned (i.e. digitally signed) BTCs.

The entire blockchain reached 30GB in March 2015.

Mixers
Mixers ensure that timing does not yield clues about money flows. They let users pool sets of transactions in unpredictable combinations. Mixers charge 1 to 3 percent of the amount sent. Mixer protocols are usually not public.


Mining pools
BTCs are created when a miner solves a mathematical puzzle. Mining pools now combine resources from numerous miners. Oversized mining pools threaten the decentralisation that underpins BTC's trustworthiness.

Uses of Bitcoin
Initially it seems illicit activities use BTC given its openness and distributed nature. Every Bitcoin transaction must be copied into all future versions of the block chain. Updating the block chain entails an undesirable delay, making BTC too slow for many in-person retail payments.

Some scientists stress the importance of BTC for its ability to create a decentralised record of almost anything.

Risks in BTC
Market risk due to the fluctuation in the exchange rate between BTC and other currencies. It has also the shallow market problem: a person trading quickly a large amount would affect the market price.
Counterparty risk: Of the exchanges that closed (either due to a security breach or to low-volume business), 46% of them did not reimburse their customers after shutting down.

The BTC system offers no possibility to un-do a transaction, creating then transaction risk (and affecting end consumer protection).

There is certainly some operational risk coming from the technical infrastructure and the already mentioned 51 percent attack.

Finally, BTC faces also privacy, legal and regulatory risks.

Crime
Three types, BTC-specific crime, BTC-facilitated crime and money laundering.

Regulation
The authors suggest that longstanding reporting requirements can provide a level of compliance for virtual currencies similar to what has been achieved for traditional currencies. However, they recommend to consider regulations in the broader context of a global market for virtual currencies services.

Social science lab
Interestingly, most users treat their bitcoin investments as speculative assets rather than as means of payment.

Incentives
A so far theoretical concern: Larger blocks are less likely to win a block race than a smaller one.

Privacy and anonymity
Some authors claim that almost have of BTC users can be identified.

An open question posed by the authors
What happens if the BTC economy grows faster than the supply of bitcoins?

A final thought by the authors of this paper: BTC may be able to accommodate a community of experimentation built on its foundations.

Paper 4
Bitcoin-NG: A scalable blockchain protocol by Ittay Eyal, Adem Efe Gencer, Emin Gun Sirer, and Robert van Renesse (Cornell Univesity)

This paper proposes a new blockchain protocol designed to scale. Original bitcoin-derived blockchain protocols have inherent scalability limits. To improve efficiency, one has to trade off throughput for latency. BTC currently targets a conservative 10-minute slot between blocks, yielding 10 minute expected latencies for transactions to be encoded in the blockchain.

Bitcoin-NG achieves a performance improvement by decoupling Bitcoin's blockchain operation into two planes: leader election and transaction serialisation.

Some generic descriptions of the blockchain protocol
An output is spent if it is the input of another transaction. A client owns x Bitcoins at time t if the aggregate of unspent outputs to its address is x. The miners commit the transactions into a global append-only log called the blockchain.

Blockchain
The blockchain records transactions in units of blocks. A valid block contains a solution to a cryptopuzzle involving the hash of the previous block, the hash (the Merkle root) of the transactions in the current block, which have to be valid and a special transaction (the coinbase) crediting the miner with the reward for solving the cryptopuzzle. The cryptopuzzle is a double hash of the block header whose result has to be smaller than a set value. The difficulty of the problem, set by this value, is dynamically adjusted such that blocks are generated at an average rate of one every ten minutes.

Bitcoin-NG
It is a blockchain protocol that serialises transactions allowing for better latency and bandwidth than BTC.

The protocol divides into time epochs. In each epoch, a single leader is in charge of serialising state machine transitions. To facilitate state propagation, leaders generate blocks. The protocol introduces two types of blocks: key blocks for leader election and microblocks that contain the ledger entries.

Leader election is already taking place in BTC. But in BTC the leader is in charge of serialising history, making the entire duration of time between leader elections a long system freeze. Leader election in BTC-NG is forward-looking and ensures that the system is able to continually process transactions.

Resilience
Bitcoin-NG is resilient to selfish mining against attackers with less than 1/4 of the mining power.

Bitcoin-NG shows that it is possible to improve the scalability of blockchain protocols to the point where the network diameter limits consensus latency and the individual node processing power is the throughput bottleneck.  


Paper 5
A Protocol for Interledger Payments by Stefan Thomas and Evan Schwartz

This paper deals with the complexity to move money between different payment systems. The authors of the paper propose a way to connect different blockchain implementations. It uses ledger-provided escrow (conditional locking of funds) to allow secure payments through untrusted connectors.

This is a protocol for secure interledger payments across an arbitrary chain of ledgers and connectors. It uses ledger-provided escrow based on cryptographic conditions to remove the need to trust connectors between different ledgers. Payments can be as fast and cheap as the participating ledgers and connectors allow and transaction details are private to their participants.

The focus of this summary is not the deep description of this protocol but the introduction to the BAR (Byzantine, Altruistic, Rational model.

Byzantine actors may deviate from the protocol for any reason, ranging from technical failure to deliberate attempts to harm other parties or simply impede the protocol.

Altruistic actors follow the protocol exactly.

Rational actors are self-interested and will follow or deviate from the protocol to maximize their short and long-term benefits.

The authors of the paper assume that all actors in the payment are either Rational or Byzantine. Any participant in a payment may attempt to overload or defraud any other actors involved. Thus, escrow is needed to make secure interledger payments.

This protocol proposes two working modes: The atomic mode and the universal mode.

In the atomic mode, transfers are coordinated by a group of notaries that serve as the source of truth regarding the success or failure of the payment. The atomic mode only guarantees atomicity when notaries N act honestly. Rational actors can be incentivised to participate with a fee.

The universal mode relies on the incentives of rational participants to eliminate the need for external coordination.



Paper 6
A Next-Generation Smart Contract and Decentralized Application Platform from Ethereum's GitHub repository

This white paper presents a blockchain implementation alternative to BTC and, initiallly, more generic.It presents blockchain technology as a tool of distributed consensus. It is not only cryptocurrencies but also financial instruments, non-fungible assets such as domain names or any other digital asset being controlled by a script i.e. a piece of code implementing arbitrary rules (e.g. smart contracts).

Ethereum provides a blockchain with a built-in fully fledged Turing-complete programming language.

A recap on BTC
 
As already mention in the summary of Paper 1 in this post, BTC is a decentralised currency managing ownership through public key cryptography with a consensus algorithm named "proof of work". It achieves two main goals: It allows nodes in the network to collectively agree on the state of the BTC ledger and it allows free entry into the consensus process. How does it do this last point? By replacing the need to use a central register by an economic barrier.

The ledger of a cryptocurrency can be thought of as a state transition system. The "state" in BTC is the collection of all coins (unspent transaction outputs, UTXO) and their owners.

BTC decentralised consensus process requires nodes in the network to continuously attempt to produce packages of transactions called blocks. The network is intended to create a block every ten minutes. Each block contains a timestamp, a nonce, a hash of the previous block and a list of all transactions that took place in the previous block.

Requirement for the "proof of work": The double SHA256 hash of every block - a 256-bit number - must be less than a dynamically adjusted target (e.g. 2 to the power of 187).

The miner of every block is entitled to include a transaction giving themselves 25 BTC out of nowhere.

In the event of a malicious attacker, they will target the order of transactions, not protected by cryptography.

The rule is that in a fork the longest blockchain prevails. In order for an attacker to make his blockchain the longest, he would need to have more computational power than the rest of the network (51% attack).

Merkle Trees
A Merkle Tree is a type of binary tree. Each node is the hash of its two children. As hashes propagate upwards. This way, a client, by downloading the header of a block, would know whether the block has been tampered.

A "simplified payment verification" protocol allows for light nodes to exist. They download only block headers and branches related to their transactions.

Alternative blockchain applications


Namecoin: A decentralised name registration database.
Colored coins and metacoins: A customised digital currency on top of BTC.


Basic scripting
UTXO in BTC can be owned also by a script expressed in a simple stack-based programming language. However, this language has some drawbacks:

- Lack of Turing completeness. Loops are not supported.
- Value-blindness: UTXO are all or nothing.
- No opportunity to consider multi-stage contracts.
- UTXOs are blockchain-blind.

Ethereum builds an alternative framework with a built-in Turing-complete programming language.

Ethereum
An Ethereum account contains four fields: The nonce (a counter that guarantees that each transaction can only be processed once), the ether balance, the contract code and the account's storage.

Ether is the crypto-fuel of Ethereum. Externally owned accounts are controlled by private keys and contract accounts are controlled by their contract code.

Contracts are autonomous agents living inside the Ethereum execution environment. Contracts have the ability to send message to other contracts. A message is a transaction produced by a contract. A transaction refers to the signed data package that stores a message to be sent from an externally owned account. Each transaction sets a limit to how many computational steps of code execution it can use.

Ethereum is also based on blockchain. Ethereum blocks contain a copy of both the transaction and the most recent state.

Ethereum applications
Token systems, financial derivatives (financial contracts mostly require reference to an external price ticker), identity and reputation systems, decentralised file storage and decentralised autonomous organisations.

Other potential uses are saving wallets, a decentralised data feed, smart multisignature escrow, cloud computing, peer-to-peer gambling and prediction markets.

GHOST
Blockchains with fast confirmation times suffer from reduced security due to blocks taking a long time to propagate through the network. Ethereum implements a simplified version of GHOST (Greedy Heaviest Observed Subtree) which only goes down seven levels.

Currency issuance
Ether is released in a currency sale at the price of 1000-2000 ether per BTC. Ether has an endowment pool and a permanently growing linear supply.

The linear supply reduces the risk of an excessive wealth concentration and gives users a fair chance to acquire ether.

Mining
BTC mining is no longer a decentralised and egalitarian task. It requires high investments. Most BTC miners rely on a centralised mining pool to provide block headers. Ethereum will use a mining algorithm where miners are required to fetch random data from the state. This white paper states that this model is untested.
Ethereum full nodes need to store just the state instead of the entire blockchain history. Every miner will be forced to be a full node, creating a lower bound on the number of full nodes and an intermediate state tree root after processing each transaction will be included in the blockchain.

The question now is ... will it work?




Article 1
Technology: Banks Seek the Key to Blockchain by Jane Wild, Martin Arnold and Philip Stafford - FT.com 

This FT.com article on blockchain can be found here. The authors mention an internal blockchain implementation and remember that a blockchain is a shared database technology that connects consumers and suppliers creating online networks with no need for middlemen or a central authority. Applications are endless and supporters claim that trust is created by the participating parties.

The authors of this article mention the use of blockchain, also named distributed ledger, as a back-office new implementation and even new governmental implementations such as land registers. 

There are two types of blockchains in terms of accessibility: invitation-only (private) and public (open). UBS and Microsoft are working with blockchain start-up Ethereum (running  an open source technology). Other banks are going the private blockchain way.

The authors of this article also mention that this technology has in front of it key challenges such as robustness, security, regulation.

The ledger of BTC weighs already more than 45 GB.

Bits and pieces