Mining for Science

One of the most debated aspects of Bitcoin is the use of the Proof-of-Work method to achieve consensus. Critics point to Bitcoin inefficiency and unbounded waste of resources. Other methods to achieve consensus are known, such as Byzantine-fault-tolerant protocols. Even if such alternatives may be more efficient, there are many practical problems that need to be solved in order to use them for a network like Bitcoin, where users constantly join and leave. Bitcoin’s proof-of-work block chain success is evidence of the system resilience to attacks, scalability and adaptability to different scenarios. So instead of trying to be more efficient, maybe we could at least try to use the “wasted” CPU cycles for something more useful, like helping science by solving important problems. The difficulty arises in finding problems suitable for mining. The problems should meet the following properties:

1) We are able to create problems without the knowledge of their solution.
2) The average time for resolvability can be estimated, so that we can efficiently create problems of a certain difficulty.
3) The created problems are useful to humanity (or at least useful for the network community).
4) The space required to write the solution is bounded in advance.
5) A solution can be efficiently verified.
6) There is always a solution or the nonexistence of a solution can be efficiently checked.
7) There are no ¨weak¨ problems.

The last property says the chosen problems should be all the same difficulty and there should not be a way to pick easy problems at will.

We must leave out problems that require input from any external source, such as the Wikipedia or a genome database online, because the network would become dependent on such centralized resource. Nevertheless we could use problems whose inputs are taken from a database (genetics, biology or chemistry) distributed with the client application. As a drawback, the finite size of such database may limit the number of distinct problems the system can create, rendering the mining process as wasteful as hash mining after some time.
A problem instance could be randomly chosen by each miner only if the average time required to solve any specially crafted instance is not less than any other (in other case, the last property would be violated). Other possibility is that the problem instance be derived from the hash of the block header. Still other possibility is that the instances could be enumerated deterministically so each instance could be derived from the previous instance already solved.

The main problem of embedding natural science-related problems in the block chain is that problems generally involve finding near optimal solutions, for which optimality cannot be efficiently tested. On the contrary, math problems (including many in computer science and cryptography) may be formulated to allow easy checking for optimality or equality.

Mining for Wang protosets

Ten years ago I worked to prove that no aperiodic Wang protoset exists of 7 tiles or less. I did it by brute force (but clever) enumeration of protosets and elaborate checking algorithms. The checking algorithms may check forever (the problem is not computable) but luckily they did not! So the question if there exists an aperiodic Wang protoset of a certain size grater than 7 can be approximated and still be useful: the search for Wang protosets that are non-periodic up to a certain period length (n,n). Let’s call such a set “almost” aperiodic of degree n. After such a set is found, scientists can analyze the protoset in depth to check for aperiodicity with other methods and discover the aperiodicity pattern.
Is easy to check if certain Wang protoset is almost aperiodic of degree 40 or even more, but very difficult to find such sets. Mining for Wang protosets is not an ideal solution. For example, because almost aperiodic protosets tend to cluster during enumeration, enumeration itself should be randomized such that each instance is created as the output of a CSPRNG seeded by the the hash of of previous block, concatenated with a random number. But some of the CPU resources will be used for the CSPRNG itself, instead of the search. Other problem is that it’s difficult to increase difficulty of the problem in a measured steps: adding another tile or another color to the protoset may increase the difficulty in an unexpected way.

Mining for Algorithm Discovery

An interesting search would be to help discover a new algorithm for solving a certain known problem. Some time ago I created a machine language (the opcodes) specifically designed to implement the 4 most common sorting algorithms (bubble-sort, Quick-sort, heap-sort and insertion-sort) in less than 30 instructions each. Each resulting sorting program binary code size was less than 120 bits or 15 bytes! Turing proved that no program can decide that an another program given as input is a correct sorting algorithm. But what if we limit the number of instructions to execute and memory registers allowed to access? Then we choose a random algorithm (a 120-bit string) and test it with random vectors of increasing size as inputs. If an output is not a sorted vector, we abort. Such limits can be set to let the ¨almost correct¨ and ¨efficient¨ programs pass all tests. What is the probability that a program that correctly sort a hundred random vectors of a hundred of values each be incorrect? The main problem with algorithm discovery is that there may be too few sorting algorithms in the search space to allow a predictable rate of discovery. But there is one way we can still benefit from the proof-of-work block chain even for problems with unpredictable rate of discovery, and that is competition embedding. And it can work for many distributed computing efforts.

Automatic embedding of Competitions and Prizes in the Block Chain

If you think that hash based mining (as Bitcoin is) is much better than science problem mining, then at least one can include in the protocol incentives to solve great mathematical questions that can be efficiently checked. The rationale is that if issued money has to be distributed somehow, giving scientist some of it is fair and of great importance to the human kind (disclaimer: I consider myself a a scientist). One evident advantage over traditional math prices (e.g Millenium prize Problems) is that block chain embedded prizes are much cheaper (the prize is created from thin air). For example, one can insert problems of increasing difficulty between standard mining blocks (say one math problem every month) with prices payed with some fixed newly issued coins. If you are worried if the prize will become too high or too low over time, then you can make the prize be a function of the average fee paid in the last n mined blocks. Fees, in the long term, are supposed to be correlated with the electricity cost. If you fear inflation, then it can be controlled with exponential decrease in the prize or exponential separation of competitions. One interesting example is to embed a competition to find the discrete logarithm of a number (in some Field). The number could be chosen pseudo-randomly with the last block hash as seed, in a way algorithmically specified in the client applications. In this way the prices will promote advances in cryptology. If you pay prizes for breaking the same cryptographic algorithms used in the protocol itself (say ECDSA) with smaller key-sizes, then you network users will know the state of the art of breaking systems for free as more prizes get awarded. If you allow different key-lengths for ECDSA, then you allow users to switch to stronger cryptosystems as cryptosystems of smaller key-size are broken. But if you allow signatures of arbitrary key-sizes concurrently you should use a fee scheme where stronger algorithms pay higher fees to be verified than weaker ones. Another possibility is to enable the stronger algorithms automatically when others became broken by the embedded competition. I want to make clear that competitions happen without user intervention and without the control of any central authority. They are embedded in the block chain by design.

To claim a prize, a user has to follow a protocol to prevent other users from claiming the same prize after seeing the solution. Two special transactions have to be sent, the first immediately and the second a few days afterwards. The first transaction contains a commitment to the solution concatenated with the public key of an account to deposit the prize (Commit(Solution+Pubkey)). After the transaction is included in a block, the user waits some additional blocks to be built and then, in a new special transaction, opens the commitment. Clients check both transactions against the competition, and vuala! the prize is awarded.

Conclusion

The proof-of-work algorithm chosen for the block chain of a Bitcoin-like network is a fascinating design topic, and is far from being optimally solved, at least from the humanitarian side that arises when we consider the CPU resource waste Bitcoin network entails (which is the destruction of wealth). I’m optimistic that in the future we will build distributed systems more efficient, more ecological and more fair.

This entry was posted on December 31, 2011, 4:05 am and is filed under Uncategorized. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.

	SDLerner on A New Mystery in Patoshi …
	TechMiX on The Patoshi Mining Machin…
	SDLerner on A New Mystery in Patoshi …
	Jeremy Davis on A New Mystery in Patoshi …
	TechMiX on A New Mystery in Patoshi …

Bitslog