WEBVTT

00:00:01.090 --> 00:00:03.460
The following content is
provided under a Creative

00:00:03.460 --> 00:00:04.850
Commons license.

00:00:04.850 --> 00:00:07.060
Your support will help
MIT OpenCourseWare

00:00:07.060 --> 00:00:11.150
continue to offer high quality
educational resources for free.

00:00:11.150 --> 00:00:13.690
To make a donation or to
view additional materials

00:00:13.690 --> 00:00:17.650
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.650 --> 00:00:18.550
at ocw.mit.edu.

00:00:22.520 --> 00:00:25.220
TADGE DRYJA: Today we're going
to talk about synchronization--

00:00:25.220 --> 00:00:27.950
how all these different
nodes on the network

00:00:27.950 --> 00:00:31.790
come to consensus,
how they link up.

00:00:31.790 --> 00:00:33.540
So it's sort of bringing
it all together.

00:00:33.540 --> 00:00:36.110
So far in these
lectures we've

00:00:36.110 --> 00:00:40.910
talked about signatures,
mining and blocks, transactions

00:00:40.910 --> 00:00:42.020
and scripts.

00:00:42.020 --> 00:00:43.987
And now we're going to
put it all together.

00:00:43.987 --> 00:00:45.320
How does this all actually work?

00:00:45.320 --> 00:00:47.730
What do all these components
come together to do?

00:00:47.730 --> 00:00:51.890
And how does this
make this cool money?

00:00:51.890 --> 00:00:54.860
Quick recap on
signatures, which I think

00:00:54.860 --> 00:00:57.170
you got the idea
from the homeworks,

00:00:57.170 --> 00:01:00.980
from the lectures-- you have
these public and private keys.

00:01:00.980 --> 00:01:03.500
This key pair where you
generate the private key,

00:01:03.500 --> 00:01:05.752
you distribute the public key.

00:01:05.752 --> 00:01:08.300
The private key
can sign a message.

00:01:08.300 --> 00:01:14.240
And anyone can verify, given
the triple of public key, message,

00:01:14.240 --> 00:01:16.082
and signature.

00:01:16.082 --> 00:01:17.540
This is useful for
lots of things--

00:01:17.540 --> 00:01:20.390
providing identity, ownership,
all sorts of things.

00:01:20.390 --> 00:01:22.250
And it's better than
paper signatures.

00:01:22.250 --> 00:01:26.120
Paper signatures don't
really sign the message.

00:01:26.120 --> 00:01:28.730
They just sort of
sign the paper.

00:01:28.730 --> 00:01:32.090
So if you change some
part of a document,

00:01:32.090 --> 00:01:33.980
the signature still
sort of looks OK.

00:01:33.980 --> 00:01:35.090
Maybe you can see--

00:01:35.090 --> 00:01:37.850
you want the paper
to not be tampered with.

00:01:37.850 --> 00:01:40.520
But you can sign a
blank piece of paper

00:01:40.520 --> 00:01:43.160
and then add all this
text on afterwards.

00:01:43.160 --> 00:01:46.380
You can't do that
with these systems.

00:01:46.380 --> 00:01:47.890
So signatures are really cool.

00:01:47.890 --> 00:01:53.760
And they are sort of a necessary
thing for this network to work.

00:01:53.760 --> 00:01:54.970
So mining and blocks--

00:01:54.970 --> 00:01:57.880
I think you sort of got the idea
with the Lamport signatures.

00:01:57.880 --> 00:02:00.880
If you've looked at the
current problem set,

00:02:00.880 --> 00:02:03.190
it makes a lot of sense.

00:02:03.190 --> 00:02:06.560
You hash a
bunch of times while you change

00:02:06.560 --> 00:02:07.690
a nonce a bunch of times.

00:02:07.690 --> 00:02:09.560
Try to get a low output.

00:02:09.560 --> 00:02:11.200
So you get the idea of mining.

00:02:11.200 --> 00:02:14.920
You're proving work
by repeatedly going

00:02:14.920 --> 00:02:18.190
through different nonces, trying
to find a certain hash output

00:02:18.190 --> 00:02:19.990
that there's no shortcut to.

00:02:19.990 --> 00:02:22.240
You just have to
guess and check.

00:02:22.240 --> 00:02:24.340
And now if you include
the previous data

00:02:24.340 --> 00:02:27.070
as part of your input
to that hash function,

00:02:27.070 --> 00:02:29.500
you can make a chain of work.

00:02:29.500 --> 00:02:33.220
And that's what we call
a blockchain I guess.
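
NOTE
A minimal sketch of the guess-and-check mining and chaining just described. It is
not the problem set's or Bitcoin's actual code: the function name mine, the 8-byte
nonce encoding, and the 12-bit difficulty are assumptions chosen so it runs quickly.
```python
import hashlib
def mine(prev_hash: bytes, data: bytes, bits: int = 12):
    """Try nonces until SHA-256(prev_hash + data + nonce) has `bits` leading zero bits."""
    target = 1 << (256 - bits)  # any hash below this value has enough leading zeros
    nonce = 0
    while True:
        digest = hashlib.sha256(prev_hash + data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
        nonce += 1  # no shortcut: just guess and check
# Chain two blocks: the second commits to the hash of the first.
genesis = bytes(32)
nonce1, hash1 = mine(genesis, b"alice")
nonce2, hash2 = mine(hash1, b"bob")
```
Because the second call takes hash1 as input, changing anything in the first block
would invalidate the work done on the second.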

00:02:33.220 --> 00:02:34.955
Any questions on this so far?

00:02:34.955 --> 00:02:39.120
It's the basic idea
from two lectures ago.

00:02:39.120 --> 00:02:40.880
Cool.

00:02:40.880 --> 00:02:44.815
And a recap of what
Neha said yesterday--

00:02:44.815 --> 00:02:46.190
there's transactions
and scripts.

00:02:46.190 --> 00:02:48.800
I'm not going to go
into scripts too much

00:02:48.800 --> 00:02:53.870
yet, because in practice,
99% of the scripts

00:02:53.870 --> 00:02:56.810
are just checking a
public key signature.

00:02:56.810 --> 00:03:01.160
You can do all sorts of other
crazy things with the scripts.

00:03:01.160 --> 00:03:04.010
But almost nobody does yet.

00:03:04.010 --> 00:03:06.860
Basically they say, OK, I'm
sending to a public key hash.

00:03:06.860 --> 00:03:09.290
When I spend from it, I
reveal the public key,

00:03:09.290 --> 00:03:12.892
check that the hash
matches, check a signature.

00:03:12.892 --> 00:03:14.600
And transactions have
inputs and outputs.

00:03:14.600 --> 00:03:15.860
We went over this yesterday.

00:03:15.860 --> 00:03:22.120
Just a sort of
real-world numbers,

00:03:22.120 --> 00:03:25.420
this is how big
things are, "ish,"

00:03:25.420 --> 00:03:28.810
so a transaction ID
and index to point

00:03:28.810 --> 00:03:30.280
to a previous transaction.

00:03:30.280 --> 00:03:32.470
The transaction ID is 32 bytes.

00:03:32.470 --> 00:03:36.677
The index is encoded as a
4-byte, so 32-bit, integer,

00:03:36.677 --> 00:03:39.010
which is kind of overkill,
because I think the most that

00:03:39.010 --> 00:03:39.945
there's ever--

00:03:39.945 --> 00:03:41.620
the most outputs
there's ever been

00:03:41.620 --> 00:03:43.480
is like 1,000 or something.

00:03:43.480 --> 00:03:47.440
So you really don't need it to
be able to go up to 4 billion.

00:03:47.440 --> 00:03:49.450
But that's how it is.

00:03:49.450 --> 00:03:51.580
The signature ends up
being about 100 bytes

00:03:51.580 --> 00:03:55.000
because you have to provide
the public key, which is either

00:03:55.000 --> 00:03:57.190
33 or 65 bytes.

00:03:57.190 --> 00:03:59.530
And then the signature itself
is encoded to something

00:03:59.530 --> 00:04:01.360
like 70 bytes.

00:04:01.360 --> 00:04:04.040
Both of those things could
be much more efficient.

00:04:04.040 --> 00:04:06.462
But they're not yet.

00:04:06.462 --> 00:04:08.170
And the output is
actually a lot smaller.

00:04:08.170 --> 00:04:09.730
An output, the script--

00:04:09.730 --> 00:04:12.110
the main thing is
your public key hash,

00:04:12.110 --> 00:04:14.260
which in Bitcoin is 20 bytes.

00:04:14.260 --> 00:04:16.510
And then you have those
other little opcodes,

00:04:16.510 --> 00:04:18.339
which add a
few bytes, and then

00:04:18.339 --> 00:04:21.459
your amount, which is--
so all amounts in Bitcoin

00:04:21.459 --> 00:04:25.405
are encoded as 8-byte,
signed 64-bit integers.

00:04:25.405 --> 00:04:28.280
So you can have
pretty high precision.

00:04:28.280 --> 00:04:32.500
And that's also overkill, since
all the bitcoins ever mined,

00:04:32.500 --> 00:04:34.120
put together--

00:04:34.120 --> 00:04:39.100
if you added them all up, it's
nowhere near 2 to the 64.

00:04:39.100 --> 00:04:43.560
It's like 2 to the 50 something.

00:04:43.560 --> 00:04:46.180
So it's also kind of
overkill but yeah, whatever.

00:04:46.180 --> 00:04:50.500
So this ends up being for a one
input, one output transaction,

00:04:50.500 --> 00:04:51.640
less than 200 bytes.
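
NOTE
A rough tally of the sizes quoted above, as a sketch only. The per-field numbers come
from the lecture; the OVERHEAD figure (version, counts, length prefixes, lock time and
similar framing) is an assumed round number, and tx_size is a hypothetical helper.
```python
# Approximate field sizes in bytes, as quoted in the lecture.
INPUT = {"txid": 32, "index": 4, "pubkey": 33, "signature": 70}
OUTPUT = {"pubkey_hash": 20, "opcodes": 5, "amount": 8}
# Assumed framing overhead per transaction; not a figure from the lecture.
OVERHEAD = 20
def tx_size(n_inputs: int, n_outputs: int) -> int:
    """Estimate the serialized size of a simple pay-to-public-key-hash transaction."""
    return OVERHEAD + n_inputs * sum(INPUT.values()) + n_outputs * sum(OUTPUT.values())
one_in_one_out = tx_size(1, 1)
```
With these assumptions a one-input, one-output transaction comes to 192 bytes, and
each extra input costs far more than each extra output.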

00:04:54.300 --> 00:04:56.340
So that's a message,
pretty small.

00:04:56.340 --> 00:04:58.986
You can broadcast it
around the network.

00:04:58.986 --> 00:05:01.060
Inputs point to old
outputs, have signatures.

00:05:01.060 --> 00:05:03.970
Outputs have scripts
and coin amounts.

00:05:03.970 --> 00:05:05.630
So what do we do with
all these things?

00:05:05.630 --> 00:05:07.360
What is the mining process?

00:05:07.360 --> 00:05:10.558
So in the homework,
you're mining your name.

00:05:10.558 --> 00:05:12.100
You connect to the
server, figure out

00:05:12.100 --> 00:05:14.980
what the last block was, put
your name on, put a nonce,

00:05:14.980 --> 00:05:16.690
and continue to mine.

00:05:16.690 --> 00:05:18.940
That's not super useful,
unless you want to prove that

00:05:18.940 --> 00:05:21.940
you're-- hey, this is me.

00:05:21.940 --> 00:05:24.220
In Bitcoin, the
basic idea is users

00:05:24.220 --> 00:05:25.540
are making these transactions.

00:05:25.540 --> 00:05:28.520
Transactions are moving coins
from one place to another,

00:05:28.520 --> 00:05:30.820
from one key to another.

00:05:30.820 --> 00:05:34.240
They make the transactions, they
sign them, and they broadcast.

00:05:34.240 --> 00:05:36.860
I'll get into what
"broadcast" means.

00:05:36.860 --> 00:05:39.250
So in the current
problem set, there's

00:05:39.250 --> 00:05:44.500
one server, which is not really
a robust distributed system,

00:05:44.500 --> 00:05:46.480
as people may have
seen yesterday

00:05:46.480 --> 00:05:52.390
from about 1:30 to 3:00 PM,
when the whole thing went down.

00:05:52.390 --> 00:05:55.480
In Bitcoin, it's
completely peer-to-peer.

00:05:55.480 --> 00:05:57.040
Every node is the same.

00:05:57.040 --> 00:05:59.180
They're all listening
for people connecting in.

00:05:59.180 --> 00:06:03.040
They're all connecting
out to other nodes.

00:06:03.040 --> 00:06:03.960
And they broadcast.

00:06:03.960 --> 00:06:05.410
So if someone sends
you a message,

00:06:05.410 --> 00:06:07.790
you'll pass it on to
all the other people.

00:06:07.790 --> 00:06:10.090
So it's called a gossip network.
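
NOTE
A toy sketch of gossip relay: each node forwards a message it has not seen before to
all of its peers. The peer graph and the function name flood are made up for
illustration; real nodes exchange announcements over network connections instead.
```python
from collections import deque
def flood(peers: dict, origin: str) -> dict:
    """Return how many hops it took each node to first hear a message from origin."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in peers[node]:
            if peer not in hops:  # already-seen messages are not relayed again
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops
# Hypothetical five-node network; every link is listed in both directions.
peers = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c", "e"], "e": ["d"]}
reached = flood(peers, "a")
```
Every connected node ends up hearing the message, at the cost of the same message
being offered to some nodes more than once.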

00:06:10.090 --> 00:06:12.310
In practice, it works OK.

00:06:12.310 --> 00:06:15.190
It's a fairly heavy load
in terms of network traffic,

00:06:15.190 --> 00:06:18.880
but you can make transactions,
sign them, broadcast them.

00:06:18.880 --> 00:06:22.190
And then someone,
the miner, takes

00:06:22.190 --> 00:06:24.590
all these recent
transactions that they've

00:06:24.590 --> 00:06:28.930
seen, and puts them into a
block, and then does some work.

00:06:28.930 --> 00:06:31.080
So those transactions
are now confirmed,

00:06:31.080 --> 00:06:34.540
and people can build
the next block.

00:06:34.540 --> 00:06:36.840
So the only difference from
the current problem set

00:06:36.840 --> 00:06:40.440
is instead of putting
your name in, you put--

00:06:40.440 --> 00:06:43.290
you can put your name-- but
you also put all these messages

00:06:43.290 --> 00:06:44.910
that you've seen recently.

00:06:44.910 --> 00:06:47.160
And so you commit
to them that way.

00:06:47.160 --> 00:06:49.930
You could do it by just
sticking them all in,

00:06:49.930 --> 00:06:54.780
but instead, there's a slightly
more advanced way to do it.

00:06:54.780 --> 00:06:57.450
You use what's called
a block header.

00:06:57.450 --> 00:07:00.420
So yeah, the block header
itself is the message.

00:07:00.420 --> 00:07:03.090
So similar to the problem
set, it's the block header

00:07:03.090 --> 00:07:06.480
that you need to hash to get a
low output value, not the block

00:07:06.480 --> 00:07:09.040
itself, which is
kind of interesting.

00:07:09.040 --> 00:07:12.337
And the headers have a hash
of all the transactions

00:07:12.337 --> 00:07:12.920
in the blocks.

00:07:12.920 --> 00:07:14.730
So you don't just put
all the transactions

00:07:14.730 --> 00:07:17.550
into one big megabyte
data structure,

00:07:17.550 --> 00:07:19.860
hash the whole thing, and
then try to get a low output.

00:07:19.860 --> 00:07:22.465
You actually do some
intermediate steps first.

00:07:22.465 --> 00:07:24.090
And what's interesting
is it's actually

00:07:24.090 --> 00:07:26.040
just the headers that
make a chain, not

00:07:26.040 --> 00:07:27.580
the blocks themselves.

00:07:27.580 --> 00:07:30.440
So instead of blockchain, you
could call it a headerchain.

00:07:30.440 --> 00:07:33.260
So I'll talk about headers.

00:07:33.260 --> 00:07:34.950
The headers are 80 bytes.

00:07:34.950 --> 00:07:37.650
And they're actually quite
similar to the blocks

00:07:37.650 --> 00:07:38.900
in problem set 2.

00:07:38.900 --> 00:07:41.040
So the main three
components are you've

00:07:41.040 --> 00:07:43.590
got the previous hash, the
Merkle root, and the nonce.

00:07:43.590 --> 00:07:45.490
And so this is like
in the problem set.

00:07:45.490 --> 00:07:47.340
You start with
the previous hash.

00:07:47.340 --> 00:07:51.073
Then you have data that
you're actually committing to.

00:07:51.073 --> 00:07:52.740
And then you have
some data that doesn't

00:07:52.740 --> 00:07:55.960
have any actual meaning,
just to get the work done.

00:07:55.960 --> 00:08:00.390
So, for reference, if
you look through the work

00:08:00.390 --> 00:08:03.780
people are doing right
now, this is all public,

00:08:03.780 --> 00:08:05.430
the current problem set.

00:08:05.430 --> 00:08:07.470
There's a previous
hash, which is basically

00:08:07.470 --> 00:08:09.760
the hash of the line above it.

00:08:09.760 --> 00:08:12.420
And then there's some data
you're committing to, in this

00:08:12.420 --> 00:08:17.040
case people's user names, and
then some non-meaningful data

00:08:17.040 --> 00:08:23.080
here with just random numbers
and stuff it looks like--

00:08:23.080 --> 00:08:24.800
so very similar.

00:08:24.800 --> 00:08:26.050
Any questions about this idea?

00:08:30.380 --> 00:08:30.880
Cool.

00:08:33.970 --> 00:08:38.070
So we use a Merkle root, which
I think I talked about last week

00:08:38.070 --> 00:08:41.289
Monday, instead of
just concatenating

00:08:41.289 --> 00:08:43.690
all the transactions and
hashing them in together.

00:08:43.690 --> 00:08:44.530
You could do that.

00:08:44.530 --> 00:08:45.572
That actually would work.

00:08:45.572 --> 00:08:47.565
It wouldn't make
a huge difference.

00:08:47.565 --> 00:08:48.940
But this is a
little nicer if you

00:08:48.940 --> 00:08:53.050
want to prove that a
transaction was in this block,

00:08:53.050 --> 00:08:55.840
without giving the whole block.

00:08:55.840 --> 00:08:59.080
So the idea is I
have these TXIDs.

00:08:59.080 --> 00:09:04.540
And a TXID, transaction ID, is
just a hash of the transaction.

00:09:04.540 --> 00:09:07.450
Stick all the components of
the transaction into bytes,

00:09:07.450 --> 00:09:09.670
hash that, and you've
got what's called a TXID.

00:09:09.670 --> 00:09:11.930
And that's how you
refer to transactions.

00:09:15.060 --> 00:09:18.820
You hash these two together to
get this intermediate point.

00:09:18.820 --> 00:09:20.980
Do the same thing
up to the root.

00:09:20.980 --> 00:09:25.420
And so you can't change any of
these little transaction IDs

00:09:25.420 --> 00:09:27.280
without changing
the Merkle root.

00:09:27.280 --> 00:09:30.250
So it commits to
all the transactions

00:09:30.250 --> 00:09:33.402
just the same way it would
if you just concatenated them

00:09:33.402 --> 00:09:34.610
all together and hashed that.
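
NOTE
A sketch of the pairwise hashing just described. It uses double SHA-256 and pairs an
odd leftover hash with itself, which is my understanding of what Bitcoin does; the
lecture does not spell out either detail, and the helper names are illustrative.
```python
import hashlib
def dsha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()
def merkle_root(txids: list) -> bytes:
    """Hash neighbouring pairs together, level by level, until one hash remains."""
    if not txids:
        raise ValueError("need at least one txid")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
# Four stand-in TXIDs (hashes of dummy bytes, not real transactions).
txids = [dsha256(bytes([i])) for i in range(4)]
root = merkle_root(txids)
```
Changing any single TXID changes the root, so the root commits to every transaction
just as hashing the full concatenation would.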

00:09:37.150 --> 00:09:42.400
So it really-- we'll go
into, later, why this is useful.

00:09:42.400 --> 00:09:47.160
But in many cases, it really
doesn't help too much.

00:09:47.160 --> 00:09:49.370
Any questions
about Merkle, Merkle tree?

00:09:49.370 --> 00:09:51.360
Good.

00:09:51.360 --> 00:09:56.230
So the actual Bitcoin headers,
which many things use,

00:09:56.230 --> 00:09:57.340
have a couple of fields.

00:09:57.340 --> 00:10:00.850
Some of them are
actually not very useful.

00:10:00.850 --> 00:10:05.200
But the main three are previous
hash, Merkle root, nonce.

00:10:05.200 --> 00:10:07.460
And then there's-- I'll
talk about the other things.

00:10:07.460 --> 00:10:10.300
So there's also a version
field right at the beginning.

00:10:10.300 --> 00:10:12.000
It's 4 bytes.

00:10:12.000 --> 00:10:13.750
It indicates block version.

00:10:13.750 --> 00:10:17.530
It's not clear what that's going
to be used for in the future.

00:10:17.530 --> 00:10:20.970
It used to be used for sort
of signaling protocol changes.

00:10:20.970 --> 00:10:23.500
I'm not sure that's going to
be the case going forward,

00:10:23.500 --> 00:10:27.440
because it didn't really
work very well for that.

00:10:27.440 --> 00:10:30.970
So right now I think they all
start like 02 and then a bunch

00:10:30.970 --> 00:10:32.290
of zeros.

00:10:32.290 --> 00:10:36.110
And that's the current
version, whatever that means.

00:10:36.110 --> 00:10:40.420
And if you mine something
with a different version,

00:10:40.420 --> 00:10:41.960
everyone will accept it.

00:10:41.960 --> 00:10:43.960
But there'll be like these
warnings that show up

00:10:43.960 --> 00:10:48.520
in your Bitcoin log files that
say, warning, unknown version

00:10:48.520 --> 00:10:50.440
detected.

00:10:50.440 --> 00:10:53.710
The idea is maybe, well, if the
version increases or changes,

00:10:53.710 --> 00:10:57.580
maybe there's some new rules
in this system, or new opcodes,

00:10:57.580 --> 00:10:59.243
or new something going on.

00:10:59.243 --> 00:11:00.910
And you're not aware
of it, so you might

00:11:00.910 --> 00:11:03.280
need to upgrade your software.

00:11:03.280 --> 00:11:06.050
That was the idea anyway.

00:11:06.050 --> 00:11:07.718
In practice, what happens is--

00:11:07.718 --> 00:11:09.260
you'll see in your
logs all the time,

00:11:09.260 --> 00:11:11.360
that like, unknown
version detected.

00:11:11.360 --> 00:11:14.030
And it's just someone
just set random numbers

00:11:14.030 --> 00:11:15.440
in the version field.

00:11:15.440 --> 00:11:19.680
And it doesn't seem to mean
anything, so not super useful.

00:11:19.680 --> 00:11:21.690
Previous hash, just
like in the problem set,

00:11:21.690 --> 00:11:24.910
it's the hash of the
previous block, 32 bytes.

00:11:24.910 --> 00:11:27.700
Merkle root, as described
a few slides before--

00:11:27.700 --> 00:11:29.560
hash of all the
transactions in the block.

00:11:32.458 --> 00:11:35.830
Time, actually kind of complex--

00:11:35.830 --> 00:11:39.160
I'm not going to go into
the whole thing right here.

00:11:39.160 --> 00:11:41.590
So far we haven't really
talked about time.

00:11:41.590 --> 00:11:48.240
Does anyone know why we'd
want time in these headers?

00:11:51.200 --> 00:11:51.700
Yeah.

00:11:51.700 --> 00:11:53.158
AUDIENCE: You had
mentioned earlier

00:11:53.158 --> 00:11:56.670
that you don't accept blocks
between a certain interval

00:11:56.670 --> 00:11:58.458
if they were too late.

00:11:58.458 --> 00:11:59.250
TADGE DRYJA: Right.

00:11:59.250 --> 00:12:02.850
So it makes sense intuitively
that like if someone says, hey,

00:12:02.850 --> 00:12:06.180
I mined this in 1987.

00:12:06.180 --> 00:12:08.640
It's like well,
that seems crazy.

00:12:08.640 --> 00:12:10.350
Or if someone says,
here's a block.

00:12:10.350 --> 00:12:12.380
It came out in 2046.

00:12:12.380 --> 00:12:15.960
Like, this doesn't
make any sense.

00:12:15.960 --> 00:12:17.780
So intuitively,
yeah, you shouldn't

00:12:17.780 --> 00:12:20.000
accept things that have
some crazy date that's

00:12:20.000 --> 00:12:21.260
clearly wrong.

00:12:21.260 --> 00:12:22.130
But why?

00:12:22.130 --> 00:12:23.320
Why do we need time at all?

00:12:25.660 --> 00:12:26.160
Yeah.

00:12:26.160 --> 00:12:28.035
AUDIENCE: If you want
to lock the transaction

00:12:28.035 --> 00:12:29.830
until a certain time.

00:12:29.830 --> 00:12:32.960
TADGE DRYJA: Yeah, you could
say, here's a transaction,

00:12:32.960 --> 00:12:37.040
and I don't want it to
be valid before August 1.

00:12:37.040 --> 00:12:41.070
And so then, you could say,
if it goes into a block,

00:12:41.070 --> 00:12:44.570
and that block has a
timestamp before August 1,

00:12:44.570 --> 00:12:46.430
consider the block invalid.

00:12:46.430 --> 00:12:48.530
You could.

00:12:48.530 --> 00:12:51.980
You can also do timestamping
based just on block height.

00:12:51.980 --> 00:12:54.240
But what's the main-- does
anyone know the main reason

00:12:54.240 --> 00:12:55.340
to have time here?

00:12:55.340 --> 00:12:55.955
Yeah.

00:12:55.955 --> 00:12:57.372
AUDIENCE: Is it
because if there's

00:12:57.372 --> 00:12:58.920
a competing transaction?

00:12:58.920 --> 00:13:03.680
And then you would
pick one or the other.

00:13:03.680 --> 00:13:06.320
TADGE DRYJA: So the competing
transactions, when they get in,

00:13:06.320 --> 00:13:07.890
they sort of get into a block.

00:13:07.890 --> 00:13:09.925
And that sort of solves
that competition.

00:13:09.925 --> 00:13:11.300
So you have two
transactions that

00:13:11.300 --> 00:13:13.650
both are mutually exclusive.

00:13:13.650 --> 00:13:16.333
Well, if they're both in the
same Merkle root and both

00:13:16.333 --> 00:13:17.750
in the same block,
then that block

00:13:17.750 --> 00:13:19.790
is considered invalid,
because it's like,

00:13:19.790 --> 00:13:21.590
hey, you've given me a block.

00:13:21.590 --> 00:13:24.140
It's got two things that
can't both exist here.

00:13:24.140 --> 00:13:27.020
So throw the block away.

00:13:27.020 --> 00:13:30.520
If you find two blocks
that both seem to have--

00:13:30.520 --> 00:13:31.880
that sort of collide.

00:13:31.880 --> 00:13:33.350
They're both pointing to the--

00:13:33.350 --> 00:13:35.270
they've both got the
same previous hash.

00:13:35.270 --> 00:13:40.550
So they're both at the same
height, as we call it, of the chain.

00:13:40.550 --> 00:13:44.645
You could say, oh, well,
whichever one came out first.

00:13:44.645 --> 00:13:46.270
I'll look at the
timestamp and say, OK,

00:13:46.270 --> 00:13:49.780
the block that came out
first will be the valid one.

00:13:49.780 --> 00:13:53.680
But the problem is this
is claimed block creation.

00:13:53.680 --> 00:13:56.590
You can put whatever 4
bytes you want in there.

00:13:56.590 --> 00:13:58.570
And so you can always
say, oh, I just

00:13:58.570 --> 00:14:01.690
mined it exactly 1 second
after the previous block.

00:14:01.690 --> 00:14:05.380
It just took me a
while to broadcast it.

00:14:05.380 --> 00:14:07.630
So you can't really
trust the timestamp

00:14:07.630 --> 00:14:09.460
to see which came first.

00:14:09.460 --> 00:14:12.010
If you could, you wouldn't need
all this crazy mining stuff.

00:14:12.010 --> 00:14:15.520
And transactions themselves
could just have a timestamp.

00:14:15.520 --> 00:14:17.860
And you wouldn't need
this whole structure.

00:14:17.860 --> 00:14:20.050
So the fundamental
reason you're mining

00:14:20.050 --> 00:14:24.643
is we can't trust people to
say when they did something.

00:14:24.643 --> 00:14:26.810
You can always say, no,
this transaction came first.

00:14:26.810 --> 00:14:28.450
No, this came first.

00:14:28.450 --> 00:14:30.160
So the real reason
for this blockchain

00:14:30.160 --> 00:14:33.442
is, OK, we know which
came before what.

00:14:33.442 --> 00:14:35.150
AUDIENCE: In practice,
which one happens?

00:14:35.150 --> 00:14:37.540
Do people just lie and say
it happened a second later?

00:14:37.540 --> 00:14:38.868
Or is it [INAUDIBLE]?

00:14:38.868 --> 00:14:40.660
TADGE DRYJA: Oh, in
practice the timestamps

00:14:40.660 --> 00:14:43.810
are pretty unreliable.

00:14:43.810 --> 00:14:48.330
They can be off by minutes.

00:14:48.330 --> 00:14:50.650
It can be before the
previous block's time.

00:14:50.650 --> 00:14:53.050
And that's OK.

00:14:53.050 --> 00:14:55.987
It seems intuitively like, well,
that should just be a rule.

00:14:55.987 --> 00:14:58.570
And it probably-- it would have
been cool if it was a rule and

00:14:58.570 --> 00:15:00.670
made things simpler
from the beginning.

00:15:00.670 --> 00:15:04.420
But if you're pointing
to a previous block,

00:15:04.420 --> 00:15:05.830
I'm building on top of it.

00:15:05.830 --> 00:15:09.870
And the previous block
came out at 10:15.

00:15:09.870 --> 00:15:13.600
And I set my timestamp
to 10:12, 3 minutes prior

00:15:13.600 --> 00:15:15.010
to the previous block.

00:15:15.010 --> 00:15:16.443
Logically, that's impossible.

00:15:16.443 --> 00:15:17.860
I'm referencing
something, and I'm

00:15:17.860 --> 00:15:19.480
saying I'm coming before it.

00:15:19.480 --> 00:15:22.002
But the software says that's OK.

00:15:22.002 --> 00:15:23.710
AUDIENCE: So if we're
creating a version,

00:15:23.710 --> 00:15:26.973
would it be useful to just
get rid of version and time,

00:15:26.973 --> 00:15:28.640
like if we're creating
a new blockchain?

00:15:28.640 --> 00:15:30.937
TADGE DRYJA: So version,
maybe you could get rid of,

00:15:30.937 --> 00:15:33.520
or you could put it somewhere
in the Merkle root or something.

00:15:33.520 --> 00:15:36.430
Time actually does have
a really useful purpose.

00:15:39.150 --> 00:15:41.690
Does anyone maybe know?

00:15:41.690 --> 00:15:43.710
AUDIENCE: I don't
know, but does it

00:15:43.710 --> 00:15:45.820
play into the
difficulty of the mine?

00:15:45.820 --> 00:15:46.500
Does it?

00:15:46.500 --> 00:15:49.350
TADGE DRYJA: Yeah, so
the main reason for time

00:15:49.350 --> 00:15:53.820
here is to adjust
the difficulty.

00:15:53.820 --> 00:15:56.970
And that happens
every 2,016 blocks.

00:15:56.970 --> 00:15:59.910
You just look at, OK, how long
did this 2,016-block period

00:15:59.910 --> 00:16:02.790
take according to
these timestamps?

00:16:02.790 --> 00:16:05.080
And if it took two
weeks, OK, we're good.

00:16:05.080 --> 00:16:06.840
The difficulty doesn't
have to change.

00:16:06.840 --> 00:16:09.450
If it took three weeks,
that means the blocks

00:16:09.450 --> 00:16:11.070
were coming out very slowly.

00:16:11.070 --> 00:16:14.130
And we need to reduce
the difficulty.

00:16:14.130 --> 00:16:17.670
If the 2,016 blocks came
out in one week, that means,

00:16:17.670 --> 00:16:20.470
wow, people were
mining really fast.

00:16:20.470 --> 00:16:22.510
And so we need to
increase the difficulty.

00:16:22.510 --> 00:16:25.320
So there's this negative
feedback mechanism

00:16:25.320 --> 00:16:27.120
based on this time.

00:16:27.120 --> 00:16:29.430
And it can be tweaked.

00:16:29.430 --> 00:16:31.530
It's not accurate.

00:16:31.530 --> 00:16:34.180
You can have things
coming in the wrong order.

00:16:34.180 --> 00:16:36.080
The general rule of thumb--

00:16:36.080 --> 00:16:38.970
the rule in the software
is about two hours.

00:16:38.970 --> 00:16:41.310
If you see something
that's two hours off

00:16:41.310 --> 00:16:46.170
from what your internal clock
says, you will reject it.

00:16:46.170 --> 00:16:48.720
But that's a huge gap.

00:16:48.720 --> 00:16:52.350
In most networked systems,
everyone's got their clocks

00:16:52.350 --> 00:16:57.000
to the same second at
least, or millisecond.

00:16:57.000 --> 00:17:00.670
Two hours is like
kind of an enormous gap.

00:17:00.670 --> 00:17:03.000
But the system works
OK, because you've

00:17:03.000 --> 00:17:06.359
got these really long-term
difficulty adjustments that

00:17:06.359 --> 00:17:09.810
only happen every 2,016 blocks,
which in practice is something

00:17:09.810 --> 00:17:11.619
like two weeks.

00:17:11.619 --> 00:17:15.089
So if someone gets
something a few minutes off,

00:17:15.089 --> 00:17:17.810
it doesn't really
affect things too much.

00:17:17.810 --> 00:17:22.079
And it's really only used for,
OK, look at the last 2,016

00:17:22.079 --> 00:17:25.380
blocks, two weeks-ish of
work, of all these blocks,

00:17:25.380 --> 00:17:28.410
and see how fast we
need to make things.

00:17:28.410 --> 00:17:33.270
So that ties into the next
field, which is difficulty.

00:17:33.270 --> 00:17:35.340
It's in a sort of weird
floating-point-like

00:17:35.340 --> 00:17:39.120
format with a mantissa
and exponent, which

00:17:39.120 --> 00:17:40.410
is totally custom.

00:17:40.410 --> 00:17:43.620
And you kind of have to write
your own code to deal with it.

00:17:43.620 --> 00:17:45.570
But it basically
says, OK, what does

00:17:45.570 --> 00:17:48.420
the number have to-- what does
the hash have to be below?

00:17:48.420 --> 00:17:50.790
It's not just number of bits.

00:17:50.790 --> 00:17:55.170
So in the problem set,
I said 33 bits of work.

00:17:55.170 --> 00:17:57.420
So that's fairly easy to
detect, because you just look

00:17:57.420 --> 00:17:59.910
for 33 0-bits in the front.

00:17:59.910 --> 00:18:02.040
In Bitcoin, it's not
just number of bits.

00:18:02.040 --> 00:18:04.478
It's actually a number
that it must be below.

00:18:04.478 --> 00:18:06.270
If it were just number
of bits, the problem

00:18:06.270 --> 00:18:10.260
then is your adjustments are
fairly coarse, because you can

00:18:10.260 --> 00:18:12.120
only adjust by a factor of 2.

00:18:12.120 --> 00:18:15.930
You can double your
difficulty or halve it.

00:18:15.930 --> 00:18:19.590
But with this, you can have much
smaller difficulty adjustments

00:18:19.590 --> 00:18:20.940
of like a fraction of a percent.
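
NOTE
A sketch of decoding the 4-byte mantissa-and-exponent difficulty field into the number
the hash is compared against. The bit layout (top byte as exponent, low 23 bits as
mantissa) is my understanding of Bitcoin's compact encoding rather than something
stated in the lecture, and the function names are illustrative.
```python
def bits_to_target(bits: int) -> int:
    """Expand the compact 32-bit difficulty field into a 256-bit target."""
    exponent = bits >> 24         # top byte: length of the target in bytes
    mantissa = bits & 0x007FFFFF  # low 23 bits: leading digits (the next bit up is a sign flag)
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))
def meets_target(header_hash: bytes, bits: int) -> bool:
    """Check the work: the hash, read as a little-endian integer, must not exceed the target."""
    return int.from_bytes(header_hash, "little") <= bits_to_target(bits)
easiest = bits_to_target(0x1D00FFFF)  # 0xFFFF followed by 26 zero bytes
```
Because the target is an arbitrary number and not a count of zero bits, it can be moved
by a fraction of a percent instead of only doubling or halving.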

00:18:23.840 --> 00:18:27.873
Yeah, this field is
pretty much useless

00:18:27.873 --> 00:18:29.540
since you can calculate
it from the time

00:18:29.540 --> 00:18:31.523
fields of the previous blocks.

00:18:31.523 --> 00:18:33.065
So you could just
have it be implied.

00:18:35.630 --> 00:18:36.530
But it's in there.

00:18:36.530 --> 00:18:37.790
And you can just whatever.

00:18:37.790 --> 00:18:39.910
It's an extra 4 bytes.

00:18:39.910 --> 00:18:42.410
I don't think you
actually-- like, you

00:18:42.410 --> 00:18:44.330
don't have to store it
on a disk if you want,

00:18:44.330 --> 00:18:47.893
because you can just figure
it out from the other things.

00:18:47.893 --> 00:18:52.972
AUDIENCE: Wouldn't you
need it for [INAUDIBLE]?

00:18:52.972 --> 00:18:55.430
TADGE DRYJA: No, because you
can figure out what difficulty

00:18:55.430 --> 00:18:56.472
is just from the headers.

00:18:58.833 --> 00:18:59.750
I mean, it's in there.

00:18:59.750 --> 00:19:01.670
I guess it's nice
if you just want

00:19:01.670 --> 00:19:06.880
to validate whether a single
header has enough work.

00:19:06.880 --> 00:19:10.202
But it's like, how much
work does it claim it needs?

00:19:10.202 --> 00:19:11.410
And then you can validate it.

00:19:11.410 --> 00:19:12.897
But I don't know.

00:19:12.897 --> 00:19:13.480
It's in there.

00:19:13.480 --> 00:19:16.067
It doesn't-- you could take
it out and reorganize the code

00:19:16.067 --> 00:19:17.650
a little if you
wanted to optimize it.

00:19:17.650 --> 00:19:20.580
But that would change so
much that no one bothers.
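
NOTE
A sketch that packs the six fields discussed so far into the 80-byte header and hashes
it. Little-endian integers and double SHA-256 are my understanding of Bitcoin's
serialization, not details from the lecture; the field values below are placeholders.
```python
import hashlib
import struct
def pack_header(version: int, prev_hash: bytes, merkle_root: bytes, time: int, bits: int, nonce: int) -> bytes:
    """Serialize a header: 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes, integers little-endian."""
    return struct.pack("<i32s32sIII", version, prev_hash, merkle_root, time, bits, nonce)
def header_hash(header: bytes) -> bytes:
    """The value miners try to push below the target: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()
# Placeholder values only; this is not a real block.
header = pack_header(2, bytes(32), bytes(32), 1519000000, 0x1D00FFFF, 0)
digest = header_hash(header)
```
Only these 80 bytes are hashed while mining; the transactions enter solely through the
32-byte Merkle root.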

00:19:20.580 --> 00:19:23.430
AUDIENCE: So when we talk about
the adjusting difficulties

00:19:23.430 --> 00:19:27.960
and even just showing the
problem or proof of work, who

00:19:27.960 --> 00:19:30.180
[INAUDIBLE] for
the problems that

00:19:30.180 --> 00:19:33.260
will go to the central server?

00:19:33.260 --> 00:19:35.170
TADGE DRYJA: So in
this, it's just everyone

00:19:35.170 --> 00:19:36.700
broadcasts their blocks.

00:19:36.700 --> 00:19:41.140
So if you've received a block or
if you found a block yourself,

00:19:41.140 --> 00:19:44.620
you just send it to all your
peers that you're connected to.

00:19:44.620 --> 00:19:50.180
And so there's no like, oh,
this is the canonical block.

00:19:50.180 --> 00:19:52.180
There can be competing
blocks where you have two

00:19:52.180 --> 00:19:56.110
at the same time and just
stochastically, one of them

00:19:56.110 --> 00:19:59.110
will pull ahead,
because, well, randomly.

00:19:59.110 --> 00:20:01.780
So you can have
conflicting things.

00:20:01.780 --> 00:20:03.520
Yeah, and then the
adjustments-- also,

00:20:03.520 --> 00:20:06.030
everyone computes
the adjustments.

00:20:06.030 --> 00:20:08.590
And this is actually a
very quick computation,

00:20:08.590 --> 00:20:12.670
because you're just looking at--

00:20:12.670 --> 00:20:15.250
you're not even looking
at 2,016 timestamps.

00:20:15.250 --> 00:20:17.920
You're basically just
saying, OK, if height--

00:20:17.920 --> 00:20:20.080
so height is just what
block number it is.

00:20:20.080 --> 00:20:23.840
So if you're-- right now,
it's about 500 million.

00:20:23.840 --> 00:20:26.165
No, sorry, 500,000.

00:20:26.165 --> 00:20:27.540
So you basically
in the code just

00:20:27.540 --> 00:20:30.820
say, well, if
height modulo 2,016

00:20:30.820 --> 00:20:38.800
is equal to 0, check
height minus 2,016's block.

00:20:38.800 --> 00:20:40.570
Compare the two timestamps.

00:20:40.570 --> 00:20:41.950
Subtract them.

00:20:41.950 --> 00:20:43.810
Get a duration.

00:20:43.810 --> 00:20:46.810
And then compare that
duration to two weeks.

00:20:46.810 --> 00:20:49.580
And then change the
difficulty proportionally.
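
The retarget check just described can be sketched roughly as follows. This is a simplified illustration, not the actual Bitcoin code: it works on a floating-point "difficulty" rather than the compact target stored in headers, and `timestamp_at` is a hypothetical lookup from block height to header timestamp. Taking the end timestamp from block `height - 1` is an assumption about which two blocks get compared.

```python
RETARGET_INTERVAL = 2016                # blocks between adjustments
TARGET_TIMESPAN = 14 * 24 * 60 * 60     # two weeks, in seconds


def next_difficulty(height, old_difficulty, timestamp_at):
    """Difficulty required for the block at `height` (no clamping here).

    `timestamp_at` is a hypothetical callable: block height -> timestamp.
    """
    if height % RETARGET_INTERVAL != 0:
        return old_difficulty           # unchanged inside a period

    start = timestamp_at(height - RETARGET_INTERVAL)
    end = timestamp_at(height - 1)      # assumption: last block of the period
    # Note: start..end spans 2,015 block intervals, not 2,016, which may be
    # one of the off-by-one quirks that make this confusing.
    duration = max(end - start, 1)      # guard against zero or negative spans

    # Blocks came too fast -> duration is short -> difficulty goes up.
    return old_difficulty * TARGET_TIMESPAN / duration


if __name__ == "__main__":
    one_week = 7 * 24 * 60 * 60
    timestamps = {0: 0, 2015: one_week}
    print(next_difficulty(2016, 1.0, timestamps.__getitem__))
```

Only two header timestamps are read per adjustment, which is why every node can recompute it cheaply.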

00:20:49.580 --> 00:20:51.980
So it's actually, like,
super quick for everyone

00:20:51.980 --> 00:20:53.230
to compute the new difficulty.

00:20:53.230 --> 00:20:56.120
And they only do it
once every two weeks.

00:20:56.120 --> 00:20:59.270
And it, yeah, it's
pretty straightforward.

00:20:59.270 --> 00:21:01.690
There are weird
attacks and stuff.

00:21:01.690 --> 00:21:05.200
And it's kind of some weird
off-by-one errors, where you're--

00:21:05.200 --> 00:21:06.000
I don't remember.

00:21:06.000 --> 00:21:07.420
Like, it's kind of confusing.

00:21:07.420 --> 00:21:10.150
It's also confusing because
the test network, which

00:21:10.150 --> 00:21:15.240
I haven't gone into but will
use probably in two weeks.

00:21:15.240 --> 00:21:19.390
There's a Bitcoin test network,
which operates pretty much

00:21:19.390 --> 00:21:21.790
exactly the same as
Bitcoin, except everyone

00:21:21.790 --> 00:21:24.790
agrees that the coins
are not worth any money.

00:21:24.790 --> 00:21:27.850
What's interesting is it's
actually called testnet3.

00:21:27.850 --> 00:21:31.540
The first two test networks
have the same setup.

00:21:31.540 --> 00:21:33.970
However, the agreement that
they were not worth any money

00:21:33.970 --> 00:21:36.700
broke down.

00:21:36.700 --> 00:21:38.530
So at testnet1,
someone said, hey,

00:21:38.530 --> 00:21:42.580
I'll pay you a bitcoin for
a million testnet coins.

00:21:42.580 --> 00:21:44.110
And once people
saw this happening,

00:21:44.110 --> 00:21:45.610
they said, oh, well,
you just ruined testnet.

00:21:45.610 --> 00:21:46.900
Now they're worth money.

00:21:46.900 --> 00:21:48.040
So we'll go to testnet2.

00:21:48.040 --> 00:21:49.220
It happened again.

00:21:49.220 --> 00:21:50.720
Testnet3 has had
some staying power.

00:21:50.720 --> 00:21:55.000
I think people realized that if
they try to buy testnet3 coins,

00:21:55.000 --> 00:21:58.365
everyone's going to
leave and go to testnet4.

00:21:58.365 --> 00:22:02.230
So it's kind of fun.

00:22:02.230 --> 00:22:05.080
I'd actually be OK with testnet3
coins being worth money,

00:22:05.080 --> 00:22:10.270
because I have many,
many thousands of them.

00:22:10.270 --> 00:22:18.470
But yeah, so one
difference, though,

00:22:18.470 --> 00:22:20.770
between the test networks
and the real network

00:22:20.770 --> 00:22:22.600
is the difficulty adjustments.

00:22:22.600 --> 00:22:26.080
So I think in the
first test network,

00:22:26.080 --> 00:22:28.390
it just worked
exactly like Bitcoin.

00:22:28.390 --> 00:22:31.600
But one of the problems
was people would mine,

00:22:31.600 --> 00:22:34.030
and the difficulty
would increase.

00:22:34.030 --> 00:22:36.250
And then people would
stop mining, say, oh, I'm

00:22:36.250 --> 00:22:38.590
going to test out
my mining software.

00:22:38.590 --> 00:22:41.095
I'll mine a couple
thousand blocks.

00:22:41.095 --> 00:22:42.970
Maybe it only takes me
a day or two to do so,

00:22:42.970 --> 00:22:45.760
because I have a very
fast computer compared

00:22:45.760 --> 00:22:47.052
to the rest of the network.

00:22:47.052 --> 00:22:48.760
And then I say, OK,
well, it works, cool.

00:22:48.760 --> 00:22:50.427
I'm going to go to
the real network now.

00:22:50.427 --> 00:22:51.750
And I leave the test network.

00:22:51.750 --> 00:22:53.440
And now the
difficulty increased,

00:22:53.440 --> 00:22:56.560
because let's say 2,000
or 4,000 blocks came out.

00:22:56.560 --> 00:22:59.410
And they came out very quickly,
so the difficulty went up.

00:22:59.410 --> 00:23:01.180
And then all the
mining power left.

00:23:01.180 --> 00:23:03.550
And so now blocks
aren't coming out.

00:23:03.550 --> 00:23:07.010
And since the adjustment
can be up or down

00:23:07.010 --> 00:23:12.190
but happens based on number
of blocks, not based on time,

00:23:12.190 --> 00:23:15.430
if you have a very high
difficulty and a very low

00:23:15.430 --> 00:23:18.100
hash rate relative
to that difficulty,

00:23:18.100 --> 00:23:21.100
it can take weeks
or months or years

00:23:21.100 --> 00:23:23.830
for the difficulty to reduce.

00:23:23.830 --> 00:23:28.090
So testnet3 put in this
sort of difficulty nerfing

00:23:28.090 --> 00:23:32.950
code, which is probably wrong
and not what they intended.

00:23:32.950 --> 00:23:36.040
And it has this thing where
like if 20 minutes have gone by,

00:23:36.040 --> 00:23:37.300
the difficulty lowers.

00:23:37.300 --> 00:23:40.540
And it's kind of ugly.

00:23:40.540 --> 00:23:45.183
So that's the main place
I've dealt with this field.

00:23:45.183 --> 00:23:47.350
One other rule with the
restriction-- the difficulty

00:23:47.350 --> 00:23:50.620
can go up by at
most a factor of 4

00:23:50.620 --> 00:23:53.770
and drop by at
most a factor of 4.

00:23:53.770 --> 00:23:57.850
So if you mine 2,016
blocks in one day,

00:23:57.850 --> 00:23:59.920
the difficulty goes
up 4x but does not

00:23:59.920 --> 00:24:05.750
go up 14x or whatever
the implied would.
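
The factor-of-4 limit can be shown with a little arithmetic. This is a sketch with names of my own choosing (`clamped_adjustment`, `MAX_FACTOR`), working on a difficulty multiplier rather than the real compact target encoding.

```python
TARGET_TIMESPAN = 14 * 24 * 60 * 60   # two weeks, in seconds
MAX_FACTOR = 4                        # limit on a single adjustment, up or down


def clamped_adjustment(actual_seconds):
    """Multiplier applied to the difficulty after one 2,016-block period."""
    raw = TARGET_TIMESPAN / actual_seconds          # 14.0 if it took one day
    return min(max(raw, 1 / MAX_FACTOR), MAX_FACTOR)


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    print(TARGET_TIMESPAN / one_day)     # implied change: 14.0
    print(clamped_adjustment(one_day))   # applied change: 4
```

So mining a full period in one day implies a 14x increase, but only 4x is applied.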

00:24:05.750 --> 00:24:06.390
Any-- go ahead.

00:24:06.390 --> 00:24:08.170
AUDIENCE: So the
difficulty is definitely

00:24:08.170 --> 00:24:09.860
constant for two weeks then?

00:24:09.860 --> 00:24:11.080
TADGE DRYJA: Yeah.

00:24:11.080 --> 00:24:12.400
Well, sorry, not two weeks--

00:24:12.400 --> 00:24:16.920
2,016 blocks, which is generally
around two weeks, but yeah.

00:24:16.920 --> 00:24:18.600
AUDIENCE: So it
unblocks and blocks,

00:24:18.600 --> 00:24:20.100
within literally an
almost two-week period,

00:24:20.100 --> 00:24:21.380
that difficulty
would be the same.

00:24:21.380 --> 00:24:23.797
TADGE DRYJA: Yeah, so if you
actually look at the headers,

00:24:23.797 --> 00:24:24.905
this is just the constant.

00:24:24.905 --> 00:24:26.030
It just is always the same.

00:24:26.030 --> 00:24:27.905
So it's kind of a silly
field to be in there.

00:24:27.905 --> 00:24:31.475
You never need it, and
it's always the same.

00:24:31.475 --> 00:24:32.850
Any other questions
about-- yeah?

00:24:32.850 --> 00:24:35.382
AUDIENCE: How many transactions
are usually [INAUDIBLE]??

00:24:35.382 --> 00:24:36.840
TADGE DRYJA: Oh,
I'll get to that--

00:24:36.840 --> 00:24:40.170
right now, a couple
thousand, 4,000-ish.

00:24:40.170 --> 00:24:42.580
We'll get to that I think.

00:24:42.580 --> 00:24:43.830
But yeah, in the Merkle root--

00:24:43.830 --> 00:24:47.190
so the height of the
Merkle tree's like 12-ish.

00:24:47.190 --> 00:24:51.450
And it goes out to maybe
4,000 transactions,

00:24:51.450 --> 00:24:53.820
sometimes more,
sometimes very few.

00:24:53.820 --> 00:24:55.560
You'll find empty
blocks that just

00:24:55.560 --> 00:24:58.990
have one transaction in them.

00:24:58.990 --> 00:25:02.190
And that transaction ID just
becomes the Merkle root,

00:25:02.190 --> 00:25:03.330
because a height--

00:25:03.330 --> 00:25:06.390
it's like a height-zero
Merkle tree,

00:25:06.390 --> 00:25:08.260
but yeah, something like that.
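
The "12-ish" figure follows from halving the number of nodes at each level of the tree. A small sketch (the function name is mine) that counts levels by pairing nodes until one remains:

```python
def merkle_tree_height(num_transactions):
    """Number of hashing levels between the leaves and the root."""
    if num_transactions < 1:
        raise ValueError("a block needs at least one transaction")
    height = 0
    width = num_transactions
    while width > 1:
        width = (width + 1) // 2   # pair nodes up; an odd one still needs a parent
        height += 1
    return height


if __name__ == "__main__":
    print(merkle_tree_height(4000))  # 12
    print(merkle_tree_height(1))     # 0: the lone transaction ID is the root
```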

00:25:08.260 --> 00:25:11.350
And then last-- pretty easy--

00:25:11.350 --> 00:25:13.050
there's a nonce, 4 byte.

00:25:13.050 --> 00:25:14.580
Anything you want goes in there.

00:25:14.580 --> 00:25:18.510
You can think of it
as a uint32,

00:25:18.510 --> 00:25:20.850
there's no meaning to it.

00:25:20.850 --> 00:25:26.450
So does anyone see a problem
with this nonce field?

00:25:26.450 --> 00:25:26.950
Yeah.

00:25:26.950 --> 00:25:27.992
AUDIENCE: It's too small.

00:25:27.992 --> 00:25:29.690
TADGE DRYJA: Yeah,
it's too small.

00:25:29.690 --> 00:25:33.520
4 bytes-- even in
the homework, people

00:25:33.520 --> 00:25:37.960
are using 12 something
bytes for a nonce.

00:25:37.960 --> 00:25:40.990
With only 4 bytes of
nonce, you can go through 2

00:25:40.990 --> 00:25:45.490
to the 32 possibilities,
which is not

00:25:45.490 --> 00:25:47.560
enough to mine in
almost all cases,

00:25:47.560 --> 00:25:50.290
because you're going to need
to go through 2 to the 70

00:25:50.290 --> 00:25:53.500
possibilities to find a block.

00:25:53.500 --> 00:25:57.350
So what are some ideas for how
do you deal with this problem?

00:25:57.350 --> 00:25:59.920
Like, it would be nice
if it was just 8 bytes.

00:25:59.920 --> 00:26:01.090
That'd make things simpler.

00:26:01.090 --> 00:26:02.800
But the system is what it is.

00:26:02.800 --> 00:26:04.210
It's very hard to change.

00:26:04.210 --> 00:26:08.992
How, as a miner, would you
work around this issue?

00:26:08.992 --> 00:26:10.950
AUDIENCE: Adjust the
version and time.

00:26:10.950 --> 00:26:12.825
TADGE DRYJA: Yeah, so
you can adjust version.

00:26:12.825 --> 00:26:16.000
So that may be why sometimes
weird version numbers come up.

00:26:16.000 --> 00:26:17.875
Time is a good one
too, since time--

00:26:21.610 --> 00:26:24.350
so yeah, adjust time
and also Merkle root.

00:26:24.350 --> 00:26:31.710
So time, if you're off by a
few seconds, nobody cares.

00:26:31.710 --> 00:26:36.600
So use the low bits of this time
field as part of your nonce.

00:26:36.600 --> 00:26:39.000
It's kind of in the wrong
place, but you can make chips

00:26:39.000 --> 00:26:40.290
to sort of fiddle--

00:26:40.290 --> 00:26:43.350
twiddle these bits as well.

00:26:43.350 --> 00:26:47.340
What's also nice is that
every second you can sort of--

00:26:47.340 --> 00:26:48.600
you can do it the wrong way.

00:26:48.600 --> 00:26:50.058
And you can say,
oh, I'm just going

00:26:50.058 --> 00:26:54.450
to take the least significant
4 bits of my time field

00:26:54.450 --> 00:26:56.740
and just use them as
nonce space randomly.

00:26:56.740 --> 00:27:00.720
What's nice is that the
actual time progresses by one

00:27:00.720 --> 00:27:02.970
bit every second.

00:27:02.970 --> 00:27:08.003
So as long as your
chip has enough space--

00:27:08.003 --> 00:27:09.670
so you're like, OK,
I've got 2 to the 32

00:27:09.670 --> 00:27:14.300
here, another 4 bits here,
so I'm at 2 to the 36.

00:27:14.300 --> 00:27:17.660
If your chip only goes
through 2 to the 36 hashes

00:27:17.660 --> 00:27:20.660
every second, you're good
because the actual time

00:27:20.660 --> 00:27:22.970
progresses.
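
The arithmetic behind borrowing time bits as extra nonce space, as a sketch. The function names are illustrative, and the 2^70 figure is simply the rough number of attempts quoted above.

```python
NONCE_BITS = 32   # the 4-byte nonce field


def header_variations(time_bits):
    """Distinct headers reachable using the nonce plus `time_bits` low time bits."""
    return 2 ** (NONCE_BITS + time_bits)


def keeps_up(hashes_per_second, time_bits):
    """True if one second of hashing does not exhaust the combined space.

    The real clock advances every second, which refreshes the space.
    """
    return hashes_per_second <= header_variations(time_bits)


if __name__ == "__main__":
    print(header_variations(4) == 2 ** 36)   # True
    print(keeps_up(2 ** 36, time_bits=4))    # True
    print(keeps_up(2 ** 37, time_bits=4))    # False: needs another source of variation
    # If a block takes about 2**70 attempts, the 32-bit nonce alone must be
    # restarted with a fresh header about 2**38 times.
    print(2 ** 70 // 2 ** NONCE_BITS == 2 ** 38)   # True
```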

00:27:22.970 --> 00:27:26.660
The other way you can do it
is modify the Merkle root.

00:27:26.660 --> 00:27:32.210
And you can do that-- so can
you think of ways to modify that

00:27:32.210 --> 00:27:34.682
without breaking things?

00:27:34.682 --> 00:27:36.450
AUDIENCE: Add or
drop the transaction.

00:27:36.450 --> 00:27:37.590
TADGE DRYJA: Yeah,
you could add or you

00:27:37.590 --> 00:27:38.580
could drop a transaction.

00:27:38.580 --> 00:27:40.497
So you say, OK, I have
all these transactions.

00:27:40.497 --> 00:27:41.612
I'm going to drop one.

00:27:41.612 --> 00:27:43.320
That's got some
disadvantages, because it

00:27:43.320 --> 00:27:45.147
may pay fees to the miner.

00:27:45.147 --> 00:27:46.355
AUDIENCE: Changing the order?

00:27:46.355 --> 00:27:47.855
TADGE DRYJA: Yep,
you can swap them.

00:27:50.550 --> 00:27:52.840
You can just say, OK, well,
these two are independent.

00:27:52.840 --> 00:27:54.030
I'm going to swap them.

00:27:54.030 --> 00:27:56.780
This will change,
which will change that.

00:27:56.780 --> 00:27:59.970
So you can swap
transactions around.
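
To see why swapping two transactions gives a fresh Merkle root, here is a minimal sketch. It assumes the double SHA-256 hash that comes up a bit later in the lecture, and it duplicates the last entry on odd-sized levels, which is my understanding of Bitcoin's convention rather than something stated here. The transaction IDs are made-up stand-ins.

```python
import hashlib


def sha256d(data):
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids):
    """Fold a list of 32-byte transaction IDs into a single root hash."""
    if not txids:
        raise ValueError("need at least one transaction ID")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # assumption: odd levels repeat the last entry
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    txids = [sha256d(bytes([i])) for i in range(4)]        # stand-in IDs
    swapped = [txids[0], txids[2], txids[1], txids[3]]     # swap two independent txs
    print(merkle_root(txids) != merkle_root(swapped))      # True: new root to mine on
    print(merkle_root([txids[0]]) == txids[0])             # True: single tx is the root
```

Each reordering yields a different 32-byte root, so the nonce search can start over on a new header.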

00:27:59.970 --> 00:28:03.120
You can also edit what's called
the coinbase, which I think

00:28:03.120 --> 00:28:05.440
is in like one more slide.

00:28:05.440 --> 00:28:06.975
So yeah, so there's
a bunch of ways

00:28:06.975 --> 00:28:08.100
that you can change things.

00:28:08.100 --> 00:28:09.800
And so this is really
where you're going

00:28:09.800 --> 00:28:11.700
to have all the variation.

00:28:11.700 --> 00:28:15.140
You have 32 bytes here.

00:28:15.140 --> 00:28:16.803
And just even if it
was just swapping,

00:28:16.803 --> 00:28:18.470
and if you have 1,000
transactions, swap

00:28:18.470 --> 00:28:19.860
in whatever order you want--

00:28:19.860 --> 00:28:21.570
there's enough sort
of entropy there

00:28:21.570 --> 00:28:23.338
that you'll be able to find it.

00:28:23.338 --> 00:28:25.380
So what's interesting is
that the nonce is there.

00:28:25.380 --> 00:28:26.797
And it's important,
because that's

00:28:26.797 --> 00:28:31.290
where sort of the
high-speed mining occurs.

00:28:31.290 --> 00:28:35.610
But most mining chips will also
have circuitry to modify this,

00:28:35.610 --> 00:28:38.280
because they're operating
so quickly that they

00:28:38.280 --> 00:28:41.370
will exhaust the
4-byte nonce space

00:28:41.370 --> 00:28:43.390
in a fraction of a second.

00:28:43.390 --> 00:28:46.790
And so they'll have to
swap two transactions,

00:28:46.790 --> 00:28:51.803
recalculate a Merkle root, which
involves a few dozen hashes,

00:28:51.803 --> 00:28:52.720
and then go back here.

00:28:52.720 --> 00:28:54.970
So it actually doesn't hurt
their efficiency too much,

00:28:54.970 --> 00:28:58.980
because, OK, I just did 4
billion hash operations.

00:28:58.980 --> 00:29:01.110
And then I need to
do a few dozen more

00:29:01.110 --> 00:29:03.540
to get to the next 4 billion.

00:29:03.540 --> 00:29:05.710
So it doesn't hurt
things too much.
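
A quick back-of-the-envelope check of that claim. The count of 36 extra hashes is just an example standing in for "a few dozen".

```python
NONCE_SPACE = 2 ** 32   # about 4 billion hashes per header template


def rehash_overhead(extra_hashes):
    """Fraction of work spent rebuilding the Merkle root per nonce sweep."""
    return extra_hashes / NONCE_SPACE


if __name__ == "__main__":
    # Example: 36 extra hashes to recompute the root after a swap.
    print(f"{rehash_overhead(36):.2e}")   # well under one part in a hundred million
```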

00:29:05.710 --> 00:29:09.350
Also, in Bitcoin-- a
sort of weird quirk--

00:29:09.350 --> 00:29:11.820
it's called SHA-256d.

00:29:11.820 --> 00:29:13.380
They do SHA-256.

00:29:13.380 --> 00:29:17.180
And then from the output,
they do a SHA-256 again,

00:29:17.180 --> 00:29:17.810
not sure why.

00:29:17.810 --> 00:29:22.767
I think in one person's
first problem set,

00:29:22.767 --> 00:29:24.600
they inadvertently were
doing the same thing

00:29:24.600 --> 00:29:27.710
and was like, yeah, that works.

00:29:27.710 --> 00:29:30.572
Satoshi, whoever he
or she was, or they,

00:29:30.572 --> 00:29:31.530
just put that in there.

00:29:31.530 --> 00:29:34.550
Any questions about this header
in this format and anything

00:29:34.550 --> 00:29:35.780
about it?

00:29:35.780 --> 00:29:36.770
It's pretty compact.

00:29:36.770 --> 00:29:37.580
It's 80 bytes.
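
A sketch of how the 80 bytes and the double hash fit together. The field widths and little-endian byte order below are my understanding of the Bitcoin header layout (version, previous block hash, Merkle root, time, difficulty bits, nonce) rather than something spelled out in this part of the lecture. The sample values are the widely published genesis block fields.

```python
import hashlib
import struct


def sha256d(data):
    """SHA-256 of the SHA-256 of `data`."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Pack the six fields into 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes."""
    return (
        struct.pack("<i", version)
        + bytes.fromhex(prev_hash_hex)[::-1]     # hashes are displayed byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )


def block_hash_hex(header):
    """Block ID in the usual display order (leading zeros first)."""
    return sha256d(header)[::-1].hex()


GENESIS_HEADER = serialize_header(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)

if __name__ == "__main__":
    print(len(GENESIS_HEADER))             # 80
    print(block_hash_hex(GENESIS_HEADER))
```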

00:29:41.244 --> 00:29:43.800
AUDIENCE: [INAUDIBLE]
including the Merkle root.

00:29:43.800 --> 00:29:45.838
So if you end up
mining something,

00:29:45.838 --> 00:29:47.380
you don't have to
put anything there.

00:29:47.380 --> 00:29:50.497
So the only incentive
is the transaction fees?

00:29:50.497 --> 00:29:52.330
TADGE DRYJA: And the
block reward, which I'll

00:29:52.330 --> 00:29:53.538
get to in a second, but yeah.

00:29:57.220 --> 00:29:58.390
That's a good question.

00:29:58.390 --> 00:30:00.710
Can you put nothing
in as a Merkle?

00:30:00.710 --> 00:30:01.430
I don't think so.

00:30:01.430 --> 00:30:02.770
I'm pretty sure you
need one transaction.

00:30:02.770 --> 00:30:03.380
AUDIENCE: Oh, that's right.

00:30:03.380 --> 00:30:05.220
I mean, you need
the base for the--

00:30:05.220 --> 00:30:08.120
because even if just did it, you
need a [INAUDIBLE] transaction.

00:30:08.120 --> 00:30:08.860
TADGE DRYJA: You can do that.

00:30:08.860 --> 00:30:09.860
And there's many blocks.

00:30:09.860 --> 00:30:12.730
So in the first
year or so in 2009,

00:30:12.730 --> 00:30:15.880
almost all the blocks are empty
and only have one transaction,

00:30:15.880 --> 00:30:17.080
because no one was using it.

00:30:17.080 --> 00:30:18.490
But people were mining.

00:30:18.490 --> 00:30:20.960
So it was very similar
to the problem set, where

00:30:20.960 --> 00:30:22.960
everyone's just mining,
and they're not actually

00:30:22.960 --> 00:30:23.668
using the system.

00:30:23.668 --> 00:30:25.570
AUDIENCE: But then right now--

00:30:25.570 --> 00:30:26.340
TADGE DRYJA: Right
now there's tons.

00:30:26.340 --> 00:30:27.700
AUDIENCE: People aren't doing
that just because there's

00:30:27.700 --> 00:30:28.480
transaction fees.

00:30:28.480 --> 00:30:29.140
TADGE DRYJA: Right now--

00:30:29.140 --> 00:30:29.950
AUDIENCE: It's their
only incentive?

00:30:29.950 --> 00:30:31.090
People just decided--

00:30:31.090 --> 00:30:33.100
TADGE DRYJA: No, you
still get more bitcoins.

00:30:33.100 --> 00:30:34.330
So you still get a reward.

00:30:34.330 --> 00:30:40.040
The reason you'll see empty
blocks now is a little tricky.

00:30:40.040 --> 00:30:42.440
We're not sure,
because who knows?

00:30:42.440 --> 00:30:47.410
But it's probably because of
blind mining, where you receive

00:30:47.410 --> 00:30:50.545
a block, and you
haven't actually

00:30:50.545 --> 00:30:52.420
looked through the
contents of the block yet.

00:30:52.420 --> 00:30:55.030
But you want to
mine the next block.

00:30:55.030 --> 00:30:57.820
But you're not sure what
transactions to put in.

00:30:57.820 --> 00:30:59.980
You see this 80-byte
header, and you're like, oh,

00:30:59.980 --> 00:31:01.440
someone found a block.

00:31:01.440 --> 00:31:03.010
But you only have
the header, and you

00:31:03.010 --> 00:31:04.135
want to build on top of it.

00:31:06.640 --> 00:31:08.890
You have a bunch of transactions
you'd like to put in,

00:31:08.890 --> 00:31:12.173
but they may have already been
put into the previous one.

00:31:12.173 --> 00:31:13.840
And so you're like,
well, I have no idea

00:31:13.840 --> 00:31:14.965
what's in the previous one.

00:31:14.965 --> 00:31:17.260
I'm just going to mine a
block with nothing in it,

00:31:17.260 --> 00:31:19.775
that way I'm sure that
I'm not going to conflict

00:31:19.775 --> 00:31:20.650
with my previous one.

00:31:23.590 --> 00:31:26.740
You'll often see a block with
only one transaction very soon

00:31:26.740 --> 00:31:28.510
after its predecessor block.

00:31:28.510 --> 00:31:32.053
AUDIENCE: So like every once
in a while, it would check.

00:31:32.053 --> 00:31:34.470
So you're saying if it happened
and went to a block really

00:31:34.470 --> 00:31:37.197
quickly, there's just more
likely to be more transactions.

00:31:37.197 --> 00:31:39.280
TADGE DRYJA: Right, right,
because a miner might--

00:31:39.280 --> 00:31:41.570
it's actually an optimal
strategy for a miner

00:31:41.570 --> 00:31:43.610
to say, look, first thing
I'm going to do, download

00:31:43.610 --> 00:31:46.580
the 80-byte header.

00:31:46.580 --> 00:31:48.650
Figure out if it's got
a valid proof of work.

00:31:48.650 --> 00:31:50.990
If it does, I'm
going to just assume

00:31:50.990 --> 00:31:53.420
it's valid for the
next few seconds,

00:31:53.420 --> 00:31:56.000
because it probably is.

00:31:56.000 --> 00:31:59.430
And then I'm going to try to
mine a block on top of that.

00:31:59.430 --> 00:32:00.680
Reference this previous block.

00:32:00.680 --> 00:32:01.730
Mine a block on top.

00:32:01.730 --> 00:32:04.580
The thing is, I have no idea
what's actually in the block.

00:32:04.580 --> 00:32:07.040
I have no idea what contributes
to this Merkle root,

00:32:07.040 --> 00:32:09.860
because I haven't even
downloaded that data yet.

00:32:09.860 --> 00:32:12.890
But I can build on top of
it, just from the header.

00:32:12.890 --> 00:32:14.360
But I can't include
transactions,

00:32:14.360 --> 00:32:16.610
because I have no idea what
transactions are in here,

00:32:16.610 --> 00:32:18.110
so they might conflict.

00:32:18.110 --> 00:32:21.920
So I'll just mine sort of
blind for a second or two.

00:32:21.920 --> 00:32:26.960
And then download all the
transactions, validate it.

00:32:26.960 --> 00:32:28.850
And now I can include
my transactions

00:32:28.850 --> 00:32:30.970
that haven't been included.
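
The "check the header's proof of work before downloading anything else" step might look like the sketch below. It assumes the difficulty field sits at bytes 72 to 76 of the 80-byte header in a compact exponent-and-mantissa encoding, which is my understanding of the format and not something stated in the lecture. Nothing here validates the transactions, which is exactly the risk being described.

```python
import hashlib


def sha256d(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits):
    """Expand the 4-byte compact difficulty field into a 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_has_valid_pow(header):
    """True if the header's hash is at or below the target it claims."""
    if len(header) != 80:
        raise ValueError("a header is exactly 80 bytes")
    bits = int.from_bytes(header[72:76], "little")
    hash_value = int.from_bytes(sha256d(header), "little")
    return hash_value <= bits_to_target(bits)


if __name__ == "__main__":
    # Toy header with a very easy target, so a nonce is found almost at once.
    easy_bits = (0x207FFFFF).to_bytes(4, "little")
    for nonce in range(100):
        candidate = bytes(72) + easy_bits + nonce.to_bytes(4, "little")
        if header_has_valid_pow(candidate):
            print("valid header found at nonce", nonce)
            break
```

A miner following the blind strategy would start on an empty block as soon as this returns true, then switch to a full block once the previous block's transactions are downloaded and validated.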

00:32:30.970 --> 00:32:33.230
And so that happens.

00:32:33.230 --> 00:32:35.330
It can be an OK strategy.

00:32:35.330 --> 00:32:38.870
Sometimes it can lead to
you mining an invalid block.

00:32:38.870 --> 00:32:42.180
If someone produces an invalid
block, and you just see, oh,

00:32:42.180 --> 00:32:45.380
well, it's got proof of work,
you grab that header, start

00:32:45.380 --> 00:32:47.960
mining on top of it.

00:32:47.960 --> 00:32:49.480
No matter what you
mine, it's going

00:32:49.480 --> 00:32:52.460
to be invalid, because it's
pointing to an invalid block.

00:32:52.460 --> 00:32:56.130
That happened in 2015 or 2016.

00:32:56.130 --> 00:32:58.187
The summer of 2015 it happened.

00:32:58.187 --> 00:32:59.520
And it was like quite extensive.

00:32:59.520 --> 00:33:00.930
It was like seven
or eight blocks

00:33:00.930 --> 00:33:03.870
in a row that were all invalid,
because none of the miners

00:33:03.870 --> 00:33:06.180
were actually
verifying anything.

00:33:06.180 --> 00:33:08.670
They were just downloading
the headers from each other

00:33:08.670 --> 00:33:12.720
and being like, yeah, I
mean, you did the work.

00:33:12.720 --> 00:33:14.910
So they were just
assuming everyone else

00:33:14.910 --> 00:33:17.360
was verifying the Merkle roots.

00:33:17.360 --> 00:33:20.170
And yeah, so then it ended--

00:33:20.170 --> 00:33:25.110
so they lost 25 times 8,
however many bitcoins.

00:33:25.110 --> 00:33:28.890
They lost hundreds of bitcoins,
which at the time was still

00:33:28.890 --> 00:33:32.820
worth quite a bit-- and now
it's worth millions of dollars--

00:33:32.820 --> 00:33:35.580
just because they weren't
actually checking things.

00:33:35.580 --> 00:33:37.840
AUDIENCE: If I mine a
block and it's empty,

00:33:37.840 --> 00:33:41.070
do I decrease my chances
of being mined afterwards?

00:33:41.070 --> 00:33:44.160
TADGE DRYJA: No, actually,
I would say you'd, sort

00:33:44.160 --> 00:33:49.690
of game theoretically, you would
increase it, because you're not

00:33:49.690 --> 00:33:51.680
depleting the mempool.

00:33:51.680 --> 00:33:53.410
I need to talk about
the actual mining

00:33:53.410 --> 00:33:54.860
coinbase and mempool and stuff.

00:33:54.860 --> 00:33:56.530
But yeah, it's a
tricky question.

00:33:56.530 --> 00:33:57.940
And I'll try to get back to it.

00:33:57.940 --> 00:34:00.050
If I don't, bug me again.

00:34:00.050 --> 00:34:01.840
TX-- so in this
Merkle root, you've

00:34:01.840 --> 00:34:03.160
got all these transactions.

00:34:03.160 --> 00:34:05.230
They have a specified order.

00:34:05.230 --> 00:34:08.050
Transaction 0 is the
coinbase transaction.

00:34:08.050 --> 00:34:09.219
And it's special.

00:34:09.219 --> 00:34:11.965
It generates new coins,
and it takes fees

00:34:11.965 --> 00:34:13.840
from all the other
transactions in the block.

00:34:13.840 --> 00:34:16.580
I think Neha mentioned
this yesterday,

00:34:16.580 --> 00:34:20.139
where if you have a difference
between the input amounts

00:34:20.139 --> 00:34:24.230
and output amounts,
that's implicitly a fee.

00:34:24.230 --> 00:34:26.090
So if your input--

00:34:26.090 --> 00:34:29.380
so here, Neha's thing,
you're spending 20 coins.

00:34:29.380 --> 00:34:31.389
You've got 5, 10, 4.

00:34:31.389 --> 00:34:34.060
Well, there's only 19
coins in the output,

00:34:34.060 --> 00:34:38.469
so there's an implicit
fee of 1 coin.

00:34:38.469 --> 00:34:42.820
That one coin can go
to transactions 0.

00:34:42.820 --> 00:34:47.150
So transaction 0 has
essentially no input.

00:34:47.150 --> 00:34:50.360
Transaction 0's input
field is just empty,

00:34:50.360 --> 00:34:53.030
anything you want to put, any
bytes you want to put in there.

00:34:53.030 --> 00:34:55.489
Its output field
generates new coins.

00:34:55.489 --> 00:34:57.590
So that's currently
12 and 1/2 coins.

00:34:57.590 --> 00:35:01.130
So currently, if you mine
a block with only TX0,

00:35:01.130 --> 00:35:02.540
you get 12 and 1/2 coins.

00:35:02.540 --> 00:35:04.820
If you mine a block with
thousands of transactions,

00:35:04.820 --> 00:35:08.092
you get the 12 and 1/2
coins plus the difference

00:35:08.092 --> 00:35:10.550
between the input and outputs
of all the other transactions

00:35:10.550 --> 00:35:13.490
in the block, which can
be even more than 12

00:35:13.490 --> 00:35:17.660
and-- which can recently,
like in January or December,

00:35:17.660 --> 00:35:21.240
there were quite a few blocks
where they're getting 25, 26,

00:35:21.240 --> 00:35:24.830
27 coins, because the total
fees for the entire block

00:35:24.830 --> 00:35:29.510
were more than 12 and 1/2 coins,
which is hundreds of thousands

00:35:29.510 --> 00:35:30.230
of dollars now.
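
The fee and coinbase arithmetic, as a small sketch. Amounts are in whole coins to match the slide example. Real software uses integer base units, and the function names here are illustrative.

```python
BLOCK_SUBSIDY = 12.5   # new coins per block at the time of the lecture


def tx_fee(input_amounts, output_amounts):
    """Implicit fee: whatever the inputs carry that the outputs do not claim."""
    fee = sum(input_amounts) - sum(output_amounts)
    if fee < 0:
        raise ValueError("outputs exceed inputs")
    return fee


def coinbase_limit(transactions, subsidy=BLOCK_SUBSIDY):
    """Most that transaction 0 may pay out: subsidy plus all fees in the block."""
    return subsidy + sum(tx_fee(ins, outs) for ins, outs in transactions)


if __name__ == "__main__":
    example = ([20], [5, 10, 4])       # spend 20 coins, create 19
    print(tx_fee(*example))            # 1
    print(coinbase_limit([]))          # 12.5 for a block with only TX0
    print(coinbase_limit([example]))   # 13.5
```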

00:35:30.230 --> 00:35:32.540
So it's kind of cool.

00:35:32.540 --> 00:35:34.130
The fees have since decreased.

00:35:34.130 --> 00:35:35.780
Fees are highly variable.

00:35:35.780 --> 00:35:38.930
And we'll talk about fees
in a few more lectures.

00:35:38.930 --> 00:35:42.800
And it's kind of a mess,
but it's an evolving area

00:35:42.800 --> 00:35:46.400
in this whole network thing.

00:35:46.400 --> 00:35:48.410
So you've got your
coinbase transaction.

00:35:48.410 --> 00:35:49.230
That's important.

00:35:49.230 --> 00:35:52.550
That's why people are doing this
stuff, because they want money.

00:35:52.550 --> 00:35:55.310
All the other transactions
can be shuffled around.

00:35:55.310 --> 00:35:58.430
However, they can
only spend outputs

00:35:58.430 --> 00:36:00.350
from previous transactions.

00:36:00.350 --> 00:36:02.540
And previous means
they have an index

00:36:02.540 --> 00:36:04.610
within the block that's lower.

00:36:04.610 --> 00:36:07.040
So for example, you
have transaction

00:36:07.040 --> 00:36:11.720
B spends an output of
transaction A. Transaction A

00:36:11.720 --> 00:36:14.840
must come first in
the block ordering.

00:36:14.840 --> 00:36:17.540
This makes it so that you can
go through in a linear fashion

00:36:17.540 --> 00:36:20.180
and validate every
transaction in order.

00:36:20.180 --> 00:36:23.153
Otherwise, you'd go through,
see, OK, transaction 0, I

00:36:23.153 --> 00:36:24.320
don't have to validate that.

00:36:24.320 --> 00:36:27.228
That's coinbase--
transaction 1, transaction 2.

00:36:27.228 --> 00:36:28.520
And then you see transaction 3.

00:36:28.520 --> 00:36:30.440
It appears to be
spending something

00:36:30.440 --> 00:36:32.640
you've never heard of.

00:36:32.640 --> 00:36:34.140
So that would at first--

00:36:34.140 --> 00:36:35.778
that would appear to be invalid.

00:36:35.778 --> 00:36:38.070
And then maybe you go through
another few transactions,

00:36:38.070 --> 00:36:42.510
say, oh, this creates the
output that this thing I just

00:36:42.510 --> 00:36:44.270
saw before spends.

00:36:44.270 --> 00:36:45.630
It's sort of out of order.

00:36:45.630 --> 00:36:47.820
It makes things very
complicated to validate.

00:36:47.820 --> 00:36:51.720
And so this rule ensures that
if you go through and just check

00:36:51.720 --> 00:36:55.360
every transaction in order,
it'll all make sense.
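
The ordering rule allows a single pass over the block. The sketch below checks only that every spent output already exists when it is spent. Signatures, amounts, and scripts are left out, and the tuple layout is an assumption made up for this example.

```python
def check_block_order(transactions, confirmed_outputs):
    """Validate spend ordering in one linear pass.

    `transactions` is a list of (txid, spent_outpoints, num_outputs) tuples,
    with the coinbase first. An outpoint is a (txid, output_index) pair.
    `confirmed_outputs` holds unspent outpoints from earlier blocks.
    """
    available = set(confirmed_outputs)
    for index, (txid, spent_outpoints, num_outputs) in enumerate(transactions):
        if index == 0:
            continue   # coinbase: no real inputs; its outputs are skipped here
        for outpoint in spent_outpoints:
            if outpoint not in available:
                return False              # spends something not yet seen
            available.remove(outpoint)    # each output can be spent only once
        for n in range(num_outputs):
            available.add((txid, n))      # later transactions may spend these
    return True


if __name__ == "__main__":
    coinbase = ("cb", [], 1)
    tx_a = ("a", [("old", 0)], 1)
    tx_b = ("b", [("a", 0)], 1)           # B spends an output of A
    utxos = {("old", 0)}
    print(check_block_order([coinbase, tx_a, tx_b], utxos))  # True
    print(check_block_order([coinbase, tx_b, tx_a], utxos))  # False: B before A
```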

00:36:55.360 --> 00:36:55.860
Yes.

00:36:55.860 --> 00:36:58.547
AUDIENCE: Is there any benefit
moving earlier or later?

00:36:58.547 --> 00:36:59.630
TADGE DRYJA: In the block?

00:36:59.630 --> 00:37:01.160
AUDIENCE: Yeah.

00:37:01.160 --> 00:37:02.640
TADGE DRYJA: No,
I don't think so.

00:37:02.640 --> 00:37:06.430
I mean, I can't think of one.

00:37:06.430 --> 00:37:08.790
Yeah, it's just sort of random.

00:37:08.790 --> 00:37:11.010
A lot of times they'll
organize it by fee rate.

00:37:11.010 --> 00:37:13.350
Or by default, they'll
just organize it

00:37:13.350 --> 00:37:15.210
by when they saw them first.

00:37:15.210 --> 00:37:16.745
So it's pretty arbitrary.

00:37:16.745 --> 00:37:18.120
AUDIENCE: Does
that mean that you

00:37:18.120 --> 00:37:19.825
have to wait until
the transaction

00:37:19.825 --> 00:37:21.855
that made your output
has been mined,

00:37:21.855 --> 00:37:23.210
in order to spend that again?

00:37:23.210 --> 00:37:26.190
TADGE DRYJA: No,
although if they did,

00:37:26.190 --> 00:37:28.800
that would make the software
a lot simpler and easier

00:37:28.800 --> 00:37:29.920
to deal with.

00:37:29.920 --> 00:37:31.850
But that is not how it works.

00:37:36.920 --> 00:37:43.400
So for example-- I'll draw
it-- you can have a block where

00:37:43.400 --> 00:37:45.350
there's-- let's do it this way.

00:37:45.350 --> 00:37:46.550
So you have TX0.

00:37:46.550 --> 00:37:51.230
There's coinbase transaction,
transaction 1, transaction 2,

00:37:51.230 --> 00:37:53.060
transaction 3.

00:37:53.060 --> 00:37:56.570
And transaction 3 may be
spending something that

00:37:56.570 --> 00:37:59.270
was generated in transaction 1.

00:37:59.270 --> 00:38:00.590
That can happen.

00:38:00.590 --> 00:38:05.300
So if you make transaction 1,
broadcast it, it's unconfirmed.

00:38:05.300 --> 00:38:07.310
You then make transaction
3, broadcast it, and it

00:38:07.310 --> 00:38:09.150
spends transaction 1.

00:38:09.150 --> 00:38:10.850
The miner can put
those in the blocks.

00:38:10.850 --> 00:38:12.380
They must put it in order.

00:38:12.380 --> 00:38:16.310
So if this happens, you
can't switch them in order.

00:38:16.310 --> 00:38:18.020
But that's considered OK.

00:38:18.020 --> 00:38:21.680
It makes parallel-- it makes
multi-core validation more

00:38:21.680 --> 00:38:23.255
annoying.

00:38:23.255 --> 00:38:26.390
If you said, no, you can only
use outputs that have already

00:38:26.390 --> 00:38:29.840
been confirmed, then
the block validation

00:38:29.840 --> 00:38:32.240
becomes embarrassingly
parallel, because you can just

00:38:32.240 --> 00:38:34.955
validate every
transaction independently.

00:38:34.955 --> 00:38:36.080
That would be kind of nice.

00:38:38.756 --> 00:38:40.400
There's other
interesting reasons

00:38:40.400 --> 00:38:42.595
why this is also useful.

00:38:42.595 --> 00:38:43.970
I mean, if I were
designing it, I

00:38:43.970 --> 00:38:45.680
would say you have to
confirm, because it just

00:38:45.680 --> 00:38:46.555
makes things simpler.

00:38:46.555 --> 00:38:48.730
But I was not Satoshi.

00:38:48.730 --> 00:38:50.560
So yeah, the order
is fairly arbitrary.

00:38:50.560 --> 00:38:53.060
Any other questions about block
ordering, Merkle root stuff?

00:38:53.060 --> 00:38:55.102
And then we're going to
have a quick intermission

00:38:55.102 --> 00:38:56.234
right at the halfway point.

00:38:58.980 --> 00:39:04.400
Sounds good, so
256-second break.

00:39:04.400 --> 00:39:07.070
So now I'll talk about the
synchronization process.

00:39:07.070 --> 00:39:10.220
How does this actually
work in the software

00:39:10.220 --> 00:39:12.140
when you download Bitcoin?

00:39:12.140 --> 00:39:14.300
So first, you download Bitcoin.

00:39:14.300 --> 00:39:16.460
You go to bitcoin.org.

00:39:16.460 --> 00:39:19.080
Or your friend hands you
a USB drive and says,

00:39:19.080 --> 00:39:23.030
hey, I got some good stuff, man,
this new thing called Bitcoin.

00:39:23.030 --> 00:39:27.110
And so you've got the
Bitcoin EXE file or DMG file,

00:39:27.110 --> 00:39:28.702
or the binary, or the code.

00:39:28.702 --> 00:39:31.160
And you want to know what's
been going on for the last nine

00:39:31.160 --> 00:39:32.510
years?

00:39:32.510 --> 00:39:34.010
So first, you
download the binary,

00:39:34.010 --> 00:39:35.390
or you compile the code.

00:39:35.390 --> 00:39:39.757
And you verify all the GPG
signatures of this code,

00:39:39.757 --> 00:39:41.090
if you want to do this securely.
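
Part of that verification is usually a checksum comparison. The sketch below covers only that step, under the assumption that the release comes with a published SHA-256 digest whose own signature you have already checked with a GPG tool. The signature check itself is not shown, and the function names are illustrative.

```python
import hashlib
import hmac


def file_sha256(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the downloaded file against a digest from a signed checksum list."""
    return hmac.compare_digest(file_sha256(path), expected_hex.strip().lower())
```

A mismatch means the binary is not the one the signers published, which is the compromised-download case described next.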

00:39:41.090 --> 00:39:47.120
So I'm sure everyone has their
PGP keys on the MIT PGP server

00:39:47.120 --> 00:39:51.770
and goes to key signing parties
held on the weekends, right?

00:39:51.770 --> 00:39:54.047
Yeah, no?

00:39:54.047 --> 00:39:55.130
AUDIENCE: Keybase is cool.

00:39:55.520 --> 00:39:57.270
TADGE DRYJA: Keybase
is also useful, yeah.

00:39:57.270 --> 00:40:00.950
So I have my PGP key
hash on my business card.

00:40:00.950 --> 00:40:04.460
I don't think anyone's
actually ever used it.

00:40:04.460 --> 00:40:06.920
But the Bitcoin nerds
actually do do this.

00:40:06.920 --> 00:40:12.680
And they're very sort
of annoying about it,

00:40:12.680 --> 00:40:15.740
because a really
good attack vector

00:40:15.740 --> 00:40:19.430
is to get someone to download
compromised Bitcoin code.

00:40:19.430 --> 00:40:21.290
It's like the best
attack vector ever

00:40:21.290 --> 00:40:25.130
if you're trying to do
something sneaky, mainly just

00:40:25.130 --> 00:40:26.240
to steal all the money.

00:40:26.240 --> 00:40:28.790
If you get them to download
a Bitcoin binary that you

00:40:28.790 --> 00:40:32.840
control, that you put
some backdoor code in,

00:40:32.840 --> 00:40:34.170
the code can be like two lines.

00:40:34.170 --> 00:40:38.240
It's like, open a TCP
connection to my computer.

00:40:38.240 --> 00:40:39.800
Send me all the private keys.

00:40:39.800 --> 00:40:40.790
We're good.

00:40:40.790 --> 00:40:43.790
Or if you want to be
more sophisticated,

00:40:43.790 --> 00:40:46.610
every time they click Send
and type in their password,

00:40:46.610 --> 00:40:50.060
I just change all the addresses
and all the outputs to me,

00:40:50.060 --> 00:40:51.910
and like, every time
they try to send money.

00:40:51.910 --> 00:40:55.080
The UI sort of shows that they're
sending money but actually just

00:40:55.080 --> 00:40:55.580
send to me.

00:40:55.580 --> 00:40:57.372
And they won't find
out for a little while.

00:40:57.372 --> 00:40:59.600
There's a lot of
things where you

00:40:59.600 --> 00:41:02.930
want to be running the
right Bitcoin code.

00:41:02.930 --> 00:41:04.700
And that's a hard problem.

00:41:04.700 --> 00:41:07.050
Because we're sort of
operating in this trustless,

00:41:07.050 --> 00:41:09.590
decentralized
network, how do you

00:41:09.590 --> 00:41:11.968
get into this in the beginning?

00:41:11.968 --> 00:41:13.760
If it's your friend
saying, hey, here's

00:41:13.760 --> 00:41:14.760
the Bitcoin I'm running.

00:41:14.760 --> 00:41:15.620
I know this is good.

00:41:15.620 --> 00:41:16.407
Then it works.

00:41:16.407 --> 00:41:18.740
But just a website-- what if
someone hacks the website--

00:41:18.740 --> 00:41:19.670
things like that?

00:41:19.670 --> 00:41:21.020
It's like a huge rabbit hole.

00:41:21.020 --> 00:41:25.040
And you can try to worry
about it for years.

00:41:25.040 --> 00:41:26.540
But anyway, you
download the binary,

00:41:26.540 --> 00:41:28.580
assume you've somehow
gotten the binary,

00:41:28.580 --> 00:41:31.280
and you're pretty sure
it's the right software.

00:41:31.280 --> 00:41:32.990
So how do you connect
to this network?

00:41:32.990 --> 00:41:36.230
Well, there are these
hardcoded DNS seeds in order

00:41:36.230 --> 00:41:37.670
to find peers in the beginning.

00:41:37.670 --> 00:41:40.640
If you know how DNS works, it's
how you look up IP addresses

00:41:40.640 --> 00:41:41.900
based on a hostname.

00:41:41.900 --> 00:41:44.990
There are some servers that will
return multiple different IP

00:41:44.990 --> 00:41:49.270
addresses every
time you query them.

00:41:49.270 --> 00:41:54.890
And those are IP addresses
of currently running

00:41:54.890 --> 00:41:56.042
Bitcoin nodes.

00:41:56.042 --> 00:41:58.250
So the idea is, OK, someone's
running a Bitcoin node.

00:41:58.250 --> 00:42:00.020
They've got their DNS server.

00:42:00.020 --> 00:42:02.570
You query that DNS server,
and it will hand you

00:42:02.570 --> 00:42:05.720
out some IP addresses.

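NOTE
A minimal sketch of the DNS seed lookup described above, in Python. It
assumes the hostname below is one of the hardcoded seeds and that nodes
listen on port 8333 (the mainnet default). The names peers_from_seed and
resolver are hypothetical, chosen only for this illustration.
```python
import socket
# Assumption: one of the DNS seeds hardcoded in Bitcoin Core, and the
# default mainnet port. Neither value comes from the lecture itself.
SEED_HOST = "seed.bitcoin.sipa.be"
MAINNET_PORT = 8333
#
def peers_from_seed(host=SEED_HOST, port=MAINNET_PORT, resolver=socket.getaddrinfo):
    """Return the distinct IP addresses a DNS seed hands back for one query."""
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except OSError:
        # Seed unreachable or blocked; a real client would try the next seed.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```
Calling peers_from_seed() with no arguments performs a real DNS query. The
resolver parameter exists only so the lookup can be swapped out offline.
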
00:42:05.720 --> 00:42:08.540
This is also sort
of centralized,

00:42:08.540 --> 00:42:12.500
slash trusted, slash whatever,
in that if someone compromises

00:42:12.500 --> 00:42:15.373
these four or five
DNS servers, you

00:42:15.373 --> 00:42:17.540
might not be able to connect
to the Bitcoin network.

00:42:17.540 --> 00:42:20.180
So in practice, it's not
completely mathematically

00:42:20.180 --> 00:42:21.382
secure and trustless.

00:42:21.382 --> 00:42:22.840
There's all these
real-world issues

00:42:22.840 --> 00:42:25.280
that's like, how do I know
I've got the right software?

00:42:25.280 --> 00:42:28.080
How do I know I'm connecting
to the actual Bitcoin network?

00:42:28.080 --> 00:42:30.230
What if my ISP is
blocking me and sending me

00:42:30.230 --> 00:42:33.540
to some other network,
or things like that?

00:42:33.540 --> 00:42:36.350
So in practice, it sort
of works OK right now.

00:42:36.350 --> 00:42:37.910
You connect to the DNS seeds.

00:42:37.910 --> 00:42:40.310
And then you connect
to a Bitcoin node,

00:42:40.310 --> 00:42:41.420
and you ask for headers.

00:42:41.420 --> 00:42:45.140
You say, hey, I just showed up.

00:42:45.140 --> 00:42:46.490
I know about one header.

00:42:46.490 --> 00:42:49.700
There's a hardcoded header in
the code called the genesis

00:42:49.700 --> 00:42:52.520
block that Satoshi did.

00:42:52.520 --> 00:42:54.530
And you say, hey, I've
got this genesis block.

00:42:54.530 --> 00:42:58.130
Do you know anything that builds
above this genesis block, that

00:42:58.130 --> 00:42:59.480
comes after?

00:42:59.480 --> 00:43:02.588
And they say, yes, I
actually know 500,000 headers

00:43:02.588 --> 00:43:03.380
that come after it.

00:43:03.380 --> 00:43:05.420
And they'll start
sending it to you.

00:43:05.420 --> 00:43:07.500
They send it to you in
a couple of thousand

00:43:07.500 --> 00:43:09.710
of headers at a time.

00:43:09.710 --> 00:43:14.270
And then you start to download
all those and verify them.

00:43:14.270 --> 00:43:16.970
The header chain,
you get it first.

00:43:16.970 --> 00:43:18.500
And it's actually very quick.

00:43:18.500 --> 00:43:20.958
You can do it in under a minute
if you have a good internet

00:43:20.958 --> 00:43:21.590
connection.

00:43:21.590 --> 00:43:23.990
And you verify all the work
before you do anything else.

00:43:23.990 --> 00:43:27.200
So this is nice in that
the attacker, in order

00:43:27.200 --> 00:43:30.650
to sort of make you
do more work here,

00:43:30.650 --> 00:43:33.930
would have to do a
lot of proof of work.

00:43:33.930 --> 00:43:36.890
But for you, it's very
quick to verify everything.

00:43:36.890 --> 00:43:40.435
Even half a million
headers, 30 seconds

00:43:40.435 --> 00:43:42.560
if you've got a good internet
connection, something

00:43:42.560 --> 00:43:46.070
like that, because all you're
doing is one hash per header.

00:43:46.070 --> 00:43:48.710
You just download the
header, check the bits,

00:43:48.710 --> 00:43:50.360
check the time, make
sure the times are

00:43:50.360 --> 00:43:53.030
like progressing reasonably.

00:43:53.030 --> 00:43:55.270
If the times keep going
backwards for like,

00:43:55.270 --> 00:43:59.295
I think it's 10 blocks, then
you consider it invalid.

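NOTE
The "10 blocks" figure above is given from memory. As the rule is commonly
documented, a header's timestamp must be strictly greater than the median of
the previous 11 block timestamps. A small Python sketch of that check, with
hypothetical function names:
```python
def median_time_past(prev_timestamps):
    """Median of the last 11 timestamps (fewer if the chain is shorter)."""
    window = sorted(prev_timestamps[-11:])
    return window[len(window) // 2]
#
def timestamp_ok(new_time, prev_timestamps):
    """A new header must be strictly later than the median time past."""
    return new_time > median_time_past(prev_timestamps)
```
So a timestamp may step backwards relative to the previous block, as long as
it stays ahead of the median of the recent window.
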
00:43:59.295 --> 00:44:00.920
But your computer
can actually do this.

00:44:00.920 --> 00:44:02.660
It's 500,000 hash functions.

00:44:02.660 --> 00:44:05.120
And I'm sure if you've
seen for the problem set,

00:44:05.120 --> 00:44:08.310
you can do that in a few
seconds in many cases.

00:44:08.310 --> 00:44:11.300
So you can verify the work
done throughout the entirety

00:44:11.300 --> 00:44:15.520
of the Bitcoin's
existence pretty quickly.

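NOTE
A rough Python sketch of the "one hash per header" check described above.
It verifies only linkage and proof of work on 80-byte headers. It leaves
out difficulty retargeting and timestamp rules, and the helper names
(verify_header_chain, mine) are hypothetical. The mine helper is a toy for
building test data at a very easy target, not how real mining is done.
```python
import hashlib
import struct
#
HEADER_FORMAT = "<I32s32sIII"  # version, prev hash, Merkle root, time, bits, nonce
#
def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()
#
def bits_to_target(bits: int) -> int:
    """Expand the compact 4-byte 'bits' field into the full target."""
    exponent, mantissa = bits >> 24, bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))
#
def pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce) -> bytes:
    return struct.pack(HEADER_FORMAT, version, prev_hash, merkle_root, timestamp, bits, nonce)
#
def verify_header_chain(headers, genesis_hash: bytes) -> bool:
    """Check that each header points at the previous one and meets its own target."""
    prev = genesis_hash
    for raw in headers:
        if len(raw) != 80:
            return False
        _, prev_hash, _, _, bits, _ = struct.unpack(HEADER_FORMAT, raw)
        block_hash = double_sha256(raw)  # the single hash per header
        if prev_hash != prev:
            return False  # does not build on the header we already have
        if int.from_bytes(block_hash, "little") > bits_to_target(bits):
            return False  # not enough work
        prev = block_hash
    return True
#
def mine(version, prev_hash, merkle_root, timestamp, bits) -> bytes:
    """Toy helper: try nonces until the header hash is at or below the target."""
    target = bits_to_target(bits)
    nonce = 0
    while True:
        raw = pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce)
        if int.from_bytes(double_sha256(raw), "little") <= target:
            return raw
        nonce += 1
```
The asymmetry is the point: verify_header_chain does one hash per header,
while whoever produced the headers had to search for each nonce.
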
00:44:15.520 --> 00:44:17.920
So then you've got
500,000 headers.

00:44:17.920 --> 00:44:22.110
And now you need to actually
download the blocks.

00:44:22.110 --> 00:44:26.450
Any questions about
header synchronization?

00:44:26.450 --> 00:44:28.142
Seems pretty
straight-- oh, yeah.

00:44:28.142 --> 00:44:29.850
AUDIENCE: Can you
cache any of that work,

00:44:29.850 --> 00:44:33.800
since you're going to see some
of these every time you sync?

00:44:33.800 --> 00:44:35.820
TADGE DRYJA: Well, yeah,
you save it to disk.

00:44:35.820 --> 00:44:38.660
So you don't have to, like,
if you shut your computer off,

00:44:38.660 --> 00:44:40.430
turn it on the next
time, you've already

00:44:40.430 --> 00:44:41.680
got all those headers on disk.

00:44:41.680 --> 00:44:45.290
Basically, you save them to
disk once you've verified them.

00:44:45.290 --> 00:44:46.710
So you download a
couple thousand.

00:44:46.710 --> 00:44:48.377
It builds linearly,
so it's nice for you

00:44:48.377 --> 00:44:51.043
to like download them, validate,
and as you validate, write them

00:44:51.043 --> 00:44:51.560
to disk.

00:44:51.560 --> 00:44:53.810
And then when you start
back up, they're on disk.

00:44:53.810 --> 00:44:54.890
You trust your own disk.

00:44:54.890 --> 00:44:56.660
If someone goes in
and modifies things

00:44:56.660 --> 00:45:01.940
on disk between running of
Bitcoin, all bets are off.

00:45:01.940 --> 00:45:04.250
So you sort of
implicitly trust that.

00:45:04.250 --> 00:45:06.992
So yeah, that's pretty
quick, works well.

00:45:06.992 --> 00:45:08.450
Then you get to
the real hard part,

00:45:08.450 --> 00:45:10.640
where you now have to
validate all these signatures

00:45:10.640 --> 00:45:12.567
and download all
these transactions.

00:45:12.567 --> 00:45:13.400
Any other questions?

00:45:13.400 --> 00:45:15.500
Good?

00:45:15.500 --> 00:45:19.152
So then it's called IBD,
initial block download.

00:45:19.152 --> 00:45:20.360
So you get the headers first.

00:45:20.360 --> 00:45:21.290
That's quick.

00:45:21.290 --> 00:45:24.350
Now you start asking
your peers, hey, here's

00:45:24.350 --> 00:45:29.510
this header from
2009, block height 1.

00:45:29.510 --> 00:45:30.260
Here's the header.

00:45:30.260 --> 00:45:33.260
Can you give me the full block?

00:45:33.260 --> 00:45:34.250
I have the header.

00:45:34.250 --> 00:45:37.450
What are all the things that
go into the Merkle root?

00:45:37.450 --> 00:45:39.950
So you request
blocks from peers.

00:45:39.950 --> 00:45:43.010
You match the transaction lists,
the Merkle root and the header.

00:45:43.010 --> 00:45:44.900
And you process each
transaction in order.

00:45:44.900 --> 00:45:46.490
So download it.

00:45:46.490 --> 00:45:48.210
Say, OK, here's all
these transactions.

00:45:48.210 --> 00:45:50.180
Let me take the
hash of all of them.

00:45:50.180 --> 00:45:51.860
Compute the Merkle root.

00:45:51.860 --> 00:45:53.840
Make sure it matches
the Merkle root

00:45:53.840 --> 00:45:56.540
I see in the header
I've already gotten.

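NOTE
A short Python sketch of the Merkle root check described above: hash the
transactions pairwise with double SHA-256, duplicating the last hash when a
level has an odd count, then compare against the root in the header. The
function names are hypothetical, and txids are assumed to be raw 32-byte
hashes in block order.
```python
import hashlib
#
def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()
#
def merkle_root(txids):
    """Fold a list of 32-byte transaction hashes down to a single root."""
    if not txids:
        raise ValueError("a block has at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
#
def block_matches_header(txids, header_merkle_root: bytes) -> bool:
    """True if the downloaded transactions are the ones the header commits to."""
    return merkle_root(txids) == header_merkle_root
```
If a peer sends a block whose transactions do not hash to the root in the
header you already verified, the block is rejected without trusting the peer.
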
00:45:56.540 --> 00:45:59.340
And now process
each transaction.

00:45:59.340 --> 00:46:02.950
So what do we do to
process transactions?

00:46:02.950 --> 00:46:06.250
So you've got this UTXO DB.

00:46:06.250 --> 00:46:10.220
So this is unspent
transaction output.

00:46:10.220 --> 00:46:11.770
So all the cool--

00:46:11.770 --> 00:46:14.980
I'm sure in like 2030, there
will be a new slang term

00:46:14.980 --> 00:46:17.120
where we'll just
call money UTXOs,

00:46:17.120 --> 00:46:19.188
like, hey I've
got a lot of UTXO.

00:46:19.188 --> 00:46:20.480
I mean, I'm already doing that.

00:46:20.480 --> 00:46:22.420
And I'm pretty ahead
of the times, so.

00:46:25.020 --> 00:46:26.770
So you've got this
database, which

00:46:26.770 --> 00:46:29.410
is basically a key-value store.

00:46:29.410 --> 00:46:33.400
And it just has
transaction ID index--

00:46:33.400 --> 00:46:38.050
so this is sort of how
you reference inputs

00:46:38.050 --> 00:46:41.380
in Bitcoin, the transaction
ID index as the key.

00:46:41.380 --> 00:46:44.800
And then the value
is just the output,

00:46:44.800 --> 00:46:50.000
the scriptPubKey and 8-byte amount.

00:46:50.000 --> 00:46:51.077
So it's pretty compact.

00:46:51.077 --> 00:46:52.410
You've got all these key values.

00:46:52.410 --> 00:46:54.570
And it's using LevelDB.

00:46:54.570 --> 00:46:57.290
But you could use some other
key-value store database.

00:46:57.290 --> 00:47:00.470
And the idea is, OK, every
time you get a transaction,

00:47:00.470 --> 00:47:01.827
validate all the inputs.

00:47:01.827 --> 00:47:03.410
Make sure all the
signatures are good.

00:47:03.410 --> 00:47:05.540
Make sure it's spending
things that actually

00:47:05.540 --> 00:47:07.250
exist in your UTXO set.

00:47:10.130 --> 00:47:13.870
And delete those inputs
from your UTXO DB.

00:47:13.870 --> 00:47:15.800
You say, OK, this
transaction is spending

00:47:15.800 --> 00:47:17.523
these inputs, so delete.

00:47:17.523 --> 00:47:18.440
So [INAUDIBLE], sorry.

00:47:18.440 --> 00:47:20.900
First, make sure the
transaction's valid,

00:47:20.900 --> 00:47:22.850
given your current UTXO DB.

00:47:22.850 --> 00:47:25.820
So validate that all
these inputs exist.

00:47:25.820 --> 00:47:28.450
Validate all the
signatures are correct.

00:47:28.450 --> 00:47:31.300
Then you're saying, OK,
this transaction is good.

00:47:31.300 --> 00:47:35.530
Now I modify my database by
deleting all the inputs that

00:47:35.530 --> 00:47:40.450
are consumed and adding
all these new outputs

00:47:40.450 --> 00:47:43.280
for the transaction.

00:47:43.280 --> 00:47:46.130
So this modifies the
database in place.

00:47:46.130 --> 00:47:48.380
And you're sort of
constantly reading

00:47:48.380 --> 00:47:53.440
from it to validate
inputs, and then writing

00:47:53.440 --> 00:47:55.960
to it to delete inputs,
and then writing to it

00:47:55.960 --> 00:47:58.450
again to add outputs.

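NOTE
A simplified Python sketch of the read, delete, add cycle described above,
using a plain dict in place of the on-disk key-value store. Signature
checking is stubbed out behind a hypothetical signature_ok callback, coinbase
transactions are ignored, and the amount check is an extra validity rule not
spelled out in this passage. All type and function names are illustrative.
```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple
#
Outpoint = Tuple[bytes, int]  # (transaction ID, output index): the database key
#
@dataclass(frozen=True)
class TxOut:
    amount: int    # in satoshis
    script: bytes  # locking script of the output
#
@dataclass
class Tx:
    txid: bytes
    inputs: List[Outpoint]
    outputs: List[TxOut]
#
def apply_transaction(
    utxos: Dict[Outpoint, TxOut],
    tx: Tx,
    signature_ok: Callable[[Tx, int, TxOut], bool] = lambda tx, i, spent: True,
) -> None:
    """Validate tx against the UTXO set, then update the set in place."""
    # 1. Read: every input must exist, be unique, and carry a valid signature.
    if len(set(tx.inputs)) != len(tx.inputs):
        raise ValueError("input listed twice")
    spent = []
    for i, outpoint in enumerate(tx.inputs):
        if outpoint not in utxos:
            raise ValueError("input missing or already spent")
        if not signature_ok(tx, i, utxos[outpoint]):
            raise ValueError("bad signature")
        spent.append(utxos[outpoint])
    if sum(o.amount for o in tx.outputs) > sum(o.amount for o in spent):
        raise ValueError("outputs exceed inputs")
    # 2. Write: delete the consumed inputs.
    for outpoint in tx.inputs:
        del utxos[outpoint]
    # 3. Write again: add the newly created outputs.
    for index, out in enumerate(tx.outputs):
        utxos[(tx.txid, index)] = out
```
Nothing is modified unless every check passes, so a rejected transaction
leaves the set untouched.
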
00:47:58.450 --> 00:48:00.940
So it doesn't seem too bad.

00:48:00.940 --> 00:48:03.000
But there's a lot
of disk access.

00:48:03.000 --> 00:48:06.280
And the UTXO DB is a key-value
store with a lot of keys.

00:48:06.280 --> 00:48:07.690
The values are very small.

00:48:07.690 --> 00:48:10.900
So it's not like a
crazy database problem,

00:48:10.900 --> 00:48:14.500
if anyone's interested
in databases and stuff.

00:48:14.500 --> 00:48:16.090
But it can be slow.

00:48:16.090 --> 00:48:18.690
And we want to
really optimize it.

00:48:18.690 --> 00:48:22.320
So when you think the
initial block download,

00:48:22.320 --> 00:48:23.850
you're doing this
300 million times.

00:48:23.850 --> 00:48:28.120
So there's about 300 million
transactions historically.

00:48:28.120 --> 00:48:31.140
So you're validating signature,
deleting input, adding

00:48:31.140 --> 00:48:32.770
output, 300 million times.

00:48:32.770 --> 00:48:36.000
It ends up being about 170
gigabytes of downloads.

00:48:36.000 --> 00:48:37.500
And then the end
result, when you're

00:48:37.500 --> 00:48:41.160
done modifying this
database, is that you

00:48:41.160 --> 00:48:44.340
have 55 million transaction
outputs remaining.

00:48:44.340 --> 00:48:48.250
And it's about 3.2
gigabytes of disk use.

00:48:48.250 --> 00:48:53.100
So yeah, but you had to download
that 170 gigabytes to get

00:48:53.100 --> 00:48:56.850
to the 3.2-gigabyte
end state, because most

00:48:56.850 --> 00:48:59.070
of the transactions that
have been created and most

00:48:59.070 --> 00:49:01.890
of the outputs have
then later been spent.

00:49:01.890 --> 00:49:04.930
So there's a lot of churn.

00:49:04.930 --> 00:49:06.757
So yeah, of the 300 million--

00:49:06.757 --> 00:49:08.340
sorry, these are not
the same numbers.

00:49:08.340 --> 00:49:09.710
I was actually looking.

00:49:09.710 --> 00:49:11.837
How many transaction
outputs have been created

00:49:11.837 --> 00:49:12.920
throughout all of Bitcoin?

00:49:12.920 --> 00:49:14.212
And I couldn't find the number.

00:49:14.212 --> 00:49:17.120
And I didn't want to write
software to figure it out.

00:49:17.120 --> 00:49:22.460
But you can certainly figure
it out from the blockchain.

00:49:22.460 --> 00:49:24.920
But yeah, this is
transaction outputs.

00:49:24.920 --> 00:49:27.468
How many total
transactions have TXOs?

00:49:27.468 --> 00:49:28.010
I'm not sure.

00:49:28.010 --> 00:49:29.660
But yeah, so it's pretty big.

00:49:29.660 --> 00:49:30.650
But it's reasonable.

00:49:30.650 --> 00:49:32.400
Like, we can do this
on today's computers.

00:49:32.400 --> 00:49:34.700
If you've got a decent
laptop, this is possible.

00:49:34.700 --> 00:49:38.840
This total time taken
depends on a lot of factors.

00:49:38.840 --> 00:49:41.120
Has anyone actually done
initial block download

00:49:41.120 --> 00:49:42.740
and synced a Bitcoin node, and like,
node, and like,

00:49:42.740 --> 00:49:45.487
want to say about how
quickly they did it or?

00:49:45.487 --> 00:49:46.820
OK, James, how long did it take?

00:49:46.820 --> 00:49:50.540
AUDIENCE: For 0.15, it's
actually quite quick.

00:49:50.540 --> 00:49:54.117
On a spinning disk it will
take about six hours maybe.

00:49:54.117 --> 00:49:55.450
TADGE DRYJA: On a spinning disk?

00:49:55.450 --> 00:49:58.760
AUDIENCE: Yeah, yeah, with the
new one, it's really quick.

00:49:58.760 --> 00:50:00.590
TADGE DRYJA:
Because I run 0.15.1

00:50:00.590 --> 00:50:02.390
on a laptop with
a spinning disk,

00:50:02.390 --> 00:50:07.220
and it'll take like overnight
to just sync up a week or so.

00:50:07.220 --> 00:50:08.690
It's really slow,
but I don't know.

00:50:08.690 --> 00:50:09.870
AUDIENCE: Like, my
mum tried to start it,

00:50:09.870 --> 00:50:10.995
and it did it from scratch.

00:50:10.995 --> 00:50:12.397
It did it in like eight hours.

00:50:12.397 --> 00:50:14.980
TADGE DRYJA: Wow, cool, so eight
hours to do the whole thing--

00:50:14.980 --> 00:50:18.130
anyone else have tried it?

00:50:18.130 --> 00:50:18.630
Yeah.

00:50:18.630 --> 00:50:20.790
AUDIENCE: A while back
it took me a week.

00:50:20.790 --> 00:50:22.748
TADGE DRYJA: Yeah, a
while back it took a week.

00:50:22.748 --> 00:50:25.270
So the software has been
improved quite a bit.

00:50:25.270 --> 00:50:28.110
So if you downloaded it--

00:50:28.110 --> 00:50:30.000
like, I first
downloaded it in 2011.

00:50:30.000 --> 00:50:32.750
And it took overnight
to download everything.

00:50:32.750 --> 00:50:34.350
And the download
was vastly smaller.

00:50:34.350 --> 00:50:36.960
It was less than a gigabyte to
download the entire blockchain.

00:50:36.960 --> 00:50:39.930
So what's interesting
is that the time taken

00:50:39.930 --> 00:50:44.310
for initial block download
over the last seven years

00:50:44.310 --> 00:50:47.340
has been somewhat constant
in that the blockchain gets

00:50:47.340 --> 00:50:48.090
bigger and bigger.

00:50:48.090 --> 00:50:49.890
But there's all
these optimizations

00:50:49.890 --> 00:50:51.990
to the code and the databases.

00:50:51.990 --> 00:50:55.160
And so that sort of keeps pace.

00:50:55.160 --> 00:50:57.830
Although actually, I'd say
recently it's gotten faster,

00:50:57.830 --> 00:50:59.730
because like 0.15--

00:50:59.730 --> 00:51:03.830
wait, 0.11 or 0.12 had
a big speed-up as well.

00:51:03.830 --> 00:51:06.185
AUDIENCE: They completely
refactored the network code.

00:51:06.185 --> 00:51:08.060
It used to be coupled to the synchronization.
of the synchronization.

00:51:08.060 --> 00:51:09.185
And then they decoupled it.

00:51:09.185 --> 00:51:12.290
TADGE DRYJA: Yeah, I think
that was mostly Cory, right?

00:51:12.290 --> 00:51:15.200
So Cory Fields, who
also works for the DCI,

00:51:15.200 --> 00:51:18.650
helped to refactor the
code, make it a lot faster.

00:51:18.650 --> 00:51:23.090
There's definitely still
optimizations, a lot of cool--

00:51:23.090 --> 00:51:26.120
a lot of it's pretty low-level
tweaks kind of stuff.

00:51:26.120 --> 00:51:27.950
But some of them are
pretty big things.

00:51:27.950 --> 00:51:29.700
Most of the big things,
low-hanging fruit,

00:51:29.700 --> 00:51:31.370
has already been gotten.

00:51:31.370 --> 00:51:35.210
The worry is that long-term,
this just keeps going up.

00:51:35.210 --> 00:51:37.888
As the blockchain gets
bigger and longer,

00:51:37.888 --> 00:51:38.930
it's going to take hard--

00:51:38.930 --> 00:51:40.880
It's going to be
harder to validate.

00:51:40.880 --> 00:51:43.220
It can be parallelized
to some extent.

00:51:43.220 --> 00:51:46.760
But there's also network I/O
concerns, things like that.

00:51:46.760 --> 00:51:49.760
So it's tricky but doable.

00:51:49.760 --> 00:51:54.360
Any questions about
initial block download?

00:51:54.360 --> 00:51:56.130
Good?

00:51:56.130 --> 00:51:59.130
So here's a question.

00:51:59.130 --> 00:52:01.350
You've got this UTXO DB.

00:52:01.350 --> 00:52:02.690
What about this 170 gigabytes?

00:52:02.690 --> 00:52:03.690
Do you have to store it?

00:52:03.690 --> 00:52:06.602
Or can you delete it?

00:52:06.602 --> 00:52:07.810
This you can't delete, right?

00:52:11.160 --> 00:52:13.810
So you think this is OK?

00:52:13.810 --> 00:52:15.600
Yeah, you can maybe
delete some of this.

00:52:15.600 --> 00:52:17.058
Actually, there's
a lot of research

00:52:17.058 --> 00:52:22.170
into maybe we can delete
this, accumulators, cool stuff

00:52:22.170 --> 00:52:23.090
like that.

00:52:23.090 --> 00:52:25.650
It would be really cool to have
some kind of data structure

00:52:25.650 --> 00:52:27.960
where we can keep adding these--

00:52:27.960 --> 00:52:31.590
we can add, remove, and
prove, and then seek,

00:52:31.590 --> 00:52:34.168
and see if something's
in there, where it either

00:52:34.168 --> 00:52:36.710
is like constant size, or log n
size, or something like that.

00:52:36.710 --> 00:52:37.627
That'd be really cool.

00:52:37.627 --> 00:52:40.140
There are constructions
like that,

00:52:40.140 --> 00:52:42.943
but they don't work for what
we're trying to do right now.

00:52:42.943 --> 00:52:44.610
But there's a lot of
research into that.

00:52:44.610 --> 00:52:47.070
If anyone here finds
some cool data structure

00:52:47.070 --> 00:52:51.290
that you can use for the UTXO
DB that doesn't keep growing

00:52:51.290 --> 00:52:54.750
linearly in size with the number
of keys, everyone in Bitcoin

00:52:54.750 --> 00:52:58.740
will sing your praises forever.

00:52:58.740 --> 00:53:00.780
But it's an active
research area.

00:53:00.780 --> 00:53:06.210
So pruning-- by default-- oh,
that should be a K, not an M,

00:53:06.210 --> 00:53:06.790
sorry.

00:53:06.790 --> 00:53:09.760
There's only 500K
blocks, not 500M.

00:53:09.760 --> 00:53:13.920
Anyway, by default, your client
will download all these blocks

00:53:13.920 --> 00:53:15.720
and store them on the disk.

00:53:15.720 --> 00:53:18.600
And that's important
because what if someone

00:53:18.600 --> 00:53:22.020
else requests them from you?

00:53:22.020 --> 00:53:23.640
Everyone starts out as a noob.

00:53:23.640 --> 00:53:25.500
Someone else comes
and says, hey, guys,

00:53:25.500 --> 00:53:27.750
I just downloaded Bitcoin.

00:53:27.750 --> 00:53:30.160
What's going on for
the last nine years?

00:53:30.160 --> 00:53:31.860
And you might want
to give them blocks

00:53:31.860 --> 00:53:35.210
to let them into the system.

00:53:35.210 --> 00:53:37.500
So you can serve to
others who are doing IBD.

00:53:37.500 --> 00:53:39.980
However, if you want, and
your hard drive's small,

00:53:39.980 --> 00:53:41.610
or you have an SSD
or something, you

00:53:41.610 --> 00:53:45.540
can prune and delete the
blocks after you've done IBD,

00:53:45.540 --> 00:53:49.110
with no loss of security.

00:53:49.110 --> 00:53:52.110
Anyone think of
downsides doing so?

00:53:58.140 --> 00:53:59.790
Not really, right?

00:53:59.790 --> 00:54:03.070
The only real downside
is sort of this.

00:54:03.070 --> 00:54:04.680
Well, not everyone can prune.

00:54:04.680 --> 00:54:09.128
If everyone prunes, no new
entrants to the system.

00:54:09.128 --> 00:54:10.670
So it's a little
bit of a tricky sort

00:54:10.670 --> 00:54:13.880
of seed versus leech
kind of problem,

00:54:13.880 --> 00:54:17.322
where someone's got to be
there to serve up these blocks.

00:54:17.322 --> 00:54:18.530
You don't have to trust them.

00:54:18.530 --> 00:54:20.480
You're still validating
all the work,

00:54:20.480 --> 00:54:22.070
validating all the signatures.

00:54:22.070 --> 00:54:24.440
They can't do anything bad.

00:54:24.440 --> 00:54:28.497
But someone's got to be there
to provide the network capacity.

00:54:28.497 --> 00:54:29.330
And so it is tricky.

00:54:29.330 --> 00:54:31.010
Like, most of the
nodes on the network

00:54:31.010 --> 00:54:34.310
are behind people's cable
modem firewall kind of thing.

00:54:34.310 --> 00:54:36.860
So you can't actually
connect to them and download.

00:54:36.860 --> 00:54:39.320
And if you run a node
that does allow people

00:54:39.320 --> 00:54:41.635
to connect in and
serve them blocks,

00:54:41.635 --> 00:54:43.010
people will download
quite a bit.

00:54:43.010 --> 00:54:46.730
So I have one in the
office over there.

00:54:46.730 --> 00:54:51.600
It ends up sending out about
three terabytes a month,

00:54:51.600 --> 00:54:52.440
which is a lot.

00:54:52.440 --> 00:54:57.800
Like, it's dozens of gigabytes
a day, 20, 30, I don't know.

00:54:57.800 --> 00:54:59.570
So yeah, people are doing this.

00:54:59.570 --> 00:55:02.210
People are connecting in and
downloading all the blocks,

00:55:02.210 --> 00:55:04.070
either through IBD
or just keeping up

00:55:04.070 --> 00:55:05.510
with current transactions.

00:55:05.510 --> 00:55:07.460
So yeah, pruning is possible.

00:55:07.460 --> 00:55:12.750
But not everyone can do it, so
it's sort of an unsolved issue

00:55:12.750 --> 00:55:13.250
there.

00:55:13.250 --> 00:55:14.667
There's a lot of
research into how

00:55:14.667 --> 00:55:17.000
we can do partial
pruning, where, OK, I'm

00:55:17.000 --> 00:55:20.180
going to only store the last
month's worth of blocks, which

00:55:20.180 --> 00:55:22.460
is mostly what people do,
because a lot of people

00:55:22.460 --> 00:55:25.310
have intermittent connectivity,
where they'll turn off

00:55:25.310 --> 00:55:28.220
their node and then start it
back up again a few days later.

00:55:28.220 --> 00:55:31.930
And they just need to catch
up with the last few blocks.

00:55:31.930 --> 00:55:32.680
So pruning's cool.

00:55:32.680 --> 00:55:36.250
That's been in since
0.12 or something.

00:55:36.250 --> 00:55:38.770
So I'll go through--

00:55:38.770 --> 00:55:41.890
in practice, if you go
to your Bitcoin node,

00:55:41.890 --> 00:55:43.250
what does that actually store?

00:55:43.250 --> 00:55:45.080
And if you just go to
your Bitcoin folder,

00:55:45.080 --> 00:55:49.460
which in Unix-type OS's is
like home directory /.bitcoin--

00:55:52.030 --> 00:55:54.490
total random aside,
I don't like how

00:55:54.490 --> 00:55:58.090
they put a dot in front of all
the really important folders.

00:55:58.090 --> 00:56:00.850
It's like they hide all the
important things, like your GPG

00:56:00.850 --> 00:56:01.880
folder, it's got a dot.

00:56:01.880 --> 00:56:03.430
And your Bitcoin
folder's got a dot.

00:56:03.430 --> 00:56:05.155
But like, downloads doesn't.

00:56:05.155 --> 00:56:08.740
And like, who cares about that?

00:56:08.740 --> 00:56:13.000
Anyway, so if you just
ls in your folder,

00:56:13.000 --> 00:56:14.620
here's all the files.

00:56:14.620 --> 00:56:17.420
And we'll just go
through it real quick.

00:56:17.420 --> 00:56:19.090
Here's the files,
and I'll describe.

00:56:19.090 --> 00:56:21.850
So there's a banlist.dat.

00:56:21.850 --> 00:56:24.010
This is a list of IP
addresses that you have

00:56:24.010 --> 00:56:27.190
banned, because they're bad.

00:56:27.190 --> 00:56:28.550
They're doing something weird.

00:56:28.550 --> 00:56:29.920
So I'll get to it at the end.

00:56:29.920 --> 00:56:33.640
I sort of am thinking of making
a ban list for the problem set,

00:56:33.640 --> 00:56:36.910
because there are some nodes
that are doing non-good things.

00:56:36.910 --> 00:56:40.390
That was what caused
yesterday's outage.

00:56:40.390 --> 00:56:41.410
Someone was connecting.

00:56:41.410 --> 00:56:43.743
Although it was really my
fault, because the server code

00:56:43.743 --> 00:56:46.030
was not verifying
inputs correctly.

00:56:46.030 --> 00:56:48.340
But yeah, in Bitcoin,
you verify everything.

00:56:48.340 --> 00:56:50.830
If people start sending you
nonsense data, or they say,

00:56:50.830 --> 00:56:52.420
hey, here's a block,
and it's wrong,

00:56:52.420 --> 00:56:55.930
or hey, here's a transaction,
and the signatures are wrong,

00:56:55.930 --> 00:56:59.600
you'll pretty quickly ban
them, because it's like, well,

00:56:59.600 --> 00:57:01.287
if they're making a mistake--

00:57:01.287 --> 00:57:01.870
it's a computer.

00:57:01.870 --> 00:57:04.790
There's no excuse
for making a mistake.

00:57:04.790 --> 00:57:07.032
So either their software is
just different than mine,

00:57:07.032 --> 00:57:09.490
or something's wrong with their
software or their hardware,

00:57:09.490 --> 00:57:10.032
I don't know.

00:57:10.032 --> 00:57:11.770
But they're wasting my time.

00:57:11.770 --> 00:57:14.200
They're sending
me nonsense-- ban.

00:57:14.200 --> 00:57:16.340
So you have your own ban list.

00:57:16.340 --> 00:57:18.120
Then the blue ones are folders.

00:57:18.120 --> 00:57:20.590
I'll talk about those later.

00:57:20.590 --> 00:57:23.350
But you have peers.dat,
which is good nodes.

00:57:23.350 --> 00:57:25.660
So it's quite a
bit, 4 megabytes.

00:57:25.660 --> 00:57:28.780
And you keep track of here's
all the different nodes

00:57:28.780 --> 00:57:32.350
I've connected to for
the duration of however

00:57:32.350 --> 00:57:34.330
long I've been using Bitcoin.

00:57:34.330 --> 00:57:36.520
I keep track of all
their IP addresses,

00:57:36.520 --> 00:57:40.520
how much uptime they've had,
what I've downloaded from them.

00:57:40.520 --> 00:57:43.270
And so I sort of sort them and
put the good ones at the top.

00:57:43.270 --> 00:57:46.330
And like, OK, here's all
the different Bitcoin nodes.

00:57:46.330 --> 00:57:48.310
So next time I start
up Bitcoin, I'm

00:57:48.310 --> 00:57:50.320
going to try to connect to them.

00:57:50.320 --> 00:57:52.060
So this makes the
network very robust,

00:57:52.060 --> 00:57:54.520
because everyone
remembers everyone else.

00:57:54.520 --> 00:57:55.720
And then when they need to--

00:57:55.720 --> 00:57:58.330
if there's a network
disruption, maybe half the nodes

00:57:58.330 --> 00:58:00.460
go off the network,
you can still

00:58:00.460 --> 00:58:02.440
try to connect to all the rest.

00:58:02.440 --> 00:58:06.910
And also peers will share their
peers files, not directly,

00:58:06.910 --> 00:58:09.400
but they'll sort of take
random samplings of this file

00:58:09.400 --> 00:58:11.650
and share it with each other,
so that everyone sort of

00:58:11.650 --> 00:58:14.510
knows about everyone else.

00:58:14.510 --> 00:58:17.890
Then there's a wallet.dat,
which is very important,

00:58:17.890 --> 00:58:21.200
because that's got all
your precious UTXOs.

00:58:21.200 --> 00:58:25.490
And we'll talk about
wallets Monday, I think.

00:58:25.490 --> 00:58:27.770
There's a bitcoin.conf,
little config file.

00:58:27.770 --> 00:58:30.635
You can set some settings
and things like that;

00:58:30.635 --> 00:58:34.340
a debug file, which shows
all these weird messages;

00:58:34.340 --> 00:58:35.750
and a mempool.dat.

00:58:35.750 --> 00:58:40.130
So the mempool is transactions
you've seen that you've not

00:58:40.130 --> 00:58:41.510
seen in a block yet.

00:58:41.510 --> 00:58:44.180
So people are
broadcasting transactions.

00:58:44.180 --> 00:58:45.860
And you store them.

00:58:45.860 --> 00:58:48.817
It used to be just in memory,
hence the word "mempool."

00:58:48.817 --> 00:58:50.900
Now it's more like disk
pool, because you actually

00:58:50.900 --> 00:58:54.170
store them on disk, because it
saves a little speed when you

00:58:54.170 --> 00:58:55.850
shut down and start up again.

00:58:55.850 --> 00:58:59.120
So any questions about just
what all these files are doing?

00:59:02.467 --> 00:59:03.800
Makes sense, so now the folders.

00:59:06.570 --> 00:59:08.850
Chainstate, blocks,
and database-- so any

00:59:08.850 --> 00:59:11.220
guesses as to how
big these things are

00:59:11.220 --> 00:59:14.000
based on previous slides or?

00:59:14.000 --> 00:59:15.970
So how big is
chainstate, for example?

00:59:18.830 --> 00:59:19.330
Yes?

00:59:19.330 --> 00:59:21.920
AUDIENCE: 3 gigs.

00:59:21.920 --> 00:59:24.150
TADGE DRYJA: Yeah, 3 gigs.

00:59:24.150 --> 00:59:26.972
This is the UTXO set, 3-ish gigs.

00:59:26.972 --> 00:59:27.930
This is all the blocks.

00:59:27.930 --> 00:59:28.430
What?

00:59:28.430 --> 00:59:29.388
Oh, no.

00:59:32.547 --> 00:59:34.380
And then database,
actually, I have no idea.

00:59:34.380 --> 00:59:35.710
Does anyone know what that is?

00:59:35.710 --> 00:59:38.850
There's a database folder, and
it's got one little log file.

00:59:38.850 --> 00:59:40.410
And it's like 80 kilobytes.

00:59:40.410 --> 00:59:42.170
I don't know what it is.

00:59:42.170 --> 00:59:43.550
Do you guys know?

00:59:43.550 --> 00:59:45.630
Yeah, I don't know.

00:59:45.630 --> 00:59:48.340
But there's a blocks folder,
and that's got all the blocks.

00:59:48.340 --> 00:59:52.440
And that's your
huge amount of data.

00:59:52.440 --> 00:59:55.270
And this is the UTXO
set, not too bad.

00:59:55.270 --> 00:59:57.290
So yeah, you can look in it.

00:59:57.290 --> 01:00:00.827
It's reasonable but
yeah, it's kind of big.

01:00:00.827 --> 01:00:02.410
So any questions
about the data stuff?

01:00:02.410 --> 01:00:04.660
I'm going to go into
blockchain as a database,

01:00:04.660 --> 01:00:06.790
real quick at the end.

01:00:06.790 --> 01:00:10.480
So it's 186 gigabytes,
or alternatively, you

01:00:10.480 --> 01:00:12.310
can think of it as
just 3 gigabytes.

01:00:12.310 --> 01:00:14.017
But it's a really
crummy database.

01:00:14.017 --> 01:00:15.475
So I've heard a
lot that blockchain

01:00:15.475 --> 01:00:16.750
is going to change the world.

01:00:16.750 --> 01:00:19.090
And it's like a database
that's shared among everyone.

01:00:19.090 --> 01:00:21.065
And you can query things.

01:00:21.065 --> 01:00:22.190
It's a really bad database.

01:00:22.190 --> 01:00:25.150
So for example,
I'm going to have

01:00:25.150 --> 01:00:30.820
some fun interactive
questions, where some of these

01:00:30.820 --> 01:00:31.690
are answerable.

01:00:31.690 --> 01:00:33.940
Some of these are not.

01:00:33.940 --> 01:00:37.507
And I'm posing the question
to my Bitcoin node.

01:00:37.507 --> 01:00:39.340
So I posed this question
to my Bitcoin node.

01:00:39.340 --> 01:00:42.970
Hey, remember transaction
9e95c3 dot, dot, dot,

01:00:42.970 --> 01:00:44.860
from back in 2014?

01:00:44.860 --> 01:00:48.170
And how do you think the
Bitcoin node will answer?

01:00:48.170 --> 01:00:51.640
Will it answer, or
will it not be able to?

01:00:51.640 --> 01:00:52.280
Any ideas?

01:00:52.280 --> 01:00:52.780
Yeah.

01:00:52.780 --> 01:00:55.195
AUDIENCE: Wait, where does
one be easiest [INAUDIBLE]?

01:00:58.100 --> 01:01:02.120
TADGE DRYJA: 183 plus 3,
so the total data usage

01:01:02.120 --> 01:01:05.296
on this computer is 186 gigs.

01:01:05.296 --> 01:01:09.072
The rest are kind of small.

01:01:09.072 --> 01:01:10.658
AUDIENCE: What do you mean?

01:01:10.658 --> 01:01:12.950
TADGE DRYJA: So I mean like,
when you're using Bitcoin,

01:01:12.950 --> 01:01:15.680
you've got 186 gigs
on your hard drive

01:01:15.680 --> 01:01:19.880
or your SSD devoted to Bitcoin.

01:01:19.880 --> 01:01:23.900
So you've got this 186-gigabyte
database, essentially.

01:01:23.900 --> 01:01:26.330
But it's a really
crummy database.

01:01:26.330 --> 01:01:29.670
And it can't do a lot of the
things you might expect it to.

01:01:29.670 --> 01:01:31.640
So for example, this--

01:01:31.640 --> 01:01:33.620
arbitrary transaction
from the past-- you say,

01:01:33.620 --> 01:01:36.920
hey, there was this transaction
a couple of years ago.

01:01:36.920 --> 01:01:38.780
Give me the
information about it.

01:01:38.780 --> 01:01:41.175
And what do you think the
response from the full node is?

01:01:41.175 --> 01:01:42.050
AUDIENCE: It's valid.

01:01:42.050 --> 01:01:42.657
TADGE DRYJA: What, sorry?

01:01:42.657 --> 01:01:43.850
AUDIENCE: It's
valid or not valid.

01:01:43.850 --> 01:01:44.930
TADGE DRYJA: It's
valid or not valid.

01:01:44.930 --> 01:01:45.597
Any other ideas?

01:01:45.597 --> 01:01:47.150
AUDIENCE: What's the header?

01:01:47.150 --> 01:01:47.735
TADGE DRYJA: What, sorry?

01:01:47.735 --> 01:01:48.920
AUDIENCE: What's the
header of [INAUDIBLE]?

01:01:48.920 --> 01:01:51.010
TADGE DRYJA: It asks
instead for a header.

01:01:51.010 --> 01:01:51.760
Any other ideas?

01:01:51.760 --> 01:01:55.900
Yeah, so sort of that--
it'll say, remember this TX?

01:01:55.900 --> 01:01:58.090
No, it's somewhere
in the blocks maybe,

01:01:58.090 --> 01:01:59.500
but I have no idea where.

01:01:59.500 --> 01:02:01.530
It's not in the chainstate.

01:02:01.530 --> 01:02:04.020
So it just stores the blocks.

01:02:04.020 --> 01:02:06.460
Like, here's this block.

01:02:06.460 --> 01:02:08.170
Here's that block, in line.

01:02:08.170 --> 01:02:10.830
And if you say, hey,
there's this transaction.

01:02:10.830 --> 01:02:12.400
OK, go look for it.

01:02:12.400 --> 01:02:17.310
Oh, 2014, well, that might
be somewhere in the middle.

01:02:17.310 --> 01:02:20.662
But yeah, if you don't know
what block it's in, forget it.

01:02:20.662 --> 01:02:22.120
So it does have an
index of blocks.

01:02:22.120 --> 01:02:24.450
It'll tell you a block,
but transaction, no luck.

01:02:24.450 --> 01:02:24.950
Yeah.

01:02:24.950 --> 01:02:26.844
AUDIENCE: So it pretty much
tells you if it exists,

01:02:26.844 --> 01:02:27.480
and that's it.

01:02:27.480 --> 01:02:29.470
TADGE DRYJA: It won't even
tell you if this exists.

01:02:29.470 --> 01:02:30.095
It has no idea.

01:02:30.095 --> 01:02:32.730
AUDIENCE: What's the
[INAUDIBLE] in the block though?

01:02:32.730 --> 01:02:35.107
TADGE DRYJA: I might
have made that up.

01:02:35.107 --> 01:02:37.190
So if you're saying, hey,
here's this transaction.

01:02:37.190 --> 01:02:37.770
Do you remember it?

01:02:37.770 --> 01:02:38.470
Does it exist?

01:02:38.470 --> 01:02:39.610
I don't know.

01:02:39.610 --> 01:02:40.110
Yeah.

01:02:40.110 --> 01:02:43.048
AUDIENCE: If you ask-- if you
query about a certain block,

01:02:43.048 --> 01:02:43.840
will it be able to?

01:02:43.840 --> 01:02:45.397
TADGE DRYJA: Yes, and I'll--

01:02:45.397 --> 01:02:46.230
yeah, good question.

01:02:46.230 --> 01:02:47.310
But I'll get to that.

01:02:47.310 --> 01:02:48.760
I think it's in
the later slides.

01:02:48.760 --> 01:02:52.890
But yes, if you query
based on a block hash,

01:02:52.890 --> 01:02:54.880
then it does have
that in the database.

01:02:54.880 --> 01:02:56.380
And it'll be able
to get it for you.

01:02:56.380 --> 01:02:57.050
Yeah, James.

01:02:57.050 --> 01:02:59.140
AUDIENCE: I know what the
database directory does.

01:02:59.140 --> 01:03:00.740
TADGE DRYJA: You know what
the database directory does.

01:03:00.740 --> 01:03:02.240
AUDIENCE: Yeah,
it's the journaling

01:03:02.240 --> 01:03:03.615
for the other databases.

01:03:03.615 --> 01:03:05.490
TADGE DRYJA: Journaling
for other databases--

01:03:05.490 --> 01:03:07.140
OK, I didn't-- yeah, cool.

01:03:07.140 --> 01:03:10.790
It's very small, so I
guess it helps things work.

01:03:10.790 --> 01:03:13.820
So this one-- do you
know this transaction?

01:03:13.820 --> 01:03:14.630
No.

01:03:14.630 --> 01:03:16.580
How about this?

01:03:16.580 --> 01:03:19.640
Well, I've got this output.

01:03:19.640 --> 01:03:24.230
It's still there in the UTX--
like, someone spent here.

01:03:24.230 --> 01:03:26.570
It's like, this is a
transaction in the first one.

01:03:26.570 --> 01:03:27.920
It's an op_return output.

01:03:27.920 --> 01:03:29.360
So it's got some extra data.

01:03:29.360 --> 01:03:31.220
But op_return
means it's invalid.

01:03:31.220 --> 01:03:32.150
You can't spend it.

01:03:32.150 --> 01:03:34.202
Can you tell me
what the data is?

01:03:34.202 --> 01:03:36.640
Do you think it'll be able to?

01:03:36.640 --> 01:03:40.300
If you query, hey,
here's this output,

01:03:40.300 --> 01:03:43.340
what do you think
the response will be?

01:03:43.340 --> 01:03:45.371
Yea, nay?

01:03:45.371 --> 01:03:46.730
Nay, OK, I'm seeing nays.

01:03:46.730 --> 01:03:47.540
Yeah.

01:03:47.540 --> 01:03:48.085
Nope.

01:03:48.085 --> 01:03:50.630
If it's an op_return output,
even though it's unspent,

01:03:50.630 --> 01:03:52.280
well, it's unspendable.

01:03:52.280 --> 01:03:55.220
So you don't put it
in the UTXO database,

01:03:55.220 --> 01:03:58.715
because you just see, oh, this
output, op_return is in there.

01:03:58.715 --> 01:03:59.840
Don't bother putting it in.

01:03:59.840 --> 01:04:01.920
No one will ever be
able to spend it.

01:04:01.920 --> 01:04:03.680
So there's no
reason to put it in.

01:04:03.680 --> 01:04:05.780
So op_returns are
used to sort of commit

01:04:05.780 --> 01:04:07.970
to data and all these
different protocols.

01:04:07.970 --> 01:04:13.787
But the actual normal
code won't store them.
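
NOTE
Editor's sketch of the rule just described: outputs whose script starts
with OP_RETURN are provably unspendable, so they never enter the UTXO set.
This is a minimal illustration, not the node's real code. The dict layout
and the names is_unspendable and add_outputs are hypothetical; only the
opcode value 0x6a is taken from Bitcoin script.
```python
OP_RETURN = 0x6A  # opcode that marks an output script as unspendable
def is_unspendable(script_pubkey: bytes) -> bool:
    """True if the output script starts with OP_RETURN."""
    return len(script_pubkey) > 0 and script_pubkey[0] == OP_RETURN
def add_outputs(utxo_set: dict, txid: str, outputs: list) -> None:
    """Add a transaction's outputs to the UTXO set, skipping OP_RETURN ones.
    `outputs` is a list of (amount_in_satoshis, script_pubkey) tuples."""
    for index, (amount, script) in enumerate(outputs):
        if is_unspendable(script):
            continue  # nobody can ever spend it, so there is nothing to track
        utxo_set[(txid, index)] = (amount, script)
if __name__ == "__main__":
    utxo_set = {}
    data_output = (0, bytes([OP_RETURN]) + b"\x04data")  # commits to 4 bytes of data
    normal_output = (5000, b"\x76\xa9")                  # placeholder spendable script
    add_outputs(utxo_set, "ab" * 32, [normal_output, data_output])
    print(sorted(index for (_txid, index) in utxo_set))
```
The committed data still lives in the block files; it is only the UTXO
set that has no record of it, which is why the query in the lecture fails.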

01:04:13.787 --> 01:04:14.370
Anything else?

01:04:14.370 --> 01:04:17.370
Next one, this one--

01:04:17.370 --> 01:04:19.808
hey, I have a public key.

01:04:19.808 --> 01:04:21.350
And here's the hash
of my public key.

01:04:21.350 --> 01:04:22.642
This is essentially an address.

01:04:22.642 --> 01:04:24.165
So we didn't talk
about addresses.

01:04:24.165 --> 01:04:26.500
But the Bitcoin addresses
that start with like a 1,

01:04:26.500 --> 01:04:29.700
and then have this
alphanumeric stuff--

01:04:29.700 --> 01:04:31.710
it's just a different
encoding, slightly shorter

01:04:31.710 --> 01:04:34.230
than hexadecimal,
for a pubkey hash.
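
NOTE
Editor's sketch of the encoding mentioned here: a legacy address is the
20-byte pubkey hash with a version byte in front and a 4-byte checksum
behind, written in Base58. The function names are illustrative, and the
example hash below is a placeholder rather than a hash of any real key.
```python
import hashlib
# Base58 alphabet used by Bitcoin (no 0, O, I, l to avoid look-alikes).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
def base58check_encode(version: bytes, payload: bytes) -> str:
    """Encode version + payload + 4-byte checksum in Base58."""
    data = version + payload
    # Checksum: first 4 bytes of double SHA-256 of the versioned payload.
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    # Convert the bytes to a big integer, then to base-58 digits.
    num = int.from_bytes(raw, "big")
    digits = ""
    while num > 0:
        num, rem = divmod(num, 58)
        digits = B58_ALPHABET[rem] + digits
    # Each leading zero byte is written as a literal '1'.
    n_leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * n_leading_zeros + digits
def p2pkh_address(pubkey_hash: bytes) -> str:
    """Legacy address for a 20-byte pubkey hash (mainnet version byte 0x00)."""
    if len(pubkey_hash) != 20:
        raise ValueError("pubkey hash must be 20 bytes")
    return base58check_encode(b"\x00", pubkey_hash)
if __name__ == "__main__":
    example_hash = bytes(range(20))  # placeholder hash, not a real key
    addr = p2pkh_address(example_hash)
    print(addr, len(addr), "chars vs", len(example_hash.hex()), "hex chars")
```
The leading 1 comes from the 0x00 version byte, since every leading zero
byte is written as the character 1.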

01:04:34.230 --> 01:04:36.300
So you say, hey, I've got this.

01:04:36.300 --> 01:04:37.290
I have a private key.

01:04:37.290 --> 01:04:38.650
I just computed the public key.

01:04:38.650 --> 01:04:39.430
I hashed it.

01:04:39.430 --> 01:04:40.960
I got this.

01:04:40.960 --> 01:04:42.190
Do I have any money?

01:04:42.190 --> 01:04:43.250
I think I did.

01:04:43.250 --> 01:04:44.020
I don't remember.

01:04:44.020 --> 01:04:45.100
But I remember my private key.

01:04:45.100 --> 01:04:45.808
I backed that up.

01:04:45.808 --> 01:04:47.530
That was the important part.

01:04:47.530 --> 01:04:49.450
Everyone says keep
your private keys.

01:04:49.450 --> 01:04:51.190
So I have my private key.

01:04:51.190 --> 01:04:54.040
But all this data and
all this blockchain

01:04:54.040 --> 01:04:55.357
stuff, I lost my computer.

01:04:55.357 --> 01:04:57.190
But I have my private key,
so I've got my money.

01:04:57.190 --> 01:04:58.360
How many coins do I have?

01:04:58.360 --> 01:05:00.580
How many outputs?

01:05:00.580 --> 01:05:05.310
What do you think the
full node will tell you?

01:05:05.310 --> 01:05:08.130
Yeah, any ideas?

01:05:08.130 --> 01:05:09.768
It'll say, I don't know.

01:05:09.768 --> 01:05:12.060
Well, you're going to have
to search through everything

01:05:12.060 --> 01:05:14.040
in chainstate.

01:05:14.040 --> 01:05:17.040
And it doesn't index based on
the public key script, only

01:05:17.040 --> 01:05:19.020
the transaction ID index there.

01:05:19.020 --> 01:05:24.510
It's a key-value store, and the
key is this 36-byte txid:index.

01:05:24.510 --> 01:05:26.610
So this is a very real problem.

01:05:26.610 --> 01:05:28.260
Like, OK, I backed up my key.

01:05:28.260 --> 01:05:30.630
Or I took my private keys
to some other computer

01:05:30.630 --> 01:05:32.860
or something like that.

01:05:32.860 --> 01:05:33.788
And this is fairly--

01:05:33.788 --> 01:05:34.830
it's gotten a lot faster.

01:05:34.830 --> 01:05:37.980
It used to take hours,
where you had a hard drive,

01:05:37.980 --> 01:05:41.100
and you're like, OK, import
a key to this wallet.

01:05:41.100 --> 01:05:43.430
And it's like, well, when
did you do transactions?

01:05:43.430 --> 01:05:46.060
It has to look through the
entire blockchain, [INAUDIBLE]

01:05:46.060 --> 01:05:49.110
and linearly, to see if
any of these transactions

01:05:49.110 --> 01:05:51.850
have an output that matches
that, and then says,

01:05:51.850 --> 01:05:53.550
oh, yeah, you got
money back in 2013.

01:05:53.550 --> 01:05:55.850
Oh, then you spent it--

01:05:55.850 --> 01:05:58.460
and sort of replays
things, because it

01:05:58.460 --> 01:05:59.710
doesn't have an address index.
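
NOTE
Editor's sketch of the rescan described above: with no address index, the
only way to find a key's coins is to replay every block in order. The
block and transaction layout here is a toy model invented for the
example, and rescan is a hypothetical name, not a node API.
```python
def rescan(blocks: list, target_script: bytes) -> dict:
    """Replay every block in order, tracking outputs paying to target_script.
    Each block is a list of transactions; each transaction is a dict with
    'txid', 'inputs' (list of (txid, index)) and 'outputs' (list of (amount, script))."""
    mine = {}
    for block in blocks:
        for tx in block:
            for spent in tx["inputs"]:
                mine.pop(spent, None)  # if it was ours, it is spent now
            for index, (amount, script) in enumerate(tx["outputs"]):
                if script == target_script:
                    mine[(tx["txid"], index)] = amount
    return mine
if __name__ == "__main__":
    chain = [
        [{"txid": "a", "inputs": [], "outputs": [(50, b"me"), (10, b"other")]}],
        [{"txid": "b", "inputs": [("a", 0)], "outputs": [(20, b"me"), (30, b"x")]}],
    ]
    print(rescan(chain, b"me"))
```
The cost is linear in the size of the whole chain, which is why importing
a key used to take hours on a hard drive.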

01:06:02.530 --> 01:06:04.940
Next one-- this is an example.

01:06:04.940 --> 01:06:08.350
How many coins-- so you say,
hey, here's this output,

01:06:08.350 --> 01:06:11.860
this transaction:1, how
many coins does it have?

01:06:11.860 --> 01:06:14.970
Will the full node be
able to tell you this?

01:06:14.970 --> 01:06:17.540
Yea, nay?

01:06:17.540 --> 01:06:19.840
I'm seeing a bunch of nays.

01:06:19.840 --> 01:06:20.910
No, it will.

01:06:20.910 --> 01:06:24.610
Yeah, this is the
one thing it can do.

01:06:24.610 --> 01:06:28.413
So if you say, hey, 7434,
dot, dot, dot, colon 1,

01:06:28.413 --> 01:06:29.080
it'll know that.

01:06:29.080 --> 01:06:32.230
That's in the UTXO DB,
because that's the key

01:06:32.230 --> 01:06:34.400
that the UTXO DB sorts by.

01:06:34.400 --> 01:06:37.000
So yep, this is a UTXO.

01:06:37.000 --> 01:06:37.720
It's unspent.

01:06:37.720 --> 01:06:40.177
And it has a bunch of coins.

01:06:40.177 --> 01:06:41.260
And this is fairly recent.

01:06:41.260 --> 01:06:42.385
I was just looking through.

01:06:42.385 --> 01:06:45.160
Someone got a couple
million bucks worth, cool.

01:06:45.160 --> 01:06:46.048
Is that a million?

01:06:46.048 --> 01:06:48.970
Man.

01:06:48.970 --> 01:06:51.400
Yeah, it's a new UTXO,
hasn't been spent,

01:06:51.400 --> 01:06:55.210
and you can sort quickly
based on txid:index pair.

01:06:55.210 --> 01:06:58.180
So I think this is in
some software called

01:06:58.180 --> 01:07:00.790
an outpoint, where it's
like, you concatenate them.

01:07:00.790 --> 01:07:03.310
And it ends up being 36 bytes.

01:07:03.310 --> 01:07:04.120
This is 32 bytes.

01:07:04.120 --> 01:07:04.915
This is 4.

01:07:04.915 --> 01:07:07.060
So you sort of have
a 36-byte outpoint,

01:07:07.060 --> 01:07:10.200
which describes what goes
into the UTXO database.
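
NOTE
Editor's sketch of the 36-byte outpoint as described in the lecture: a
32-byte txid concatenated with a 4-byte output index. The name
outpoint_key is illustrative. Two simplifications are assumed: real
serialization stores the txid in the reverse byte order from how it is
usually displayed, and the on-disk key format of real node software may
differ from this plain concatenation.
```python
import struct
def outpoint_key(txid_hex: str, index: int) -> bytes:
    """32-byte txid followed by a 4-byte little-endian output index."""
    txid = bytes.fromhex(txid_hex)
    if len(txid) != 32:
        raise ValueError("txid must be 32 bytes")
    return txid + struct.pack("<I", index)
if __name__ == "__main__":
    placeholder_txid = "74" * 32  # made-up txid for illustration only
    key = outpoint_key(placeholder_txid, 1)
    print(len(key), key[-4:].hex())
```
Because this key is the only thing the store is organized by, a lookup by
txid:index is fast and a lookup by anything else is a full scan.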

01:07:10.200 --> 01:07:11.976
AUDIENCE: But once
it gets respent,

01:07:11.976 --> 01:07:13.350
it's hard to find it again.

01:07:13.350 --> 01:07:15.330
TADGE DRYJA: Yeah,
once this is spent,

01:07:15.330 --> 01:07:18.658
you delete it from the UTXO, and
you won't remember it anymore.

01:07:18.658 --> 01:07:20.950
It'll just be, hey, you, how
many coins does this have?

01:07:20.950 --> 01:07:23.230
You're like, well-- well,
you can still answer it.

01:07:23.230 --> 01:07:25.140
You say "none."

01:07:25.140 --> 01:07:28.480
If it's spent, and you say, how
many coins does this guy have?

01:07:28.480 --> 01:07:28.980
None.

01:07:28.980 --> 01:07:30.022
It's not in the UTXO set.
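
NOTE
Editor's sketch of the behavior in this exchange: spending an output
deletes its entry, so a later query gets "none" and cannot tell "spent"
apart from "never existed". UtxoSet is a toy class written for this
example, not part of any Bitcoin library.
```python
class UtxoSet:
    """Toy in-memory UTXO set keyed by (txid, output index)."""
    def __init__(self):
        self._coins = {}
    def add(self, txid: str, index: int, amount: int) -> None:
        self._coins[(txid, index)] = amount
    def spend(self, txid: str, index: int) -> int:
        # Spending removes the entry; no history is kept.
        return self._coins.pop((txid, index))
    def amount(self, txid: str, index: int):
        # None covers both "already spent" and "never existed".
        return self._coins.get((txid, index))
if __name__ == "__main__":
    utxos = UtxoSet()
    utxos.add("7434", 1, 2700000000)  # placeholder txid and amount
    print(utxos.amount("7434", 1))
    utxos.spend("7434", 1)
    print(utxos.amount("7434", 1))
```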

01:07:32.695 --> 01:07:33.820
Yeah, so the previous one--

01:07:33.820 --> 01:07:35.028
I just copied these randomly.

01:07:38.010 --> 01:07:41.790
So any questions about what is
stored and what is not stored?

01:07:44.410 --> 01:07:47.220
Basically, keeps
track of UTXOs, and it

01:07:47.220 --> 01:07:48.970
keeps track of historic
blocks in order

01:07:48.970 --> 01:07:49.998
to give them to people.

01:07:49.998 --> 01:07:51.290
And it keeps track of the headers.

01:07:51.290 --> 01:07:52.720
The headers end up being small.

01:07:52.720 --> 01:07:54.270
All the headers
total is like, what?

01:07:54.270 --> 01:07:57.300
40 megs, something like that.
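
NOTE
Editor's arithmetic check on the 40 megs figure: a block header is 80
bytes, so the total is 80 bytes times the number of blocks. The block
count used below (510,000) is an assumption chosen to be roughly the
chain height around the time of the lecture; it is not from the slides.
```python
HEADER_SIZE_BYTES = 80  # fixed size of a Bitcoin block header
def headers_total_mb(block_count: int) -> float:
    """Total size of all block headers, in megabytes (10**6 bytes)."""
    return block_count * HEADER_SIZE_BYTES / 1_000_000
if __name__ == "__main__":
    assumed_block_count = 510_000  # assumption, see the note above
    print(headers_total_mb(assumed_block_count), "MB")
```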

01:07:57.300 --> 01:08:00.470
So yeah, you can
add further indices.

01:08:00.470 --> 01:08:03.310
You could write software to
answer all these questions

01:08:03.310 --> 01:08:05.410
very quickly.

01:08:05.410 --> 01:08:08.500
But that's not what
Bitcoin does by default.

01:08:08.500 --> 01:08:12.220
Those types of indices would
take a lot of extra space

01:08:12.220 --> 01:08:15.440
and add a lot of CPU
or things like that.

01:08:15.440 --> 01:08:18.520
So a very common thing
is an address index,

01:08:18.520 --> 01:08:20.290
so people can ask if
they have any money.

01:08:20.290 --> 01:08:22.580
So the second to last
one, where you say,

01:08:22.580 --> 01:08:25.367
hey, I have this key hash.

01:08:25.367 --> 01:08:26.200
Do I have any money?

01:08:26.200 --> 01:08:27.850
Do I have any transactions?

01:08:27.850 --> 01:08:29.979
Having an address
index is actually

01:08:29.979 --> 01:08:32.240
pretty useful for
a lot of things,

01:08:32.240 --> 01:08:37.080
for example, importing keys or
like web wallet kind of things.
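
NOTE
Editor's sketch of what an address index adds: one pass over the chain
builds a map from output script to every place it appears, after which
"do I have any transactions?" is a single lookup. The block layout is the
same toy model as in the rescan sketch and build_address_index is a
hypothetical name. Note that the index holds an entry for every output
ever created, which is where the extra disk space goes.
```python
from collections import defaultdict
def build_address_index(blocks: list) -> dict:
    """Map each output script to a list of (height, txid, output index).
    Each block is a list of transaction dicts with 'txid' and 'outputs',
    where 'outputs' is a list of (amount, script) tuples."""
    index = defaultdict(list)
    for height, block in enumerate(blocks):
        for tx in block:
            for n, (_amount, script) in enumerate(tx["outputs"]):
                index[script].append((height, tx["txid"], n))
    return dict(index)
if __name__ == "__main__":
    chain = [
        [{"txid": "a", "outputs": [(50, b"me"), (10, b"other")]}],
        [{"txid": "b", "outputs": [(20, b"me"), (30, b"x")]}],
    ]
    address_index = build_address_index(chain)
    print(address_index.get(b"me", []))
```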

01:08:37.080 --> 01:08:42.598
But Bitcoin by default doesn't
do it, because, well, why?

01:08:42.598 --> 01:08:44.890
You can make arguments that
it would be actually useful

01:08:44.890 --> 01:08:49.109
to have in the normal
code, but we don't.

01:08:49.109 --> 01:08:52.700
Any other questions about
what indices, what it can do,

01:08:52.700 --> 01:08:54.630
what it cannot do?

01:08:54.630 --> 01:08:57.500
Somewhat counterintuitive in
many cases, where you say, hey,

01:08:57.500 --> 01:08:58.500
here's this transaction.

01:08:58.500 --> 01:09:01.180
And you can't actually find it.

01:09:01.180 --> 01:09:05.760
Or you have to scan through 180
gigabytes in order to find it.

01:09:05.760 --> 01:09:07.290
So wait, James, I
have a question.

01:09:07.290 --> 01:09:11.990
So how big is an address index
for what you were working on?

01:09:11.990 --> 01:09:14.800
AUDIENCE: Usually, equal to
the size of the chain in Insight.

01:09:14.800 --> 01:09:17.240
TADGE DRYJA: Wow, so it
could be hundreds of gigs.

01:09:17.240 --> 01:09:17.754
Yeah.

01:09:17.754 --> 01:09:18.962
AUDIENCE: It only takes what?

01:09:18.962 --> 01:09:21.560
At least for Bitcoin, it
takes usually multiple weeks

01:09:21.560 --> 01:09:22.258
to generate.

01:09:22.258 --> 01:09:23.550
TADGE DRYJA: Weeks to generate?

01:09:23.550 --> 01:09:24.322
I bet you, well--

01:09:24.322 --> 01:09:26.614
AUDIENCE: Although [INAUDIBLE]
usually takes a few days

01:09:26.614 --> 01:09:29.630
on Insight, which is
what [INAUDIBLE] takes--

01:09:29.630 --> 01:09:30.500
weeks.

01:09:30.500 --> 01:09:32.120
TADGE DRYJA: Well,
that's Insight.

01:09:32.120 --> 01:09:37.250
So the other thing is, these
are like fairly involved sort

01:09:37.250 --> 01:09:39.350
of CSE software
engineering problems.

01:09:39.350 --> 01:09:41.510
And optimization really works.

01:09:41.510 --> 01:09:46.340
If you download like Bitcoin
0.9, it'll still work.

01:09:46.340 --> 01:09:48.229
But you're never
going to catch up.

01:09:48.229 --> 01:09:49.729
Maybe not never, I don't know.

01:09:49.729 --> 01:09:52.939
If you have a fast computer, but
it'll take months and months.

01:09:52.939 --> 01:09:55.010
And as people have been
updating the software

01:09:55.010 --> 01:09:57.230
and making it faster,
making it more efficient,

01:09:57.230 --> 01:09:58.310
now it's quite fast.

01:09:58.310 --> 01:10:01.790
And you can sync the
whole thing in a few hours

01:10:01.790 --> 01:10:03.810
on a good computer.

01:10:03.810 --> 01:10:05.780
So address index is one
of those things where

01:10:05.780 --> 01:10:08.930
it hasn't had like the full
force of all these Bitcoin

01:10:08.930 --> 01:10:12.088
protocol coder people on it,
because it's sort of seen

01:10:12.088 --> 01:10:14.630
as like, well, yeah that's kind
of a fun feature if you want.

01:10:14.630 --> 01:10:18.900
But it's not like a
core utility of Bitcoin.

01:10:18.900 --> 01:10:23.040
So yeah, it is a database,
maybe not the best way

01:10:23.040 --> 01:10:24.840
to think of it though.

01:10:24.840 --> 01:10:27.480
Don't think of the blockchain as
like a global shared database,

01:10:27.480 --> 01:10:28.470
because it sort of is.

01:10:28.470 --> 01:10:30.570
But it's a fairly
specific database

01:10:30.570 --> 01:10:34.470
that isn't useful for
many other things.

01:10:34.470 --> 01:10:36.100
Yeah, and it's also untrusted.

01:10:36.100 --> 01:10:39.250
Another part of why
is it's untrusted.

01:10:39.250 --> 01:10:41.560
Most of these things
exist so that they

01:10:41.560 --> 01:10:43.570
can be used over the
peer-to-peer network.

01:10:43.570 --> 01:10:45.610
If you request a block,
I'll give it to you.

01:10:45.610 --> 01:10:47.140
If you give me a
transaction, I'll

01:10:47.140 --> 01:10:49.578
match it against my UTXO set.

01:10:49.578 --> 01:10:51.370
But an address index
doesn't work that way.

01:10:51.370 --> 01:10:52.245
It's sort of trusted.

01:10:52.245 --> 01:10:54.800
I can easily omit things.

01:10:54.800 --> 01:10:56.290
If you say, hey, I've got a key.

01:10:56.290 --> 01:10:58.410
What are the transactions
involved with this key?

01:10:58.410 --> 01:11:00.160
I can omit things very
easily, and there's

01:11:00.160 --> 01:11:02.650
no way for you to
prove it or verify it.

01:11:02.650 --> 01:11:06.820
So your DB queries aren't really
given out to network peers.

01:11:06.820 --> 01:11:08.030
And network peers are scary.

01:11:08.030 --> 01:11:11.190
And you need to ban
them if they act funny.

01:11:11.190 --> 01:11:12.440
And this happens all the time.

01:11:12.440 --> 01:11:14.360
If you look through
Bitcoin logs,

01:11:14.360 --> 01:11:17.108
and you have a node that's
up, every few seconds, you're

01:11:17.108 --> 01:11:19.400
going to be disconnecting
from someone or banning them,

01:11:19.400 --> 01:11:22.340
because they're doing something
crazy, trying to hack into you

01:11:22.340 --> 01:11:23.450
or whatever.

01:11:23.450 --> 01:11:25.010
So basically, all
you're doing is

01:11:25.010 --> 01:11:27.950
you're providing headers,
blocks, transactions.

01:11:27.950 --> 01:11:30.400
And you're sharing the
other IPs and nodes.

01:11:30.400 --> 01:11:33.190
You try to simplify it.

01:11:33.190 --> 01:11:34.780
Other questions?

01:11:34.780 --> 01:11:38.530
Yeah, bad database, good for
consensus, it kind of works.

01:11:38.530 --> 01:11:41.080
Everyone's got
the same UTXO set,

01:11:41.080 --> 01:11:42.700
even though they
all really would

01:11:42.700 --> 01:11:44.270
like to change that UTXO set.

01:11:44.270 --> 01:11:46.120
I would much rather
everyone had a UTXO

01:11:46.120 --> 01:11:50.780
set where I had
that 27-coin UTXO.

01:11:50.780 --> 01:11:52.570
So almost everyone
in the system

01:11:52.570 --> 01:11:55.540
would rather there was
a different UTXO set.

01:11:55.540 --> 01:11:58.700
And yet, they all managed to
agree on a single UTXO set,

01:11:58.700 --> 01:12:00.850
so pretty cool.