1 00:00:01,090 --> 00:00:03,460 The following content is provided under a Creative 2 00:00:03,460 --> 00:00:04,850 Commons license. 3 00:00:04,850 --> 00:00:07,060 Your support will help MIT OpenCourseWare 4 00:00:07,060 --> 00:00:11,150 continue to offer high quality educational resources for free. 5 00:00:11,150 --> 00:00:13,690 To make a donation or to view additional materials 6 00:00:13,690 --> 00:00:17,650 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,650 --> 00:00:18,550 at ocw.mit.edu. 8 00:00:22,520 --> 00:00:25,220 TADGE DRYJA: Today we're going to talk about synchronization-- 9 00:00:25,220 --> 00:00:27,950 how all these different nodes on the network 10 00:00:27,950 --> 00:00:31,790 come to consensus, how they link up. 11 00:00:31,790 --> 00:00:33,540 So it's sort of bringing it all together. 12 00:00:33,540 --> 00:00:36,110 So so far in these lectures we've 13 00:00:36,110 --> 00:00:40,910 talked about signatures, mining and blocks, transactions 14 00:00:40,910 --> 00:00:42,020 and scripts. 15 00:00:42,020 --> 00:00:43,987 And now we're going to put it all together. 16 00:00:43,987 --> 00:00:45,320 How does this all actually work? 17 00:00:45,320 --> 00:00:47,730 What do all these components come together to do? 18 00:00:47,730 --> 00:00:51,890 And how does this make this cool money? 19 00:00:51,890 --> 00:00:54,860 Quick recap on signatures that I think 20 00:00:54,860 --> 00:00:57,170 you got the idea from the homeworks, 21 00:00:57,170 --> 00:01:00,980 from the lectures-- you have these public and private keys. 22 00:01:00,980 --> 00:01:03,500 This key pair where you generate the private key, 23 00:01:03,500 --> 00:01:05,752 you distribute the public key. 24 00:01:05,752 --> 00:01:08,300 The private key can sign a message. 25 00:01:08,300 --> 00:01:14,240 And anyone can verify, given the triple public key, message, 26 00:01:14,240 --> 00:01:16,082 signature group. 
27 00:01:16,082 --> 00:01:17,540 This is useful for lots of things-- 28 00:01:17,540 --> 00:01:20,390 providing identity, ownership, all sorts of things. 29 00:01:20,390 --> 00:01:22,250 And it's better than paper signatures. 30 00:01:22,250 --> 00:01:26,120 Paper signatures don't really sign the message. 31 00:01:26,120 --> 00:01:28,730 They just sort of sign the paper. 32 00:01:28,730 --> 00:01:32,090 So if you change some part of a document, 33 00:01:32,090 --> 00:01:33,980 the signature still sort of looks OK. 34 00:01:33,980 --> 00:01:35,090 Maybe you can see-- 35 00:01:35,090 --> 00:01:37,850 you want the paper to not be tampered with. 36 00:01:37,850 --> 00:01:40,520 But you can sign a blank piece of paper 37 00:01:40,520 --> 00:01:43,160 and then add all this text on afterwards. 38 00:01:43,160 --> 00:01:46,380 You can't do that with these systems. 39 00:01:46,380 --> 00:01:47,890 So signatures are really cool. 40 00:01:47,890 --> 00:01:53,760 And they are sort of a necessary thing for this network to work. 41 00:01:53,760 --> 00:01:54,970 So mining and blocks-- 42 00:01:54,970 --> 00:01:57,880 I think you sort of got the idea with the Lamport signatures. 43 00:01:57,880 --> 00:02:00,880 If you've looked at the current problem set, 44 00:02:00,880 --> 00:02:03,190 it makes a lot of sense. 45 00:02:03,190 --> 00:02:06,560 You hash a bunch of times while you change 46 00:02:06,560 --> 00:02:07,690 the nonce a bunch of times. 47 00:02:07,690 --> 00:02:09,560 Try to get a low output. 48 00:02:09,560 --> 00:02:11,200 So you get the idea of mining. 49 00:02:11,200 --> 00:02:14,920 You're proving work by repeatedly going 50 00:02:14,920 --> 00:02:18,190 through different nonces, trying to find a certain hash output 51 00:02:18,190 --> 00:02:19,990 that there's no shortcut to. 52 00:02:19,990 --> 00:02:22,240 You just have to guess and check.
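The guess-and-check loop described here can be sketched roughly as follows. This is a minimal illustration, not the problem set's actual code: the function names `mine` and `has_enough_work`, the 8-byte nonce encoding, and the use of single SHA-256 are all assumptions made for the example.

```python
import hashlib


def has_enough_work(message: bytes, nonce: int, zero_bits: int) -> bool:
    """Check that SHA-256(message || nonce) starts with `zero_bits` zero bits."""
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    # Read the hash as a 256-bit integer; "low output" means it is below a target.
    return int.from_bytes(digest, "big") < (1 << (256 - zero_bits))


def mine(message: bytes, zero_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until one produces a low enough hash."""
    nonce = 0
    while not has_enough_work(message, nonce, zero_bits):
        nonce += 1  # no shortcut: just guess the next nonce and check again
    return nonce
```

Each extra required zero bit roughly doubles the expected number of guesses, while verifying a claimed nonce always costs a single hash.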
53 00:02:22,240 --> 00:02:24,340 And now if you include the previous data 54 00:02:24,340 --> 00:02:27,070 as part of your input to that hash function, 55 00:02:27,070 --> 00:02:29,500 you can make a chain of work. 56 00:02:29,500 --> 00:02:33,220 And that's what we call a blockchain I guess. 57 00:02:33,220 --> 00:02:34,955 Any questions on this so far? 58 00:02:34,955 --> 00:02:39,120 It's the basic idea from two lectures ago. 59 00:02:39,120 --> 00:02:40,880 Cool. 60 00:02:40,880 --> 00:02:44,815 And a recap of what Neha said yesterday-- 61 00:02:44,815 --> 00:02:46,190 there's transactions and scripts. 62 00:02:46,190 --> 00:02:48,800 I'm not going to go into scripts too much 63 00:02:48,800 --> 00:02:53,870 yet, because in practice, 99% of the scripts 64 00:02:53,870 --> 00:02:56,810 are just checking a public key signature. 65 00:02:56,810 --> 00:03:01,160 You can do all sorts of other crazy things with the scripts. 66 00:03:01,160 --> 00:03:04,010 But almost nobody does yet. 67 00:03:04,010 --> 00:03:06,860 Basically they say, OK, I'm sending to a public key hash. 68 00:03:06,860 --> 00:03:09,290 When I spend from it, I reveal the public key, 69 00:03:09,290 --> 00:03:12,892 check that the hash matches, check a signature. 70 00:03:12,892 --> 00:03:14,600 And transactions have inputs and outputs. 71 00:03:14,600 --> 00:03:15,860 We went over this yesterday. 72 00:03:15,860 --> 00:03:22,120 Just a sort of real-world numbers, 73 00:03:22,120 --> 00:03:25,420 this is how big things are, "ish," 74 00:03:25,420 --> 00:03:28,810 so a transaction ID and index to point 75 00:03:28,810 --> 00:03:30,280 to a previous transaction. 76 00:03:30,280 --> 00:03:32,470 The transaction ID is 32 bytes. 
77 00:03:32,470 --> 00:03:36,677 The index is encoded as four bytes, so a 32-bit integer, 78 00:03:36,677 --> 00:03:39,010 which is kind of overkill, because I think the most that 79 00:03:39,010 --> 00:03:39,945 there's ever-- 80 00:03:39,945 --> 00:03:41,620 the most outputs there's ever been 81 00:03:41,620 --> 00:03:43,480 is like 1,000 or something. 82 00:03:43,480 --> 00:03:47,440 So you really don't need it to be able to go up to 4 billion. 83 00:03:47,440 --> 00:03:49,450 But that's how it is. 84 00:03:49,450 --> 00:03:51,580 The signature ends up being about 100 bytes 85 00:03:51,580 --> 00:03:55,000 because you have to provide the public key, which is either 86 00:03:55,000 --> 00:03:57,190 33 or 65 bytes. 87 00:03:57,190 --> 00:03:59,530 And then the signature itself is encoded to something 88 00:03:59,530 --> 00:04:01,360 like 70 bytes. 89 00:04:01,360 --> 00:04:04,040 Both of those things could be much more efficient. 90 00:04:04,040 --> 00:04:06,462 But they're not yet. 91 00:04:06,462 --> 00:04:08,170 And the output is actually a lot smaller. 92 00:04:08,170 --> 00:04:09,730 An output, the script-- 93 00:04:09,730 --> 00:04:12,110 the main thing is your public key hash, 94 00:04:12,110 --> 00:04:14,260 which in Bitcoin is 20 bytes. 95 00:04:14,260 --> 00:04:16,510 And then you have those other little opcodes, 96 00:04:16,510 --> 00:04:18,339 which add a few bytes, and then 97 00:04:18,339 --> 00:04:21,459 your amount, which is-- so all amounts in Bitcoin 98 00:04:21,459 --> 00:04:25,405 are encoded as 8-byte, signed 64-bit integers. 99 00:04:25,405 --> 00:04:28,280 So you can have pretty high precision. 100 00:04:28,280 --> 00:04:32,500 And that's also overkill, since all the bitcoins ever mined, 101 00:04:32,500 --> 00:04:34,120 put together-- 102 00:04:34,120 --> 00:04:39,100 if you added them all up, it's nowhere near 2 to the 64. 103 00:04:39,100 --> 00:04:43,560 It's like 2 to the 40 something.
104 00:04:43,560 --> 00:04:46,180 So it's also kind of overkill but yeah, whatever. 105 00:04:46,180 --> 00:04:50,500 So this ends up being, for a one-input, one-output transaction, 106 00:04:50,500 --> 00:04:51,640 less than 200 bytes. 107 00:04:54,300 --> 00:04:56,340 So that's a message, pretty small. 108 00:04:56,340 --> 00:04:58,986 You can broadcast it around the network. 109 00:04:58,986 --> 00:05:01,060 Inputs point to old outputs, have signatures. 110 00:05:01,060 --> 00:05:03,970 Outputs have scripts and coin amounts. 111 00:05:03,970 --> 00:05:05,630 So what do we do with all these things? 112 00:05:05,630 --> 00:05:07,360 What is the mining process? 113 00:05:07,360 --> 00:05:10,558 So in the homework, you're mining your name. 114 00:05:10,558 --> 00:05:12,100 You connect to the server, figure out 115 00:05:12,100 --> 00:05:14,980 what the last block was, put your name on, put a nonce, 116 00:05:14,980 --> 00:05:16,690 and continue to mine. 117 00:05:16,690 --> 00:05:18,940 That's not super useful, unless you want to prove that 118 00:05:18,940 --> 00:05:21,940 you're-- hey, this is me. 119 00:05:21,940 --> 00:05:24,220 In Bitcoin, the basic idea is users 120 00:05:24,220 --> 00:05:25,540 are making these transactions. 121 00:05:25,540 --> 00:05:28,520 Transactions are moving coins from one place to another, 122 00:05:28,520 --> 00:05:30,820 from one key to another. 123 00:05:30,820 --> 00:05:34,240 They make the transactions, they sign them, and they broadcast. 124 00:05:34,240 --> 00:05:36,860 I'll get into what "broadcast" means. 125 00:05:36,860 --> 00:05:39,250 So in the current problem set, there's 126 00:05:39,250 --> 00:05:44,500 one server, which is not really a robust distributed system, 127 00:05:44,500 --> 00:05:46,480 as people may have seen yesterday 128 00:05:46,480 --> 00:05:52,390 from about 1:30 to 3:00 PM, when the whole thing went down. 129 00:05:52,390 --> 00:05:55,480 In Bitcoin, it's completely peer-to-peer.
130 00:05:55,480 --> 00:05:57,040 Every node is the same. 131 00:05:57,040 --> 00:05:59,180 They're all listening for people connecting in. 132 00:05:59,180 --> 00:06:03,040 They're all connecting out to other nodes. 133 00:06:03,040 --> 00:06:03,960 And they broadcast. 134 00:06:03,960 --> 00:06:05,410 So if someone sends you a message, 135 00:06:05,410 --> 00:06:07,790 you'll pass it on to all the other people. 136 00:06:07,790 --> 00:06:10,090 So it's called a gossip network. 137 00:06:10,090 --> 00:06:12,310 In practice, it works OK. 138 00:06:12,310 --> 00:06:15,190 It's fairly heavy load on some network traffic, 139 00:06:15,190 --> 00:06:18,880 but you can make transactions, sign them, broadcast them. 140 00:06:18,880 --> 00:06:22,190 And then someone, the miner, takes 141 00:06:22,190 --> 00:06:24,590 all these recent transactions that they've 142 00:06:24,590 --> 00:06:28,930 seen, and puts them into a block, and then does some work. 143 00:06:28,930 --> 00:06:31,080 So those transactions are now confirmed, 144 00:06:31,080 --> 00:06:34,540 and people can build the next block. 145 00:06:34,540 --> 00:06:36,840 So the only difference from the current problem set 146 00:06:36,840 --> 00:06:40,440 is instead of putting your name in, you put-- 147 00:06:40,440 --> 00:06:43,290 you can put your name-- but you also put all these messages 148 00:06:43,290 --> 00:06:44,910 that you've seen recently. 149 00:06:44,910 --> 00:06:47,160 And so you commit to them that way. 150 00:06:47,160 --> 00:06:49,930 You could do it by just sticking them all in, 151 00:06:49,930 --> 00:06:54,780 but instead, there's a bit more advanced way to do it. 152 00:06:54,780 --> 00:06:57,450 You use what's called a block header. 153 00:06:57,450 --> 00:07:00,420 So yeah, the block header itself is the message. 
154 00:07:00,420 --> 00:07:03,090 So similar to the problem set, it's the block header 155 00:07:03,090 --> 00:07:06,480 that you need to hash to get a low output value, not the block 156 00:07:06,480 --> 00:07:09,040 itself, which is kind of interesting. 157 00:07:09,040 --> 00:07:12,337 And the headers have a hash of all the transactions 158 00:07:12,337 --> 00:07:12,920 in the blocks. 159 00:07:12,920 --> 00:07:14,730 So you don't just put all the transactions 160 00:07:14,730 --> 00:07:17,550 into one big megabyte data structure, 161 00:07:17,550 --> 00:07:19,860 hash the whole thing, and then try to get a low output. 162 00:07:19,860 --> 00:07:22,465 You actually do some intermediate steps first. 163 00:07:22,465 --> 00:07:24,090 And what's interesting is it's actually 164 00:07:24,090 --> 00:07:26,040 just the headers that make a chain, not 165 00:07:26,040 --> 00:07:27,580 the blocks themselves. 166 00:07:27,580 --> 00:07:30,440 So instead of blockchain, you could call it a headerchain. 167 00:07:30,440 --> 00:07:33,260 So I'll talk about headers. 168 00:07:33,260 --> 00:07:34,950 The headers are 80 bytes. 169 00:07:34,950 --> 00:07:37,650 And they're actually quite similar to the blocks 170 00:07:37,650 --> 00:07:38,900 in problem set 2. 171 00:07:38,900 --> 00:07:41,040 So the main three components are you've 172 00:07:41,040 --> 00:07:43,590 got the previous hash, the Merkle root, and the nonce. 173 00:07:43,590 --> 00:07:45,490 And so this is like in the problem set. 174 00:07:45,490 --> 00:07:47,340 You start with the previous hash. 175 00:07:47,340 --> 00:07:51,073 Then you have data that you're actually committing to. 176 00:07:51,073 --> 00:07:52,740 And then you have some data that doesn't 177 00:07:52,740 --> 00:07:55,960 have any actual meaning, just to get the work done. 
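As a rough sketch of what an 80-byte header looks like in bytes: the lecture names the previous hash, the Merkle root, and the nonce here, and covers the remaining fields (version, time, difficulty bits) a little later. The field order, the little-endian integer encoding, and the double SHA-256 below follow Bitcoin's header format as I understand it; the function names `serialize_header` and `header_hash` are made up for this example.

```python
import hashlib
import struct


def serialize_header(version: int, prev_hash: bytes, merkle_root: bytes,
                     timestamp: int, bits: int, nonce: int) -> bytes:
    """Pack the six header fields into 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes."""
    if len(prev_hash) != 32 or len(merkle_root) != 32:
        raise ValueError("prev_hash and merkle_root must be 32 bytes each")
    return (
        struct.pack("<I", version)      # 4 bytes, little-endian
        + prev_hash                     # 32 bytes: hash of the previous header
        + merkle_root                   # 32 bytes: commitment to the transactions
        + struct.pack("<III", timestamp, bits, nonce)  # 3 x 4 bytes
    )


def header_hash(header: bytes) -> bytes:
    """Hash the 80-byte header (SHA-256 applied twice); this is what must be low."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()
```

Only these 80 bytes are hashed while mining, so changing the nonce never requires rehashing the transactions themselves.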
178 00:07:55,960 --> 00:08:00,390 So I can-- to reference, if you look through the work 179 00:08:00,390 --> 00:08:03,780 people are doing right now, this is all public, 180 00:08:03,780 --> 00:08:05,430 the current problem set. 181 00:08:05,430 --> 00:08:07,470 There's a previous hash, which is basically 182 00:08:07,470 --> 00:08:09,760 the hash of the line above it. 183 00:08:09,760 --> 00:08:12,420 And then there's some data you're committing to, in this 184 00:08:12,420 --> 00:08:17,040 case people's user names, and then some non-meaningful data 185 00:08:17,040 --> 00:08:23,080 here with just random numbers and stuff it looks like-- 186 00:08:23,080 --> 00:08:24,800 so very similar. 187 00:08:24,800 --> 00:08:26,050 Any questions about this idea? 188 00:08:30,380 --> 00:08:30,880 Cool. 189 00:08:33,970 --> 00:08:38,070 So we use a Merkle root, which I think I talked about last week 190 00:08:38,070 --> 00:08:41,289 Monday, instead of just concatenating 191 00:08:41,289 --> 00:08:43,690 all the transactions and hashing them in together. 192 00:08:43,690 --> 00:08:44,530 You could do that. 193 00:08:44,530 --> 00:08:45,572 That actually would work. 194 00:08:45,572 --> 00:08:47,565 It wouldn't make a huge difference. 195 00:08:47,565 --> 00:08:48,940 But this is a little nicer if you 196 00:08:48,940 --> 00:08:53,050 want to prove that a transaction was in this block, 197 00:08:53,050 --> 00:08:55,840 without giving the whole block. 198 00:08:55,840 --> 00:08:59,080 So the idea is I have these TXIDs. 199 00:08:59,080 --> 00:09:04,540 And a TXID, transaction ID, is just a hash of the transaction. 200 00:09:04,540 --> 00:09:07,450 Stick all the components of the transaction into bytes, 201 00:09:07,450 --> 00:09:09,670 hash that, and you've got what's called a TXID. 202 00:09:09,670 --> 00:09:11,930 And that's how you refer to transactions. 203 00:09:15,060 --> 00:09:18,820 You hash these two together to get this intermediate point. 
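The pairwise hashing described here, repeated level by level until one hash is left, can be sketched as below. This is a simplified illustration: double SHA-256 and the rule of duplicating the last entry when a level has an odd count are Bitcoin conventions that the lecture does not spell out at this point, so treat them as assumptions, along with the names `dsha256` and `merkle_root`.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list) -> bytes:
    """Combine a list of 32-byte TXIDs pairwise until a single root remains."""
    if not txids:
        raise ValueError("need at least one TXID")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last entry with itself
        # Hash each adjacent pair to form the next level up.
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Changing any single TXID changes every intermediate hash on its path, and therefore the root.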
204 00:09:18,820 --> 00:09:20,980 Do the same thing up to the root. 205 00:09:20,980 --> 00:09:25,420 And so you can't change any of these little transaction IDs 206 00:09:25,420 --> 00:09:27,280 without changing the Merkle root. 207 00:09:27,280 --> 00:09:30,250 So it commits to all the transactions 208 00:09:30,250 --> 00:09:33,402 just the same way it would if you just concatenated them 209 00:09:33,402 --> 00:09:34,610 all together and hashed that. 210 00:09:37,150 --> 00:09:42,400 So it really-- we'll go into, later, why this is useful. 211 00:09:42,400 --> 00:09:47,160 But in many cases, it really doesn't help too much. 212 00:09:47,160 --> 00:09:49,370 Any questions about Merkle, Merkle tree? 213 00:09:49,370 --> 00:09:51,360 Good. 214 00:09:51,360 --> 00:09:56,230 So the actual Bitcoin headers, which many things use, 215 00:09:56,230 --> 00:09:57,340 have a couple of fields. 216 00:09:57,340 --> 00:10:00,850 Some of them are actually not very useful. 217 00:10:00,850 --> 00:10:05,200 But the main three are previous hash, Merkle root, nonce. 218 00:10:05,200 --> 00:10:07,460 And then there's-- I'll talk about the other things. 219 00:10:07,460 --> 00:10:10,300 So there's also a version field right at the beginning. 220 00:10:10,300 --> 00:10:12,000 It's 4 bytes. 221 00:10:12,000 --> 00:10:13,750 It indicates block version. 222 00:10:13,750 --> 00:10:17,530 It's not clear what that's going to be used for in the future. 223 00:10:17,530 --> 00:10:20,970 It used to be used for sort of signaling protocol changes. 224 00:10:20,970 --> 00:10:23,500 I'm not sure that's going to be the case going forward, 225 00:10:23,500 --> 00:10:27,440 because it didn't really work very well for that. 226 00:10:27,440 --> 00:10:30,970 So right now I think they all start like 02 and then a bunch 227 00:10:30,970 --> 00:10:32,290 of zeros. 228 00:10:32,290 --> 00:10:36,110 And that's the current version, whatever that means.
229 00:10:36,110 --> 00:10:40,420 And if you mine something with a different version, 230 00:10:40,420 --> 00:10:41,960 everyone will accept it. 231 00:10:41,960 --> 00:10:43,960 But there'll be like these warnings that show up 232 00:10:43,960 --> 00:10:48,520 in your Bitcoin log files that say, warning, unknown version 233 00:10:48,520 --> 00:10:50,440 detected. 234 00:10:50,440 --> 00:10:53,710 The idea is maybe, well, if the version increases or changes, 235 00:10:53,710 --> 00:10:57,580 maybe there's some new rules in this system, or new opcodes, 236 00:10:57,580 --> 00:10:59,243 or new something going on. 237 00:10:59,243 --> 00:11:00,910 And you're not aware of it, so you might 238 00:11:00,910 --> 00:11:03,280 need to upgrade your software. 239 00:11:03,280 --> 00:11:06,050 That was the idea anyway. 240 00:11:06,050 --> 00:11:07,718 In practice, what happens is-- 241 00:11:07,718 --> 00:11:09,260 you'll see in your logs all the time, 242 00:11:09,260 --> 00:11:11,360 that like, unknown version detected. 243 00:11:11,360 --> 00:11:14,030 And it's just that someone set random numbers 244 00:11:14,030 --> 00:11:15,440 in the version field. 245 00:11:15,440 --> 00:11:19,680 And it doesn't seem to mean anything, so not super useful. 246 00:11:19,680 --> 00:11:21,690 Previous hash, just like in the problem set, 247 00:11:21,690 --> 00:11:24,910 it's the hash of the previous block, 32 bytes. 248 00:11:24,910 --> 00:11:27,700 Merkle root, as described a few slides before-- 249 00:11:27,700 --> 00:11:29,560 hash of all the transactions in the block. 250 00:11:32,458 --> 00:11:35,830 Time, actually kind of complex-- 251 00:11:35,830 --> 00:11:39,160 I'm not going to go into the whole thing right here. 252 00:11:39,160 --> 00:11:41,590 So far we haven't really talked about time. 253 00:11:41,590 --> 00:11:48,240 Does anyone know why we'd want time in these headers? 254 00:11:51,200 --> 00:11:51,700 Yeah.
255 00:11:51,700 --> 00:11:53,158 AUDIENCE: You had mentioned earlier 256 00:11:53,158 --> 00:11:56,670 that you don't accept blocks between a certain interval 257 00:11:56,670 --> 00:11:58,458 if they were too late. 258 00:11:58,458 --> 00:11:59,250 TADGE DRYJA: Right. 259 00:11:59,250 --> 00:12:02,850 So it makes sense intuitively that like if someone says, hey, 260 00:12:02,850 --> 00:12:06,180 I mined this in 1987. 261 00:12:06,180 --> 00:12:08,640 It's like well, that seems crazy. 262 00:12:08,640 --> 00:12:10,350 Or if someone says, here's a block. 263 00:12:10,350 --> 00:12:12,380 It came out in 2046. 264 00:12:12,380 --> 00:12:15,960 Like, this doesn't make any sense. 265 00:12:15,960 --> 00:12:17,780 So intuitively, yeah, you shouldn't 266 00:12:17,780 --> 00:12:20,000 accept things that have some crazy date that's 267 00:12:20,000 --> 00:12:21,260 clearly wrong. 268 00:12:21,260 --> 00:12:22,130 But why? 269 00:12:22,130 --> 00:12:23,320 Why do we need time at all? 270 00:12:25,660 --> 00:12:26,160 Yeah. 271 00:12:26,160 --> 00:12:28,035 AUDIENCE: If you want to lock the transaction 272 00:12:28,035 --> 00:12:29,830 until a certain time. 273 00:12:29,830 --> 00:12:32,960 TADGE DRYJA: Yeah, you could say, here's a transaction, 274 00:12:32,960 --> 00:12:37,040 and I don't want it to be valid before August 1. 275 00:12:37,040 --> 00:12:41,070 And so then, you could say, if it goes into a block, 276 00:12:41,070 --> 00:12:44,570 and that block has a timestamp before August 1, 277 00:12:44,570 --> 00:12:46,430 consider the block invalid. 278 00:12:46,430 --> 00:12:48,530 You could. 279 00:12:48,530 --> 00:12:51,980 You can also do timestamping based just on block height. 280 00:12:51,980 --> 00:12:54,240 But what's the main-- does anyone know the main reason 281 00:12:54,240 --> 00:12:55,340 to have time here? 282 00:12:55,340 --> 00:12:55,955 Yeah.
283 00:12:55,955 --> 00:12:57,372 AUDIENCE: Is it because if there's 284 00:12:57,372 --> 00:12:58,920 a competing transaction? 285 00:12:58,920 --> 00:13:03,680 And then you would pick one or the other. 286 00:13:03,680 --> 00:13:06,320 TADGE DRYJA: So the competing transactions, when they get in, 287 00:13:06,320 --> 00:13:07,890 they sort of get into a block. 288 00:13:07,890 --> 00:13:09,925 And that sort of solves that competition. 289 00:13:09,925 --> 00:13:11,300 So you have two transactions that 290 00:13:11,300 --> 00:13:13,650 are mutually exclusive. 291 00:13:13,650 --> 00:13:16,333 Well, if they're both in the same Merkle root and both 292 00:13:16,333 --> 00:13:17,750 in the same block, then that block 293 00:13:17,750 --> 00:13:19,790 is considered invalid, because it's like, 294 00:13:19,790 --> 00:13:21,590 hey, you've given me a block. 295 00:13:21,590 --> 00:13:24,140 It's got two things that can't both exist here. 296 00:13:24,140 --> 00:13:27,020 So throw the block away. 297 00:13:27,020 --> 00:13:30,520 If you find two blocks that both seem to have-- 298 00:13:30,520 --> 00:13:31,880 that sort of collide. 299 00:13:31,880 --> 00:13:33,350 They're both pointing to the-- 300 00:13:33,350 --> 00:13:35,270 they've both got the same previous hash. 301 00:13:35,270 --> 00:13:40,550 So they're both at the same height, we call it, of the chain. 302 00:13:40,550 --> 00:13:44,645 You could say, oh, well, whichever one came out first. 303 00:13:44,645 --> 00:13:46,270 I'll look at the timestamp and say, OK, 304 00:13:46,270 --> 00:13:49,780 the block that came out first will be the valid one. 305 00:13:49,780 --> 00:13:53,680 But the problem is this is claimed block creation. 306 00:13:53,680 --> 00:13:56,590 You can put whatever 4 bytes you want in there. 307 00:13:56,590 --> 00:13:58,570 And so you can always say, oh, I just 308 00:13:58,570 --> 00:14:01,690 mined it exactly 1 second after the previous block.
309 00:14:01,690 --> 00:14:05,380 It just took me a while to broadcast it. 310 00:14:05,380 --> 00:14:07,630 So you can't really trust the timestamp 311 00:14:07,630 --> 00:14:09,460 to see which came first. 312 00:14:09,460 --> 00:14:12,010 If you could, you wouldn't need all this crazy mining stuff. 313 00:14:12,010 --> 00:14:15,520 And transactions themselves could just have a timestamp. 314 00:14:15,520 --> 00:14:17,860 And you wouldn't need this whole structure. 315 00:14:17,860 --> 00:14:20,050 So the fundamental reason you're mining 316 00:14:20,050 --> 00:14:24,643 is we can't trust people to say when they did something. 317 00:14:24,643 --> 00:14:26,810 You can always say, no, this transaction came first. 318 00:14:26,810 --> 00:14:28,450 No, this came first. 319 00:14:28,450 --> 00:14:30,160 So the real reason for this blockchain 320 00:14:30,160 --> 00:14:33,442 is, OK, we know which came before what. 321 00:14:33,442 --> 00:14:35,150 AUDIENCE: In practice, which one happens? 322 00:14:35,150 --> 00:14:37,540 Do people just lie and say it happened a second later? 323 00:14:37,540 --> 00:14:38,868 Or is it [INAUDIBLE]? 324 00:14:38,868 --> 00:14:40,660 TADGE DRYJA: Oh, in practice the timestamps 325 00:14:40,660 --> 00:14:43,810 are pretty unreliable. 326 00:14:43,810 --> 00:14:48,330 They can be off by minutes. 327 00:14:48,330 --> 00:14:50,650 It can be before the previous block's time. 328 00:14:50,650 --> 00:14:53,050 And that's OK. 329 00:14:53,050 --> 00:14:55,987 It seems intuitively like, well, that should just be a rule. 330 00:14:55,987 --> 00:14:58,570 And it probably-- it would have been cool if it was a rule and 331 00:14:58,570 --> 00:15:00,670 made things simpler from the beginning. 332 00:15:00,670 --> 00:15:04,420 But if you're pointing to a previous block, 333 00:15:04,420 --> 00:15:05,830 I'm building on top of it. 334 00:15:05,830 --> 00:15:09,870 And the previous block came out at 10:15. 
335 00:15:09,870 --> 00:15:13,600 And I set my timestamp to 10:12, 3 minutes prior 336 00:15:13,600 --> 00:15:15,010 to the previous block. 337 00:15:15,010 --> 00:15:16,443 Logically, that's impossible. 338 00:15:16,443 --> 00:15:17,860 I'm referencing something, and I'm 339 00:15:17,860 --> 00:15:19,480 saying I'm coming before it. 340 00:15:19,480 --> 00:15:22,002 But the software says that's OK. 341 00:15:22,002 --> 00:15:23,710 AUDIENCE: So if we're creating a version, 342 00:15:23,710 --> 00:15:26,973 would it be useful to just get rid of version and time, 343 00:15:26,973 --> 00:15:28,640 like if we're creating a new blockchain? 344 00:15:28,640 --> 00:15:30,937 TADGE DRYJA: So version, maybe you could get rid of, 345 00:15:30,937 --> 00:15:33,520 or you could put it somewhere in the Merkle root or something. 346 00:15:33,520 --> 00:15:36,430 Time actually does have a really useful purpose. 347 00:15:39,150 --> 00:15:41,690 Does anyone, maybe if you know? 348 00:15:41,690 --> 00:15:43,710 AUDIENCE: I don't know, but does it 349 00:15:43,710 --> 00:15:45,820 play into the difficulty of the mine? 350 00:15:45,820 --> 00:15:46,500 Does it? 351 00:15:46,500 --> 00:15:49,350 TADGE DRYJA: Yeah, so the main reason for time 352 00:15:49,350 --> 00:15:53,820 here is to adjust the difficulty. 353 00:15:53,820 --> 00:15:56,970 And that happens every 2,016 blocks. 354 00:15:56,970 --> 00:15:59,910 You just look at, OK, how long did this 2,016-block period 355 00:15:59,910 --> 00:16:02,790 take according to these timestamps? 356 00:16:02,790 --> 00:16:05,080 And if it took two weeks, OK, we're good. 357 00:16:05,080 --> 00:16:06,840 The difficulty doesn't have to change. 358 00:16:06,840 --> 00:16:09,450 If it took three weeks, that means the blocks 359 00:16:09,450 --> 00:16:11,070 were coming out very slowly. 360 00:16:11,070 --> 00:16:14,130 And we need to reduce the difficulty. 
361 00:16:14,130 --> 00:16:17,670 If the 2,016 blocks came out in one week, that means, 362 00:16:17,670 --> 00:16:20,470 wow, people were mining really fast. 363 00:16:20,470 --> 00:16:22,510 And so we need to increase the difficulty. 364 00:16:22,510 --> 00:16:25,320 So there's this negative feedback mechanism 365 00:16:25,320 --> 00:16:27,120 based on this time. 366 00:16:27,120 --> 00:16:29,430 And it can be tweaked. 367 00:16:29,430 --> 00:16:31,530 It's not accurate. 368 00:16:31,530 --> 00:16:34,180 You can have things coming in the wrong order. 369 00:16:34,180 --> 00:16:36,080 The general rule of thumb-- 370 00:16:36,080 --> 00:16:38,970 the rule in the software is about two hours. 371 00:16:38,970 --> 00:16:41,310 If you see something that's two hours off 372 00:16:41,310 --> 00:16:46,170 from what your internal clock says, you will reject it. 373 00:16:46,170 --> 00:16:48,720 But that's a huge gap. 374 00:16:48,720 --> 00:16:52,350 Most network systems, everyone's got their clocks 375 00:16:52,350 --> 00:16:57,000 to the same second at least, or millisecond. 376 00:16:57,000 --> 00:17:00,670 Two hours is like kind of enormous gaps. 377 00:17:00,670 --> 00:17:03,000 But the system works OK, because you've 378 00:17:03,000 --> 00:17:06,359 got these really long-term difficulty adjustments that 379 00:17:06,359 --> 00:17:09,810 only happen every 2,016 blocks, which in practice is something 380 00:17:09,810 --> 00:17:11,619 like two weeks. 381 00:17:11,619 --> 00:17:15,089 So if someone gets something a few minutes off, 382 00:17:15,089 --> 00:17:17,810 it doesn't really affect things too much. 383 00:17:17,810 --> 00:17:22,079 And it's really only used for, OK, look at the last 2,016 384 00:17:22,079 --> 00:17:25,380 blocks, two weeks-ish of work, of all these blocks, 385 00:17:25,380 --> 00:17:28,410 and see how fast we need to make things. 386 00:17:28,410 --> 00:17:33,270 So that ties into the next field, which is difficulty. 
387 00:17:33,270 --> 00:17:35,340 It's in a sort of weird floating-point-like 388 00:17:35,340 --> 00:17:39,120 format with a mantissa and exponent, which 389 00:17:39,120 --> 00:17:40,410 is totally custom. 390 00:17:40,410 --> 00:17:43,620 And you kind of have to write your own code to deal with it. 391 00:17:43,620 --> 00:17:45,570 But it basically says, OK, what does 392 00:17:45,570 --> 00:17:48,420 the number have to-- what does the hash have to be below? 393 00:17:48,420 --> 00:17:50,790 It's not just number of bits. 394 00:17:50,790 --> 00:17:55,170 So in the problem set, I said 33 bits of work. 395 00:17:55,170 --> 00:17:57,420 So that's fairly easy to detect, because you just look 396 00:17:57,420 --> 00:17:59,910 for 33 0-bits in the front. 397 00:17:59,910 --> 00:18:02,040 In Bitcoin, it's not just number of bits. 398 00:18:02,040 --> 00:18:04,478 It's actually a number that it must be below. 399 00:18:04,478 --> 00:18:06,270 If it were just number of bits, the problem 400 00:18:06,270 --> 00:18:10,260 then is your adjustments are fairly coarse, because you can 401 00:18:10,260 --> 00:18:12,120 only adjust by a factor of 2. 402 00:18:12,120 --> 00:18:15,930 You can double your difficulty or halve it. 403 00:18:15,930 --> 00:18:19,590 But with this, you can have much smaller difficulty adjustments 404 00:18:19,590 --> 00:18:20,940 of like a fraction of a percent. 405 00:18:23,840 --> 00:18:27,873 Yeah, this field is pretty much useless 406 00:18:27,873 --> 00:18:29,540 since you can calculate it from the time 407 00:18:29,540 --> 00:18:31,523 fields of the previous blocks. 408 00:18:31,523 --> 00:18:33,065 So you could just have it be implied. 409 00:18:35,630 --> 00:18:36,530 But it's in there. 410 00:18:36,530 --> 00:18:37,790 And you can just whatever. 411 00:18:37,790 --> 00:18:39,910 It's an extra 4 bytes.
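The mantissa-and-exponent encoding mentioned here can be decoded with a few lines of arithmetic. The sketch below follows Bitcoin's compact "bits" format as I understand it (top byte is a base-256 exponent, low 23 bits are the mantissa) and ignores the sign bit; the function names `bits_to_target` and `meets_target` are hypothetical.

```python
def bits_to_target(bits: int) -> int:
    """Expand the 4-byte compact difficulty field into the full 256-bit target."""
    exponent = bits >> 24          # top byte: size of the target in bytes
    mantissa = bits & 0x007FFFFF   # low 23 bits: leading digits of the target
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def meets_target(header_hash: bytes, bits: int) -> bool:
    """A header has enough work if its hash, read as an integer, is at or below the target."""
    # Assumption: the 32-byte hash is interpreted little-endian, as Bitcoin does.
    return int.from_bytes(header_hash, "little") <= bits_to_target(bits)
```

For example, `bits_to_target(0x1d00ffff)` gives `0xffff << 208`, a target with 32 leading zero bits. Because the target is an arbitrary number rather than a bit count, it can move by a fraction of a percent instead of only by factors of 2.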
412 00:18:39,910 --> 00:18:42,410 I don't think you actually-- like, you 413 00:18:42,410 --> 00:18:44,330 don't have to store it on a disk if you want, 414 00:18:44,330 --> 00:18:47,893 because you can just figure it out from the other things. 415 00:18:47,893 --> 00:18:52,972 AUDIENCE: Wouldn't you need it for [INAUDIBLE]?? 416 00:18:52,972 --> 00:18:55,430 TADGE DRYJA: No, because you can figure out what difficulty 417 00:18:55,430 --> 00:18:56,472 is just from the headers. 418 00:18:58,833 --> 00:18:59,750 I mean, it's in there. 419 00:18:59,750 --> 00:19:01,670 I guess it's nice if you just want 420 00:19:01,670 --> 00:19:06,880 to validate whether a single header has enough work. 421 00:19:06,880 --> 00:19:10,202 But it's like, how much work does it claim it needs? 422 00:19:10,202 --> 00:19:11,410 And then you can validate it. 423 00:19:11,410 --> 00:19:12,897 But I don't know. 424 00:19:12,897 --> 00:19:13,480 It's in there. 425 00:19:13,480 --> 00:19:16,067 It doesn't-- you could take it out and reorganize the code 426 00:19:16,067 --> 00:19:17,650 a little if you wanted to optimize it. 427 00:19:17,650 --> 00:19:20,580 But that would change so much that no one bothers. 428 00:19:20,580 --> 00:19:23,430 AUDIENCE: So when we talk about the adjusting difficulties 429 00:19:23,430 --> 00:19:27,960 and even just showing the problem or proof of work, who 430 00:19:27,960 --> 00:19:30,180 [INAUDIBLE] for the problems that 431 00:19:30,180 --> 00:19:33,260 will go to the central server? 432 00:19:33,260 --> 00:19:35,170 TADGE DRYJA: So in this, it's just everyone 433 00:19:35,170 --> 00:19:36,700 broadcasts their blocks. 434 00:19:36,700 --> 00:19:41,140 So if you've received a block or if you found a block yourself, 435 00:19:41,140 --> 00:19:44,620 you just send it to all your peers that you're connected to. 436 00:19:44,620 --> 00:19:50,180 And so there's no like, oh, this is the canonical block. 
437 00:19:50,180 --> 00:19:52,180 There can be competing blocks where you have two 438 00:19:52,180 --> 00:19:56,110 at the same time and just stochastically, one of them 439 00:19:56,110 --> 00:19:59,110 will pull ahead, because, well, randomly. 440 00:19:59,110 --> 00:20:01,780 So you can have conflicting things. 441 00:20:01,780 --> 00:20:03,520 Yeah, and then the adjustments-- also, 442 00:20:03,520 --> 00:20:06,030 everyone computes the adjustments. 443 00:20:06,030 --> 00:20:08,590 And this is an actually very quick computation, 444 00:20:08,590 --> 00:20:12,670 because you're just looking at-- 445 00:20:12,670 --> 00:20:15,250 you're not even looking at 2,016 timestamps. 446 00:20:15,250 --> 00:20:17,920 You're basically just saying, OK, if height-- 447 00:20:17,920 --> 00:20:20,080 so height is just what block number it is. 448 00:20:20,080 --> 00:20:23,840 So if you're-- right now, it's about 500 million. 449 00:20:23,840 --> 00:20:26,165 No, sorry, 500,000. 450 00:20:26,165 --> 00:20:27,540 So you basically in the code just 451 00:20:27,540 --> 00:20:30,820 say, well, if height modulo 2,016 452 00:20:30,820 --> 00:20:38,800 is equal to 0, check height minus 2,016's block. 453 00:20:38,800 --> 00:20:40,570 Compare the two timestamps. 454 00:20:40,570 --> 00:20:41,950 Subtract them. 455 00:20:41,950 --> 00:20:43,810 Get a duration. 456 00:20:43,810 --> 00:20:46,810 And then compare that duration to two weeks. 457 00:20:46,810 --> 00:20:49,580 And then change the difficulty proportionally. 458 00:20:49,580 --> 00:20:51,980 So it's actually, like, super quick for everyone 459 00:20:51,980 --> 00:20:53,230 to compute the new difficulty. 460 00:20:53,230 --> 00:20:56,120 And they only do it once every two weeks. 461 00:20:56,120 --> 00:20:59,270 And it, yeah, it's pretty straightforward. 462 00:20:59,270 --> 00:21:01,690 There are weird attacks and stuff. 
463 00:21:01,690 --> 00:21:05,200 And there are kind of some weird off-by-one errors, where you're-- 464 00:21:05,200 --> 00:21:06,000 I don't remember. 465 00:21:06,000 --> 00:21:07,420 Like, it's kind of confusing. 466 00:21:07,420 --> 00:21:10,150 It's also confusing because of the test network, which 467 00:21:10,150 --> 00:21:15,240 I haven't gone into but we'll use probably in two weeks. 468 00:21:15,240 --> 00:21:19,390 There's a Bitcoin test network, which operates pretty much 469 00:21:19,390 --> 00:21:21,790 exactly the same as Bitcoin, except everyone 470 00:21:21,790 --> 00:21:24,790 agrees that the coins are not worth any money. 471 00:21:24,790 --> 00:21:27,850 What's interesting is it's actually called testnet3. 472 00:21:27,850 --> 00:21:31,540 The first two test networks had the same setup. 473 00:21:31,540 --> 00:21:33,970 However, the agreement that they were not worth any money 474 00:21:33,970 --> 00:21:36,700 broke down. 475 00:21:36,700 --> 00:21:38,530 So on testnet1, someone said, hey, 476 00:21:38,530 --> 00:21:42,580 I'll pay you a bitcoin for a million testnet coins. 477 00:21:42,580 --> 00:21:44,110 And once people saw this happening, 478 00:21:44,110 --> 00:21:45,610 they said, oh, well, you just ruined testnet. 479 00:21:45,610 --> 00:21:46,900 Now they're worth money. 480 00:21:46,900 --> 00:21:48,040 So we'll go to testnet2. 481 00:21:48,040 --> 00:21:49,220 It happened again. 482 00:21:49,220 --> 00:21:50,720 Testnet3 has had some staying power. 483 00:21:50,720 --> 00:21:55,000 I think people realized that if they try to buy testnet3 coins, 484 00:21:55,000 --> 00:21:58,365 everyone's going to leave and go to testnet4. 485 00:21:58,365 --> 00:22:02,230 So it's kind of fun. 486 00:22:02,230 --> 00:22:05,080 I'd actually be OK with testnet3 coins being worth money, 487 00:22:05,080 --> 00:22:10,270 because I have many, many thousands of them.
488 00:22:10,270 --> 00:22:18,470 But yeah, so one difference, though, 489 00:22:18,470 --> 00:22:20,770 between the test networks and the real network 490 00:22:20,770 --> 00:22:22,600 is the difficulty adjustments. 491 00:22:22,600 --> 00:22:26,080 So I think in the first test network, 492 00:22:26,080 --> 00:22:28,390 it just worked exactly like Bitcoin. 493 00:22:28,390 --> 00:22:31,600 But one of the problems was people would mine, 494 00:22:31,600 --> 00:22:34,030 and the difficulty would increase. 495 00:22:34,030 --> 00:22:36,250 And then people would stop mining, say, oh, I'm 496 00:22:36,250 --> 00:22:38,590 going to test out my mining software. 497 00:22:38,590 --> 00:22:41,095 I'll mine a couple thousand blocks. 498 00:22:41,095 --> 00:22:42,970 Maybe it only takes me a day or two to do so, 499 00:22:42,970 --> 00:22:45,760 because I have a very fast computer compared 500 00:22:45,760 --> 00:22:47,052 to the rest of the network. 501 00:22:47,052 --> 00:22:48,760 And then I say, OK, well, it works, cool. 502 00:22:48,760 --> 00:22:50,427 I'm going to go to the real network now. 503 00:22:50,427 --> 00:22:51,750 And I leave the test network. 504 00:22:51,750 --> 00:22:53,440 And now the difficulty increased, 505 00:22:53,440 --> 00:22:56,560 because let's say 2,000 or 4,000 blocks came out. 506 00:22:56,560 --> 00:22:59,410 And they came out very quickly, so the difficulty went up. 507 00:22:59,410 --> 00:23:01,180 And then all the mining power left. 508 00:23:01,180 --> 00:23:03,550 And so now blocks aren't coming out. 
509 00:23:03,550 --> 00:23:07,010 And since the adjustment can be up or down 510 00:23:07,010 --> 00:23:12,190 but happens based on number of blocks, not based on time, 511 00:23:12,190 --> 00:23:15,430 if you have a very high difficulty and a very low 512 00:23:15,430 --> 00:23:18,100 hash rate relative to that difficulty, 513 00:23:18,100 --> 00:23:21,100 it can take weeks or months or years 514 00:23:21,100 --> 00:23:23,830 for the difficulty to reduce. 515 00:23:23,830 --> 00:23:28,090 So testnet3 put in this sort of difficulty nerfing 516 00:23:28,090 --> 00:23:32,950 code, which is probably wrong and not what they intended. 517 00:23:32,950 --> 00:23:36,040 And it has this thing where like if 20 minutes have gone by, 518 00:23:36,040 --> 00:23:37,300 the difficulty lowers. 519 00:23:37,300 --> 00:23:40,540 And it's kind of ugly. 520 00:23:40,540 --> 00:23:45,183 So that's the main place I've dealt with this field. 521 00:23:45,183 --> 00:23:47,350 One other rule is a restriction-- the difficulty 522 00:23:47,350 --> 00:23:50,620 can go up by at most a factor of 4 523 00:23:50,620 --> 00:23:53,770 and drop by at most a factor of 4. 524 00:23:53,770 --> 00:23:57,850 So if you mine 2,016 blocks in one day, 525 00:23:57,850 --> 00:23:59,920 the difficulty goes up 4x but does not 526 00:23:59,920 --> 00:24:05,750 go up 14x or whatever the implied adjustment would be. 527 00:24:05,750 --> 00:24:06,390 Any-- go ahead. 528 00:24:06,390 --> 00:24:08,170 AUDIENCE: So the difficulty is definitely 529 00:24:08,170 --> 00:24:09,860 constant for two weeks then? 530 00:24:09,860 --> 00:24:11,080 TADGE DRYJA: Yeah. 531 00:24:11,080 --> 00:24:12,400 Well, sorry, not two weeks-- 532 00:24:12,400 --> 00:24:16,920 2,016 blocks, which is generally around two weeks, but yeah.
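The adjustment rule described above (check every 2,016 blocks, compare the elapsed time to two weeks, scale proportionally, and limit the change to a factor of 4) can be sketched roughly as follows. The function and parameter names are hypothetical, the sketch works on the target (a lower target means a higher difficulty), and it deliberately ignores the off-by-one quirks and the testnet special case mentioned in the lecture.

```python
TWO_WEEKS = 14 * 24 * 60 * 60   # expected duration of one period, in seconds
INTERVAL = 2016                 # blocks between adjustments


def next_target(height: int, old_target: int,
                period_start_time: int, period_end_time: int) -> int:
    """Return the target for the block at `height` (simplified sketch)."""
    if height % INTERVAL != 0:
        return old_target                      # unchanged inside a period
    elapsed = period_end_time - period_start_time
    # Clamp so the difficulty moves by at most 4x in either direction.
    elapsed = max(TWO_WEEKS // 4, min(elapsed, TWO_WEEKS * 4))
    # Blocks came too fast -> elapsed is small -> target shrinks -> harder.
    return old_target * elapsed // TWO_WEEKS


old = 1 << 200
one_day = 24 * 60 * 60
# 2,016 blocks in one day would imply 14x, but the clamp limits it to 4x.
print(old // next_target(2016, old, 0, one_day))
```

Because the rule only fires on block count, a period with very little hash rate simply takes longer to finish, which is the stuck-testnet situation described above.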
533 00:24:16,920 --> 00:24:18,600 AUDIENCE: So block to block, 534 00:24:18,600 --> 00:24:20,100 within an almost two-week period, 535 00:24:20,100 --> 00:24:21,380 that difficulty would be the same. 536 00:24:21,380 --> 00:24:23,797 TADGE DRYJA: Yeah, so if you actually look at the headers, 537 00:24:23,797 --> 00:24:24,905 this is just a constant. 538 00:24:24,905 --> 00:24:26,030 It just is always the same. 539 00:24:26,030 --> 00:24:27,905 So it's kind of a silly field to be in there. 540 00:24:27,905 --> 00:24:31,475 You never need it, and it's always the same. 541 00:24:31,475 --> 00:24:32,850 Any other questions about-- yeah? 542 00:24:32,850 --> 00:24:35,382 AUDIENCE: How many transactions are usually [INAUDIBLE]? 543 00:24:35,382 --> 00:24:36,840 TADGE DRYJA: Oh, I'll get to that-- 544 00:24:36,840 --> 00:24:40,170 right now, a couple thousand, 4,000-ish. 545 00:24:40,170 --> 00:24:42,580 We'll get to that I think. 546 00:24:42,580 --> 00:24:43,830 But yeah, in the Merkle root-- 547 00:24:43,830 --> 00:24:47,190 so the height of the Merkle tree's like 12-ish. 548 00:24:47,190 --> 00:24:51,450 And it goes out to maybe 4,000 transactions, 549 00:24:51,450 --> 00:24:53,820 sometimes more, sometimes very few. 550 00:24:53,820 --> 00:24:55,560 You'll find empty blocks that just 551 00:24:55,560 --> 00:24:58,990 have one transaction in them. 552 00:24:58,990 --> 00:25:02,190 And that transaction ID just becomes the Merkle root, 553 00:25:02,190 --> 00:25:03,330 because a height-- 554 00:25:03,330 --> 00:25:06,390 it's like a height-zero Merkle tree, 555 00:25:06,390 --> 00:25:08,260 but yeah, something like that. 556 00:25:08,260 --> 00:25:11,350 And then last-- pretty easy-- 557 00:25:11,350 --> 00:25:13,050 there's a nonce, 4 bytes. 558 00:25:13,050 --> 00:25:14,580 Anything you want goes in there. 559 00:25:14,580 --> 00:25:18,510 You can think of it as a uint32; 560 00:25:18,510 --> 00:25:20,850 there's no meaning to it.
561 00:25:20,850 --> 00:25:26,450 So does anyone see a problem with this nonce field? 562 00:25:26,450 --> 00:25:26,950 Yeah. 563 00:25:26,950 --> 00:25:27,992 AUDIENCE: It's too small. 564 00:25:27,992 --> 00:25:29,690 TADGE DRYJA: Yeah, it's too small. 565 00:25:29,690 --> 00:25:33,520 4 bytes-- even in the homework, people 566 00:25:33,520 --> 00:25:37,960 are using 12 something bytes for a nonce. 567 00:25:37,960 --> 00:25:40,990 With only 4 bytes of nonce, you can go through 2 568 00:25:40,990 --> 00:25:45,490 to the 32 possibilities, which is not 569 00:25:45,490 --> 00:25:47,560 enough to mine in almost all cases, 570 00:25:47,560 --> 00:25:50,290 because you're going to need to go through 2 to the 70 571 00:25:50,290 --> 00:25:53,500 possibilities to find a block. 572 00:25:53,500 --> 00:25:57,350 So what are some ideas for how do you deal with this problem? 573 00:25:57,350 --> 00:25:59,920 Like, it would be nice if it was just 8 bytes. 574 00:25:59,920 --> 00:26:01,090 That'd make things simpler. 575 00:26:01,090 --> 00:26:02,800 But the system is what it is. 576 00:26:02,800 --> 00:26:04,210 It's very hard to change. 577 00:26:04,210 --> 00:26:08,992 How, as a miner, would you work around this issue? 578 00:26:08,992 --> 00:26:10,950 AUDIENCE: Adjust the version and time. 579 00:26:10,950 --> 00:26:12,825 TADGE DRYJA: Yeah, so you can adjust version. 580 00:26:12,825 --> 00:26:16,000 So that may be why sometimes weird version numbers come up. 581 00:26:16,000 --> 00:26:17,875 Time is a good one too, since time-- 582 00:26:21,610 --> 00:26:24,350 so yeah, adjust time and also Merkle root. 583 00:26:24,350 --> 00:26:31,710 So time, if you're off by a few seconds, nobody cares. 584 00:26:31,710 --> 00:26:36,600 So use the low bits of this time field as part of your nonce. 
585 00:26:36,600 --> 00:26:39,000 It's kind of in the wrong place, but you can make chips 586 00:26:39,000 --> 00:26:40,290 to sort of fiddle-- 587 00:26:40,290 --> 00:26:43,350 twiddle these bits as well. 588 00:26:43,350 --> 00:26:47,340 What's also nice is that every second you can sort of-- 589 00:26:47,340 --> 00:26:48,600 you can do it the wrong way. 590 00:26:48,600 --> 00:26:50,058 And you can say, oh, I'm just going 591 00:26:50,058 --> 00:26:54,450 to take the least significant 4 bits of my time field 592 00:26:54,450 --> 00:26:56,740 and just use them as nonce space randomly. 593 00:26:56,740 --> 00:27:00,720 What's nice is that the actual time progresses by one 594 00:27:00,720 --> 00:27:02,970 every second. 595 00:27:02,970 --> 00:27:08,003 So as long as your chip has enough space-- 596 00:27:08,003 --> 00:27:09,670 so you're like, OK, I've got 2 to the 32 597 00:27:09,670 --> 00:27:14,300 here, another 4 bits here, so I'm at 2 to the 36. 598 00:27:14,300 --> 00:27:17,660 If your chip only goes through 2 to the 36 hashes 599 00:27:17,660 --> 00:27:20,660 every second, you're good because the actual time 600 00:27:20,660 --> 00:27:22,970 progresses. 601 00:27:22,970 --> 00:27:26,660 The other way you can do it is modify the Merkle root. 602 00:27:26,660 --> 00:27:32,210 And you can do that-- so can you think of ways to modify that 603 00:27:32,210 --> 00:27:34,682 without breaking things? 604 00:27:34,682 --> 00:27:36,450 AUDIENCE: Add or drop the transaction. 605 00:27:36,450 --> 00:27:37,590 TADGE DRYJA: Yeah, you could add or you 606 00:27:37,590 --> 00:27:38,580 could drop a transaction. 607 00:27:38,580 --> 00:27:40,497 So you say, OK, I have all these transactions. 608 00:27:40,497 --> 00:27:41,612 I'm going to drop one. 609 00:27:41,612 --> 00:27:43,320 That's got some disadvantages, because it 610 00:27:43,320 --> 00:27:45,147 may pay fees to the miner. 611 00:27:45,147 --> 00:27:46,355 AUDIENCE: Changing the order?
612 00:27:46,355 --> 00:27:47,855 TADGE DRYJA: Yep, you can swap them. 613 00:27:50,550 --> 00:27:52,840 You can just say, OK, well, these two are independent. 614 00:27:52,840 --> 00:27:54,030 I'm going to swap them. 615 00:27:54,030 --> 00:27:56,780 This will change, which will change that. 616 00:27:56,780 --> 00:27:59,970 So you can swap transactions around. 617 00:27:59,970 --> 00:28:03,120 You can also edit what's called the coinbase, which I think 618 00:28:03,120 --> 00:28:05,440 is in like one more slide. 619 00:28:05,440 --> 00:28:06,975 So yeah, so there's a bunch of ways 620 00:28:06,975 --> 00:28:08,100 that you can change things. 621 00:28:08,100 --> 00:28:09,800 And so this is really where you're going 622 00:28:09,800 --> 00:28:11,700 to have all the variation. 623 00:28:11,700 --> 00:28:15,140 You have 32 bytes here. 624 00:28:15,140 --> 00:28:16,803 And just even if it was just swapping, 625 00:28:16,803 --> 00:28:18,470 and if you have 1,000 transactions, swap 626 00:28:18,470 --> 00:28:19,860 in whatever order you want-- 627 00:28:19,860 --> 00:28:21,570 there's enough sort of entropy there 628 00:28:21,570 --> 00:28:23,338 that you'll be able to find it. 629 00:28:23,338 --> 00:28:25,380 So what's interesting is that the nonce is there. 630 00:28:25,380 --> 00:28:26,797 And it's important, because that's 631 00:28:26,797 --> 00:28:31,290 where sort of the high-speed mining occurs. 632 00:28:31,290 --> 00:28:35,610 But most mining chips will also have circuitry to modify this, 633 00:28:35,610 --> 00:28:38,280 because they're operating so quickly that they 634 00:28:38,280 --> 00:28:41,370 will exhaust the 4-byte nonce space 635 00:28:41,370 --> 00:28:43,390 in a fraction of a second. 636 00:28:43,390 --> 00:28:46,790 And so they'll have to swap two transactions, 637 00:28:46,790 --> 00:28:51,803 recalculate a Merkle root, which involves a few dozen hashes, 638 00:28:51,803 --> 00:28:52,720 and then go back here.
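As a rough illustration of why swapping two transactions gives the miner a fresh 32-byte Merkle root, here is a small sketch of a Bitcoin-style Merkle root over a list of transaction IDs, using double SHA-256 and duplicating the last entry when a level has an odd count. The transaction IDs are made up, and byte-order details of real block data are left out.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list) -> bytes:
    """Fold a list of 32-byte transaction IDs into one 32-byte root."""
    if not txids:
        raise ValueError("a block needs at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # odd count: pair the last with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Made-up transaction IDs, purely for illustration.
txids = [sha256d(bytes([i])) for i in range(4)]
root = merkle_root(txids)
swapped_root = merkle_root([txids[0], txids[2], txids[1], txids[3]])
print(root.hex())
print(swapped_root.hex())   # differs, so the header hashes differently too
```

With a single transaction the loop never runs, so the root is just that transaction's ID, matching the one-transaction block case mentioned earlier.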
639 00:28:52,720 --> 00:28:54,970 So it actually doesn't hurt their efficiency too much, 640 00:28:54,970 --> 00:28:58,980 because, OK, I just did 4 billion hash operations. 641 00:28:58,980 --> 00:29:01,110 And then I need to do a few dozen more 642 00:29:01,110 --> 00:29:03,540 to get to the next 4 billion. 643 00:29:03,540 --> 00:29:05,710 So it doesn't hurt things too much. 644 00:29:05,710 --> 00:29:09,350 Also, in Bitcoin-- a sort of weird quirk-- 645 00:29:09,350 --> 00:29:11,820 it's called SHA-256d. 646 00:29:11,820 --> 00:29:13,380 They do SHA-256. 647 00:29:13,380 --> 00:29:17,180 And then from the output, they do a SHA-256 again, 648 00:29:17,180 --> 00:29:17,810 not sure why. 649 00:29:17,810 --> 00:29:22,767 I think in one person's first problem set, 650 00:29:22,767 --> 00:29:24,600 they inadvertently were doing the same thing 651 00:29:24,600 --> 00:29:27,710 and was like, yeah, that works. 652 00:29:27,710 --> 00:29:30,572 Satoshi, whoever he or she was, or they, 653 00:29:30,572 --> 00:29:31,530 just put that in there. 654 00:29:31,530 --> 00:29:34,550 Any questions about this header in this format and anything 655 00:29:34,550 --> 00:29:35,780 about it? 656 00:29:35,780 --> 00:29:36,770 It's pretty compact. 657 00:29:36,770 --> 00:29:37,580 It's 80 bytes. 658 00:29:41,244 --> 00:29:43,800 AUDIENCE: [INAUDIBLE] including the Merkle root. 659 00:29:43,800 --> 00:29:45,838 So if you end up mining something, 660 00:29:45,838 --> 00:29:47,380 you don't have to put anything there. 661 00:29:47,380 --> 00:29:50,497 So the only incentive is the transaction fees? 662 00:29:50,497 --> 00:29:52,330 TADGE DRYJA: In the block reward, which I'll 663 00:29:52,330 --> 00:29:53,538 get to in a second, but yeah. 664 00:29:57,220 --> 00:29:58,390 That's a good question. 665 00:29:58,390 --> 00:30:00,710 Can you put nothing in as a Merkle? 666 00:30:00,710 --> 00:30:01,430 I don't think so. 667 00:30:01,430 --> 00:30:02,770 I'm pretty sure you need one transaction. 
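To make the 80-byte header and the double hash concrete, here is a small sketch that splits a header into its six fields and hashes it with SHA-256 twice. The header bytes are those of Bitcoin's first block, written out from its well-known field values; the helper names are invented for this example.

```python
import hashlib
import struct

# Header of the first Bitcoin block, field by field (all little-endian).
GENESIS_HEADER = bytes.fromhex(
    "01000000"                                                          # version
    + "00" * 32                                                         # previous block hash
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # Merkle root
    + "29ab5f49"                                                        # time
    + "ffff001d"                                                        # difficulty bits
    + "1dac2b7c"                                                        # nonce
)


def sha256d(data: bytes) -> bytes:
    """SHA-256, then SHA-256 again on the output."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def parse_header(header: bytes) -> dict:
    """Split an 80-byte header into its fields (sketch, no validation)."""
    (version,) = struct.unpack_from("<i", header, 0)
    time, bits, nonce = struct.unpack_from("<III", header, 68)
    return {
        "version": version,
        "prev_hash": header[4:36],
        "merkle_root": header[36:68],
        "time": time,
        "bits": bits,
        "nonce": nonce,
    }


fields = parse_header(GENESIS_HEADER)
# Block hashes are conventionally displayed byte-reversed, so the
# leading zeros from the proof of work show up at the front.
block_id = sha256d(GENESIS_HEADER)[::-1].hex()
print(len(GENESIS_HEADER), fields["nonce"], block_id)
```

A miner's inner loop is essentially this hash repeated with a different 4-byte nonce each time, which is why only the last field changes during the fast part of mining.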
668 00:30:02,770 --> 00:30:03,380 AUDIENCE: Oh, that's right. 669 00:30:03,380 --> 00:30:05,220 I mean, you need the base for the-- 670 00:30:05,220 --> 00:30:08,120 because even if just did it, you need a [INAUDIBLE] transaction. 671 00:30:08,120 --> 00:30:08,860 TADGE DRYJA: You can do that. 672 00:30:08,860 --> 00:30:09,860 And there's many blocks. 673 00:30:09,860 --> 00:30:12,730 So in the first year or so in 2009, 674 00:30:12,730 --> 00:30:15,880 almost all the blocks are empty and only have one transaction, 675 00:30:15,880 --> 00:30:17,080 because no one was using it. 676 00:30:17,080 --> 00:30:18,490 But people were mining. 677 00:30:18,490 --> 00:30:20,960 So it was very similar to the problem set, where 678 00:30:20,960 --> 00:30:22,960 everyone's just mining, and they're not actually 679 00:30:22,960 --> 00:30:23,668 using the system. 680 00:30:23,668 --> 00:30:25,570 AUDIENCE: But then right now-- 681 00:30:25,570 --> 00:30:26,340 TADGE DRYJA: Right now there's tons. 682 00:30:26,340 --> 00:30:27,700 AUDIENCE: People aren't doing that just because there's 683 00:30:27,700 --> 00:30:28,480 transaction fees. 684 00:30:28,480 --> 00:30:29,140 TADGE DRYJA: Right now-- 685 00:30:29,140 --> 00:30:29,950 AUDIENCE: It's their only incentive? 686 00:30:29,950 --> 00:30:31,090 People just decided-- 687 00:30:31,090 --> 00:30:33,100 TADGE DRYJA: No, you still get more bitcoins. 688 00:30:33,100 --> 00:30:34,330 So you still get a reward. 689 00:30:34,330 --> 00:30:40,040 The reason you'll see empty blocks now is a little tricky. 690 00:30:40,040 --> 00:30:42,440 We're not sure, because who knows? 691 00:30:42,440 --> 00:30:47,410 But it's probably because of blind mining, where you receive 692 00:30:47,410 --> 00:30:50,545 a block, and you haven't actually 693 00:30:50,545 --> 00:30:52,420 looked through the contents of the block yet. 694 00:30:52,420 --> 00:30:55,030 But you want to mine the next block.
695 00:30:55,030 --> 00:30:57,820 But you're not sure what transactions to put in. 696 00:30:57,820 --> 00:30:59,980 You see this 80-byte header, and you're like, oh, 697 00:30:59,980 --> 00:31:01,440 someone found a block. 698 00:31:01,440 --> 00:31:03,010 But you only have the header, and you 699 00:31:03,010 --> 00:31:04,135 want to build on top of it. 700 00:31:06,640 --> 00:31:08,890 You have a bunch of transactions you'd like to put in, 701 00:31:08,890 --> 00:31:12,173 but they may have already been put into the previous one. 702 00:31:12,173 --> 00:31:13,840 And so you're like, well, I have no idea 703 00:31:13,840 --> 00:31:14,965 what's in the previous one. 704 00:31:14,965 --> 00:31:17,260 I'm just going to mine a block with nothing in it, 705 00:31:17,260 --> 00:31:19,775 that way I'm sure that I'm not going to conflict 706 00:31:19,775 --> 00:31:20,650 with the previous one. 707 00:31:23,590 --> 00:31:26,740 You'll often see a block with only one transaction very soon 708 00:31:26,740 --> 00:31:28,510 after its predecessor block. 709 00:31:28,510 --> 00:31:32,053 AUDIENCE: So like every once in a while, it would check. 710 00:31:32,053 --> 00:31:34,470 So you're saying if it happened and went to a block really 711 00:31:34,470 --> 00:31:37,197 quickly, there's just more likely to be more transactions. 712 00:31:37,197 --> 00:31:39,280 TADGE DRYJA: Right, right, because a miner might-- 713 00:31:39,280 --> 00:31:41,570 it's actually an optimal strategy for a miner 714 00:31:41,570 --> 00:31:43,610 to say, look, first thing I'm going to do, download 715 00:31:43,610 --> 00:31:46,580 the 80-byte header. 716 00:31:46,580 --> 00:31:48,650 Figure out if it's got a valid proof of work. 717 00:31:48,650 --> 00:31:50,990 If it does, I'm going to just assume 718 00:31:50,990 --> 00:31:53,420 it's valid for the next few seconds, 719 00:31:53,420 --> 00:31:56,000 because it probably is.
720 00:31:56,000 --> 00:31:59,430 And then I'm going to try to mine a block on top of that. 721 00:31:59,430 --> 00:32:00,680 Reference this previous block. 722 00:32:00,680 --> 00:32:01,730 Mine a block on top. 723 00:32:01,730 --> 00:32:04,580 The thing is, I have no idea what's actually in the block. 724 00:32:04,580 --> 00:32:07,040 I have no idea what contributes to this Merkle root, 725 00:32:07,040 --> 00:32:09,860 because I haven't even downloaded that data yet. 726 00:32:09,860 --> 00:32:12,890 But I can build on top of it, just from the header. 727 00:32:12,890 --> 00:32:14,360 But I can't include transactions, 728 00:32:14,360 --> 00:32:16,610 because I have no idea what transactions are in here, 729 00:32:16,610 --> 00:32:18,110 so they might conflict. 730 00:32:18,110 --> 00:32:21,920 So I'll just mine sort of blind for a second or two. 731 00:32:21,920 --> 00:32:26,960 And then download all the transactions, validate it. 732 00:32:26,960 --> 00:32:28,850 And now I can include my transactions 733 00:32:28,850 --> 00:32:30,970 that haven't been included. 734 00:32:30,970 --> 00:32:33,230 And so that happens. 735 00:32:33,230 --> 00:32:35,330 It can be an OK strategy. 736 00:32:35,330 --> 00:32:38,870 Sometimes it can lead to you mining an invalid block. 737 00:32:38,870 --> 00:32:42,180 If someone produces an invalid block, and you just see, oh, 738 00:32:42,180 --> 00:32:45,380 well, it's got proof of work, you grab that header, start 739 00:32:45,380 --> 00:32:47,960 mining on top of it. 740 00:32:47,960 --> 00:32:49,480 No matter what you mine, it's going 741 00:32:49,480 --> 00:32:52,460 to be invalid, because it's pointing to an invalid block. 742 00:32:52,460 --> 00:32:56,130 That happened 2015 or 2016. 743 00:32:56,130 --> 00:32:58,187 The summer of 2015 it happened. 744 00:32:58,187 --> 00:32:59,520 And it was like quite extensive. 
745 00:32:59,520 --> 00:33:00,930 It was like seven or eight blocks 746 00:33:00,930 --> 00:33:03,870 in a row that were all invalid, because none of the miners 747 00:33:03,870 --> 00:33:06,180 were actually verifying anything. 748 00:33:06,180 --> 00:33:08,670 They were just downloading the headers from each other 749 00:33:08,670 --> 00:33:12,720 and being like, yeah, I mean, you did the work. 750 00:33:12,720 --> 00:33:14,910 So they were just assuming everyone else 751 00:33:14,910 --> 00:33:17,360 was verifying the Merkle roots. 752 00:33:17,360 --> 00:33:20,170 And yeah, so then it ended-- 753 00:33:20,170 --> 00:33:25,110 so they lost 25 times 8, however many bitcoins. 754 00:33:25,110 --> 00:33:28,890 They lost hundreds of bitcoins, which at the time was still 755 00:33:28,890 --> 00:33:32,820 worth quite a bit-- and now it's worth millions of dollars-- 756 00:33:32,820 --> 00:33:35,580 just because they weren't actually checking things. 757 00:33:35,580 --> 00:33:37,840 AUDIENCE: If I mine a block and it's empty, 758 00:33:37,840 --> 00:33:41,070 do I decrease my chances of being mined afterwards? 759 00:33:41,070 --> 00:33:44,160 TADGE DRYJA: No, actually, I would say you'd, sort 760 00:33:44,160 --> 00:33:49,690 of game theoretically, you would increase it, because you're not 761 00:33:49,690 --> 00:33:51,680 depleting the mempool. 762 00:33:51,680 --> 00:33:53,410 I need to talk about the actual mining 763 00:33:53,410 --> 00:33:54,860 coinbase and mempool and stuff. 764 00:33:54,860 --> 00:33:56,530 But yeah, it's a tricky question. 765 00:33:56,530 --> 00:33:57,940 And I'll try to get back to it. 766 00:33:57,940 --> 00:34:00,050 If I don't, bug me again. 767 00:34:00,050 --> 00:34:01,840 TX-- so in this Merkle root, you've 768 00:34:01,840 --> 00:34:03,160 got all these transactions. 769 00:34:03,160 --> 00:34:05,230 They have a specified order. 770 00:34:05,230 --> 00:34:08,050 Transaction 0 is the coinbase transaction. 
771 00:34:08,050 --> 00:34:09,219 And it's special. 772 00:34:09,219 --> 00:34:11,965 It generates new coins, and it takes fees 773 00:34:11,965 --> 00:34:13,840 from all the other transactions in the block. 774 00:34:13,840 --> 00:34:16,580 I think Neha mentioned this yesterday, 775 00:34:16,580 --> 00:34:20,139 where if you have a difference between the input amounts 776 00:34:20,139 --> 00:34:24,230 and output amounts, that's implicitly a fee. 777 00:34:24,230 --> 00:34:26,090 So if your input-- 778 00:34:26,090 --> 00:34:29,380 so here, Neha's thing, you're spending 20 coins. 779 00:34:29,380 --> 00:34:31,389 You've got 5, 10, 4. 780 00:34:31,389 --> 00:34:34,060 Well, there's only 19 coins in the output, 781 00:34:34,060 --> 00:34:38,469 so there's an implicit fee of 1 coin. 782 00:34:38,469 --> 00:34:42,820 That one coin can go to transaction 0. 783 00:34:42,820 --> 00:34:47,150 So transaction 0 has essentially no input. 784 00:34:47,150 --> 00:34:50,360 Transaction 0's input field is just empty, 785 00:34:50,360 --> 00:34:53,030 anything you want to put, any bytes you want to put in there. 786 00:34:53,030 --> 00:34:55,489 Its output field generates new coins. 787 00:34:55,489 --> 00:34:57,590 So that's currently 12 and 1/2 coins. 788 00:34:57,590 --> 00:35:01,130 So currently, if you mine a block with only TX0, 789 00:35:01,130 --> 00:35:02,540 you get 12 and 1/2 coins.
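The fee arithmetic from the example above (20 coins in, 5 + 10 + 4 out) can be written out as a short sketch. The dictionary layout and function names are invented for illustration, and amounts are kept as integers in units of 1/100,000,000 of a coin to avoid floating-point rounding.

```python
COIN = 100_000_000            # integer units per coin
SUBSIDY = 125 * COIN // 10    # the 12.5-coin new-coin amount quoted in the lecture


def fee(tx: dict) -> int:
    """Implicit fee: whatever the inputs hold that the outputs do not claim."""
    return sum(tx["inputs"]) - sum(tx["outputs"])


def max_coinbase_output(subsidy: int, other_txs: list) -> int:
    """Most that transaction 0 may pay out: new coins plus all fees."""
    return subsidy + sum(fee(tx) for tx in other_txs)


# The example from the lecture: 20 coins in; 5, 10, and 4 coins out.
example_tx = {"inputs": [20 * COIN], "outputs": [5 * COIN, 10 * COIN, 4 * COIN]}

print(fee(example_tx) / COIN)                              # implicit fee in coins
print(max_coinbase_output(SUBSIDY, []) / COIN)             # block with only TX0
print(max_coinbase_output(SUBSIDY, [example_tx]) / COIN)   # TX0 plus the example
```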
790 00:35:02,540 --> 00:35:04,820 If you mine a block with thousands of transactions, 791 00:35:04,820 --> 00:35:08,092 you get the 12 and 1/2 coins plus the difference 792 00:35:08,092 --> 00:35:10,550 between the input and outputs of all the other transactions 793 00:35:10,550 --> 00:35:13,490 in the block, which can be even more than 12 794 00:35:13,490 --> 00:35:17,660 and-- which can recently, like in January or December, 795 00:35:17,660 --> 00:35:21,240 there were quite a few blocks where they're getting 25, 26, 796 00:35:21,240 --> 00:35:24,830 27 coins, because the total fees for the entire block 797 00:35:24,830 --> 00:35:29,510 were more than 12 and 1/2 coins, which is hundreds of thousands 798 00:35:29,510 --> 00:35:30,230 of dollars now. 799 00:35:30,230 --> 00:35:32,540 So it's kind of cool. 800 00:35:32,540 --> 00:35:34,130 The fees have since decreased. 801 00:35:34,130 --> 00:35:35,780 Fees are highly variable. 802 00:35:35,780 --> 00:35:38,930 And we'll talk about fees in a few more lectures. 803 00:35:38,930 --> 00:35:42,800 And it's kind of a mess, but it's an evolving area 804 00:35:42,800 --> 00:35:46,400 in this whole network thing. 805 00:35:46,400 --> 00:35:48,410 So you've got your coinbase transaction. 806 00:35:48,410 --> 00:35:49,230 That's important. 807 00:35:49,230 --> 00:35:52,550 That's why people are doing this stuff, because they want money. 808 00:35:52,550 --> 00:35:55,310 All the other transactions can be shuffled around. 809 00:35:55,310 --> 00:35:58,430 However, they can only spend outputs 810 00:35:58,430 --> 00:36:00,350 from previous transactions. 811 00:36:00,350 --> 00:36:02,540 And previous means they have an index 812 00:36:02,540 --> 00:36:04,610 within the block that's lower. 813 00:36:04,610 --> 00:36:07,040 So for example, you have transaction 814 00:36:07,040 --> 00:36:11,720 B spends an output of transaction A. Transaction A 815 00:36:11,720 --> 00:36:14,840 must come first in the block ordering. 
816 00:36:14,840 --> 00:36:17,540 This makes it so that you can go through in a linear fashion 817 00:36:17,540 --> 00:36:20,180 and validate every transaction in order. 818 00:36:20,180 --> 00:36:23,153 Otherwise, you'd go through, see, OK, transaction 0, I 819 00:36:23,153 --> 00:36:24,320 don't have to validate that. 820 00:36:24,320 --> 00:36:27,228 That's coinbase-- transaction 1, transaction 2. 821 00:36:27,228 --> 00:36:28,520 And then you see transaction 3. 822 00:36:28,520 --> 00:36:30,440 It appears to be spending something 823 00:36:30,440 --> 00:36:32,640 you've never heard of. 824 00:36:32,640 --> 00:36:34,140 So that would at first-- 825 00:36:34,140 --> 00:36:35,778 that would appear to be invalid. 826 00:36:35,778 --> 00:36:38,070 And then maybe you go through another few transactions, 827 00:36:38,070 --> 00:36:42,510 say, oh, this creates the output that this thing I just 828 00:36:42,510 --> 00:36:44,270 saw before spends. 829 00:36:44,270 --> 00:36:45,630 It's sort of out of order. 830 00:36:45,630 --> 00:36:47,820 It makes things very complicated to validate. 831 00:36:47,820 --> 00:36:51,720 And so this rule ensures that if you go through and just check 832 00:36:51,720 --> 00:36:55,360 every transaction in order, it'll all make sense. 833 00:36:55,360 --> 00:36:55,860 Yes. 834 00:36:55,860 --> 00:36:58,547 AUDIENCE: Is there any benefit moving earlier or later? 835 00:36:58,547 --> 00:36:59,630 TADGE DRYJA: In the block? 836 00:36:59,630 --> 00:37:01,160 AUDIENCE: Yeah. 837 00:37:01,160 --> 00:37:02,640 TADGE DRYJA: No, I don't think so. 838 00:37:02,640 --> 00:37:06,430 I mean, I can't think of one. 839 00:37:06,430 --> 00:37:08,790 Yeah, it's just sort of random. 840 00:37:08,790 --> 00:37:11,010 A lot of times they'll organize it by fee rate. 841 00:37:11,010 --> 00:37:13,350 Or by default, they'll just organize it 842 00:37:13,350 --> 00:37:15,210 by when they saw them first. 843 00:37:15,210 --> 00:37:16,745 So it's pretty arbitrary. 
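The ordering rule lets a validator make a single pass over the block, as in the following sketch. The data layout (a `txid`, a list of spent outpoints, and an output count per transaction) and the function name are assumptions made for this example; signature checks and the special rules for spending coinbase outputs are left out.

```python
def check_block_order(txs: list, utxo_set: set) -> bool:
    """Single pass: every input must spend an output that already exists."""
    available = set(utxo_set)                 # outputs confirmed in earlier blocks
    for index, tx in enumerate(txs):
        if index > 0:                         # transaction 0 is the coinbase, no real inputs
            for outpoint in tx["spends"]:
                if outpoint not in available:
                    return False              # unknown, out of order, or already spent
                available.remove(outpoint)
        # This transaction's outputs become spendable by later ones in the block.
        for n in range(tx["n_outputs"]):
            available.add((tx["txid"], n))
    return True


coinbase = {"txid": "cb", "spends": [], "n_outputs": 1}
tx_a = {"txid": "A", "spends": [("old", 0)], "n_outputs": 2}
tx_b = {"txid": "B", "spends": [("A", 0)], "n_outputs": 1}
confirmed = {("old", 0)}

print(check_block_order([coinbase, tx_a, tx_b], confirmed))   # A before B
print(check_block_order([coinbase, tx_b, tx_a], confirmed))   # B before A
```

In the second call, transaction B appears to spend something the validator has never heard of, which is exactly the confusing case the ordering rule is meant to rule out.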
844 00:37:16,745 --> 00:37:18,120 AUDIENCE: Does that mean that you 845 00:37:18,120 --> 00:37:19,825 have to wait until the transaction 846 00:37:19,825 --> 00:37:21,855 that made the output has been mined, 847 00:37:21,855 --> 00:37:23,210 in order to spend that again? 848 00:37:23,210 --> 00:37:26,190 TADGE DRYJA: No, although if they did, 849 00:37:26,190 --> 00:37:28,800 that would make the software a lot simpler and easier 850 00:37:28,800 --> 00:37:29,920 to deal with. 851 00:37:29,920 --> 00:37:31,850 But that is not how it works. 852 00:37:36,920 --> 00:37:43,400 So for example-- I'll draw it-- you can have a block where 853 00:37:43,400 --> 00:37:45,350 there's-- let's do it this way. 854 00:37:45,350 --> 00:37:46,550 So you have TX0. 855 00:37:46,550 --> 00:37:51,230 There's the coinbase transaction, transaction 1, transaction 2, 856 00:37:51,230 --> 00:37:53,060 transaction 3. 857 00:37:53,060 --> 00:37:56,570 And transaction 3 may be spending something that 858 00:37:56,570 --> 00:37:59,270 was generated in transaction 1. 859 00:37:59,270 --> 00:38:00,590 That can happen. 860 00:38:00,590 --> 00:38:05,300 So if you make transaction 1, broadcast it, it's unconfirmed. 861 00:38:05,300 --> 00:38:07,310 You then make transaction 3, broadcast it; 862 00:38:07,310 --> 00:38:09,150 it spends transaction 1. 863 00:38:09,150 --> 00:38:10,850 The miner can put those in the blocks. 864 00:38:10,850 --> 00:38:12,380 They must put it in order. 865 00:38:12,380 --> 00:38:16,310 So if this happens, you can't switch them in order. 866 00:38:16,310 --> 00:38:18,020 But that's considered OK. 867 00:38:18,020 --> 00:38:21,680 It makes parallel-- it makes multi-core validation more 868 00:38:21,680 --> 00:38:23,255 annoying.
869 00:38:23,255 --> 00:38:26,390 If you said, no, you can only use outputs that have already 870 00:38:26,390 --> 00:38:29,840 been confirmed, then the block validation 871 00:38:29,840 --> 00:38:32,240 becomes embarrassingly parallel, because you can just 872 00:38:32,240 --> 00:38:34,955 validate every transaction independently. 873 00:38:34,955 --> 00:38:36,080 That would be kind of nice. 874 00:38:38,756 --> 00:38:40,400 There's other interesting reasons 875 00:38:40,400 --> 00:38:42,595 why this is also useful. 876 00:38:42,595 --> 00:38:43,970 I mean, if I were designing it, I 877 00:38:43,970 --> 00:38:45,680 would say you have to confirm, because it just 878 00:38:45,680 --> 00:38:46,555 makes things simpler. 879 00:38:46,555 --> 00:38:48,730 But I was not Satoshi. 880 00:38:48,730 --> 00:38:50,560 So yeah, the order is fairly arbitrary. 881 00:38:50,560 --> 00:38:53,060 Any other questions about block ordering, Merkle root stuff? 882 00:38:53,060 --> 00:38:55,102 And then we're going to have a quick intermission 883 00:38:55,102 --> 00:38:56,234 right at the halfway point. 884 00:38:58,980 --> 00:39:04,400 Sounds good, so 256-second break. 885 00:39:04,400 --> 00:39:07,070 So now I'll talk about the synchronization process. 886 00:39:07,070 --> 00:39:10,220 How does this actually work in the software 887 00:39:10,220 --> 00:39:12,140 when you download Bitcoin? 888 00:39:12,140 --> 00:39:14,300 So first, you download Bitcoin. 889 00:39:14,300 --> 00:39:16,460 You go to bitcoin.org. 890 00:39:16,460 --> 00:39:19,080 Or your friend hands you a USB drive and says, 891 00:39:19,080 --> 00:39:23,030 hey, I got some good stuff, man, this new thing called Bitcoin. 892 00:39:23,030 --> 00:39:27,110 And so you've got the Bitcoin EXE file or DMG file, 893 00:39:27,110 --> 00:39:28,702 or the binary, or the code. 894 00:39:28,702 --> 00:39:31,160 And you want to know what's been going on for the last nine 895 00:39:31,160 --> 00:39:32,510 years?
896 00:39:32,510 --> 00:39:34,010 So first, you download the binary, 897 00:39:34,010 --> 00:39:35,390 or you compile the code. 898 00:39:35,390 --> 00:39:39,757 And you verify all the GPG signatures of this code, 899 00:39:39,757 --> 00:39:41,090 if you want to do this securely. 900 00:39:41,090 --> 00:39:47,120 So I'm sure everyone has their PGP keys on the MIT PGP server 901 00:39:47,120 --> 00:39:51,770 and goes to key signing parties held on the weekends, right? 902 00:39:51,770 --> 00:39:54,047 Yeah, no? 903 00:39:54,047 --> 00:39:55,130 AUDIENCE: Keybase is cool. 905 00:39:55,520 --> 00:39:57,270 TADGE DRYJA: Keybase is also useful, yeah. 906 00:39:57,270 --> 00:40:00,950 So I have my PGP key hash on my business card. 907 00:40:00,950 --> 00:40:04,460 I don't think anyone's actually ever used it. 908 00:40:04,460 --> 00:40:06,920 But the Bitcoin nerds actually do do this. 909 00:40:06,920 --> 00:40:12,680 And they're very sort of annoying about it, 910 00:40:12,680 --> 00:40:15,740 because a really good attack vector 911 00:40:15,740 --> 00:40:19,430 is to get someone to download compromised Bitcoin code. 912 00:40:19,430 --> 00:40:21,290 It's like the best attack vector ever 913 00:40:21,290 --> 00:40:25,130 if you're trying to do something sneaky, mainly just 914 00:40:25,130 --> 00:40:26,240 to steal all the money. 915 00:40:26,240 --> 00:40:28,790 If you get them to download a Bitcoin binary that you 916 00:40:28,790 --> 00:40:32,840 control, that you put some backdoor code in, 917 00:40:32,840 --> 00:40:34,170 the code can be like two lines. 918 00:40:34,170 --> 00:40:38,240 It's like, open a TCP connection to my computer. 919 00:40:38,240 --> 00:40:39,800 Send me all the private keys. 920 00:40:39,800 --> 00:40:40,790 We're good.
921 00:40:40,790 --> 00:40:43,790 Or if you want to be more sophisticated, 922 00:40:43,790 --> 00:40:46,610 every time they click Send and type in their password, 923 00:40:46,610 --> 00:40:50,060 I just change all the addresses and all the outputs to me, 924 00:40:50,060 --> 00:40:51,910 and like, every time they try to send money. 925 00:40:51,910 --> 00:40:55,080 The UI sort of shows that they're sending money but actually it just 926 00:40:55,080 --> 00:40:55,580 sends to me. 927 00:40:55,580 --> 00:40:57,372 And they won't find out for a little while. 928 00:40:57,372 --> 00:40:59,600 There's a lot of things where you 929 00:40:59,600 --> 00:41:02,930 want to be running the right Bitcoin code. 930 00:41:02,930 --> 00:41:04,700 And that's a hard problem. 931 00:41:04,700 --> 00:41:07,050 Because we're sort of operating in this trustless, 932 00:41:07,050 --> 00:41:09,590 decentralized network, how do you 933 00:41:09,590 --> 00:41:11,968 get into this in the beginning? 934 00:41:11,968 --> 00:41:13,760 If it's your friend saying, hey, here's 935 00:41:13,760 --> 00:41:14,760 the Bitcoin I'm running. 936 00:41:14,760 --> 00:41:15,620 I know this is good. 937 00:41:15,620 --> 00:41:16,407 Then it works. 938 00:41:16,407 --> 00:41:18,740 But just a website-- what if someone hacks the website-- 939 00:41:18,740 --> 00:41:19,670 things like that? 940 00:41:19,670 --> 00:41:21,020 It's like a huge rabbit hole. 941 00:41:21,020 --> 00:41:25,040 And you can try to worry about it for years. 942 00:41:25,040 --> 00:41:26,540 But anyway, you download the binary, 943 00:41:26,540 --> 00:41:28,580 assume you've somehow gotten the binary, 944 00:41:28,580 --> 00:41:31,280 and you're pretty sure it's the right software. 945 00:41:31,280 --> 00:41:32,990 So how do you connect to this network? 946 00:41:32,990 --> 00:41:36,230 Well, there are these hardcoded DNS seeds in order 947 00:41:36,230 --> 00:41:37,670 to find peers in the beginning.
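As a rough illustration of DNS seeding, the sketch below does an ordinary hostname lookup against a list of seed hostnames and collects the returned IP addresses as candidate peers. The function name `find_peers`, its parameters, and the single example hostname are assumptions made for this sketch; a real client ships its own hardcoded seed list and does far more vetting of what comes back.

```python
import socket

# Example seed hostname (an assumption for illustration; a real client
# carries its own hardcoded list of several seeds).
EXAMPLE_SEEDS = ["seed.bitcoin.sipa.be"]


def find_peers(seeds, port=8333, resolve=socket.getaddrinfo):
    """Resolve each seed hostname and return the sorted set of IP addresses.

    `resolve` is injectable so the lookup can be replaced in tests; by
    default it is the standard library's socket.getaddrinfo.
    """
    peers = set()
    for host in seeds:
        try:
            results = resolve(host, port, proto=socket.IPPROTO_TCP)
        except OSError:
            # Seed unreachable or name not found: just try the next one.
            continue
        for _family, _type, _proto, _canonname, sockaddr in results:
            peers.add(sockaddr[0])   # sockaddr[0] is the IP address string
    return sorted(peers)
```

Calling `find_peers(EXAMPLE_SEEDS)` needs network access; the addresses returned are only candidates, and nothing they later send is trusted without validation.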
948 00:41:37,670 --> 00:41:40,640 If you know how DNS works, it's how you look up IP addresses 949 00:41:40,640 --> 00:41:41,900 based on a hostname. 950 00:41:41,900 --> 00:41:44,990 There are some servers that will return multiple different IP 951 00:41:44,990 --> 00:41:49,270 addresses every time you query them. 952 00:41:49,270 --> 00:41:54,890 And those are IP addresses of currently running 953 00:41:54,890 --> 00:41:56,042 Bitcoin nodes. 954 00:41:56,042 --> 00:41:58,250 So the idea is, OK, someone's running a Bitcoin node. 955 00:41:58,250 --> 00:42:00,020 They've got their DNS server. 956 00:42:00,020 --> 00:42:02,570 You query that DNS server, and it will hand you 957 00:42:02,570 --> 00:42:05,720 out some IP addresses. 958 00:42:05,720 --> 00:42:08,540 This is also sort of centralized, 959 00:42:08,540 --> 00:42:12,500 slash trusted, slash whatever, in that if someone compromises 960 00:42:12,500 --> 00:42:15,373 these four or five DNS servers, you 961 00:42:15,373 --> 00:42:17,540 might not be able to connect to the Bitcoin network. 962 00:42:17,540 --> 00:42:20,180 So in practice, it's not completely mathematically 963 00:42:20,180 --> 00:42:21,382 secure and trustless. 964 00:42:21,382 --> 00:42:22,840 There's all these real-world issues 965 00:42:22,840 --> 00:42:25,280 that's like, how do I know I've got the right software? 966 00:42:25,280 --> 00:42:28,080 How do I know I'm connecting to the actual Bitcoin network? 967 00:42:28,080 --> 00:42:30,230 What if my ISP is blocking me and sending me 968 00:42:30,230 --> 00:42:33,540 to some other network, or things like that? 969 00:42:33,540 --> 00:42:36,350 So in practice, it sort of works OK right now. 970 00:42:36,350 --> 00:42:37,910 You connect to the DNS seeds. 971 00:42:37,910 --> 00:42:40,310 And then you connect to a Bitcoin node, 972 00:42:40,310 --> 00:42:41,420 and you ask for headers. 973 00:42:41,420 --> 00:42:45,140 You say, hey, I just showed up.
974 00:42:45,140 --> 00:42:46,490 I know about one header. 975 00:42:46,490 --> 00:42:49,700 There's a hardcoded header in the code called the genesis 976 00:42:49,700 --> 00:42:52,520 block that Satoshi did. 977 00:42:52,520 --> 00:42:54,530 And you say, hey, I've got this genesis block. 978 00:42:54,530 --> 00:42:58,130 Do you know anything that builds above this genesis block, that 979 00:42:58,130 --> 00:42:59,480 comes after? 980 00:42:59,480 --> 00:43:02,588 And they say, yes, I actually know 500,000 headers 981 00:43:02,588 --> 00:43:03,380 that come after it. 982 00:43:03,380 --> 00:43:05,420 And they'll start sending them to you. 983 00:43:05,420 --> 00:43:07,500 They send them to you a couple of thousand 984 00:43:07,500 --> 00:43:09,710 headers at a time. 985 00:43:09,710 --> 00:43:14,270 And then you start to download all those and verify them. 986 00:43:14,270 --> 00:43:16,970 The header chain, you get it first. 987 00:43:16,970 --> 00:43:18,500 And it's actually very quick. 988 00:43:18,500 --> 00:43:20,958 You can do it in under a minute if you have a good internet 989 00:43:20,958 --> 00:43:21,590 connection. 990 00:43:21,590 --> 00:43:23,990 And you verify all the work before you do anything else. 991 00:43:23,990 --> 00:43:27,200 So this is nice in that the attacker, in order 992 00:43:27,200 --> 00:43:30,650 to sort of make you do more work here, 993 00:43:30,650 --> 00:43:33,930 would have to do a lot of proof of work. 994 00:43:33,930 --> 00:43:36,890 But for you, it's very quick to verify everything. 995 00:43:36,890 --> 00:43:40,435 Even half a million headers, 30 seconds 996 00:43:40,435 --> 00:43:42,560 if you've got a good internet connection, something 997 00:43:42,560 --> 00:43:46,070 like that, because all you're doing is one hash per header.
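The "one hash per header" check can be sketched roughly as follows, assuming headers arrive as 80-byte serialized blobs in the standard layout (version, previous block hash, Merkle root, time, bits, nonce). The function names are hypothetical, and the sketch deliberately leaves out the timestamp rules and the check that the bits field itself follows the difficulty adjustment schedule.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's block hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits: int) -> int:
    """Expand the compact 4-byte 'bits' field into the full target integer."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def verify_header_chain(headers, prev_hash: bytes) -> bytes:
    """Check linkage and proof of work for headers given oldest first.

    prev_hash is the hash of the header the first one must build on.
    Returns the hash of the last header, or raises ValueError.
    """
    for raw in headers:
        if len(raw) != 80:
            raise ValueError("header must be exactly 80 bytes")
        if raw[4:36] != prev_hash:
            raise ValueError("header does not build on the previous one")
        bits = int.from_bytes(raw[72:76], "little")
        header_hash = double_sha256(raw)
        # The hash, read as a little-endian integer, must not exceed the target.
        if int.from_bytes(header_hash, "little") > bits_to_target(bits):
            raise ValueError("insufficient proof of work")
        prev_hash = header_hash
    return prev_hash
```

Each header costs the verifier one double hash, while producing a header that passes costs the sender real proof of work, which is the asymmetry described above.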
998 00:43:46,070 --> 00:43:48,710 You just download the header, check the bits, 999 00:43:48,710 --> 00:43:50,360 check the time, make sure the times are 1000 00:43:50,360 --> 00:43:53,030 like progressing reasonably. 1001 00:43:53,030 --> 00:43:55,270 If the times keep going backwards for like, 1002 00:43:55,270 --> 00:43:59,295 I think it's 10 blocks, then you consider it invalid. 1003 00:43:59,295 --> 00:44:00,920 But your computer can actually do this. 1004 00:44:00,920 --> 00:44:02,660 It's 500,000 hash functions. 1005 00:44:02,660 --> 00:44:05,120 And I'm sure if you've seen from the problem set, 1006 00:44:05,120 --> 00:44:08,310 you can do that in a few seconds in many cases. 1007 00:44:08,310 --> 00:44:11,300 So you can verify the work done throughout the entirety 1008 00:44:11,300 --> 00:44:15,520 of Bitcoin's existence pretty quickly. 1009 00:44:15,520 --> 00:44:17,920 So then you've got 500,000 headers. 1010 00:44:17,920 --> 00:44:22,110 And now you need to actually download the blocks. 1011 00:44:22,110 --> 00:44:26,450 Any questions about header synchronization? 1012 00:44:26,450 --> 00:44:28,142 Seems pretty straight-- oh, yeah. 1013 00:44:28,142 --> 00:44:29,850 AUDIENCE: Can you cache any of that work, 1014 00:44:29,850 --> 00:44:33,800 since you're going to see some of these every time you sync? 1015 00:44:33,800 --> 00:44:35,820 TADGE DRYJA: Well, yeah, you save it to disk. 1016 00:44:35,820 --> 00:44:38,660 So you don't have to, like, if you shut your computer off, 1017 00:44:38,660 --> 00:44:40,430 turn it on the next time, you've already 1018 00:44:40,430 --> 00:44:41,680 got all those headers on disk. 1019 00:44:41,680 --> 00:44:45,290 Basically, you save them to disk once you've verified them. 1020 00:44:45,290 --> 00:44:46,710 So you download a couple thousand.
1021 00:44:46,710 --> 00:44:48,377 It builds linearly, so it's nice for you 1022 00:44:48,377 --> 00:44:51,043 to like download them, validate, and as you validate, write them 1023 00:44:51,043 --> 00:44:51,560 to disk. 1024 00:44:51,560 --> 00:44:53,810 And then when you start back up, they're on disk. 1025 00:44:53,810 --> 00:44:54,890 You trust your own disk. 1026 00:44:54,890 --> 00:44:56,660 If someone goes in and modifies things 1027 00:44:56,660 --> 00:45:01,940 on disk between runs of Bitcoin, all bets are off. 1028 00:45:01,940 --> 00:45:04,250 So you sort of implicitly trust that. 1029 00:45:04,250 --> 00:45:06,992 So yeah, that's pretty quick, works well. 1030 00:45:06,992 --> 00:45:08,450 Then you get to the real hard part, 1031 00:45:08,450 --> 00:45:10,640 where you now have to validate all these signatures 1032 00:45:10,640 --> 00:45:12,567 and download all these transactions. 1033 00:45:12,567 --> 00:45:13,400 Any other questions? 1034 00:45:13,400 --> 00:45:15,500 Good? 1035 00:45:15,500 --> 00:45:19,152 So then it's called IBD, initial block download. 1036 00:45:19,152 --> 00:45:20,360 So you get the headers first. 1037 00:45:20,360 --> 00:45:21,290 That's quick. 1038 00:45:21,290 --> 00:45:24,350 Now you start asking your peers, hey, here's 1039 00:45:24,350 --> 00:45:29,510 this header from 2009, block height 1. 1040 00:45:29,510 --> 00:45:30,260 Here's the header. 1041 00:45:30,260 --> 00:45:33,260 Can you give me the full block? 1042 00:45:33,260 --> 00:45:34,250 I have the header. 1043 00:45:34,250 --> 00:45:37,450 What are all the things that go into the Merkle root? 1044 00:45:37,450 --> 00:45:39,950 So you request blocks from peers. 1045 00:45:39,950 --> 00:45:43,010 You match the transaction list to the Merkle root in the header. 1046 00:45:43,010 --> 00:45:44,900 And you process each transaction in order. 1047 00:45:44,900 --> 00:45:46,490 So download it. 1048 00:45:46,490 --> 00:45:48,210 Say, OK, here's all these transactions.
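Matching a downloaded transaction list against the Merkle root in a header can be sketched as below. It assumes the transaction IDs are already available as 32-byte hashes in the same byte order the header uses; `merkle_root` is a hypothetical helper name, and the last hash of an odd-length level is duplicated, as Bitcoin's tree does.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin transaction and block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids):
    """Compute the Merkle root of a non-empty list of 32-byte transaction hashes."""
    if not txids:
        raise ValueError("a block has at least the coinbase transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])      # duplicate the last hash on odd levels
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

A node would compare the result with the Merkle root field of the header it already verified and reject the block on any mismatch.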
1049 00:45:48,210 --> 00:45:50,180 Let me take the hash of all of them. 1050 00:45:50,180 --> 00:45:51,860 Compute the Merkle root. 1051 00:45:51,860 --> 00:45:53,840 Make sure it matches the Merkle root 1052 00:45:53,840 --> 00:45:56,540 I see in the header I've already gotten. 1053 00:45:56,540 --> 00:45:59,340 And now process each transaction. 1054 00:45:59,340 --> 00:46:02,950 So what do we do to process transactions? 1055 00:46:02,950 --> 00:46:06,250 So you've got this UTXO DB. 1056 00:46:06,250 --> 00:46:10,220 So this is unspent transaction output. 1057 00:46:10,220 --> 00:46:11,770 So all the cool-- 1058 00:46:11,770 --> 00:46:14,980 I'm sure in like 2030, there will be a new slang term 1059 00:46:14,980 --> 00:46:17,120 where we'll just call money UTXOs, 1060 00:46:17,120 --> 00:46:19,188 like, hey I've got a lot of UTXO. 1061 00:46:19,188 --> 00:46:20,480 I mean, I'm already doing that. 1062 00:46:20,480 --> 00:46:22,420 And I'm pretty ahead of the times, so. 1063 00:46:25,020 --> 00:46:26,770 So you've got this database, which 1064 00:46:26,770 --> 00:46:29,410 is basically a key-value store. 1065 00:46:29,410 --> 00:46:33,400 And it just has transaction ID index-- 1066 00:46:33,400 --> 00:46:38,050 so this is sort of how you reference inputs 1067 00:46:38,050 --> 00:46:41,380 in Bitcoin, the transaction ID index as the key. 1068 00:46:41,380 --> 00:46:44,800 And then the value is just the output, 1069 00:46:44,800 --> 00:46:50,000 the output script and 8-byte amount. 1070 00:46:50,000 --> 00:46:51,077 So it's pretty compact. 1071 00:46:51,077 --> 00:46:52,410 You've got all these key values. 1072 00:46:52,410 --> 00:46:54,570 And it's using LevelDB. 1073 00:46:54,570 --> 00:46:57,290 But you could use some other key-value store database. 1074 00:46:57,290 --> 00:47:00,470 And the idea is, OK, every time you get a transaction, 1075 00:47:00,470 --> 00:47:01,827 validate all the inputs. 1076 00:47:01,827 --> 00:47:03,410 Make sure all the signatures are good.
1077 00:47:03,410 --> 00:47:05,540 Make sure it's spending things that actually 1078 00:47:05,540 --> 00:47:07,250 exist in your UTXO set. 1079 00:47:10,130 --> 00:47:13,870 And delete those inputs from your UTXO DB. 1080 00:47:13,870 --> 00:47:15,800 You say, OK, this transaction is spending 1081 00:47:15,800 --> 00:47:17,523 these inputs, so delete. 1082 00:47:17,523 --> 00:47:18,440 So [INAUDIBLE], sorry. 1083 00:47:18,440 --> 00:47:20,900 First, make sure the transaction's valid, 1084 00:47:20,900 --> 00:47:22,850 given your current UTXO DB. 1085 00:47:22,850 --> 00:47:25,820 So validate that all these inputs exist. 1086 00:47:25,820 --> 00:47:28,450 Validate all the signatures are correct. 1087 00:47:28,450 --> 00:47:31,300 Then you're saying, OK, this transaction is good. 1088 00:47:31,300 --> 00:47:35,530 Now I modify my database by deleting all the inputs that 1089 00:47:35,530 --> 00:47:40,450 are consumed and adding all these new outputs 1090 00:47:40,450 --> 00:47:43,280 for the transaction. 1091 00:47:43,280 --> 00:47:46,130 So this modifies the database in place. 1092 00:47:46,130 --> 00:47:48,380 And you're sort of constantly reading 1093 00:47:48,380 --> 00:47:53,440 from it to validate inputs, and then writing 1094 00:47:53,440 --> 00:47:55,960 to it to delete inputs, and then writing to it 1095 00:47:55,960 --> 00:47:58,450 again to add outputs. 1096 00:47:58,450 --> 00:48:00,940 So it doesn't seem too bad. 1097 00:48:00,940 --> 00:48:03,000 But there's a lot of disk access. 1098 00:48:03,000 --> 00:48:06,280 And the UTXO DB is a key-value store with a lot of keys. 1099 00:48:06,280 --> 00:48:07,690 The values are very small. 1100 00:48:07,690 --> 00:48:10,900 So it's not like a crazy database problem, 1101 00:48:10,900 --> 00:48:14,500 if anyone's interested in databases and stuff. 1102 00:48:14,500 --> 00:48:16,090 But it can be slow. 1103 00:48:16,090 --> 00:48:18,690 And we want to really optimize it. 
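The read, delete, and add cycle on the UTXO database can be sketched with an ordinary dictionary standing in for the key-value store. Everything here is a simplifying assumption: the dict-based transaction layout, the `apply_transaction` name, and the `signature_ok` callback, which stands in for real script and signature verification.

```python
def apply_transaction(utxos, tx, signature_ok):
    """Validate tx against the UTXO set, then update the set in place.

    utxos:        dict mapping (txid, output_index) -> (amount, script)
    tx:           dict with "txid", "inputs" (list of outpoints) and
                  "outputs" (list of (amount, script) tuples)
    signature_ok: callable(tx, outpoint, spent_output) -> bool
    """
    # Step 1: validate everything before touching the database.
    seen = set()
    for outpoint in tx["inputs"]:
        if outpoint in seen:
            raise ValueError("input listed twice in the same transaction")
        if outpoint not in utxos:
            raise ValueError("input is not in the UTXO set")
        if not signature_ok(tx, outpoint, utxos[outpoint]):
            raise ValueError("bad signature")
        seen.add(outpoint)

    # Step 2: delete the outputs this transaction consumes.
    for outpoint in tx["inputs"]:
        del utxos[outpoint]

    # Step 3: add the outputs it creates, keyed by (txid, index).
    for index, output in enumerate(tx["outputs"]):
        utxos[(tx["txid"], index)] = output
```

Validating first and mutating afterwards means a rejected transaction leaves the set untouched; replaying the same transaction fails because its inputs are already gone.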
1104 00:48:18,690 --> 00:48:22,320 So when you think the initial block download, 1105 00:48:22,320 --> 00:48:23,850 you're doing this 300 million times. 1106 00:48:23,850 --> 00:48:28,120 So there's about 300 million transactions historically. 1107 00:48:28,120 --> 00:48:31,140 So you're validating signature, deleting input, adding 1108 00:48:31,140 --> 00:48:32,770 output, 300 million times. 1109 00:48:32,770 --> 00:48:36,000 It ends up being about 170 gigabytes of downloads. 1110 00:48:36,000 --> 00:48:37,500 And then the end result, when you're 1111 00:48:37,500 --> 00:48:41,160 done modifying this database, is that you 1112 00:48:41,160 --> 00:48:44,340 have 55 million transaction outputs remaining. 1113 00:48:44,340 --> 00:48:48,250 And it's about 3.2 gigabytes of disk use. 1114 00:48:48,250 --> 00:48:53,100 So yeah, but you had to download that 170 gigabytes to get 1115 00:48:53,100 --> 00:48:56,850 to the 3.2-gigabyte end state, because most 1116 00:48:56,850 --> 00:48:59,070 of the transactions that have been created and most 1117 00:48:59,070 --> 00:49:01,890 of the outputs have then later been spent. 1118 00:49:01,890 --> 00:49:04,930 So there's a lot of churn. 1119 00:49:04,930 --> 00:49:06,757 So yeah, of the 300 million-- 1120 00:49:06,757 --> 00:49:08,340 sorry, these are not the same numbers. 1121 00:49:08,340 --> 00:49:09,710 I was actually looking. 1122 00:49:09,710 --> 00:49:11,837 How many transaction outputs have been created 1123 00:49:11,837 --> 00:49:12,920 throughout all of Bitcoin? 1124 00:49:12,920 --> 00:49:14,212 And I couldn't find the number. 1125 00:49:14,212 --> 00:49:17,120 And I didn't want to write software to figure it out. 1126 00:49:17,120 --> 00:49:22,460 But you can certainly figure it out from the blockchain. 1127 00:49:22,460 --> 00:49:24,920 But yeah, this is transaction outputs. 1128 00:49:24,920 --> 00:49:27,468 How many total transactions have TXOs? 1129 00:49:27,468 --> 00:49:28,010 I'm not sure. 
1130 00:49:28,010 --> 00:49:29,660 But yeah, so it's pretty big. 1131 00:49:29,660 --> 00:49:30,650 But it's reasonable. 1132 00:49:30,650 --> 00:49:32,400 Like, we can do this on today's computers. 1133 00:49:32,400 --> 00:49:34,700 If you've got a decent laptop, this is possible. 1134 00:49:34,700 --> 00:49:38,840 This total time taken depends on a lot of factors. 1135 00:49:38,840 --> 00:49:41,120 Has anyone actually done initial block download 1136 00:49:41,120 --> 00:49:42,740 and synced to Bitcoin node, and like, 1137 00:49:42,740 --> 00:49:45,487 want to say about how quickly they did it or? 1138 00:49:45,487 --> 00:49:46,820 OK, James, how long did it take? 1139 00:49:46,820 --> 00:49:50,540 AUDIENCE: For 0.15, it's actually quite quick. 1140 00:49:50,540 --> 00:49:54,117 On a spinning disk it will take about six hours maybe. 1141 00:49:54,117 --> 00:49:55,450 TADGE DRYJA: On a spinning disk? 1142 00:49:55,450 --> 00:49:58,760 AUDIENCE: Yeah, yeah, with the new one, it's really quick. 1143 00:49:58,760 --> 00:50:00,590 TADGE DRYJA: Because I run 0.15.1 1144 00:50:00,590 --> 00:50:02,390 on a laptop with a spinning disk, 1145 00:50:02,390 --> 00:50:07,220 and it'll take like overnight to just sync up a week or so. 1146 00:50:07,220 --> 00:50:08,690 It's really slow, but I don't know. 1147 00:50:08,690 --> 00:50:09,870 AUDIENCE: Like, my mum tried to start it, 1148 00:50:09,870 --> 00:50:10,995 and it did it from scratch. 1149 00:50:10,995 --> 00:50:12,397 It did it in like eight hours. 1150 00:50:12,397 --> 00:50:14,980 TADGE DRYJA: Wow, cool, so eight hours to do the whole thing-- 1151 00:50:14,980 --> 00:50:18,130 anyone else have tried it? 1152 00:50:18,130 --> 00:50:18,630 Yeah. 1153 00:50:18,630 --> 00:50:20,790 AUDIENCE: A while back it took me a week. 1154 00:50:20,790 --> 00:50:22,748 TADGE DRYJA: Yeah, a while back it took a week. 1155 00:50:22,748 --> 00:50:25,270 So the software has been improved quite a bit. 
1156 00:50:25,270 --> 00:50:28,110 So if you downloaded it-- 1157 00:50:28,110 --> 00:50:30,000 like, I first downloaded it in 2011. 1158 00:50:30,000 --> 00:50:32,750 And it took overnight to download everything. 1159 00:50:32,750 --> 00:50:34,350 And the download was vastly smaller. 1160 00:50:34,350 --> 00:50:36,960 It was less than a gigabyte to download the entire blockchain. 1161 00:50:36,960 --> 00:50:39,930 So what's interesting is that the time taken 1162 00:50:39,930 --> 00:50:44,310 for initial block download over the last seven years 1163 00:50:44,310 --> 00:50:47,340 has been somewhat constant in that the blockchain gets 1164 00:50:47,340 --> 00:50:48,090 bigger and bigger. 1165 00:50:48,090 --> 00:50:49,890 But there's all these optimizations 1166 00:50:49,890 --> 00:50:51,990 to the code and the databases. 1167 00:50:51,990 --> 00:50:55,160 And so that sort of keeps pace. 1168 00:50:55,160 --> 00:50:57,830 Although actually, I'd say recently it's gotten faster, 1169 00:50:57,830 --> 00:50:59,730 because like 0.15-- 1170 00:50:59,730 --> 00:51:03,830 wait, 0.11 or 0.12 had a big speed-up as well. 1171 00:51:03,830 --> 00:51:06,185 AUDIENCE: They completely refactored the networking code. 1172 00:51:06,185 --> 00:51:08,060 It used to be coupled to the synchronization. 1173 00:51:08,060 --> 00:51:09,185 And then they decoupled it. 1174 00:51:09,185 --> 00:51:12,290 TADGE DRYJA: Yeah, I think that was mostly Cory, right? 1175 00:51:12,290 --> 00:51:15,200 So Cory Fields, who also works for the DCI, 1176 00:51:15,200 --> 00:51:18,650 helped to refactor the code, make it a lot faster. 1177 00:51:18,650 --> 00:51:23,090 There's definitely still optimizations, a lot of cool-- 1178 00:51:23,090 --> 00:51:26,120 a lot of it's pretty low-level tweaks kind of stuff. 1179 00:51:26,120 --> 00:51:27,950 But some of them are pretty big things.
1180 00:51:27,950 --> 00:51:29,700 Most of the big things, low-hanging fruit, 1181 00:51:29,700 --> 00:51:31,370 has already been gotten. 1182 00:51:31,370 --> 00:51:35,210 The worry is that long-term, this just keeps going up. 1183 00:51:35,210 --> 00:51:37,888 As the blockchain gets bigger and longer, 1184 00:51:37,888 --> 00:51:38,930 it's going to take longer. 1185 00:51:38,930 --> 00:51:40,880 It's going to be harder to validate. 1186 00:51:40,880 --> 00:51:43,220 It can be parallelized to some extent. 1187 00:51:43,220 --> 00:51:46,760 But there's also network I/O concerns, things like that. 1188 00:51:46,760 --> 00:51:49,760 So it's tricky but doable. 1189 00:51:49,760 --> 00:51:54,360 Any questions about initial block download? 1190 00:51:54,360 --> 00:51:56,130 Good? 1191 00:51:56,130 --> 00:51:59,130 So here's a question. 1192 00:51:59,130 --> 00:52:01,350 You've got this UTXO DB. 1193 00:52:01,350 --> 00:52:02,690 What about this 170 gigabytes? 1194 00:52:02,690 --> 00:52:03,690 Do you have to store it? 1195 00:52:03,690 --> 00:52:06,602 Or can you delete it? 1196 00:52:06,602 --> 00:52:07,810 This you can't delete, right? 1197 00:52:11,160 --> 00:52:13,810 So you think this is OK? 1198 00:52:13,810 --> 00:52:15,600 Yeah, you can maybe delete some of this. 1199 00:52:15,600 --> 00:52:17,058 Actually, there's a lot of research 1200 00:52:17,058 --> 00:52:22,170 into maybe we can delete this, accumulators, cool stuff 1201 00:52:22,170 --> 00:52:23,090 like that. 1202 00:52:23,090 --> 00:52:25,650 It would be really cool to have some kind of data structure 1203 00:52:25,650 --> 00:52:27,960 where we can keep adding these-- 1204 00:52:27,960 --> 00:52:31,590 we can add, remove, and prove, and then seek, 1205 00:52:31,590 --> 00:52:34,168 and see if something's in there, where it either 1206 00:52:34,168 --> 00:52:36,710 is like constant size, or log n size, or something like that. 1207 00:52:36,710 --> 00:52:37,627 That'd be really cool.
1208 00:52:37,627 --> 00:52:40,140 There are constructions like that, 1209 00:52:40,140 --> 00:52:42,943 but they don't work for what we're trying to do right now. 1210 00:52:42,943 --> 00:52:44,610 But there's a lot of research into that. 1211 00:52:44,610 --> 00:52:47,070 If anyone here finds some cool data structure 1212 00:52:47,070 --> 00:52:51,290 that you can use for the UTXO DB that doesn't keep growing 1213 00:52:51,290 --> 00:52:54,750 linearly with the number of keys, everyone in Bitcoin 1214 00:52:54,750 --> 00:52:58,740 will sing your praises forever. 1215 00:52:58,740 --> 00:53:00,780 But it's an active research area. 1216 00:53:00,780 --> 00:53:06,210 So pruning-- by default-- oh, that should be a K, not an M, 1217 00:53:06,210 --> 00:53:06,790 sorry. 1218 00:53:06,790 --> 00:53:09,760 There's only 500K blocks, not 500M. 1219 00:53:09,760 --> 00:53:13,920 Anyway, by default, your client will download all these blocks 1220 00:53:13,920 --> 00:53:15,720 and store them on the disk. 1221 00:53:15,720 --> 00:53:18,600 And that's important because what if someone 1222 00:53:18,600 --> 00:53:22,020 else requests them from you? 1223 00:53:22,020 --> 00:53:23,640 Everyone starts out as a noob. 1224 00:53:23,640 --> 00:53:25,500 Someone else comes and says, hey, guys, 1225 00:53:25,500 --> 00:53:27,750 I just downloaded Bitcoin. 1226 00:53:27,750 --> 00:53:30,160 What's been going on for the last nine years? 1227 00:53:30,160 --> 00:53:31,860 And you might want to give them blocks 1228 00:53:31,860 --> 00:53:35,210 to let them into the system. 1229 00:53:35,210 --> 00:53:37,500 So you can serve to others who are doing IBD. 1230 00:53:37,500 --> 00:53:39,980 However, if you want, and your hard drive's small, 1231 00:53:39,980 --> 00:53:41,610 or you have an SSD or something, you 1232 00:53:41,610 --> 00:53:45,540 can prune and delete the blocks after you've done IBD, 1233 00:53:45,540 --> 00:53:49,110 with no loss of security.
1234 00:53:49,110 --> 00:53:52,110 Anyone think of downsides doing so? 1235 00:53:58,140 --> 00:53:59,790 Not really, right? 1236 00:53:59,790 --> 00:54:03,070 The only real downside is sort of this. 1237 00:54:03,070 --> 00:54:04,680 Well, not everyone can prune. 1238 00:54:04,680 --> 00:54:09,128 If everyone prunes, no new entrants to the system. 1239 00:54:09,128 --> 00:54:10,670 So it's a little bit of a tricky sort 1240 00:54:10,670 --> 00:54:13,880 of seed versus leech kind of problem, 1241 00:54:13,880 --> 00:54:17,322 where someone's got to be there to serve up these blocks. 1242 00:54:17,322 --> 00:54:18,530 You don't have to trust them. 1243 00:54:18,530 --> 00:54:20,480 You're still validating all the work, 1244 00:54:20,480 --> 00:54:22,070 validating all the signatures. 1245 00:54:22,070 --> 00:54:24,440 They can't do anything bad. 1246 00:54:24,440 --> 00:54:28,497 But someone's got to be there to provide the network capacity. 1247 00:54:28,497 --> 00:54:29,330 And so it is tricky. 1248 00:54:29,330 --> 00:54:31,010 Like, most of the nodes on the network 1249 00:54:31,010 --> 00:54:34,310 are behind people's cable modem firewall kind of thing. 1250 00:54:34,310 --> 00:54:36,860 So you can't actually connect to them and download. 1251 00:54:36,860 --> 00:54:39,320 And if you run a node that does allow people 1252 00:54:39,320 --> 00:54:41,635 to connect in and serve them blocks, 1253 00:54:41,635 --> 00:54:43,010 people will download quite a bit. 1254 00:54:43,010 --> 00:54:46,730 So I have one in the office over there. 1255 00:54:46,730 --> 00:54:51,600 It ends up sending out about three terabytes a month, 1256 00:54:51,600 --> 00:54:52,440 which is a lot. 1257 00:54:52,440 --> 00:54:57,800 Like, it's dozens of gigabytes a day, 20, 30, I don't know. 1258 00:54:57,800 --> 00:54:59,570 So yeah, people are doing this. 
1259 00:54:59,570 --> 00:55:02,210 People are connecting in and downloading all the blocks, 1260 00:55:02,210 --> 00:55:04,070 either through IBD or just keeping up 1261 00:55:04,070 --> 00:55:05,510 with current transactions. 1262 00:55:05,510 --> 00:55:07,460 So yeah, pruning is possible. 1263 00:55:07,460 --> 00:55:12,750 But not everyone can do it, so it's sort of an unsolved issue 1264 00:55:12,750 --> 00:55:13,250 there. 1265 00:55:13,250 --> 00:55:14,667 There's a lot of research into how 1266 00:55:14,667 --> 00:55:17,000 we can do partial pruning, where, OK, I'm 1267 00:55:17,000 --> 00:55:20,180 going to only store the last month's worth of blocks, which 1268 00:55:20,180 --> 00:55:22,460 is mostly what people do, because a lot of people 1269 00:55:22,460 --> 00:55:25,310 have intermittent connectivity, where they'll turn off 1270 00:55:25,310 --> 00:55:28,220 their node and then start it back up again a few days later. 1271 00:55:28,220 --> 00:55:31,930 And they just need to catch up with the last few blocks. 1272 00:55:31,930 --> 00:55:32,680 So pruning's cool. 1273 00:55:32,680 --> 00:55:36,250 That's been in since 0.12 or something. 1274 00:55:36,250 --> 00:55:38,770 So I'll go through-- 1275 00:55:38,770 --> 00:55:41,890 in practice, if you go to your Bitcoin node, 1276 00:55:41,890 --> 00:55:43,250 what does that actually store? 1277 00:55:43,250 --> 00:55:45,080 And if you just go to your Bitcoin folder, 1278 00:55:45,080 --> 00:55:49,460 which in Unix-type OSes is like home directory /.bitcoin-- 1279 00:55:52,030 --> 00:55:54,490 total random aside, I don't like how 1280 00:55:54,490 --> 00:55:58,090 they put a dot in front of all the really important folders. 1281 00:55:58,090 --> 00:56:00,850 It's like they hide all the important things, like your GPG 1282 00:56:00,850 --> 00:56:01,880 folder. It's got a dot. 1283 00:56:01,880 --> 00:56:03,430 And your Bitcoin folder's got a dot. 1284 00:56:03,430 --> 00:56:05,155 But like, downloads doesn't.
1285 00:56:05,155 --> 00:56:08,740 And like, who cares about that? 1286 00:56:08,740 --> 00:56:13,000 Anyway, so if you just ls in your folder, 1287 00:56:13,000 --> 00:56:14,620 here's all the files. 1288 00:56:14,620 --> 00:56:17,420 And we'll just go through it real quick. 1289 00:56:17,420 --> 00:56:19,090 Here's the files, and I'll describe. 1290 00:56:19,090 --> 00:56:21,850 So there's a banlist.dat. 1291 00:56:21,850 --> 00:56:24,010 This is a list of IP addresses that you have 1292 00:56:24,010 --> 00:56:27,190 banned, because they're bad. 1293 00:56:27,190 --> 00:56:28,550 They're doing something weird. 1294 00:56:28,550 --> 00:56:29,920 So I'll get to it at the end. 1295 00:56:29,920 --> 00:56:33,640 I sort of am thinking of making a ban list for the problem set, 1296 00:56:33,640 --> 00:56:36,910 because there are some nodes that are doing non-good things. 1297 00:56:36,910 --> 00:56:40,390 That was what caused yesterday's outage. 1298 00:56:40,390 --> 00:56:41,410 Someone was connecting. 1299 00:56:41,410 --> 00:56:43,743 Although it was really my fault, because the server code 1300 00:56:43,743 --> 00:56:46,030 was not verifying inputs correctly. 1301 00:56:46,030 --> 00:56:48,340 But yeah, in Bitcoin, you verify everything. 1302 00:56:48,340 --> 00:56:50,830 If people start sending you nonsense data, or they say, 1303 00:56:50,830 --> 00:56:52,420 hey, here's a block, and it's wrong, 1304 00:56:52,420 --> 00:56:55,930 or hey, here's a transaction, and the signatures are wrong, 1305 00:56:55,930 --> 00:56:59,600 you'll pretty quickly ban them, because it's like, well, 1306 00:56:59,600 --> 00:57:01,287 if they're making a mistake-- 1307 00:57:01,287 --> 00:57:01,870 it's computer. 1308 00:57:01,870 --> 00:57:04,790 There's no excuse for making a mistake. 
1309 00:57:04,790 --> 00:57:07,032 So either their software is just different than mine, 1310 00:57:07,032 --> 00:57:09,490 or something's wrong with their software or their hardware, 1311 00:57:09,490 --> 00:57:10,032 I don't know. 1312 00:57:10,032 --> 00:57:11,770 But they're wasting my time. 1313 00:57:11,770 --> 00:57:14,200 They're sending me nonsense-- ban. 1314 00:57:14,200 --> 00:57:16,340 So you have your own ban list. 1315 00:57:16,340 --> 00:57:18,120 Then the blue ones are folders. 1316 00:57:18,120 --> 00:57:20,590 I'll talk about those later. 1317 00:57:20,590 --> 00:57:23,350 But you have peers.dat, which is good nodes. 1318 00:57:23,350 --> 00:57:25,660 So it's quite a bit, 4 megabytes. 1319 00:57:25,660 --> 00:57:28,780 And you keep track of here's all the different nodes 1320 00:57:28,780 --> 00:57:32,350 I've connected to for the duration of however 1321 00:57:32,350 --> 00:57:34,330 long I've been using Bitcoin. 1322 00:57:34,330 --> 00:57:36,520 I keep track of all their IP addresses, 1323 00:57:36,520 --> 00:57:40,520 how much uptime they've had, what I've downloaded from them. 1324 00:57:40,520 --> 00:57:43,270 And so I sort of sort them and put the good ones at the top. 1325 00:57:43,270 --> 00:57:46,330 And like, OK, here's all the different Bitcoin nodes. 1326 00:57:46,330 --> 00:57:48,310 So next time I start up Bitcoin, I'm 1327 00:57:48,310 --> 00:57:50,320 going to try to connect to them. 1328 00:57:50,320 --> 00:57:52,060 So this makes the network very robust, 1329 00:57:52,060 --> 00:57:54,520 because everyone remembers everyone else. 1330 00:57:54,520 --> 00:57:55,720 And then when they need to-- 1331 00:57:55,720 --> 00:57:58,330 if there's a network disruption, maybe half the nodes 1332 00:57:58,330 --> 00:58:00,460 go off the network, you can still 1333 00:58:00,460 --> 00:58:02,440 try to connect to all the rest. 
1334 00:58:02,440 --> 00:58:06,910 And also peers will share their peers files, not directly, 1335 00:58:06,910 --> 00:58:09,400 but they'll sort of take random samplings of this file 1336 00:58:09,400 --> 00:58:11,650 and share it with each other, so that everyone sort of 1337 00:58:11,650 --> 00:58:14,510 knows about everyone else. 1338 00:58:14,510 --> 00:58:17,890 Then there's a wallet.dat, which is very important, 1339 00:58:17,890 --> 00:58:21,200 because that's got all your precious UTXOs. 1340 00:58:21,200 --> 00:58:25,490 And we'll talk about wallets Monday, I think. 1341 00:58:25,490 --> 00:58:27,770 There's a bitcoin.conf, little config file. 1342 00:58:27,770 --> 00:58:30,635 You can set some settings and things like that; 1343 00:58:30,635 --> 00:58:34,340 a debug file, which shows all these weird messages; 1344 00:58:34,340 --> 00:58:35,750 and a mempool.dat. 1345 00:58:35,750 --> 00:58:40,130 So the mempool is the set of transactions you've seen that you've not 1346 00:58:40,130 --> 00:58:41,510 seen in a block yet. 1347 00:58:41,510 --> 00:58:44,180 So people are broadcasting transactions. 1348 00:58:44,180 --> 00:58:45,860 And you store them. 1349 00:58:45,860 --> 00:58:48,817 It used to be just in memory, hence the word "mempool." 1350 00:58:48,817 --> 00:58:50,900 Now it's more like disk pool, because you actually 1351 00:58:50,900 --> 00:58:54,170 store them on disk, because it saves a little time when you 1352 00:58:54,170 --> 00:58:55,850 shut down and start up again. 1353 00:58:55,850 --> 00:58:59,120 So any questions about just what all these files are doing? 1354 00:59:02,467 --> 00:59:03,800 Makes sense, so now the folders. 1355 00:59:06,570 --> 00:59:08,850 Chainstate, blocks, and database-- so any 1356 00:59:08,850 --> 00:59:11,220 guesses as to how big these things are 1357 00:59:11,220 --> 00:59:14,000 based on previous slides or? 1358 00:59:14,000 --> 00:59:15,970 So how big is chainstate, for example? 1359 00:59:18,830 --> 00:59:19,330 Yes?
1360 00:59:19,330 --> 00:59:21,920 AUDIENCE: 3 gigs. 1361 00:59:21,920 --> 00:59:24,150 TADGE DRYJA: Yeah, 3 gigs. 1362 00:59:24,150 --> 00:59:26,972 This is a UTXO set, 3-ish gigs. 1363 00:59:26,972 --> 00:59:27,930 This is all the blocks. 1364 00:59:27,930 --> 00:59:28,430 What? 1365 00:59:28,430 --> 00:59:29,388 Oh, no. 1366 00:59:32,547 --> 00:59:34,380 And then database, actually, I have no idea. 1367 00:59:34,380 --> 00:59:35,710 Does anyone know what that is? 1368 00:59:35,710 --> 00:59:38,850 There's a database folder, and it's got one little log file. 1369 00:59:38,850 --> 00:59:40,410 And it's like 80 kilobytes. 1370 00:59:40,410 --> 00:59:42,170 I don't know what it is. 1371 00:59:42,170 --> 00:59:43,550 Do you guys know? 1372 00:59:43,550 --> 00:59:45,630 Yeah, I don't know. 1373 00:59:45,630 --> 00:59:48,340 But there's a blocks folder, and that's got all the blocks. 1374 00:59:48,340 --> 00:59:52,440 And that's your huge amount of data. 1375 00:59:52,440 --> 00:59:55,270 And this is the UTXO set, not too bad. 1376 00:59:55,270 --> 00:59:57,290 So yeah, you can look in it. 1377 00:59:57,290 --> 01:00:00,827 It's reasonable but yeah, it's kind of big. 1378 01:00:00,827 --> 01:00:02,410 So any questions about the data stuff? 1379 01:00:02,410 --> 01:00:04,660 I'm going to go into blockchain as a database, 1380 01:00:04,660 --> 01:00:06,790 real quick at the end. 1381 01:00:06,790 --> 01:00:10,480 So it's 186 gigabytes, or alternatively, you 1382 01:00:10,480 --> 01:00:12,310 can think of it as just 3 gigabytes. 1383 01:00:12,310 --> 01:00:14,017 But it's a really crummy database. 1384 01:00:14,017 --> 01:00:15,475 So I've heard a lot that blockchain 1385 01:00:15,475 --> 01:00:16,750 is going to change the world. 1386 01:00:16,750 --> 01:00:19,090 And it's like a database that's shared among everyone. 1387 01:00:19,090 --> 01:00:21,065 And you can query things. 1388 01:00:21,065 --> 01:00:22,190 It's a really bad database. 
1389 01:00:22,190 --> 01:00:25,150 So for example, I'm going to have 1390 01:00:25,150 --> 01:00:30,820 some fun interactive questions, where some of these 1391 01:00:30,820 --> 01:00:31,690 are answerable. 1392 01:00:31,690 --> 01:00:33,940 Some of these are not. 1393 01:00:33,940 --> 01:00:37,507 And I'm posing the question to my Bitcoin node. 1394 01:00:37,507 --> 01:00:39,340 So I posed this question to my Bitcoin node. 1395 01:00:39,340 --> 01:00:42,970 Hey, remember transaction 9e95c3 dot, dot, dot, 1396 01:00:42,970 --> 01:00:44,860 from back in 2014? 1397 01:00:44,860 --> 01:00:48,170 And how do you think the Bitcoin node will answer? 1398 01:00:48,170 --> 01:00:51,640 Will it answer, or will it not be able to? 1399 01:00:51,640 --> 01:00:52,280 Any ideas? 1400 01:00:52,280 --> 01:00:52,780 Yeah. 1401 01:00:52,780 --> 01:00:55,195 AUDIENCE: Wait, where does one be easiest [INAUDIBLE]?? 1402 01:00:58,100 --> 01:01:02,120 TADGE DRYJA: 183 plus 3, so the total data usage 1403 01:01:02,120 --> 01:01:05,296 on this computer is 186 gigs. 1404 01:01:05,296 --> 01:01:09,072 The rest are kind of small. 1405 01:01:09,072 --> 01:01:10,658 AUDIENCE: What do you mean? 1406 01:01:10,658 --> 01:01:12,950 TADGE DRYJA: So I mean like, when you're using Bitcoin, 1407 01:01:12,950 --> 01:01:15,680 you've got 186 gigs on your hard drive 1408 01:01:15,680 --> 01:01:19,880 or your SSD devoted to Bitcoin. 1409 01:01:19,880 --> 01:01:23,900 So you've got this 186-gigabyte database, essentially. 1410 01:01:23,900 --> 01:01:26,330 But it's a really crummy database. 1411 01:01:26,330 --> 01:01:29,670 And it can't do a lot of the things you might expect it to. 1412 01:01:29,670 --> 01:01:31,640 So for example, this-- 1413 01:01:31,640 --> 01:01:33,620 arbitrary transaction from the past-- you say, 1414 01:01:33,620 --> 01:01:36,920 hey, there was this transaction a couple of years ago. 1415 01:01:36,920 --> 01:01:38,780 Give me the information about it. 
1416 01:01:38,780 --> 01:01:41,175 And what do you think the response from the full node is? 1417 01:01:41,175 --> 01:01:42,050 AUDIENCE: It's valid. 1418 01:01:42,050 --> 01:01:42,657 TADGE DRYJA: What, sorry? 1419 01:01:42,657 --> 01:01:43,850 AUDIENCE: It's valid or not valid. 1420 01:01:43,850 --> 01:01:44,930 TADGE DRYJA: It's valid or not valid. 1421 01:01:44,930 --> 01:01:45,597 Any other ideas? 1422 01:01:45,597 --> 01:01:47,150 AUDIENCE: What's the header? 1423 01:01:47,150 --> 01:01:47,735 TADGE DRYJA: What, sorry? 1424 01:01:47,735 --> 01:01:48,920 AUDIENCE: What's the header of [INAUDIBLE]? 1425 01:01:48,920 --> 01:01:51,010 TADGE DRYJA: It asks instead for a header. 1426 01:01:51,010 --> 01:01:51,760 Any other ideas? 1427 01:01:51,760 --> 01:01:55,900 Yeah, so sort of that-- it'll say, remember this TX? 1428 01:01:55,900 --> 01:01:58,090 No, it's somewhere in the blocks maybe, 1429 01:01:58,090 --> 01:01:59,500 but I have no idea where. 1430 01:01:59,500 --> 01:02:01,530 It's not in the chainstate. 1431 01:02:01,530 --> 01:02:04,020 So it just stores the blocks. 1432 01:02:04,020 --> 01:02:06,460 Like, here's this block. 1433 01:02:06,460 --> 01:02:08,170 Here's that block, in line. 1434 01:02:08,170 --> 01:02:10,830 And if you say, hey, there's this transaction. 1435 01:02:10,830 --> 01:02:12,400 OK, go look for it. 1436 01:02:12,400 --> 01:02:17,310 Oh, 2014, well, that might be somewhere in the middle. 1437 01:02:17,310 --> 01:02:20,662 But yeah, if you don't know what block it's in, forget it. 1438 01:02:20,662 --> 01:02:22,120 So it does have an index of blocks. 1439 01:02:22,120 --> 01:02:24,450 It'll find you a block, but a transaction, no luck. 1440 01:02:24,450 --> 01:02:24,950 Yeah. 1441 01:02:24,950 --> 01:02:26,844 AUDIENCE: So it pretty much tells you if it exists, 1442 01:02:26,844 --> 01:02:27,480 and that's it. 1443 01:02:27,480 --> 01:02:29,470 TADGE DRYJA: It won't even tell you if this exists.
1444 01:02:29,470 --> 01:02:30,095 It has no idea. 1445 01:02:30,095 --> 01:02:32,730 AUDIENCE: What's the [INAUDIBLE] in the block though? 1446 01:02:32,730 --> 01:02:35,107 TADGE DRYJA: I might have made that up. 1447 01:02:35,107 --> 01:02:37,190 So if you're saying, hey, here's this transaction. 1448 01:02:37,190 --> 01:02:37,770 Do you remember it? 1449 01:02:37,770 --> 01:02:38,470 Does it exist? 1450 01:02:38,470 --> 01:02:39,610 I don't know. 1451 01:02:39,610 --> 01:02:40,110 Yeah. 1452 01:02:40,110 --> 01:02:43,048 AUDIENCE: If you ask-- if you query about a certain block, 1453 01:02:43,048 --> 01:02:43,840 will it be able to? 1454 01:02:43,840 --> 01:02:45,397 TADGE DRYJA: Yes, and I'll-- 1455 01:02:45,397 --> 01:02:46,230 yeah, good question. 1456 01:02:46,230 --> 01:02:47,310 But I'll get to that. 1457 01:02:47,310 --> 01:02:48,760 I think it's in the later slides. 1458 01:02:48,760 --> 01:02:52,890 But yes, if you query based on a block hash, 1459 01:02:52,890 --> 01:02:54,880 then it does have that in the database. 1460 01:02:54,880 --> 01:02:56,380 And it'll be able to get it for you. 1461 01:02:56,380 --> 01:02:57,050 Yeah, James. 1462 01:02:57,050 --> 01:02:59,140 AUDIENCE: I know what the database directory does. 1463 01:02:59,140 --> 01:03:00,740 TADGE DRYJA: You know what the database directory does. 1464 01:03:00,740 --> 01:03:02,240 AUDIENCE: Yeah, it's the journaling 1465 01:03:02,240 --> 01:03:03,615 for the other databases. 1466 01:03:03,615 --> 01:03:05,490 TADGE DRYJA: Journaling for other databases-- 1467 01:03:05,490 --> 01:03:07,140 OK, I didn't-- yeah, cool. 1468 01:03:07,140 --> 01:03:10,790 It's very small, so I guess it helps things work. 1469 01:03:10,790 --> 01:03:13,820 So this one-- do you know this transaction? 1470 01:03:13,820 --> 01:03:14,630 No. 1471 01:03:14,630 --> 01:03:16,580 How about this? 1472 01:03:16,580 --> 01:03:19,640 Well, I've got this output.
1473 01:03:19,640 --> 01:03:24,230 It's still there in the UTX-- like, no one spent it. 1474 01:03:24,230 --> 01:03:26,570 It's like, this is a transaction in the first one. 1475 01:03:26,570 --> 01:03:27,920 It's an op_return output. 1476 01:03:27,920 --> 01:03:29,360 So it's got some extra data. 1477 01:03:29,360 --> 01:03:31,220 But op_return means it's invalid. 1478 01:03:31,220 --> 01:03:32,150 You can't spend it. 1479 01:03:32,150 --> 01:03:34,202 Can you tell me what the data is? 1480 01:03:34,202 --> 01:03:36,640 Do you think it'll be able to? 1481 01:03:36,640 --> 01:03:40,300 If you query, hey, here's this output, 1482 01:03:40,300 --> 01:03:43,340 what do you think the response will be? 1483 01:03:43,340 --> 01:03:45,371 Yea, nay? 1484 01:03:45,371 --> 01:03:46,730 Nay, OK, I'm seeing nays. 1485 01:03:46,730 --> 01:03:47,540 Yeah. 1486 01:03:47,540 --> 01:03:48,085 Nope. 1487 01:03:48,085 --> 01:03:50,630 If it's an op_return output, even though it's unspent, 1488 01:03:50,630 --> 01:03:52,280 well, it's unspendable. 1489 01:03:52,280 --> 01:03:55,220 So you don't put it in the UTXO database, 1490 01:03:55,220 --> 01:03:58,715 because you just see, oh, this output, op_return is in there. 1491 01:03:58,715 --> 01:03:59,840 Don't bother putting it in. 1492 01:03:59,840 --> 01:04:01,920 No one will ever be able to spend it. 1493 01:04:01,920 --> 01:04:03,680 So there's no reason to put it in. 1494 01:04:03,680 --> 01:04:05,780 So op_returns are used to sort of commit 1495 01:04:05,780 --> 01:04:07,970 to data in all these different protocols. 1496 01:04:07,970 --> 01:04:13,787 But the actual normal code won't store them. 1497 01:04:13,787 --> 01:04:14,370 Anything else? 1498 01:04:14,370 --> 01:04:17,370 Next one, this one-- 1499 01:04:17,370 --> 01:04:19,808 hey, I have a public key. 1500 01:04:19,808 --> 01:04:21,350 And here's the hash of my public key. 1501 01:04:21,350 --> 01:04:22,642 This is essentially an address.
1502 01:04:22,642 --> 01:04:24,165 So we didn't talk about addresses. 1503 01:04:24,165 --> 01:04:26,500 But the Bitcoin addresses that start with like a 1, 1504 01:04:26,500 --> 01:04:29,700 and then have this alphanumeric stuff-- 1505 01:04:29,700 --> 01:04:31,710 it's just a different encoding, slightly shorter 1506 01:04:31,710 --> 01:04:34,230 than hexadecimal, for a pubkey hash. 1507 01:04:34,230 --> 01:04:36,300 So you say, hey, I've got this. 1508 01:04:36,300 --> 01:04:37,290 I have a private key. 1509 01:04:37,290 --> 01:04:38,650 I just computed the public key. 1510 01:04:38,650 --> 01:04:39,430 I hashed it. 1511 01:04:39,430 --> 01:04:40,960 I got this. 1512 01:04:40,960 --> 01:04:42,190 Do I have any money? 1513 01:04:42,190 --> 01:04:43,250 I think I did. 1514 01:04:43,250 --> 01:04:44,020 I don't remember. 1515 01:04:44,020 --> 01:04:45,100 But I remember my private key. 1516 01:04:45,100 --> 01:04:45,808 I backed that up. 1517 01:04:45,808 --> 01:04:47,530 That was the important part. 1518 01:04:47,530 --> 01:04:49,450 Everyone says keep your private keys. 1519 01:04:49,450 --> 01:04:51,190 So I have my private key. 1520 01:04:51,190 --> 01:04:54,040 But all this data and all this blockchain 1521 01:04:54,040 --> 01:04:55,357 stuff, I lost my computer. 1522 01:04:55,357 --> 01:04:57,190 But I have my private key, so I've got my money. 1523 01:04:57,190 --> 01:04:58,360 How many coins do I have? 1524 01:04:58,360 --> 01:05:00,580 How many outputs? 1525 01:05:00,580 --> 01:05:05,310 What do you think the full node will tell you? 1526 01:05:05,310 --> 01:05:08,130 Yeah, any ideas? 1527 01:05:08,130 --> 01:05:09,768 It'll say, I don't know. 1528 01:05:09,768 --> 01:05:12,060 Well, you're going to have to search through everything 1529 01:05:12,060 --> 01:05:14,040 in chainstate. 1530 01:05:14,040 --> 01:05:17,040 And it doesn't index based on the public key script, only 1531 01:05:17,040 --> 01:05:19,020 on the transaction ID and index there.
1532 01:05:19,020 --> 01:05:24,510 It's a key-value store, and the key is this 36-byte txid:index. 1533 01:05:24,510 --> 01:05:26,610 So this is a very real problem. 1534 01:05:26,610 --> 01:05:28,260 Like, OK, I backed up my key. 1535 01:05:28,260 --> 01:05:30,630 Or I took my private keys to some other computer 1536 01:05:30,630 --> 01:05:32,860 or something like that. 1537 01:05:32,860 --> 01:05:33,788 And this is fairly-- 1538 01:05:33,788 --> 01:05:34,830 it's gotten a lot faster. 1539 01:05:34,830 --> 01:05:37,980 It used to take hours, where you had a hard drive, 1540 01:05:37,980 --> 01:05:41,100 and you're like, OK, import a key to this wallet. 1541 01:05:41,100 --> 01:05:43,430 And it's like, well, when did you do transactions? 1542 01:05:43,430 --> 01:05:46,060 It has to look through the entire blockchain, [INAUDIBLE] 1543 01:05:46,060 --> 01:05:49,110 and linearly, to see if any of these transactions 1544 01:05:49,110 --> 01:05:51,850 have an output that matches that, and then says, 1545 01:05:51,850 --> 01:05:53,550 oh, yeah, you got money back in 2013. 1546 01:05:53,550 --> 01:05:55,850 Oh, then you spent it-- 1547 01:05:55,850 --> 01:05:58,460 and sort of replays things, because it 1548 01:05:58,460 --> 01:05:59,710 doesn't have an address index. 1549 01:06:02,530 --> 01:06:04,940 Next one-- this is an example. 1550 01:06:04,940 --> 01:06:08,350 How many coins-- so you say, hey, here's this output, 1551 01:06:08,350 --> 01:06:11,860 this transaction:1, how many coins does it have? 1552 01:06:11,860 --> 01:06:14,970 Will the full node be able to tell you this? 1553 01:06:14,970 --> 01:06:17,540 Yea, nay? 1554 01:06:17,540 --> 01:06:19,840 I'm seeing a bunch of nays. 1555 01:06:19,840 --> 01:06:20,910 No, it will. 1556 01:06:20,910 --> 01:06:24,610 Yeah, this is the one thing it can do. 1557 01:06:24,610 --> 01:06:28,413 So if you say, hey, 7434, dot, dot, dot, colon 1, 1558 01:06:28,413 --> 01:06:29,080 it'll know that. 
1559 01:06:29,080 --> 01:06:32,230 That's in the UTXO DB, because that's the key 1560 01:06:32,230 --> 01:06:34,400 that the UTXO DB sorts by. 1561 01:06:34,400 --> 01:06:37,000 So yep, this is a UTXO. 1562 01:06:37,000 --> 01:06:37,720 It's unspent. 1563 01:06:37,720 --> 01:06:40,177 And it has a bunch of coins. 1564 01:06:40,177 --> 01:06:41,260 And this is fairly recent. 1565 01:06:41,260 --> 01:06:42,385 I was just looking through. 1566 01:06:42,385 --> 01:06:45,160 Someone got a couple million bucks worth, cool. 1567 01:06:45,160 --> 01:06:46,048 Is that a million? 1568 01:06:46,048 --> 01:06:48,970 Man. 1569 01:06:48,970 --> 01:06:51,400 Yeah, it's a new UTXO, hasn't been spent, 1570 01:06:51,400 --> 01:06:55,210 and you can look it up quickly based on the txid:index pair. 1571 01:06:55,210 --> 01:06:58,180 So I think this is in some software called 1572 01:06:58,180 --> 01:07:00,790 an outpoint, where it's like, you concatenate them. 1573 01:07:00,790 --> 01:07:03,310 And it ends up being 36 bytes. 1574 01:07:03,310 --> 01:07:04,120 This is 32 bytes. 1575 01:07:04,120 --> 01:07:04,915 This is 4. 1576 01:07:04,915 --> 01:07:07,060 So you sort of have a 36-byte outpoint, 1577 01:07:07,060 --> 01:07:10,200 which describes what goes into the UTXO database. 1578 01:07:10,200 --> 01:07:11,976 AUDIENCE: But once it gets respent, 1579 01:07:11,976 --> 01:07:13,350 it's hard to find it again. 1580 01:07:13,350 --> 01:07:15,330 TADGE DRYJA: Yeah, once this is spent, 1581 01:07:15,330 --> 01:07:18,658 you delete it from the UTXO set, and you won't remember it anymore. 1582 01:07:18,658 --> 01:07:20,950 It'll just be, hey, you, how many coins does this have? 1583 01:07:20,950 --> 01:07:23,230 You're like, well-- well, you can still answer it. 1584 01:07:23,230 --> 01:07:25,140 You say "none." 1585 01:07:25,140 --> 01:07:28,480 If it's spent, and you say, how many coins does this guy have? 1586 01:07:28,480 --> 01:07:28,980 None.
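The outpoint lookup described here can be sketched in a few lines of Python. This is only a toy model, not how any real node stores its chainstate: the `outpoint` helper, the `UtxoSet` class, the little-endian index encoding, and the made-up txid are all assumptions for illustration. It shows the one query the default database answers well (how many coins does txid:index hold?) and why a spent output simply comes back as "none."

```python
import struct


def outpoint(txid_hex: str, index: int) -> bytes:
    """Build a 36-byte key: a 32-byte txid followed by a 4-byte output index."""
    txid = bytes.fromhex(txid_hex)
    if len(txid) != 32:
        raise ValueError("txid must be exactly 32 bytes")
    # Little-endian encoding of the index is an assumption of this sketch.
    return txid + struct.pack("<I", index)


class UtxoSet:
    """Toy key-value store keyed only by outpoint (hypothetical, in memory)."""

    def __init__(self):
        self._db = {}

    def add(self, txid_hex: str, index: int, amount: int) -> None:
        self._db[outpoint(txid_hex, index)] = amount

    def spend(self, txid_hex: str, index: int) -> int:
        # Spending deletes the entry; the store keeps no memory of it afterwards.
        return self._db.pop(outpoint(txid_hex, index))

    def coins(self, txid_hex: str, index: int) -> int:
        # Unknown and already-spent outpoints look the same: zero coins.
        return self._db.get(outpoint(txid_hex, index), 0)


FAKE_TXID = "ab" * 32  # made-up 32-byte txid, not a real transaction

utxos = UtxoSet()
utxos.add(FAKE_TXID, 1, 2_700_000_000)
print(len(outpoint(FAKE_TXID, 1)))   # key length in bytes
print(utxos.coins(FAKE_TXID, 1))     # unspent: the amount is found directly
utxos.spend(FAKE_TXID, 1)
print(utxos.coins(FAKE_TXID, 1))     # spent: nothing left to report
```

Because the only key is the outpoint, questions phrased any other way (by address, or by txid alone for a spent transaction) have no direct answer in this store.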
1587 01:07:28,980 --> 01:07:30,022 It's not in the UTXO set. 1588 01:07:32,695 --> 01:07:33,820 Yeah, so the previous one-- 1589 01:07:33,820 --> 01:07:35,028 I just copied these randomly. 1590 01:07:38,010 --> 01:07:41,790 So any questions about what is stored and what is not stored? 1591 01:07:44,410 --> 01:07:47,220 Basically, keeps track of UTXOs, and it 1592 01:07:47,220 --> 01:07:48,970 keeps track of historic blocks in order 1593 01:07:48,970 --> 01:07:49,998 to give them to people. 1594 01:07:49,998 --> 01:07:51,290 And it keeps track of the headers. 1595 01:07:51,290 --> 01:07:52,720 The headers end up being small. 1596 01:07:52,720 --> 01:07:54,270 All the headers total is like, what? 1597 01:07:54,270 --> 01:07:57,300 40 megs, something like that. 1598 01:07:57,300 --> 01:08:00,470 So yeah, you can add further indices. 1599 01:08:00,470 --> 01:08:03,310 You could write software to answer all these questions 1600 01:08:03,310 --> 01:08:05,410 very quickly. 1601 01:08:05,410 --> 01:08:08,500 But that's not what Bitcoin does by default. 1602 01:08:08,500 --> 01:08:12,220 Those types of indices would take a lot of extra space 1603 01:08:12,220 --> 01:08:15,440 and add a lot of CPU or things like that. 1604 01:08:15,440 --> 01:08:18,520 So a very common thing is an address index, 1605 01:08:18,520 --> 01:08:20,290 so people can ask if they have any money. 1606 01:08:20,290 --> 01:08:22,580 So the second to last one, where you say, 1607 01:08:22,580 --> 01:08:25,367 hey, I have this key hash. 1608 01:08:25,367 --> 01:08:26,200 Do I have any money? 1609 01:08:26,200 --> 01:08:27,850 Do I have any transactions? 1610 01:08:27,850 --> 01:08:29,979 Having an address index is actually 1611 01:08:29,979 --> 01:08:32,240 pretty useful for a lot of things, 1612 01:08:32,240 --> 01:08:37,080 for example, importing keys or like web wallet kind of things. 1613 01:08:37,080 --> 01:08:42,598 But Bitcoin by default doesn't do it, because, well, why?
1614 01:08:42,598 --> 01:08:44,890 You can make arguments that it would be actually useful 1615 01:08:44,890 --> 01:08:49,109 to have in the normal code, but we don't. 1616 01:08:49,109 --> 01:08:52,700 Any other questions about what indices, what it can do, 1617 01:08:52,700 --> 01:08:54,630 what it cannot do? 1618 01:08:54,630 --> 01:08:57,500 Somewhat counterintuitive in many cases, where you say, hey, 1619 01:08:57,500 --> 01:08:58,500 here's this transaction. 1620 01:08:58,500 --> 01:09:01,180 And you can't actually find it. 1621 01:09:01,180 --> 01:09:05,760 Or you have to scan through 180 gigabytes in order to find it. 1622 01:09:05,760 --> 01:09:07,290 So wait, James, I have a question. 1623 01:09:07,290 --> 01:09:11,990 So how big is an address index for what you were working on? 1624 01:09:11,990 --> 01:09:14,800 AUDIENCE: Usually, equal to the size of the chain inside. 1625 01:09:14,800 --> 01:09:17,240 TADGE DRYJA: Wow, so it could be hundreds of gigs. 1626 01:09:17,240 --> 01:09:17,754 Yeah. 1627 01:09:17,754 --> 01:09:18,962 AUDIENCE: It only takes what? 1628 01:09:18,962 --> 01:09:21,560 At least for Bitcoin, it takes usually multiple weeks 1629 01:09:21,560 --> 01:09:22,258 to generate. 1630 01:09:22,258 --> 01:09:23,550 TADGE DRYJA: Weeks to generate? 1631 01:09:23,550 --> 01:09:24,322 I bet you, well-- 1632 01:09:24,322 --> 01:09:26,614 AUDIENCE: Although [INAUDIBLE] usually takes a few days 1633 01:09:26,614 --> 01:09:29,630 on the inside, which is what [INAUDIBLE] takes-- 1634 01:09:29,630 --> 01:09:30,500 weeks. 1635 01:09:30,500 --> 01:09:32,120 TADGE DRYJA: Well, that's inside. 1636 01:09:32,120 --> 01:09:37,250 So the other thing is, these are like fairly involved sort 1637 01:09:37,250 --> 01:09:39,350 of CSE software engineering problems. 1638 01:09:39,350 --> 01:09:41,510 And optimization really works. 1639 01:09:41,510 --> 01:09:46,340 If you download like Bitcoin 0.9, it'll still work. 
1640 01:09:46,340 --> 01:09:48,229 But you're never going to catch up. 1641 01:09:48,229 --> 01:09:49,729 Maybe not never, I don't know. 1642 01:09:49,729 --> 01:09:52,939 If you have a fast computer, but it'll take months and months. 1643 01:09:52,939 --> 01:09:55,010 And as people have been updating the software 1644 01:09:55,010 --> 01:09:57,230 and making it faster, making it more efficient, 1645 01:09:57,230 --> 01:09:58,310 now it's quite fast. 1646 01:09:58,310 --> 01:10:01,790 And you can sync the whole thing in a few hours 1647 01:10:01,790 --> 01:10:03,810 on a good computer. 1648 01:10:03,810 --> 01:10:05,780 So address index is one of those things where 1649 01:10:05,780 --> 01:10:08,930 it hasn't had like the full force of all these Bitcoin 1650 01:10:08,930 --> 01:10:12,088 protocol coder people on it, because it's sort of seen 1651 01:10:12,088 --> 01:10:14,630 as like, well, yeah that's kind of a fun feature if you want. 1652 01:10:14,630 --> 01:10:18,900 But it's not like a core utility of Bitcoin. 1653 01:10:18,900 --> 01:10:23,040 So yeah, it is a database, maybe not the best way 1654 01:10:23,040 --> 01:10:24,840 to think of it though. 1655 01:10:24,840 --> 01:10:27,480 Don't think of the blockchain as like a global shared database, 1656 01:10:27,480 --> 01:10:28,470 because it sort of is. 1657 01:10:28,470 --> 01:10:30,570 But it's a fairly specific database 1658 01:10:30,570 --> 01:10:34,470 that isn't useful for many other things. 1659 01:10:34,470 --> 01:10:36,100 Yeah, and it's also untrusted. 1660 01:10:36,100 --> 01:10:39,250 Another part of why is it's untrusted. 1661 01:10:39,250 --> 01:10:41,560 Most of these things exist so that they 1662 01:10:41,560 --> 01:10:43,570 can be used over the peer-to-peer network. 1663 01:10:43,570 --> 01:10:45,610 If you request a block, I'll give it to you. 1664 01:10:45,610 --> 01:10:47,140 If you give me a transaction, I'll 1665 01:10:47,140 --> 01:10:49,578 match it against my UTXO set.
1666 01:10:49,578 --> 01:10:51,370 But an address index doesn't work that way. 1667 01:10:51,370 --> 01:10:52,245 It's sort of trusted. 1668 01:10:52,245 --> 01:10:54,800 I can easily omit things. 1669 01:10:54,800 --> 01:10:56,290 If you say, hey, I've got a key. 1670 01:10:56,290 --> 01:10:58,410 What are the transactions involved with this key? 1671 01:10:58,410 --> 01:11:00,160 I can omit things very easily, and there's 1672 01:11:00,160 --> 01:11:02,650 no way for you to prove it or verify it. 1673 01:11:02,650 --> 01:11:06,820 So your DB queries aren't really given out to network peers. 1674 01:11:06,820 --> 01:11:08,030 And network peers are scary. 1675 01:11:08,030 --> 01:11:11,190 And you need to ban them if they act funny. 1676 01:11:11,190 --> 01:11:12,440 And this happens all the time. 1677 01:11:12,440 --> 01:11:14,360 If you look through Bitcoin logs, 1678 01:11:14,360 --> 01:11:17,108 and you have a node that's up, every few seconds, you're 1679 01:11:17,108 --> 01:11:19,400 going to be disconnecting from someone or banning them, 1680 01:11:19,400 --> 01:11:22,340 because they're doing something crazy, trying to hack into you 1681 01:11:22,340 --> 01:11:23,450 or whatever. 1682 01:11:23,450 --> 01:11:25,010 So basically, all you're doing is 1683 01:11:25,010 --> 01:11:27,950 you're providing headers, blocks, transactions. 1684 01:11:27,950 --> 01:11:30,400 And you're sharing the IPs of other nodes. 1685 01:11:30,400 --> 01:11:33,190 You try to simplify it. 1686 01:11:33,190 --> 01:11:34,780 Other questions? 1687 01:11:34,780 --> 01:11:38,530 Yeah, bad database, good for consensus, it kind of works. 1688 01:11:38,530 --> 01:11:41,080 Everyone's got the same UTXO set, 1689 01:11:41,080 --> 01:11:42,700 even though they all really would 1690 01:11:42,700 --> 01:11:44,270 like to change that UTXO set. 1691 01:11:44,270 --> 01:11:46,120 I would much rather everyone had a UTXO 1692 01:11:46,120 --> 01:11:50,780 set where I had that 27-coin UTXO.
1693 01:11:50,780 --> 01:11:52,570 So almost everyone in the system 1694 01:11:52,570 --> 01:11:55,540 would rather there was a different UTXO set. 1695 01:11:55,540 --> 01:11:58,700 And yet, they all manage to agree on a single UTXO set, 1696 01:11:58,700 --> 01:12:00,850 so pretty cool.
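To make the earlier address-index discussion concrete, here is a rough Python sketch of the two ways to answer "I have this key hash, do I have any money?" Everything in it is invented for illustration: the three-block toy chain, the dict layout of a transaction, the owner names standing in for pubkey hashes, and the helper names `scan_for_key` and `build_address_index`. It is not how any real node lays out its data.

```python
from collections import defaultdict

# Toy chain: each block is a list of transactions (all values are made up).
# A transaction has a txid, the outpoints it spends, and (pubkey hash, amount) outputs.
BLOCKS = [
    [{"txid": "tx1", "inputs": [], "outputs": [("alice", 50)]}],
    [{"txid": "tx2", "inputs": [("tx1", 0)], "outputs": [("bob", 20), ("alice", 30)]}],
    [{"txid": "tx3", "inputs": [("tx2", 0)], "outputs": [("carol", 20)]}],
]


def scan_for_key(blocks, pubkey_hash):
    """Replay every block in order; return the unspent outputs paying pubkey_hash."""
    mine = {}
    for block in blocks:
        for tx in block:
            # Anything of ours that this transaction spends is gone.
            for spent in tx["inputs"]:
                mine.pop(spent, None)
            # Anything it pays to our key hash is newly ours.
            for index, (owner, amount) in enumerate(tx["outputs"]):
                if owner == pubkey_hash:
                    mine[(tx["txid"], index)] = amount
    return mine


def build_address_index(blocks):
    """Extra index: pubkey hash -> every outpoint that ever paid it (spent or not)."""
    index = defaultdict(list)
    for block in blocks:
        for tx in block:
            for i, (owner, _amount) in enumerate(tx["outputs"]):
                index[owner].append((tx["txid"], i))
    return index


print(scan_for_key(BLOCKS, "alice"))          # full linear replay of the chain
print(build_address_index(BLOCKS)["alice"])   # direct lookup once the index exists
```

The scan touches every transaction each time a key is imported, which is the slow replay described in the lecture. The index answers immediately but has to be built and stored alongside the chain, and a peer serving it could silently leave entries out.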