1 00:00:00,090 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,050 Your support will help MIT OpenCourseWare 4 00:00:06,050 --> 00:00:10,140 continue to offer high quality educational resources for free. 5 00:00:10,140 --> 00:00:12,690 To make a donation or to view additional materials 6 00:00:12,690 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,261 at ocw.mit.edu. 8 00:00:21,169 --> 00:00:22,210 JEREMY KEPNER: All right. 9 00:00:22,210 --> 00:00:23,840 I want to thank you all for coming 10 00:00:23,840 --> 00:00:28,710 to the third lecture of the Signal Processing 11 00:00:28,710 --> 00:00:31,262 on Databases class. 12 00:00:31,262 --> 00:00:33,340 So we will go-- just to remind folks, 13 00:00:33,340 --> 00:00:36,200 all the material is available in the distribution 14 00:00:36,200 --> 00:00:38,820 of the software, which you all have 15 00:00:38,820 --> 00:00:41,470 in your [INAUDIBLE] accounts. 16 00:00:41,470 --> 00:00:44,540 Here, we're going to go to lecture two. 17 00:00:49,310 --> 00:00:50,400 Courses. 18 00:00:50,400 --> 00:00:52,840 Course, which brings it together. 19 00:00:52,840 --> 00:00:54,530 Signal processing, which is based 20 00:00:54,530 --> 00:00:57,720 on detection theory, which is based on linear algebra 21 00:00:57,720 --> 00:01:02,320 with databases, which are based on strings and searching 22 00:01:02,320 --> 00:01:06,260 and really bringing those two concepts together 23 00:01:06,260 --> 00:01:07,410 in the course. 24 00:01:07,410 --> 00:01:10,880 So that we can use the mathematics 25 00:01:10,880 --> 00:01:16,930 we know from detection theory and apply it into new domains. 26 00:01:16,930 --> 00:01:22,070 And I'm glad to see we have a full house again. 27 00:01:22,070 --> 00:01:26,560 So far, I've failed in scaring off people. 
28 00:01:26,560 --> 00:01:28,290 So we'll see if we can do a better job. 29 00:01:28,290 --> 00:01:30,726 I think this particular lecture should 30 00:01:30,726 --> 00:01:31,850 do a very good job of that. 31 00:01:35,780 --> 00:01:37,660 We're going to be getting into a little bit 32 00:01:37,660 --> 00:01:40,810 more of the mathematics that 33 00:01:40,810 --> 00:01:45,940 underpins the associative array construct that is in the D4M 34 00:01:45,940 --> 00:01:49,040 technology that we're using in the class 35 00:01:49,040 --> 00:01:57,130 and gets into some group theory concepts that might be 36 00:01:57,130 --> 00:01:59,620 a long time since any of you had any of that or even 37 00:01:59,620 --> 00:02:00,700 recall what that is. 38 00:02:00,700 --> 00:02:05,300 It's actually not that complicated. 39 00:02:05,300 --> 00:02:07,320 Personally, this is my favorite lecture 40 00:02:07,320 --> 00:02:11,560 in terms of it really gets into the deeper underlying elements. 41 00:02:14,890 --> 00:02:18,180 I would fully expect it to make your head hurt a little bit. 42 00:02:18,180 --> 00:02:22,880 We're going to do sort of a fairly rapid blast 43 00:02:22,880 --> 00:02:26,610 through the mathematics here. 44 00:02:26,610 --> 00:02:29,200 And you won't really have to know this really 45 00:02:29,200 --> 00:02:31,620 to effectively use it. 46 00:02:31,620 --> 00:02:36,500 Although, it's good to know that it exists 47 00:02:36,500 --> 00:02:38,000 and that there might be pieces of it 48 00:02:38,000 --> 00:02:42,560 that you'll want to dive into deeper. 49 00:02:42,560 --> 00:02:45,740 And so really, the title of this talk 50 00:02:45,740 --> 00:02:48,410 is called Spreadsheets, Big Tables, and the Algebra 51 00:02:48,410 --> 00:02:50,150 of Associative Arrays. 52 00:02:50,150 --> 00:02:52,110 We've lectured on that topic in the past. 53 00:02:52,110 --> 00:02:56,170 So with that, we will get right into it. 
54 00:02:56,170 --> 00:02:57,960 So here's the outline for the lecture. 55 00:02:57,960 --> 00:02:59,910 And I should say, you know, there's 56 00:02:59,910 --> 00:03:01,590 a balance between the lecture, and then 57 00:03:01,590 --> 00:03:03,120 we'll have some examples. 58 00:03:03,120 --> 00:03:07,200 This particular lecture will be a little bit more heavy 59 00:03:07,200 --> 00:03:08,210 on the lecture side. 60 00:03:08,210 --> 00:03:11,620 And the example piece is a little bit shorter. 61 00:03:11,620 --> 00:03:15,280 And so a little bit more lecture and a little less example 62 00:03:15,280 --> 00:03:17,980 on this particular class. 63 00:03:17,980 --> 00:03:20,720 So we're going to talk about kind of a mathematical concept 64 00:03:20,720 --> 00:03:22,490 for what are spreadsheets. 65 00:03:22,490 --> 00:03:24,370 And in the introduction here, talk 66 00:03:24,370 --> 00:03:26,070 about what our theoretical goals are, 67 00:03:26,070 --> 00:03:28,120 really what associative arrays are. 68 00:03:28,120 --> 00:03:30,090 Getting into the mathematical definitions, 69 00:03:30,090 --> 00:03:31,180 what is the group theory? 70 00:03:31,180 --> 00:03:36,340 And then leading into the sort of traditional linear algebra, 71 00:03:36,340 --> 00:03:39,292 vector spaces and other types of things, 72 00:03:39,292 --> 00:03:40,750 sort of getting you a sense of this 73 00:03:40,750 --> 00:03:44,780 really is what connects those two things at the deeper 74 00:03:44,780 --> 00:03:45,770 mathematical level. 75 00:03:49,240 --> 00:03:51,930 So what are spreadsheets and big tables? 76 00:03:51,930 --> 00:03:59,740 So spreadsheets-- obviously, Microsoft Excel 77 00:03:59,740 --> 00:04:02,800 tends to dominate the spreadsheet world-- 78 00:04:02,800 --> 00:04:05,580 are arguably the most commonly used analytics structure 79 00:04:05,580 --> 00:04:06,940 on Earth. 
80 00:04:06,940 --> 00:04:10,830 Probably 100 million people, if not a larger number, 81 00:04:10,830 --> 00:04:15,620 use a spreadsheet every single day. 82 00:04:15,620 --> 00:04:18,329 Perhaps a good fraction of the planet 83 00:04:18,329 --> 00:04:23,620 now has used a spreadsheet at some point or the other. 84 00:04:23,620 --> 00:04:26,600 Big tables, which are these large triple store 85 00:04:26,600 --> 00:04:32,120 databases that we have discussed in prior lectures, 86 00:04:32,120 --> 00:04:35,550 are really what is used to store most of the analyzed data 87 00:04:35,550 --> 00:04:36,050 on Earth. 88 00:04:36,050 --> 00:04:41,660 So Google, Amazon, they all have these giant triple store 89 00:04:41,660 --> 00:04:47,940 databases, which can be thought about as gigantic spreadsheets 90 00:04:47,940 --> 00:04:51,480 really, rows and columns and values and stuff. 91 00:04:51,480 --> 00:04:55,170 And the power is that they really can simultaneously 92 00:04:55,170 --> 00:04:58,510 store diverse data. 93 00:04:58,510 --> 00:05:01,160 You can hold in the same data structure strings, 94 00:05:01,160 --> 00:05:03,260 and dates, and integers, and reals, 95 00:05:03,260 --> 00:05:05,480 and all different types of data. 96 00:05:05,480 --> 00:05:09,230 And you can treat them very differently. 97 00:05:09,230 --> 00:05:11,330 As we've shown here, this is just one spreadsheet. 98 00:05:11,330 --> 00:05:13,640 We can treat it as a matrix. 99 00:05:13,640 --> 00:05:20,040 We can treat it as functions, hash table, little mini 100 00:05:20,040 --> 00:05:21,580 databases. 101 00:05:21,580 --> 00:05:23,720 All in the same spreadsheet, we can kind of 102 00:05:23,720 --> 00:05:25,090 hold this information. 103 00:05:25,090 --> 00:05:29,666 So it's a very powerful concept, the spreadsheet. 104 00:05:29,666 --> 00:05:35,420 Yet, we lack any formal mathematical basis for this. 
105 00:05:35,420 --> 00:05:38,870 In fact, I looked up in the American Mathematical 106 00:05:38,870 --> 00:05:44,180 Association in the SIAM databases the word spreadsheet. 107 00:05:44,180 --> 00:05:46,390 And it did not appear in a single title 108 00:05:46,390 --> 00:05:49,312 or an abstract other than, say, maybe with reference 109 00:05:49,312 --> 00:05:50,020 to some software. 110 00:05:50,020 --> 00:05:56,910 But this core thing that we use every day-- 111 00:05:56,910 --> 00:05:59,217 every single time we do a Google search we're using it, 112 00:05:59,217 --> 00:06:00,800 every single time we do a spreadsheet, 113 00:06:00,800 --> 00:06:07,210 we're using it-- has no formal mathematical structure. 114 00:06:07,210 --> 00:06:11,030 And so that seems like a little bit of a problem. 115 00:06:11,030 --> 00:06:13,930 Maybe we should have some formal mathematical structure. 116 00:06:13,930 --> 00:06:17,600 And so this mathematical thing called an associative array 117 00:06:17,600 --> 00:06:20,760 actually gives us a mathematical structure that 118 00:06:20,760 --> 00:06:23,920 really encompasses this in a fairly nice way. 119 00:06:23,920 --> 00:06:25,910 I'm sure there's cases that it doesn't. 120 00:06:25,910 --> 00:06:29,540 But the core of what it is really, really works. 121 00:06:29,540 --> 00:06:33,090 So I think that's a powerful concept. 122 00:06:33,090 --> 00:06:34,760 So what is our overall goal here? 123 00:06:34,760 --> 00:06:36,520 So we want to create a formal basis 124 00:06:36,520 --> 00:06:39,260 for working with these types of data structures in a more 125 00:06:39,260 --> 00:06:42,470 mathematically rigorous way. 126 00:06:42,470 --> 00:06:44,410 This allows us to create better algorithms. 127 00:06:44,410 --> 00:06:47,680 Because we can now apply the traditional tools 128 00:06:47,680 --> 00:06:53,910 of linear algebra and detection theory to these types of data. 
129 00:06:53,910 --> 00:06:56,720 It saves time in terms of, generally, 130 00:06:56,720 --> 00:06:58,770 if you have the mathematical background, 131 00:06:58,770 --> 00:07:02,020 you can implement things using a lot less code, which 132 00:07:02,020 --> 00:07:03,970 usually means a lot less lines. 133 00:07:03,970 --> 00:07:07,470 And often, you can build on top of existing 134 00:07:07,470 --> 00:07:10,830 optimized libraries, which give you better performance. 135 00:07:10,830 --> 00:07:13,920 And we throw out a number of 50 x less effort. 136 00:07:13,920 --> 00:07:18,640 For those with the required training, they can do this. 137 00:07:18,640 --> 00:07:23,330 And that is something that we've observed a number of occasions. 138 00:07:23,330 --> 00:07:27,600 And you know, it's good for managers, too. 139 00:07:27,600 --> 00:07:31,100 I've tried to recruit group leaders at Lincoln Laboratory. 140 00:07:31,100 --> 00:07:33,980 It's like, you really should use D4M. 141 00:07:33,980 --> 00:07:37,260 I could connect it to SAP no problem. 142 00:07:37,260 --> 00:07:39,660 Get them feeling like they're actually doing programming 143 00:07:39,660 --> 00:07:42,158 again, you know? 144 00:07:42,158 --> 00:07:44,590 I haven't had too many takers of that yet. 145 00:07:44,590 --> 00:07:48,190 So hopefully, maybe as you folks start using it, 146 00:07:48,190 --> 00:07:50,520 they'll see you using it. 147 00:07:50,520 --> 00:07:52,340 Binding to the various enterprise 148 00:07:52,340 --> 00:07:56,100 databases at the laboratory is very easy to do. 149 00:07:56,100 --> 00:07:59,640 And using D4M to analyze it is fairly easy to do. 150 00:07:59,640 --> 00:08:04,530 So I actually use it now instead of-- if I'm going to use Excel, 151 00:08:04,530 --> 00:08:07,750 I'll often be like, no, I'll just write a CSV file 152 00:08:07,750 --> 00:08:11,180 and just use D4M for manipulating basic Excel 153 00:08:11,180 --> 00:08:13,030 types of operations. 
154 00:08:13,030 --> 00:08:16,650 Because it feels a lot more natural to me. 155 00:08:16,650 --> 00:08:19,960 And so I certainly have done that. 156 00:08:25,620 --> 00:08:28,030 So we're going to get into this associative array 157 00:08:28,030 --> 00:08:31,280 concept, which we discussed in some of the earlier lectures. 158 00:08:31,280 --> 00:08:34,179 We're going to get into it more deeply now. 159 00:08:34,179 --> 00:08:42,510 But the real benefit of the associative array concept 160 00:08:42,510 --> 00:08:47,500 is it naturally pulls together four different ways 161 00:08:47,500 --> 00:08:52,080 of thinking about your data into one structure. 162 00:08:52,080 --> 00:08:54,360 And you can apply the reasoning from any 163 00:08:54,360 --> 00:08:58,790 of these four ways of thinking sort of interchangeably now. 164 00:08:58,790 --> 00:09:04,835 So we can think about data as associative arrays, 165 00:09:04,835 --> 00:09:10,050 as arrays that are indexed by words. 166 00:09:10,050 --> 00:09:14,220 This is similar to what Perl does in hashes of hashes, where 167 00:09:14,220 --> 00:09:16,960 we can have row keys that are words, 168 00:09:16,960 --> 00:09:18,850 and column keys that are words, and then 169 00:09:18,850 --> 00:09:20,225 values that are words or numbers. 170 00:09:23,130 --> 00:09:26,810 This concept is one to one with our triple store databases. 171 00:09:26,810 --> 00:09:29,460 So we can go naturally from triples to this 172 00:09:29,460 --> 00:09:31,820 and inserting data into a database. 173 00:09:31,820 --> 00:09:34,409 And if our data is an associative array, 174 00:09:34,409 --> 00:09:36,700 we can very quickly say, please insert into a database. 175 00:09:36,700 --> 00:09:39,850 And if we do a query, it can return an associative array 176 00:09:39,850 --> 00:09:42,330 to us very nicely. 
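To make the "arrays indexed by words" and "one to one with triples" points concrete, here is a minimal Python sketch (not D4M itself, which is MATLAB). The nested-dict layout mirrors the hashes-of-hashes idea mentioned above; the function names `triples_to_assoc` and `assoc_to_triples` and the sample triples are made up for illustration.

```python
from collections import defaultdict


def triples_to_assoc(triples):
    """Build a nested dict {row_key: {col_key: value}} from (row, col, value) triples."""
    assoc = defaultdict(dict)
    for row, col, val in triples:
        assoc[row][col] = val
    return dict(assoc)


def assoc_to_triples(assoc):
    """Flatten the nested dict back into a sorted list of (row, col, value) triples."""
    return sorted(
        (row, col, val)
        for row, cols in assoc.items()
        for col, val in cols.items()
    )


# Hypothetical data: row key, column key, value are all words.
triples = [("alice", "bob", "cited"), ("bob", "carl", "cited")]
A = triples_to_assoc(triples)

print(A["alice"]["bob"])    # look up a value by row word and column word
print(assoc_to_triples(A))  # the same information, viewed as triples again
```

Going back and forth loses nothing, which is the sense in which the two views are interchangeable in this sketch.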
177 00:09:42,330 --> 00:09:47,300 This obviously connects to graphs 178 00:09:47,300 --> 00:09:49,420 if we want to store relationships and graphs 179 00:09:49,420 --> 00:09:51,320 between vertices and entities. 180 00:09:51,320 --> 00:09:53,690 So we have alice, bob, cited, which 181 00:09:53,690 --> 00:09:56,830 is a dual to the linear algebraic matrix 182 00:09:56,830 --> 00:09:59,410 formulation of graphs. 183 00:09:59,410 --> 00:10:04,570 Row, Alice, column, Bob, there's an edge associated between them. 184 00:10:04,570 --> 00:10:07,270 So we have these four very powerful ways 185 00:10:07,270 --> 00:10:08,810 of looking at our data. 186 00:10:08,810 --> 00:10:11,070 And they're now all sort of brought together 187 00:10:11,070 --> 00:10:13,090 as one concept. 188 00:10:13,090 --> 00:10:16,110 So I kind of argued it's one of the best deals in math 189 00:10:16,110 --> 00:10:16,760 you'll get. 190 00:10:16,760 --> 00:10:19,030 If you know one, you get the other three for free. 191 00:10:19,030 --> 00:10:22,170 And that's really what the whole point of math is, right? 192 00:10:22,170 --> 00:10:24,620 If you give me a whole bunch of information, 193 00:10:24,620 --> 00:10:27,310 I can calculate something else that you didn't know. 194 00:10:27,310 --> 00:10:28,890 And typically, in signal processing 195 00:10:28,890 --> 00:10:30,556 that deal is more like, well, if you give me 196 00:10:30,556 --> 00:10:33,570 50 pieces of information, I can give you the 51st. 197 00:10:33,570 --> 00:10:37,040 Here, if you give me one, I can give you three. 198 00:10:37,040 --> 00:10:38,290 So it's a really good bargain. 199 00:10:41,510 --> 00:10:46,180 As a little anecdote, I was working with my daughter 200 00:10:46,180 --> 00:10:49,360 teaching her the parallel geometry lines problem. 201 00:10:49,360 --> 00:10:51,350 You know, parallel lines. 202 00:10:51,350 --> 00:10:53,530 I was telling her this is the best deal in math. 
203 00:10:53,530 --> 00:10:56,071 Because you know one angle, and you get the other seven free. 204 00:10:56,071 --> 00:10:57,687 So we're not quite as good as that. 205 00:10:57,687 --> 00:10:59,520 That's probably the best deal in math going. 206 00:10:59,520 --> 00:11:01,390 And that's why they teach it so much. 207 00:11:01,390 --> 00:11:05,060 If you know one angle, you get the other seven for free. 208 00:11:05,060 --> 00:11:06,900 Here, if you know one of these concepts, 209 00:11:06,900 --> 00:11:08,920 you essentially get the other three for free. 210 00:11:08,920 --> 00:11:10,890 So it's a very powerful concept. 211 00:11:10,890 --> 00:11:16,200 And it's really kind of at the core of the benefit 212 00:11:16,200 --> 00:11:17,345 of this technology. 213 00:11:23,110 --> 00:11:25,400 As much as possible, we try and make it 214 00:11:25,400 --> 00:11:28,120 that any operation on an associative array 215 00:11:28,120 --> 00:11:31,480 returns another associative array. 216 00:11:31,480 --> 00:11:34,120 So this gives you sort of the mathematical concept 217 00:11:34,120 --> 00:11:34,700 of closure. 218 00:11:34,700 --> 00:11:38,180 Linear algebra works because the vast majority 219 00:11:38,180 --> 00:11:41,820 of operations on a matrix return another matrix. 220 00:11:41,820 --> 00:11:45,015 And so you can compose these together mathematically 221 00:11:45,015 --> 00:11:47,850 and create sequences of operations, 222 00:11:47,850 --> 00:11:50,940 which are conceptually easy to think about. 223 00:11:50,940 --> 00:11:55,200 So because of that, we can create operations 224 00:11:55,200 --> 00:11:58,800 like adding associative arrays, subtracting associative arrays, 225 00:11:58,800 --> 00:12:02,460 and-ing, or-ing them, multiplying them together. 226 00:12:02,460 --> 00:12:04,590 And the result of every one of these operations 227 00:12:04,590 --> 00:12:07,120 will be another associative array. 
228 00:12:07,120 --> 00:12:10,820 And so that's a very powerful, powerful concept. 229 00:12:10,820 --> 00:12:15,275 And likewise, we can do very easy query operations 230 00:12:15,275 --> 00:12:20,400 if we want to get rows Alice and Bob, row Alice, rows 231 00:12:20,400 --> 00:12:25,070 beginning with al, rows Alice to Bob, the first two rows, 232 00:12:25,070 --> 00:12:27,840 everything equal to 47. 233 00:12:27,840 --> 00:12:30,450 And every single one of these will return 234 00:12:30,450 --> 00:12:32,690 another associative array. 235 00:12:32,690 --> 00:12:35,200 So again, this compositional concept 236 00:12:35,200 --> 00:12:36,840 is very, very important. 237 00:12:36,840 --> 00:12:39,370 It's not to say we don't use the triples formulation, too. 238 00:12:39,370 --> 00:12:41,680 And we have routines that very quickly bounce you 239 00:12:41,680 --> 00:12:42,847 back and forth between them. 240 00:12:42,847 --> 00:12:44,554 Because there are times you're just like, 241 00:12:44,554 --> 00:12:46,570 no, I want to view this as a set of triples. 242 00:12:46,570 --> 00:12:48,611 And I want to do new operations that way as well. 243 00:12:48,611 --> 00:12:50,140 So we support them both. 244 00:12:50,140 --> 00:12:55,080 But this is really sort of the core composable aspect. 245 00:12:55,080 --> 00:12:58,640 In fact, most people initially develop their codes 246 00:12:58,640 --> 00:13:02,220 in this formalism, and then maybe 247 00:13:02,220 --> 00:13:05,670 change some of their lines to working on native triples 248 00:13:05,670 --> 00:13:10,350 if they need to get some improvement in performance, 249 00:13:10,350 --> 00:13:12,990 memory handling, or the like. 250 00:13:12,990 --> 00:13:14,674 And we certainly do that, too. 251 00:13:18,160 --> 00:13:19,940 These associative arrays are actually 252 00:13:19,940 --> 00:13:21,710 very easy to implement. 
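The query operations listed above (a set of rows, rows beginning with a prefix, a range of rows, everything equal to 47) can be sketched in Python as follows. This is an illustrative stand-in, not the D4M syntax: the associative array is assumed to be a plain dict keyed by (row, column) tuples, and the helper names and sample values are hypothetical. The point it shows is closure: every query returns the same kind of object, so queries compose.

```python
# Hypothetical associative array: (row key, column key) -> value.
A = {
    ("alice", "bob"): 47,
    ("alice", "carl"): 1,
    ("bob", "alice"): 47,
    ("carl", "bob"): 2,
}


def rows_in(A, names):
    """Keep only entries whose row key is one of the given names."""
    return {k: v for k, v in A.items() if k[0] in names}


def rows_with_prefix(A, prefix):
    """Keep only entries whose row key begins with the prefix."""
    return {k: v for k, v in A.items() if k[0].startswith(prefix)}


def rows_between(A, lo, hi):
    """Keep entries whose row key lies in the inclusive lexicographic range [lo, hi]."""
    return {k: v for k, v in A.items() if lo <= k[0] <= hi}


def values_equal(A, target):
    """Keep only entries whose value equals the target."""
    return {k: v for k, v in A.items() if v == target}


# Because each query returns another dict of the same shape, queries chain.
result = values_equal(rows_with_prefix(A, "al"), 47)
print(result)
```

An empty result is still a valid (empty) associative array in this sketch, so a chain of queries never has to special-case "nothing found".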
253 00:13:21,710 --> 00:13:26,080 And the whole D4M library-- it's probably a little larger now, 254 00:13:26,080 --> 00:13:30,370 but when I wrote this-- it was about 2,000 lines, 255 00:13:30,370 --> 00:13:33,520 which is very easy to do in programming environments that 256 00:13:33,520 --> 00:13:37,620 have first class support for two dimensional arrays, operator 257 00:13:37,620 --> 00:13:38,130 overloading. 258 00:13:38,130 --> 00:13:40,050 We're overloading operators like crazy 259 00:13:40,050 --> 00:13:42,560 here, the plus, the minus, the and, parentheses, 260 00:13:42,560 --> 00:13:44,570 all that kind of stuff. 261 00:13:44,570 --> 00:13:46,460 And also, have first class support 262 00:13:46,460 --> 00:13:47,710 for sparse linear algebra. 263 00:13:47,710 --> 00:13:49,300 Internally, under the covers, we're 264 00:13:49,300 --> 00:13:52,920 using sparse linear algebra like crazy to make all this work. 265 00:13:56,175 --> 00:14:00,000 Of the languages on Earth that have those features, 266 00:14:00,000 --> 00:14:02,900 MATLAB is by far the most popular one, 267 00:14:02,900 --> 00:14:05,270 which is why we have chosen that language 268 00:14:05,270 --> 00:14:07,445 to implement these features. 269 00:14:07,445 --> 00:14:09,570 There are other languages that have these features. 270 00:14:09,570 --> 00:14:11,700 And you can implement that. 271 00:14:11,700 --> 00:14:14,610 But there are other languages that don't. 272 00:14:14,610 --> 00:14:16,940 And if you want to implement D4M in those languages, 273 00:14:16,940 --> 00:14:18,160 you certainly can. 274 00:14:18,160 --> 00:14:20,090 You just have to write more code. 275 00:14:20,090 --> 00:14:22,430 So this is sort of the language in which it can be done 276 00:14:22,430 --> 00:14:24,290 in the minimum amount of code. 
277 00:14:24,290 --> 00:14:27,150 And again, we find that for complex analytics 278 00:14:27,150 --> 00:14:29,300 that people typically can get their work done 279 00:14:29,300 --> 00:14:34,680 with 50 times less code than equivalent Java and SQL. 280 00:14:34,680 --> 00:14:37,150 And this naturally leads to high performance parallel 281 00:14:37,150 --> 00:14:40,160 implementations, because these are all arrays and matrices. 282 00:14:40,160 --> 00:14:43,000 And we have a very well-established literature 283 00:14:43,000 --> 00:14:46,090 in the community about how to do parallel linear algebra, 284 00:14:46,090 --> 00:14:49,290 how to make matrices work in parallel. 285 00:14:49,290 --> 00:14:50,947 It's a very well-studied discipline. 286 00:14:50,947 --> 00:14:52,280 And we have a lot of technology. 287 00:14:58,010 --> 00:14:59,910 Just to remind people of how we actually 288 00:14:59,910 --> 00:15:03,120 store the data in our databases and how the data will often 289 00:15:03,120 --> 00:15:05,240 come out in our associative arrays, 290 00:15:05,240 --> 00:15:07,630 we use this exploded schema. 291 00:15:07,630 --> 00:15:10,960 So if we had data that looked like a standard table 292 00:15:10,960 --> 00:15:14,790 here with some kind of key and three columns 293 00:15:14,790 --> 00:15:17,600 here with different values associated 294 00:15:17,600 --> 00:15:20,070 with every one of them, if we were to insert these 295 00:15:20,070 --> 00:15:23,150 into a standard, say, SQL type table, 296 00:15:23,150 --> 00:15:29,690 we would just simply create a table like this in SQL. 297 00:15:29,690 --> 00:15:32,500 But if we wanted to look up anything quickly, 298 00:15:32,500 --> 00:15:34,620 we would have to create ancillary tables that 299 00:15:34,620 --> 00:15:39,050 would store indices to allow us to do things very quickly. 
300 00:15:39,050 --> 00:15:41,960 And if we wanted to add a totally new type of data, 301 00:15:41,960 --> 00:15:46,140 we would have to rethink our schema. 302 00:15:46,140 --> 00:15:49,360 So what we typically do is we create this exploded schema. 303 00:15:49,360 --> 00:15:53,430 Because our triple stores are very comfortable 304 00:15:53,430 --> 00:15:56,310 dynamically storing very large numbers of columns. 305 00:15:56,310 --> 00:16:00,800 So we essentially typically take the type and append the value. 306 00:16:00,800 --> 00:16:02,430 So we create a series of triples here, 307 00:16:02,430 --> 00:16:04,510 usually with some minimal information 308 00:16:04,510 --> 00:16:06,990 for the actual stored value, something 309 00:16:06,990 --> 00:16:08,490 we would never want to look up. 310 00:16:08,490 --> 00:16:10,550 Again, by itself this doesn't give you much. 311 00:16:10,550 --> 00:16:14,470 Because most of these triple stores have an orientation, 312 00:16:14,470 --> 00:16:15,710 typically row oriented. 313 00:16:15,710 --> 00:16:20,240 So you can get any row quickly with relatively little effort. 314 00:16:24,370 --> 00:16:28,710 And so we also store the transpose of that. 315 00:16:28,710 --> 00:16:31,850 We now create a table pair, such that we can look up 316 00:16:31,850 --> 00:16:35,270 any row or column very quickly. 317 00:16:35,270 --> 00:16:38,000 And so essentially, we've indexed all the data. 318 00:16:38,000 --> 00:16:41,230 And you can now do things very, very quickly. 319 00:16:41,230 --> 00:16:43,540 So that's a very powerful, powerful concept. 320 00:16:43,540 --> 00:16:45,170 And we exploit it all the time. 321 00:16:45,170 --> 00:16:47,000 And of course, this whole concept 322 00:16:47,000 --> 00:16:48,750 looks very much like an associative array. 323 00:16:48,750 --> 00:16:51,170 Mathematically, we can treat it as an associative array. 324 00:17:05,280 --> 00:17:05,780 All right. 
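The exploded schema and the table pair described above can be sketched roughly like this in Python. Everything concrete here is an assumption for illustration: the `|` separator, the stored value `1`, the sample log records, and the helper names `explode` and `transpose` are not taken from the lecture. The idea shown is that the column type and the value are fused into a single column key, and a transposed copy makes column lookups as direct as row lookups.

```python
def explode(table, sep="|"):
    """Turn {row_key: {column_name: value}} into {(row_key, 'column|value'): 1}.

    The column name and the value are appended into one column key; the stored
    value is a minimal placeholder (1) that is never searched on.
    """
    return {
        (row, f"{col}{sep}{val}"): 1
        for row, fields in table.items()
        for col, val in fields.items()
    }


def transpose(A):
    """Swap row and column keys, giving the second table of the table pair."""
    return {(col, row): val for (row, col), val in A.items()}


# Hypothetical dense table: a row key plus two typed columns.
table = {
    "log1": {"src": "a.com", "dst": "b.com"},
    "log2": {"src": "a.com", "dst": "c.com"},
}

T = explode(table)   # row oriented: find everything about one record
Tt = transpose(T)    # column oriented: find every record with a given value

# "Which records have src equal to a.com?" becomes a row lookup in Tt.
matches = sorted(row for (col_key, row) in Tt if col_key == "src|a.com")
print(matches)
```

In a real row-oriented triple store the lookup would be a row scan on the transposed table rather than a Python loop; the dict here only stands in for that store.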
325 00:17:05,780 --> 00:17:07,651 So that's sort of the easy part. 326 00:17:07,651 --> 00:17:09,109 I think we've covered a lot of this 327 00:17:09,109 --> 00:17:10,450 before in previous lectures. 328 00:17:10,450 --> 00:17:11,825 Now, we're going to get into some 329 00:17:11,825 --> 00:17:14,250 of the more mathematical elements, which 330 00:17:14,250 --> 00:17:16,140 aren't necessarily hard. 331 00:17:16,140 --> 00:17:22,960 They just impose notation that we don't necessarily 332 00:17:22,960 --> 00:17:25,970 use in our regular day to day work lives. 333 00:17:25,970 --> 00:17:27,345 So we're going to talk about what 334 00:17:27,345 --> 00:17:30,490 are the values associations, keys, functions, 335 00:17:30,490 --> 00:17:32,420 and getting into things like matrix multiply. 336 00:17:36,360 --> 00:17:42,520 So mathematically, we treat an associative array 337 00:17:42,520 --> 00:17:45,300 as sets of keys and values that are 338 00:17:45,300 --> 00:17:48,700 drawn from an infinite strict totally ordered set, 339 00:17:48,700 --> 00:17:52,150 we'll call it S. You might be like, what is that? 340 00:17:52,150 --> 00:17:54,310 What is an infinite strict totally ordered set? 341 00:17:54,310 --> 00:17:57,250 It's just a set where there's an ordering, where 342 00:17:57,250 --> 00:17:59,270 basically any two elements in the set, 343 00:17:59,270 --> 00:18:02,420 there's a function that will tell you 344 00:18:02,420 --> 00:18:04,410 one is greater than the other. 345 00:18:04,410 --> 00:18:06,400 And they also have an equal operation, too. 346 00:18:06,400 --> 00:18:08,870 So it will tell you that, you know, if I give it any two 347 00:18:08,870 --> 00:18:12,130 values, I can ask the question is one greater than the other 348 00:18:12,130 --> 00:18:13,170 or are they equal? 349 00:18:13,170 --> 00:18:16,310 So that's the strict part. 350 00:18:16,310 --> 00:18:20,800 So that encompasses a lot. 
351 00:18:20,800 --> 00:18:23,900 Obviously, it encompasses the usual things like real numbers, 352 00:18:23,900 --> 00:18:25,620 and integers, and things like that. 353 00:18:25,620 --> 00:18:28,790 But it also encompasses strings if we 354 00:18:28,790 --> 00:18:31,310 impose an ordering on them. 355 00:18:31,310 --> 00:18:34,470 And we will choose lexicographic ordering almost always 356 00:18:34,470 --> 00:18:36,890 to be our order. 357 00:18:36,890 --> 00:18:39,339 In fact, in the implementation, we sort of impose that. 358 00:18:39,339 --> 00:18:40,880 We always use lexicographic ordering. 359 00:18:40,880 --> 00:18:43,880 We don't really have another ordering that is readily 360 00:18:43,880 --> 00:18:47,110 at our disposal. 361 00:18:47,110 --> 00:18:53,770 So given those keys and values, an associative array 362 00:18:53,770 --> 00:18:59,350 is a partial function from a d dimensional set 363 00:18:59,350 --> 00:19:03,520 of keys to a value. 364 00:19:03,520 --> 00:19:08,940 So if we have the vector of keys, so typically d will be 2. 365 00:19:08,940 --> 00:19:14,340 And the first k1 will be the row key, and then a column key. 366 00:19:14,340 --> 00:19:19,570 And it will map out its partial function 367 00:19:19,570 --> 00:19:22,440 from the d keys to one value. 368 00:19:22,440 --> 00:19:28,150 So it will look like this, an associative array 369 00:19:28,150 --> 00:19:32,300 with a vector of keys, typically 2. 370 00:19:32,300 --> 00:19:39,640 I will have a value vi and that it is empty otherwise. 371 00:19:42,350 --> 00:19:46,280 So this is its function, because we only define it 372 00:19:46,280 --> 00:19:48,310 where these keys exist. 373 00:19:48,310 --> 00:19:50,900 So you can imagine that every single associative 374 00:19:50,900 --> 00:19:54,700 array is a function over the entire space 375 00:19:54,700 --> 00:19:56,660 of all possible keys. 376 00:19:56,660 --> 00:19:59,630 But we're only giving you the ones that are defined. 
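As a rough illustration of the definition just given, an associative array with d = 2 can be modelled in Python as a partial function from a pair of keys to a value, returning an "empty" marker wherever it is not defined. The class name `AssocArray`, the use of `None` for empty, and the sample entries are assumptions for this sketch; Python's built-in string comparison (lexicographic by code point) stands in for the ordering on S.

```python
class AssocArray:
    """A partial function from d keys (here d = 2) to a single value."""

    def __init__(self, entries):
        # Store only non-empty entries; None plays the role of "empty" here.
        self._data = {k: v for k, v in entries.items() if v is not None}

    def __call__(self, *keys):
        # Defined keys return their value; everything else is empty (None).
        return self._data.get(keys)

    def keys(self):
        # Key tuples sort lexicographically, element by element.
        return sorted(self._data)


# Hypothetical entries: keys are words, values may be words or numbers.
A = AssocArray({
    ("bob", "alice"): 47,
    ("alice", "bob"): "cited",
    ("alice", "carl"): None,  # empty, so it is not stored
})

print(A("alice", "bob"))   # defined
print(A("carl", "dave"))   # not defined anywhere -> None
print(A.keys())            # stored keys in lexicographic order
print("alice" < "bob")     # any two keys can be compared
```

The function is conceptually defined over every possible key pair, but only the defined entries occupy any storage.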
377 00:19:59,630 --> 00:20:02,070 And everywhere else, it's undefined. 378 00:20:02,070 --> 00:20:02,770 OK. 379 00:20:02,770 --> 00:20:06,460 This is actually a fairly large break from linear algebra, 380 00:20:06,460 --> 00:20:12,640 where we can have functions-- a matrix is really 381 00:20:12,640 --> 00:20:16,520 a function of the indices, the ij indices, which are integers. 382 00:20:16,520 --> 00:20:18,430 Here, we allow them to be anything. 383 00:20:18,430 --> 00:20:20,680 And we formally support the concept 384 00:20:20,680 --> 00:20:23,890 of a completely empty row or column. 385 00:20:23,890 --> 00:20:27,470 So associative arrays don't allow that. 386 00:20:27,470 --> 00:20:31,580 They only store not empty information. 387 00:20:31,580 --> 00:20:38,070 And in our implementation, we tend to store 0 as equivalent 388 00:20:38,070 --> 00:20:40,110 to null in this space. 389 00:20:40,110 --> 00:20:43,230 Because the underlying sparse matrix implementation 390 00:20:43,230 --> 00:20:43,820 does that. 391 00:20:43,820 --> 00:20:47,180 0 is essentially treated as the null character. 392 00:20:47,180 --> 00:20:50,290 So that's a big difference. 393 00:20:50,290 --> 00:20:53,870 Now, we can still use our linear algebraic intuition, 394 00:20:53,870 --> 00:20:58,890 but that's the fundamental difference here. 395 00:20:58,890 --> 00:21:01,700 So binary operations on any two associative 396 00:21:01,700 --> 00:21:03,280 arrays-- so I have associative array 397 00:21:03,280 --> 00:21:09,050 A1 and A2 with this binary operation-- 398 00:21:09,050 --> 00:21:14,680 are defined essentially by two functions. 399 00:21:14,680 --> 00:21:17,310 The sort of upper function here says how 400 00:21:17,310 --> 00:21:20,260 we're going to treat the keys. 401 00:21:20,260 --> 00:21:23,020 So if we have two associative arrays, 402 00:21:23,020 --> 00:21:25,820 their keys, some of them can overlap. 403 00:21:25,820 --> 00:21:27,640 Some of them cannot overlap. 
404 00:21:27,640 --> 00:21:29,370 And our function will either choose 405 00:21:29,370 --> 00:21:33,920 to look for the union or the intersection of those. 406 00:21:33,920 --> 00:21:36,560 So for instance, if we do addition, 407 00:21:36,560 --> 00:21:43,700 that's most associated with unioning the underlying keys. 408 00:21:43,700 --> 00:21:49,040 If we do other operations, like and, that's 409 00:21:49,040 --> 00:21:51,830 most consistent with intersecting 410 00:21:51,830 --> 00:21:54,260 the two sets of keys. 411 00:21:54,260 --> 00:21:55,910 So obviously, if it's union-like, 412 00:21:55,910 --> 00:21:59,350 the result will always be non-empty. 413 00:21:59,350 --> 00:22:01,600 If the function is intersection-like, 414 00:22:01,600 --> 00:22:04,550 then there's a possibility you can do a binary operation. 415 00:22:04,550 --> 00:22:08,820 And the result will be an empty associative array. 416 00:22:08,820 --> 00:22:14,500 So we have that choice there. 417 00:22:14,500 --> 00:22:16,740 And we choose union and intersection, 418 00:22:16,740 --> 00:22:19,049 because they're the ones that are-- you 419 00:22:19,049 --> 00:22:20,090 could do other functions. 420 00:22:20,090 --> 00:22:24,180 But they're the ones that are formally covered in set theory 421 00:22:24,180 --> 00:22:27,730 and keep us from having to relax this very 422 00:22:27,730 --> 00:22:30,220 general condition here. 423 00:22:30,220 --> 00:22:31,970 We can have equality. 424 00:22:31,970 --> 00:22:35,320 So we can check for intersection. 425 00:22:35,320 --> 00:22:37,100 And so we're OK there. 426 00:22:43,440 --> 00:22:49,260 So if we have an associative array, 427 00:22:49,260 --> 00:22:53,630 A1 with a set of keys here, ki, and a value, v1, 428 00:22:53,630 --> 00:22:56,910 and another associative array with an intersecting 429 00:22:56,910 --> 00:23:04,470 ki with v2, then A3 ki will be given by this function. 430 00:23:07,250 --> 00:23:10,770 It's either going to be the keys overlap. 
431 00:23:10,770 --> 00:23:15,610 So if ki are the same, they'll overlap. 432 00:23:15,610 --> 00:23:17,800 And so there you go. 433 00:23:17,800 --> 00:23:18,810 And likewise, with this. 434 00:23:18,810 --> 00:23:19,685 You have this choice. 435 00:23:19,685 --> 00:23:22,780 It could either be a union or an intersection. 436 00:23:22,780 --> 00:23:25,240 And then what we do with the values-- like, 437 00:23:25,240 --> 00:23:28,130 we now have the collision is what we'll call these. 438 00:23:28,130 --> 00:23:30,323 Two keys from two associative arrays, the row keys 439 00:23:30,323 --> 00:23:31,830 and the column keys are the same. 440 00:23:31,830 --> 00:23:34,300 Well, now, we want to do something. 441 00:23:34,300 --> 00:23:38,450 We apply this function f, which we call the collision function. 442 00:23:38,450 --> 00:23:41,910 And that will determine what the actual result is, what 443 00:23:41,910 --> 00:23:44,590 the new value of A3 will be. 444 00:23:49,970 --> 00:23:58,464 If one of these was empty, then obviously the union of this we 445 00:23:58,464 --> 00:23:59,880 don't even have to worry about it. 446 00:23:59,880 --> 00:24:03,050 That's kind of the way we approach this. 447 00:24:03,050 --> 00:24:09,380 If there is no collision, so that 448 00:24:09,380 --> 00:24:12,480 means you're applying your function with this, 449 00:24:12,480 --> 00:24:14,470 then f is never called. 450 00:24:14,470 --> 00:24:18,170 So we always just do that or that. 451 00:24:18,170 --> 00:24:20,030 So if it's a union function, then you just 452 00:24:20,030 --> 00:24:22,157 get v. If it's an intersection function, 453 00:24:22,157 --> 00:24:23,740 then you would just get the empty set. 454 00:24:23,740 --> 00:24:26,655 And the underlying function never is called. 455 00:24:26,655 --> 00:24:29,030 And you might be like, well, why would I care about that? 
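The two-function definition described above (a key rule plus a collision function f) can be sketched in Python. This is only an illustration under stated assumptions, not D4M code: associative arrays are modeled as plain dicts from (row, column) key pairs to values, and the names `combine`, `key_rule`, and `add` are hypothetical. The point it shows is that f runs only where both arrays hold a value for the same key.

```python
def combine(a1, a2, key_rule, f):
    """Combine two dict-based associative arrays.

    key_rule is "union" or "intersection" and decides which keys survive.
    f is the collision function; it is called only for keys present in both.
    """
    if key_rule == "union":
        out_keys = a1.keys() | a2.keys()
    elif key_rule == "intersection":
        out_keys = a1.keys() & a2.keys()
    else:
        raise ValueError("key_rule must be 'union' or 'intersection'")

    out = {}
    for k in out_keys:
        if k in a1 and k in a2:
            out[k] = f(a1[k], a2[k])   # collision: let f decide the value
        elif k in a1:
            out[k] = a1[k]             # no collision: f is never called
        else:
            out[k] = a2[k]
    return out


calls = []  # records every time the collision function actually runs

def add(v1, v2):
    calls.append((v1, v2))
    return v1 + v2

A1 = {("r1", "c1"): 1, ("r2", "c1"): 2}
A2 = {("r2", "c1"): 10, ("r3", "c1"): 20}

union_sum = combine(A1, A2, "union", add)
inter_sum = combine(A1, A2, "intersection", add)
```

With a union rule the non-colliding entries pass through untouched; with an intersection rule they simply drop out, and two arrays with no shared keys give an empty result.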
456 00:24:29,030 --> 00:24:33,470 Well, later on I'm going to get to deal with things like, well, 457 00:24:33,470 --> 00:24:37,770 what is the zero in this math? 458 00:24:37,770 --> 00:24:40,600 Or what is one in this math? 459 00:24:40,600 --> 00:24:44,150 And this will allow me to sort of address some of those cases 460 00:24:44,150 --> 00:24:48,980 by saying, look, if there's no key intersection, 461 00:24:48,980 --> 00:24:51,130 we can do that. 462 00:24:51,130 --> 00:24:54,970 So the high level usage of associative arrays 463 00:24:54,970 --> 00:24:57,665 is essentially dictated by this mathematics. 464 00:25:00,170 --> 00:25:03,010 And again, part of the reason we chose this very general 465 00:25:03,010 --> 00:25:06,810 definition is that, by definition, 466 00:25:06,810 --> 00:25:09,404 numbers, reals, and integers are all included here. 467 00:25:09,404 --> 00:25:10,570 We have not thrown them out. 468 00:25:10,570 --> 00:25:12,380 And we've also included strings. 469 00:25:12,380 --> 00:25:15,780 I should say at any time we can say, look, our values are just 470 00:25:15,780 --> 00:25:17,070 real numbers. 471 00:25:17,070 --> 00:25:19,890 And then we essentially get the full power of linear algebra. 472 00:25:19,890 --> 00:25:21,570 But that's true of this sort of card 473 00:25:21,570 --> 00:25:23,920 that we always try and play last. 474 00:25:23,920 --> 00:25:24,420 OK. 475 00:25:24,420 --> 00:25:27,394 We try and keep our mathematics as broad as we can. 476 00:25:27,394 --> 00:25:29,310 And if then there's particular instances like, 477 00:25:29,310 --> 00:25:32,536 oh, well, I need the values to be real numbers or complex 478 00:25:32,536 --> 00:25:34,160 numbers here, or integers, or whatever. 479 00:25:34,160 --> 00:25:35,580 We can always say, all right, the values are that. 480 00:25:35,580 --> 00:25:38,320 And now, we get a whole bunch of additional properties. 
481 00:25:38,320 --> 00:25:41,230 But we try and build up the mathematics 482 00:25:41,230 --> 00:25:45,434 in this general way first, and then sort of play that later. 483 00:25:45,434 --> 00:25:46,320 AUDIENCE: Question. 484 00:25:46,320 --> 00:25:47,111 JEREMY KEPNER: Yes. 485 00:25:47,111 --> 00:25:50,892 AUDIENCE: So the way you've-- the general definition of keys, 486 00:25:50,892 --> 00:25:53,380 that's more general than you would need to analyze 487 00:25:53,380 --> 00:25:54,630 spreadsheets. 488 00:25:54,630 --> 00:25:56,521 Wouldn't it be sufficient for spreadsheets 489 00:25:56,521 --> 00:26:01,600 to have your keys be integers [INAUDIBLE]? 490 00:26:01,600 --> 00:26:08,550 JEREMY KEPNER: So I believe that actually spreadsheets 491 00:26:08,550 --> 00:26:13,090 store things as R1, and R2, R3 internally and C1, that they 492 00:26:13,090 --> 00:26:16,880 actually use an internal triple representation, 493 00:26:16,880 --> 00:26:22,020 and then project that into this integer space. 494 00:26:22,020 --> 00:26:26,490 So what the actual spreadsheet does, I think, 495 00:26:26,490 --> 00:26:29,430 kind of depends on the specific spreadsheet. 496 00:26:29,430 --> 00:26:34,030 So the question was-- they can't really hear you on this mic. 497 00:26:34,030 --> 00:26:37,990 Is this more than you need to do a spreadsheet? 498 00:26:37,990 --> 00:26:43,450 Could you get away with just integer indexing? 499 00:26:43,450 --> 00:26:45,860 And I would say perhaps. 500 00:26:45,860 --> 00:26:46,840 I don't really know. 501 00:26:46,840 --> 00:26:48,710 And as I said, I think internally they 502 00:26:48,710 --> 00:26:53,500 actually do something that's more akin to this. 503 00:26:53,500 --> 00:26:59,080 Certainly when you do math in Microsoft Excel, 504 00:26:59,080 --> 00:27:00,690 it has letters. 505 00:27:00,690 --> 00:27:03,904 You give it A1, B1, those types of things.
506 00:27:03,904 --> 00:27:06,890 AUDIENCE: But that's just because they're using letters 507 00:27:06,890 --> 00:27:09,637 instead of a second integer. 508 00:27:09,637 --> 00:27:10,470 JEREMY KEPNER: Yeah. 509 00:27:10,470 --> 00:27:10,790 Yeah. 510 00:27:10,790 --> 00:27:11,970 One could view it that way. 511 00:27:11,970 --> 00:27:13,550 But people like it. 512 00:27:13,550 --> 00:27:15,105 So we do what people like, right? 513 00:27:15,105 --> 00:27:15,730 AUDIENCE: Yeah. 514 00:27:15,730 --> 00:27:22,170 But you don't need an additional mathematical structure. 515 00:27:22,170 --> 00:27:24,030 [INAUDIBLE] 516 00:27:24,030 --> 00:27:26,511 You can just make it one to one [INAUDIBLE] letters to-- 517 00:27:26,511 --> 00:27:28,177 AUDIENCE: [INAUDIBLE] like in Excel, you 518 00:27:28,177 --> 00:27:31,020 have different worksheets, too, and different columns. 519 00:27:31,020 --> 00:27:33,840 So you have to reference both of those. 520 00:27:33,840 --> 00:27:35,865 AUDIENCE: You have to reference row and column. 521 00:27:35,865 --> 00:27:37,255 But you just need a pair of integers. 522 00:27:37,255 --> 00:27:38,838 AUDIENCE: Well, it's a row and column. 523 00:27:38,838 --> 00:27:40,960 But then it's also the worksheets that you're on. 524 00:27:40,960 --> 00:27:41,420 AUDIENCE: OK. 525 00:27:41,420 --> 00:27:41,530 So, three. 526 00:27:41,530 --> 00:27:42,654 AUDIENCE: So there's three. 527 00:27:42,654 --> 00:27:44,970 But they're not going to sequentially number all those. 528 00:27:44,970 --> 00:27:49,144 Because it's a sparse space that you're using, right, probably? 529 00:27:49,144 --> 00:27:51,710 So I'm sure they do some caching. 530 00:27:51,710 --> 00:27:54,475 I don't think they store a number for every column 531 00:27:54,475 --> 00:27:55,612 that you're not using. 
532 00:27:55,612 --> 00:27:57,320 AUDIENCE: Certainly, you can try that out 533 00:27:57,320 --> 00:27:58,320 by creating an empty spreadsheet, 534 00:27:58,320 --> 00:27:59,695 putting the number in one corner, 535 00:27:59,695 --> 00:28:03,050 putting a number in another corner really far away 536 00:28:03,050 --> 00:28:05,624 and see if it [INAUDIBLE] two [INAUDIBLE]. 537 00:28:05,624 --> 00:28:07,040 JEREMY KEPNER: It doesn't do that. 538 00:28:07,040 --> 00:28:08,060 Obviously, it does compress. 539 00:28:08,060 --> 00:28:09,010 AUDIENCE: [INAUDIBLE]. 540 00:28:09,010 --> 00:28:10,970 JEREMY KEPNER: Yeah. 541 00:28:10,970 --> 00:28:15,650 So mathematically, it may be more than what is minimally 542 00:28:15,650 --> 00:28:18,060 required to do a spreadsheet. 543 00:28:18,060 --> 00:28:20,910 We will discover, though, it is very useful 544 00:28:20,910 --> 00:28:23,390 and, in a certain sense, allows you to do things 545 00:28:23,390 --> 00:28:26,490 with spreadsheets that are hard to do 546 00:28:26,490 --> 00:28:29,480 in the existing technologies, which is why I use it. 547 00:28:29,480 --> 00:28:31,500 There's operations that I can do with this 548 00:28:31,500 --> 00:28:32,490 that are very natural. 549 00:28:32,490 --> 00:28:34,540 And the spreadsheet data is there. 550 00:28:34,540 --> 00:28:36,712 And I would like to be able to do it. 551 00:28:36,712 --> 00:28:37,560 AUDIENCE: So Jeremy? 552 00:28:37,560 --> 00:28:38,351 JEREMY KEPNER: Yes. 553 00:28:38,351 --> 00:28:39,870 AUDIENCE: The actual binary operator 554 00:28:39,870 --> 00:28:45,230 plus in associative array, is it union or intersection? 555 00:28:45,230 --> 00:28:47,670 JEREMY KEPNER: So on the keys, it 556 00:28:47,670 --> 00:28:56,010 will be union where the value is an intersection 557 00:28:56,010 --> 00:28:56,960 with an empty set. 558 00:28:56,960 --> 00:28:59,100 We don't care what the actual numerical thing is. 559 00:28:59,100 --> 00:29:02,810 We'll just return the other thing. 
560 00:29:02,810 --> 00:29:05,770 Where there is a collision, and so now you have to resolve-- 561 00:29:05,770 --> 00:29:09,370 and that's kind of why we get a lot of mileage here. 562 00:29:09,370 --> 00:29:11,760 Because most of the time, there are very few collisions. 563 00:29:11,760 --> 00:29:14,870 And so it's not like we have to worry about huge things. 564 00:29:14,870 --> 00:29:19,260 So where there are collisions, if the values are numbers, 565 00:29:19,260 --> 00:29:21,780 then it'll just do the normal addition. 566 00:29:21,780 --> 00:29:26,940 If the values are strings, then it's somewhat undefined 567 00:29:26,940 --> 00:29:27,950 what we mean there. 568 00:29:30,660 --> 00:29:36,880 And I forget if D4M throws an error in that situation 569 00:29:36,880 --> 00:29:38,490 if the values are strings. 570 00:29:38,490 --> 00:29:40,510 It might default to just doing something 571 00:29:40,510 --> 00:29:43,260 like taking the max or the min and saying, 572 00:29:43,260 --> 00:29:46,515 you tried to add two things for which the formal addition 573 00:29:46,515 --> 00:29:48,390 collision function doesn't really make sense. 574 00:29:48,390 --> 00:29:50,430 And I'm just going to put something in there. 575 00:29:50,430 --> 00:29:53,050 We can test that out at the end of the class. 576 00:29:53,050 --> 00:29:54,520 AUDIENCE: So a collision function 577 00:29:54,520 --> 00:29:57,904 is a function of the type of the value? 578 00:29:57,904 --> 00:29:58,695 JEREMY KEPNER: Yes. 579 00:29:58,695 --> 00:29:59,340 AUDIENCE: OK. 580 00:29:59,340 --> 00:30:00,330 JEREMY KEPNER: It has to be a function 581 00:30:00,330 --> 00:30:01,371 of the type of the value. 582 00:30:01,371 --> 00:30:04,070 Because if I'm going to add a real number with a string, 583 00:30:04,070 --> 00:30:05,680 what do I do? 584 00:30:05,680 --> 00:30:06,920 Because I can do that. 585 00:30:06,920 --> 00:30:07,910 Yes, another question.
586 00:30:07,910 --> 00:30:09,930 AUDIENCE: To follow up on the previous point, 587 00:30:09,930 --> 00:30:13,070 it seems like in the case for addition, for example, 588 00:30:13,070 --> 00:30:16,500 there might be cases where you'd actually want union, then add, 589 00:30:16,500 --> 00:30:18,566 or intersection, then add. 590 00:30:18,566 --> 00:30:21,990 It kind of depends on what the nulls in your array 591 00:30:21,990 --> 00:30:23,245 actually represent. 592 00:30:23,245 --> 00:30:26,720 JEREMY KEPNER: Oh, there are so many functions 593 00:30:26,720 --> 00:30:28,410 where you can think, well, I'd kind of 594 00:30:28,410 --> 00:30:31,165 like to have it mean this in this context. 595 00:30:31,165 --> 00:30:31,790 AUDIENCE: Yeah. 596 00:30:31,790 --> 00:30:34,290 So the question is can you pass in 597 00:30:34,290 --> 00:30:37,737 as a parameter in D4M I want you to union, then operate, 598 00:30:37,737 --> 00:30:39,320 I want you to intersect, then operate? 599 00:30:39,320 --> 00:30:43,820 Or did they all just assume the most logical and do it? 600 00:30:43,820 --> 00:30:51,990 So the bigger-level question was sometimes you 601 00:30:51,990 --> 00:30:54,150 maybe would like plus to mean some 602 00:30:54,150 --> 00:30:57,940 of the other conceptual ways that we would mean it. 603 00:30:57,940 --> 00:30:58,950 OK. 604 00:30:58,950 --> 00:31:01,540 So in fact, we see that a lot. 605 00:31:01,540 --> 00:31:04,170 There's a lot of things. 606 00:31:04,170 --> 00:31:07,600 Now, in fact, we'll get into that space. 607 00:31:07,600 --> 00:31:11,490 And there's hundreds, thousands, millions 608 00:31:11,490 --> 00:31:16,880 of potential possible algebras that you can possibly want. 609 00:31:16,880 --> 00:31:20,580 So what we have done is we think about the function. 610 00:31:20,580 --> 00:31:23,820 And we think about it in this mathematical context.
611 00:31:23,820 --> 00:31:26,800 And we see what its mathematical implications 612 00:31:26,800 --> 00:31:30,680 are in terms of is it consistent with the overall group 613 00:31:30,680 --> 00:31:32,970 theory of the mathematics, in which case 614 00:31:32,970 --> 00:31:37,650 we know that if someone uses it that the intuition will carry 615 00:31:37,650 --> 00:31:38,330 through? 616 00:31:38,330 --> 00:31:40,020 Or is it kind of forking them off 617 00:31:40,020 --> 00:31:42,100 into it's something you might want to do, 618 00:31:42,100 --> 00:31:44,560 but really the result is now taking you 619 00:31:44,560 --> 00:31:47,360 into a sort of undefined space? 620 00:31:47,360 --> 00:31:51,820 So that's kind of a thing we use for doing that. 621 00:31:51,820 --> 00:31:59,160 We don't have the formal support in this of a multiply operation 622 00:31:59,160 --> 00:32:03,920 that allows you to pass in an arbitrary 623 00:32:03,920 --> 00:32:06,850 function for doing that. 624 00:32:06,850 --> 00:32:09,010 I should say there are various tricks that 625 00:32:09,010 --> 00:32:12,580 allow you to do that in just a couple lines or two. 626 00:32:12,580 --> 00:32:15,150 Basically, what you can do is you 627 00:32:15,150 --> 00:32:17,291 start with give me the intersection of the unions 628 00:32:17,291 --> 00:32:17,790 first. 629 00:32:17,790 --> 00:32:19,750 Now, I know I have only collisions. 630 00:32:19,750 --> 00:32:21,026 Now, give me the values. 631 00:32:21,026 --> 00:32:22,400 Apply whatever operator you want. 632 00:32:22,400 --> 00:32:24,340 Stuff the values back in, and you're done. 633 00:32:24,340 --> 00:32:27,380 And so you can kind of work your way through that. 634 00:32:27,380 --> 00:32:32,590 And by doing that, you're doing it in a way such that you know, 635 00:32:32,590 --> 00:32:34,590 yeah, you're outside of an algebra here. 636 00:32:34,590 --> 00:32:36,519 You're just kind of doing what you're doing. 
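The workaround just described (intersect first so only collisions remain, pull out the values, apply whatever operator you like, put the values back) might look like the following in Python. It is a sketch under assumptions: dicts stand in for associative arrays, and `apply_on_collisions` is a hypothetical helper name, not a D4M function.

```python
def apply_on_collisions(a1, a2, op):
    """Apply an arbitrary operator to the values of keys shared by a1 and a2."""
    common = a1.keys() & a2.keys()      # step 1: keep only colliding keys
    # steps 2-4: fetch both values, apply op, store the result under the key
    return {k: op(a1[k], a2[k]) for k in common}


A1 = {("doc1", "color"): "red", ("doc2", "color"): "blue"}
A2 = {("doc1", "color"): "green", ("doc3", "color"): "black"}

# An operator with no formal standing in the algebra: join strings with ";".
joined = apply_on_collisions(A1, A2, lambda x, y: x + ";" + y)
```

Because the operator is supplied ad hoc, nothing here guarantees associativity, commutativity, or any of the other properties the lecture relies on; that is the sense in which this steps outside the algebra.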
637 00:32:36,519 --> 00:32:37,810 So I encourage people to do that. 638 00:32:37,810 --> 00:32:39,185 But I'm just telling you in terms 639 00:32:39,185 --> 00:32:42,400 of how we choose what functions we pull and formally support. 640 00:32:42,400 --> 00:32:44,150 And there's a few that I've been wrestling 641 00:32:44,150 --> 00:32:49,930 with for a long time, for a very long time 642 00:32:49,930 --> 00:32:52,741 and just haven't been able to decide what to do. 643 00:32:52,741 --> 00:32:54,240 Maybe I should just make a decision. 644 00:32:54,240 --> 00:32:54,986 Yeah. 645 00:32:54,986 --> 00:32:56,770 AUDIENCE: Like a real simple example, 646 00:32:56,770 --> 00:32:59,380 if I had an array, one, two, three, and array 647 00:32:59,380 --> 00:33:02,752 b was null on all three, there would only be one collision. 648 00:33:02,752 --> 00:33:05,210 And if we were to add, it would only add them at the last-- 649 00:33:05,210 --> 00:33:05,960 JEREMY KEPNER: No. 650 00:33:05,960 --> 00:33:07,572 If it was an addition, it was-- 651 00:33:07,572 --> 00:33:08,898 AUDIENCE: So one, two, three. 652 00:33:08,898 --> 00:33:12,880 And then the other one was null, null, three. 653 00:33:12,880 --> 00:33:15,761 JEREMY KEPNER: So but they have the same column or something? 654 00:33:15,761 --> 00:33:16,510 These are vectors? 655 00:33:16,510 --> 00:33:16,860 AUDIENCE: Yes. 656 00:33:16,860 --> 00:33:18,401 JEREMY KEPNER: These are row vectors? 657 00:33:18,401 --> 00:33:18,980 Yes. 658 00:33:18,980 --> 00:33:23,440 So basically we have a one column vector with three, 659 00:33:23,440 --> 00:33:24,840 another column with one. 660 00:33:24,840 --> 00:33:29,770 And when we add them together, you would get a three. 661 00:33:29,770 --> 00:33:31,380 Because it would be union. 662 00:33:31,380 --> 00:33:32,391 It would union the keys. 663 00:33:32,391 --> 00:33:33,890 And then the only addition operation 664 00:33:33,890 --> 00:33:36,580 would be performed on the collision.
665 00:33:36,580 --> 00:33:40,610 If we did an and operation, then the result 666 00:33:40,610 --> 00:33:44,330 would just be one, an associative array 667 00:33:44,330 --> 00:33:46,630 with one element in it. 668 00:33:46,630 --> 00:33:50,499 AUDIENCE: And we can do either one depending on the context? 669 00:33:50,499 --> 00:33:51,540 I kind of lost you there. 670 00:33:51,540 --> 00:33:53,869 Because the question was is it union or intersection. 671 00:33:53,869 --> 00:33:55,660 JEREMY KEPNER: It depends on the operation. 672 00:33:55,660 --> 00:34:00,160 So plus formally is closer to set addition, 673 00:34:00,160 --> 00:34:02,530 and so therefore, is a union operation. 674 00:34:02,530 --> 00:34:06,240 And is formally closer to set intersection, 675 00:34:06,240 --> 00:34:08,980 and so it's an intersection operation there. 676 00:34:08,980 --> 00:34:12,120 Regardless, in both cases, the collision function 677 00:34:12,120 --> 00:34:14,219 is only applied where there is a collision. 678 00:34:16,960 --> 00:34:18,369 Yes. 679 00:34:18,369 --> 00:34:20,741 AUDIENCE: I'm getting a little confused by [INAUDIBLE] 680 00:34:20,741 --> 00:34:25,896 formally defined operations on associative arrays 681 00:34:25,896 --> 00:34:27,273 in terms of values. 682 00:34:27,273 --> 00:34:31,119 You've used equality on keys at this point. 683 00:34:31,119 --> 00:34:34,409 You haven't defined any other operations such as [INAUDIBLE] 684 00:34:34,409 --> 00:34:36,760 intersection [INAUDIBLE]. 685 00:34:36,760 --> 00:34:40,430 JEREMY KEPNER: Well, so the keys are part of a set. 686 00:34:40,430 --> 00:34:44,730 And so they get the union and intersection properties 687 00:34:44,730 --> 00:34:47,570 of a strict totally ordered set, which 688 00:34:47,570 --> 00:34:50,136 is the intuitive use of union. 689 00:34:50,136 --> 00:34:50,802 AUDIENCE: Right.
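The audience's small example can be worked through in Python. This reads the exchange as: one column vector `a` with entries 1, 2, 3 and a second vector `b` that is null in the first two rows and 3 in the last. The row and column key names are made up for illustration, and the value 1 produced by "and" on a collision is an assumption; the lecture only states how many elements result.

```python
# Column vectors as dicts keyed by (row, column); null entries are simply absent.
a = {("r1", "c"): 1, ("r2", "c"): 2, ("r3", "c"): 3}
b = {("r3", "c"): 3}

# plus: union of the keys, numeric addition only where both arrays have a value.
plus = {}
for k in a.keys() | b.keys():
    if k in a and k in b:
        plus[k] = a[k] + b[k]          # the single collision, at r3
    else:
        plus[k] = a[k] if k in a else b[k]

# and: intersection of the keys; every surviving key is a collision.
and_ = {k: 1 for k in a.keys() & b.keys()}
```

Under this reading, `plus` keeps all three rows and only the last one is actually added, while `and_` is an associative array with a single element.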
690 00:34:50,802 --> 00:34:52,969 But they belong to the same set, [INAUDIBLE] 691 00:34:52,969 --> 00:34:54,949 the same [INAUDIBLE] set. 692 00:34:54,949 --> 00:34:58,414 Yet, you've only defined binary operations 693 00:34:58,414 --> 00:34:59,830 on associative arrays. 694 00:34:59,830 --> 00:35:03,880 You haven't defined them on the elements of the set. 695 00:35:03,880 --> 00:35:09,497 JEREMY KEPNER: So we'll get into that a little bit later. 696 00:35:09,497 --> 00:35:10,830 AUDIENCE: That's what I thought. 697 00:35:10,830 --> 00:35:12,288 JEREMY KEPNER: But they essentially 698 00:35:12,288 --> 00:35:16,310 have the binary operations of a strict totally ordered set, 699 00:35:16,310 --> 00:35:17,981 which is equality and less than. 700 00:35:17,981 --> 00:35:20,064 AUDIENCE: It seems to me the bulk of the questions 701 00:35:20,064 --> 00:35:22,672 were about manipulations on keys we haven't even 702 00:35:22,672 --> 00:35:23,380 talked about yet. 703 00:35:23,380 --> 00:35:23,800 JEREMY KEPNER: Yes. 704 00:35:23,800 --> 00:35:24,300 Yes. 705 00:35:24,300 --> 00:35:26,570 Well, there's another 40 some slides. 706 00:35:29,810 --> 00:35:33,070 But actually, you've gotten a lot of the good questions here. 707 00:35:33,070 --> 00:35:35,970 So the thing though to recognize is that there's 708 00:35:35,970 --> 00:35:37,750 a lot of choices here. 709 00:35:37,750 --> 00:35:41,340 And the algebra you are in is determined by the function 710 00:35:41,340 --> 00:35:42,540 that you choose. 711 00:35:42,540 --> 00:35:44,550 And actually, you'll find yourself 712 00:35:44,550 --> 00:35:47,450 switching between different algebras fairly frequently. 713 00:35:47,450 --> 00:35:48,777 That's how we use spreadsheets. 714 00:35:48,777 --> 00:35:51,360 Sometimes we do this operation, and now we're in this algebra. 715 00:35:51,360 --> 00:35:53,526 And we do this operation, now we're in this algebra. 
716 00:35:53,526 --> 00:35:56,772 And so that's the big thing to kind of take away there. 717 00:35:56,772 --> 00:35:58,480 So let's get into this a little bit more. 718 00:35:58,480 --> 00:36:00,900 That's sort of the big overall. 719 00:36:00,900 --> 00:36:03,770 So I think we've talked about this. 720 00:36:03,770 --> 00:36:08,550 Let S be an infinite strict totally ordered set. 721 00:36:08,550 --> 00:36:11,210 Total order is an implementation, not 722 00:36:11,210 --> 00:36:13,020 a theoretical requirement. 723 00:36:13,020 --> 00:36:17,140 So the fact that we were imposing this infinite strict-- 724 00:36:17,140 --> 00:36:19,630 that they're totally ordered, that we don't just 725 00:36:19,630 --> 00:36:22,880 allow equality, is more of an implementation detail. 726 00:36:22,880 --> 00:36:25,460 It is very useful for me to internally store 727 00:36:25,460 --> 00:36:28,030 things in order so I could look things up quickly. 728 00:36:28,030 --> 00:36:31,020 However, mathematically, strictly, if we just 729 00:36:31,020 --> 00:36:37,000 had a test of equality, all the math hangs together there. 730 00:36:37,000 --> 00:36:40,310 All values and keys are drawn from the set. 731 00:36:40,310 --> 00:36:45,120 And the allowable operations on them, of two values or keys, 732 00:36:45,120 --> 00:36:47,930 is less than, equal to, or greater than. 733 00:36:47,930 --> 00:36:51,300 So those are the three functions that we essentially allow. 734 00:36:51,300 --> 00:36:53,820 Strict totally ordered set only has 735 00:36:53,820 --> 00:36:56,210 two, which is less than, less than or equal to. 736 00:36:56,210 --> 00:36:59,020 But you get essentially the third one for free.
737 00:36:59,020 --> 00:37:01,459 In addition, we have the concept of three special symbols 738 00:37:01,459 --> 00:37:03,000 here, which we've talked about, which 739 00:37:03,000 --> 00:37:10,100 is the null, the empty set, a least element in the set, 740 00:37:10,100 --> 00:37:11,780 and a maximal element to the set. 741 00:37:11,780 --> 00:37:16,420 So v less than or equal to plus infinity is always true. 742 00:37:16,420 --> 00:37:19,380 And plus infinity is, we're saying, 743 00:37:19,380 --> 00:37:22,490 the maximal element is a part of this set. 744 00:37:22,490 --> 00:37:24,800 Greater than or equal to negative infinity 745 00:37:24,800 --> 00:37:25,570 is always true. 746 00:37:28,870 --> 00:37:35,020 In all set theory, the empty set is a formal member of all sets. 747 00:37:35,020 --> 00:37:43,600 So now we'll talk about-- did that sort of get 748 00:37:43,600 --> 00:37:47,230 to your question about the operations that 749 00:37:47,230 --> 00:37:48,420 are defined on the keys? 750 00:37:48,420 --> 00:37:49,265 AUDIENCE: Yeah. 751 00:37:49,265 --> 00:37:49,764 Yeah. 752 00:37:49,764 --> 00:37:50,535 You [INAUDIBLE]-- 753 00:37:50,535 --> 00:37:51,410 JEREMY KEPNER: Right. 754 00:37:51,410 --> 00:37:53,496 AUDIENCE: [INAUDIBLE]. 755 00:37:53,496 --> 00:37:54,370 JEREMY KEPNER: Right. 756 00:37:54,370 --> 00:37:56,369 So then we talked about the collision functions. 757 00:37:56,369 --> 00:37:58,920 And there are two sort of contextual functions 758 00:37:58,920 --> 00:38:00,140 here, union and intersection. 759 00:38:00,140 --> 00:38:02,480 Because there's the two operations that makes sense. 760 00:38:02,480 --> 00:38:04,470 And then three conditions, less than, equal to, 761 00:38:04,470 --> 00:38:05,136 or greater than. 762 00:38:05,136 --> 00:38:08,950 That is in order to preserve this strict totally ordered 763 00:38:08,950 --> 00:38:09,450 set. 
764 00:38:09,450 --> 00:38:11,870 Once I go to values that are reals or integers, 765 00:38:11,870 --> 00:38:12,970 I have more operations. 766 00:38:12,970 --> 00:38:15,229 But if I'm just limiting myself to values 767 00:38:15,229 --> 00:38:17,270 that are members of a strict totally ordered set, 768 00:38:17,270 --> 00:38:19,040 this is all I have. 769 00:38:19,040 --> 00:38:24,900 So that means that you have d plus 5 possible outcomes 770 00:38:24,900 --> 00:38:27,140 to any collision function. 771 00:38:27,140 --> 00:38:29,590 That is, you could have the collision function actually 772 00:38:29,590 --> 00:38:36,830 choose to put out the value of its underlying key. 773 00:38:36,830 --> 00:38:41,180 You could produce v1 or v2 or empty 774 00:38:41,180 --> 00:38:44,650 or minus infinity or plus infinity or sets of these. 775 00:38:44,650 --> 00:38:48,790 So it's a fairly finite choice of results. 776 00:38:48,790 --> 00:38:51,230 If we're going to stay in this sort of restricted thing, 777 00:38:51,230 --> 00:38:53,479 we have a very limited number of things 778 00:38:53,479 --> 00:38:54,520 that can actually happen. 779 00:38:54,520 --> 00:38:56,860 When you're actually applying a collision function 780 00:38:56,860 --> 00:39:00,220 on a particular where there's an intersection, 781 00:39:00,220 --> 00:39:03,780 you have a fairly finite number of things. 782 00:39:03,780 --> 00:39:09,090 However the total number of combinations of these functions 783 00:39:09,090 --> 00:39:11,380 gives a very large number. 784 00:39:11,380 --> 00:39:13,760 And their function pairs gives a very large number 785 00:39:13,760 --> 00:39:16,800 of possible algebras. 786 00:39:16,800 --> 00:39:18,890 I did a back [INAUDIBLE] calculation 787 00:39:18,890 --> 00:39:20,110 that said it was 10 to the 30th. 788 00:39:20,110 --> 00:39:22,280 It might be even more than that. 789 00:39:22,280 --> 00:39:23,670 And in fact, it might be formally 790 00:39:23,670 --> 00:39:25,620 infinite for all I know.
791 00:39:25,620 --> 00:39:27,110 But that was an easy calculation. 792 00:39:27,110 --> 00:39:28,568 There's a lot of possible algebras. 793 00:39:30,950 --> 00:39:33,450 And this is a fairly impressive level 794 00:39:33,450 --> 00:39:37,330 of functionality given our relatively small numbers 795 00:39:37,330 --> 00:39:40,000 of assumptions that we have here. 796 00:39:40,000 --> 00:39:42,760 But we are going to focus here on just the nice collision 797 00:39:42,760 --> 00:39:45,960 functions, the ones that seem to feel like they give us 798 00:39:45,960 --> 00:39:47,850 intuitively useful things. 799 00:39:47,850 --> 00:39:53,230 So we are not going to use keys as outputs to our function 800 00:39:53,230 --> 00:39:54,370 here. 801 00:39:54,370 --> 00:39:55,140 OK. 802 00:39:55,140 --> 00:39:58,136 We will in some contexts. 803 00:39:58,136 --> 00:40:00,010 And we're going to say the results are always 804 00:40:00,010 --> 00:40:00,680 single valued. 805 00:40:00,680 --> 00:40:03,550 We're not going to deal with results, 806 00:40:03,550 --> 00:40:07,504 expand our values to be sets of values. 807 00:40:07,504 --> 00:40:09,420 Although, in certain contexts we will do that. 808 00:40:09,420 --> 00:40:12,590 But mathematically, let's restrict ourselves to that. 809 00:40:12,590 --> 00:40:14,720 We're going to do no tests on special symbols. 810 00:40:14,720 --> 00:40:16,845 And we're just going to basically say our collision 811 00:40:16,845 --> 00:40:19,420 function can be essentially-- if we 812 00:40:19,420 --> 00:40:21,680 have v1 less than v2, the answers can 813 00:40:21,680 --> 00:40:23,910 be one of these five. 814 00:40:23,910 --> 00:40:26,987 If it's equal to these, it can be one of these four. 815 00:40:26,987 --> 00:40:29,320 If it's greater than these, it can be one of these five. 816 00:40:29,320 --> 00:40:32,620 So it gives a fairly limited number of possible collision 817 00:40:32,620 --> 00:40:34,145 functions.
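The count of these restricted collision functions can be checked directly. The sketch below assumes the five outcomes for the less-than and greater-than cases are v1, v2, null, minus infinity, and plus infinity, and that the four outcomes for the equal case are v, null, minus infinity, and plus infinity; the slide itself is not visible in the transcript, so that reading is an assumption. `make_collision` is a hypothetical helper that turns one choice per case into a callable function, with Python floats standing in for the two infinity symbols.

```python
from itertools import product

LESS = ["v1", "v2", "null", "-inf", "+inf"]      # allowed results when v1 < v2
EQUAL = ["v", "null", "-inf", "+inf"]            # allowed results when v1 == v2
GREATER = ["v1", "v2", "null", "-inf", "+inf"]   # allowed results when v1 > v2

# One collision function = one choice for each of the three cases.
tables = list(product(LESS, EQUAL, GREATER))
n_functions = len(tables)                        # 5 * 4 * 5

def make_collision(less, equal, greater):
    """Build a collision function from one outcome name per comparison case."""
    def pick(name, v1, v2):
        return {"v1": v1, "v2": v2, "v": v1, "null": None,
                "-inf": float("-inf"), "+inf": float("inf")}[name]

    def f(v1, v2):
        if v1 < v2:
            return pick(less, v1, v2)
        if v1 == v2:
            return pick(equal, v1, v2)
        return pick(greater, v1, v2)
    return f

# Two familiar members of the family; both need only <, ==, > on the values.
min_f = make_collision("v1", "v", "v2")
max_f = make_collision("v2", "v", "v1")
```

Because only comparisons are used, the same `min_f` and `max_f` work on strings as well as numbers, which matches the lecture's point that everything here needs nothing beyond a strict total order.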
818 00:40:34,145 --> 00:40:36,270 And all these properties are consistent with strict 819 00:40:36,270 --> 00:40:37,610 totally ordered set. 820 00:40:37,610 --> 00:40:40,640 And generally, when we handle a value of this, 821 00:40:40,640 --> 00:40:43,620 it's handled by the union and intersection function first. 822 00:40:43,620 --> 00:40:47,020 It never actually gets passed through to this function. 823 00:40:56,560 --> 00:40:57,060 All right. 824 00:40:57,060 --> 00:41:03,090 So let's move on here. 825 00:41:03,090 --> 00:41:05,670 And just as I said, well, what about concatenation? 826 00:41:05,670 --> 00:41:07,170 In fact, there are contexts where 827 00:41:07,170 --> 00:41:12,670 we will want to concatenate a couple of strings together 828 00:41:12,670 --> 00:41:14,230 or something like that. 829 00:41:14,230 --> 00:41:17,080 And you know, that's fairly supported. 830 00:41:17,080 --> 00:41:22,980 I think the results can be the sets themselves. 831 00:41:22,980 --> 00:41:25,290 You would have a new special symbol, 832 00:41:25,290 --> 00:41:28,320 which is the set itself. 833 00:41:28,320 --> 00:41:30,940 You have now different collision functions. 834 00:41:30,940 --> 00:41:31,730 So you have union. 835 00:41:31,730 --> 00:41:34,550 And then I want a union values as our intersect, 836 00:41:34,550 --> 00:41:38,050 and then union values and union and intersection. 837 00:41:38,050 --> 00:41:39,710 So those are your functions there. 838 00:41:39,710 --> 00:41:40,900 And there's actually a few instances. 839 00:41:40,900 --> 00:41:42,890 I think we've already showed one of them in one of the examples 840 00:41:42,890 --> 00:41:44,321 where we actually do this. 841 00:41:44,321 --> 00:41:46,070 So I'm just sort of throwing that in there 842 00:41:46,070 --> 00:41:48,770 that concatenation is a concept that we do think about. 
843 00:41:48,770 --> 00:41:52,090 I don't think we've developed the formalism as richly 844 00:41:52,090 --> 00:41:55,810 in a situation where the values are simply single values. 845 00:41:55,810 --> 00:41:59,722 But we certainly can and do do that. 846 00:41:59,722 --> 00:42:00,430 AUDIENCE: Jeremy. 847 00:42:00,430 --> 00:42:00,795 JEREMY KEPNER: Yes? 848 00:42:00,795 --> 00:42:02,630 AUDIENCE: Third example, was that [INAUDIBLE] 849 00:42:02,630 --> 00:42:03,879 for identity instead of union? 850 00:42:06,675 --> 00:42:07,312 [INAUDIBLE] 851 00:42:07,312 --> 00:42:08,520 JEREMY KEPNER: This one here? 852 00:42:08,520 --> 00:42:11,412 AUDIENCE: No, go up. 853 00:42:11,412 --> 00:42:12,858 Get a little bit to the left. 854 00:42:12,858 --> 00:42:16,714 The symbol-- [INAUDIBLE] was upside down. 855 00:42:16,714 --> 00:42:18,747 AUDIENCE: I have a similar question. 856 00:42:18,747 --> 00:42:20,080 Is the one above it [INAUDIBLE]. 857 00:42:20,080 --> 00:42:21,129 JEREMY KEPNER: Yeah. 858 00:42:21,129 --> 00:42:22,170 I might have a typo here. 859 00:42:22,170 --> 00:42:24,710 I should double check that. 860 00:42:24,710 --> 00:42:25,210 Yeah. 861 00:42:25,210 --> 00:42:28,160 So v union that. 862 00:42:28,160 --> 00:42:30,350 All right, so that should be an intersection. 863 00:42:30,350 --> 00:42:32,760 And that should be union. 864 00:42:32,760 --> 00:42:36,760 And that should be intersection. 865 00:42:36,760 --> 00:42:38,820 And that should be union. 866 00:42:38,820 --> 00:42:41,290 So I just got them to flop there. 867 00:42:41,290 --> 00:42:42,410 So thank you. 868 00:42:42,410 --> 00:42:44,799 Full marks to you. 869 00:42:44,799 --> 00:42:45,840 We will make that change. 870 00:42:56,080 --> 00:42:58,170 All right. 871 00:42:58,170 --> 00:43:01,335 So that one is intersection, correct? 872 00:43:01,335 --> 00:43:05,970 And this 873 00:43:05,970 --> 00:43:08,360 AUDIENCE: [INAUDIBLE]. 
874 00:43:08,360 --> 00:43:14,235 JEREMY KEPNER: --three should be intersection. 875 00:43:14,235 --> 00:43:17,160 All right. 876 00:43:17,160 --> 00:43:17,720 Very good. 877 00:43:17,720 --> 00:43:19,090 We'll check it in to the SVN. 878 00:43:19,090 --> 00:43:22,770 In the next down time, it will be part of your [INAUDIBLE]. 879 00:43:22,770 --> 00:43:24,340 Moving on. 880 00:43:24,340 --> 00:43:24,840 All right. 881 00:43:24,840 --> 00:43:26,381 So one of the things we're eventually 882 00:43:26,381 --> 00:43:28,940 going to work towards is trying to make matrix multiply work 883 00:43:28,940 --> 00:43:30,650 in this context. 884 00:43:30,650 --> 00:43:32,540 OK. 885 00:43:32,540 --> 00:43:34,890 We've already talked about the duality 886 00:43:34,890 --> 00:43:38,260 between the fundamental operation of graphs, 887 00:43:38,260 --> 00:43:44,220 which is breadth-first search, and vector matrix multiply. 888 00:43:44,220 --> 00:43:46,990 We continue to want to be able to use that. 889 00:43:46,990 --> 00:43:48,910 It's a very powerful feature. 890 00:43:48,910 --> 00:43:53,170 And what we find is that in graph algorithms, where we're 891 00:43:53,170 --> 00:43:56,710 dealing with things on strings or other types of operations, 892 00:43:56,710 --> 00:44:00,511 most graph algorithms can be reduced to operations 893 00:44:00,511 --> 00:44:01,760 on what are called semi-rings. 894 00:44:01,760 --> 00:44:06,020 So those are generalizations of normal linear algebra, whereby 895 00:44:06,020 --> 00:44:10,790 your two core operations here, what are called the element-wise 896 00:44:10,790 --> 00:44:14,040 multiply and the element-wise addition operation, 897 00:44:14,040 --> 00:44:17,110 are such that multiply has the property of being associative 898 00:44:17,110 --> 00:44:19,300 and distributes over the plus operation. 899 00:44:19,300 --> 00:44:20,790 And plus has the property of being 900 00:44:20,790 --> 00:44:22,530 associative and commutative.
901 00:44:22,530 --> 00:44:25,860 And examples of this include the traditional matrix multiply 902 00:44:25,860 --> 00:44:33,950 plus and multiply, min.+, or max.+. 903 00:44:33,950 --> 00:44:35,410 In fact, there's a whole sub branch 904 00:44:35,410 --> 00:44:39,170 of algebra called max plus algebras, which do this. 905 00:44:39,170 --> 00:44:42,380 Another one is or.and and other types of things. 906 00:44:42,380 --> 00:44:50,660 So mathematically, semi-rings are certainly well-studied more 907 00:44:50,660 --> 00:44:58,240 in the context of specific algebras, like min.+, min+, 908 00:44:58,240 --> 00:45:00,420 or max+, or something like that. 909 00:45:00,420 --> 00:45:02,740 Here, with the associative arrays, 910 00:45:02,740 --> 00:45:04,950 we're dealing with you could be hopping 911 00:45:04,950 --> 00:45:09,530 and popping back and forth within the same expression. 912 00:45:09,530 --> 00:45:11,070 You could be moving back and forth 913 00:45:11,070 --> 00:45:12,445 between these different algebras. 914 00:45:12,445 --> 00:45:14,153 So I think that's a little bit different. 915 00:45:14,153 --> 00:45:16,960 We're taking sort of a bigger view, a more data-centric view. 916 00:45:16,960 --> 00:45:20,100 Our data can be of different kinds, and then impose upon it 917 00:45:20,100 --> 00:45:24,380 different algebras as we move forward. 918 00:45:24,380 --> 00:45:26,280 And the real theory questions that we're 919 00:45:26,280 --> 00:45:29,270 trying to answer here is we have this concept 920 00:45:29,270 --> 00:45:32,180 of associative arrays, which is kind of new. 921 00:45:32,180 --> 00:45:34,950 We have the traditional linear algebra. 922 00:45:34,950 --> 00:45:38,315 And there's going to be areas where they overlap. 923 00:45:38,315 --> 00:45:41,080 Your ideas, your intuition from linear algebra, 924 00:45:41,080 --> 00:45:43,710 and your intuition from associative arrays 925 00:45:43,710 --> 00:45:46,960 will be very, very well-connected. 
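The semiring idea described above, where the plus and multiply inside a matrix multiply are swapped for other pairs such as max.+ or min.+, can be sketched in a few lines. This is an illustrative Python sketch, not D4M code: the function name `semiring_matmul` and the list-of-lists matrix layout are assumptions made here for illustration.

```python
from functools import reduce

def semiring_matmul(A, B, add, mul):
    """Multiply list-of-lists matrices A (n x k) and B (k x m).

    `add` and `mul` are the two semiring operations; passing ordinary
    + and * gives the traditional matrix multiply.
    """
    n, k, m = len(A), len(B), len(B[0])
    return [
        [reduce(add, (mul(A[i][p], B[p][j]) for p in range(k)))
         for j in range(m)]
        for i in range(n)
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Traditional plus.times
plus_times = semiring_matmul(A, B, lambda x, y: x + y, lambda x, y: x * y)
# max.+ : "addition" is max, "multiplication" is ordinary +
max_plus = semiring_matmul(A, B, max, lambda x, y: x + y)
# min.+ : "addition" is min, "multiplication" is ordinary +
min_plus = semiring_matmul(A, B, min, lambda x, y: x + y)

print(plus_times)  # [[19, 22], [43, 50]]
print(max_plus)    # [[9, 10], [11, 12]]
print(min_plus)    # [[6, 7], [8, 9]]
```

The loop structure never changes. Only the two operators passed in do, which is what lets one multiply routine serve several algebras.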
926 00:45:46,960 --> 00:45:47,820 OK. 927 00:45:47,820 --> 00:45:52,570 And there's going to be places where associative arrays 928 00:45:52,570 --> 00:45:56,000 give you new properties that you didn't have in linear algebra. 929 00:45:56,000 --> 00:45:58,530 And those will allow you to do new things. 930 00:45:58,530 --> 00:46:01,290 And there's going to be cases where your linear algebra 931 00:46:01,290 --> 00:46:03,450 intuition is wrong. 932 00:46:03,450 --> 00:46:05,335 That is, taking your linear algebra intuition 933 00:46:05,335 --> 00:46:08,070 and applying it to associative arrays 934 00:46:08,070 --> 00:46:12,500 will lead you into things that won't make sense. 935 00:46:12,500 --> 00:46:13,910 And so this is kind of what we're 936 00:46:13,910 --> 00:46:17,651 trying to give people a sense of here: where they overlap, 937 00:46:17,651 --> 00:46:20,150 where you should watch out, and where there are new properties 938 00:46:20,150 --> 00:46:22,774 that you didn't know about that you should maybe take advantage 939 00:46:22,774 --> 00:46:23,630 of. 940 00:46:23,630 --> 00:46:27,910 The biggest one being the universal conformance 941 00:46:27,910 --> 00:46:31,550 of addition and multiplication of all associative arrays. 942 00:46:31,550 --> 00:46:35,810 So any two associative arrays, regardless of their size, 943 00:46:35,810 --> 00:46:39,990 can be multiplied and added to each other, unlike traditional 944 00:46:39,990 --> 00:46:42,360 linear algebra, where you have very strict constraints 945 00:46:42,360 --> 00:46:46,850 about the dimensions of matrices in order for them to be added. 946 00:46:46,850 --> 00:46:49,840 And we do use this all the time. 947 00:46:49,840 --> 00:46:53,260 We use this all the time, essentially almost exactly 948 00:46:53,260 --> 00:46:54,820 in the example that we talked about. 949 00:46:54,820 --> 00:46:56,850 It was a question that was brought up earlier.
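A minimal sketch of why any two associative arrays are conformant: because entries are addressed by keys rather than by position, addition can take the union of the keys and multiplication can match on whatever inner keys happen to coincide. The dict-of-`(row, col)` representation and the names `assoc_add` and `assoc_matmul` are assumptions for illustration only, not the D4M API.

```python
def assoc_add(a, b, collide=lambda x, y: x + y):
    """Add two associative arrays stored as {(row, col): value} dicts.

    Keys found in either array are kept; values on shared keys are
    combined with `collide`.
    """
    out = dict(a)
    for key, val in b.items():
        out[key] = collide(out[key], val) if key in out else val
    return out

def assoc_matmul(a, b, add=lambda x, y: x + y, mul=lambda x, y: x * y):
    """Multiply two associative arrays; only matching inner keys contribute."""
    out = {}
    for (i, k), x in a.items():
        for (k2, j), y in b.items():
            if k == k2:
                term = mul(x, y)
                out[(i, j)] = add(out[(i, j)], term) if (i, j) in out else term
    return out

# A "vector" with three entries plus a "vector" with five entries.
v3 = {("r", "a"): 1, ("r", "b"): 2, ("r", "c"): 3}
v5 = {("r", "b"): 10, ("r", "c"): 20, ("r", "d"): 30,
      ("r", "e"): 40, ("r", "f"): 50}
total = assoc_add(v3, v5)

# Arrays whose inner key sets only partly overlap still multiply.
A = {("x", "alice"): 1, ("x", "bob"): 2, ("y", "bob"): 3}
B = {("bob", "p"): 4, ("carol", "q"): 5}
prod = assoc_matmul(A, B)

print(total)
print(prod)  # {('x', 'p'): 8, ('y', 'p'): 12}
```

Keys with no partner simply drop out of the product, so no dimension check is needed.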
950 00:46:56,850 --> 00:46:59,390 We can add a vector of three elements 951 00:46:59,390 --> 00:47:01,730 and a vector of five elements and have 952 00:47:01,730 --> 00:47:03,301 something that makes sense. 953 00:47:03,301 --> 00:47:03,800 OK. 954 00:47:03,800 --> 00:47:09,060 Likewise, we can multiply a 10 by 12 with a 6 by 14 955 00:47:09,060 --> 00:47:11,980 and have something that actually makes sense. 956 00:47:11,980 --> 00:47:13,840 So that's a very powerful concept. 957 00:47:13,840 --> 00:47:17,920 And I'd say it's one of the most commonly used features. 958 00:47:17,920 --> 00:47:18,420 All right. 959 00:47:18,420 --> 00:47:19,890 So we've done the basic definition. 960 00:47:19,890 --> 00:47:21,390 Now, we're going to kind of probably 961 00:47:21,390 --> 00:47:24,210 move a little bit quicker here into the group theory. 962 00:47:24,210 --> 00:47:27,750 So the group theory is when you pick various functions. 963 00:47:27,750 --> 00:47:30,740 What kind of algebra's a result of that? 964 00:47:30,740 --> 00:47:33,450 So we're going to talk about binary operators, 965 00:47:33,450 --> 00:47:36,790 and then get into commutative monoids, semi-rings, and then 966 00:47:36,790 --> 00:47:38,520 something we call the feld. 967 00:47:38,520 --> 00:47:42,210 So our joke there is that so a field 968 00:47:42,210 --> 00:47:47,160 is when you do the traditional mathematics of linear algebra. 969 00:47:47,160 --> 00:47:50,150 You're doing them over the real field or the complex field 970 00:47:50,150 --> 00:47:50,710 of numbers. 971 00:47:50,710 --> 00:47:53,960 It's a set of numbers with certain defined properties. 972 00:47:53,960 --> 00:47:55,680 OK. 973 00:47:55,680 --> 00:47:58,680 When we impose this condition that we 974 00:47:58,680 --> 00:48:00,690 are doing with strict totally ordered sets, 975 00:48:00,690 --> 00:48:03,760 we don't have an additive inverse. 
976 00:48:03,760 --> 00:48:06,440 There's no way to formally subtract 977 00:48:06,440 --> 00:48:07,690 two words from each other. 978 00:48:07,690 --> 00:48:10,450 We can't have a string and a negative string. 979 00:48:10,450 --> 00:48:13,390 So we don't get, like, a and minus a, 980 00:48:13,390 --> 00:48:15,660 which is part of a field. 981 00:48:15,660 --> 00:48:19,540 So what we have is a feld, which is a field without an inverse. 982 00:48:19,540 --> 00:48:21,970 So mathematicians are very funny. 983 00:48:21,970 --> 00:48:23,470 And when they do this, they often 984 00:48:23,470 --> 00:48:28,510 will drop the i or something like that to make that concept. 985 00:48:28,510 --> 00:48:30,700 So we will be doing math over a-- 986 00:48:30,700 --> 00:48:33,807 we can have many of the properties of linear algebra. 987 00:48:33,807 --> 00:48:36,390 We just have to recognize that we don't have an additive inverse. 988 00:48:36,390 --> 00:48:40,570 So it's sort of like vector spaces over felds. 989 00:48:40,570 --> 00:48:42,570 Or sometimes they're called semi-vector spaces. 990 00:48:42,570 --> 00:48:44,250 You can put semi in front of anything. 991 00:48:44,250 --> 00:48:47,500 And it's like it's a little bit of this, but minus something. 992 00:48:47,500 --> 00:48:51,360 So this is kind of our operator roadmap. 993 00:48:51,360 --> 00:48:56,330 So you see here, we began with three definitions. 994 00:48:56,330 --> 00:48:59,470 We will have narrowed it down to 200 operators of interest 995 00:48:59,470 --> 00:49:03,540 from those nice collision functions and union functions. 996 00:49:03,540 --> 00:49:07,820 There's about 200 combinations that work well. 997 00:49:07,820 --> 00:49:11,410 We will find that 18 of those operators are associative. 998 00:49:11,410 --> 00:49:14,940 So not associative in terms of associative array, 999 00:49:14,940 --> 00:49:18,777 but associative in terms of the formal mathematical concept 1000 00:49:18,777 --> 00:49:19,485 of associativity.
1001 00:49:22,140 --> 00:49:25,930 And so those form what are called semigroups. 1002 00:49:25,930 --> 00:49:28,670 14 of them are commutative, which means 1003 00:49:28,670 --> 00:49:31,840 they are Abelian semigroups. 1004 00:49:31,840 --> 00:49:36,960 Commutativity, that just means that you can 1005 00:49:36,960 --> 00:49:38,630 switch the order of operations. 1006 00:49:38,630 --> 00:49:42,090 a plus b is the same as b plus a. 1007 00:49:42,090 --> 00:49:44,790 Not something that is a strict requirement. 1008 00:49:44,790 --> 00:49:46,790 But gosh, it's sure nice to have to not 1009 00:49:46,790 --> 00:49:49,650 have to be like oh, no, I switched the order. 1010 00:49:49,650 --> 00:49:51,310 And now the answer is different. 1011 00:49:51,310 --> 00:49:55,190 So traditional matrix multiply is like that, right? 1012 00:49:55,190 --> 00:49:58,790 a times b is not the same thing as b times a. 1013 00:49:58,790 --> 00:50:00,080 And we will have that as well. 1014 00:50:00,080 --> 00:50:02,480 But commutativity is really nice. 1015 00:50:02,480 --> 00:50:06,350 We then take these 14 operations and explore 1016 00:50:06,350 --> 00:50:07,800 all their possible pairs. 1017 00:50:07,800 --> 00:50:09,284 There's 196 of them. 1018 00:50:09,284 --> 00:50:10,950 We look at which one of them distribute. 1019 00:50:10,950 --> 00:50:14,610 There's 74 that distribute, so they form semirings. 1020 00:50:14,610 --> 00:50:20,190 We then look to see if we have an identity and an annihilator. 1021 00:50:20,190 --> 00:50:21,940 Some of our special symbols, can they 1022 00:50:21,940 --> 00:50:27,450 perform that role in which case they can form a feld? 1023 00:50:27,450 --> 00:50:29,010 And then we can create, essentially, 1024 00:50:29,010 --> 00:50:33,010 vector semi-spaces or vector spaces over these felds. 1025 00:50:33,010 --> 00:50:35,354 And we'll have about 18 of them when we're all done. 
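The filtering steps in this roadmap (keep the associative operators, then the commutative ones, then test every ordered pair for distributivity, which is where 14 x 14 = 196 comes from) can be sketched as brute-force checks over a small value set. This is only a sanity-check sketch under stated assumptions: the test set `VALUES` and the helper names are hypothetical, the operators shown are stand-ins rather than the lecture's full list, and passing on a finite sample is not a proof.

```python
from itertools import product

VALUES = [1, 2, 3]  # small totally ordered sample set (an assumption)

def is_associative(op, vals=VALUES):
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(vals, repeat=3))

def is_commutative(op, vals=VALUES):
    return all(op(a, b) == op(b, a) for a, b in product(vals, repeat=2))

def distributes(mul, add, vals=VALUES):
    """Check a*(b+c) == (a*b)+(a*c) for the given mul/add pair."""
    return all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
               for a, b, c in product(vals, repeat=3))

def left(a, b):
    return a  # always return the left value

def plus(a, b):
    return a + b

print(is_associative(left), is_commutative(left))  # True False
print(is_associative(max), is_commutative(max))    # True True
print(distributes(min, max))    # True: min distributes over max
print(distributes(plus, max))   # True: + distributes over max (max.+)
print(distributes(max, plus))   # False: max does not distribute over +
```

`left` survives the associativity filter but fails commutativity, matching the point that it is dropped at the Abelian semigroup step.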
1026 00:50:35,354 --> 00:50:36,770 And the good thing is when you get 1027 00:50:36,770 --> 00:50:38,700 all done with this, all the ones that you like 1028 00:50:38,700 --> 00:50:41,060 are still in the game. 1029 00:50:41,060 --> 00:50:46,000 And a few of the ones you might like, some of my favorites, 1030 00:50:46,000 --> 00:50:48,667 they don't get as far down this path as you want. 1031 00:50:48,667 --> 00:50:50,125 But then you have these properties. 1032 00:50:50,125 --> 00:50:52,950 And you're like, OK, I have vector space properties 1033 00:50:52,950 --> 00:50:54,660 and other types of things. 1034 00:50:54,660 --> 00:50:59,120 And I'm sure some of you will find issues or comments 1035 00:50:59,120 --> 00:51:01,660 or criticisms that we haven't written this up formally 1036 00:51:01,660 --> 00:51:02,840 in journals. 1037 00:51:02,840 --> 00:51:06,515 I've been desperately trying to hire abstract algebraists 1038 00:51:06,515 --> 00:51:09,730 as summer students and have not been successful so far. 1039 00:51:09,730 --> 00:51:11,900 So if you know any abstract algebraists that 1040 00:51:11,900 --> 00:51:13,400 would like to work on this, we would 1041 00:51:13,400 --> 00:51:18,510 be happy to pay them to help us work on this. 1042 00:51:18,510 --> 00:51:20,860 If we extend our definition of values 1043 00:51:20,860 --> 00:51:24,850 to include sets, ie, to include concatenation, 1044 00:51:24,850 --> 00:51:26,690 this is essentially what it does. 1045 00:51:26,690 --> 00:51:28,220 It adds four functions here. 1046 00:51:28,220 --> 00:51:31,020 They actually make it through all these operations here. 1047 00:51:31,020 --> 00:51:33,240 And they basically keep on going. 1048 00:51:33,240 --> 00:51:35,450 But then they fall away here when we try and get 1049 00:51:35,450 --> 00:51:38,820 to this final feld step. 1050 00:51:38,820 --> 00:51:44,460 But you know, this is still a very useful space to be in. 1051 00:51:44,460 --> 00:51:44,960 All right. 
1052 00:51:44,960 --> 00:51:47,080 So let me explain a little bit of how 1053 00:51:47,080 --> 00:51:50,690 we sort of navigate this space. 1054 00:51:50,690 --> 00:51:55,417 So if we can limit ourselves to special function combinations 1055 00:51:55,417 --> 00:51:57,750 that are associative and commutative-- so just remember, 1056 00:51:57,750 --> 00:52:01,250 associative just means you can group your parentheses however 1057 00:52:01,250 --> 00:52:04,100 you want. 1058 00:52:04,100 --> 00:52:06,050 I always have to look it up. 1059 00:52:06,050 --> 00:52:07,760 Never once been like associative-- 1060 00:52:07,760 --> 00:52:09,150 I mean, I don't know. 1061 00:52:09,150 --> 00:52:10,660 It's never been intuitive to me. 1062 00:52:10,660 --> 00:52:12,285 But that's the grouping of parentheses. 1063 00:52:15,270 --> 00:52:18,660 Commutativity just means that you can flip. 1064 00:52:18,660 --> 00:52:20,030 So these are our functions. 1065 00:52:20,030 --> 00:52:23,510 So these are all the 18 here. 1066 00:52:23,510 --> 00:52:26,680 And then the grayed-- the ones grayed out are the ones you lose 1067 00:52:26,680 --> 00:52:27,930 because of the commutativity. 1068 00:52:27,930 --> 00:52:30,130 So basically this function, which 1069 00:52:30,130 --> 00:52:34,256 is just left, which just says return the left value 1070 00:52:34,256 --> 00:52:36,130 in all circumstances, well, obviously, that's 1071 00:52:36,130 --> 00:52:39,090 not commutative. 1072 00:52:39,090 --> 00:52:41,090 And quite frankly, left and right are 1073 00:52:41,090 --> 00:52:42,440 sort of silly functions. 1074 00:52:42,440 --> 00:52:47,361 Sometimes it is almost essentially a no-op. 1075 00:52:47,361 --> 00:52:48,860 If you know you want the left value, 1076 00:52:48,860 --> 00:52:53,407 you just take the left value and move on. 1077 00:52:53,407 --> 00:52:54,990 So that doesn't really-- so then we're 1078 00:52:54,990 --> 00:52:58,675 left with a lot of ones we like, essentially union max.
1079 00:52:58,675 --> 00:53:02,550 So basically, take the max value, union min. 1080 00:53:02,550 --> 00:53:06,580 I call this the intersection delta function. 1081 00:53:06,580 --> 00:53:10,330 It only gives you an answer if the values are the same. 1082 00:53:10,330 --> 00:53:12,320 This is essentially the union. 1083 00:53:12,320 --> 00:53:15,210 It's sort of like the XOR function. 1084 00:53:15,210 --> 00:53:19,180 If there's a collision, blow it away. 1085 00:53:19,180 --> 00:53:21,210 And then there are sort of other, 1086 00:53:21,210 --> 00:53:24,150 more unusual combinations here dealing 1087 00:53:24,150 --> 00:53:27,570 with the special functions. 1088 00:53:27,570 --> 00:53:30,010 We tend to kind of really live mostly here. 1089 00:53:30,010 --> 00:53:31,930 These tend to be the ones that really 1090 00:53:31,930 --> 00:53:33,540 are the ones you use a lot. 1091 00:53:33,540 --> 00:53:36,385 I haven't really used these too much. 1092 00:53:39,591 --> 00:53:40,090 All right. 1093 00:53:40,090 --> 00:53:42,110 And so we're left with what are called 1094 00:53:42,110 --> 00:53:45,435 Abelian semigroups, these 14 highlighted ones here. 1095 00:53:45,435 --> 00:53:48,160 AUDIENCE: So Jeremy, I feel I'm going to get lost here. 1096 00:53:48,160 --> 00:53:50,130 What does Abelian mean? 1097 00:53:50,130 --> 00:53:53,680 JEREMY KEPNER: So Abelian is a semigroup that is commutative. 1098 00:53:53,680 --> 00:53:55,780 That's all it means. 1099 00:53:55,780 --> 00:53:58,290 So basically, it means you have associativity. 1100 00:53:58,290 --> 00:54:02,240 And then it's Abelian if you add commutativity to it. 1101 00:54:02,240 --> 00:54:05,510 So any set of numbers that obeys grouping operations 1102 00:54:05,510 --> 00:54:08,400 and is commutative is Abelian. 1103 00:54:08,400 --> 00:54:10,816 I mean, I think Abel did a lot more, 1104 00:54:10,816 --> 00:54:13,190 so he didn't need to get his name associated with that.
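The collision functions named above (union max, union min, the intersection delta function, and the XOR-like union) can be sketched on plain dicts. The slides are not visible in the transcript, so the exact semantics below are an interpretation: "union" is assumed to keep keys from either array, "intersection" only keys in both, and `None` stands in for the empty value. The helper names `union_with`, `intersect_with`, and `union_xor` are hypothetical labels, not D4M functions.

```python
def union_with(collide):
    """Keep keys from either array; combine colliding values with `collide`."""
    def op(a, b):
        out = {}
        for key in a.keys() | b.keys():
            if key in a and key in b:
                val = collide(a[key], b[key])
                if val is not None:      # None means "empty": drop the entry
                    out[key] = val
            else:
                out[key] = a[key] if key in a else b[key]
        return out
    return op

def intersect_with(collide):
    """Keep only keys present in both arrays."""
    def op(a, b):
        out = {}
        for key in a.keys() & b.keys():
            val = collide(a[key], b[key])
            if val is not None:
                out[key] = val
        return out
    return op

union_max = union_with(max)
union_min = union_with(min)
# Delta: an answer only when the two values agree.
intersect_delta = intersect_with(lambda x, y: x if x == y else None)
# XOR-like union: a collision wipes the entry out.
union_xor = union_with(lambda x, y: None)

a = {"k1": "apple", "k2": "pear"}
b = {"k2": "plum", "k3": "fig"}

print(union_max(a, b))        # k2 -> 'plum' (string ordering)
print(union_min(a, b))        # k2 -> 'pear'
print(intersect_delta(a, b))  # {} because 'pear' != 'plum'
print(union_xor(a, b))        # only k1 and k3 survive
```

Because the values are only ever compared, never subtracted, the same functions work for strings and numbers alike.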
1105 00:54:13,190 --> 00:54:15,565 It could have just been called comm-- it's sometimes just 1106 00:54:15,565 --> 00:54:17,770 called a commutative semigroup. 1107 00:54:17,770 --> 00:54:20,510 So you know, it's lucky when you get 1108 00:54:20,510 --> 00:54:22,590 your name added to some really trivial operation. 1109 00:54:22,590 --> 00:54:23,881 I mean, he did a lot of things. 1110 00:54:23,881 --> 00:54:25,820 But they threw him on here. 1111 00:54:25,820 --> 00:54:28,210 And in fact, in group theory, they complain about this. 1112 00:54:28,210 --> 00:54:30,126 And there are numerous rants about how they've 1113 00:54:30,126 --> 00:54:31,770 named all their groups after people, 1114 00:54:31,770 --> 00:54:33,360 and they don't tell you anything. 1115 00:54:33,360 --> 00:54:37,080 As opposed to other branches of mathematics, who would simply 1116 00:54:37,080 --> 00:54:42,330 just call this a commutative semigroup or just a commutative 1117 00:54:42,330 --> 00:54:43,840 associative group. 1118 00:54:43,840 --> 00:54:50,177 That might be even more-- and of all the different properties, 1119 00:54:50,177 --> 00:54:52,260 most of them you can combine in whatever you want. 1120 00:54:52,260 --> 00:54:55,050 There's almost an infinite number of groups. 1121 00:54:55,050 --> 00:54:58,420 So there's certain ones that are most useful. 1122 00:54:58,420 --> 00:54:59,270 Let's see here. 1123 00:54:59,270 --> 00:55:00,035 So now, we start-- 1124 00:55:00,035 --> 00:55:01,869 AUDIENCE: Along those lines, what's the-- 1125 00:55:01,869 --> 00:55:03,243 and I'm sorry if you already said 1126 00:55:03,243 --> 00:55:06,780 this-- distinction between a semigroup and a group? 1127 00:55:06,780 --> 00:55:10,542 JEREMY KEPNER: A group and a semigroup. 1128 00:55:10,542 --> 00:55:17,401 I want to say-- oh, it's an Abelian group without inverses. 1129 00:55:17,401 --> 00:55:18,363 So a semigroup there. 
1130 00:55:18,363 --> 00:55:21,710 So these are all Abelian semigroups, or Abelian groups 1131 00:55:21,710 --> 00:55:24,870 without inverses or commutative associative groups 1132 00:55:24,870 --> 00:55:26,160 without inverses. 1133 00:55:26,160 --> 00:55:30,510 So you see the problem. 1134 00:55:30,510 --> 00:55:32,595 And so I guess, he studied them a lot. 1135 00:55:32,595 --> 00:55:34,220 So he got his name associated with them 1136 00:55:34,220 --> 00:55:36,240 and proved their properties and stuff. 1137 00:55:36,240 --> 00:55:41,550 So we have 14 of those that form 196 pairs. 1138 00:55:41,550 --> 00:55:44,340 So these will begin to-- so we want 1139 00:55:44,340 --> 00:55:46,850 to look at the ones that are distributive. 1140 00:55:46,850 --> 00:55:50,890 So that basically means we assign one of those operators 1141 00:55:50,890 --> 00:55:53,370 to be the addition operation, the other to be 1142 00:55:53,370 --> 00:55:55,220 the multiplication operation. 1143 00:55:55,220 --> 00:55:59,450 And we need to show that it is distributive. 1144 00:55:59,450 --> 00:56:04,800 And of those 196, 74 operator pairs are distributive. 1145 00:56:04,800 --> 00:56:06,280 These are called semirings. 1146 00:56:06,280 --> 00:56:07,560 Or they could be called rings. 1147 00:56:07,560 --> 00:56:10,100 So a semiring is a ring without an inverse, so rings 1148 00:56:10,100 --> 00:56:12,720 without inverses and without identity elements. 1149 00:56:12,720 --> 00:56:15,710 And if you look at the various definitions, 1150 00:56:15,710 --> 00:56:17,937 I mean, there's the Wikipedia definition, 1151 00:56:17,937 --> 00:56:20,020 which mathematicians will say is not really 1152 00:56:20,020 --> 00:56:20,770 a true definition. 1153 00:56:20,770 --> 00:56:23,042 There's a Wolfram MathWorld definition, 1154 00:56:23,042 --> 00:56:24,000 which is more rigorous. 1155 00:56:26,740 --> 00:56:30,800 And they often disagree on this stuff.
1156 00:56:30,800 --> 00:56:33,280 I tend to be like if Wikipedia says it is true, 1157 00:56:33,280 --> 00:56:34,780 then most people on the planet Earth 1158 00:56:34,780 --> 00:56:36,910 believe that that is what is true. 1159 00:56:36,910 --> 00:56:40,910 And so therefore, you should be aware that that is the truth 1160 00:56:40,910 --> 00:56:43,440 that most people will believe. 1161 00:56:43,440 --> 00:56:46,220 So I don't know which one we've chosen here, whether it's 1162 00:56:46,220 --> 00:56:47,990 the Wikipedia or the Wolfram. 1163 00:56:47,990 --> 00:56:49,420 There's also various encyclopedias 1164 00:56:49,420 --> 00:56:50,587 that define this stuff, too. 1165 00:56:50,587 --> 00:56:51,295 AUDIENCE: Jeremy? 1166 00:56:51,295 --> 00:56:52,146 JEREMY KEPNER: Yes. 1167 00:56:52,146 --> 00:56:53,655 AUDIENCE: Can you define ring? 1168 00:56:53,655 --> 00:56:54,780 JEREMY KEPNER: Define what? 1169 00:56:54,780 --> 00:56:55,990 AUDIENCE: Ring. 1170 00:56:55,990 --> 00:56:56,831 JEREMY KEPNER: Ring? 1171 00:56:56,831 --> 00:56:58,124 AUDIENCE: Yeah. 1172 00:56:58,124 --> 00:57:07,250 JEREMY KEPNER: Oh, uh, well, a ring would be this. 1173 00:57:07,250 --> 00:57:11,040 And it would have inverses and identity elements. 1174 00:57:11,040 --> 00:57:13,300 So it's basically something 1175 00:57:13,300 --> 00:57:19,920 that's distributive and has inverses and identity elements. 1176 00:57:23,930 --> 00:57:28,860 But if you type ring in the Wiki-- and I should say, 1177 00:57:28,860 --> 00:57:31,330 the great thing about Wikipedia is like all the definitions 1178 00:57:31,330 --> 00:57:31,829 are linked. 1179 00:57:31,829 --> 00:57:35,780 So after seven or eight clicks, it kind of all holds together. 1180 00:57:35,780 --> 00:57:37,110 So it's pretty nice. 1181 00:57:37,110 --> 00:57:41,690 And Wolfram is the same way, just enough knowledge 1182 00:57:41,690 --> 00:57:45,820 to be dangerous.
1183 00:57:45,820 --> 00:57:49,630 The internet's your friend, right? 1184 00:57:49,630 --> 00:57:51,070 So moving on here. 1185 00:57:51,070 --> 00:57:53,570 And this is something that's less important, but kind 1186 00:57:53,570 --> 00:57:55,945 of for completeness if we're going to head towards vector 1187 00:57:55,945 --> 00:57:58,475 spaces, we need to address it, which is the concept of identity 1188 00:57:58,475 --> 00:57:58,975 elements. 1189 00:58:02,890 --> 00:58:06,000 So zero is the additive identity. 1190 00:58:06,000 --> 00:58:08,450 When you think of normal math, you add zero to something, 1191 00:58:08,450 --> 00:58:10,690 it doesn't change it. 1192 00:58:10,690 --> 00:58:13,810 And the choices for additive identity elements, 1193 00:58:13,810 --> 00:58:16,620 we have three special symbols, and we could pick them. 1194 00:58:16,620 --> 00:58:19,960 We have the multiplicative identity, which is 1, 1195 00:58:19,960 --> 00:58:23,380 and the multiplicative annihilator. 1196 00:58:23,380 --> 00:58:26,900 So of the choices here, we have 12 semirings 1197 00:58:26,900 --> 00:58:30,270 with the appropriate zeros and ones. 1198 00:58:30,270 --> 00:58:33,130 We have four that actually have two combinations. 1199 00:58:33,130 --> 00:58:36,360 And we have 16 total operations-- of the 16, 1200 00:58:36,360 --> 00:58:38,061 there are 6 operators. 1201 00:58:38,061 --> 00:58:39,310 These are different operators. 1202 00:58:39,310 --> 00:58:41,310 And again, we call these felds without inverses. 1203 00:58:41,310 --> 00:58:43,210 I'll get into it a little bit more. 1204 00:58:43,210 --> 00:58:51,686 So for instance-- and we can skip that for now. 1205 00:58:51,686 --> 00:58:53,560 So just a better way to look at that is these 1206 00:58:53,560 --> 00:58:55,790 are our operator pairs. 1207 00:58:55,790 --> 00:58:56,900 OK. 1208 00:58:56,900 --> 00:58:59,190 And we wanted to see which ones of them 1209 00:58:59,190 --> 00:59:04,270 sort of form these felds.
1210 00:59:04,270 --> 00:59:07,300 So the ones that distribute are marked with a D here. 1211 00:59:07,300 --> 00:59:09,090 The ones that distribute and have 1212 00:59:09,090 --> 00:59:14,790 a 0, 1 operator pair that works, are shown in the square here. 1213 00:59:14,790 --> 00:59:16,240 And some of them have two. 1214 00:59:16,240 --> 00:59:20,750 So if I pair [INAUDIBLE] plus as union min 1215 00:59:20,750 --> 00:59:26,900 and multiply as intersection min, and I define 0 1216 00:59:26,900 --> 00:59:33,680 to be the empty value, and 1 to be plus infinity, 1217 00:59:33,680 --> 00:59:41,670 then I can create a feld, essentially, out of that. 1218 00:59:41,670 --> 00:59:45,990 I had a lot of debate with someone about this one, 1219 00:59:45,990 --> 00:59:48,990 whether I can have plus infinity be the 0 element 1220 00:59:48,990 --> 00:59:52,210 and have it be less than the 1 element. 1221 00:59:52,210 --> 00:59:55,580 Was there a requirement that 0 actually be less than 1 1222 00:59:55,580 --> 00:59:56,595 in this definition? 1223 00:59:58,474 --> 01:00:00,390 When I talked to mathematicians, they're like, 1224 01:00:00,390 --> 01:00:04,510 eh, it's kind of what you want, then you make it that. 1225 01:00:04,510 --> 01:00:05,900 So this just shows the full space 1226 01:00:05,900 --> 01:00:09,740 of things that are possible. 1227 01:00:09,740 --> 01:00:13,300 If we go back to our concatenation operators here, 1228 01:00:13,300 --> 01:00:16,180 so these were our four concatenation operators 1229 01:00:16,180 --> 01:00:20,570 and our collision functions. 1230 01:00:20,570 --> 01:00:22,760 And this shows you what that set looks like. 1231 01:00:22,760 --> 01:00:25,600 So these are the four by four pairs here. 1232 01:00:25,600 --> 01:00:28,630 All operators, they all distribute. 1233 01:00:28,630 --> 01:00:31,280 And 16 of these form semirings. 1234 01:00:31,280 --> 01:00:34,240 Because you're able to construct these various zeroes.
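The pairing described above (plus as union min, multiply as intersection min, 0 as the empty value, 1 as plus infinity) can be checked on scalar values. This is a sketch under assumptions: `None` stands in for the empty value, the sample set is arbitrary, and the function names are hypothetical. It confirms the identity, annihilator, and distributive properties on that sample only.

```python
import math
from itertools import product

EMPTY = None          # stand-in for the empty value (assumption)
ZERO = EMPTY          # additive identity and multiplicative annihilator
ONE = math.inf        # multiplicative identity

def union_min(a, b):
    """Plus: an empty operand leaves the other unchanged; otherwise take min."""
    if a is EMPTY:
        return b
    if b is EMPTY:
        return a
    return min(a, b)

def intersect_min(a, b):
    """Multiply: empty if either operand is empty; otherwise take min."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return min(a, b)

samples = [EMPTY, -math.inf, -2, 0, 7, math.inf]

additive_identity = all(union_min(ZERO, v) == v for v in samples)
multiplicative_identity = all(intersect_min(ONE, v) == v for v in samples)
annihilator = all(intersect_min(ZERO, v) is EMPTY for v in samples)
distributive = all(
    intersect_min(a, union_min(b, c))
    == union_min(intersect_min(a, b), intersect_min(a, c))
    for a, b, c in product(samples, repeat=3)
)

print(additive_identity, multiplicative_identity, annihilator, distributive)
```

What is still missing is any value that undoes a `union_min`, which is the lack of an additive inverse that separates a feld from a field.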
1235 01:00:34,240 --> 01:00:35,710 And this isn't, like, rigorous. 1236 01:00:35,710 --> 01:00:39,490 I mean, I might have messed one up or a few up here or there 1237 01:00:39,490 --> 01:00:41,326 or something like that. 1238 01:00:41,326 --> 01:00:42,950 But this just kind of gives you a sense 1239 01:00:42,950 --> 01:00:44,604 of the space that we're working on. 1240 01:00:44,604 --> 01:00:47,270 AUDIENCE: You made some typos on that table [? you ?] fix later. 1241 01:00:47,270 --> 01:00:48,670 JEREMY KEPNER: Probably. 1242 01:00:48,670 --> 01:00:49,170 Probably. 1243 01:00:52,360 --> 01:00:54,990 Now, I'm going to kind of really move it forward here. 1244 01:00:54,990 --> 01:00:57,250 So we get into vector spaces. 1245 01:00:57,250 --> 01:01:02,880 So we can have associative array vector addition. 1246 01:01:02,880 --> 01:01:05,390 Again, all associative arrays are conformant. 1247 01:01:05,390 --> 01:01:09,900 We have the concept of scalar multiplication, 1248 01:01:09,900 --> 01:01:13,600 which is essentially applied to all values. 1249 01:01:13,600 --> 01:01:15,380 So one of the things I really struggle 1250 01:01:15,380 --> 01:01:21,100 with-- so scalar multiplication kind of makes sense, right? 1251 01:01:21,100 --> 01:01:23,700 If I have an associative array and I multiply it by a scalar, 1252 01:01:23,700 --> 01:01:27,980 I can imagine just applying that in an intersection 1253 01:01:27,980 --> 01:01:33,980 sense only to the keys are defined by one. 1254 01:01:33,980 --> 01:01:40,700 Scalar addition, though, is very difficult. 1255 01:01:40,700 --> 01:01:42,750 If I have a scalar plus an associative array, 1256 01:01:42,750 --> 01:01:44,940 does it only apply to the keys? 1257 01:01:44,940 --> 01:01:48,080 Or is a scalar really the associative 1258 01:01:48,080 --> 01:01:50,390 array that's defined everywhere over all things? 1259 01:01:50,390 --> 01:01:51,300 So it's infinite. 1260 01:01:51,300 --> 01:01:52,800 So that's something I struggle with. 
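Scalar multiplication in the keys-only sense described above can be sketched as a map over the stored values. The dict representation and the names `scalar_multiply` and `map_values` are assumptions for illustration, not D4M functions. The second helper shows the keys-only reading of applying an arbitrary function to the stored values; the other reading of scalar addition, where the scalar is defined over every possible key, has no finite representation here.

```python
def scalar_multiply(scalar, array, mul=lambda s, v: s * v):
    """Apply `scalar` only to the entries that are actually stored."""
    return {key: mul(scalar, val) for key, val in array.items()}

def map_values(array, fn):
    """Apply `fn` to each stored value, leaving the keys untouched."""
    return {key: fn(val) for key, val in array.items()}

A = {("doc1", "word"): 2, ("doc2", "word"): 5}

scaled = scalar_multiply(3, A)
shifted = map_values(A, lambda v: v + 1)  # keys-only reading of "A + 1"

print(scaled)   # {('doc1', 'word'): 6, ('doc2', 'word'): 15}
print(shifted)  # {('doc1', 'word'): 3, ('doc2', 'word'): 6}
```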
1261 01:01:52,800 --> 01:01:57,250 So when you ask why is scalar addition not supported in D4M, 1262 01:01:57,250 --> 01:02:01,040 it's because I don't know what it's supposed to mean. 1263 01:02:01,040 --> 01:02:03,481 And you can easily just pop out the values, 1264 01:02:03,481 --> 01:02:05,230 add whatever you want, stuff them back in, 1265 01:02:05,230 --> 01:02:06,300 and be on your way. 1266 01:02:06,300 --> 01:02:07,740 And then you're safe. 1267 01:02:10,340 --> 01:02:12,580 So in the vector space that we form, 1268 01:02:12,580 --> 01:02:14,431 it meets the plus requirements. 1269 01:02:14,431 --> 01:02:14,930 It commutes. 1270 01:02:14,930 --> 01:02:16,030 It's associative. 1271 01:02:16,030 --> 01:02:17,530 We have an identity. 1272 01:02:17,530 --> 01:02:18,680 But we have no inverse. 1273 01:02:18,680 --> 01:02:19,470 So we have to be careful. 1274 01:02:19,470 --> 01:02:20,678 That's why we don't have add. 1275 01:02:20,678 --> 01:02:23,480 And a vector space, it meets the scalar requirements. 1276 01:02:23,480 --> 01:02:25,890 So all associative array operator 1277 01:02:25,890 --> 01:02:28,560 pairs that yield felds also result in vector spaces 1278 01:02:28,560 --> 01:02:30,390 without inverses. 1279 01:02:30,390 --> 01:02:33,630 Maybe we call these vector semispaces, I don't know, 1280 01:02:33,630 --> 01:02:37,392 or vector spaces over a feld or something like that. 1281 01:02:37,392 --> 01:02:38,600 What kind of properties here? 1282 01:02:38,600 --> 01:02:40,150 Well, we have scalar identities. 1283 01:02:40,150 --> 01:02:40,780 That's great. 1284 01:02:40,780 --> 01:02:42,180 You could create subspaces. 1285 01:02:42,180 --> 01:02:43,880 That makes sense, too. 1286 01:02:43,880 --> 01:02:46,360 The concept of a span, yes, you can definitely 1287 01:02:46,360 --> 01:02:48,390 do concepts of spans. 1288 01:02:48,390 --> 01:02:50,440 Does span equal a subspace?
1289 01:02:50,440 --> 01:02:52,880 So this is a big question in vector space 1290 01:02:52,880 --> 01:02:57,340 theory, spans on subspaces. 1291 01:02:57,340 --> 01:02:58,790 Not sure. 1292 01:02:58,790 --> 01:03:00,260 Linear dependence. 1293 01:03:00,260 --> 01:03:03,270 Is there a nontrivial linear combination 1294 01:03:03,270 --> 01:03:07,750 of vectors equal to the plus identity? 1295 01:03:07,750 --> 01:03:10,540 You really can't do this without an additive inverse. 1296 01:03:10,540 --> 01:03:12,340 And so that becomes a little bit-- 1297 01:03:12,340 --> 01:03:15,870 so we really kind of need to redefine linear independence, 1298 01:03:15,870 --> 01:03:17,990 which we can do. 1299 01:03:17,990 --> 01:03:20,630 But there's a lot of the proofs of linear independence 1300 01:03:20,630 --> 01:03:22,970 and dependence that rely on the existence of inverses. 1301 01:03:22,970 --> 01:03:27,090 And you probably could circle your way around that. 1302 01:03:27,090 --> 01:03:27,920 AUDIENCE: Question. 1303 01:03:27,920 --> 01:03:30,310 Are you just missing the additive inverse 1304 01:03:30,310 --> 01:03:32,770 or both the additive and the [INAUDIBLE]? 1305 01:03:32,770 --> 01:03:34,030 JEREMY KEPNER: Both. 1306 01:03:34,030 --> 01:03:36,310 Both, yeah. 1307 01:03:36,310 --> 01:03:36,810 Yeah. 1308 01:03:40,080 --> 01:03:42,350 So one of the things is considering 1309 01:03:42,350 --> 01:03:45,620 a linear combination of two associative array vectors. 1310 01:03:45,620 --> 01:03:51,080 Under what conditions do they create a unique result? 1311 01:03:51,080 --> 01:03:53,090 So this really depends on what you choose. 1312 01:03:53,090 --> 01:03:58,020 So for instance, if we have a vector A1 and A2, 1313 01:03:58,020 --> 01:04:01,430 when we multiply it by coefficients A1 and A2, 1314 01:04:01,430 --> 01:04:03,480 and we use these as our plus and our multiply, 1315 01:04:03,480 --> 01:04:08,920 and this is our 0 and 1, where are A1 uniquely determined? 
1316 01:04:08,920 --> 01:04:11,740 And so for instance, if I pick, in this case, 1317 01:04:11,740 --> 01:04:15,280 our canonical identity vectors, where A1 1318 01:04:15,280 --> 01:04:20,650 is equal to infinity and minus infinity here, 1319 01:04:20,650 --> 01:04:25,830 then we find that we can cover the space very nicely. 1320 01:04:25,830 --> 01:04:27,270 If we do it the other way, we can't. 1321 01:04:27,270 --> 01:04:30,340 A better way to view that is in the drawing here. 1322 01:04:30,340 --> 01:04:33,930 So here's my whole space of coefficients a1 and a2. 1323 01:04:33,930 --> 01:04:37,780 And we see that a1 and a2 uniquely define a result 1324 01:04:37,780 --> 01:04:39,920 with these basis vectors. 1325 01:04:39,920 --> 01:04:43,360 And a1 and a2 are completely degenerate with these basis 1326 01:04:43,360 --> 01:04:44,110 vectors. 1327 01:04:44,110 --> 01:04:47,630 So depending on the kinds of basis vectors you have, 1328 01:04:47,630 --> 01:04:51,800 you can create unique stuff or not unique stuff. 1329 01:04:51,800 --> 01:04:54,440 You can give yourselves actual values here. 1330 01:04:54,440 --> 01:04:58,740 Basically, if A1 is just equal to this value 1331 01:04:58,740 --> 01:05:00,970 and A2 is equal to this value, this 1332 01:05:00,970 --> 01:05:02,310 shows you what that looks like. 1333 01:05:02,310 --> 01:05:03,987 There's places where it's unique. 1334 01:05:03,987 --> 01:05:05,570 There's places where they're the same. 1335 01:05:05,570 --> 01:05:07,590 And there's places where one is unique, 1336 01:05:07,590 --> 01:05:09,320 but the other is not unique. 1337 01:05:09,320 --> 01:05:15,360 So again, you can construct using associative arrays 1338 01:05:15,360 --> 01:05:18,810 as basis vectors, a very rich set of things. 1339 01:05:18,810 --> 01:05:21,370 And the same thing goes with multivalued vectors. 1340 01:05:21,370 --> 01:05:23,774 Again, different types of spaces here.
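The uniqueness question above can be tried out numerically. The slide itself is not in the transcript, so this sketch assumes plus is max and multiply is min, with minus infinity as the 0 and plus infinity as the 1. The function name `combine` and the degenerate pair `D1`, `D2` are illustrative choices, not taken from the lecture.

```python
NEG_INF = float("-inf")   # assumed "0": identity for plus (max)
POS_INF = float("inf")    # assumed "1": identity for multiply (min)

def combine(a1, A1, a2, A2):
    """Elementwise (a1 (x) A1) (+) (a2 (x) A2) with (+) = max and (x) = min."""
    return tuple(max(min(a1, x), min(a2, y)) for x, y in zip(A1, A2))

# Canonical identity vectors: each one is "1" in one slot and "0" in the other.
E1 = (POS_INF, NEG_INF)
E2 = (NEG_INF, POS_INF)

# A deliberately degenerate pair (illustrative): both vectors are all "0".
D1 = (NEG_INF, NEG_INF)
D2 = (NEG_INF, NEG_INF)

unique = combine(3, E1, 5, E2)       # the coefficients can be read back off
degenerate_a = combine(3, D1, 5, D2)
degenerate_b = combine(7, D1, 1, D2) # different coefficients, same result
print(unique, degenerate_a, degenerate_b)
```

With the identity vectors, the result is just the pair of coefficients, so each coefficient pair gives a distinct result. With the all-"0" pair, every coefficient pair collapses to the same result.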
1341 01:05:23,774 --> 01:05:25,440 We really need to kind of work this out. 1342 01:05:25,440 --> 01:05:27,090 If anybody's interested, we're very 1343 01:05:27,090 --> 01:05:29,423 interested in having people help us work this stuff out. 1344 01:05:31,740 --> 01:05:35,210 Which of these operations make sense? 1345 01:05:35,210 --> 01:05:37,380 Transpose makes total sense. 1346 01:05:37,380 --> 01:05:40,660 Transposing of associative arrays makes total sense. 1347 01:05:40,660 --> 01:05:44,330 And it's a very efficient operation, by the way, in D4M. 1348 01:05:44,330 --> 01:05:46,360 You can do transposes very efficiently. 1349 01:05:46,360 --> 01:05:50,660 It works out very nicely. 1350 01:05:50,660 --> 01:05:55,250 Special matrices, submatrices, zero matrices, square matrices, 1351 01:05:55,250 --> 01:05:57,780 diagonal matrices, yes. 1352 01:05:57,780 --> 01:06:01,980 Although, diagonal matrices are a little bit tricky. 1353 01:06:01,980 --> 01:06:05,170 Upper and lower triangular, yes, you can kind of do this. 1354 01:06:05,170 --> 01:06:07,060 Skew symmetric, no. 1355 01:06:07,060 --> 01:06:09,270 Hermitian, not really. 1356 01:06:09,270 --> 01:06:11,770 Elementary row and column, sort of. 1357 01:06:11,770 --> 01:06:16,027 Row column equivalence, sort of, under certain conditions. 1358 01:06:16,027 --> 01:06:18,110 These are all things you can do in linear algebra. 1359 01:06:18,110 --> 01:06:20,520 And sometimes you can do them with associative arrays 1360 01:06:20,520 --> 01:06:21,810 and sometimes you can't. 1361 01:06:21,810 --> 01:06:24,420 You have to think about them. 1362 01:06:24,420 --> 01:06:28,650 Matrix multiply is sort of our crown jewel. 1363 01:06:28,650 --> 01:06:31,880 Always conformant-- you can multiply any sizes whenever you want. 1364 01:06:35,430 --> 01:06:37,610 There's two ways to think about this.
1365 01:06:37,610 --> 01:06:41,020 You can make your head hurt a little bit when you start 1366 01:06:41,020 --> 01:06:43,860 dealing with the null elements. 1367 01:06:43,860 --> 01:06:46,120 When does the union operator get applied? 1368 01:06:46,120 --> 01:06:49,240 And when the-- so when you do computation, 1369 01:06:49,240 --> 01:06:52,010 there's two ways to formulate a matrix multiply. 1370 01:06:52,010 --> 01:06:54,522 There's the inner product formulation, 1371 01:06:54,522 --> 01:06:56,730 which is typically what people use when they actually 1372 01:06:56,730 --> 01:06:57,430 program it up. 1373 01:06:57,430 --> 01:06:59,340 Because it tends to be more efficient. 1374 01:06:59,340 --> 01:07:01,630 That's basically you take each row and each column, 1375 01:07:01,630 --> 01:07:06,130 you do an inner product, and then do the result. OK. 1376 01:07:06,130 --> 01:07:08,860 Mathematically, from a theory perspective, 1377 01:07:08,860 --> 01:07:10,700 you get yourself in less trouble if you 1378 01:07:10,700 --> 01:07:13,350 think in terms of the outer product formulation, which 1379 01:07:13,350 --> 01:07:16,210 is basically you take each column and row, 1380 01:07:16,210 --> 01:07:20,010 you do the outer product to form a matrix, 1381 01:07:20,010 --> 01:07:23,040 and then you take all of these, and then combine them 1382 01:07:23,040 --> 01:07:26,570 all together with the operation. 1383 01:07:26,570 --> 01:07:30,720 And that, theoretically, actually keeps you sane here. 1384 01:07:30,720 --> 01:07:34,510 And that's the way to think about it mathematically. 1385 01:07:34,510 --> 01:07:36,497 There's a variety of matrix multiply examples, 1386 01:07:36,497 --> 01:07:37,580 I won't go into them here. 1387 01:07:37,580 --> 01:07:40,260 Obviously, they depend heavily on what our collision 1388 01:07:40,260 --> 01:07:42,450 function, g, is here. 1389 01:07:42,450 --> 01:07:44,790 It gives you different values and different behaviors.
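The two formulations described above can be compared side by side. The following is a minimal sketch, not D4M's implementation: associative arrays are modeled, as an assumption, by nested dicts of the form `{row_key: {col_key: value}}`, and `matmul_inner`, `matmul_outer`, `plus`, and `times` are hypothetical names. Only stored entries take part, so the key sets never have to match.

```python
import operator
from collections import defaultdict

def matmul_inner(A, B, plus, times):
    """Inner-product form: for each row of A and column of B, combine over shared inner keys."""
    Bcols = defaultdict(dict)              # regroup B by column
    for k, row in B.items():
        for j, v in row.items():
            Bcols[j][k] = v
    C = {}
    for i, arow in A.items():
        for j, bcol in Bcols.items():
            shared = [k for k in arow if k in bcol]
            if not shared:
                continue                   # nothing stored for this (i, j)
            acc = times(arow[shared[0]], bcol[shared[0]])
            for k in shared[1:]:
                acc = plus(acc, times(arow[k], bcol[k]))
            C.setdefault(i, {})[j] = acc
    return C

def matmul_outer(A, B, plus, times):
    """Outer-product form: for each inner key k, column k of A times row k of B, merged with plus."""
    Acols = defaultdict(dict)              # regroup A by column
    for i, row in A.items():
        for k, v in row.items():
            Acols[k][i] = v
    C = {}
    for k, acol in Acols.items():
        brow = B.get(k, {})
        for i, a in acol.items():          # outer product of column k and row k
            for j, b in brow.items():
                term = times(a, b)
                crow = C.setdefault(i, {})
                crow[j] = plus(crow[j], term) if j in crow else term
    return C

A = {"r1": {"x": 1, "y": 2}, "r2": {"y": 3}}
B = {"x": {"c1": 4}, "y": {"c1": 5, "c2": 6}, "z": {"c2": 7}}

C_inner = matmul_inner(A, B, operator.add, operator.mul)
C_outer = matmul_outer(A, B, operator.add, operator.mul)
C_maxmin = matmul_outer(A, B, max, min)
print(C_inner, C_outer, C_maxmin)
```

For a commutative, associative `plus`, both forms give the same stored entries. Swapping in `max` and `min` changes the values but not the structure of the computation.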
1390 01:07:47,950 --> 01:07:56,800 The identity element, maybe a left identity, right identity. 1391 01:07:56,800 --> 01:07:59,830 In some instances, it seems to be OK. 1392 01:07:59,830 --> 01:08:04,540 But the identity [INAUDIBLE] is a little bit tricky. 1393 01:08:04,540 --> 01:08:08,640 Inverses, boy, is it hard to construct inverses when you 1394 01:08:08,640 --> 01:08:12,210 don't have underlying inverses. 1395 01:08:12,210 --> 01:08:14,660 It's just really tricky. 1396 01:08:14,660 --> 01:08:18,125 And so, we're probably not going to get anything 1397 01:08:18,125 --> 01:08:20,920 that looks like an inverse. 1398 01:08:20,920 --> 01:08:24,270 You can do eigenvectors in certain restrictive cases 1399 01:08:24,270 --> 01:08:25,023 sort of. 1400 01:08:25,023 --> 01:08:27,189 And there are interesting papers written about this. 1401 01:08:27,189 --> 01:08:30,220 But it's only on very-- the row and column keys 1402 01:08:30,220 --> 01:08:34,542 need to be the same and stuff and so, you know. 1403 01:08:34,542 --> 01:08:37,125 One thing I really would like to explore is the pseudoinverse. 1404 01:08:37,125 --> 01:08:41,819 So the pseudoinverse A plus satisfies these properties. 1405 01:08:41,819 --> 01:08:43,649 And I actually think that we will probably 1406 01:08:43,649 --> 01:08:46,226 be in pretty good shape for pseudoinverse. 1407 01:08:46,226 --> 01:08:47,600 And the pseudoinverse is what you 1408 01:08:47,600 --> 01:08:49,300 need to solve the least-squares problem. 1409 01:08:49,300 --> 01:08:50,970 And I think solving the least-squares problem is 1410 01:08:50,970 --> 01:08:52,720 actually something we might really be interested in doing 1411 01:08:52,720 --> 01:08:54,020 in some of our problems. 1412 01:08:54,020 --> 01:08:56,210 So I do need to work this out if people would 1413 01:08:56,210 --> 01:08:59,664 like to explore this with me. 1414 01:08:59,664 --> 01:09:02,330 We have a whole set of theorems that we'd like to prove.
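The pseudoinverse properties mentioned above are on a slide that the transcript does not reproduce. For ordinary real matrices, the standard Moore-Penrose conditions are the four below. Presumably these are the properties meant, and whether they carry over to associative arrays without inverses is the open question raised in the lecture.

```latex
\begin{align}
A A^{+} A &= A \\
A^{+} A A^{+} &= A^{+} \\
(A A^{+})^{\mathsf{T}} &= A A^{+} \\
(A^{+} A)^{\mathsf{T}} &= A^{+} A
\end{align}
```

For real matrices, $x = A^{+} b$ is the minimum-norm solution of the least-squares problem $\min_x \lVert A x - b \rVert_2$, which is the connection to least squares made above.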
1415 01:09:02,330 --> 01:09:04,410 Spanning theorems, linear dependence, identities, 1416 01:09:04,410 --> 01:09:07,600 inverses, determinants, pseudoinverses, eigenvectors, 1417 01:09:07,600 --> 01:09:11,649 convolutions, for which of these do these apply? 1418 01:09:11,649 --> 01:09:14,309 A lot of good math that could be done here. 1419 01:09:14,309 --> 01:09:16,850 Call to arms for those people who are interested in this type 1420 01:09:16,850 --> 01:09:18,500 of thing. 1421 01:09:18,500 --> 01:09:20,565 So just to summarize, you know, the algebra 1422 01:09:20,565 --> 01:09:21,940 of associative arrays provides us 1423 01:09:21,940 --> 01:09:23,590 this mathematical foundation. 1424 01:09:23,590 --> 01:09:25,240 I think I've tried to show you the core parts that 1425 01:09:25,240 --> 01:09:27,744 are really solid and expand that to the points here. 1426 01:09:27,744 --> 01:09:29,660 You can see where it is, while we don't really 1427 01:09:29,660 --> 01:09:31,660 know exactly what should be happening here, 1428 01:09:31,660 --> 01:09:37,359 and give you a little bit of the logic behind how we do this. 1429 01:09:37,359 --> 01:09:38,870 A small number of assumptions really 1430 01:09:38,870 --> 01:09:41,569 yields a rich mathematical environment. 1431 01:09:41,569 --> 01:09:44,459 And so I have a short code example. 1432 01:09:47,286 --> 01:09:49,740 It's not really teaching you anything new. 1433 01:09:49,740 --> 01:09:54,490 It's just to show you that I tested all these properties 1434 01:09:54,490 --> 01:09:59,110 using D4M, which was nice and really in a very kind 1435 01:09:59,110 --> 01:10:00,936 of spreadsheet kind of style. 1436 01:10:00,936 --> 01:10:02,560 And so I'm just going to show you that. 1437 01:10:02,560 --> 01:10:03,893 There's really just one example. 1438 01:10:03,893 --> 01:10:07,590 It takes a few seconds to run.
1439 01:10:07,590 --> 01:10:09,760 And then the assignments-- so these 1440 01:10:09,760 --> 01:10:21,590 are in this part in the directory-- 1441 01:10:21,590 --> 01:10:24,160 And then the assignment, should you so do it-- 1442 01:10:24,160 --> 01:10:25,980 if you didn't do the last assignment, well, 1443 01:10:25,980 --> 01:10:28,800 you're going to need to do that to now do this assignment. 1444 01:10:28,800 --> 01:10:31,570 So basically, for those of you who did the last assignment-- 1445 01:10:31,570 --> 01:10:36,510 array of a drawing and looking at the edges 1446 01:10:36,510 --> 01:10:37,380 and stuff like that. 1447 01:10:37,380 --> 01:10:40,540 Now, I want you to think about that associative array. 1448 01:10:40,540 --> 01:10:43,090 And think about which of these kinds of operations 1449 01:10:43,090 --> 01:10:46,470 would make sense if you were to try and add them or multiply 1450 01:10:46,470 --> 01:10:48,026 them or whatever. 1451 01:10:48,026 --> 01:10:50,550 Just explore that a little bit. 1452 01:10:50,550 --> 01:10:53,710 And just write up, OK, I think these kind of operations 1453 01:10:53,710 --> 01:10:54,640 would make sense here. 1454 01:10:54,640 --> 01:10:58,180 You know, addition would make sense if it's union 1455 01:10:58,180 --> 01:11:02,900 and the collision function is this or something like that. 1456 01:11:02,900 --> 01:11:04,900 You'll have to think about what your values are. 1457 01:11:04,900 --> 01:11:09,210 It could just be your values are just 1, something like that. 1458 01:11:09,210 --> 01:11:12,375 You might be like, oh, my values are 0, 1 and I want 1 plus 1 1459 01:11:12,375 --> 01:11:13,190 to equal 0. 1460 01:11:13,190 --> 01:11:16,300 So you might have an "or" or "XOR" operation or something 1461 01:11:16,300 --> 01:11:16,800 like that. 1462 01:11:16,800 --> 01:11:20,090 So just sort of think about what your example was. 1463 01:11:20,090 --> 01:11:21,940 And think about these ideas a little bit.
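As one way to start on the assignment, addition as a union of keys with a pluggable collision function can be sketched as follows. The edge lists `E1` and `E2` and the helper `assoc_add` are made up for illustration and are not part of D4M. Keys are (start vertex, end vertex) pairs and values are 0 or 1.

```python
import operator

def assoc_add(A, B, collide):
    """Union of the keys; where both arrays store a value, resolve it with collide."""
    C = dict(A)
    for key, val in B.items():
        C[key] = collide(C[key], val) if key in C else val
    return C

# Two hypothetical edge sets from a drawing, with value 1 marking "edge present".
E1 = {("a", "b"): 1, ("b", "c"): 1}
E2 = {("b", "c"): 1, ("c", "d"): 1}

or_sum = assoc_add(E1, E2, operator.or_)    # 1 "plus" 1 stays 1
xor_sum = assoc_add(E1, E2, operator.xor)   # 1 "plus" 1 becomes 0
print(or_sum)
print(xor_sum)
```

This sketch keeps the explicit 0 that XOR produces. Whether such an entry should stay stored or be treated as empty is one of the choices the assignment asks you to think about.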
1464 01:11:21,940 --> 01:11:24,410 And just write a few sentences on kind of what that means. 1465 01:11:26,990 --> 01:11:29,690 The last slide here is-- we don't really get this question. 1466 01:11:29,690 --> 01:11:31,730 But if you want to compare the difference 1467 01:11:31,730 --> 01:11:34,730 between associative arrays and the algebra defined 1468 01:11:34,730 --> 01:11:37,334 by Codd that sort of is the basis of SQL, 1469 01:11:37,334 --> 01:11:39,000 there's this little table that describes 1470 01:11:39,000 --> 01:11:41,066 some of the differences there.