1 00:00:00,000 --> 00:00:00,030 2 00:00:00,030 --> 00:00:02,420 The following content is provided under a Creative 3 00:00:02,420 --> 00:00:03,840 Commons license. 4 00:00:03,840 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue to 5 00:00:06,860 --> 00:00:10,560 offer high quality educational resources for free. 6 00:00:10,560 --> 00:00:13,420 To make a donation, or view additional materials from 7 00:00:13,420 --> 00:00:17,520 hundreds of MIT courses, visit MIT OpenCourseWare at 8 00:00:17,520 --> 00:00:18,770 ocw.mit.edu. 9 00:00:18,770 --> 00:00:21,690 10 00:00:21,690 --> 00:00:25,800 PROFESSOR: Mike Acton today who has done a lot of 11 00:00:25,800 --> 00:00:27,320 programming on cell and also done a lot of game 12 00:00:27,320 --> 00:00:28,480 development. 13 00:00:28,480 --> 00:00:32,270 He came from California just like this today and 14 00:00:32,270 --> 00:00:33,560 [OBSCURED]. 15 00:00:33,560 --> 00:00:34,548 MIKE ACTON: Yeah, it's really cool. 16 00:00:34,548 --> 00:00:36,580 I'll tell you. 17 00:00:36,580 --> 00:00:40,790 PROFESSOR: He's going to talk about what it's really like to 18 00:00:40,790 --> 00:00:44,880 use cell and PS3 and what it's like to program games. 19 00:00:44,880 --> 00:00:47,980 So I think it's going to be a fun lecture. 20 00:00:47,980 --> 00:00:52,130 MIKE ACTON: All right, so anyway, I'm the Engine 21 00:00:52,130 --> 00:00:54,430 Director at Insomniac Games. 22 00:00:54,430 --> 00:00:58,890 I've only recently taken that position, previously I was 23 00:00:58,890 --> 00:01:02,530 working on PS3 technology at Highmoon studios, which is 24 00:01:02,530 --> 00:01:06,100 with vin studios. 25 00:01:06,100 --> 00:01:08,350 And I've worked at Sony. 26 00:01:08,350 --> 00:01:18,550 I've worked for Titus, and Bluesky Studios in San Diego. 27 00:01:18,550 --> 00:01:23,670 And I've been doing game development, 11, 12 years. 28 00:01:23,670 --> 00:01:26,750 Before that I was working in simulation. 
29 00:01:26,750 --> 00:01:33,200 So, the PlayStation 3 is a really fun platform. 30 00:01:33,200 --> 00:01:36,900 And I know you guys have been working on cell development. 31 00:01:36,900 --> 00:01:39,220 Working with the PS3 under Linux. 32 00:01:39,220 --> 00:01:41,750 Working as developers for the PS3 is definitely a different 33 00:01:41,750 --> 00:01:44,340 environment from that. 34 00:01:44,340 --> 00:01:46,430 I think I'm going to concentrate more on the 35 00:01:46,430 --> 00:01:50,850 high-level aspects of how you design a game for the cell. 36 00:01:50,850 --> 00:01:54,360 And how the cell would impact the design, and what are the 37 00:01:54,360 --> 00:01:56,960 elements of the game. 38 00:01:56,960 --> 00:02:00,800 Just stuff that you probably haven't had as part of this 39 00:02:00,800 --> 00:02:02,950 course that you might find interesting. 40 00:02:02,950 --> 00:02:05,530 And you can feel free to interrupt me at any time with 41 00:02:05,530 --> 00:02:07,640 questions or whatever you'd like. 42 00:02:07,640 --> 00:02:12,820 43 00:02:12,820 --> 00:02:16,130 So just, I wanted to go over, briefly, some of the different 44 00:02:16,130 --> 00:02:19,370 types of game development and what the trade-offs for each 45 00:02:19,370 --> 00:02:20,700 one of them are. 46 00:02:20,700 --> 00:02:24,380 Casual games, console games, PC games, blah, blah, blah. 47 00:02:24,380 --> 00:02:27,763 Casual games, basically, are the small, simple games that 48 00:02:27,763 --> 00:02:30,485 you would download on the PC, or you would 49 00:02:30,485 --> 00:02:31,750 see on Yahoo or whatever. 50 00:02:31,750 --> 00:02:34,150 And those generally don't have really strict performance 51 00:02:34,150 --> 00:02:34,970 requirements. 52 00:02:34,970 --> 00:02:38,780 Where a console game, we have this particular advantage of 53 00:02:38,780 --> 00:02:41,005 knowing the hardware and the hardware doesn't change for an 54 00:02:41,005 --> 00:02:41,870 entire cycle. 
55 00:02:41,870 --> 00:02:44,100 So for five, six years, we have 56 00:02:44,100 --> 00:02:46,320 exactly the same hardware. 57 00:02:46,320 --> 00:02:48,750 And that's definitely an advantage from a performance 58 00:02:48,750 --> 00:02:50,350 point anyway. 59 00:02:50,350 --> 00:02:52,250 In this case, it's PlayStation 3. 60 00:02:52,250 --> 00:02:57,240 61 00:02:57,240 --> 00:02:58,750 As far as development priorities, development 62 00:02:58,750 --> 00:03:01,610 priorities for a console game-- and especially a PS3 63 00:03:01,610 --> 00:03:03,810 game-- development would be completely different than you 64 00:03:03,810 --> 00:03:07,480 might find on another kind of application. 65 00:03:07,480 --> 00:03:11,890 We don't really consider the code itself important at all. 66 00:03:11,890 --> 00:03:13,790 The real value is in the programmers. 67 00:03:13,790 --> 00:03:15,950 The real value is in the experience, 68 00:03:15,950 --> 00:03:18,600 and is in those skills. 69 00:03:18,600 --> 00:03:20,690 Code is disposable. 70 00:03:20,690 --> 00:03:24,020 After six years, when we start a new platform we pretty much 71 00:03:24,020 --> 00:03:27,780 have to rewrite it anyway, so there's not much point in 72 00:03:27,780 --> 00:03:31,190 trying to plan for a long life span of code. 73 00:03:31,190 --> 00:03:33,090 Especially when you have optimized code written in 74 00:03:33,090 --> 00:03:34,920 assembly for a particular platform. 75 00:03:34,920 --> 00:03:38,500 76 00:03:38,500 --> 00:03:42,646 And to that end, the data is way more significant to the 77 00:03:42,646 --> 00:03:43,850 performance than the code, anyway. 78 00:03:43,850 --> 00:03:47,500 And the data is specific to a particular game. 79 00:03:47,500 --> 00:03:50,450 Or specific to a particular type of game. 80 00:03:50,450 --> 00:03:56,280 And certainly specific to a studio's pipeline. 
81 00:03:56,280 --> 00:03:58,670 And it's the design of the data where you really want to 82 00:03:58,670 --> 00:04:00,870 spend your time concentrating, especially for the PS3. 83 00:04:00,870 --> 00:04:03,430 84 00:04:03,430 --> 00:04:05,795 Ease of programming-- whether or not it's easier to do 85 00:04:05,795 --> 00:04:08,150 parallelism is not a major concern at all. 86 00:04:08,150 --> 00:04:09,680 If it's hard, so what? 87 00:04:09,680 --> 00:04:10,680 You do it. 88 00:04:10,680 --> 00:04:12,420 That's it. 89 00:04:12,420 --> 00:04:14,580 Portability, runs on PlayStation 3, doesn't run 90 00:04:14,580 --> 00:04:15,280 anywhere else. 91 00:04:15,280 --> 00:04:17,700 That's a non-concern. 92 00:04:17,700 --> 00:04:19,310 And everything is about performance. 93 00:04:19,310 --> 00:04:20,440 Everything we do. 94 00:04:20,440 --> 00:04:23,330 A vast majority of our code is either hand-optimized C, or 95 00:04:23,330 --> 00:04:26,380 assembly, very little high level code. 96 00:04:26,380 --> 00:04:28,220 Some of our gameplay programmers will write C plus 97 00:04:28,220 --> 00:04:33,750 plus for the high level logic, but as a general, most of the 98 00:04:33,750 --> 00:04:36,700 code that's running most of the time is definitely optimized. 99 00:04:36,700 --> 00:04:38,950 Yeah? 100 00:04:38,950 --> 00:04:43,060 AUDIENCE: If programming is a non-priority, does that mean 101 00:04:43,060 --> 00:04:45,060 to say that if you're developing more than one 102 00:04:45,060 --> 00:04:47,447 product or game, they don't share any common 103 00:04:47,447 --> 00:04:48,240 infrastructure or need? 104 00:04:48,240 --> 00:04:49,300 MIKE ACTON: No, that's not necessarily true. 105 00:04:49,300 --> 00:04:53,440 If we have games that share similar needs, they can 106 00:04:53,440 --> 00:04:55,140 definitely share similar code. 
107 00:04:55,140 --> 00:04:58,850 I mean, the point I'm trying to make is, let's say in order 108 00:04:58,850 --> 00:05:02,640 to make something fast it has to be complicated. 109 00:05:02,640 --> 00:05:05,990 So be it, it's complicated. 110 00:05:05,990 --> 00:05:09,710 Whether or not it's easy to use for another programmer is 111 00:05:09,710 --> 00:05:11,260 not a major concern. 112 00:05:11,260 --> 00:05:14,060 AUDIENCE: So you wish it was easier? 113 00:05:14,060 --> 00:05:14,540 MIKE ACTION: No. 114 00:05:14,540 --> 00:05:16,600 I don't care. 115 00:05:16,600 --> 00:05:18,780 That's my point. 116 00:05:18,780 --> 00:05:20,259 AUDIENCE: Well, it's not as important as performance, but 117 00:05:20,259 --> 00:05:22,360 if someone came to you with a high performance tool, you 118 00:05:22,360 --> 00:05:23,330 would like to use it? 119 00:05:23,330 --> 00:05:24,580 MIKE ACTION: I doubt they could. 120 00:05:24,580 --> 00:05:26,840 121 00:05:26,840 --> 00:05:29,460 The highest performance tool that exists is the brains of 122 00:05:29,460 --> 00:05:31,150 the programmers on our team. 123 00:05:31,150 --> 00:05:33,220 You can not create-- 124 00:05:33,220 --> 00:05:35,040 it's theoretically impossible. 125 00:05:35,040 --> 00:05:39,570 You can not out perform people who are customizing for the 126 00:05:39,570 --> 00:05:42,520 data, for the context for the game. 127 00:05:42,520 --> 00:05:46,730 It is not even remotely theoretically possible. 128 00:05:46,730 --> 00:05:48,620 AUDIENCE: That didn't come out in assembly programming for 129 00:05:48,620 --> 00:05:53,550 general purpose but we'll take this offline? 130 00:05:53,550 --> 00:05:54,790 And there was a day when that was also true for general 131 00:05:54,790 --> 00:05:57,950 preferred cleary at the time, but it's no longer true. 
132 00:05:57,950 --> 00:05:58,120 MIKE ACTON: It is absolutely-- 133 00:05:58,120 --> 00:05:59,050 AUDIENCE: So the average person prefers to go on -- 134 00:05:59,050 --> 00:06:00,300 take it offline. 135 00:06:00,300 --> 00:06:01,890 136 00:06:01,890 --> 00:06:02,800 MIKE ACTON: Average person. 137 00:06:02,800 --> 00:06:03,860 We're not the average people. 138 00:06:03,860 --> 00:06:05,720 We're game programmers. 139 00:06:05,720 --> 00:06:06,030 Yeah? 140 00:06:06,030 --> 00:06:08,580 AUDIENCE: So does cost ever become an issue? 141 00:06:08,580 --> 00:06:08,870 I mean-- 142 00:06:08,870 --> 00:06:10,640 MIKE ACTON: Absolutely, cost does become an issue. 143 00:06:10,640 --> 00:06:15,210 At a certain point, something is so difficult that you 144 00:06:15,210 --> 00:06:18,350 either have to throw up your hands or you 145 00:06:18,350 --> 00:06:19,060 can't finish in time. 146 00:06:19,060 --> 00:06:20,900 AUDIENCE: Do you ever hit that point? 147 00:06:20,900 --> 00:06:22,720 MIKE ACTON: Or you figure out a new way of doing it. 148 00:06:22,720 --> 00:06:24,120 Or do a little bit less. 149 00:06:24,120 --> 00:06:25,350 I mean we do have to prioritize 150 00:06:25,350 --> 00:06:27,060 what you want to do. 151 00:06:27,060 --> 00:06:28,750 At the end of the day you can't do everything you want 152 00:06:28,750 --> 00:06:30,990 to do, and you have another game you need to ship 153 00:06:30,990 --> 00:06:32,520 eventually, anyway. 154 00:06:32,520 --> 00:06:34,550 So, a lot of times you do end up tabling things. 155 00:06:34,550 --> 00:06:38,300 And say, look we can get 50% more performance out of this, 156 00:06:38,300 --> 00:06:40,650 but we're going to have to table that for now and scale 157 00:06:40,650 --> 00:06:42,190 back on the content. 158 00:06:42,190 --> 00:06:44,340 And that's why you have six years of development. 
159 00:06:44,340 --> 00:06:47,270 You know, maybe in the next cycle, in the next game, 160 00:06:47,270 --> 00:06:48,520 you'll be able to squeeze out a little bit more. 161 00:06:48,520 --> 00:06:50,370 And the next one you squeeze out a little bit more. 162 00:06:50,370 --> 00:06:52,560 That's sort of this continuous development, and continuous 163 00:06:52,560 --> 00:06:57,740 optimization over the course of a platform. 164 00:06:57,740 --> 00:07:00,750 And sometimes, yeah, I mean occasionally you just say 165 00:07:00,750 --> 00:07:03,720 yeah, we can't do it or whatever, it doesn't work. 166 00:07:03,720 --> 00:07:07,290 I mean, that's part and parcel of development in general. 167 00:07:07,290 --> 00:07:08,900 Some ideas just don't pan out. 168 00:07:08,900 --> 00:07:11,890 169 00:07:11,890 --> 00:07:12,200 But-- 170 00:07:12,200 --> 00:07:14,961 AUDIENCE: Have you ever come into a situation where 171 00:07:14,961 --> 00:07:17,283 programming conflicts just kills a project? 172 00:07:17,283 --> 00:07:20,137 Like Microsoft had had a few times, like they couldn't put 173 00:07:20,137 --> 00:07:20,620 out [OBSCURED]. 174 00:07:20,620 --> 00:07:22,930 Couldn't release for-- 175 00:07:22,930 --> 00:07:24,160 MIKE ACTION: Sure, there's plenty of studios where the 176 00:07:24,160 --> 00:07:26,090 programming complexity has killed the studio, or killed 177 00:07:26,090 --> 00:07:27,180 the project. 178 00:07:27,180 --> 00:07:29,870 But I find it hard to believe-- or it's very 179 00:07:29,870 --> 00:07:32,730 rarely-- because it's complexity that has to do 180 00:07:32,730 --> 00:07:34,860 specifically with optimization. 181 00:07:34,860 --> 00:07:38,160 That complexity usually has to do with unnecessary 182 00:07:38,160 --> 00:07:38,740 complexity. 183 00:07:38,740 --> 00:07:41,410 Complexity that doesn't achieve anything. 184 00:07:41,410 --> 00:07:43,500 Organization for the sake of organization. 
185 00:07:43,500 --> 00:07:46,740 So you have these sort of over designed C plus plus 186 00:07:46,740 --> 00:07:51,870 hierarchies just for the sake of over organizing things. 187 00:07:51,870 --> 00:07:53,650 That's what will generally kill a project. 188 00:07:53,650 --> 00:07:57,760 But in performance, the complexity tends to come from 189 00:07:57,760 --> 00:08:00,010 the rule set-- what you need to do to set it up. 190 00:08:00,010 --> 00:08:03,700 But the code tends to be smaller when it's faster. 191 00:08:03,700 --> 00:08:05,810 You tend to be doing one thing and doing one 192 00:08:05,810 --> 00:08:07,390 thing really well. 193 00:08:07,390 --> 00:08:08,980 So it doesn't tend to get out of hand. 194 00:08:08,980 --> 00:08:12,750 I mean, it occasionally happens but, yeah? 195 00:08:12,750 --> 00:08:16,395 AUDIENCE: So in terms of the overall cost, how big is this 196 00:08:16,395 --> 00:08:18,020 programming versus the other aspect of 197 00:08:18,020 --> 00:08:19,040 coming up with the game? 198 00:08:19,040 --> 00:08:23,220 Like the game design, the graphics-- 199 00:08:23,220 --> 00:08:26,520 AUDIENCE: So, for example, do you have-- 200 00:08:26,520 --> 00:08:27,990 MIKE ACTION: OK, development team? 201 00:08:27,990 --> 00:08:30,830 202 00:08:30,830 --> 00:08:31,450 So-- 203 00:08:31,450 --> 00:08:33,612 AUDIENCE: So how many programmers, how many 204 00:08:33,612 --> 00:08:34,020 artists, how many-- 205 00:08:34,020 --> 00:08:37,620 PROFESSOR: Maybe, let's-- so for example, like, now it's 206 00:08:37,620 --> 00:08:39,720 like, what, $20 million to deliver a PS3 game? 207 00:08:39,720 --> 00:08:42,476 MIKE ACTION: Between $10 and $20 million, yeah. 208 00:08:42,476 --> 00:08:44,800 PROFESSOR: So let's develop [OBSCURED] 209 00:08:44,800 --> 00:08:48,630 MIKE ACTION: So artists are by far the most-- 210 00:08:48,630 --> 00:08:51,550 the largest group of developers. 
211 00:08:51,550 --> 00:08:53,680 So you have animators and shader artists, and texture 212 00:08:53,680 --> 00:08:55,000 artists, and modelers, and environmental 213 00:08:55,000 --> 00:08:57,110 artists, and lighters. 214 00:08:57,110 --> 00:09:02,630 And so they'll often outnumber programmers 2:1. 215 00:09:02,630 --> 00:09:05,580 Which is completely different than-- certainly very 216 00:09:05,580 --> 00:09:09,480 different from PlayStation and the gap is much larger than it 217 00:09:09,480 --> 00:09:11,750 was on PlayStation 2. 218 00:09:11,750 --> 00:09:16,660 With programmers you tend to have a fairly even split or 219 00:09:16,660 --> 00:09:18,570 you tend to have a divide between the high level game 220 00:09:18,570 --> 00:09:21,450 play programmers and the low level engine programmers. 221 00:09:21,450 --> 00:09:23,700 And you will tend to have more game play programmers than 222 00:09:23,700 --> 00:09:26,390 engine programmers, although most-- the majority of the CPU 223 00:09:26,390 --> 00:09:29,880 time is spent in the engine code. 224 00:09:29,880 --> 00:09:35,030 And that partially comes down to education and experience. 225 00:09:35,030 --> 00:09:39,470 In order to get high performance code you need to 226 00:09:39,470 --> 00:09:40,690 have that experience. 227 00:09:40,690 --> 00:09:41,970 You need to know how to optimize. 228 00:09:41,970 --> 00:09:43,420 You need to understand the machine. 229 00:09:43,420 --> 00:09:44,840 You need to understand the architecture and you need to 230 00:09:44,840 --> 00:09:46,990 understand the data. 231 00:09:46,990 --> 00:09:50,050 And there's only so many people that can do that on any 232 00:09:50,050 --> 00:09:50,760 particular team. 233 00:09:50,760 --> 00:09:56,401 AUDIENCE: Code size wise, How is the code size divided 234 00:09:56,401 --> 00:09:59,120 between game playing and AI, special effects? 235 00:09:59,120 --> 00:10:01,410 MIKE ACTON: Just like, the amount of code? 
236 00:10:01,410 --> 00:10:03,870 AUDIENCE: Yeah, it should be small I guess. 237 00:10:03,870 --> 00:10:05,050 MIKE ACTION: Yeah, I mean, it's hard to say. 238 00:10:05,050 --> 00:10:07,420 I mean, because it depends on how many 239 00:10:07,420 --> 00:10:08,720 features you're using. 240 00:10:08,720 --> 00:10:11,710 And, you know sort of the scope of the engine is how 241 00:10:11,710 --> 00:10:13,780 much is being used for a particular game, especially if 242 00:10:13,780 --> 00:10:18,570 you're targeting multiple games within a studio. 243 00:10:18,570 --> 00:10:20,460 But quite often-- interestingly enough-- the 244 00:10:20,460 --> 00:10:23,210 game play code actually overwhelms the engine code in 245 00:10:23,210 --> 00:10:27,590 terms of size and that is back to basically what I was saying 246 00:10:27,590 --> 00:10:31,200 that the engine code tends to do one thing really well or a 247 00:10:31,200 --> 00:10:32,240 series of things really well. 248 00:10:32,240 --> 00:10:34,790 AUDIENCE: Game play code also C plus plus? 249 00:10:34,790 --> 00:10:36,745 MIKE ACTON: These days it's much more likely that game 250 00:10:36,745 --> 00:10:40,220 play code is C plus plus in the high level and kills 251 00:10:40,220 --> 00:10:45,290 performance and doesn't think about things like cache. 252 00:10:45,290 --> 00:10:47,870 253 00:10:47,870 --> 00:10:52,060 That's actually part of the problem with PlayStation 3 254 00:10:52,060 --> 00:10:52,540 development. 255 00:10:52,540 --> 00:10:55,210 It was part of the challenge that we've had with 256 00:10:55,210 --> 00:10:56,790 PlayStation 3 development. 257 00:10:56,790 --> 00:10:59,970 Is in the past, certainly with PlayStation 2 and definitely 258 00:10:59,970 --> 00:11:03,850 on any previous console, this divide between game play and 259 00:11:03,850 --> 00:11:05,600 engine worked very well. 
260 00:11:05,600 --> 00:11:08,120 261 00:11:08,120 --> 00:11:10,400 The game play programmers could just call a function and 262 00:11:10,400 --> 00:11:13,200 it did its thing really fast and it came back and they 263 00:11:13,200 --> 00:11:17,550 continued. In a serial program on one processor that 264 00:11:17,550 --> 00:11:19,390 model works very well. 265 00:11:19,390 --> 00:11:25,270 But now the high level design can destroy performance 266 00:11:25,270 --> 00:11:27,650 through the simplest decision. Like for example, in 267 00:11:27,650 --> 00:11:33,420 collision detection if the logic assumes that the result 268 00:11:33,420 --> 00:11:36,900 is immediately available there's virtually no way of 269 00:11:36,900 --> 00:11:39,440 making that fast. So the high-level design has to 270 00:11:39,440 --> 00:11:43,010 conform to the hardware. 271 00:11:43,010 --> 00:11:45,660 That's sort of a challenge now, is introducing those 272 00:11:45,660 --> 00:11:48,440 concepts to the high-level programmers who haven't 273 00:11:48,440 --> 00:11:49,690 traditionally had to deal with it. 274 00:11:49,690 --> 00:11:52,050 275 00:11:52,050 --> 00:11:56,430 Does that answer that question as far as the split? 276 00:11:56,430 --> 00:11:58,610 AUDIENCE: You said 2:1, right? 277 00:11:58,610 --> 00:12:02,830 MIKE ACTON: Approximately 2:1, artists to programmers. 278 00:12:02,830 --> 00:12:07,390 It varies studio to studio and team to team, so it's hard to 279 00:12:07,390 --> 00:12:08,660 say in the industry as a whole. 280 00:12:08,660 --> 00:12:16,900 281 00:12:16,900 --> 00:12:20,230 So back basically to the point of the code 282 00:12:20,230 --> 00:12:21,610 isn't really important. 283 00:12:21,610 --> 00:12:26,250 The code itself doesn't have a lot of value. 284 00:12:26,250 --> 00:12:27,960 There are fundamental things that affect how you would 285 00:12:27,960 --> 00:12:29,380 design it in the first place. 
286 00:12:29,380 --> 00:12:32,260 The type of game, the kind of engine that would run a racing 287 00:12:32,260 --> 00:12:34,320 game is completely different than the kind of engine that 288 00:12:34,320 --> 00:12:36,430 would run a first person shooter. 289 00:12:36,430 --> 00:12:39,240 The needs are different, the optimizations are totally 290 00:12:39,240 --> 00:12:42,670 different, the data is totally different, so you wouldn't try 291 00:12:42,670 --> 00:12:44,940 to reuse code from one to the other. 292 00:12:44,940 --> 00:12:47,350 It just either wouldn't work or would work 293 00:12:47,350 --> 00:12:49,020 really, really poorly. 294 00:12:49,020 --> 00:12:50,470 The framerate-- 295 00:12:50,470 --> 00:12:52,840 having a target of 30 frames per second is a much different 296 00:12:52,840 --> 00:12:55,450 problem than having a target of 60 frames per second. 297 00:12:55,450 --> 00:12:58,710 And in the NTSC territories those are pretty much your 298 00:12:58,710 --> 00:13:02,830 only two choices: 30 frames or 60, which means everything has 299 00:13:02,830 --> 00:13:06,430 to be done in 16 and 2/3 milliseconds. 300 00:13:06,430 --> 00:13:07,510 That's it, that's what you have-- 301 00:13:07,510 --> 00:13:12,220 or 33 milliseconds. 302 00:13:12,220 --> 00:13:16,830 Of course, back to schedule and cost, how much? 303 00:13:16,830 --> 00:13:19,580 You know, do you have a two year cycle, one year cycle, 304 00:13:19,580 --> 00:13:21,820 how much can you get done? 305 00:13:21,820 --> 00:13:23,200 The kind of hardware. 306 00:13:23,200 --> 00:13:26,620 So taking for example, an engine from PlayStation 2 and 307 00:13:26,620 --> 00:13:29,990 trying to move it to PlayStation 3 is sort of a 308 00:13:29,990 --> 00:13:31,240 lost cause. 
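The frame-time figures quoted above follow directly from the target frame rate. A minimal sketch of that arithmetic, with a hypothetical helper name (`frame_budget_ms` is not from the talk):

```c
#include <assert.h>

/* Milliseconds available to do all per-frame work at a given target rate.
   frame_budget_ms is an illustrative name, not an API from any SDK. */
double frame_budget_ms(double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}
```

At 60 frames per second this gives 16 and 2/3 milliseconds, and at 30 frames per second it gives 33 and 1/3 milliseconds.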
309 00:13:31,240 --> 00:13:36,730 310 00:13:36,730 --> 00:13:38,870 The kind of optimizations that you would do, the kind of 311 00:13:38,870 --> 00:13:43,400 parallelization you would do is so completely different, 312 00:13:43,400 --> 00:13:46,750 although there was parallelization in PlayStation 313 00:13:46,750 --> 00:13:49,350 2, the choices would have been completely different. 314 00:13:49,350 --> 00:13:53,960 315 00:13:53,960 --> 00:13:57,510 The loss from trying to port it is much, much greater than 316 00:13:57,510 --> 00:13:59,560 the cost of just doing it again. 317 00:13:59,560 --> 00:14:06,290 AUDIENCE: [OBSCURED] 318 00:14:06,290 --> 00:14:07,970 MIKE ACTON: I don't know that there's an average. 319 00:14:07,970 --> 00:14:11,970 I mean, if you wanted to just like homogenize the industry, 320 00:14:11,970 --> 00:14:17,390 it's probably 18 months. 321 00:14:17,390 --> 00:14:20,450 The compiler actually makes a huge, significant difference 322 00:14:20,450 --> 00:14:25,040 in how you design your code. 323 00:14:25,040 --> 00:14:27,690 If you're working with GCC and you have programmers who have 324 00:14:27,690 --> 00:14:31,900 been working with GCC for 15 years and who understand the 325 00:14:31,900 --> 00:14:37,020 intricacies and issues involved in GCC, the kind of 326 00:14:37,020 --> 00:14:38,660 code you would write would be completely different than if 327 00:14:38,660 --> 00:14:42,440 you were using XLC for example, on the cell. 328 00:14:42,440 --> 00:14:46,670 329 00:14:46,670 --> 00:14:48,130 There are studios-- 330 00:14:48,130 --> 00:14:50,240 Insomniac doesn't, but there are other studios who do cross 331 00:14:50,240 --> 00:14:50,940 platform design. 332 00:14:50,940 --> 00:14:55,480 So for example, write PlayStation 3 games and Xbox 333 00:14:55,480 --> 00:14:59,260 360 games and/or PC titles. 
334 00:14:59,260 --> 00:15:03,460 At the moment, probably the easiest approach for that is 335 00:15:03,460 --> 00:15:07,920 to target the PlayStation 3. 336 00:15:07,920 --> 00:15:11,660 So you have these sort of SPU-friendly chunks of processing, 337 00:15:11,660 --> 00:15:15,540 SPU-friendly chunks of data and move those onto 338 00:15:15,540 --> 00:15:18,910 homogenous parallel processors. 339 00:15:18,910 --> 00:15:21,300 It's not the perfect solution, but virtually all cross 340 00:15:21,300 --> 00:15:23,150 platform titles are not looking for the perfect 341 00:15:23,150 --> 00:15:26,020 solution anyway because they cannot fully optimize for any 342 00:15:26,020 --> 00:15:27,270 particular platform. 343 00:15:27,270 --> 00:15:32,370 344 00:15:32,370 --> 00:15:33,250 I wanted to go through-- 345 00:15:33,250 --> 00:15:38,130 these are a basic list of some of the major modules that a 346 00:15:38,130 --> 00:15:39,610 game is made out of. 347 00:15:39,610 --> 00:15:42,510 348 00:15:42,510 --> 00:15:45,950 I'll go through some of these and explain how designing on 349 00:15:45,950 --> 00:15:47,680 the cell impacts the system. 350 00:15:47,680 --> 00:15:52,260 I'm not going to bother reading them. 351 00:15:52,260 --> 00:15:53,700 I assume you all can read. 352 00:15:53,700 --> 00:15:57,950 353 00:15:57,950 --> 00:16:00,620 So yeah, I'm going to go over the major system, a few of the 354 00:16:00,620 --> 00:16:02,450 major systems and then we're going to dive a little bit 355 00:16:02,450 --> 00:16:05,790 into a specific system, in this case an animation system. 356 00:16:05,790 --> 00:16:11,400 And just talk it through, basically you see how each of 357 00:16:11,400 --> 00:16:14,090 these steps are affected by the hardware that we're 358 00:16:14,090 --> 00:16:17,470 running on it. 
359 00:16:17,470 --> 00:16:20,060 So just to start with when you're designing a structure, 360 00:16:20,060 --> 00:16:23,430 any structure, anywhere-- 361 00:16:23,430 --> 00:16:30,460 the initial structure is affected by the kind of 362 00:16:30,460 --> 00:16:32,370 hardware that you're running. 363 00:16:32,370 --> 00:16:37,080 And in this particular case on the SPU and there are other 364 00:16:37,080 --> 00:16:40,660 processors where this is equally true, but in this 365 00:16:40,660 --> 00:16:42,790 conventional structure where you say structure class or 366 00:16:42,790 --> 00:16:46,660 whatever and you have domain-constrained structures 367 00:16:46,660 --> 00:16:51,180 are of surprisingly little use. 368 00:16:51,180 --> 00:16:56,395 In general, the data is either compressed or is in a stream or 369 00:16:56,395 --> 00:16:58,500 is in blocks. 370 00:16:58,500 --> 00:17:01,910 It's sort of based on type, which means that there's no 371 00:17:01,910 --> 00:17:06,360 fixed size struct that you could define anyway. 372 00:17:06,360 --> 00:17:09,080 So as a general rule, the structure of the data is 373 00:17:09,080 --> 00:17:11,130 defined within the code as opposed 374 00:17:11,130 --> 00:17:13,860 to in a struct somewhere. 375 00:17:13,860 --> 00:17:17,540 And that's really to get the performance from the data, you 376 00:17:17,540 --> 00:17:20,590 group things of similar type together rather than for 377 00:17:20,590 --> 00:17:24,085 example, on SPU, having flags that say this is of type A and 378 00:17:24,085 --> 00:17:28,710 this is of type B. Any flag implies a branch, which is-- 379 00:17:28,710 --> 00:17:31,320 I'm sure you all know at this point-- is really poor 380 00:17:31,320 --> 00:17:33,330 performing on SPU. 381 00:17:33,330 --> 00:17:37,890 So basically, pull flags out, re-sort everything and then 382 00:17:37,890 --> 00:17:40,680 move things in streams. And all of these types are going 383 00:17:40,680 --> 00:17:43,570 to be of varying sizes. 
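The "pull flags out, re-sort everything, move things in streams" idea can be sketched in portable C. This is only an illustration under assumed names: `tagged_item`, `split_by_type`, `process_a` and `process_b` are hypothetical, the per-type work is a placeholder, and no SPU intrinsics or DMA are shown. The type test happens once while building the streams, so the processing loops carry no per-element flag check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mixed input: every record carries a type flag (0 = A, 1 = B). */
typedef struct { int type; float value; } tagged_item;

/* Split the mixed array into two homogeneous streams, once, up front.
   out_a and out_b must each have room for n values. */
void split_by_type(const tagged_item *in, size_t n,
                   float *out_a, size_t *n_a,
                   float *out_b, size_t *n_b)
{
    size_t a = 0, b = 0;
    assert(in && out_a && out_b && n_a && n_b);
    for (size_t i = 0; i < n; ++i) {
        if (in[i].type == 0) out_a[a++] = in[i].value;
        else                 out_b[b++] = in[i].value;
    }
    *n_a = a;
    *n_b = b;
}

/* Each stream is then processed by a loop with no type test inside it.
   The arithmetic here is placeholder work only. */
void process_a(float *s, size_t n) { for (size_t i = 0; i < n; ++i) s[i] *= 2.0f; }
void process_b(float *s, size_t n) { for (size_t i = 0; i < n; ++i) s[i] += 1.0f; }
```

In a real pipeline the split would more likely happen offline in the tools, so the game only ever loads the already-sorted streams; that is an assumption about practice, not something stated in the talk.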
384 00:17:43,570 --> 00:17:46,620 In which case there's very little point to define those 385 00:17:46,620 --> 00:17:48,570 structures in the first place because you can't change them. 386 00:17:48,570 --> 00:17:52,550 387 00:17:52,550 --> 00:17:55,000 And the fact that you're accessing data 388 00:17:55,000 --> 00:17:56,310 in quadwords anyway. 389 00:17:56,310 --> 00:17:58,530 You're always either loading and storing in quadwords, not 390 00:17:58,530 --> 00:18:01,940 on scalars, so having scalar fields in a structure is sort 391 00:18:01,940 --> 00:18:04,150 of pointless. 392 00:18:04,150 --> 00:18:06,300 So again, on the SPU, generally speaking, 393 00:18:06,300 --> 00:18:08,220 structures aren't of much use. 394 00:18:08,220 --> 00:18:14,690 395 00:18:14,690 --> 00:18:17,020 When you go to define structures in general you need 396 00:18:17,020 --> 00:18:24,910 to consider things like the cache, the TLB, how that's 397 00:18:24,910 --> 00:18:27,380 going to affect your reading out of the structure or 398 00:18:27,380 --> 00:18:29,670 writing to the structure. 399 00:18:29,670 --> 00:18:32,940 More to the point, you cannot just assume that if 400 00:18:32,940 --> 00:18:36,700 you've written some data definition that you can port 401 00:18:36,700 --> 00:18:38,340 it to another platform. 402 00:18:38,340 --> 00:18:40,490 It's very easy to be poorly 403 00:18:40,490 --> 00:18:43,300 performing platform to platform. 404 00:18:43,300 --> 00:18:45,980 In this case, when we design structures you have to 405 00:18:45,980 --> 00:18:48,760 consider the fundamental units of the cell. 406 00:18:48,760 --> 00:18:52,960 The cache line is a fundamental unit of the cell. 407 00:18:52,960 --> 00:18:55,560 Basically, you want to define things in terms of 408 00:18:55,560 --> 00:18:58,690 128 bytes wide. 
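As a rough illustration of treating the 128-byte cache line and the 16-byte quadword as the fundamental units, here is a C11 sketch. The type and function names (`quadword`, `line_block`, `blocks_needed`) are assumptions made for this example, and plain `float` arrays stand in for the hardware vector types.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* One 16-byte quadword: four 32-bit floats, aligned for quadword load/store. */
typedef struct { alignas(16) float f[4]; } quadword;

/* One 128-byte cache line worth of data: eight quadwords, line-aligned. */
typedef struct { alignas(128) quadword q[8]; } line_block;

/* Compile-time checks that the layout really matches the units named above. */
static_assert(sizeof(quadword) == 16, "quadword must be 16 bytes");
static_assert(sizeof(line_block) == 128, "line_block must be 128 bytes");
static_assert(alignof(line_block) == 128, "line_block must be line-aligned");

/* Number of 128-byte blocks needed to hold n floats (32 floats per block).
   The last block may be partly unused; the data is padded to whole lines. */
size_t blocks_needed(size_t n_floats)
{
    return (n_floats + 31) / 32;
}
```

Sizing everything in whole blocks wastes a little space at the end of each stream, which matches the trade-off the talk accepts later on.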
409 00:18:58,690 --> 00:19:01,830 What can you fit in there because you read one you read 410 00:19:01,830 --> 00:19:06,500 them all, so you want to pack as much as possible into 128 411 00:19:06,500 --> 00:19:11,290 bytes and just deal with that as a fundamental unit. 412 00:19:11,290 --> 00:19:14,270 16 bytes, of course, you're doing load and stores through 413 00:19:14,270 --> 00:19:17,300 quadword load and store. 414 00:19:17,300 --> 00:19:20,100 So you don't want to have little scalar bits in there 415 00:19:20,100 --> 00:19:21,080 that you're shuffling around. 416 00:19:21,080 --> 00:19:23,390 Just deal with it as a quadword. 417 00:19:23,390 --> 00:19:26,280 And don't deal with anything smaller than that. 418 00:19:26,280 --> 00:19:29,190 So basically the minimum working sizes, in practice, 419 00:19:29,190 --> 00:19:33,460 would be 4 by 128 bits wide and you can split that up 420 00:19:33,460 --> 00:19:36,030 regularly however you want. 421 00:19:36,030 --> 00:19:41,140 So to that point I think-- 422 00:19:41,140 --> 00:19:43,470 here's an example-- 423 00:19:43,470 --> 00:19:45,070 I want to talk about a vector class. 424 00:19:45,070 --> 00:19:49,400 A vector class is usually the first thing a programmer will 425 00:19:49,400 --> 00:19:54,050 jump onto when they might want to make something for games. 426 00:19:54,050 --> 00:19:57,670 But in real life, it's probably the most useless 427 00:19:57,670 --> 00:19:58,920 thing you could ever write. 428 00:19:58,920 --> 00:20:01,420 429 00:20:01,420 --> 00:20:04,150 It doesn't actually do anything. 430 00:20:04,150 --> 00:20:07,170 We have these, we know the instruction set, it's already 431 00:20:07,170 --> 00:20:07,950 in quadwords. 432 00:20:07,950 --> 00:20:09,990 We know the loads and stores, we've already designed your 433 00:20:09,990 --> 00:20:12,130 data so it fits properly. 434 00:20:12,130 --> 00:20:14,810 This doesn't give us anything. 
435 00:20:14,810 --> 00:20:16,830 And it potentially makes things worse. 436 00:20:16,830 --> 00:20:20,130 437 00:20:20,130 --> 00:20:23,360 Allowing component access to a quadword, especially on the 438 00:20:23,360 --> 00:20:28,250 PPU is ridiculously bad. 439 00:20:28,250 --> 00:20:31,600 In practice, if you allow component access, high-level 440 00:20:31,600 --> 00:20:34,480 programs will use component access. 441 00:20:34,480 --> 00:20:37,160 So if you have a vector class that says get x, get y, 442 00:20:37,160 --> 00:20:40,660 whatever, somebody somewhere is going to use it, which 443 00:20:40,660 --> 00:20:43,330 means the performance of the whole thing just drops and 444 00:20:43,330 --> 00:20:46,390 it's impossible to optimize. 445 00:20:46,390 --> 00:20:48,990 So as a general rule, you pick your fundamental unit. 446 00:20:48,990 --> 00:20:52,930 In this case, the 4 by 128 bit unit that I was talking about 447 00:20:52,930 --> 00:20:56,150 and you don't define anything smaller than that. 448 00:20:56,150 --> 00:20:58,960 Everything is packed into a unit about that size. 449 00:20:58,960 --> 00:21:03,510 And yes, in practice there'll be some wasted space at the 450 00:21:03,510 --> 00:21:07,950 beginning or end of streams of data, groups of data, but it 451 00:21:07,950 --> 00:21:10,420 doesn't make much difference. 452 00:21:10,420 --> 00:21:12,930 You're going to have that wasted space if you are-- 453 00:21:12,930 --> 00:21:14,840 you're going to have much more than that in wasted space if 454 00:21:14,840 --> 00:21:17,870 you're using dynamic memory, for example, which 455 00:21:17,870 --> 00:21:20,090 when I get to it-- 456 00:21:20,090 --> 00:21:21,340 I don't recommend you use either. 
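[Editor's sketch] The "4 by 128 bit" working unit and the 128-byte cache line mentioned above can be written down in portable C roughly as follows. This is only an illustration under stated assumptions: the type names (quadword, point_block4, cache_line_unit) and the translate helper are hypothetical, and plain float arrays stand in for the SPU's native quadword registers, which a real SPU build would use through intrinsics or assembly instead.

```c
#include <assert.h>
#include <stddef.h>

/* One quadword: 16 bytes, the size of an SPU load or store.
   Hypothetical stand-in for a native vector type. */
typedef struct {
    _Alignas(16) float lane[4];
} quadword;

/* Hypothetical working unit: 4 quadwords (4 x 128 bits) holding the
   x, y, z and w values of four points, one point per lane. */
typedef struct {
    quadword x, y, z, w;
} point_block4;

/* Two working units fill exactly one 128-byte cache line. */
typedef struct {
    _Alignas(128) point_block4 block[2];
} cache_line_unit;

_Static_assert(sizeof(quadword) == 16, "quadword must be 16 bytes");
_Static_assert(sizeof(point_block4) == 64, "unit must be 4 quadwords");
_Static_assert(sizeof(cache_line_unit) == 128, "must fill one cache line");

/* Translate all four points of a block at once. There is deliberately
   no get_x()/get_y() style accessor for a single point. */
static void point_block4_translate(point_block4 *b,
                                   float dx, float dy, float dz)
{
    assert(b != NULL);
    for (int i = 0; i < 4; ++i) {
        b->x.lane[i] += dx;
        b->y.lane[i] += dy;
        b->z.lane[i] += dz;
    }
}
```

The point of the layout is that every operation touches whole quadwords and nothing smaller is ever defined, so there is no per-component interface for higher-level code to lean on.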
457 00:21:21,340 --> 00:21:25,930 458 00:21:25,930 --> 00:21:27,800 So some things to consider when you're doing this sort of 459 00:21:27,800 --> 00:21:32,020 math transformation anyway is, are you going to do floats, 460 00:21:32,020 --> 00:21:33,690 doubles, fixed point? 461 00:21:33,690 --> 00:21:34,840 I mean, doubles are right out. 462 00:21:34,840 --> 00:21:36,240 There's no point. 463 00:21:36,240 --> 00:21:39,640 Regardless of the speed on the SPU of a double, there's no 464 00:21:39,640 --> 00:21:43,170 value in it for games. 465 00:21:43,170 --> 00:21:46,900 We have known data, so if we need to we can renormalize a 466 00:21:46,900 --> 00:21:51,140 group around a point and get into the range of a 467 00:21:51,140 --> 00:21:52,100 floating point. 468 00:21:52,100 --> 00:21:53,920 It's a nonissue. 469 00:21:53,920 --> 00:21:56,700 So there's no reason to waste the space in a double at all, 470 00:21:56,700 --> 00:21:59,380 unless it was actually faster, which it isn't. 471 00:21:59,380 --> 00:22:00,630 So we don't use it. 472 00:22:00,630 --> 00:22:04,440 473 00:22:04,440 --> 00:22:06,670 Sort of the only real problematic thing with the SPU 474 00:22:06,670 --> 00:22:09,570 floating point is its format, and not supporting 475 00:22:09,570 --> 00:22:12,860 denormalized numbers becomes problematic, but again, you 476 00:22:12,860 --> 00:22:17,680 can work around it by renormalizing your numbers 477 00:22:17,680 --> 00:22:20,760 within a known range so that it won't get to the point 478 00:22:20,760 --> 00:22:23,290 where it needs to denormalize-- 479 00:22:23,290 --> 00:22:25,270 at least for the work that you're actually doing. 480 00:22:25,270 --> 00:22:29,790 481 00:22:29,790 --> 00:22:30,520 Yeah? 482 00:22:30,520 --> 00:22:38,960 AUDIENCE: [OBSCURED] 483 00:22:38,960 --> 00:22:42,210 MIKE ACTON: Every programmer will write their own vector class. 484 00:22:42,210 --> 00:22:45,290 And I'm saying that that's a useless exercise.
485 00:22:45,290 --> 00:22:46,360 Don't bother doing it. 486 00:22:46,360 --> 00:22:47,610 Don't use anybody else's either. 487 00:22:47,610 --> 00:22:52,730 488 00:22:52,730 --> 00:22:55,780 If you're writing for the cell-- 489 00:22:55,780 --> 00:22:58,360 if you're writing in C you have the SI intrinsics. 490 00:22:58,360 --> 00:23:00,330 They're already in quadwords, you can do everything you want 491 00:23:00,330 --> 00:23:04,210 to do and you're not restricted by this sort of 492 00:23:04,210 --> 00:23:06,110 concept of what a vector is. 493 00:23:06,110 --> 00:23:08,290 Especially on the SPU, where 494 00:23:08,290 --> 00:23:12,210 you can freely deal with them as integers or floats or 495 00:23:12,210 --> 00:23:17,280 whatever seamlessly without cost, there's plenty that you 496 00:23:17,280 --> 00:23:18,860 can do with the floating point number if you 497 00:23:18,860 --> 00:23:20,030 treat it as an integer. 498 00:23:20,030 --> 00:23:23,800 And on either AltiVec or the SPU, where you can do that 499 00:23:23,800 --> 00:23:26,870 without cost, there's a huge advantage to 500 00:23:26,870 --> 00:23:28,970 just doing it straight. 501 00:23:28,970 --> 00:23:30,680 AUDIENCE: [OBSCURED] 502 00:23:30,680 --> 00:23:32,080 MIKE ACTON: Well, I'm saying write it in assembly. 503 00:23:32,080 --> 00:23:35,260 504 00:23:35,260 --> 00:23:37,720 But if you have to, use the intrinsics. 505 00:23:37,720 --> 00:23:42,190 But certainly don't write a vector class. 506 00:23:42,190 --> 00:23:44,150 So memory management. 507 00:23:44,150 --> 00:23:46,190 Static allocations are always preferred to dynamic. 508 00:23:46,190 --> 00:23:49,390 Basically, general purpose dynamic memory allocation, 509 00:23:49,390 --> 00:23:53,060 malloc, free, whatever, has just absolutely no place in games. 510 00:23:53,060 --> 00:23:57,540 511 00:23:57,540 --> 00:24:00,440 We don't have enough unknowns for that to be valuable.
512 00:24:00,440 --> 00:24:03,530 We can group our data by specific types. 513 00:24:03,530 --> 00:24:07,000 We know basic ranges of those types. 514 00:24:07,000 --> 00:24:09,620 The vast majority of the data is known in advance, it's 515 00:24:09,620 --> 00:24:11,080 actually burned onto the disc. 516 00:24:11,080 --> 00:24:13,730 We can actually analyze that. 517 00:24:13,730 --> 00:24:15,350 So most of our allocations tend to be 518 00:24:15,350 --> 00:24:16,590 calculated in advance. 519 00:24:16,590 --> 00:24:20,900 So you load the level and oftentimes you just load 520 00:24:20,900 --> 00:24:24,895 data in off the disc into memory and 521 00:24:24,895 --> 00:24:26,145 then fix up the pointers. 522 00:24:26,145 --> 00:24:29,120 523 00:24:29,120 --> 00:24:33,570 For things that change during the runtime, just simple 524 00:24:33,570 --> 00:24:37,020 hierarchical allocators, block allocators where you have 525 00:24:37,020 --> 00:24:42,620 fixed sizes, are always the easiest and best way to go. 526 00:24:42,620 --> 00:24:45,490 These are known types of known sizes. 527 00:24:45,490 --> 00:24:50,150 The key to that is to organize your data so that's actually a 528 00:24:50,150 --> 00:24:51,590 workable solution. 529 00:24:51,590 --> 00:24:54,460 So you don't have these sort of classes or structures that 530 00:24:54,460 --> 00:24:55,600 are dynamically sized. 531 00:24:55,600 --> 00:24:58,480 You group them in terms of things that are similar. 532 00:24:58,480 --> 00:25:03,790 Physics data here and AI data is separately here in a 533 00:25:03,790 --> 00:25:05,190 separate array. 534 00:25:05,190 --> 00:25:10,580 And that way those sort of chunks of data are similarly 535 00:25:10,580 --> 00:25:12,210 sized and can be block allocated without any 536 00:25:12,210 --> 00:25:13,480 fragmentation issues at all. 537 00:25:13,480 --> 00:25:18,610 538 00:25:18,610 --> 00:25:23,590 Eventually you'll probably want to design an allocator.
539 00:25:23,590 --> 00:25:25,910 Things to consider are the page sizes. 540 00:25:25,910 --> 00:25:28,910 That's critically important, you want to work within a page 541 00:25:28,910 --> 00:25:30,560 as much as you possibly can. 542 00:25:30,560 --> 00:25:33,180 So you want to group things, not necessarily the same 543 00:25:33,180 --> 00:25:36,360 things, but the things that will be read together or 544 00:25:36,360 --> 00:25:38,430 written together within the same page. 545 00:25:38,430 --> 00:25:43,040 So you want to have a concept of the actual page up through 546 00:25:43,040 --> 00:25:44,290 the system. 547 00:25:44,290 --> 00:25:47,130 548 00:25:47,130 --> 00:25:49,180 Probably the most common mistake I see in a block 549 00:25:49,180 --> 00:25:51,510 allocator, so somebody says-- 550 00:25:51,510 --> 00:25:54,000 everybody knows what I mean by block allocator? 551 00:25:54,000 --> 00:25:54,540 Yeah? 552 00:25:54,540 --> 00:25:55,000 OK. 553 00:25:55,000 --> 00:25:57,890 So the most common mistake I see people make is that they 554 00:25:57,890 --> 00:25:59,040 do least recently used. 555 00:25:59,040 --> 00:26:02,780 They just grab the least recently used block and use 556 00:26:02,780 --> 00:26:06,510 that when servicing a request. That's actually pretty much 557 00:26:06,510 --> 00:26:08,930 the worst thing you can possibly do because that's the 558 00:26:08,930 --> 00:26:11,530 most likely thing to be cold. 559 00:26:11,530 --> 00:26:13,850 That's the most likely thing to be out of cache, both out 560 00:26:13,850 --> 00:26:15,630 of L1 and L2. 561 00:26:15,630 --> 00:26:19,340 Just the easiest thing you can do to change that is just use 562 00:26:19,340 --> 00:26:20,310 most recently used. 563 00:26:20,310 --> 00:26:21,170 Just go the other way.
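[Editor's sketch] A fixed-size block allocator that hands back the most recently freed block first can be as small as a LIFO free list. The following is a minimal portable C illustration, not the allocator used at any studio: the names block_pool, pool_init, pool_alloc and pool_free, and the 128-byte block size and 8-block pool, are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_SIZE = 128, BLOCK_COUNT = 8 };   /* hypothetical sizes */

/* A block is either in use (bytes) or on the free list (next_free). */
typedef union block {
    union block *next_free;
    _Alignas(16) unsigned char bytes[BLOCK_SIZE];
} block;

typedef struct {
    block  storage[BLOCK_COUNT];   /* statically sized pool, no malloc */
    block *free_top;               /* most recently freed block */
} block_pool;

/* Push every block onto the free list; storage[0] ends up on top. */
static void pool_init(block_pool *p)
{
    p->free_top = NULL;
    for (int i = BLOCK_COUNT - 1; i >= 0; --i) {
        p->storage[i].next_free = p->free_top;
        p->free_top = &p->storage[i];
    }
}

/* Pop the most recently freed block, the one most likely to still be
   warm in cache. Returns NULL when the pool is exhausted. */
static void *pool_alloc(block_pool *p)
{
    block *b = p->free_top;
    if (b != NULL)
        p->free_top = b->next_free;
    return b;
}

/* Push a block back on top so that it is the next one handed out. */
static void pool_free(block_pool *p, void *ptr)
{
    block *b = ptr;
    assert(b >= p->storage && b < p->storage + BLOCK_COUNT);
    b->next_free = p->free_top;
    p->free_top = b;
}
```

Because every block has the same size, freeing and reallocating in any order cannot fragment the pool, and the LIFO order costs nothing extra compared with a FIFO (least recently used) queue.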
564 00:26:21,170 --> 00:26:24,280 I mean, there are much more complicated systems you can 565 00:26:24,280 --> 00:26:28,120 use, but just that one small change where you're much more 566 00:26:28,120 --> 00:26:33,010 likely to get warm data is going to give you a big boost. 567 00:26:33,010 --> 00:26:35,220 And again, like I said, use hierarchies of allocators 568 00:26:35,220 --> 00:26:38,990 and these sort of static block allocations, 569 00:26:38,990 --> 00:26:42,190 instead of trying to have one general purpose super mega 570 00:26:42,190 --> 00:26:46,230 allocator that does everything. 571 00:26:46,230 --> 00:26:48,130 And again, if it's well planned, fragmentation is a 572 00:26:48,130 --> 00:26:51,450 non-issue, it's impossible. 573 00:26:51,450 --> 00:26:54,810 Cache line, oh, and probably another important concept to 574 00:26:54,810 --> 00:26:57,150 keep in mind as you're writing your allocator is the transfer 575 00:26:57,150 --> 00:26:59,920 block size of the SPU. 576 00:26:59,920 --> 00:27:06,110 If you have a 16K block and the system is aware of 16K 577 00:27:06,110 --> 00:27:11,080 blocks then there are plenty of cases where you don't have 578 00:27:11,080 --> 00:27:14,190 to keep track of-- in the system-- the size of things. 579 00:27:14,190 --> 00:27:17,800 It's just how many blocks, how many SPU blocks do you have? 580 00:27:17,800 --> 00:27:20,990 Or what percentage of SPU blocks you have? 581 00:27:20,990 --> 00:27:25,000 And that will help you sort of compress down your 582 00:27:25,000 --> 00:27:27,570 memory requirements when you're referring to blocks and 583 00:27:27,570 --> 00:27:28,820 memory streams in memory. 584 00:27:28,820 --> 00:27:33,230 585 00:27:33,230 --> 00:27:36,476 AUDIENCE: About the memory management for data here, you 586 00:27:36,476 --> 00:27:40,620 also write overlay managers for code for the user? 587 00:27:40,620 --> 00:27:43,840 MIKE ACTON: Well, it basically amounts to the same thing.
588 00:27:43,840 --> 00:27:46,900 I mean, the code is just data, you just load it in and fix up 589 00:27:46,900 --> 00:27:48,860 the pointers and you're done. 590 00:27:48,860 --> 00:27:52,292 AUDIENCE: I was just wondering whether IBM gives you 591 00:27:52,292 --> 00:27:55,130 embedding -- 592 00:27:55,130 --> 00:27:56,630 MIKE ACTON: We don't use any of the IBM 593 00:27:56,630 --> 00:27:58,920 systems at all for games. 594 00:27:58,920 --> 00:28:01,880 595 00:28:01,880 --> 00:28:06,100 I know IBM has an overlay manager as part of the SDK. 596 00:28:06,100 --> 00:28:07,450 AUDIENCE: Well, not really. 597 00:28:07,450 --> 00:28:09,400 It's -- 598 00:28:09,400 --> 00:28:11,970 MIKE ACTON: Well, they have some overlay support, right? 599 00:28:11,970 --> 00:28:15,920 That's not something we would ever use. 600 00:28:15,920 --> 00:28:18,170 And in general, I mean, I guess that's probably an 601 00:28:18,170 --> 00:28:19,700 interesting question of how-- 602 00:28:19,700 --> 00:28:20,810 AUDIENCE: So it's all ground up? 603 00:28:20,810 --> 00:28:21,510 MIKE ACTON: What's that? 604 00:28:21,510 --> 00:28:24,080 AUDIENCE: All your development is ground up? 605 00:28:24,080 --> 00:28:25,340 MIKE ACTON: Yeah, for the most part. 606 00:28:25,340 --> 00:28:28,400 607 00:28:28,400 --> 00:28:30,600 For us, that's definitely true. 608 00:28:30,600 --> 00:28:33,610 There are studios, especially cross platform 609 00:28:33,610 --> 00:28:36,020 studios, that will take middleware development and 610 00:28:36,020 --> 00:28:37,270 just sort of use it at a high level. 611 00:28:37,270 --> 00:28:39,680 612 00:28:39,680 --> 00:28:43,280 But especially when you're starting a first generation 613 00:28:43,280 --> 00:28:47,220 platform game, there's virtually nothing there to use 614 00:28:47,220 --> 00:28:48,690 because the hardware hasn't been around long enough for 615 00:28:48,690 --> 00:28:51,530 anybody else to write anything either.
616 00:28:51,530 --> 00:28:55,000 So if you need it, you write it yourself. 617 00:28:55,000 --> 00:28:57,240 Plus that's just sort of the general theme of game 618 00:28:57,240 --> 00:28:58,850 development. 619 00:28:58,850 --> 00:29:02,700 It's custom to your situation, to your data. 620 00:29:02,700 --> 00:29:04,740 And anything that's general purpose enough to sell as 621 00:29:04,740 --> 00:29:10,350 middleware is probably not going to be fast enough to run 622 00:29:10,350 --> 00:29:11,920 a triple A title. 623 00:29:11,920 --> 00:29:13,920 Not always true, but as a general 624 00:29:13,920 --> 00:29:15,640 rule, it's pretty valid. 625 00:29:15,640 --> 00:29:20,240 626 00:29:20,240 --> 00:29:24,920 OK, so-- wait, so how'd I get here? 627 00:29:24,920 --> 00:29:28,370 All right, this is next. 628 00:29:28,370 --> 00:29:33,100 So here's another example of how the cell 629 00:29:33,100 --> 00:29:34,760 might affect design. 630 00:29:34,760 --> 00:29:37,540 So you're writing a collision detection system. 631 00:29:37,540 --> 00:29:43,460 632 00:29:43,460 --> 00:29:47,650 It's obvious that you cannot or should not expect immediate 633 00:29:47,650 --> 00:29:50,820 results from a collision detection system, otherwise 634 00:29:50,820 --> 00:29:54,030 you're going to be sitting and syncing all the time for one 635 00:29:54,030 --> 00:29:56,670 result and performance just goes out the window, you may 636 00:29:56,670 --> 00:29:58,710 as well just have a serial program. 637 00:29:58,710 --> 00:30:03,090 So you want to group results, you want to group queries and 638 00:30:03,090 --> 00:30:06,410 you want potentially, for those queries to be deferred 639 00:30:06,410 --> 00:30:09,920 so that you can store them, you can just DMA them out and 640 00:30:09,920 --> 00:30:12,150 then whatever process needed them will come back and grab 641 00:30:12,150 --> 00:30:14,590 them later.
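[Editor's sketch] The grouped, deferred query pattern described above might look something like this in portable C. Everything here is a hypothetical stand-in: the query_batch type and its three functions are invented for the illustration, the "world" is just an array of axis-aligned boxes, and batch_flush runs on the calling thread where a real engine would hand the batch to an SPU job and DMA the results back.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_QUERIES = 64 };   /* hypothetical batch capacity */

typedef struct { float min[3], max[3]; } aabb;

typedef struct {
    float pos[MAX_QUERIES][3];   /* submitted query points */
    bool  hit[MAX_QUERIES];      /* results, valid after a flush */
    int   count;                 /* queries submitted so far */
    int   resolved;              /* queries covered by the last flush */
} query_batch;

/* Queue a point query. Returns a handle for looking up the result
   later, or -1 if the batch is full. No result is available yet. */
static int batch_submit(query_batch *b, float x, float y, float z)
{
    if (b->count >= MAX_QUERIES)
        return -1;
    b->pos[b->count][0] = x;
    b->pos[b->count][1] = y;
    b->pos[b->count][2] = z;
    return b->count++;
}

/* Resolve every queued query in one linear pass over the world boxes. */
static void batch_flush(query_batch *b, const aabb *world, int world_count)
{
    for (int q = 0; q < b->count; ++q) {
        bool hit = false;
        for (int w = 0; w < world_count && !hit; ++w) {
            bool inside = true;
            for (int k = 0; k < 3; ++k) {
                if (b->pos[q][k] < world[w].min[k] ||
                    b->pos[q][k] > world[w].max[k])
                    inside = false;
            }
            hit = inside;
        }
        b->hit[q] = hit;
    }
    b->resolved = b->count;
}

/* Fetch a result; only legal for queries covered by a previous flush. */
static bool batch_result(const query_batch *b, int handle)
{
    assert(handle >= 0 && handle < b->resolved);
    return b->hit[handle];
}
```

The caller submits during one part of the frame, does other work, and reads results after the flush, so nothing ever blocks waiting on a single query.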
642 00:30:14,590 --> 00:30:17,475 So conceptually that's the design you want to build into 643 00:30:17,475 --> 00:30:22,590 a collision detection system, which then in turn affects the 644 00:30:22,590 --> 00:30:24,030 high-level design. 645 00:30:24,030 --> 00:30:29,800 So AI, scripts, any game code that might have previously 646 00:30:29,800 --> 00:30:32,040 depended on a result being immediately available, as in 647 00:30:32,040 --> 00:30:36,120 they have characters that shoot rays around the room to 648 00:30:36,120 --> 00:30:38,910 decide what they're going to do next or bullets that are 649 00:30:38,910 --> 00:30:42,730 flying through the air or whatever, can no longer make 650 00:30:42,730 --> 00:30:43,710 that assumption. 651 00:30:43,710 --> 00:30:46,580 So they have to be able to group up their queries and 652 00:30:46,580 --> 00:30:48,450 look them up later and have other work 653 00:30:48,450 --> 00:30:50,070 to do in the meantime. 654 00:30:50,070 --> 00:30:54,470 So this is a perfect example of how you cannot take old 655 00:30:54,470 --> 00:30:57,630 code and move it to the PS3. 656 00:30:57,630 --> 00:31:01,330 Because old code, serial code would have definitely assumed 657 00:31:01,330 --> 00:31:03,040 that the results were immediately available because 658 00:31:03,040 --> 00:31:04,700 honestly, that was the fastest way to do it. 659 00:31:04,700 --> 00:31:09,300 660 00:31:09,300 --> 00:31:13,230 So on a separate issue, we have SPU decomposition for the 661 00:31:13,230 --> 00:31:15,000 geometry look up. 662 00:31:15,000 --> 00:31:18,710 So from a high-level you have your entire scene in the level 663 00:31:18,710 --> 00:31:24,640 of the world or whatever and you have the set of queries in 664 00:31:24,640 --> 00:31:26,480 the case of static-- did I collide with 665 00:31:26,480 --> 00:31:27,710 anything in the world? 
666 00:31:27,710 --> 00:31:29,820 Or you have a ray, where does this ray collide with 667 00:31:29,820 --> 00:31:30,880 something in the world? 668 00:31:30,880 --> 00:31:33,640 And so you have this problem of you have this large sort of 669 00:31:33,640 --> 00:31:37,520 memory database in main RAM and you have this small 670 00:31:37,520 --> 00:31:40,680 SPU, which obviously cannot read in the whole database, 671 00:31:40,680 --> 00:31:42,670 analyze it, and spit out the result. 672 00:31:42,670 --> 00:31:46,410 It has to go back and forth to main RAM in order 673 00:31:46,410 --> 00:31:49,340 to build its result. 674 00:31:49,340 --> 00:31:52,270 So the question is how do you decompose the memory in the 675 00:31:52,270 --> 00:31:54,700 first place to make that at least 676 00:31:54,700 --> 00:31:58,160 somewhat reasonably efficient? 677 00:31:58,160 --> 00:32:04,660 The first sort of instinct I think, based on history is 678 00:32:04,660 --> 00:32:07,960 sort of the traditional scene graph structures like BSP tree 679 00:32:07,960 --> 00:32:09,680 or octree or something like that. 680 00:32:09,680 --> 00:32:13,720 681 00:32:13,720 --> 00:32:16,310 Particularly on the SPU, because of TLB misses, that 682 00:32:16,310 --> 00:32:19,070 becomes really expensive, really quickly when you're 683 00:32:19,070 --> 00:32:21,550 basically hitting random memory on every 684 00:32:21,550 --> 00:32:25,400 single node on the tree. 685 00:32:25,400 --> 00:32:28,090 So what you want to do is you want to make that hierarchy as 686 00:32:28,090 --> 00:32:31,120 flat as you possibly can.
687 00:32:31,120 --> 00:32:35,480 If the leaves have to be bigger that's fine because it turns 688 00:32:35,480 --> 00:32:40,190 out it's much, much cheaper to stream in a bigger group of-- 689 00:32:40,190 --> 00:32:43,280 as much data as you can fit into the SPU and run through 690 00:32:43,280 --> 00:32:46,820 it and make your decisions and spit it back out than it is to 691 00:32:46,820 --> 00:32:49,190 traverse the hierarchy. 692 00:32:49,190 --> 00:32:53,640 So basically, the depth of your hierarchy in your scene 693 00:32:53,640 --> 00:32:55,960 database is completely determined by how much data 694 00:32:55,960 --> 00:32:58,080 you can fit into the SPU, by the maximum 695 00:32:58,080 --> 00:33:02,230 size of the leaf node. 696 00:33:02,230 --> 00:33:05,230 The rest of the depth is only because you don't have any 697 00:33:05,230 --> 00:33:06,010 other choice. 698 00:33:06,010 --> 00:33:10,690 You know, and basically the same thing goes with dynamic 699 00:33:10,690 --> 00:33:12,860 geometry as you have geometry moving around in the scene, 700 00:33:12,860 --> 00:33:15,110 characters moving around in the scene-- 701 00:33:15,110 --> 00:33:18,730 they basically need to update themselves into their own 702 00:33:18,730 --> 00:33:22,020 database, into their own leaves and 703 00:33:22,020 --> 00:33:24,180 they'll do this in groups. 704 00:33:24,180 --> 00:33:26,790 And then when you query, you basically want it to query as 705 00:33:26,790 --> 00:33:28,820 many of those as possible, as you can 706 00:33:28,820 --> 00:33:30,920 possibly fit in at once. 707 00:33:30,920 --> 00:33:34,410 So you could have sort of a broad-phase collision first, 708 00:33:34,410 --> 00:33:36,910 where you have all of the groups of characters that are 709 00:33:36,910 --> 00:33:39,800 potentially in this leaf, so 710 00:33:39,800 --> 00:33:41,500 a bounding box or whatever.
711 00:33:41,500 --> 00:33:45,780 So even though you could in theory, in principle narrow 712 00:33:45,780 --> 00:33:49,230 that down even more, the cost for that, the cost for the 713 00:33:49,230 --> 00:33:53,790 potential memory miss for that is so high that you just want 714 00:33:53,790 --> 00:33:55,760 to do a linear search through as many as you 715 00:33:55,760 --> 00:33:57,510 possibly can on SPU. 716 00:33:57,510 --> 00:33:59,170 Does that make sense? 717 00:33:59,170 --> 00:34:04,220 718 00:34:04,220 --> 00:34:06,610 Procedural graphics-- 719 00:34:06,610 --> 00:34:12,630 so although we have a GPU on the PlayStation 3, it does 720 00:34:12,630 --> 00:34:14,880 turn out that the SPU is a lot better at 721 00:34:14,880 --> 00:34:16,130 doing a lot of things. 722 00:34:16,130 --> 00:34:19,590 723 00:34:19,590 --> 00:34:22,490 Things basically where you create 724 00:34:22,490 --> 00:34:27,630 geometry for RSX to render. 725 00:34:27,630 --> 00:34:31,540 So particle systems, dynamic particle systems. Especially 726 00:34:31,540 --> 00:34:34,990 where those systems have to interact with the world in 727 00:34:34,990 --> 00:34:41,000 some way, which will be much more expensive on the GPU. 728 00:34:41,000 --> 00:34:45,200 Sort of dynamic systems like cloth. 729 00:34:45,200 --> 00:34:48,460 Fonts are actually really interesting because typically 730 00:34:48,460 --> 00:34:51,880 you'll just see bitmap fonts, which 731 00:34:51,880 --> 00:34:53,840 are just textures. 732 00:34:53,840 --> 00:34:56,930 But if you have a very complex user interface then just the 733 00:34:56,930 --> 00:35:02,800 size of the bitmap becomes extreme and if you compress 734 00:35:02,800 --> 00:35:04,370 them they look terrible, especially fonts. 735 00:35:04,370 --> 00:35:06,620 Fonts need to look perfect.
736 00:35:06,620 --> 00:35:08,980 So if you do do procedural fonts, for example, TrueType 737 00:35:08,980 --> 00:35:12,750 fonts, the cost of rendering a font actually gets 738 00:35:12,750 --> 00:35:15,040 significant. 739 00:35:15,040 --> 00:35:18,830 And in this case, the SPU is actually a great use for 740 00:35:18,830 --> 00:35:22,210 rendering a procedural font. 741 00:35:22,210 --> 00:35:24,020 Rendering textures is basically the 742 00:35:24,020 --> 00:35:25,510 same case as font. 743 00:35:25,510 --> 00:35:29,200 Procedural textures like if you do noise-based clouds or 744 00:35:29,200 --> 00:35:30,750 something like that. 745 00:35:30,750 --> 00:35:33,700 And parametric geometry, like NURBS or subdivision 746 00:35:33,700 --> 00:35:35,300 surfaces or something like that, is a perfect 747 00:35:35,300 --> 00:35:36,700 case for the SPU. 748 00:35:36,700 --> 00:35:42,260 749 00:35:42,260 --> 00:35:43,754 Is there a question? 750 00:35:43,754 --> 00:35:48,060 751 00:35:48,060 --> 00:35:49,190 Geometry database, OK. 752 00:35:49,190 --> 00:35:53,970 First thing, scene graphs are worthless. 753 00:35:53,970 --> 00:35:55,550 Yeah? 754 00:35:55,550 --> 00:35:58,700 AUDIENCE: So of those sort of different conceptualized 755 00:35:58,700 --> 00:36:02,936 paths, are you literally swapping code in and out of 756 00:36:02,936 --> 00:36:06,560 the SPUs with the data many times per frame? 757 00:36:06,560 --> 00:36:07,900 Or is it more of a static-- 758 00:36:07,900 --> 00:36:10,520 MIKE ACTON: OK, that's an excellent question. 759 00:36:10,520 --> 00:36:12,660 It totally depends. 760 00:36:12,660 --> 00:36:16,470 I mean, in general through a game or through, at least a 761 00:36:16,470 --> 00:36:20,480 particular area of a game the SPU setup is stable.
762 00:36:20,480 --> 00:36:24,770 So if we decide you're going to have this SPU dedicated to 763 00:36:24,770 --> 00:36:29,060 physics for example, it is very likely that that SPU is 764 00:36:29,060 --> 00:36:31,860 stable and it's going to be dedicated to physics, at least 765 00:36:31,860 --> 00:36:34,840 for some period of time through the level or through 766 00:36:34,840 --> 00:36:36,360 the zone or wherever it is. 767 00:36:36,360 --> 00:36:39,920 Sometimes through an entire game. 768 00:36:39,920 --> 00:36:42,790 So there are going to be elements of that where it's 769 00:36:42,790 --> 00:36:45,260 sort of a well balanced problem. 770 00:36:45,260 --> 00:36:47,620 There's basically no way you're going to get waste. 771 00:36:47,620 --> 00:36:51,350 It's always going to be full, it's always going to be busy. 772 00:36:51,350 --> 00:36:54,470 Collision detection and physics are the two things 773 00:36:54,470 --> 00:36:58,910 that you'll never have enough CPU to do. 774 00:36:58,910 --> 00:37:02,300 You can always use more and more CPU. 775 00:37:02,300 --> 00:37:06,790 And basically, the rest can be dynamically scheduled. 776 00:37:06,790 --> 00:37:09,360 And the question of how to schedule it is actually an 777 00:37:09,360 --> 00:37:12,230 interesting problem. 778 00:37:12,230 --> 00:37:15,810 It's my opinion that sort of looking for the universal 779 00:37:15,810 --> 00:37:18,820 scheduler that solves all problems and magically makes 780 00:37:18,820 --> 00:37:23,180 everything work is a total lost cause. 781 00:37:23,180 --> 00:37:27,880 You have more than enough data to work with in your game 782 00:37:27,880 --> 00:37:33,480 to decide how to schedule your SPUs basically, manually. 783 00:37:33,480 --> 00:37:35,320 And it's just not that complicated. 784 00:37:35,320 --> 00:37:36,590 We have six SPUs.
785 00:37:36,590 --> 00:37:39,380 How to schedule six SPUs is just not that complicated a 786 00:37:39,380 --> 00:37:41,920 problem, you could write it down on a piece of paper. 787 00:37:41,920 --> 00:37:46,990 788 00:37:46,990 --> 00:37:52,040 OK, so scene graphs are almost always, universally a complete 789 00:37:52,040 --> 00:37:53,620 waste of time. 790 00:37:53,620 --> 00:37:57,640 They store way too much data for no apparent reason. 791 00:37:57,640 --> 00:38:00,170 Store your databases independently based on what 792 00:38:00,170 --> 00:38:02,400 you're actually doing with them, optimize your data 793 00:38:02,400 --> 00:38:04,860 separately because you're accessing it separately. 794 00:38:04,860 --> 00:38:09,070 The only thing that should be linking your sort of domain 795 00:38:09,070 --> 00:38:11,910 object is a key that says all right, we exist in this 796 00:38:11,910 --> 00:38:15,290 database and this database and this database. 797 00:38:15,290 --> 00:38:18,510 But to have this sort of giant structure that keeps all of 798 00:38:18,510 --> 00:38:24,820 the data for each element in the scene is about the poorest 799 00:38:24,820 --> 00:38:30,580 performing thing you can imagine for both cache and TLB and SPU 800 00:38:30,580 --> 00:38:32,620 because you can't fit an individual node on the SPU. 801 00:38:32,620 --> 00:38:38,130 802 00:38:38,130 --> 00:38:39,380 I think I covered that. 803 00:38:39,380 --> 00:38:42,620 804 00:38:42,620 --> 00:38:45,540 Here's an interesting example, so what you want to do is if 805 00:38:45,540 --> 00:38:48,690 you have the table of queries that you have-- bunch of 806 00:38:48,690 --> 00:38:51,910 people over the course of a frame say I want to know if I 807 00:38:51,910 --> 00:38:54,110 collided with something.
808 00:38:54,110 --> 00:38:57,040 And then if you basically make a pre-sort pass on that and 809 00:38:57,040 --> 00:39:01,150 basically, spatially sort these guys together, so let's 810 00:39:01,150 --> 00:39:03,300 say you have however many you can fit in a SPU. 811 00:39:03,300 --> 00:39:06,330 So you have four of these queries together. 812 00:39:06,330 --> 00:39:10,050 Although they might be a little further apart than you 813 00:39:10,050 --> 00:39:13,510 would hope, you could basically create a bounding box 814 00:39:13,510 --> 00:39:16,720 through a single query on the database that's the sum of all 815 00:39:16,720 --> 00:39:20,320 of them and then as I said, now you have a linear list 816 00:39:20,320 --> 00:39:21,930 that you can just stream through for all of them. 817 00:39:21,930 --> 00:39:24,585 So even though it's doing more work for any individual one, 818 00:39:24,585 --> 00:39:29,230 the overhead is reduced so significantly that the end 819 00:39:29,230 --> 00:39:31,050 result is that it's significantly faster. 820 00:39:31,050 --> 00:39:34,200 821 00:39:34,200 --> 00:39:37,780 And that's also what I mean by multiple simultaneous lookups. 822 00:39:37,780 --> 00:39:41,530 Basically you want to group queries together, but make 823 00:39:41,530 --> 00:39:44,240 sure that there's some advantage to that. 824 00:39:44,240 --> 00:39:47,170 By spatially pre-sorting them there is an advantage to that 825 00:39:47,170 --> 00:39:50,050 because it's more likely that they will have 826 00:39:50,050 --> 00:39:53,370 overlap in your queries. 827 00:39:53,370 --> 00:39:54,100 So game logic. 828 00:39:54,100 --> 00:39:58,110 Stuff that the cell would affect in game logic. 829 00:39:58,110 --> 00:40:02,570 State machines are a good example. 830 00:40:02,570 --> 00:40:06,330 If you defer your logic lines and defer your results, SPUs 831 00:40:06,330 --> 00:40:10,610 are amazingly perfect for defining state machines.
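[Editor's sketch] The pre-sort and merged bounding box idea for collision queries, described a little earlier, can be illustrated in portable C as below. This is a simplified sketch under stated assumptions: the function names are hypothetical, the "spatial sort" is just a sort along the x axis (a real system would more likely use a grid cell or space-filling-curve key), and the group size stands in for however many queries fit in one SPU pass.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float min[3], max[3]; } aabb;

/* qsort comparator: order query boxes by their minimum x coordinate. */
static int compare_min_x(const void *a, const void *b)
{
    float ax = ((const aabb *)a)->min[0];
    float bx = ((const aabb *)b)->min[0];
    return (ax > bx) - (ax < bx);
}

/* Sort the queries spatially, then write one merged bounding box per
   group of up to group_size neighbours into merged[]. The caller must
   provide room for ceil(count / group_size) boxes.
   Returns the number of merged boxes written. */
static int merge_queries(aabb *queries, int count, int group_size,
                         aabb *merged)
{
    assert(group_size > 0);
    qsort(queries, (size_t)count, sizeof *queries, compare_min_x);

    int groups = 0;
    for (int i = 0; i < count; i += group_size) {
        aabb box = queries[i];
        for (int j = i + 1; j < count && j < i + group_size; ++j) {
            for (int k = 0; k < 3; ++k) {
                if (queries[j].min[k] < box.min[k])
                    box.min[k] = queries[j].min[k];
                if (queries[j].max[k] > box.max[k])
                    box.max[k] = queries[j].max[k];
            }
        }
        merged[groups++] = box;
    }
    return groups;
}
```

Each merged box becomes a single lookup against the scene database, and the geometry it returns is then streamed linearly against every query in that group. Sorting first keeps the merged boxes small, which is what makes the grouping pay off.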
832 00:40:10,610 --> 00:40:13,130 If you expect your logic lines to be immediately available 833 00:40:13,130 --> 00:40:17,970 across the entire system, SPU is absolutely horrid. 834 00:40:17,970 --> 00:40:20,920 So if you basically write buffers into your state 835 00:40:20,920 --> 00:40:26,760 machines or your logic machines then each SPU can be 836 00:40:26,760 --> 00:40:30,760 cranking on multiple state machines at once where all the 837 00:40:30,760 --> 00:40:34,500 input and all the output lines are assumed to be deferred and 838 00:40:34,500 --> 00:40:36,640 it's just an extremely straightforward process. 839 00:40:36,640 --> 00:40:39,480 840 00:40:39,480 --> 00:40:41,380 Scripting, so scripting things like-- 841 00:40:41,380 --> 00:40:44,020 I don't know, Lua script or C script or 842 00:40:44,020 --> 00:40:47,080 something like that. 843 00:40:47,080 --> 00:40:49,550 I mean, obviously the first thing to look at is the size 844 00:40:49,550 --> 00:40:50,940 of the interpreter. 845 00:40:50,940 --> 00:40:54,420 Will it fit into an SPU to begin with? 846 00:40:54,420 --> 00:41:00,100 Another option to consider is, can it be converted into SPU 847 00:41:00,100 --> 00:41:03,900 code, either offline or dynamically? 848 00:41:03,900 --> 00:41:05,990 Because you'll find that most off the shelf scripting 849 00:41:05,990 --> 00:41:08,190 languages are scalar, 850 00:41:08,190 --> 00:41:11,220 sequential scripting languages. 851 00:41:11,220 --> 00:41:16,720 So all of the p-code within the scripting language itself 852 00:41:16,720 --> 00:41:18,300 basically defines scalar access. 853 00:41:18,300 --> 00:41:21,970 So not only are you switching on every byte or every two 854 00:41:21,970 --> 00:41:24,750 bytes or whatever, so it's sort of poorly performing code 855 00:41:24,750 --> 00:41:27,600 from an SPU point of view, but it's also poorly performing 856 00:41:27,600 --> 00:41:29,760 code from a memory point of view.
857 00:41:29,760 --> 00:41:31,810 So I guess the question is whether or not you can 858 00:41:31,810 --> 00:41:35,700 optimize the script itself and turn it into SPU code 859 00:41:35,700 --> 00:41:39,380 that you can then dynamically load or come up with a new 860 00:41:39,380 --> 00:41:42,780 script that's just much more friendly for the SPUs. 861 00:41:42,780 --> 00:41:45,410 862 00:41:45,410 --> 00:41:49,880 Another option if you have to use a single, sort of scalar 863 00:41:49,880 --> 00:41:55,550 scripting language like Lua or C script or whatever, is if you 864 00:41:55,550 --> 00:41:59,120 can run multiple streams simultaneously so that while 865 00:41:59,120 --> 00:42:01,980 you're doing these sort of individual offline memory 866 00:42:01,980 --> 00:42:06,140 lookups and reads and writes to main memory, once one 867 00:42:06,140 --> 00:42:08,550 blocks you can start moving on another one. 868 00:42:08,550 --> 00:42:12,070 As long as there's no dependencies between these two 869 00:42:12,070 --> 00:42:14,600 scripts we should be able to stream them both 870 00:42:14,600 --> 00:42:17,640 simultaneously. 871 00:42:17,640 --> 00:42:21,620 Motion control actually turns out to be a critical problem 872 00:42:21,620 --> 00:42:23,850 in games in general that's often overlooked. 873 00:42:23,850 --> 00:42:28,300 It's who controls the motion in the game. 874 00:42:28,300 --> 00:42:29,750 Is it the AI? 875 00:42:29,750 --> 00:42:34,590 So is it the controller in the case of the player? 876 00:42:34,590 --> 00:42:36,720 I say, push forward, so the guy moves forward. 877 00:42:36,720 --> 00:42:39,090 Is that really what controls it? 878 00:42:39,090 --> 00:42:40,950 Or is it the physics? 879 00:42:40,950 --> 00:42:43,660 So all the AI does is say, I want to move forward, tells 880 00:42:43,660 --> 00:42:45,360 the physics system I want to move forward and the physics 881 00:42:45,360 --> 00:42:47,000 tries to follow it.
882 00:42:47,000 --> 00:42:48,660 Or is it the animation? 883 00:42:48,660 --> 00:42:50,710 That you have the animators actually put translation in 884 00:42:50,710 --> 00:42:52,530 the animation, so is that translation the thing that's 885 00:42:52,530 --> 00:42:54,607 actually driving the motion and everything else is trying 886 00:42:54,607 --> 00:42:56,170 to follow it? 887 00:42:56,170 --> 00:42:59,830 Turns out to be a surprisingly difficult problem to solve and 888 00:42:59,830 --> 00:43:06,180 every studio ends up with their own solution. 889 00:43:06,180 --> 00:43:08,590 I forget what point I was making on how the cell 890 00:43:08,590 --> 00:43:09,840 affected that decision. 891 00:43:09,840 --> 00:43:12,040 892 00:43:12,040 --> 00:43:13,210 But-- 893 00:43:13,210 --> 00:43:18,780 AUDIENCE: [OBSCURED] 894 00:43:18,780 --> 00:43:21,120 MIKE ACTON: I think the point, probably that I was trying to 895 00:43:21,120 --> 00:43:24,670 make is that because you want everything to be deferred 896 00:43:24,670 --> 00:43:29,700 anyway, then the order does become a clearer sort of 897 00:43:29,700 --> 00:43:32,530 winner in that order. 898 00:43:32,530 --> 00:43:37,780 Where you want the immediate feedback from the controls, 899 00:43:37,780 --> 00:43:39,900 the control leads the way. 900 00:43:39,900 --> 00:43:42,740 You have the physics, which then follows, perhaps, even a 901 00:43:42,740 --> 00:43:46,960 frame behind that to say how that new position is impacted 902 00:43:46,960 --> 00:43:49,320 by the physical reality of the world. 903 00:43:49,320 --> 00:43:52,230 And then potentially a frame behind that or half a frame 904 00:43:52,230 --> 00:43:55,560 behind that you have the animation system, which in 905 00:43:55,560 --> 00:43:58,050 that case would just be basically, a visual 906 00:43:58,050 --> 00:44:00,520 representation of what's going on rather 907 00:44:00,520 --> 00:44:01,540 than leading anything. 
908 00:44:01,540 --> 00:44:04,190 It's basically an icon for what's happening in the 909 00:44:04,190 --> 00:44:06,600 physics and the AI. 910 00:44:06,600 --> 00:44:09,390 911 00:44:09,390 --> 00:44:12,680 The limitations of your system are that it has to be deferred 912 00:44:12,680 --> 00:44:14,470 and that it has to be done in groups. 913 00:44:14,470 --> 00:44:17,430 Basically, some of these sort of really difficult decisions 914 00:44:17,430 --> 00:44:19,350 have only one or two obvious answers. 915 00:44:19,350 --> 00:44:24,570 916 00:44:24,570 --> 00:44:26,550 All right, well I wanted to dig into animation a little 917 00:44:26,550 --> 00:44:29,520 bit, so does anybody have any questions on anything? 918 00:44:29,520 --> 00:44:31,300 Any of the sort of the high-level stuff that I've 919 00:44:31,300 --> 00:44:32,570 covered up to this point? 920 00:44:32,570 --> 00:44:35,603 921 00:44:35,603 --> 00:44:36,853 Yeah? 922 00:44:36,853 --> 00:44:38,700 923 00:44:38,700 --> 00:44:43,196 AUDIENCE: So does the need for deferral and breaking into 924 00:44:43,196 --> 00:44:48,230 groups and staging, does this need break the desire for the 925 00:44:48,230 --> 00:44:51,500 higher-level programmers to abstract what's going on at 926 00:44:51,500 --> 00:44:54,176 the data engine level? 927 00:44:54,176 --> 00:44:57,770 Or is that not quite the issue? 928 00:44:57,770 --> 00:45:01,130 MIKE ACTON: Well, let's say we get a game 929 00:45:01,130 --> 00:45:02,500 play programmer, right? 930 00:45:02,500 --> 00:45:05,430 Fresh out of school, he's taught in school, C 931 00:45:05,430 --> 00:45:06,680 plus plus in school. 932 00:45:06,680 --> 00:45:11,090 Taught to decompose the world into sort of the main classes 933 00:45:11,090 --> 00:45:13,460 and that they all communicate through each other maybe 934 00:45:13,460 --> 00:45:15,650 through messaging. 
935 00:45:15,650 --> 00:45:17,920 All right, well the first thing we tell him is that all 936 00:45:17,920 --> 00:45:19,710 that is complete crap. 937 00:45:19,710 --> 00:45:22,970 None of that will actually work in practice. 938 00:45:22,970 --> 00:45:26,260 So in some sense, yes, there is a sort of tendency for them 939 00:45:26,260 --> 00:45:31,310 to want this interface, this sort of clean abstraction, but 940 00:45:31,310 --> 00:45:34,450 abstraction doesn't have any value. 941 00:45:34,450 --> 00:45:37,085 It doesn't make the game faster, it doesn't make the 942 00:45:37,085 --> 00:45:38,590 game cheaper. 943 00:45:38,590 --> 00:45:39,370 It doesn't make-- 944 00:45:39,370 --> 00:45:42,540 AUDIENCE: [OBSCURED] 945 00:45:42,540 --> 00:45:44,860 PROFESSOR: There's a bit of a religious. 946 00:45:44,860 --> 00:45:45,780 Let's move on. 947 00:45:45,780 --> 00:45:47,980 He has a lot of other interesting things to say. 948 00:45:47,980 --> 00:45:50,050 And we can get to that question-- 949 00:45:50,050 --> 00:45:51,860 AUDIENCE: It sounds like there's two completely 950 00:45:51,860 --> 00:45:53,790 different communities involved in the development. 951 00:45:53,790 --> 00:45:56,320 There's the engine developers and there's the higher-level-- 952 00:45:56,320 --> 00:45:57,540 MIKE ACTON: That's a fair enough assessment. 953 00:45:57,540 --> 00:45:59,430 There are different communities. 954 00:45:59,430 --> 00:46:02,476 There is a community of the game play programmers and the 955 00:46:02,476 --> 00:46:04,110 community of engine programmers. 956 00:46:04,110 --> 00:46:06,390 And they have different priorities and they have 957 00:46:06,390 --> 00:46:08,760 different experiences. 958 00:46:08,760 --> 00:46:13,190 So yeah, in that way there is a division. 959 00:46:13,190 --> 00:46:14,830 PROFESSOR: I will let you go on. 960 00:46:14,830 --> 00:46:15,890 You said you had a lot of interesting 961 00:46:15,890 --> 00:46:18,160 information to cover. 
962 00:46:18,160 --> 00:46:20,210 MIKE ACTON: OK. 963 00:46:20,210 --> 00:46:21,270 AUDIENCE: I can bitch about that. 964 00:46:21,270 --> 00:46:23,950 Don't worry. 965 00:46:23,950 --> 00:46:26,790 MIKE ACTON: So just to get into animation a little bit. 966 00:46:26,790 --> 00:46:31,030 Let's start with trying to build a simple animation 967 00:46:31,030 --> 00:46:33,520 system and see what problems creep up as we're trying 968 00:46:33,520 --> 00:46:34,820 to implement it on the cell. 969 00:46:34,820 --> 00:46:38,270 970 00:46:38,270 --> 00:46:41,080 So in the simplest case we have a set of animation 971 00:46:41,080 --> 00:46:43,116 channels defined for a character, which 972 00:46:43,116 --> 00:46:44,330 is made up of joints. 973 00:46:44,330 --> 00:46:47,200 We're just talking about sort of a simple hierarchical 974 00:46:47,200 --> 00:46:49,630 transformation here. 975 00:46:49,630 --> 00:46:51,900 And some of those channels are related. 976 00:46:51,900 --> 00:46:56,600 So, for example, rotation plus translation plus scale 977 00:46:56,600 --> 00:47:00,570 equals any individual joint. 978 00:47:00,570 --> 00:47:04,780 So the first thing, typically that you'll have to answer is 979 00:47:04,780 --> 00:47:12,900 whether or not you want to do Euler or quaternion rotation. 980 00:47:12,900 --> 00:47:18,640 Now the tendency I guess, especially for new programmers 981 00:47:18,640 --> 00:47:21,370 is to go with quaternion. 982 00:47:21,370 --> 00:47:24,970 They're taught that gimbal lock is a sort of 983 00:47:24,970 --> 00:47:28,860 insurmountable problem that only quaternion solves. 984 00:47:28,860 --> 00:47:30,260 That's just simply not true. 985 00:47:30,260 --> 00:47:31,940 I mean, gimbal lock is completely 986 00:47:31,940 --> 00:47:33,500 manageable in practice.
987 00:47:33,500 --> 00:47:36,230 988 00:47:36,230 --> 00:47:39,830 When you're trying to rotate on three axes and two axes 989 00:47:39,830 --> 00:47:42,940 rotate 90 degrees apart and the third axis can't be 990 00:47:42,940 --> 00:47:44,570 resolved or 180 degrees apart. 991 00:47:44,570 --> 00:47:46,390 So you can't resolve one of the axes, right? 992 00:47:46,390 --> 00:47:48,950 AUDIENCE: [OBSCURED] 993 00:47:48,950 --> 00:47:51,830 MIKE ACTON: Yeah, it's where it's impossible to resolve one 994 00:47:51,830 --> 00:47:53,990 of the axes and that's the nature of 995 00:47:53,990 --> 00:47:56,320 Euler sort of rotation. 996 00:47:56,320 --> 00:48:00,670 997 00:48:00,670 --> 00:48:02,820 But sort of a quaternion rotation completely solves 998 00:48:02,820 --> 00:48:06,800 that mathematical problem, it's always resolvable and 999 00:48:06,800 --> 00:48:08,290 it's not very messy at all. 1000 00:48:08,290 --> 00:48:11,550 I mean, from a sort of C programmer's perspective, it 1001 00:48:11,550 --> 00:48:14,330 looks clean, the math's clean, everything's clean about it. 1002 00:48:14,330 --> 00:48:18,850 Unfortunately, it doesn't compress very well at all. 1003 00:48:18,850 --> 00:48:23,720 Where if you used Euler rotation, which basically just 1004 00:48:23,720 --> 00:48:27,640 means an individual rotation for every axis. 1005 00:48:27,640 --> 00:48:30,830 So x rotation, y rotation, z rotation. 1006 00:48:30,830 --> 00:48:33,300 That's much, much more compressible because each one 1007 00:48:33,300 --> 00:48:36,410 of those axes can be individually compressed. 1008 00:48:36,410 --> 00:48:38,665 It's very unlikely that you're always rotating all three 1009 00:48:38,665 --> 00:48:42,060 axes, all the time, especially in a human character.
1010 00:48:42,060 --> 00:48:44,596 It's much more likely that only one axis is rotating at 1011 00:48:44,596 --> 00:48:49,690 any one given time and so that makes it-- 1012 00:48:49,690 --> 00:48:52,660 just without any change, without any additional 1013 00:48:52,660 --> 00:48:57,368 compression-- it tends to make it about 1/3 of the size. 1014 00:48:57,368 --> 00:48:59,910 AUDIENCE: [OBSCURED] 1015 00:48:59,910 --> 00:49:03,070 MIKE ACTON: The animation data. 1016 00:49:03,070 --> 00:49:05,810 So you have this frame of animation, which is all these 1017 00:49:05,810 --> 00:49:07,720 animation channels, right? 1018 00:49:07,720 --> 00:49:11,000 And then over time you have these different frames of 1019 00:49:11,000 --> 00:49:12,080 animation, right? 1020 00:49:12,080 --> 00:49:15,100 If you store-- for every joint, for every rotation you 1021 00:49:15,100 --> 00:49:18,190 store a quaternion over time, it's hard to compress across 1022 00:49:18,190 --> 00:49:21,680 time because you're basically, essentially rotating all three 1023 00:49:21,680 --> 00:49:22,970 axes, all the time. 1024 00:49:22,970 --> 00:49:24,900 Well, with-- 1025 00:49:24,900 --> 00:49:25,420 yeah? 1026 00:49:25,420 --> 00:49:26,670 All right. 1027 00:49:26,670 --> 00:49:28,950 1028 00:49:28,950 --> 00:49:32,720 So let's say, of course, the next step is how do we store 1029 00:49:32,720 --> 00:49:35,170 the actual rotation itself? 1030 00:49:35,170 --> 00:49:39,120 Do we store it in float, double, half precision, fixed 1031 00:49:39,120 --> 00:49:40,370 point precision?
1032 00:49:40,370 --> 00:49:43,500 1033 00:49:43,500 --> 00:49:45,670 Probably the natural tendency at this point would be to 1034 00:49:45,670 --> 00:49:49,030 store it in a floating point number, but if you look at the 1035 00:49:49,030 --> 00:49:51,630 actual range of rotation, which is extremely limited on 1036 00:49:51,630 --> 00:49:55,900 a character, on any particular joint there are very few 1037 00:49:55,900 --> 00:50:00,270 joints that would even rotate 180 degrees. 1038 00:50:00,270 --> 00:50:05,360 So a floating point is overkill, by a 1039 00:50:05,360 --> 00:50:07,810 large margin on rotation-- 1040 00:50:07,810 --> 00:50:11,720 1041 00:50:11,720 --> 00:50:12,440 for the range. 1042 00:50:12,440 --> 00:50:15,940 For the precision, however it's fairly good. 1043 00:50:15,940 --> 00:50:18,160 Especially if you're doing very small rotations over a 1044 00:50:18,160 --> 00:50:21,170 long period of time. 1045 00:50:21,170 --> 00:50:23,980 So probably a more balanced approach would be to go with a 1046 00:50:23,980 --> 00:50:27,280 16-bit floating point, the half format, where you keep 1047 00:50:27,280 --> 00:50:29,270 most of the precision, but you reduce the range 1048 00:50:29,270 --> 00:50:31,340 significantly. 1049 00:50:31,340 --> 00:50:33,465 There's also the potential for going with an 8-bit floating 1050 00:50:33,465 --> 00:50:37,840 point format depending on the kind of 1051 00:50:37,840 --> 00:50:39,490 animation that you're doing. 1052 00:50:39,490 --> 00:50:43,190 And I'll probably have this on another slide, but it really 1053 00:50:43,190 --> 00:50:44,710 depends on how close-- 1054 00:50:44,710 --> 00:50:48,940 how compressible a joint is depends on how close to the 1055 00:50:48,940 --> 00:50:49,880 root it is. 1056 00:50:49,880 --> 00:50:52,530 The further away from the root the less it matters.
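To put rough numbers on the storage question above, here is a minimal sketch that round-trips one rotation value through 32-bit and 16-bit floating point using Python's standard `struct` module. The sample value and the helper name `roundtrip` are made up for illustration; nothing here is taken from the speaker's engine.

```python
import struct

def roundtrip(value, fmt):
    """Pack a value in the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

angle = 0.1234567  # hypothetical joint rotation value

# "<f" is IEEE 754 single precision, "<e" is IEEE 754 half precision.
for name, fmt in (("float32", "<f"), ("float16", "<e")):
    stored = roundtrip(angle, fmt)
    print(name, struct.calcsize(fmt), "bytes, error", abs(stored - angle))
```

The half format takes half the space per sample; whether the extra error is acceptable depends on the joint, as discussed next.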
1057 00:50:52,530 --> 00:50:54,950 So the joint at your fingertip, you can compress a 1058 00:50:54,950 --> 00:50:57,017 whole lot more because it doesn't matter as much, it's 1059 00:50:57,017 --> 00:50:58,420 not going to affect anything else. 1060 00:50:58,420 --> 00:51:01,520 Where a joint at the actual root, the smallest change in 1061 00:51:01,520 --> 00:51:04,720 motion will affect the entire system in animation and will 1062 00:51:04,720 --> 00:51:07,150 make it virtually impossible for you to line up animations 1063 00:51:07,150 --> 00:51:09,580 with each other, so that particular joint needs to be 1064 00:51:09,580 --> 00:51:12,030 nearly perfect. 1065 00:51:12,030 --> 00:51:13,220 And how do you store rotation? 1066 00:51:13,220 --> 00:51:17,150 Do you store them in degrees, radians, or normalized? 1067 00:51:17,150 --> 00:51:19,230 I have seen people store them in degrees. 1068 00:51:19,230 --> 00:51:21,800 I don't understand why you would ever do that. 1069 00:51:21,800 --> 00:51:26,430 It's just adding math to the problem. 1070 00:51:26,430 --> 00:51:31,920 Radians is perfectly fine if you're using off-the-shelf 1071 00:51:31,920 --> 00:51:35,150 trigonometric functions: tan, sine, whatever. 1072 00:51:35,150 --> 00:51:37,080 But typically, if you're going to optimize those functions 1073 00:51:37,080 --> 00:51:39,970 yourself anyway, it's going to be much more effective to go with 1074 00:51:39,970 --> 00:51:42,800 a normalized rotational value. 1075 00:51:42,800 --> 00:51:46,860 So basically between zero and 1. 1076 00:51:46,860 --> 00:51:52,970 Makes it a lot easier to do tricks based on the circle. 1077 00:51:52,970 --> 00:51:55,560 Basically you can just take the fractional value and just 1078 00:51:55,560 --> 00:51:58,050 deal with that.
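The "take the fractional value" trick can be sketched as follows, assuming rotations are stored as turns (1.0 is a full circle). The function names are hypothetical and only illustrate why a normalized value is convenient.

```python
import math

def wrap_turns(turns):
    """Keep only the fractional part, so 1.25 turns and 0.25 turns are the same angle."""
    return turns - math.floor(turns)

def quadrant(turns):
    """Which quarter of the circle the angle falls in (0..3)."""
    return int(wrap_turns(turns) * 4.0)

def turns_to_radians(turns):
    """Convert only at the point where an off-the-shelf trig function needs radians."""
    return wrap_turns(turns) * 2.0 * math.pi
```

Wrapping and quadrant selection need no knowledge of pi or 360; they fall out of the fraction directly, which is the sort of circle-based trick the normalized form makes easy.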
1079 00:51:58,050 --> 00:52:01,150 So normalized rotation is generally the way to go and 1080 00:52:01,150 --> 00:52:07,250 normalized half precision is probably the even bet for 1081 00:52:07,250 --> 00:52:08,500 how you would store. 1082 00:52:08,500 --> 00:52:12,890 1083 00:52:12,890 --> 00:52:16,210 So looking at what we need to fit into an SPU if we're going 1084 00:52:16,210 --> 00:52:17,090 to be running an animation machine. 1085 00:52:17,090 --> 00:52:17,652 Yeah? 1086 00:52:17,652 --> 00:52:20,472 AUDIENCE: You talked a lot about compressing because of 1087 00:52:20,472 --> 00:52:24,590 the way it's impacting data, what's the key driver of that? 1088 00:52:24,590 --> 00:52:25,940 MIKE ACTON: The SPU has very little space. 1089 00:52:25,940 --> 00:52:28,380 AUDIENCE: OK, so it's just the amount of space. 1090 00:52:28,380 --> 00:52:29,800 MIKE ACTON: Yeah, well OK. 1091 00:52:29,800 --> 00:52:33,450 There's two factors really, in all honesty. 1092 00:52:33,450 --> 00:52:34,820 So starting with the SPU. 1093 00:52:34,820 --> 00:52:37,090 That you have to be able to work through 1094 00:52:37,090 --> 00:52:39,660 this data on the SPU. 1095 00:52:39,660 --> 00:52:42,070 But you also have the DMA transfer itself. 1096 00:52:42,070 --> 00:52:45,360 The SPU can actually calculate really, really fast, right? 1097 00:52:45,360 --> 00:52:47,800 I mean, that's the whole point. 1098 00:52:47,800 --> 00:52:51,740 So if you can transfer less data, burn through it a little 1099 00:52:51,740 --> 00:52:55,960 bit to expand it, it's actually a huge win. 1100 00:52:55,960 --> 00:52:59,310 And on top of that we have a big, big game and only 256 1101 00:52:59,310 --> 00:53:02,160 megs of main RAM.
1102 00:53:02,160 --> 00:53:08,150 And the amount of geometry that people require from a 1103 00:53:08,150 --> 00:53:14,070 current generation game or next generation game has 1104 00:53:14,070 --> 00:53:16,470 scaled up way more than the amount of memory we've been 1105 00:53:16,470 --> 00:53:20,270 given, so we've only been given eight times as much 1106 00:53:20,270 --> 00:53:21,480 memory as we had in the previous generation. 1107 00:53:21,480 --> 00:53:24,480 People expect significantly more than eight times as much 1108 00:53:24,480 --> 00:53:29,780 geometry on the screen and where do we store that? 1109 00:53:29,780 --> 00:53:32,145 We have the Blu-Ray, we can't be streaming everything off 1110 00:53:32,145 --> 00:53:36,470 the disc all the time, which is to another point. 1111 00:53:36,470 --> 00:53:41,250 You have 40 gigs of data on your disc, but 1112 00:53:41,250 --> 00:53:43,030 only 256 megs of RAM. 1113 00:53:43,030 --> 00:53:46,620 So there's this sort of a series of compression, 1114 00:53:46,620 --> 00:53:50,600 decompression to keep everything-- 1115 00:53:50,600 --> 00:53:55,640 basically, think of RAM as your L3 cache. 1116 00:53:55,640 --> 00:54:00,830 1117 00:54:00,830 --> 00:54:03,090 So we look at what we want to store on an SPU. 1118 00:54:03,090 --> 00:54:06,140 Basically, the goal of this is we want to get an entire 1119 00:54:06,140 --> 00:54:10,570 animation for a particular skeleton on an SPU so that we 1120 00:54:10,570 --> 00:54:12,710 can transform the skeleton and output the 1121 00:54:12,710 --> 00:54:17,040 resulting joint data. 1122 00:54:17,040 --> 00:54:19,690 So let's look at how big that would have to be. 1123 00:54:19,690 --> 00:54:22,580 So first we start with the basic nine channels per joint. 
1124 00:54:22,580 --> 00:54:25,390 That's not counting, and again, you'd probably have additional 1125 00:54:25,390 --> 00:54:29,000 channels, like foot step channels and sound channels 1126 00:54:29,000 --> 00:54:30,660 and other sort of animation channels to help 1127 00:54:30,660 --> 00:54:32,000 actually make a game. 1128 00:54:32,000 --> 00:54:33,810 In this case, we just want to animate the character. 1129 00:54:33,810 --> 00:54:37,550 So we have rotation times 3, translations times 3, and 1130 00:54:37,550 --> 00:54:39,960 scales times 3. 1131 00:54:39,960 --> 00:54:44,590 So the first thing to drop and this will cover, this will 1132 00:54:44,590 --> 00:54:49,370 reduce your data by 70%, is all the uniform channels. 1133 00:54:49,370 --> 00:54:52,130 So any data that doesn't actually change across the 1134 00:54:52,130 --> 00:54:53,380 entire length of the animation. 1135 00:54:53,380 --> 00:54:56,110 It may not be zero, but it could be just one thing, one 1136 00:54:56,110 --> 00:54:57,232 value that doesn't change across the 1137 00:54:57,232 --> 00:54:59,270 length of the animation. 1138 00:54:59,270 --> 00:55:01,990 So you pull all the uniform channels out. 1139 00:55:01,990 --> 00:55:04,370 And mostly that's going to be scale, for example, most 1140 00:55:04,370 --> 00:55:06,100 joints don't scale. 1141 00:55:06,100 --> 00:55:08,580 Although occasionally they do. 1142 00:55:08,580 --> 00:55:12,870 And translation, in a human our joints don't translate. 1143 00:55:12,870 --> 00:55:17,710 However, when you actually animate a character in order 1144 00:55:17,710 --> 00:55:19,830 to get particular effects, in order to make it look more 1145 00:55:19,830 --> 00:55:24,650 human you do end up needing to translate joints. 1146 00:55:24,650 --> 00:55:31,680 So we can reduce, but in order to do that we need to build a 1147 00:55:31,680 --> 00:55:34,190 map, basically, a table of these uniform channels.
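Pulling out the uniform channels can be sketched like this. It is a minimal illustration, assuming each channel is a plain list of per-frame samples keyed by a made-up name such as "elbow.rx"; the function name and data layout are hypothetical.

```python
def split_uniform(channels, tolerance=0.0):
    """Split channels into a table of constants and the channels that actually animate.

    channels: dict mapping a channel name to its list of per-frame samples.
    """
    uniform = {}   # channel name -> single constant value
    animated = {}  # channel name -> full list of samples
    for name, samples in channels.items():
        if max(samples) - min(samples) <= tolerance:
            uniform[name] = samples[0]
        else:
            animated[name] = samples
    return uniform, animated

channels = {
    "elbow.rx": [0.00, 0.10, 0.20, 0.10],  # actually rotates
    "elbow.ry": [0.0, 0.0, 0.0, 0.0],      # never changes
    "elbow.sx": [1.0, 1.0, 1.0, 1.0],      # constant but not zero
}
uniform, animated = split_uniform(channels)
```

Only the `animated` dictionary still costs one value per frame; each uniform channel has collapsed to a single entry in the table.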
1148 00:55:34,190 --> 00:55:36,340 So now we know this table of uniform channels has to be 1149 00:55:36,340 --> 00:55:40,110 stored in the SPU along with now the remaining actual 1150 00:55:40,110 --> 00:55:41,620 animation data. 1151 00:55:41,620 --> 00:55:44,880 Of course, multiplied by the number of joints. 1152 00:55:44,880 --> 00:55:47,780 1153 00:55:47,780 --> 00:55:49,670 So now we have what is essentially 1154 00:55:49,670 --> 00:55:52,520 raw animation data. 1155 00:55:52,520 --> 00:55:55,130 So for the sake of argument, let's say the animation data 1156 00:55:55,130 --> 00:55:58,860 has been baked out by Maya or whatever 1157 00:55:58,860 --> 00:56:01,120 at 30 frames a second. 1158 00:56:01,120 --> 00:56:04,990 We've pulled out the uniform data, so now for the joints 1159 00:56:04,990 --> 00:56:07,420 that do move we have these curves over time of the entire 1160 00:56:07,420 --> 00:56:08,780 length of the animation. 1161 00:56:08,780 --> 00:56:13,620 The problem is if that animation is 10 seconds long, 1162 00:56:13,620 --> 00:56:18,540 it's now way too big to fit in the SPU by a large margin. 1163 00:56:18,540 --> 00:56:22,600 So how do we sort of compress it down so that it 1164 00:56:22,600 --> 00:56:24,410 actually will fit? 1165 00:56:24,410 --> 00:56:27,630 Again, just first of all, the easiest thing to do to start 1166 00:56:27,630 --> 00:56:31,050 with is just do simple curve fitting to get rid of the 1167 00:56:31,050 --> 00:56:32,910 things that don't need to be there that you can easily 1168 00:56:32,910 --> 00:56:35,380 calculate out. 1169 00:56:35,380 --> 00:56:37,310 And again, the closer that you are to the root, the tighter 1170 00:56:37,310 --> 00:56:38,400 that fit needs to be.
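As a stand-in for the curve fitting mentioned above, the sketch below drops baked samples that linear interpolation can reconstruct within a tolerance. The talk does not say which fitting method was used, so piecewise-linear reduction is an assumption, and the function names are hypothetical. A joint near the root would get a small tolerance, a fingertip a larger one.

```python
def _fits(samples, a, b, tolerance):
    """True if a straight line from sample a to sample b stays within
    tolerance of every original sample in between."""
    for i in range(a + 1, b):
        t = (i - a) / (b - a)
        approx = samples[a] + t * (samples[b] - samples[a])
        if abs(approx - samples[i]) > tolerance:
            return False
    return True

def reduce_keys(samples, tolerance):
    """Return the indices of the samples to keep as knots."""
    keep = [0]
    start = 0
    n = len(samples)
    while start < n - 1:
        end = start + 1
        # Stretch the segment as far as the tolerance allows.
        while end + 1 < n and _fits(samples, start, end + 1, tolerance):
            end += 1
        keep.append(end)
        start = end
    return keep

wobble = [0.0, 0.1, 0.0, 0.1, 0.0]
tight = reduce_keys(wobble, 0.01)  # near the root: keeps every sample
loose = reduce_keys(wobble, 0.2)   # far from the root: keeps only the ends
```

The same curve costs five knots under the tight tolerance and two under the loose one, which is the trade described for joints near and far from the root.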
1171 00:56:38,400 --> 00:56:41,070 Conversely, the further away you are from the root, you can 1172 00:56:41,070 --> 00:56:44,460 loosen up the restrictions a little bit and have a little 1173 00:56:44,460 --> 00:56:46,640 bit looser fit on the curve and compress 1174 00:56:46,640 --> 00:56:47,890 a little bit more. 1175 00:56:47,890 --> 00:56:49,860 1176 00:56:49,860 --> 00:56:52,760 So if you're doing a curve fitting with the simple 1177 00:56:52,760 --> 00:56:57,880 spline, basically you have to store your time values in the 1178 00:56:57,880 --> 00:57:00,290 places that were calculated. 1179 00:57:00,290 --> 00:57:03,130 Part of the problem is now you have sort of these individual 1180 00:57:03,130 --> 00:57:05,970 scalars with time values that can be randomly spread throughout the 1181 00:57:05,970 --> 00:57:07,050 entire animation. 1182 00:57:07,050 --> 00:57:09,260 So any point where there's basically a knot in the curve, 1183 00:57:09,260 --> 00:57:10,960 there's a time value. 1184 00:57:10,960 --> 00:57:13,460 And none of these knots are going to line up with each 1185 00:57:13,460 --> 00:57:15,930 other in any of these animation channels. 1186 00:57:15,930 --> 00:57:18,350 So in principle, if you wanted to code this you would have to 1187 00:57:18,350 --> 00:57:23,520 basically say, what is time right now and loop through 1188 00:57:23,520 --> 00:57:26,440 each of these scalar values, find out where time is, 1189 00:57:26,440 --> 00:57:28,525 calculate the position on the curve and then 1190 00:57:28,525 --> 00:57:31,930 spit out the result. 1191 00:57:31,930 --> 00:57:37,030 So one, you still have to have the unlimited length of data 1192 00:57:37,030 --> 00:57:39,100 and two, you're looping through scalar values on the 1193 00:57:39,100 --> 00:57:42,740 SPU, which is really actually, horrible. 1194 00:57:42,740 --> 00:57:45,210 So we want to find a way to solve that problem.
1195 00:57:45,210 --> 00:57:48,470 1196 00:57:48,470 --> 00:57:49,855 Probably the most trivial solution is to 1197 00:57:49,855 --> 00:57:51,690 just do spline segments. 1198 00:57:51,690 --> 00:57:55,240 You lose some compressibility, but it solves the problem. 1199 00:57:55,240 --> 00:57:59,170 Basically you split up the spline into say, sections of 1200 00:57:59,170 --> 00:58:03,190 16 knots and you just do that. 1201 00:58:03,190 --> 00:58:06,690 And in order to do that you just need a table, you need to 1202 00:58:06,690 --> 00:58:11,170 add a table that says what the ranges of time are in each of 1203 00:58:11,170 --> 00:58:15,620 those groups of 16 knots for every channel. 1204 00:58:15,620 --> 00:58:17,690 So when you're going to transform the animation, first 1205 00:58:17,690 --> 00:58:20,120 you load this table in, you say, what's my time right now 1206 00:58:20,120 --> 00:58:21,010 at time, t? 1207 00:58:21,010 --> 00:58:24,000 You go and say which blocks, which segments of the spline 1208 00:58:24,000 --> 00:58:26,580 you need to load in for each channel, you load those in. 1209 00:58:26,580 --> 00:58:31,520 So now you have basically one section of the spline, which 1210 00:58:31,520 --> 00:58:35,400 is too big probably for the current t, but it covers what 1211 00:58:35,400 --> 00:58:36,510 t you're actually in. 1212 00:58:36,510 --> 00:58:40,660 So one block of spline for every single channel. 1213 00:58:40,660 --> 00:58:43,270 1214 00:58:43,270 --> 00:58:46,520 So the advantage of this, now that the spline is sorted into 1215 00:58:46,520 --> 00:58:49,420 sections is that rather than having all the spline data 1216 00:58:49,420 --> 00:58:53,680 stored, sort of linearly, you can now reorder the blocks so 1217 00:58:53,680 --> 00:59:00,590 that the spline data from different channels is actually 1218 00:59:00,590 --> 00:59:01,780 tiled next to each other.
1219 00:59:01,780 --> 00:59:03,900 So that when you actually go to do a load it's much more 1220 00:59:03,900 --> 00:59:07,380 likely because you know you're going to be requesting all 1221 00:59:07,380 --> 00:59:11,030 these channels at once and all on the same time, t, you can 1222 00:59:11,030 --> 00:59:15,470 find a more or less, optimal ordering that will allow more 1223 00:59:15,470 --> 00:59:17,630 of these things to be grouped in the same cache or 1224 00:59:17,630 --> 00:59:19,620 at least the same page. 1225 00:59:19,620 --> 00:59:23,500 1226 00:59:23,500 --> 00:59:26,840 And the advantage of course again, is now the length of 1227 00:59:26,840 --> 00:59:29,830 animation makes absolutely no difference at all. 1228 00:59:29,830 --> 00:59:31,790 The disadvantage is it's less compressible because you can 1229 00:59:31,790 --> 00:59:36,920 only basically compress this one section of the curve, but 1230 00:59:36,920 --> 00:59:40,820 a huge advantage is it solves the scalar loop problem. 1231 00:59:40,820 --> 00:59:46,240 So now you can take four of these scalar values all with a 1232 00:59:46,240 --> 00:59:50,560 fixed known number of knots in it and just loop through all 1233 00:59:50,560 --> 00:59:53,070 of the knots. 1234 00:59:53,070 --> 00:59:55,240 In principle you could search through and find a minimum 1235 00:59:55,240 --> 00:59:57,190 number of knots to look through for each one of the 1236 00:59:57,190 --> 00:59:59,330 scalars, but in practice it's much faster just to loop 1237 00:59:59,330 --> 01:00:03,190 through all four simultaneously for all 16 1238 01:00:03,190 --> 01:00:07,150 knots and just throw away the results that are invalid as 1239 01:00:07,150 --> 01:00:08,590 you're going through it. 1240 01:00:08,590 --> 01:00:11,220 That way you can use the SPU instruction set.
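A rough sketch of the segment scheme just described: a time-range table picks the block for the current t, and the block is then evaluated by visiting every knot interval and discarding the invalid results instead of searching. Several things here are assumptions made to keep the example short: 4 knots per segment rather than 16, linear interpolation standing in for the spline, strictly increasing knot times, and one channel in plain Python rather than four at once in SIMD registers. All names are hypothetical.

```python
KNOTS_PER_SEGMENT = 4  # the talk suggests 16; 4 keeps the example short

def find_segment(time_table, t):
    """time_table holds one (start, end) time range per segment."""
    for index, (start, end) in enumerate(time_table):
        if start <= t <= end:
            return index
    raise ValueError("time outside the animation")

def evaluate_segment(times, values, t):
    """Visit every knot interval and keep only the result whose interval
    contains t, with a fixed trip count and no early exit."""
    result = values[0]
    for i in range(KNOTS_PER_SEGMENT - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        candidate = v0 + (t - t0) / (t1 - t0) * (v1 - v0)
        valid = t0 <= t <= t1
        result = candidate if valid else result  # a select on real SIMD hardware
    return result

# One channel, two fixed-size segments of (knot times, knot values).
segments = [
    ([0.0, 1.0, 2.0, 4.0], [0.0, 10.0, 10.0, 20.0]),
    ([4.0, 5.0, 7.0, 8.0], [20.0, 0.0, 0.0, 8.0]),
]
time_table = [(0.0, 4.0), (4.0, 8.0)]

def sample(t):
    times, values = segments[find_segment(time_table, t)]
    return evaluate_segment(times, values, t)
```

Because the loop always runs the same number of iterations, only the small time table and one block per channel need to be resident, however long the animation is.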
1241 01:00:11,220 --> 01:00:14,620 You can load quadwords, store quadwords, and do everything 1242 01:00:14,620 --> 01:00:17,690 in the minimum single loop, which you 1243 01:00:17,690 --> 01:00:18,940 can completely unroll. 1244 01:00:18,940 --> 01:00:23,560 1245 01:00:23,560 --> 01:00:26,785 Does anybody have the time? 1246 01:00:26,785 --> 01:00:29,190 PROFESSOR: It's [OBSCURED] 1247 01:00:29,190 --> 01:00:30,190 MIKE ACTON: So I'm OK. 1248 01:00:30,190 --> 01:00:33,620 PROFESSOR: [OBSCURED] 1249 01:00:33,620 --> 01:00:34,427 MIKE ACTON: Yeah? 1250 01:00:34,427 --> 01:00:34,985 AUDIENCE: In 1251 01:00:34,985 --> 01:00:38,550 context do you make like rendering the animation or it 1252 01:00:38,550 --> 01:00:40,860 seems like there would be a blow to whatever you're doing 1253 01:00:40,860 --> 01:00:42,170 on the SPUs. 1254 01:00:42,170 --> 01:00:44,430 MIKE ACTON: Basically the SPUs are taking this channel 1255 01:00:44,430 --> 01:00:48,360 animation data and baking it out into-- well, in the 1256 01:00:48,360 --> 01:00:50,530 easiest case baking it out into a 4 by 1257 01:00:50,530 --> 01:00:54,090 4 matrix per joint. 1258 01:00:54,090 --> 01:00:56,010 AUDIENCE: So the output time's much bigger 1259 01:00:56,010 --> 01:00:56,810 than the input time? 1260 01:00:56,810 --> 01:00:59,570 I mean, you're compressing the input by animation? 1261 01:00:59,570 --> 01:01:01,620 MIKE ACTON: No, the output size is significant, but it's 1262 01:01:01,620 --> 01:01:03,586 much smaller. 1263 01:01:03,586 --> 01:01:05,700 PROFESSOR: [OBSCURED] 1264 01:01:05,700 --> 01:01:08,160 AUDIENCE: So the animation data it's [OBSCURED] 1265 01:01:08,160 --> 01:01:09,410 [INTERPOSING VOICES] 1266 01:01:09,410 --> 01:01:11,470 1267 01:01:11,470 --> 01:01:12,460 MIKE ACTON: No. 1268 01:01:12,460 --> 01:01:14,700 I was just outputting the joint information. 
1269 01:01:14,700 --> 01:01:19,820 PROFESSOR: [OBSCURED] 1270 01:01:19,820 --> 01:01:22,120 MIKE ACTON: Independently we have this skinning problem. 1271 01:01:22,120 --> 01:01:24,250 Independently there's a rendering problem. 1272 01:01:24,250 --> 01:01:26,250 This is just baking animation. 1273 01:01:26,250 --> 01:01:28,050 This is purely an animation channel problem. 1274 01:01:28,050 --> 01:01:33,060 PROFESSOR: [OBSCURED] 1275 01:01:33,060 --> 01:01:34,910 MIKE ACTON: OK, I'm just going to skip through this because 1276 01:01:34,910 --> 01:01:36,860 this could take a long time to talk about. 1277 01:01:36,860 --> 01:01:39,560 Basically, what I wanted to say here was let's take the 1278 01:01:39,560 --> 01:01:40,955 next step with animation, let's add 1279 01:01:40,955 --> 01:01:43,050 some dynamic support. 1280 01:01:43,050 --> 01:01:47,080 The easiest thing to do is just create a second uniform 1281 01:01:47,080 --> 01:01:50,380 data table that you then blend with the first 1282 01:01:50,380 --> 01:01:51,510 one, and that one, 1283 01:01:51,510 --> 01:01:55,450 in principle, is basically all of the channels, and then a 1284 01:01:55,450 --> 01:01:56,440 game play programmer can go and 1285 01:01:56,440 --> 01:01:57,990 individually set any of those. 1286 01:01:57,990 --> 01:02:00,750 So they can tweak the head or tweak the elbow or whatever. 1287 01:02:00,750 --> 01:02:02,080 And that's definitely compressible because it's very 1288 01:02:02,080 --> 01:02:05,170 unlikely they're going to be moving all the joints at once. 1289 01:02:05,170 --> 01:02:08,050 You can create a secondary map that says, this is the number 1290 01:02:08,050 --> 01:02:11,450 of joints that are dynamic, this is how they map to the 1291 01:02:11,450 --> 01:02:12,700 uniform values. 1292 01:02:12,700 --> 01:02:15,350 1293 01:02:15,350 --> 01:02:18,897 But then once you add any kind of dynamic support, you have 1294 01:02:18,897 --> 01:02:21,760 now complicated the problem significantly.
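The second table and its map can be sketched as follows, assuming the baked channel values sit in a flat list and the game play code supplies overrides for a few channel indices. The blend weight, the flat layout, and the function name are all illustrative assumptions.

```python
def apply_dynamic(baked, dynamic_map, dynamic_values, weight):
    """Blend game play overrides into baked channel values.

    baked: list of channel values produced by the animation.
    dynamic_map: indices of the channels that game play code overrides.
    dynamic_values: one override value per entry in dynamic_map.
    weight: 0.0 keeps the animation, 1.0 uses the override entirely.
    """
    result = list(baked)
    for channel, override in zip(dynamic_map, dynamic_values):
        result[channel] = baked[channel] + weight * (override - baked[channel])
    return result
```

Only the overridden channels take up space in the second table, which is why it stays small when a game play programmer tweaks just the head or the elbow.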
1295 01:02:21,760 --> 01:02:25,690 Because now in reality, what you need are constraints. 1296 01:02:25,690 --> 01:02:29,650 You need to be able to have a limit to how high the head can 1297 01:02:29,650 --> 01:02:31,900 move because what's going to happen is although you could 1298 01:02:31,900 --> 01:02:35,850 just say the head could only move so much, if that movement 1299 01:02:35,850 --> 01:02:37,380 is algorithmic, so let's say follow a 1300 01:02:37,380 --> 01:02:39,570 character or whatever-- 1301 01:02:39,570 --> 01:02:41,760 it is going to go outside of reasonable 1302 01:02:41,760 --> 01:02:42,930 constraints really quickly. 1303 01:02:42,930 --> 01:02:48,360 So it's much cleaner and simpler to support that on the 1304 01:02:48,360 --> 01:02:50,250 engine side, so basically define constraints for the 1305 01:02:50,250 --> 01:02:57,250 joints and then let the high-level code point 1306 01:02:57,250 --> 01:02:58,060 wherever they want. 1307 01:02:58,060 --> 01:02:58,500 AUDIENCE: [OBSCURED] 1308 01:02:58,500 --> 01:03:00,330 MIKE ACTON: Yeah. 1309 01:03:00,330 --> 01:03:03,090 Yeah, you can have max change over time so it only can move 1310 01:03:03,090 --> 01:03:08,080 so fast. The max range of motion, the max acceleration 1311 01:03:08,080 --> 01:03:11,770 is actually a much harder problem because it implies 1312 01:03:11,770 --> 01:03:15,780 that you need to store the change over time, which we're 1313 01:03:15,780 --> 01:03:17,360 not actually storing. 1314 01:03:17,360 --> 01:03:20,440 Which would probably blow our memory on the SPU. 1315 01:03:20,440 --> 01:03:25,990 So as far as impacting animation, I would immediately 1316 01:03:25,990 --> 01:03:28,880 throw out max acceleration if an animator were to come to me 1317 01:03:28,880 --> 01:03:32,020 and say, this is a feature that I wanted. 1318 01:03:32,020 --> 01:03:35,060 I would say, it's unlikely because it's unlikely we can 1319 01:03:35,060 --> 01:03:36,880 fit it on the SPU. 
1320 01:03:36,880 --> 01:03:41,400 Whereas, on the PC, it might be a different story. 1321 01:03:41,400 --> 01:03:42,800 And blending information, how you blend 1322 01:03:42,800 --> 01:03:44,050 these things together. 1323 01:03:44,050 --> 01:03:52,180 1324 01:03:52,180 --> 01:03:52,700 What's that? 1325 01:03:52,700 --> 01:03:56,040 AUDIENCE: [OBSCURED] 1326 01:03:56,040 --> 01:03:57,290 MIKE ACTON: OK. 1327 01:03:57,290 --> 01:03:59,710 1328 01:03:59,710 --> 01:04:03,840 So as far as mixing, there's plenty of additional problems 1329 01:04:03,840 --> 01:04:05,550 in mixing animation. 1330 01:04:05,550 --> 01:04:08,270 Phase matching, so for example, you have a running 1331 01:04:08,270 --> 01:04:09,800 and a walk. 1332 01:04:09,800 --> 01:04:12,355 Basically all that means is if you were going to blend from a 1333 01:04:12,355 --> 01:04:15,070 run to a walk you kind of want to blend in basically the 1334 01:04:15,070 --> 01:04:17,490 essentially same leg position. 1335 01:04:17,490 --> 01:04:19,330 Because if you just blend from the middle of an animation to 1336 01:04:19,330 --> 01:04:21,760 the beginning of the animation, it's unlikely the 1337 01:04:21,760 --> 01:04:23,820 legs are going to match and for the transition time you're 1338 01:04:23,820 --> 01:04:25,940 going to see the scissoring of the legs. 1339 01:04:25,940 --> 01:04:29,700 Which you see that in plenty of games, but especially in 1340 01:04:29,700 --> 01:04:32,200 next generation, especially as characters look more 1341 01:04:32,200 --> 01:04:37,100 complicated they are expected to act more complicated. 
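[Editor's sketch] Phase matching, as described above, amounts to starting the target animation at the same point in the stride cycle as the source rather than at its beginning. The sketch below assumes both clips loop and are authored so that the same foot contact falls at the start of each clip; the function names and the flat list of channel values are hypothetical, chosen only for illustration.

```python
def phase_matched_time(src_time: float, src_length: float,
                       dst_length: float) -> float:
    """Return the time in the destination clip at the same normalized phase."""
    # Phase is the fraction of the stride cycle completed, from 0.0 up to 1.0.
    phase = (src_time % src_length) / src_length
    return phase * dst_length


def blend_pose(src_pose, dst_pose, weight):
    """Linearly blend two lists of channel values; 0 gives src, 1 gives dst."""
    return [(1.0 - weight) * a + weight * b
            for a, b in zip(src_pose, dst_pose)]
```

Sampling the walk at `phase_matched_time(run_time, run_length, walk_length)` before blending keeps the legs in roughly the same position in both clips, which is what avoids the scissoring during the transition.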
1342 01:04:37,100 --> 01:04:42,390 Transitions handling either programmatic transitions 1343 01:04:42,390 --> 01:04:46,220 between animations, so we have an animation that's standing 1344 01:04:46,220 --> 01:04:49,165 and animation that's crouching and with constraints, move 1345 01:04:49,165 --> 01:04:52,790 them down; or artist driven animated 1346 01:04:52,790 --> 01:04:56,540 transitions and/or both. 1347 01:04:56,540 --> 01:04:57,735 Translation matching is actually 1348 01:04:57,735 --> 01:04:58,820 an interesting problem. 1349 01:04:58,820 --> 01:05:01,290 So you have an animation that's running and you have an 1350 01:05:01,290 --> 01:05:02,180 animation that's walking. 1351 01:05:02,180 --> 01:05:04,170 They both translate obviously, at different speeds, 1352 01:05:04,170 --> 01:05:10,080 nonlinearly and you want to slowly run down into a walk, 1353 01:05:10,080 --> 01:05:13,940 but you have to match these sort of nonlinear translations 1354 01:05:13,940 --> 01:05:16,520 as his feet are stepping onto the ground. 1355 01:05:16,520 --> 01:05:18,690 Turns out to be a really difficult problem to get 1356 01:05:18,690 --> 01:05:22,520 perfectly right, especially if you have IK on the feet 1357 01:05:22,520 --> 01:05:24,980 where he's walking on the ground or maybe walking uphill 1358 01:05:24,980 --> 01:05:28,010 or downhill and the translation is being affected 1359 01:05:28,010 --> 01:05:30,170 by the world. 1360 01:05:30,170 --> 01:05:33,760 In a lot of cases you'll see people pretty much just ignore 1361 01:05:33,760 --> 01:05:36,970 this problem. 1362 01:05:36,970 --> 01:05:38,950 But it is something to consider going forward and 1363 01:05:38,950 --> 01:05:41,860 this is something that we would consider how to solve, 1364 01:05:41,860 --> 01:05:46,970 regardless of whether or not we could get it in.
1365 01:05:46,970 --> 01:05:51,920 As far as actually rendering the geometry goes, you now 1366 01:05:51,920 --> 01:05:55,920 have your sort of matrices of joints and you have-- 1367 01:05:55,920 --> 01:05:59,510 1368 01:05:59,510 --> 01:06:03,070 let's say you want to send those to the GPU along with 1369 01:06:03,070 --> 01:06:06,350 the geometry to skin and render. 1370 01:06:06,350 --> 01:06:08,230 Now the question is, do you single or double 1371 01:06:08,230 --> 01:06:10,280 buffer those joints? 1372 01:06:10,280 --> 01:06:14,480 Because right now basically, the GPU can be reading these 1373 01:06:14,480 --> 01:06:16,320 joints in parallel to when you're 1374 01:06:16,320 --> 01:06:17,890 actually outputting them. 1375 01:06:17,890 --> 01:06:20,495 So the traditional approach or the easiest approach is just 1376 01:06:20,495 --> 01:06:21,850 to double buffer the joints. 1377 01:06:21,850 --> 01:06:23,880 So just output into a different buffer that the RSX 1378 01:06:23,880 --> 01:06:24,650 is reading from. 1379 01:06:24,650 --> 01:06:29,070 It's one frame or half a frame behind, doesn't much matter. 1380 01:06:29,070 --> 01:06:33,080 But it also doubles now the space of your joints. 1381 01:06:33,080 --> 01:06:36,740 One advantage that games have is that a frame is a well 1382 01:06:36,740 --> 01:06:39,900 defined element in the games. 1383 01:06:39,900 --> 01:06:43,810 We know what needs to happen across the course of a frame. 1384 01:06:43,810 --> 01:06:48,735 So these characters need to be rendered, the collisions of 1385 01:06:48,735 --> 01:06:49,690 this background needs to happen, physics 1386 01:06:49,690 --> 01:06:51,480 need to happen here. 1387 01:06:51,480 --> 01:06:56,500 So you can within a frame, set it up so that the update from 1388 01:06:56,500 --> 01:07:03,350 the SPUs and the read from the GPU can never overlap.
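[Editor's sketch] The double-buffering approach mentioned above can be illustrated with a small example. The `DoubleBufferedJoints` class is hypothetical and stands in for real SPU output and GPU input memory: the writer always fills one buffer while the reader consumes the other, at the cost of storing every joint twice.

```python
class DoubleBufferedJoints:
    """Two joint buffers: one being written, one being read (illustrative)."""

    def __init__(self, joint_count: int):
        # Memory cost is doubled: two full copies of the joint data.
        self.buffers = [[0.0] * joint_count, [0.0] * joint_count]
        self.write_index = 0

    def write_buffer(self):
        """Buffer the animation update writes into this frame."""
        return self.buffers[self.write_index]

    def read_buffer(self):
        """Buffer the renderer reads from; it holds the previous update."""
        return self.buffers[1 - self.write_index]

    def swap(self):
        """Call once per frame, after the writer has finished."""
        self.write_index = 1 - self.write_index
```

The reader is always one update behind the writer, which matches the "one frame or half a frame behind" trade-off; the single-buffered alternative saves the second copy but depends on frame scheduling to keep writes and reads apart.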
1389 01:07:03,350 --> 01:07:07,970 Even without any kind of synchronization or lock, it 1390 01:07:07,970 --> 01:07:10,450 can be a well known fact that it's impossible for these two 1391 01:07:10,450 --> 01:07:12,230 things because there's actually something in the 1392 01:07:12,230 --> 01:07:15,770 middle happening that has its own synchronization primitive. 1393 01:07:15,770 --> 01:07:18,370 1394 01:07:18,370 --> 01:07:21,370 That will allow you to do single buffering of the data. 1395 01:07:21,370 --> 01:07:23,220 But it does require more organization. 1396 01:07:23,220 --> 01:07:26,190 Especially if you're doing it on more than just one case. 1397 01:07:26,190 --> 01:07:28,520 So you have all these things that you want single buffered, 1398 01:07:28,520 --> 01:07:31,360 so you need to organize them within the frames so they're 1399 01:07:31,360 --> 01:07:33,520 never updating and reading at the same time. 1400 01:07:33,520 --> 01:07:38,910 1401 01:07:38,910 --> 01:07:42,850 So I'll make this the last point I'll make. 1402 01:07:42,850 --> 01:07:46,640 Optimization, one of the things that you'll hear, save 1403 01:07:46,640 --> 01:07:47,980 optimization till the end. 1404 01:07:47,980 --> 01:07:50,490 My point here being is if you save optimization till the 1405 01:07:50,490 --> 01:07:52,800 end, you don't know how to do it because you haven't 1406 01:07:52,800 --> 01:07:53,970 actually practiced it. 1407 01:07:53,970 --> 01:07:57,750 If you haven't practiced it you don't know what to do. 1408 01:07:57,750 --> 01:07:59,100 So it will take much longer. 1409 01:07:59,100 --> 01:08:01,090 You should always be optimizing in order to 1410 01:08:01,090 --> 01:08:05,500 understand, when it actually counts, what to do. 1411 01:08:05,500 --> 01:08:08,655 And the fact that real optimization does impact the 1412 01:08:08,655 --> 01:08:10,420 design all the way up. 
1413 01:08:10,420 --> 01:08:13,400 Optimization of the hardware impacts how an engine is 1414 01:08:13,400 --> 01:08:17,550 designed to be fast does impact the data, it impacts 1415 01:08:17,550 --> 01:08:21,263 how game play needs to be written, high-level code needs 1416 01:08:21,263 --> 01:08:23,400 to be called. 1417 01:08:23,400 --> 01:08:26,060 So if you save optimization till last, what you're doing 1418 01:08:26,060 --> 01:08:30,310 is completely limiting what you can optimize. 1419 01:08:30,310 --> 01:08:33,010 And the idea that it's the root of all evil certainly 1420 01:08:33,010 --> 01:08:36,580 didn't come from a game developer, I have to say. 1421 01:08:36,580 --> 01:08:37,940 Anyway, that's it. 1422 01:08:37,940 --> 01:08:39,690 I hope that was helpful. 1423 01:08:39,690 --> 01:08:44,800 1424 01:08:44,800 --> 01:08:46,700 PROFESSOR: Any questions? 1425 01:08:46,700 --> 01:08:49,005 I think it's very interesting because there is a lot of 1426 01:08:49,005 --> 01:08:50,255 things you learn at MIT. 1427 01:08:50,255 --> 01:08:57,950 1428 01:08:57,950 --> 01:09:00,530 Forget everything you learned so I think there's a very 1429 01:09:00,530 --> 01:09:03,440 interesting perspective in there and for some of us it's 1430 01:09:03,440 --> 01:09:06,990 kind of hard to even digest a little bit, but Question? 1431 01:09:06,990 --> 01:09:09,540 AUDIENCE: Call of Duty 3 came out on the Xbox and on the 1432 01:09:09,540 --> 01:09:15,180 PS3, is Call of Duty 3 on the PS3 just running on the GPU 1433 01:09:15,180 --> 01:09:16,580 then or is it-- 1434 01:09:16,580 --> 01:09:18,420 MIKE ACTON: No, it's very likely using the SPUs. 1435 01:09:18,420 --> 01:09:19,050 I mean, I don't know. 1436 01:09:19,050 --> 01:09:20,950 I haven't looked at the source code, but I suspect that it's 1437 01:09:20,950 --> 01:09:22,550 using the SPUs. 1438 01:09:22,550 --> 01:09:24,200 How efficiently it's using them is an 1439 01:09:24,200 --> 01:09:26,150 entirely different question. 
1440 01:09:26,150 --> 01:09:31,280 But it's easy to take the most trivial things right, say you 1441 01:09:31,280 --> 01:09:37,140 do hot spot analysis on your sequential code and say, OK, 1442 01:09:37,140 --> 01:09:39,120 well I can grab this section of thing and put it on the SPU 1443 01:09:39,120 --> 01:09:41,970 right and just the heaviest hitters and 1444 01:09:41,970 --> 01:09:42,910 put them on the SPU. 1445 01:09:42,910 --> 01:09:44,590 That's pretty easy to do. 1446 01:09:44,590 --> 01:09:47,230 It's taking it to the next level though, and to really 1447 01:09:47,230 --> 01:09:51,820 have sort of the next gen of game-- 1448 01:09:51,820 --> 01:09:54,260 now there's nowhere to go from there. 1449 01:09:54,260 --> 01:09:56,250 There's nowhere to go from that analysis, you've already 1450 01:09:56,250 --> 01:09:58,460 sort of hit the limit of what you can do with that. 1451 01:09:58,460 --> 01:10:00,970 It has to be redesigned. 1452 01:10:00,970 --> 01:10:03,970 So I don't know what they're doing honestly and certainly 1453 01:10:03,970 --> 01:10:05,290 I'm being recorded so-- 1454 01:10:05,290 --> 01:10:08,470 1455 01:10:08,470 --> 01:10:09,940 yeah? 1456 01:10:09,940 --> 01:10:14,390 AUDIENCE: You guys have shipped a game, on the PS3? 1457 01:10:14,390 --> 01:10:16,080 MIKE ACTON: Yeah, it was on action list. 1458 01:10:16,080 --> 01:10:18,480 AUDIENCE: OK, so that was more like the [OBSCURED] 1459 01:10:18,480 --> 01:10:23,890 games and whatever You seem to talk a lot about all these 1460 01:10:23,890 --> 01:10:24,586 things you've had to redo. 1461 01:10:24,586 --> 01:10:26,940 What else is there-- 1462 01:10:26,940 --> 01:10:30,220 games look better as a console was built on, what else is 1463 01:10:30,220 --> 01:10:34,300 there that you guys plan on changing as far as working 1464 01:10:34,300 --> 01:10:36,530 with the cell processor, or do you think 1465 01:10:36,530 --> 01:10:37,950 you've got it all ready? 
1466 01:10:37,950 --> 01:10:38,450 MIKE ACTON: Oh, no. 1467 01:10:38,450 --> 01:10:40,110 There's plenty of work. 1468 01:10:40,110 --> 01:10:42,180 There's plenty more to be optimized. 1469 01:10:42,180 --> 01:10:45,330 It's down to cost in scheduling those things. 1470 01:10:45,330 --> 01:10:47,170 I mean, we have a team of people who now really 1471 01:10:47,170 --> 01:10:52,420 understand the platform and whereas a lot of what went 1472 01:10:52,420 --> 01:10:58,120 into previous titles was mixed with learning curve. 1473 01:10:58,120 --> 01:11:01,053 So there's definitely a potential for going back and 1474 01:11:01,053 --> 01:11:03,990 improving things and making things better. 1475 01:11:03,990 --> 01:11:07,060 That's what a cycle of game development is all about. 1476 01:11:07,060 --> 01:11:09,580 I mean, games at the end of the lifetime of PlayStation 3 1477 01:11:09,580 --> 01:11:11,970 will look significantly better than release titles. 1478 01:11:11,970 --> 01:11:13,710 That's the way it always is. 1479 01:11:13,710 --> 01:11:17,120 AUDIENCE: The head of Sony computer and gaming said that 1480 01:11:17,120 --> 01:11:18,970 PS3 pretty soon would be customizable. 1481 01:11:18,970 --> 01:11:20,460 You'll be able to get different 1482 01:11:20,460 --> 01:11:21,800 amounts of RAM and whatnot. 1483 01:11:21,800 --> 01:11:24,330 MIKE ACTON: Well, I think in that case he was talking 1484 01:11:24,330 --> 01:11:30,320 specifically about a PS3 based, like Tivo kind of weird 1485 01:11:30,320 --> 01:11:34,318 media thing, which has nothing to do with us. 1486 01:11:34,318 --> 01:11:36,730 AUDIENCE: [OBSCURED] 1487 01:11:36,730 --> 01:11:37,510 MIKE ACTON: We're not stuck. 1488 01:11:37,510 --> 01:11:40,250 That's what we have. I mean, I don't see it as stuck. 1489 01:11:40,250 --> 01:11:41,810 I would much rather have the-- 1490 01:11:41,810 --> 01:11:45,150 I mean, that's what console development is about, really.
1491 01:11:45,150 --> 01:11:48,530 We have a machine, we have a set of limitations of it and 1492 01:11:48,530 --> 01:11:51,080 we can push that machine over the lifetime of the platform. 1493 01:11:51,080 --> 01:11:53,830 If it changes out from under us, it becomes PC development. 1494 01:11:53,830 --> 01:11:54,820 AUDIENCE: Are you allowed to use the 1495 01:11:54,820 --> 01:11:57,380 seven SPUs or are you-- 1496 01:11:57,380 --> 01:11:57,690 [OBSCURED] 1497 01:11:57,690 --> 01:12:09,020 PROFESSOR: [OBSCURED] 1498 01:12:09,020 --> 01:12:13,220 MIKE ACTON: I don't know how much I can answer this just 1499 01:12:13,220 --> 01:12:14,030 from NDA point-of-view. 1500 01:12:14,030 --> 01:12:17,180 But let's say hypothetically, there magically became more 1501 01:12:17,180 --> 01:12:19,640 SPUs on the PS3, right? 1502 01:12:19,640 --> 01:12:20,920 Probably nothing would happen. 1503 01:12:20,920 --> 01:12:24,620 The game has to be optimized for the minimum case, so 1504 01:12:24,620 --> 01:12:25,870 nothing would change. 1505 01:12:25,870 --> 01:12:29,132 1506 01:12:29,132 --> 01:12:31,750 Anything else? 1507 01:12:31,750 --> 01:12:32,640 Yeah? 1508 01:12:32,640 --> 01:12:35,130 AUDIENCE: So what's the development life cycle like 1509 01:12:35,130 --> 01:12:37,010 for the engine part of the game. 1510 01:12:37,010 --> 01:12:42,006 And I don't assume you start by prototyping in higher-level 1511 01:12:42,006 --> 01:12:45,910 mechanisms. Then you'll completely miss the design for 1512 01:12:45,910 --> 01:12:47,500 performance aspects of it. 1513 01:12:47,500 --> 01:12:54,140 How do you build up from empty [OBSCURED] 1514 01:12:54,140 --> 01:12:55,610 MIKE ACTON: No, you don't start with an empty. 1515 01:12:55,610 --> 01:12:57,300 That's the perspective difference. 1516 01:12:57,300 --> 01:12:59,620 You don't start with code, code's not important. 1517 01:12:59,620 --> 01:13:00,700 Start with the data. 
1518 01:13:00,700 --> 01:13:02,130 You sit down with an artist and they say, what 1519 01:13:02,130 --> 01:13:04,150 do you want to do? 1520 01:13:04,150 --> 01:13:05,140 And then you look at the data. 1521 01:13:05,140 --> 01:13:06,450 What does that data look like? 1522 01:13:06,450 --> 01:13:07,530 What does this animation data look like? 1523 01:13:07,530 --> 01:13:09,730 PROFESSOR: Data size matters. 1524 01:13:09,730 --> 01:13:12,960 [OBSCURED] 1525 01:13:12,960 --> 01:13:13,080 MIKE ACTON: Right. 1526 01:13:13,080 --> 01:13:15,080 We have to figure out how to make it smaller. 1527 01:13:15,080 --> 01:13:16,640 But it all starts with the data. 1528 01:13:16,640 --> 01:13:19,940 It all starts with that concept of what do we want to 1529 01:13:19,940 --> 01:13:20,330 see on the screen? 1530 01:13:20,330 --> 01:13:21,820 What do we even want to hear on the speakers? 1531 01:13:21,820 --> 01:13:23,580 What kind of effects do we want? 1532 01:13:23,580 --> 01:13:28,430 And actually look at that from the perspective of the content 1533 01:13:28,430 --> 01:13:31,490 creator and what they're generating and what we can do 1534 01:13:31,490 --> 01:13:32,550 with that data. 1535 01:13:32,550 --> 01:13:36,440 Because game development is just this black box between 1536 01:13:36,440 --> 01:13:39,150 the artists and the screen. 1537 01:13:39,150 --> 01:13:41,490 We're providing a transformation engine that 1538 01:13:41,490 --> 01:13:44,570 takes the vision of the designers and the artists and 1539 01:13:44,570 --> 01:13:48,810 just transforming it and spitting it on to screen. 1540 01:13:48,810 --> 01:13:51,340 So where you really need to start is with the 1541 01:13:51,340 --> 01:13:53,380 source of the data. 1542 01:13:53,380 --> 01:13:55,040 AUDIENCE: You've been doing game development for 11 years 1543 01:13:55,040 --> 01:13:56,330 now is what you said.
1544 01:13:56,330 --> 01:13:57,660 Have you had a favorite platform 1545 01:13:57,660 --> 01:14:00,050 and a nightmare platform? 1546 01:14:00,050 --> 01:14:01,740 MIKE ACTON: I've been pretty much with the PlayStation 1547 01:14:01,740 --> 01:14:07,920 platform since there was a PlayStation platform and I 1548 01:14:07,920 --> 01:14:09,660 don't know, it's hard to get perspective because you're 1549 01:14:09,660 --> 01:14:12,610 getting it and you always really love the platform you're 1550 01:14:12,610 --> 01:14:15,030 working on. 1551 01:14:15,030 --> 01:14:17,800 So it's hard. 1552 01:14:17,800 --> 01:14:20,630 I mean, it's hard to get perspective. 1553 01:14:20,630 --> 01:14:22,357 In the program where I am today is not the same program 1554 01:14:22,357 --> 01:14:25,640 where I was 10 years ago. 1555 01:14:25,640 --> 01:14:27,745 Personally, right now my favorite platform is PS3. 1556 01:14:27,745 --> 01:14:30,770 PROFESSOR: So when you put it in perspective, there are already 1557 01:14:30,770 --> 01:14:37,190 some platforms that on the first time round, [COUGHING], 1558 01:14:37,190 --> 01:14:39,360 it's like cost of development. 1559 01:14:39,360 --> 01:14:44,990 So one platform that as time goes we have [OBSCURED] 1560 01:14:44,990 --> 01:14:47,380 MIKE ACTON: Well, like with the PS3, some of the things 1561 01:14:47,380 --> 01:14:48,920 that I like about the PS3, which is sort of a different 1562 01:14:48,920 --> 01:14:53,450 question are the fact that the cell is much more public than 1563 01:14:53,450 --> 01:14:55,790 any other platform has ever been. 1564 01:14:55,790 --> 01:14:58,990 With IBM's documentation, with Toshiba's support and Sony 1565 01:14:58,990 --> 01:15:05,200 support, I've never had a platform where I can get up on 1566 01:15:05,200 --> 01:15:09,210 a website and actually talk about it outside of NDA. 1567 01:15:09,210 --> 01:15:10,980 And that for me is an amazing change.
1568 01:15:10,980 --> 01:15:13,710 Where I can go and talk to other people-- exactly like 1569 01:15:13,710 --> 01:15:15,610 this group here- that have used the same exact 1570 01:15:15,610 --> 01:15:17,530 platform I've used. 1571 01:15:17,530 --> 01:15:21,280 That's never been able to happen before. 1572 01:15:21,280 --> 01:15:24,380 Even on PS2, for quite a long part of the lifespan, even 1573 01:15:24,380 --> 01:15:27,370 though there was a Linux eventually on the PS2, 1574 01:15:27,370 --> 01:15:29,210 virtually everything was covered by NDA because there 1575 01:15:29,210 --> 01:15:32,830 was no independent release of information. 1576 01:15:32,830 --> 01:15:38,270 So that's one of the great things about PS3 is the public 1577 01:15:38,270 --> 01:15:39,520 availability of cell. 1578 01:15:39,520 --> 01:15:43,510 1579 01:15:43,510 --> 01:15:46,304 PROFESSOR: Thank you very much for coming all the way from 1580 01:15:46,304 --> 01:15:48,230 California and giving us some insight. 1581 01:15:48,230 --> 01:15:55,023