WEBVTT

00:00:00.050 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.010
Commons license.

00:00:04.010 --> 00:00:06.350
Your support will help
MIT OpenCourseWare

00:00:06.350 --> 00:00:10.720
continue to offer high quality
educational resources for free.

00:00:10.720 --> 00:00:13.330
To make a donation or
view additional materials

00:00:13.330 --> 00:00:17.205
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.205 --> 00:00:17.830
at ocw.mit.edu.

00:00:21.480 --> 00:00:24.388
BRIAN SWEATT: So we're the
real time ray tracing group.

00:00:24.388 --> 00:00:29.020
And as we like to call
ourselves, Blue Steel.

00:00:29.020 --> 00:00:33.440
So one thing you might be
asking yourselves and wondering

00:00:33.440 --> 00:00:35.980
from us is, where did we
get the name Blue Steel?

00:00:35.980 --> 00:00:38.670
What does it have to do with
this thing we call ray tracing?

00:00:38.670 --> 00:00:44.350
And this morning when I was
sitting up and finishing up

00:00:44.350 --> 00:00:47.390
the project, I was asking
myself the same thing.

00:00:47.390 --> 00:00:48.390
We're doing ray tracing.

00:00:48.390 --> 00:00:49.740
Where did we come
up with this name?

00:00:49.740 --> 00:00:51.906
So I'm going to give you a
little background on what

00:00:51.906 --> 00:00:52.580
Blue Steel is.

00:00:57.890 --> 00:01:01.610
The name came from the first
day we got access to our PS3,

00:01:01.610 --> 00:01:03.360
we went to the student
center like a bunch

00:01:03.360 --> 00:01:04.920
of excited kids on Christmas.

00:01:04.920 --> 00:01:06.060
And what did we do?

00:01:06.060 --> 00:01:10.010
We created a mailing
list for our group.

00:01:10.010 --> 00:01:11.260
Didn't even mess with the PS3.

00:01:11.260 --> 00:01:13.890
Just created the mailing list
and sat there for about an hour

00:01:13.890 --> 00:01:15.520
and a half, thinking of a name.

00:01:15.520 --> 00:01:19.140
And somehow, we decided on
Zoolander's famous face,

00:01:19.140 --> 00:01:20.350
Blue Steel.

00:01:20.350 --> 00:01:24.740
And it didn't have particular
significance to us before,

00:01:24.740 --> 00:01:28.730
but as you'll see, it has
a lot of historical value.

00:01:32.080 --> 00:01:37.060
Blue Steel is also an
alloy, a ferrous alloy.

00:01:39.680 --> 00:01:44.000
As you could see, we have
blue tempered spring steel

00:01:44.000 --> 00:01:47.210
that many companies,
industries, use.

00:01:47.210 --> 00:01:54.440
And Blue Steel is also a
warhead made in the Cold War.

00:01:54.440 --> 00:01:56.770
But these have nothing
to do with our project.

00:01:56.770 --> 00:02:00.400
To us, and for the
purpose of this project,

00:02:00.400 --> 00:02:04.000
Blue Steel is a distributed
real time ray tracer.

00:02:04.000 --> 00:02:07.020
And Natalia is going to give
you a little background on what

00:02:07.020 --> 00:02:09.461
ray tracing really is.

00:02:09.461 --> 00:02:10.669
NATALIA CHERNENKO: Thank you.

00:02:17.660 --> 00:02:24.210
So ray tracing, there are two
terms that you would generally

00:02:24.210 --> 00:02:27.250
hear, ray tracing
and ray casting.

00:02:27.250 --> 00:02:32.800
Basically, the idea is the
camera is like the eye.

00:02:32.800 --> 00:02:35.710
And from the camera,
rays are cast

00:02:35.710 --> 00:02:38.170
for each pixel in a picture.

00:02:38.170 --> 00:02:43.590
And these rays, they intersect
with all the virtual objects

00:02:43.590 --> 00:02:44.700
in the scene.

00:02:44.700 --> 00:02:50.820
When an object is hit, the
color of it is computed.

00:02:53.610 --> 00:02:56.400
The closest object
that is intersected

00:02:56.400 --> 00:03:00.270
is the object that
shows up on the screen.
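
NOTE A minimal sketch of the ray casting step described above, written in Python for illustration only (the talk says the SPE code was C). The function names hit_sphere and closest_hit and the two-sphere scene are assumptions, not from the talk. It tests one ray against every object and keeps the nearest hit.
```python
import math
def hit_sphere(origin, direction, center, radius):
    # Solve |origin + t*direction - center|^2 = radius^2 for t.
    # Assumes direction is unit length, so the quadratic's leading coefficient is 1.
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None  # the ray misses this sphere
    t = (-b - math.sqrt(disc)) / 2.0  # smaller root: where the ray enters
    return t if t > 0.0 else None
def closest_hit(origin, direction, spheres):
    # Test the ray against every object and keep the nearest intersection.
    best = None
    for name, center, radius in spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best
spheres = [("far", (0.0, 0.0, 10.0), 1.0), ("near", (0.0, 0.0, 5.0), 1.0)]
print(closest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres))
```
In a full renderer this lookup would run once per pixel, and the color of the returned object would be written to the image.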

00:03:02.451 --> 00:03:04.284
BRIAN SWEATT: We should
credit these slides.

00:03:04.284 --> 00:03:08.760
They're Fredo
Durand's from 6.837.

00:03:08.760 --> 00:03:12.180
NATALIA CHERNENKO: And
ray tracing is recursive.

00:03:12.180 --> 00:03:15.140
It accounts for
shadows, positions

00:03:15.140 --> 00:03:19.200
of other objects in the scene.

00:03:19.200 --> 00:03:22.390
There is reflection
and refraction.

00:03:22.390 --> 00:03:26.030
For example, if this ray
were to hit that object

00:03:26.030 --> 00:03:28.360
and then reflect
it, then another ray

00:03:28.360 --> 00:03:32.160
is cast from that object in
the reflecting direction.

00:03:32.160 --> 00:03:35.420
And this can go on for
an indefinite number

00:03:35.420 --> 00:03:37.500
of reflections.

00:03:37.500 --> 00:03:40.590
These are normally
specified in the program.
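
NOTE A small sketch of depth-limited reflection, assuming a toy scene in which every ray hits a mirror of the same color. The names reflect and trace and the values 0.5 are hypothetical, chosen only to show how a program-specified depth ends the recursion.
```python
def reflect(d, n):
    # Mirror direction d about unit normal n: r = d - 2 (d . n) n.
    k = 2.0 * sum(a * b for a, b in zip(d, n))
    return tuple(a - k * b for a, b in zip(d, n))
def trace(depth, max_depth, surface_color=0.5, reflectivity=0.5):
    # Toy scene: every ray hits a mirror, so only the depth cutoff stops the recursion.
    if depth >= max_depth:
        return surface_color
    bounced = trace(depth + 1, max_depth, surface_color, reflectivity)
    return surface_color + reflectivity * bounced
print(reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))
print([trace(0, d) for d in (0, 1, 2)])
```
Each extra level of depth adds one more bounce's contribution, which is why deeper recursion costs more render time.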

00:03:44.550 --> 00:03:47.250
Here is an example
of the recursion.

00:03:47.250 --> 00:03:52.080
So the three planes
are reflective.

00:03:52.080 --> 00:03:55.080
They're like mirrors, and
there are three spheres

00:03:55.080 --> 00:03:56.580
running between them.

00:03:56.580 --> 00:03:59.580
With no recursion, you can see
there is no reflection at all.

00:03:59.580 --> 00:04:04.580
With one recursion, you can see
the rays are reflecting just

00:04:04.580 --> 00:04:05.580
off the side mirrors.

00:04:05.580 --> 00:04:08.170
At the end with two
recursions, there is also

00:04:08.170 --> 00:04:09.570
reflection at the bottom.

00:04:09.570 --> 00:04:12.220
So it is a reflection of
one of the reflections.

00:04:17.180 --> 00:04:20.550
And this is very
computationally intensive

00:04:20.550 --> 00:04:25.090
because for every pixel,
you're casting a ray.

00:04:25.090 --> 00:04:27.300
And for every
object in the scene,

00:04:27.300 --> 00:04:30.010
you're intersecting
that ray with the object

00:04:30.010 --> 00:04:31.480
and seeing if it's hit.

00:04:31.480 --> 00:04:35.560
And then if it is hit and it
is reflective or refractive--

00:04:35.560 --> 00:04:40.300
so transparent-- then
another ray is cast.

00:04:40.300 --> 00:04:46.440
This scene here is a
resolution 1024 by 1024

00:04:46.440 --> 00:04:49.490
and recursion depth is 100.

00:04:49.490 --> 00:04:54.780
And the rendering time in
our program was 2.6 seconds.

00:04:54.780 --> 00:05:01.910
The rendering time for 20 depth
recursion was 600 milliseconds,

00:05:01.910 --> 00:05:06.492
and for 10 was 300 milliseconds.

00:05:06.492 --> 00:05:10.920
PROFESSOR: [INAUDIBLE]
SPU [INAUDIBLE].

00:05:10.920 --> 00:05:13.872
What would be the [INAUDIBLE].

00:05:13.872 --> 00:05:17.550
R. J. RYAN: Many of us have
taken 6.837, and in that class,

00:05:17.550 --> 00:05:20.300
we're all about ray tracers.

00:05:20.300 --> 00:05:24.098
And a typical scene like the one
that is on the PowerPoint right

00:05:24.098 --> 00:05:28.550
now could take up
to a minute or more.

00:05:28.550 --> 00:05:32.360
Often, if you were
to do 100 bounces,

00:05:32.360 --> 00:05:37.398
I think mine would probably
take over 10 minutes.

00:05:37.398 --> 00:05:40.877
PROFESSOR: [INAUDIBLE] you
just to [INAUDIBLE] SPU or

00:05:40.877 --> 00:05:42.236
[? one SPU. ?]

00:05:42.236 --> 00:05:44.110
NATALIA CHERNENKO: We
did try different SPUs.

00:05:44.110 --> 00:05:46.840
This picture, we didn't
try with different SPUs,

00:05:46.840 --> 00:05:50.610
but we have tried other
scenes with various numbers.

00:05:50.610 --> 00:05:54.290
And we did get exact
linear speed up.

00:05:54.290 --> 00:05:57.620
So if there were
twice as many SPUs,

00:05:57.620 --> 00:05:59.900
it would be twice as fast.

00:06:05.165 --> 00:06:05.790
R. J. RYAN: OK.

00:06:05.790 --> 00:06:08.685
So I'm going to talk to
you about optimizations

00:06:08.685 --> 00:06:10.080
that we did.

00:06:10.080 --> 00:06:12.580
Natalia's told you about ray
tracing, and that's only half

00:06:12.580 --> 00:06:15.030
of our project.

00:06:15.030 --> 00:06:19.012
The part that we have to
focus on is optimizing.

00:06:19.012 --> 00:06:22.565
And meeting that goal
of being real time.

00:06:22.565 --> 00:06:27.745
And for us, real time, you
have to ask, what is real time?

00:06:27.745 --> 00:06:32.790
If we go off of like, the
American video standard NTSC,

00:06:32.790 --> 00:06:37.430
they define it to be around
27, 30 frames a second.

00:06:37.430 --> 00:06:39.366
So that was pretty
much the goal that we

00:06:39.366 --> 00:06:40.890
set when we set out to do this.

00:06:40.890 --> 00:06:44.790
And so we took the code and we
think, OK, we can split this.

00:06:44.790 --> 00:06:47.290
This task is easily
parallelizable.

00:06:47.290 --> 00:06:50.720
But we're going to have to
do a lot more than that if we

00:06:50.720 --> 00:06:52.010
want to get better rates.

00:06:52.010 --> 00:06:55.430
So we took advantage of the
hardware that we were on.

00:06:55.430 --> 00:07:01.570
And one thing that on
the ray tracing end,

00:07:01.570 --> 00:07:03.680
we had to go crazy
with, was SIMD.

00:07:04.670 --> 00:07:08.300
Pretty much, the
entire ray tracer never

00:07:08.300 --> 00:07:11.010
deals with a scalar value.

00:07:11.010 --> 00:07:12.660
We use SIMD for everything.

00:07:12.660 --> 00:07:19.460
And in the core of it,
we do branch avoidance.

00:07:19.460 --> 00:07:21.880
So we really try not
to take branches,

00:07:21.880 --> 00:07:25.010
because branch misses are
very costly on the Cell.

00:07:25.010 --> 00:07:29.490
So I believe there's
something like three

00:07:29.490 --> 00:07:34.020
if statements spread
across 4,000 lines of code

00:07:34.020 --> 00:07:35.520
for the ray tracer.

00:07:35.520 --> 00:07:38.590
We really took SIMD
to heart in this.

00:07:42.110 --> 00:07:47.480
Using SIMD, you can process
arrays of structures.

00:07:47.480 --> 00:07:49.797
So we took--

00:07:49.797 --> 00:07:51.640
NATALIA CHERNENKO: [INAUDIBLE]

00:07:52.585 --> 00:07:56.510
R. J. RYAN: We would trace ray
packets of four rays at a time,

00:07:56.510 --> 00:07:58.950
and from those ray packets,
we would generate hit packets.

00:07:58.950 --> 00:08:01.120
And then similarly, as
you go down the line,

00:08:01.120 --> 00:08:05.630
we would produce an RGB packet,
which is four color values tied

00:08:05.630 --> 00:08:06.540
into one.

00:08:06.540 --> 00:08:11.015
So at every step, we would
use SIMD to our advantage.
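
NOTE A pure-Python stand-in for the four-ray packet idea, not actual SPU intrinsics. The names select and update_hits and the sample distances are assumptions. Each list holds one value per ray in the packet, and the nearest-hit update uses a compare mask plus a select instead of a per-ray if statement. On real SIMD hardware each list operation would be a single vector instruction.
```python
def select(mask, a, b):
    # Lane-wise blend: take a where mask is 1, b where mask is 0 (index, not branch).
    return [(y, x)[m] for m, x, y in zip(mask, a, b)]
def update_hits(best_t, best_id, new_t, new_id):
    # Compare all four lanes, then blend; this mirrors a SIMD compare followed by select.
    mask = [int(0.0 < t < bt) for t, bt in zip(new_t, best_t)]
    return select(mask, new_t, best_t), select(mask, [new_id] * 4, best_id)
INF = float("inf")
best_t, best_id = [INF] * 4, [-1] * 4
best_t, best_id = update_hits(best_t, best_id, [5.0, 2.0, -1.0, 7.0], 0)
best_t, best_id = update_hits(best_t, best_id, [3.0, 4.0, 6.0, 9.0], 1)
print(best_t, best_id)
```
The resulting pair of lists plays the role of a hit packet: one distance and one object id per ray.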

00:08:11.015 --> 00:08:13.140
Now Brian's going to talk
to you about pipelining.

00:08:17.570 --> 00:08:20.085
BRIAN SWEATT: So of particular
importance on the Cell

00:08:20.085 --> 00:08:23.347
is not only what instructions
you execute, but the order

00:08:23.347 --> 00:08:25.164
that you execute them
in, because there is

00:08:25.164 --> 00:08:27.080
no reorder buffer on the Cell.

00:08:27.080 --> 00:08:28.680
There is no out of
order execution.

00:08:28.680 --> 00:08:31.400
And despite the fact
that compiler technology

00:08:31.400 --> 00:08:35.909
is very good, hand optimization
and pipelining instructions

00:08:35.909 --> 00:08:37.770
is still always better.

00:08:37.770 --> 00:08:42.520
For instance, with our
triangle intersection routine,

00:08:42.520 --> 00:08:45.360
we gained on the
order of hundreds

00:08:45.360 --> 00:08:49.259
of milliseconds of render time.

00:08:49.259 --> 00:08:51.300
We shaved hundreds of
milliseconds of render time

00:08:51.300 --> 00:08:54.010
off of our render time for
some scenes you will see later,

00:08:54.010 --> 00:08:57.190
just by reordering
our calculations

00:08:57.190 --> 00:08:58.450
in the intersection code.

00:08:58.450 --> 00:09:00.750
So it's very important,
especially on the SPEs.

00:09:00.750 --> 00:09:02.720
And one other
optimization we made

00:09:02.720 --> 00:09:06.430
was the fact that we
coded basically everything

00:09:06.430 --> 00:09:07.750
on the SPEs.

00:09:07.750 --> 00:09:11.480
Anything running on the SPEs
is coded in C. Not in C++.

00:09:11.480 --> 00:09:13.730
And we're not using
inheritance when we do use C++

00:09:13.730 --> 00:09:18.930
because virtual functions
have more overhead in respect

00:09:18.930 --> 00:09:22.290
to memory, the vtable, and
also looking up the function

00:09:22.290 --> 00:09:23.880
pointers from the vtable.

00:09:23.880 --> 00:09:26.354
So these are just a couple
of the optimizations we made.

00:09:26.354 --> 00:09:27.520
We'll talk about more later.

00:09:27.520 --> 00:09:28.120
AUDIENCE: Just a quick question.

00:09:28.120 --> 00:09:29.494
So for the pipeline,
you actually

00:09:29.494 --> 00:09:31.630
reordered instruction
statements?

00:09:31.630 --> 00:09:33.710
The compiler wasn't very
helpful in that respect?

00:09:33.710 --> 00:09:34.501
BRIAN SWEATT: Yeah.

00:09:34.501 --> 00:09:40.480
So on that scene, 1024
by 1024 with the spheres,

00:09:40.480 --> 00:09:43.460
we were getting,
Natalia mentioned

00:09:43.460 --> 00:09:47.800
with a depth of recursion 10, we
were getting 300 milliseconds.

00:09:47.800 --> 00:09:48.970
That's with optimized.

00:09:48.970 --> 00:09:52.250
Without the optimizations,
we were getting closer to 500

00:09:52.250 --> 00:09:55.850
with a depth of recursion of 10.

00:09:55.850 --> 00:09:59.930
And we gained that extra 200
milliseconds just by reordering

00:09:59.930 --> 00:10:03.980
instructions in the intersection
with optimization level O3

00:10:03.980 --> 00:10:05.570
in XLC.

00:10:05.570 --> 00:10:08.930
So hand optimization, like
Mike Acton said in class,

00:10:08.930 --> 00:10:09.960
is still very important.

00:10:13.281 --> 00:10:15.030
And I'm going to hand
you off to Mikey now

00:10:15.030 --> 00:10:16.340
to discuss the regular grid.

00:10:21.059 --> 00:10:22.850
MICHAEL D'AMBROSIO: So
one of the problems,

00:10:22.850 --> 00:10:25.340
I guess, with ray tracing
is that intersections

00:10:25.340 --> 00:10:26.953
are very expensive.

00:10:26.953 --> 00:10:31.056
And when you have scenes
with n rays and m primitives,

00:10:31.056 --> 00:10:34.200
well, you have to generate
n times m intersections.

00:10:34.200 --> 00:10:37.700
So one common acceleration
structure is a grid.

00:10:37.700 --> 00:10:40.480
And the concept is very
simple, as the picture depicts.

00:10:40.480 --> 00:10:42.380
You just take your
scene, split it up

00:10:42.380 --> 00:10:44.840
into a number of
evenly sized voxels,

00:10:44.840 --> 00:10:46.930
place your primitives
in them, and then you

00:10:46.930 --> 00:10:48.560
march your ray through.

00:10:48.560 --> 00:10:52.550
And you only intersect
the ray with primitives

00:10:52.550 --> 00:10:54.660
that are around the area.

00:10:54.660 --> 00:10:56.230
And the good thing
about a grid is

00:10:56.230 --> 00:10:57.690
that it's very
easy to construct,

00:10:57.690 --> 00:11:00.130
as opposed to, say, a
k-d tree or an octree.

00:11:00.130 --> 00:11:02.050
Those take on the
order of n log n,

00:11:02.050 --> 00:11:04.350
where n is number of primitives.

00:11:04.350 --> 00:11:07.275
Regular grid is order n, where
n is the number of primitives.

00:11:07.275 --> 00:11:08.650
And the reason
why we did this is

00:11:08.650 --> 00:11:12.170
because with an interactive
scene where things are moving

00:11:12.170 --> 00:11:15.090
around, you need to rebuild
the grid every frame

00:11:15.090 --> 00:11:16.930
or whenever something moves.

00:11:16.930 --> 00:11:20.410
So we favored this choice
of acceleration structure

00:11:20.410 --> 00:11:23.325
because it was quick to
build and easy to traverse.
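
NOTE A minimal sketch of building a regular grid in one pass, assuming sphere primitives and a hypothetical build_grid helper (neither the data layout nor the names come from the talk). Each primitive is added to every voxel its bounding box overlaps, so build time grows linearly with the number of primitives as long as each one spans a bounded number of cells.
```python
from collections import defaultdict
def build_grid(spheres, cell_size):
    # Map each voxel coordinate to the indices of the primitives that overlap it.
    grid = defaultdict(list)
    for index, (center, radius) in enumerate(spheres):
        lo = [int((c - radius) // cell_size) for c in center]
        hi = [int((c + radius) // cell_size) for c in center]
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    grid[(x, y, z)].append(index)
    return grid
spheres = [((0.5, 0.5, 0.5), 0.25), ((1.0, 0.5, 0.5), 0.25), ((3.5, 3.5, 3.5), 0.25)]
grid = build_grid(spheres, cell_size=1.0)
print(sorted(grid.items()))
```
A ray marching through the grid would then intersect only the primitives listed in the voxels it passes through, for example grid.get((0, 0, 0)), rather than every primitive in the scene.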

00:11:26.080 --> 00:11:31.063
And now Fish is going to
tell you about the fun

00:11:31.063 --> 00:11:32.037
part of the project.

00:11:37.394 --> 00:11:41.290
SCOTT FISHER: I was in charge
of building the actual physics

00:11:41.290 --> 00:11:42.270
engine.

00:11:42.270 --> 00:11:47.770
Our original plan was to do
the well-known 6.170 Gizmoball

00:11:47.770 --> 00:11:51.860
in 3D and ray traced.

00:11:51.860 --> 00:11:57.180
And I guess if we
want to show that?

00:11:57.180 --> 00:12:00.790
I have it running
on this laptop.

00:12:00.790 --> 00:12:02.700
We didn't quite
get it ray traced,

00:12:02.700 --> 00:12:06.110
but I'm just going to show
you a little bit of what

00:12:06.110 --> 00:12:07.300
I did as a physics engine.

00:12:21.880 --> 00:12:22.380
Maybe not.

00:12:32.535 --> 00:12:34.118
R. J. RYAN: We'll
get the demo working

00:12:34.118 --> 00:12:35.886
and then I think we
can use it later.

00:12:35.886 --> 00:12:37.469
SCOTT FISHER: Anyway,
it was basically

00:12:37.469 --> 00:12:40.120
a 2 and 1/2 D version
of the example that

00:12:40.120 --> 00:12:42.170
was just up on the screen.

00:12:42.170 --> 00:12:46.440
Flippers work, absorbers work,
and due to complications,

00:12:46.440 --> 00:12:48.470
we never did get it
fully ray traced.

00:12:48.470 --> 00:12:54.120
And I guess we'll go ahead and
give you the real ray trace

00:12:54.120 --> 00:12:55.860
demo, since we do have
a couple of those.

00:13:04.613 --> 00:13:08.025
PROFESSOR: So their actual demo
will be on a little LCD screen

00:13:08.025 --> 00:13:08.525
here.

00:13:08.525 --> 00:13:11.948
Hopefully everybody
can see the ray trace.

00:13:11.948 --> 00:13:15.126
Unfortunately, PS3
needs a specific kind

00:13:15.126 --> 00:13:18.716
of digital connection to display
things in the right resolution

00:13:18.716 --> 00:13:20.680
when you're writing
through Frame Buffer.

00:13:20.680 --> 00:13:22.660
So we can't project it
onto the big screen.

00:13:22.660 --> 00:13:25.135
PROFESSOR: [INAUDIBLE]

00:13:25.135 --> 00:13:26.819
PROFESSOR: Can I
do it from here?

00:13:26.819 --> 00:13:27.610
PROFESSOR: Perfect.

00:13:32.624 --> 00:13:34.540
BRIAN SWEATT: Now you
got to tell [INAUDIBLE].

00:13:37.505 --> 00:13:38.005
Oh, no.

00:13:38.005 --> 00:13:39.985
Wrong way.

00:13:39.985 --> 00:13:40.975
R. J. RYAN: Really?

00:13:40.975 --> 00:13:43.770
BRIAN SWEATT: By the way,
he's still adjusting it.

00:13:43.770 --> 00:13:44.720
R. J. RYAN: Whoops.

00:13:44.720 --> 00:13:45.553
BRIAN SWEATT: Uh oh.

00:13:49.000 --> 00:13:50.715
Oh, no.

00:13:50.715 --> 00:13:51.340
R. J. RYAN: OK.

00:13:51.340 --> 00:13:54.480
We've prepared a
few demos for you.

00:13:54.480 --> 00:13:57.710
And these are really
just going to show off

00:13:57.710 --> 00:14:00.650
the ray tracing part
of it and the rates

00:14:00.650 --> 00:14:04.430
that we got after we
applied our speed ups.

00:14:08.160 --> 00:14:09.960
Thank you.

00:14:09.960 --> 00:14:12.850
Here we see both
the camera is moving

00:14:12.850 --> 00:14:17.450
and we implemented full
scene functional textures,

00:14:17.450 --> 00:14:18.900
so you see the checkerboard.

00:14:18.900 --> 00:14:21.890
It's generated every frame.

00:14:21.890 --> 00:14:23.500
We used no texture maps.

00:14:23.500 --> 00:14:26.600
This is completely
generated functionally.

00:14:26.600 --> 00:14:30.110
Bump mapping, as you can see
in the large orange sphere,

00:14:30.110 --> 00:14:33.470
is also done on the
fly, as well as blending

00:14:33.470 --> 00:14:37.700
between two textures using
Perlin noise is done on the fly

00:14:37.700 --> 00:14:38.980
also.

00:14:38.980 --> 00:14:40.970
Showcased in the center
sphere right here,

00:14:40.970 --> 00:14:43.720
you can see both
reflection and refraction.

00:14:43.720 --> 00:14:45.130
The sphere is transparent.

00:14:45.130 --> 00:14:46.910
It acts as a glass sphere.

00:14:46.910 --> 00:14:53.490
So you can see the reflection
of the orange in it,

00:14:53.490 --> 00:14:55.390
but also the warping
of the checkerboard

00:14:55.390 --> 00:14:57.390
as it goes through.

00:14:57.390 --> 00:14:59.250
This is one of the
advantages of-- well,

00:14:59.250 --> 00:15:02.180
spheres are one of the
advantages of ray tracing,

00:15:02.180 --> 00:15:08.690
because you can use
geometrical equations to define

00:15:08.690 --> 00:15:12.080
your geometry versus in
using rasterization methods,

00:15:12.080 --> 00:15:13.990
you have to actually
define the geometry

00:15:13.990 --> 00:15:17.246
in some concrete numerical way.

00:15:17.246 --> 00:15:19.370
Going to go to the next demo.

00:15:19.370 --> 00:15:22.000
This should showcase some of
the lights that we implemented.

00:15:22.000 --> 00:15:24.900
AUDIENCE: So how many
frames per second is this?

00:15:24.900 --> 00:15:27.400
R. J. RYAN: This should
be 15 frames a second.

00:15:27.400 --> 00:15:29.860
And let me reiterate.

00:15:29.860 --> 00:15:33.290
The refraction step requires
at least two bounces,

00:15:33.290 --> 00:15:35.840
so two steps of recursion.

00:15:35.840 --> 00:15:42.540
If we were to turn that
off, we would be achieving,

00:15:42.540 --> 00:15:46.165
I would estimate, 27
frames a second, 28.

00:15:46.165 --> 00:15:47.790
But then it doesn't
look as pretty, so,

00:15:47.790 --> 00:15:49.220
you know, what's the point.

00:15:49.220 --> 00:15:52.040
You can see-- this will
be showcased later--

00:15:52.040 --> 00:15:54.167
but you can see the spotlight.

00:15:54.167 --> 00:15:55.000
It's kind of moving.

00:15:55.000 --> 00:15:57.639
It's waving back and
forth, illuminating

00:15:57.639 --> 00:15:58.430
parts of the scene.

00:15:58.430 --> 00:16:01.490
There's also a point light
at some distance around here

00:16:01.490 --> 00:16:02.970
that's illuminating everything.

00:16:02.970 --> 00:16:06.199
But it's basically the
same set of primitives.

00:16:06.199 --> 00:16:07.740
If you are closer,
you could probably

00:16:07.740 --> 00:16:11.690
notice that the checkerboard
is fully reflective.

00:16:11.690 --> 00:16:15.292
This far wall, you can view
the entire scene in it.

00:16:15.292 --> 00:16:16.605
Going to go to the next demo.

00:16:19.950 --> 00:16:22.910
This is with the lights off.

00:16:22.910 --> 00:16:25.600
So you can get a better sense of
what the spotlight is actually

00:16:25.600 --> 00:16:26.100
doing.

00:16:28.910 --> 00:16:30.770
If you notice on
this orange sphere,

00:16:30.770 --> 00:16:35.190
we implemented two
different forms of BRDFs.

00:16:35.190 --> 00:16:36.920
One was generic Phong shading.

00:16:36.920 --> 00:16:38.990
And then the other
was Cook-Torrance,

00:16:38.990 --> 00:16:42.165
which helps
particularly with metals

00:16:42.165 --> 00:16:45.170
and other materials like that.

00:16:45.170 --> 00:16:50.750
It helps have very bright
specular highlight.

00:16:50.750 --> 00:16:52.740
And you'll notice as
the camera's panning

00:16:52.740 --> 00:17:01.360
back and forth, the speed
ups that we get from-- we

00:17:01.360 --> 00:17:04.560
used little tricks to
glean speed here and there.

00:17:04.560 --> 00:17:09.150
Like when it's so dark
or it's very far away,

00:17:09.150 --> 00:17:10.704
we stop recursing.

00:17:10.704 --> 00:17:12.156
BRIAN SWEATT: We're over, so--

00:17:12.156 --> 00:17:12.640
R. J. RYAN: We are?

00:17:12.640 --> 00:17:13.124
BRIAN SWEATT: Yeah.

00:17:13.124 --> 00:17:14.099
So if there's any--

00:17:14.099 --> 00:17:15.280
R. J. RYAN: Let's see.

00:17:15.280 --> 00:17:17.650
SCOTT FISHER: Do the
SPUs from one to six.

00:17:17.650 --> 00:17:21.050
R. J. RYAN: I can demonstrate--
we actually enabled it

00:17:21.050 --> 00:17:24.347
so that you can turn it on
either one through six SPEs

00:17:24.347 --> 00:17:25.555
using command line arguments.

00:17:28.130 --> 00:17:29.810
Let's go back to the first demo.

00:17:29.810 --> 00:17:36.330
If I say just use one SPE, we
can watch it slowly trod along.

00:17:36.330 --> 00:17:38.060
And if I were
SSH-ed in, it would

00:17:38.060 --> 00:17:42.056
be giving me timing information
so I could tell you exactly.

00:17:42.056 --> 00:17:44.180
It's really unfortunate
that it's not, because it's

00:17:44.180 --> 00:17:45.790
very linear in its speed up.

00:17:45.790 --> 00:17:49.690
Right now it says
500 milliseconds.

00:17:49.690 --> 00:17:54.770
But if I turn it on two, it
drops literally in half to 250.

00:17:54.770 --> 00:17:57.305
You can get a sense
of that, that it's

00:17:57.305 --> 00:18:02.760
moving like tick, tick, tick,
tick versus tick, tick, tick.

00:18:02.760 --> 00:18:08.420
And then three, it's
a similar speedup.

00:18:08.420 --> 00:18:11.420
And then I'll jump to five.

00:18:11.420 --> 00:18:13.200
It's getting closer.

00:18:13.200 --> 00:18:22.130
And then when we turn on the
sixth, this should be about 12,

00:18:22.130 --> 00:18:23.560
15 frames a second.
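
NOTE A quick arithmetic check of the scaling quoted in the demo, assuming ideal linear speedup from the 500 millisecond single-SPE figure. The helper name frame_time_ms is hypothetical.
```python
def frame_time_ms(one_spe_ms, spes):
    # Ideal linear scaling: n SPEs render a frame in 1/n of the single-SPE time.
    return one_spe_ms / spes
for spes in (1, 2, 3, 6):
    ms = frame_time_ms(500.0, spes)
    print(spes, round(ms, 1), round(1000.0 / ms, 1))
```
Under this model six SPEs give roughly 83 milliseconds per frame, or about 12 frames a second, which matches the lower end of the estimate given in the talk.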

00:18:25.977 --> 00:18:28.060
AUDIENCE: So you estimate
if you had a dozen SPUs,

00:18:28.060 --> 00:18:29.024
you could go--

00:18:29.024 --> 00:18:31.190
R. J. RYAN: Actually, it's
funny you mentioned that.

00:18:31.190 --> 00:18:34.165
I think it was the Julia
re-tracer or the IRT.

00:18:34.165 --> 00:18:35.290
BRIAN SWEATT: Both of them.

00:18:35.290 --> 00:18:37.485
R. J. RYAN: Both of them
use two Cell blades,

00:18:37.485 --> 00:18:39.125
which is a full eight?

00:18:39.125 --> 00:18:41.000
MICHAEL D'AMBROSIO: It's
a full eight times--

00:18:41.000 --> 00:18:42.416
R. J. RYAN: Eight
cores times two.

00:18:42.416 --> 00:18:44.760
So they use 16 SPUs.

00:18:44.760 --> 00:18:47.517
The PlayStation 3 does
have certain factors.

00:18:47.517 --> 00:18:48.600
We're only limited to six.

00:18:53.340 --> 00:18:55.390
AUDIENCE: The way you're
dividing up the work,

00:18:55.390 --> 00:19:01.140
each SPU taking every sixth
raster line, that sounds like--

00:19:01.140 --> 00:19:05.640
could you improve on the
performance by allocating

00:19:05.640 --> 00:19:06.980
the rays--

00:19:06.980 --> 00:19:09.420
R. J. RYAN: The issue
there is one of granularity

00:19:09.420 --> 00:19:11.500
of our distribution.

00:19:11.500 --> 00:19:15.630
And also communication
time on the bus.

00:19:15.630 --> 00:19:18.535
We've designed it such that
we can fine tune it in any way

00:19:18.535 --> 00:19:19.248
we want.
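
NOTE A sketch of the interleaved split the audience member describes, in which each SPU takes every sixth raster line. The function name rows_for_spu is hypothetical, and the actual granularity used by the team is not stated in the talk.
```python
def rows_for_spu(spu, num_spus, height):
    # Interleaved split: SPU k renders rows k, k + num_spus, k + 2 * num_spus, ...
    return list(range(spu, height, num_spus))
assignment = [rows_for_spu(k, 6, 1024) for k in range(6)]
print(assignment[0][:3], [len(rows) for rows in assignment])
```
Interleaving keeps the row counts within one of each other across SPUs, at the cost of spreading neighboring rays over different SPUs, which is the trade-off raised in the question.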

00:19:19.248 --> 00:19:20.706
AUDIENCE: But it
seems to me if you

00:19:20.706 --> 00:19:26.490
had a localized bundle
of rays on a single SPU,

00:19:26.490 --> 00:19:28.680
you might be able to
do some optimizations,

00:19:28.680 --> 00:19:30.810
like treat them as fat rays.

00:19:30.810 --> 00:19:33.177
Nearby rays are likely to
intersect the same triangles.

00:19:33.177 --> 00:19:35.260
MICHAEL D'AMBROSIO: Choosing
the frosting variety.

00:19:35.260 --> 00:19:37.090
R. J. RYAN: The
entire material system

00:19:37.090 --> 00:19:38.390
works very much like that.

00:19:38.390 --> 00:19:40.790
It actually saves
on computations

00:19:40.790 --> 00:19:45.780
based on which rays
in a packet of four

00:19:45.780 --> 00:19:48.210
have collided with
the same element.

00:19:48.210 --> 00:19:51.130
And if it does, then it
takes full advantage of SIMD

00:19:51.130 --> 00:19:54.933
to quickly chug through
those rays that are the same

00:19:54.933 --> 00:19:57.217
and then never call that
particular shader again,

00:19:57.217 --> 00:19:59.050
because it's already
done the work for them.

00:19:59.050 --> 00:20:01.260
MICHAEL D'AMBROSIO: Another
issue, I think, is--

00:20:01.260 --> 00:20:03.320
PROFESSOR: Sorry, we
should try to wrap so we

00:20:03.320 --> 00:20:04.580
can get through the others.

00:20:04.580 --> 00:20:07.290
Any more very short questions?

00:20:07.290 --> 00:20:10.760
Did you guys have any last
concluding statements?

00:20:10.760 --> 00:20:13.560
NATALIA CHERNENKO: Maybe we go
through this like really fast?

00:20:13.560 --> 00:20:15.452
[INTERPOSING VOICES]

00:20:15.452 --> 00:20:17.160
BRIAN SWEATT: Point
is we had two pieces.

00:20:17.160 --> 00:20:18.010
PROFESSOR: Thank the speaker.

00:20:18.010 --> 00:20:18.926
R. J. RYAN: Thank you.

00:20:18.926 --> 00:20:22.560
[APPLAUSE]