WEBVTT

00:00:00.060 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.010
Commons license.

00:00:04.010 --> 00:00:06.350
Your support will help
MIT OpenCourseWare

00:00:06.350 --> 00:00:10.720
continue to offer high quality
educational resources for free.

00:00:10.720 --> 00:00:13.330
To make a donation or
view additional materials

00:00:13.330 --> 00:00:17.209
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.209 --> 00:00:17.834
at ocw.mit.edu.

00:00:26.550 --> 00:00:28.270
PROFESSOR: Greg was
not here yesterday.

00:00:28.270 --> 00:00:29.220
He was away.

00:00:29.220 --> 00:00:32.470
And so this is work he's
largely done on his own again.

00:00:32.470 --> 00:00:34.339
He's done a molecular
dynamics simulator.

00:00:34.339 --> 00:00:36.380
So he's going to tell us
a little bit about that,

00:00:36.380 --> 00:00:39.770
and then after that we'll
do course evaluations,

00:00:39.770 --> 00:00:42.530
hand out some awards, have
some cake, and then after that

00:00:42.530 --> 00:00:45.885
some pizza, and then
play same PlayStation 3s.

00:00:45.885 --> 00:00:48.830
All yours.

00:00:48.830 --> 00:00:50.790
GREG PINTILIE: OK, so
I'll talk about the make

00:00:50.790 --> 00:00:53.080
general molecular
dynamics algorithm

00:00:53.080 --> 00:00:56.930
and then parallelization
approaches.

00:00:56.930 --> 00:00:59.980
So molecular dynamics is based
on a potential energy function,

00:00:59.980 --> 00:01:04.519
which includes two terms, bonded
terms and non-bonded terms.

00:01:04.519 --> 00:01:08.455
And bonded terms are split
up into three sub terms.

00:01:12.040 --> 00:01:14.867
So don't confuse
bonded with-- actually,

00:01:14.867 --> 00:01:16.450
don't think that
these are non-bonded.

00:01:16.450 --> 00:01:18.650
These are actually all
sort of grouped as bonded.

00:01:18.650 --> 00:01:20.108
I know it's a little
bit confusing,

00:01:20.108 --> 00:01:22.970
but right now I'm dealing just
with the sort of bonded terms.

00:01:22.970 --> 00:01:24.930
And what that means
is that it deals

00:01:24.930 --> 00:01:27.410
with atoms that are connected
together covalently.

00:01:27.410 --> 00:01:32.610
So for example, the bonds
term reflects some energy

00:01:32.610 --> 00:01:37.190
due to a displacement
between two atoms, which

00:01:37.190 --> 00:01:38.570
are certain distance apart.

00:01:38.570 --> 00:01:41.930
There is an equilibrium distance
that they like to stay at,

00:01:41.930 --> 00:01:45.280
and if they vary
from that distance

00:01:45.280 --> 00:01:49.760
the potential increases as
a squared of that separation

00:01:49.760 --> 00:01:51.770
distance.

00:01:51.770 --> 00:01:57.890
The angles term
represents, basically,

00:01:57.890 --> 00:01:59.610
the deviation from
an equilibrium

00:01:59.610 --> 00:02:01.350
angle between three atoms.

00:02:01.350 --> 00:02:04.360
So if you imagine these
three nodes as three atoms,

00:02:04.360 --> 00:02:07.560
if this angle varies away
from a certain equilibrium

00:02:07.560 --> 00:02:11.260
theta or angle, then
the potential energy

00:02:11.260 --> 00:02:16.120
is also a quadratic
term, which tends

00:02:16.120 --> 00:02:22.010
to bring the three atoms to have
a certain equilibrium angle.

00:02:22.010 --> 00:02:25.350
And then, finally, the
dihedral and improper term

00:02:25.350 --> 00:02:27.260
is probably one of the
more complicated ones.

00:02:27.260 --> 00:02:31.990
The improper term basically
deals with four atoms.

00:02:31.990 --> 00:02:36.080
So if you imagine this is
one, two, three, and four.

00:02:36.080 --> 00:02:39.090
These three atoms, if you
put two planes through them,

00:02:39.090 --> 00:02:43.230
the angle that those two
planes make to each other

00:02:43.230 --> 00:02:46.360
is given a certain
equilibrium angle again.

00:02:46.360 --> 00:02:50.930
And this should also
be a quadratic term,

00:02:50.930 --> 00:02:52.650
but the energy also
varies quadratically

00:02:52.650 --> 00:02:55.760
as that angle is deviated from.

00:02:55.760 --> 00:02:59.670
An actual dihedral
is the rotation

00:02:59.670 --> 00:03:03.330
along a bond between
the two central atoms.

00:03:03.330 --> 00:03:07.540
And this energy term has
actually a cosine form.

00:03:11.680 --> 00:03:16.040
So as you rotate it along
this bond, say for example,

00:03:16.040 --> 00:03:19.990
the energy will
very sinusoidally.

00:03:19.990 --> 00:03:22.000
And there will be a
restoring potential that

00:03:22.000 --> 00:03:24.830
will tend to make
it either a line,

00:03:24.830 --> 00:03:28.320
or sometimes it will
make the atom stagger.

00:03:28.320 --> 00:03:32.090
So all of these parameters are
very sort of molecule and atom

00:03:32.090 --> 00:03:32.750
dependent.

00:03:32.750 --> 00:03:35.350
And all of these
parameters are available,

00:03:35.350 --> 00:03:40.450
or have been built over many
years by different methods

00:03:40.450 --> 00:03:43.870
like ab initio calculations
or experimental calculations.

00:03:43.870 --> 00:03:46.380
But basically, if you plug those
parameters in, for example,

00:03:46.380 --> 00:03:51.120
here k represents the
spring constant, right?

00:03:51.120 --> 00:03:53.370
Because they're basically
just like spring constraints

00:03:53.370 --> 00:03:54.045
more or less.

00:03:54.045 --> 00:03:55.420
Or in this case,
it will tell you

00:03:55.420 --> 00:03:57.920
how far up the potential,
like what this value will

00:03:57.920 --> 00:04:02.570
be at the maximum, and will,
for example, in this case,

00:04:02.570 --> 00:04:06.310
will specify how
many fluctuations you

00:04:06.310 --> 00:04:08.400
have in the energy function.

00:04:08.400 --> 00:04:10.340
And the other
parameters are usually

00:04:10.340 --> 00:04:14.116
the theta, the equilibrium
theta, or the equilibrium,

00:04:14.116 --> 00:04:17.170
or the equilibrium bond length.

00:04:17.170 --> 00:04:19.279
So this is kind of
what you can expect

00:04:19.279 --> 00:04:27.120
for it to look like as
molecules will tend to move.

00:04:27.120 --> 00:04:30.240
The non bonded terms
include van-der-Waals terms

00:04:30.240 --> 00:04:31.810
and electrostatic terms.

00:04:31.810 --> 00:04:35.450
So van-der-Waals forces
occur between any two atoms,

00:04:35.450 --> 00:04:41.780
and if they're far enough apart,
they will attract to each other

00:04:41.780 --> 00:04:43.220
according to this formula.

00:04:43.220 --> 00:04:45.040
And if they're too
close together,

00:04:45.040 --> 00:04:47.610
so at equilibrium
distance between them

00:04:47.610 --> 00:04:51.780
the energy is a minimum,
but as they get any closer

00:04:51.780 --> 00:04:54.670
there will be a very strong
group repulsive force, which

00:04:54.670 --> 00:04:56.600
is the r to the 12th term.

00:04:56.600 --> 00:04:58.590
And as they get
further apart, there

00:04:58.590 --> 00:05:03.440
is an attraction force, which
varies as r to the sixth.

00:05:06.870 --> 00:05:10.450
And then the electrostatic
force is just

00:05:10.450 --> 00:05:12.630
a simple Coulombic form.

00:05:12.630 --> 00:05:15.120
So you basically take
two atoms that are,

00:05:15.120 --> 00:05:18.050
say oppositely charged,
will attract each other.

00:05:18.050 --> 00:05:20.480
If they're positively charged
they will repel each other.

00:05:20.480 --> 00:05:24.320
And it's just a matter of
multiplying the charges,

00:05:24.320 --> 00:05:26.310
and then dividing by
r to get the energy.

00:05:29.160 --> 00:05:31.520
OK, so to do molecular
dynamics, basically,

00:05:31.520 --> 00:05:34.850
you have to take derivatives
of all those formulas.

00:05:34.850 --> 00:05:35.970
I just showed you.

00:05:35.970 --> 00:05:37.719
And I didn't do that all myself.

00:05:37.719 --> 00:05:38.510
It was pretty hard.

00:05:38.510 --> 00:05:42.940
I just got some code from
some current existing

00:05:42.940 --> 00:05:44.420
molecular dynamic simulators.

00:05:44.420 --> 00:05:46.230
But basically, you
take the derivative.

00:05:46.230 --> 00:05:48.390
And that basically equates
to the force, right?

00:05:48.390 --> 00:05:51.150
The typical Newton
Newtonian model.

00:05:51.150 --> 00:05:53.980
And that gives you the
force in every atom,

00:05:53.980 --> 00:05:58.590
and then using that force
and integration scheme,

00:05:58.590 --> 00:06:00.970
the leapfrog Verlet
integration scheme just

00:06:00.970 --> 00:06:05.420
gives you the velocity at
the next half time step.

00:06:05.420 --> 00:06:08.860
And then using that velocity
at the half time step,

00:06:08.860 --> 00:06:13.830
the positions of
each atom is updated

00:06:13.830 --> 00:06:16.910
based on the positions at
the previous time step,

00:06:16.910 --> 00:06:18.570
and then some value.

00:06:18.570 --> 00:06:20.570
So I will show you
a little bit of what

00:06:20.570 --> 00:06:23.675
it looks like on a
simple molecule first.

00:06:27.770 --> 00:06:29.400
So here's a sort
of simple molecule,

00:06:29.400 --> 00:06:31.550
and it has just an
initial configuration

00:06:31.550 --> 00:06:35.540
that is just sort of random.

00:06:38.990 --> 00:06:43.070
And so one technique
that is done

00:06:43.070 --> 00:06:45.190
is first the geometry
is minimized,

00:06:45.190 --> 00:06:47.140
which means that you
just take small steps

00:06:47.140 --> 00:06:49.800
in the direction of the gradient
until you reach the energy

00:06:49.800 --> 00:06:50.740
minimum.

00:06:50.740 --> 00:06:52.680
And if I do that,
I sort of reach

00:06:52.680 --> 00:06:54.335
sort of the minimum
configuration

00:06:54.335 --> 00:06:57.140
that this molecule
likes to be in.

00:06:57.140 --> 00:07:00.460
And then, if I add some thermal
noise, which means that I just

00:07:00.460 --> 00:07:02.650
set the initial
velocities of the atoms

00:07:02.650 --> 00:07:06.370
to some random value
based on the temperature,

00:07:06.370 --> 00:07:10.680
then the molecule starts
sort of jiggling around.

00:07:10.680 --> 00:07:13.180
And the nice thing about the
Verlet integration scheme

00:07:13.180 --> 00:07:14.830
is that it's
actually just right,

00:07:14.830 --> 00:07:16.650
like the energy is
conserved in the system,

00:07:16.650 --> 00:07:18.200
although it fluctuates
a little bit.

00:07:18.200 --> 00:07:19.949
So a lot of integration
schemes the energy

00:07:19.949 --> 00:07:23.220
tends to be either
decrease or increase,

00:07:23.220 --> 00:07:24.569
so the system explodes.

00:07:24.569 --> 00:07:26.360
But this integration
scheme is very stable.

00:07:29.880 --> 00:07:33.500
One thing that you saw there
is, so I just talked briefly

00:07:33.500 --> 00:07:34.390
about kinetic energy.

00:07:34.390 --> 00:07:36.440
This is kind of how the
initial velocities are

00:07:36.440 --> 00:07:40.100
set based on the Boltzmann
constant and the temperature.

00:07:40.100 --> 00:07:43.520
And the other thing that
molecular dynamics usually do

00:07:43.520 --> 00:07:45.010
is called Langevin dynamics.

00:07:45.010 --> 00:07:48.640
And in this case, some
random velocity vector

00:07:48.640 --> 00:07:53.490
is added to the normal velocity,
which just sort of simulates

00:07:53.490 --> 00:07:55.789
collisions with other
molecules in the environment.

00:07:55.789 --> 00:07:58.080
So this makes the simulation
look a bit more realistic.

00:07:58.080 --> 00:08:00.820
So instead of the molecule
just sitting there in space,

00:08:00.820 --> 00:08:06.350
this random sort of
factor adds some movement.

00:08:06.350 --> 00:08:08.720
And there's also a
damping constant,

00:08:08.720 --> 00:08:13.250
which sort of opposes the
velocity by some constant.

00:08:13.250 --> 00:08:16.550
So if you didn't add any
more sort of randomness

00:08:16.550 --> 00:08:20.700
into the system, the system
would just eventually go

00:08:20.700 --> 00:08:22.442
to be completely still.

00:08:22.442 --> 00:08:23.900
So there would be
no more movement,

00:08:23.900 --> 00:08:27.600
because this is kind of like
a damping sort of factor.

00:08:27.600 --> 00:08:31.900
And there are some properties
of this random vector that sort

00:08:31.900 --> 00:08:34.419
of make some physical sense.

00:08:34.419 --> 00:08:36.950
For example, it's independent
of previously picked

00:08:36.950 --> 00:08:39.650
random vectors, and it also
depends on the temperature.

00:08:39.650 --> 00:08:42.299
So that if you keep
adding this random factor

00:08:42.299 --> 00:08:44.322
to the simulation,
the temperature

00:08:44.322 --> 00:08:45.405
will always stay constant.

00:08:50.590 --> 00:08:52.990
Another thing to consider
is that these molecules,

00:08:52.990 --> 00:08:54.620
if you try to think
of them as they

00:08:54.620 --> 00:08:56.860
are in a sort of typical
biological system,

00:08:56.860 --> 00:09:00.030
is that they'll be
solvated in water.

00:09:00.030 --> 00:09:03.040
So in that case
the Coulombic term

00:09:03.040 --> 00:09:05.260
is not actually that simple.

00:09:05.260 --> 00:09:08.270
You have to include some
sort of dielectric constant.

00:09:08.270 --> 00:09:10.990
So for example, one
that is typically used

00:09:10.990 --> 00:09:14.500
and is pretty simple is that the
constant is just the distance.

00:09:14.500 --> 00:09:18.690
So that makes the electrostatic
energy dependent on one over r

00:09:18.690 --> 00:09:20.440
squared rather than 1 over r.

00:09:20.440 --> 00:09:23.410
So that means that the
electrostatic forces drop off

00:09:23.410 --> 00:09:27.240
a lot faster as the atoms get
further and further apart.

00:09:27.240 --> 00:09:33.430
So for example, so two molecules
that are oppositely charged,

00:09:33.430 --> 00:09:36.030
if you just put them in
vacuum, they will eventually

00:09:36.030 --> 00:09:37.090
come close together.

00:09:37.090 --> 00:09:42.320
But if you add this term,
the electrostatic attraction

00:09:42.320 --> 00:09:43.160
will be much weaker.

00:09:43.160 --> 00:09:45.730
So it will simulate them
being solvated in water.

00:09:45.730 --> 00:09:47.760
And they will come
together eventually

00:09:47.760 --> 00:09:49.180
but probably a lot slower.

00:09:52.200 --> 00:09:54.040
Another thing that
is typically--

00:09:54.040 --> 00:09:56.320
so I'll talk a little bit
about the non-bonded forces

00:09:56.320 --> 00:09:59.110
right now, which are, again,
the electrostatic forces

00:09:59.110 --> 00:10:01.000
and the van-der-Waal's forces.

00:10:01.000 --> 00:10:03.310
So generally, what
has to be done

00:10:03.310 --> 00:10:06.110
is, if you consider
any single atom

00:10:06.110 --> 00:10:09.400
like say this one
over here, well,

00:10:09.400 --> 00:10:11.340
so first of all the
bonded forces basically

00:10:11.340 --> 00:10:13.990
include just the atoms that are
connected to it, and that's it.

00:10:13.990 --> 00:10:17.530
So it's usually the fastest
step in the computation.

00:10:17.530 --> 00:10:20.780
But for non-bonded forces,
you have to consider basically

00:10:20.780 --> 00:10:24.410
every other atom in the system.

00:10:24.410 --> 00:10:26.512
And I didn't include
all of the atoms here,

00:10:26.512 --> 00:10:27.970
but basically just
to give an idea,

00:10:27.970 --> 00:10:31.360
they could be atoms that
are really far apart.

00:10:31.360 --> 00:10:32.880
So generally,
what's done is there

00:10:32.880 --> 00:10:35.490
is some cut off radius,
outside of which

00:10:35.490 --> 00:10:36.840
those forces aren't considered.

00:10:36.840 --> 00:10:38.750
And because the forces
drop off really fast,

00:10:38.750 --> 00:10:40.760
sometimes that's considered
to be good enough

00:10:40.760 --> 00:10:42.340
for a lot of simulations.

00:10:42.340 --> 00:10:45.035
So that's a consideration
to take into account

00:10:45.035 --> 00:10:48.100
when doing these simulations.

00:10:48.100 --> 00:10:52.190
OK, so here's the basic
molecular dynamics algorithm.

00:10:52.190 --> 00:10:54.580
So you specify how many
steps you want to run it to.

00:10:54.580 --> 00:11:00.085
And each time step
is one femtosecond.

00:11:03.280 --> 00:11:04.010
So very small.

00:11:06.920 --> 00:11:10.720
So usually what happens is
you can find the atom pairs.

00:11:10.720 --> 00:11:12.990
So that means all the
non-bonded interactions,

00:11:12.990 --> 00:11:15.560
you have to make a list,
basically an n squared list.

00:11:15.560 --> 00:11:17.080
If you have n atoms,
you can expect

00:11:17.080 --> 00:11:19.680
to have about n
squared atom pairs.

00:11:19.680 --> 00:11:22.150
But if you consider a cut
off, then what's usually done

00:11:22.150 --> 00:11:27.110
is only atom pairs
within that cut off

00:11:27.110 --> 00:11:30.350
are considered and
included into that list.

00:11:30.350 --> 00:11:32.140
And that computation
is pretty expensive,

00:11:32.140 --> 00:11:36.790
d it's sometimes only done only
every, say every ten steps.

00:11:36.790 --> 00:11:39.060
And atoms don't
really move that much.

00:11:39.060 --> 00:11:40.990
So it's usually pretty accurate.

00:11:40.990 --> 00:11:44.639
And if you use a cut off of, say
15 angstroms, and then, well,

00:11:44.639 --> 00:11:46.930
the forces have dropped to
almost zero by 12 angstroms,

00:11:46.930 --> 00:11:48.190
then that's pretty accurate.

00:11:48.190 --> 00:11:52.180
So this is usually
an optimization.

00:11:52.180 --> 00:11:55.210
It optimizes the
simulation a little bit.

00:11:55.210 --> 00:11:57.250
The next step is to
compute the bonded forces.

00:11:57.250 --> 00:11:59.140
So that includes the
bonds, angles, dihedrals

00:11:59.140 --> 00:12:00.160
and improper angles.

00:12:03.120 --> 00:12:04.800
And then compute the
non-bonded forces,

00:12:04.800 --> 00:12:07.310
so that iterates over
all the atom pairs.

00:12:07.310 --> 00:12:09.990
And then the integration is
done to obtain the new atom

00:12:09.990 --> 00:12:11.400
positions.

00:12:11.400 --> 00:12:13.500
So here's, running
on the PlayStation,

00:12:13.500 --> 00:12:18.040
this is kind of roughly how
this is just running on the PPU

00:12:18.040 --> 00:12:19.000
alone.

00:12:19.000 --> 00:12:23.350
So finding the atom pairs,
if I don't use a cut off,

00:12:23.350 --> 00:12:25.405
I can just compute the
atom pairs list once,

00:12:25.405 --> 00:12:26.530
and then never do it again.

00:12:26.530 --> 00:12:29.130
So that's why I put here 0.

00:12:29.130 --> 00:12:32.070
But otherwise, generally
takes a lot of time

00:12:32.070 --> 00:12:33.790
to compute the atom
pairs, because it

00:12:33.790 --> 00:12:38.650
involves getting the distances
between every two atoms.

00:12:38.650 --> 00:12:46.350
And this is a pretty big
system, maybe about 1,000 atoms.

00:12:46.350 --> 00:12:47.670
So that's why.

00:12:47.670 --> 00:12:50.650
So these times are not really
representative of smaller

00:12:50.650 --> 00:12:51.772
systems.

00:12:51.772 --> 00:12:53.500
AUDIENCE: What is
BF [INAUDIBLE]?

00:12:53.500 --> 00:12:54.990
GREG PINTILIE:
BF, so that's done

00:12:54.990 --> 00:12:56.320
using the brute force approach.

00:12:56.320 --> 00:12:58.700
So that means that you
check every two atoms.

00:12:58.700 --> 00:13:02.990
And this is using a KD tree,
so that breaks space up.

00:13:06.120 --> 00:13:12.621
So then checking the
neighboring atoms is n log n.

00:13:12.621 --> 00:13:14.114
Or it's actually
a constant factor

00:13:14.114 --> 00:13:15.280
after you've built the tree.

00:13:15.280 --> 00:13:16.690
Building the tree is n log n.

00:13:16.690 --> 00:13:18.520
So using that kind
of an algorithm,

00:13:18.520 --> 00:13:20.740
you get much better improvement.

00:13:20.740 --> 00:13:25.680
So this is a scaling of
increasing the system size

00:13:25.680 --> 00:13:27.650
by two in the number of atoms.

00:13:27.650 --> 00:13:30.010
So say going from
1,000 to 2,000 atoms.

00:13:30.010 --> 00:13:34.670
The time too, using a
brute force approach,

00:13:34.670 --> 00:13:35.840
increases exponentially.

00:13:35.840 --> 00:13:39.780
Whereas, using a KD tree, it
actually just about doubles.

00:13:39.780 --> 00:13:44.334
So that kind of shows that using
a better algorithm sometimes

00:13:44.334 --> 00:13:46.000
for these things is
probably much better

00:13:46.000 --> 00:13:49.590
than trying to parallelize it.

00:13:49.590 --> 00:13:54.807
Doing the bonded forces,
say about 50 milliseconds,

00:13:54.807 --> 00:13:57.390
and with cut off it's the same
time, because it doesn't really

00:13:57.390 --> 00:13:58.700
affect the step.

00:13:58.700 --> 00:14:01.480
But then computing
the non-bonded forces

00:14:01.480 --> 00:14:03.640
is actually the
biggest chunk of time.

00:14:03.640 --> 00:14:07.870
So that takes about 1.4
seconds for the system.

00:14:07.870 --> 00:14:12.760
And then this is
considering a million pairs.

00:14:12.760 --> 00:14:15.090
With a cut off there is
less pairs to consider,

00:14:15.090 --> 00:14:18.622
but the time is still the
dominant time basically.

00:14:18.622 --> 00:14:20.580
So generally, for molecular
dynamic simulations

00:14:20.580 --> 00:14:22.620
this is the dominant
time factor.

00:14:22.620 --> 00:14:24.260
And that's what people
really focus on,

00:14:24.260 --> 00:14:25.990
trying to speed things up.

00:14:25.990 --> 00:14:28.460
And then the integration
is really fast,

00:14:28.460 --> 00:14:31.480
because it just loops
over the atoms once.

00:14:31.480 --> 00:14:34.370
So there is basically three
different parallelization

00:14:34.370 --> 00:14:37.440
approaches that I've come
across in the literature.

00:14:37.440 --> 00:14:40.950
And one of them is referred
to as force decomposition.

00:14:40.950 --> 00:14:44.420
So if you think of a
force operation being,

00:14:44.420 --> 00:14:47.850
say a non-bonded force operation
where you consider two atoms,

00:14:47.850 --> 00:14:51.080
and you compute the force
the two, van-der-Waals

00:14:51.080 --> 00:14:56.330
and electrostatic forces,
one easy way to split this up

00:14:56.330 --> 00:15:01.340
amongst, say six processors,
is to just give each processor,

00:15:01.340 --> 00:15:05.045
say n divided by 6 of
these force operations.

00:15:08.440 --> 00:15:12.170
So you have to send each
processor both atom positions,

00:15:12.170 --> 00:15:15.120
and then some other properties
of the atoms, such as the mass

00:15:15.120 --> 00:15:19.205
and-- actually not the mass,
but the charges and the minimum

00:15:19.205 --> 00:15:19.705
radius.

00:15:22.250 --> 00:15:24.950
And basically, once that
computation is done,

00:15:24.950 --> 00:15:28.090
the processor just has to
either store or return the force

00:15:28.090 --> 00:15:29.530
on both atom.

00:15:29.530 --> 00:15:32.514
And it has to come back to
sort of like the main processor

00:15:32.514 --> 00:15:33.930
where all the
forces are added up,

00:15:33.930 --> 00:15:35.304
and then the
integration is done.

00:15:39.920 --> 00:15:43.240
So what I just described is
sort of the approach I took.

00:15:43.240 --> 00:15:45.690
And it's sort of
an easy approach

00:15:45.690 --> 00:15:49.270
to sort of think of
if you couldn't store,

00:15:49.270 --> 00:15:51.350
say every atom on
every processor.

00:15:51.350 --> 00:15:53.620
Because you just have
to send each processor

00:15:53.620 --> 00:15:56.160
a certain number of these
force computation units.

00:15:56.160 --> 00:15:58.630
And the processor does then
and then returns the forces.

00:15:58.630 --> 00:16:01.480
And then you can just add up
all the forces on every atom

00:16:01.480 --> 00:16:03.110
and do that.

00:16:03.110 --> 00:16:06.210
So basically here is the
algorithm how it works roughly.

00:16:06.210 --> 00:16:10.187
And so basically,
first set up the SPUs,

00:16:10.187 --> 00:16:12.020
send the control box
with, say the addresses

00:16:12.020 --> 00:16:15.650
where a these force
operations will be stored.

00:16:15.650 --> 00:16:20.090
The bonded forces are
computed on the PPU,

00:16:20.090 --> 00:16:23.835
just for simplicity for now.

00:16:23.835 --> 00:16:25.960
And then the computation
for the non-bonded forces,

00:16:25.960 --> 00:16:30.890
basically, all the non-bonded
forces operations remaining

00:16:30.890 --> 00:16:31.987
are considered.

00:16:31.987 --> 00:16:33.570
And then they're
split up into blocks.

00:16:33.570 --> 00:16:35.980
So each SPU sent
a block with, say

00:16:35.980 --> 00:16:37.690
a number of force operations.

00:16:37.690 --> 00:16:40.480
And in this case it was 200.

00:16:40.480 --> 00:16:43.320
So the control block
is sent to the SPU,

00:16:43.320 --> 00:16:46.220
the SPU is told to
start processing.

00:16:46.220 --> 00:16:49.370
And then the PPU waits
for each SPU to finish.

00:16:49.370 --> 00:16:52.320
And then once the
SPU is finished,

00:16:52.320 --> 00:16:55.417
the forces are added to the
two atoms that are involved

00:16:55.417 --> 00:16:56.500
in that force computation.

00:16:59.100 --> 00:17:02.250
And then this loop is done
once all the SPUs are finished.

00:17:02.250 --> 00:17:03.740
And then it goes,
in this repeats

00:17:03.740 --> 00:17:06.700
until there is non-bonded
operations remaining.

00:17:06.700 --> 00:17:09.596
So for example here,
just to illustrate this,

00:17:09.596 --> 00:17:11.720
let's say these are all
the non-bonded operations I

00:17:11.720 --> 00:17:12.800
have to do.

00:17:12.800 --> 00:17:15.310
And I'm going to use two SPUs.

00:17:15.310 --> 00:17:18.289
And each SPU can only store,
say three force operations.

00:17:20.890 --> 00:17:22.730
So each SPU is sent by a block.

00:17:22.730 --> 00:17:24.520
I mean that, say
these three operations

00:17:24.520 --> 00:17:25.435
are sent to each SPU.

00:17:25.435 --> 00:17:28.560
The SPU does them.

00:17:28.560 --> 00:17:31.720
The next block is sent to the
next SPU at the same time,

00:17:31.720 --> 00:17:33.510
and then that SPU also does it.

00:17:33.510 --> 00:17:36.040
And this is iteration
0 for example.

00:17:36.040 --> 00:17:38.860
And this is done until all
of the force computations

00:17:38.860 --> 00:17:40.500
are done.

00:17:40.500 --> 00:17:41.590
So here's a timing.

00:17:41.590 --> 00:17:44.680
So you might imagine
that, as I said,

00:17:44.680 --> 00:17:49.540
there is there is a
lot of communication

00:17:49.540 --> 00:17:52.410
overhead over here.

00:17:52.410 --> 00:17:55.360
So here's how this performs.

00:17:55.360 --> 00:18:01.080
So on the PPU, the MD
simulation per time step,

00:18:01.080 --> 00:18:04.320
runs in 310 milliseconds.

00:18:04.320 --> 00:18:09.580
On one SPU, the time was
up to 630 milliseconds.

00:18:09.580 --> 00:18:12.350
So there is a lot of
communication overhead here.

00:18:12.350 --> 00:18:15.400
On two SPUs it goes
back down to 390.

00:18:15.400 --> 00:18:18.560
And as the number of
SPUs is increased,

00:18:18.560 --> 00:18:19.620
it gets lower and lower.

00:18:19.620 --> 00:18:23.310
But it's not that much
better than really running it

00:18:23.310 --> 00:18:24.400
sequentially.

00:18:24.400 --> 00:18:28.120
And the reason is that the way
I implemented is pretty rough.

00:18:28.120 --> 00:18:32.130
One of the limitations, I mean
one very obvious optimization

00:18:32.130 --> 00:18:36.190
that should be done
is that SPU basically

00:18:36.190 --> 00:18:38.350
waits for this whole
block to arrive,

00:18:38.350 --> 00:18:39.624
and then starts computation.

00:18:39.624 --> 00:18:41.040
But really what I
should have done

00:18:41.040 --> 00:18:47.090
is once one force
computation arrives,

00:18:47.090 --> 00:18:50.080
you can already do
that computation.

00:18:50.080 --> 00:18:52.320
Instead of waiting for all
of them to be transferred

00:18:52.320 --> 00:18:53.903
and then starting
to do the operation.

00:18:53.903 --> 00:18:56.540
So that I think that
could've been a lot better.

00:18:56.540 --> 00:18:57.534
AUDIENCE: [INAUDIBLE]

00:18:57.534 --> 00:18:58.031
GREG PINTILIE: No.

00:18:58.031 --> 00:18:58.906
AUDIENCE: [INAUDIBLE]

00:19:03.030 --> 00:19:03.780
GREG PINTILIE: No.

00:19:06.790 --> 00:19:11.830
So right, so this is sort
of the simplest approach,

00:19:11.830 --> 00:19:14.670
I mean sort of a simple
approach to implement

00:19:14.670 --> 00:19:15.620
on this architecture.

00:19:15.620 --> 00:19:18.120
And it sort of scales
well as the system grows.

00:19:18.120 --> 00:19:20.250
So if you can't store
every item, for example,

00:19:20.250 --> 00:19:25.660
in every SPU, this
doesn't really care.

00:19:25.660 --> 00:19:28.177
The system can grow
as large as you want.

00:19:28.177 --> 00:19:29.760
There's some other
approaches that are

00:19:29.760 --> 00:19:31.270
done sort of in the literature.

00:19:31.270 --> 00:19:34.070
I didn't implement
these, but another one

00:19:34.070 --> 00:19:35.410
is called atomic decomposition.

00:19:35.410 --> 00:19:37.570
So here, if you have,
say a number of atoms,

00:19:37.570 --> 00:19:41.140
you send each one of
these atoms to one SPU.

00:19:41.140 --> 00:19:45.370
And then that SPU is solely
responsible for computing

00:19:45.370 --> 00:19:49.250
the forces on it,
and also integrating

00:19:49.250 --> 00:19:50.550
to get the new positions.

00:19:50.550 --> 00:19:52.060
And then all you
have to communicate

00:19:52.060 --> 00:19:53.471
is the atomic positions.

00:19:53.471 --> 00:19:55.220
But this is a little
bit more complicated,

00:19:55.220 --> 00:20:00.500
because then every
SPU has to know

00:20:00.500 --> 00:20:02.670
about all of the
force computations

00:20:02.670 --> 00:20:04.190
that you want to do.

00:20:04.190 --> 00:20:06.610
So for example, in this
case it's just a non-bonded,

00:20:06.610 --> 00:20:10.151
but the SPU also
has to know about--

00:20:10.151 --> 00:20:12.151
AUDIENCE: So that would
be like the same example

00:20:12.151 --> 00:20:14.520
that we did in [INAUDIBLE].

00:20:14.520 --> 00:20:18.552
GREG PINTILIE:
Yeah, it would be.

00:20:18.552 --> 00:20:20.260
Yeah, the only problem
with that approach

00:20:20.260 --> 00:20:23.220
is that if the system was
much larger than the SPU could

00:20:23.220 --> 00:20:28.450
store, and you couldn't store
every single atom on the SPU,

00:20:28.450 --> 00:20:34.245
then an operation might refer
to atoms that are not there.

00:20:37.170 --> 00:20:39.010
So yeah, it would
be similar to that.

00:20:39.010 --> 00:20:40.950
But it would get more
complicated in this case

00:20:40.950 --> 00:20:42.450
as a system grew,
because you'd have

00:20:42.450 --> 00:20:44.650
to keep track of where
each atom is and then

00:20:44.650 --> 00:20:45.870
implement communication.

00:20:45.870 --> 00:20:48.120
If you don't have
a certain atom,

00:20:48.120 --> 00:20:49.840
to get to that atom
position and use it

00:20:49.840 --> 00:20:52.020
in the force computation.

00:20:52.020 --> 00:20:56.230
So yeah, this approach, OK,
I'll talk a little bit more

00:20:56.230 --> 00:20:56.735
about this.

00:20:56.735 --> 00:20:57.931
This is a--

00:20:57.931 --> 00:21:03.710
AUDIENCE: [INAUDIBLE] I mean
not SPU, but [INAUDIBLE].

00:21:03.710 --> 00:21:05.257
GREG PINTILIE:
That's true, yeah.

00:21:05.257 --> 00:21:07.340
So you only sort of give
a certain number of atoms

00:21:07.340 --> 00:21:09.860
to every SPU, and then, yeah.

00:21:09.860 --> 00:21:10.360
That's true.

00:21:10.360 --> 00:21:12.940
And then you would have to sort
of store every single force

00:21:12.940 --> 00:21:17.243
that that atom is
included in on the SPU.

00:21:17.243 --> 00:21:18.118
AUDIENCE: [INAUDIBLE]

00:21:21.790 --> 00:21:26.390
GREG PINTILIE: Yeah, like when
this will become a problem is,

00:21:26.390 --> 00:21:29.530
say the system was much larger
than you had local store.

00:21:29.530 --> 00:21:34.130
Then say you had like
1/10 of the system on SPUs

00:21:34.130 --> 00:21:35.900
and then 9/10 of the
system on the PPU.

00:21:35.900 --> 00:21:38.404
AUDIENCE: [INAUDIBLE]

00:21:38.404 --> 00:21:40.320
GREG PINTILIE: Yeah, it
would be hard to think

00:21:40.320 --> 00:21:41.278
of it scaling properly.

00:21:44.950 --> 00:21:47.640
So a special case of
this atomic decomposition

00:21:47.640 --> 00:21:49.370
is actually spatial
decomposition.

00:21:49.370 --> 00:21:51.184
OK, I'm almost done.

00:21:51.184 --> 00:21:52.975
So a special case of
spatial decomposition.

00:21:55.710 --> 00:21:58.510
So what makes this
approach sort of not good,

00:21:58.510 --> 00:22:01.510
and it sort of involves
a lot of communication,

00:22:01.510 --> 00:22:04.920
is just this fact that some
of these force computations,

00:22:04.920 --> 00:22:07.130
or force operations that
you have could include atoms

00:22:07.130 --> 00:22:08.320
on different units.

00:22:08.320 --> 00:22:11.130
So if you don't split up
these atoms sort of smartly,

00:22:11.130 --> 00:22:13.900
then you could end up
having to communicate a lot

00:22:13.900 --> 00:22:15.330
while computing these forces.

00:22:15.330 --> 00:22:17.410
So spatial decomposition
tries to address this.

00:22:17.410 --> 00:22:21.590
And basically, you store atoms
that are close to each other.

00:22:21.590 --> 00:22:25.830
So say I had the system, and
I wanted to split this up

00:22:25.830 --> 00:22:30.315
on six SPUs, then right,
I would give each SPU,

00:22:30.315 --> 00:22:31.834
say a certain area of space.

00:22:31.834 --> 00:22:34.250
But you could see right away
what the problem with this is

00:22:34.250 --> 00:22:36.010
is that it's not load balanced.

00:22:36.010 --> 00:22:39.749
So some SPUs could not
have too much work to do.

00:22:39.749 --> 00:22:41.790
And there's still some
communication, because you

00:22:41.790 --> 00:22:45.060
have to communicate
between neighboring SPUs,

00:22:45.060 --> 00:22:46.770
say some certain cases.

00:22:46.770 --> 00:22:48.620
But the communication
is still probably

00:22:48.620 --> 00:22:51.000
a lot better than in
the previous case,

00:22:51.000 --> 00:22:52.250
just for atomic decomposition.

00:22:55.230 --> 00:22:58.910
So here's the state of the
art on parallelization MD.

00:22:58.910 --> 00:23:01.260
And it's implemented
in this program called

00:23:01.260 --> 00:23:05.340
NAMD, which stands for Not
Another Molecular Dynamics

00:23:05.340 --> 00:23:06.600
simulator.

00:23:06.600 --> 00:23:12.630
So initially, NAMD 1.0 used
solely a spatial decomposition.

00:23:12.630 --> 00:23:14.472
And they try to address
this load balancing

00:23:14.472 --> 00:23:16.430
by actually creating many
more patches than you

00:23:16.430 --> 00:23:18.870
have processors.

00:23:18.870 --> 00:23:22.260
And then you just sort of
split these patches up,

00:23:22.260 --> 00:23:24.650
load balance them amongst
the different processors.

00:23:24.650 --> 00:23:25.650
That didn't really work.

00:23:25.650 --> 00:23:28.100
So NAMD 1 didn't
parallelize very well.

00:23:28.100 --> 00:23:30.580
So what they added
later was they

00:23:30.580 --> 00:23:33.130
added this idea
of patch objects,

00:23:33.130 --> 00:23:34.610
and also compute objects.

00:23:34.610 --> 00:23:38.710
So basically a patch
object puts a number

00:23:38.710 --> 00:23:40.346
of patches on a
processor, but then you

00:23:40.346 --> 00:23:43.550
can also create compute
patches, which include

00:23:43.550 --> 00:23:45.450
all of these force operations.

00:23:45.450 --> 00:23:47.760
And these force operations
are also equally distributed

00:23:47.760 --> 00:23:48.680
amongst processors.

00:23:48.680 --> 00:23:50.180
And then there's a
lot communication

00:23:50.180 --> 00:23:51.030
between these two.

00:23:51.030 --> 00:23:56.410
And it started off with sort
message oriented communication.

00:23:56.410 --> 00:23:59.330
Then it moved to a dependency
oriented communication

00:23:59.330 --> 00:24:01.810
so that each compute
object would sort

00:24:01.810 --> 00:24:05.760
of signal which atoms it needs.

00:24:05.760 --> 00:24:09.306
And it also sort
of optimized things

00:24:09.306 --> 00:24:12.770
by doing communication and force
computation at the same time.

00:24:12.770 --> 00:24:14.160
So it did that really well.

00:24:14.160 --> 00:24:17.170
And actually it had really
good parallel performance.

00:24:17.170 --> 00:24:21.450
So on this graph it shows
the number of processors

00:24:21.450 --> 00:24:23.560
times the time per time step.

00:24:23.560 --> 00:24:28.800
And as the number of
processors is increased,

00:24:28.800 --> 00:24:30.290
I mean optimal
parallel performance

00:24:30.290 --> 00:24:33.040
would be a straight horizontal
line, which is pretty

00:24:33.040 --> 00:24:34.520
close to what they achieve.

00:24:34.520 --> 00:24:36.830
And they also did it on
many thousands of processors

00:24:36.830 --> 00:24:37.520
on Blue Gene.

00:24:37.520 --> 00:24:40.080
But as you might expect,
the more processors

00:24:40.080 --> 00:24:44.600
you add, the step time actually
still sort of levels off

00:24:44.600 --> 00:24:46.290
at a certain time.

00:24:46.290 --> 00:24:50.864
So I guess I'll skip the sort
of live demo I was going to do,

00:24:50.864 --> 00:24:52.280
but I'll just show
one more thing.

00:24:55.930 --> 00:25:00.494
So here is sort of the kind of
systems that I was looking at.

00:25:00.494 --> 00:25:01.910
This is actually
a simulation that

00:25:01.910 --> 00:25:05.560
was done on one of the
PPUs and then recorded

00:25:05.560 --> 00:25:07.060
in positions file.

00:25:07.060 --> 00:25:13.610
And there is five
different molecules here.

00:25:13.610 --> 00:25:15.640
So generally, what happens
is first they're just

00:25:15.640 --> 00:25:16.460
placed randomly.

00:25:16.460 --> 00:25:18.700
Then they're an overlap
just very coarsely,

00:25:18.700 --> 00:25:22.360
to make sure that
huge forces are not

00:25:22.360 --> 00:25:23.507
entered into the system.

00:25:23.507 --> 00:25:25.215
And I'll just color
each one differently.

00:25:27.800 --> 00:25:30.960
So here's what happens
during the simulation.

00:25:30.960 --> 00:25:38.310
And here, in this simulation,
actually, the positions

00:25:38.310 --> 00:25:40.460
were recorded every
20 time steps.

00:25:40.460 --> 00:25:44.070
So that's why it sort of
seems to go a lot faster.

00:25:44.070 --> 00:25:46.646
If I did it on a
single molecule,

00:25:46.646 --> 00:25:48.385
I actually done that first.

00:25:48.385 --> 00:25:50.630
So here what a single
molecule of this looks like.

00:25:50.630 --> 00:25:54.130
And this is a protein with eight
residues, which is pretty big.

00:25:54.130 --> 00:25:56.680
And this is running in real
time on my computer right now.

00:25:56.680 --> 00:26:03.020
So like on a small system like
this, it runs reasonably fast.

00:26:03.020 --> 00:26:06.630
But as you add more and
more of these things,

00:26:06.630 --> 00:26:08.700
the computations do
get pretty expensive.

00:26:08.700 --> 00:26:12.930
So parallelization really helps.

00:26:12.930 --> 00:26:16.634
Well, not in what I
implemented, but generally.

00:26:16.634 --> 00:26:19.990
Generally, as you saw, NAMD
can do a pretty good job

00:26:19.990 --> 00:26:22.104
at doing it.

00:26:22.104 --> 00:26:25.400
Oh OK, I didn't minimize
the system well enough.

00:26:25.400 --> 00:26:27.524
But yeah, that's about it.

00:26:27.524 --> 00:26:28.690
AUDIENCE: Great. [INAUDIBLE]

00:26:35.550 --> 00:26:37.510
AUDIENCE: So
basically, on the SPUs,

00:26:37.510 --> 00:26:40.940
you managed to get them
synchronized properly, data in

00:26:40.940 --> 00:26:42.372
and data out?

00:26:42.372 --> 00:26:43.580
GREG PINTILIE: Yeah, exactly.

00:26:43.580 --> 00:26:45.537
So that's kind of to
the extent that I went.

00:26:45.537 --> 00:26:46.870
And that still took a long time.

00:26:46.870 --> 00:26:48.670
So [INAUDIBLE].