Tohline: Real-Time Analysis

Joel E. Tohline

Alumni Professor
Department of Physics & Astronomy
Louisiana State University

Real-Time Analysis

Let's outline a simulation scenario that would permit us to view and analyze results in real time while sitting, for example, in a CAVE visualization environment. We'll base our estimates on the current throughput of our hydrocode (FLOW•ER) running on LSU's 'SuperMike' or NCSA's 'tungsten' Linux clusters.

Binary Mass-Transfer Simulation

Preface: We are accustomed to generating animation sequences that contain 120 image frames per binary orbit. Because each animation sequence plays at 30 frames/second, the binary system completes each orbit in 4 seconds of viewing time. This seems to be a good movie pace when the simulation is viewed from a frame that is rotating with the frequency of the orbit because, in this rotating frame of reference, not much action happens on timescales shorter than a few orbits. However, at present, it does not appear to be feasible to produce a movie of this type (4 seconds per orbit) in real time.

Instead, let's consider viewing the simulation from an inertial frame of reference. In this case the system as a whole undergoes a great deal of change during each orbit (each star wanders all the way around its orbit), so the "movie" should be interesting even if we stretch it out so that a single orbit requires one minute (60 seconds) to complete. Therefore, the first question is, "Can the hydrodynamic simulation be carried out at a pace where a single orbit is completed in one minute of wall-clock time?"

According to Mario D'Souza, a simulation (of a q = 0.4 binary system) conducted with a grid resolution of 256³ on 256 SuperMike processors requires one second of wall-clock time to complete one integration time step, and 10⁵ time steps are required to push the system through one binary orbit. This means that each orbit requires 10⁵ wall-clock seconds, that is, about 28 wall-clock hours (!) or approximately 1700 minutes. This is way too slow for real-time analysis! In order to get down to approximately 1 minute of wall-clock time per orbit, we need to speed up the simulation by a factor of approximately 1700. Let's do this as follows:
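The arithmetic behind this baseline can be checked with a few lines of Python. This is only a sketch: the variable names are ours, chosen for illustration, and the numbers are the ones quoted above.

```python
# Baseline throughput quoted above: 256^3 grid on 256 SuperMike processors.
seconds_per_step = 1.0           # wall-clock seconds per integration time step
steps_per_orbit = 10**5          # time steps needed for one binary orbit
target_seconds_per_orbit = 60.0  # goal: one orbit per minute of wall-clock time

baseline_seconds = seconds_per_step * steps_per_orbit
baseline_hours = baseline_seconds / 3600.0
baseline_minutes = baseline_seconds / 60.0
required_speedup = baseline_seconds / target_seconds_per_orbit

print(f"one orbit: {baseline_hours:.1f} hours = {baseline_minutes:.0f} minutes")
print(f"required speedup: about {required_speedup:.0f}x")
```

The unrounded values are about 27.8 hours and a speedup near 1670, which the text rounds to 28 hours and 1700.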

Run the simulation on the new IBM P575 system; this will give us a factor of 5 speedup for the same number of processors (i.e., 256).

Run on all five LONI machines in sync with LSU's "pelican" system; this will give us approximately 600 processors (compared to 256), which will give us another factor of 2.3.

If we decrease the grid resolution by a factor of two in each direction, FLOW•ER should complete each time step 2³ = 8 times faster, and the binary system should require half as many time steps to complete each orbit because each time step will be twice as large (due to a doubling of the Courant time). Each halving of the resolution therefore gains a factor of 8 × 2 = 16. So, let's drop the grid resolution by a factor of 4 (instead of just 2) in each direction, from 256³ to 64³, and we should gain a factor of 16² = 256.

Based on these estimates, a simulation that is run with a grid resolution of 64³ on 600 processors of the LONI machines should speed up by a factor of approximately (5 × 2.3 × 256) = 2944. This is more than what we need (or it provides a nice margin of error for our estimates). Such a simulation should require only about 35 wall-clock seconds to complete one binary orbit (and each orbit will require about 10⁵/4 = 25,000 time steps).
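The speedup budget can be tallied in a short Python sketch. The function and variable names are ours, and the model assumes ideal scaling throughout: perfect parallel efficiency, work per step proportional to the number of cells, and a time step proportional to the cell size. Real throughput will likely fall short of this.

```python
def resolution_speedup(coarsening: int) -> int:
    """Ideal speedup from coarsening the grid by `coarsening` in each direction.

    Work per time step falls as coarsening**3 (fewer cells), and the number
    of time steps per orbit falls as coarsening (larger Courant time step).
    """
    return coarsening**3 * coarsening

hardware = 5.0                       # faster processors, same processor count
processors = round(600 / 256, 1)     # 600 processors instead of 256 -> 2.3
resolution = resolution_speedup(4)   # 256^3 -> 64^3 gives 256

total_speedup = hardware * processors * resolution
seconds_per_orbit = 10**5 / total_speedup   # baseline is 10^5 seconds per orbit
steps_per_orbit = 10**5 // 4                # time step is 4 times larger

print(f"total speedup: {total_speedup:.0f}")
print(f"wall-clock seconds per orbit: {seconds_per_orbit:.0f}")
print(f"time steps per orbit: {steps_per_orbit}")
```

With these inputs the sketch gives about 34 seconds per orbit, consistent with the "about 35" quoted above.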

Things to consider: Each LONI-machine processor will contain approximately 64³/600 = 440 fluid grid cells; will this fit entirely into cache? Should we reconsider how data domain decomposition is done? Will we be killed by the "transpose" step inside the Poisson solver?
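To get a feel for the cache question, here is a rough footprint estimate in Python. The number of stored fields per cell and the 8-byte word size are our assumptions for illustration only (the text does not say how many variables FLOW•ER carries per cell), and ghost zones are ignored.

```python
cells_total = 64**3
n_processors = 600
cells_per_processor = cells_total / n_processors   # about 437

# ASSUMPTIONS (not from the text): 10 stored fields per cell, each an
# 8-byte double-precision value; ghost zones and work arrays are ignored.
fields_per_cell = 10
bytes_per_value = 8

kib_per_processor = cells_per_processor * fields_per_cell * bytes_per_value / 1024

print(f"cells per processor: {cells_per_processor:.0f}")
print(f"approximate footprint: {kib_per_processor:.0f} KiB")
```

Under these assumptions each processor holds only a few tens of KiB of fluid data. Whether that fits in cache depends on the actual field count, the ghost-zone layout, and the cache sizes of the target machines.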

Passing Data to the CAVE

If the "image" that is produced for the CAVE is updated 30 times each second, each binary orbit will then require the production of (35 × 30) = 1050, or approximately 1000, images; that is, an image must be constructed by FLOW•ER approximately every 25 time steps (25,000 time steps per orbit divided by roughly 1000 images). Since the "image" that will be sent to the CAVE is actually VRML data, this means that Wes Even's marching cubes algorithm will be called approximately once every 25 integration time steps.
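The image cadence follows from the orbit estimates above. A minimal Python check, with variable names of our own choosing:

```python
seconds_per_orbit = 35        # estimated wall-clock time per orbit
frames_per_second = 30        # CAVE update rate
steps_per_orbit = 10**5 // 4  # 25,000 time steps at 64^3 resolution

images_per_orbit = seconds_per_orbit * frames_per_second
steps_per_image = steps_per_orbit / images_per_orbit

print(f"images per orbit: {images_per_orbit}")
print(f"time steps between images: {steps_per_image:.1f}")
```

The unrounded result is just under 24 time steps per image, which the text rounds to 25.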

How many vertices will be generated by each processor? Well, typically each isodensity surface requires 10⁴ to 10⁵ vertices, which means that, on average, each of the 600 processors will need to generate roughly 17 to 170 vertices. (Actually, the load balance is not likely to be good because of surface-to-volume issues.)
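The per-processor vertex counts, together with a rough aggregate data rate, can be sketched as follows. The bytes-per-vertex figure is our assumption (three 4-byte floats per vertex); VRML is a text format, so the actual volume on the wire would likely be larger.

```python
n_processors = 600
vertices_low, vertices_high = 10**4, 10**5   # vertices per isodensity surface

per_processor_low = vertices_low / n_processors
per_processor_high = vertices_high / n_processors

# ASSUMPTION (not from the text): each vertex is three 4-byte floats (x, y, z),
# with no normals, connectivity, or VRML text overhead.
bytes_per_vertex = 3 * 4
frames_per_second = 30
mb_per_second_high = vertices_high * bytes_per_vertex * frames_per_second / 1e6

print(f"vertices per processor: {per_processor_low:.0f} to {per_processor_high:.0f}")
print(f"aggregate rate for the largest surface: {mb_per_second_high:.0f} MB/s")
```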

After discussions with Richard Muffoletto, it seems that the most efficient way to get these vertices to the CAVE is to let each processor send its information directly across the LONI network asynchronously, rather than waiting for all of the processors to finish and gathering the data together into one location before transferring the data to the CAVE. An asynchronous transmission will be beneficial because (a) it will keep the network active a larger fraction of the time, and (b) it will take advantage of the fact that the processors will finish the vertex-generation step at different times because they will have varying numbers (from 0 to several hundred) of vertices to create.
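A toy model illustrates point (a). It is only a sketch: the finish and transfer times below are hypothetical, the time units are arbitrary, the network is treated as a single shared link that carries one transfer at a time, and the cost of the gather step itself is ignored.

```python
def gather_then_send(finish_times, transfer_times):
    """Wait for every processor to finish, then push all data across the link."""
    return max(finish_times) + sum(transfer_times)

def send_as_ready(finish_times, transfer_times):
    """Each processor sends as soon as it finishes; the link serves one at a time."""
    link_free = 0.0
    for finish, transfer in sorted(zip(finish_times, transfer_times)):
        start = max(link_free, finish)   # wait for the link if it is busy
        link_free = start + transfer
    return link_free

# Hypothetical example: four processors finishing at different times.
finish = [0.0, 1.0, 2.0, 5.0]
transfer = [1.0, 1.0, 1.0, 1.0]

print("gather then send:", gather_then_send(finish, transfer))
print("send as ready:   ", send_as_ready(finish, transfer))
```

In this model the asynchronous scheme never finishes later than the gather-then-send scheme, because early finishers use the link while the slower processors are still working.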

We might also consider the following: The CAVE will need a new surface every 1/30th of a second. This means that it has 1/30th of a second over which to gather the new set of vertices together, during which time FLOW•ER will take approximately 25 time steps. So instead of calling Wes Even's program only once every 25 time steps, why not have 1/25th of the processors (on each LONI machine) call Wes Even's program every time step (of course, a different subset of processors will be activated each time)? This will cut down on network contention within the LONI machines (with roughly 100 processors per machine, only about 4 processors on each machine will be sending data down the network each time step!).
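One simple way to pick a different subset of processors each time step is a round-robin rule on the processor rank. The sketch below is our illustration, not part of FLOW•ER; it assumes roughly 100 processors per machine (600 processors spread over six machines) and a 25-step cycle.

```python
def active_ranks(step, ranks_per_machine=100, period=25):
    """Ranks on one machine that run the surface extraction on this time step.

    A rank is active when its index matches the step number modulo `period`,
    so every rank is active exactly once in each cycle of `period` steps.
    """
    return [rank for rank in range(ranks_per_machine)
            if rank % period == step % period]

for step in range(3):
    print(step, active_ranks(step))
```

With 100 ranks and a period of 25, four ranks are active on every step. In practice one might prefer to group spatially adjacent subdomains in the same subset, but that choice is beyond this sketch.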

And, as Richard Muffoletto points out, it is not absolutely essential that a completely new iso-surface be available for the CAVE to display every 1/30th of a second. The surface will not change its shape very rapidly, so partial (incomplete) updates probably won't interfere with the viewer's perception that things are changing gradually. The CAVE client will simply need to keep track of which one of the 600 LONI processors has just sent a new set of vertices so that it can replace that subset. (Overall synchronization of the image will occur naturally because the fluid flow must already be synchronized within FLOW•ER every integration time step!)
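On the client side, this bookkeeping could be as simple as a table keyed by processor rank. The class below is a hypothetical sketch of that idea (the name SurfaceCache and its methods are ours, not part of any CAVE software): each incoming patch replaces the previous patch from the same processor, and the displayed surface is the union of the latest patches.

```python
class SurfaceCache:
    """Holds the most recent vertex patch received from each processor."""

    def __init__(self):
        self._patches = {}   # rank -> list of (x, y, z) vertices

    def update(self, rank, vertices):
        """Replace the patch previously sent by this processor."""
        self._patches[rank] = list(vertices)

    def vertices(self):
        """All vertices currently on display, ordered by processor rank."""
        return [v for rank in sorted(self._patches) for v in self._patches[rank]]

cache = SurfaceCache()
cache.update(0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
cache.update(1, [(0.0, 1.0, 0.0)])
cache.update(0, [(0.5, 0.0, 0.0)])   # newer patch from rank 0 replaces the old one

print(cache.vertices())
```

A processor whose subdomain no longer intersects the surface would send an empty patch, which clears its old vertices from the display.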

— forecast made in July, 2005