Attachment C3

Recent Experiences with the Cray T3E-600
at the SDSC

Joel E. Tohline, John Cazes, and Patrick Motl

Scalability of the CFD Code

Table 1 details the execution times of the HPF version of our CFD code on various configurations of the T3E and on a variety of different problem sizes. As the table illustrates, we realize almost perfect linear speedup as we move from 2 nodes to 128 nodes on the T3E, as long as the size of our problem doubles each time the number of nodes is doubled. This represents significantly better scaling than we previously have been able to achieve on the SP-2.
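The scaling behavior described above is weak scaling: the problem size grows with the node count, so ideal performance keeps the time per step constant. A minimal sketch of the corresponding efficiency calculation follows; the function and the example numbers are illustrative assumptions, not values taken from the tables.

```python
def weak_scaling_efficiency(t_base, t_scaled):
    """Efficiency of a weak-scaling run.

    Because the problem size grows in proportion to the node count,
    perfect linear speedup keeps the time per integration timestep
    constant, giving an efficiency of 1.0.
    """
    return t_base / t_scaled

# Hypothetical example: if a 2-node run takes 10.0 s per timestep on the
# base grid and a 128-node run takes 10.4 s per timestep on a grid 64x
# larger, the weak-scaling efficiency is about 0.96.
eff = weak_scaling_efficiency(10.0, 10.4)
print(round(eff, 2))
```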

Table 2 details the execution times of the MPI version of our CFD code on various configurations of the T3E and on a variety of different problem sizes. As the table illustrates, we again realize almost perfect linear speedup as we move from 2 nodes to 128 nodes on the T3E, as long as the size of our problem doubles each time the number of nodes is doubled.

Table 3 provides a brief comparison between the numbers in Table 1 and the numbers in Table 2, showing at a glance how much the execution time of our CFD code has been improved by the change from HPF to MPI. To derive the numbers shown in Table 3, we divided the red numbers along the diagonal of Table 1 by the corresponding numbers along the diagonal of Table 2, and adjusted each ratio to account for the fact that the grid sizes in the two tables are not identical.
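The grid-size adjustment above can be sketched as follows: divide the HPF time by the MPI time and normalize by the number of grid cells, so the ratio compares cost per cell. The per-cell normalization is an assumption about how the adjustment was done, and the input values are hypothetical, not taken from the tables.

```python
def adjusted_speedup(t_hpf, cells_hpf, t_mpi, cells_mpi):
    """HPF-to-MPI speedup ratio, normalized per grid cell.

    A ratio greater than 1 means the MPI version is faster per cell,
    even when the two runs used different grid resolutions.
    """
    per_cell_hpf = t_hpf / cells_hpf
    per_cell_mpi = t_mpi / cells_mpi
    return per_cell_hpf / per_cell_mpi

# Hypothetical example: an 8.0 s/step HPF run on a 128 x 64 x 64 grid
# versus a 5.0 s/step MPI run on a 128 x 128 x 64 grid (twice the cells).
ratio = adjusted_speedup(8.0, 128 * 64 * 64, 5.0, 128 * 128 * 64)
print(round(ratio, 1))
```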

Scalability of the Gravitational CFD Code

Table 4 details our execution times on various configurations of the T3E and on a variety of different problem sizes, precisely as reported in Table 2, but here the timings include a solution of the global Poisson equation along with the CFD code.

a To obtain the execution times reported in this table, the hydrocode was run for 200 integration timesteps utilizing the grid resolution specified at the top of each column of the table.

b For comparison (see Table 1 for details), running the same size problem (128 × 64²) on a single node of a Cray Y-MP requires 13.30 cpu seconds per integration timestep.

c For comparison (see Table 1 for details), running the same size problem (128 × 64²) on a single node of a Cray C90 requires 4.01 cpu seconds per integration timestep, and on an 8,192-node MasPar MP-1 requires 4.74 cpu seconds per integration timestep.