Attachment C3

Recent Experiences with the Cray T3E_600
at the SDSC

Joel E. Tohline, John Cazes, and Patrick Motl

Department of Physics & Astronomy
Louisiana State University


Scalability of the CFD Code

Performance Measures using the Portland Group HPF Compiler (PGHPF):

Table 1 details our execution times on various configurations of the T3E for a variety of problem sizes. As the table illustrates, we realize almost perfect linear speedup as we move from 2 nodes to 128 nodes on the T3E, as long as the size of our problem doubles each time the number of nodes in use is doubled (see the short check following Table 1). This represents significantly better scaling than we have previously been able to achieve on the SP-2.

Table 1

CFD Code Timings on the SDSC T3E_600^a
Using PGHPF
(seconds per integration timestep)

Nodes      64³   128×64²   128²×64      128³   256×128²   256²×128     256³
    2    26.60        --        --        --         --         --       --
    4    12.38     22.96        --        --         --         --       --
    8     6.98   12.49^b     23.75        --         --         --       --
   16     3.84      7.18     13.21     23.89         --         --       --
   32     2.07    3.75^c      6.97     12.49      24.09         --       --
   64       --        --      3.86      7.07      12.69         --       --
  128       --        --        --      3.95         --      13.54    24.64
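
As a quick check of this weak-scaling behavior, the short C program below tabulates the diagonal entries of Table 1 (along which the problem size doubles whenever the node count doubles; the 64-node diagonal entry is not available) and reports each timing relative to the 2-node baseline. This is only an illustrative post-processing sketch: the timings are copied by hand from Table 1, and the "efficiency" definition (the 2-node time divided by the P-node time) is our own choice, not taken from the report.

/* Weak-scaling check along the diagonal of Table 1 (PGHPF timings).
 * Along the diagonal the problem size doubles each time the node count
 * doubles, so ideal scaling would keep the seconds per timestep constant. */
#include <stdio.h>

int main(void)
{
    const int    nodes[]   = { 2, 4, 8, 16, 32, 128 };   /* 64-node entry absent */
    const double seconds[] = { 26.60, 22.96, 23.75, 23.89, 24.09, 24.64 };
    const int    n = (int)(sizeof nodes / sizeof nodes[0]);

    for (int i = 0; i < n; ++i)
        printf("%4d nodes: %6.2f s/step, weak-scaling efficiency %.2f\n",
               nodes[i], seconds[i], seconds[0] / seconds[i]);
    return 0;
}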



Performance Measures using MPI:

Table 2 details our execution times on various configurations of the T3E for a variety of problem sizes, now using MPI. As the table illustrates, we realize almost perfect linear speedup as we move from 4 nodes to 128 nodes on the T3E, as long as the size of our problem doubles each time the number of nodes in use is doubled (a sketch of the timing procedure follows Table 2).

Table 2

CFD Code Timings on the SDSC T3E_600^a
Using MPI
(seconds per integration timestep)

Nodes   66²×64   66²×128   130×66×128   130²×128   130²×256   258×130×256   258²×256   258²×512
    4    2.456     4.945        9.212         --         --            --         --         --
    8    1.468   2.978^b        5.078      10.07         --            --         --         --
   16    0.775     1.630        2.711      5.211      11.37            --         --         --
   32    0.471   0.968^c        1.584      3.027      6.573         11.32         --         --
   64       --        --        0.878      1.617      3.493         5.983      11.40         --
  128       --        --           --      0.968      2.057         3.453      6.548      15.19
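
The per-timestep figures in this table come from 200-timestep runs (see footnote a). The C sketch below illustrates one way such a measurement can be made with MPI, timing the main loop with MPI_Wtime and reporting the slowest node; the routine advance_one_timestep() is a hypothetical stand-in for the hydrocode's update, and the sketch is not the code that produced these numbers.

/* Minimal sketch of how per-timestep timings such as those in Table 2
 * can be collected with MPI.  Only the 200-step run length (footnote a)
 * is taken from the report; everything else is illustrative. */
#include <stdio.h>
#include <mpi.h>

static void advance_one_timestep(void)
{
    /* ... hydrodynamic update on this node's portion of the grid ... */
}

int main(int argc, char **argv)
{
    const int nsteps = 200;            /* run length used for the timings */
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);       /* start all nodes together */
    double t0 = MPI_Wtime();

    for (int step = 0; step < nsteps; ++step)
        advance_one_timestep();

    double elapsed = MPI_Wtime() - t0;
    double slowest = 0.0;              /* report the slowest node's time */
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d nodes: %.3f seconds per integration timestep\n",
               nprocs, slowest / nsteps);

    MPI_Finalize();
    return 0;
}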



Speedup of MPI over PGHPF:

Table 3 provides a brief comparison between the numbers in Table 1 and the numbers in Table 2, in order to show at a glance how much the execution time of our CFD code has been improved by changing from HPF to MPI. To derive the numbers shown in Table 3, we have divided the entries along the diagonal of Table 1 by the corresponding entries along the diagonal of Table 2, and have adjusted each ratio to account for the fact that the grid sizes are not identical in the two tables (a worked illustration of this adjustment follows Table 3).

Table 3

CFD Code Timings on the SDSC T3E_600^a
Ratio of PGHPF timings to MPI timings

Nodes    64³   128×64²   128²×64   128³   256×128²   256²×128   256³
    4   5.36        --        --     --         --         --     --
    8     --      4.46        --     --         --         --     --
   16     --        --      5.10     --         --         --     --
   32     --        --        --   4.26         --         --     --
   64     --        --        --     --       3.75         --     --
  128     --        --        --     --         --       4.01     --
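
Our reading of the grid-size adjustment described above is that each timing ratio is multiplied by the ratio of total grid zones in the MPI and PGHPF problems (the MPI grid dimensions being slightly larger, e.g. 66 versus 64); under that assumption the tabulated values are recovered, e.g. at 4 nodes (12.38/2.456) × (66×66×64)/(64×64×64) ≈ 5.36. The C sketch below, with the diagonal entries of Tables 1 and 2 copied in by hand, illustrates the calculation; it is our reconstruction, not code from this report.

/* Reconstruction of the Table 3 ratios: divide the diagonal PGHPF timing
 * (Table 1) by the corresponding MPI timing (Table 2), then rescale by
 * the ratio of total grid zones, since the MPI grids are slightly larger. */
#include <stdio.h>

struct pair {
    int    nodes;
    double t_hpf;            /* Table 1 timing, s/step */
    double hx, hy, hz;       /* PGHPF grid dimensions  */
    double t_mpi;            /* Table 2 timing, s/step */
    double mx, my, mz;       /* MPI grid dimensions    */
};

int main(void)
{
    const struct pair p[] = {
        {   4, 12.38,  64,  64,  64, 2.456,  66,  66,  64 },
        {   8, 12.49, 128,  64,  64, 2.978,  66,  66, 128 },
        {  16, 13.21, 128, 128,  64, 2.711, 130,  66, 128 },
        {  32, 12.49, 128, 128, 128, 3.027, 130, 130, 128 },
        {  64, 12.69, 256, 128, 128, 3.493, 130, 130, 256 },
        { 128, 13.54, 256, 256, 128, 3.453, 258, 130, 256 },
    };
    const int n = (int)(sizeof p / sizeof p[0]);

    for (int i = 0; i < n; ++i) {
        double zones_hpf = p[i].hx * p[i].hy * p[i].hz;
        double zones_mpi = p[i].mx * p[i].my * p[i].mz;
        double ratio = (p[i].t_hpf / p[i].t_mpi) * (zones_mpi / zones_hpf);
        printf("%4d nodes: PGHPF/MPI ratio = %.2f\n", p[i].nodes, ratio);
    }
    return 0;
}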



Scalability of the Gravitational CFD Code

Performance Measures using MPI:

Table 4 details our execution times on various configurations of the T3E and on a variety of problem sizes, precisely as reported in Table 2, but here the timings include the solution of the global Poisson equation along with the CFD code (a sketch comparing the two tables follows Table 4).

Table 4

Gravitational CFD Code Timings on the SDSC T3E_600^a
Using MPI
(seconds per integration timestep)

Nodes   66²×64   66²×128   130×66×128   130²×128   130²×256   258×130×256   258²×256
    4    3.552     7.016           --         --         --            --         --
    8    2.050     4.122           --         --         --            --         --
   16    1.118     2.237        4.008         --         --            --         --
   32   0.6562     1.311        2.269      4.394         --            --         --
   64       --        --        1.280      2.381      4.850         8.638         --
  128       --        --           --      1.398      2.832         4.848         --
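
One way to read Table 4 against Table 2 is to ask how much the global Poisson solve adds to the cost of a timestep at matching node counts and grid sizes. The C sketch below does this for the diagonal entries of the two tables; the pairing of entries and the percentage interpretation are ours, not a comparison drawn explicitly in this report.

/* Fractional cost added by the global Poisson solve, estimated by
 * comparing the diagonal entries of Table 4 (gravitational CFD code)
 * with the corresponding entries of Table 2 (CFD code alone). */
#include <stdio.h>

int main(void)
{
    const int    nodes[]     = {     4,     8,    16,    32,    64,   128 };
    const double cfd_only[]  = { 2.456, 2.978, 2.711, 3.027, 3.493, 3.453 };  /* Table 2 */
    const double with_grav[] = { 3.552, 4.122, 4.008, 4.394, 4.850, 4.848 };  /* Table 4 */
    const int    n = (int)(sizeof nodes / sizeof nodes[0]);

    for (int i = 0; i < n; ++i)
        printf("%4d nodes: Poisson solve adds %.1f%% to the time per step\n",
               nodes[i], 100.0 * (with_grav[i] - cfd_only[i]) / cfd_only[i]);
    return 0;
}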



FOOTNOTES:

^a To obtain the execution times reported in each table, the hydrocode was run for 200 integration timesteps, utilizing the grid resolution specified at the top of each column of the table.

^b For comparison (see Table 1 for details), running the same size problem (128 × 64²) on a single node of a Cray Y-MP requires 13.30 CPU seconds per integration timestep.

^c For comparison (see Table 1 for details), running the same size problem (128 × 64²) on a single node of a Cray C90 requires 4.01 CPU seconds per integration timestep, and on an 8,192-node MasPar MP-1 requires 4.74 CPU seconds per integration timestep.

