Attachment C3

Recent Experiences with the Cray T3E_600
at the SDSC

Joel E. Tohline, John Cazes, and Patrick Motl

Department of Physics & Astronomy
Louisiana State University


Scalability of the CFD Code

Performance Measures using the Portland Group HPF Compiler (PGHPF):

Table 1 details our execution times on various configurations of the T3E for a variety of problem sizes. As the table illustrates, we realize almost perfect linear speedup as we move from 2 nodes to 128 nodes on the T3E, as long as the size of our problem doubles each time the number of nodes in use is doubled (see the short check following Table 1). This represents significantly better scaling than we have previously been able to achieve on the SP-2.

Table 1

CFD Code Timings on the SDSC T3E_600^a
Using PGHPF
(seconds per integration timestep)

Nodes      64³   128×64²   128²×64      128³   256×128²   256²×128     256³
    2    26.60        --        --        --         --         --       --
    4    12.38     22.96        --        --         --         --       --
    8     6.98   12.49^b     23.75        --         --         --       --
   16     3.84      7.18     13.21     23.89         --         --       --
   32     2.07    3.75^c      6.97     12.49      24.09         --       --
   64       --        --      3.86      7.07      12.69         --       --
  128       --        --        --      3.95         --      13.54    24.64
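
As a quick check of this weak-scaling behavior, the short C program below tabulates the diagonal entries of Table 1 (along which the problem size doubles whenever the node count doubles; the 64-node diagonal entry is not available) and reports each timing relative to the 2-node baseline. This is only an illustrative post-processing sketch: the timings are copied by hand from Table 1, and the "efficiency" definition (the 2-node time divided by the P-node time) is our own choice, not taken from the report.

/* Weak-scaling check along the diagonal of Table 1 (PGHPF timings).
 * Along the diagonal the problem size doubles each time the node count
 * doubles, so ideal scaling would keep the seconds per timestep constant. */
#include <stdio.h>

int main(void)
{
    const int    nodes[]   = { 2, 4, 8, 16, 32, 128 };   /* 64-node entry absent */
    const double seconds[] = { 26.60, 22.96, 23.75, 23.89, 24.09, 24.64 };
    const int    n = (int)(sizeof nodes / sizeof nodes[0]);

    for (int i = 0; i < n; ++i)
        printf("%4d nodes: %6.2f s/step, weak-scaling efficiency %.2f\n",
               nodes[i], seconds[i], seconds[0] / seconds[i]);
    return 0;
}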



Performance Measures using MPI:

Table 2 details our execution times on various configurations of the T3E for a variety of problem sizes, now using MPI. As the table illustrates, we realize almost perfect linear speedup as we move from 4 nodes to 128 nodes on the T3E, as long as the size of our problem doubles each time the number of nodes in use is doubled (a sketch of the timing procedure follows Table 2).

Table 2

CFD Code Timings on the SDSC T3E_600^a
Using MPI
(seconds per integration timestep)

Nodes   66²×64   66²×128   130×66×128   130²×128   130²×256   258×130×256   258²×256   258²×512
    4    2.456     4.945        9.212         --         --            --         --         --
    8    1.468   2.978^b        5.078      10.07         --            --         --         --
   16    0.775     1.630        2.711      5.211      11.37            --         --         --
   32    0.471   0.968^c        1.584      3.027      6.573         11.32         --         --
   64       --        --        0.878      1.617      3.493         5.983      11.40         --
  128       --        --           --      0.968      2.057         3.453      6.548      15.19
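
The per-timestep figures in this table come from 200-timestep runs (see footnote a). The C sketch below illustrates one way such a measurement can be made with MPI, timing the main loop with MPI_Wtime and reporting the slowest node; the routine advance_one_timestep() is a hypothetical stand-in for the hydrocode's update, and the sketch is not the code that produced these numbers.

/* Minimal sketch of how per-timestep timings such as those in Table 2
 * can be collected with MPI.  Only the 200-step run length (footnote a)
 * is taken from the report; everything else is illustrative. */
#include <stdio.h>
#include <mpi.h>

static void advance_one_timestep(void)
{
    /* ... hydrodynamic update on this node's portion of the grid ... */
}

int main(int argc, char **argv)
{
    const int nsteps = 200;            /* run length used for the timings */
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);       /* start all nodes together */
    double t0 = MPI_Wtime();

    for (int step = 0; step < nsteps; ++step)
        advance_one_timestep();

    double elapsed = MPI_Wtime() - t0;
    double slowest = 0.0;              /* report the slowest node's time */
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d nodes: %.3f seconds per integration timestep\n",
               nprocs, slowest / nsteps);

    MPI_Finalize();
    return 0;
}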



Speedup of MPI over PGHPF:

Table 3 provides a brief comparison between the numbers in Table 1 and the numbers in Table 2, in order to show at a glance how much the execution time of our CFD code has been improved by changing from HPF to MPI. To derive the numbers shown in Table 3, we have divided the entries along the diagonal of Table 1 by the corresponding entries along the diagonal of Table 2, and have adjusted each ratio to account for the fact that the grid sizes are not identical in the two tables (a worked illustration of this adjustment follows Table 3).

Table 3

CFD Code Timings on the SDSC T3E_600^a
Ratio of PGHPF timings to MPI timings

Nodes    64³   128×64²   128²×64   128³   256×128²   256²×128   256³
    4   5.36        --        --     --         --         --     --
    8     --      4.46        --     --         --         --     --
   16     --        --      5.10     --         --         --     --
   32     --        --        --   4.26         --         --     --
   64     --        --        --     --       3.75         --     --
  128     --        --        --     --         --       4.01     --
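
Our reading of the grid-size adjustment described above is that each timing ratio is multiplied by the ratio of total grid zones in the MPI and PGHPF problems (the MPI grid dimensions being slightly larger, e.g. 66 versus 64); under that assumption the tabulated values are recovered, e.g. at 4 nodes (12.38/2.456) × (66×66×64)/(64×64×64) ≈ 5.36. The C sketch below, with the diagonal entries of Tables 1 and 2 copied in by hand, illustrates the calculation; it is our reconstruction, not code from this report.

/* Reconstruction of the Table 3 ratios: divide the diagonal PGHPF timing
 * (Table 1) by the corresponding MPI timing (Table 2), then rescale by
 * the ratio of total grid zones, since the MPI grids are slightly larger. */
#include <stdio.h>

struct pair {
    int    nodes;
    double t_hpf;            /* Table 1 timing, s/step */
    double hx, hy, hz;       /* PGHPF grid dimensions  */
    double t_mpi;            /* Table 2 timing, s/step */
    double mx, my, mz;       /* MPI grid dimensions    */
};

int main(void)
{
    const struct pair p[] = {
        {   4, 12.38,  64,  64,  64, 2.456,  66,  66,  64 },
        {   8, 12.49, 128,  64,  64, 2.978,  66,  66, 128 },
        {  16, 13.21, 128, 128,  64, 2.711, 130,  66, 128 },
        {  32, 12.49, 128, 128, 128, 3.027, 130, 130, 128 },
        {  64, 12.69, 256, 128, 128, 3.493, 130, 130, 256 },
        { 128, 13.54, 256, 256, 128, 3.453, 258, 130, 256 },
    };
    const int n = (int)(sizeof p / sizeof p[0]);

    for (int i = 0; i < n; ++i) {
        double zones_hpf = p[i].hx * p[i].hy * p[i].hz;
        double zones_mpi = p[i].mx * p[i].my * p[i].mz;
        double ratio = (p[i].t_hpf / p[i].t_mpi) * (zones_mpi / zones_hpf);
        printf("%4d nodes: PGHPF/MPI ratio = %.2f\n", p[i].nodes, ratio);
    }
    return 0;
}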



Scalability of the Gravitational CFD Code

Performance Measures using MPI:

Table 4 details our execution times on various configurations of the T3E and on a variety of problem sizes, precisely as reported in Table 2, but here the timings include the solution of the global Poisson equation along with the CFD code (a sketch comparing the two tables follows Table 4).

Table 4

Gravitational CFD Code Timings on the SDSC T3E_600^a
Using MPI
(seconds per integration timestep)

Nodes   66²×64   66²×128   130×66×128   130²×128   130²×256   258×130×256   258²×256
    4    3.552     7.016           --         --         --            --         --
    8    2.050     4.122           --         --         --            --         --
   16    1.118     2.237        4.008         --         --            --         --
   32   0.6562     1.311        2.269      4.394         --            --         --
   64       --        --        1.280      2.381      4.850         8.638         --
  128       --        --           --      1.398      2.832         4.848         --
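
One way to read Table 4 against Table 2 is to ask how much the global Poisson solve adds to the cost of a timestep at matching node counts and grid sizes. The C sketch below does this for the diagonal entries of the two tables; the pairing of entries and the percentage interpretation are ours, not a comparison drawn explicitly in this report.

/* Fractional cost added by the global Poisson solve, estimated by
 * comparing the diagonal entries of Table 4 (gravitational CFD code)
 * with the corresponding entries of Table 2 (CFD code alone). */
#include <stdio.h>

int main(void)
{
    const int    nodes[]     = {     4,     8,    16,    32,    64,   128 };
    const double cfd_only[]  = { 2.456, 2.978, 2.711, 3.027, 3.493, 3.453 };  /* Table 2 */
    const double with_grav[] = { 3.552, 4.122, 4.008, 4.394, 4.850, 4.848 };  /* Table 4 */
    const int    n = (int)(sizeof nodes / sizeof nodes[0]);

    for (int i = 0; i < n; ++i)
        printf("%4d nodes: Poisson solve adds %.1f%% to the time per step\n",
               nodes[i], 100.0 * (with_grav[i] - cfd_only[i]) / cfd_only[i]);
    return 0;
}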



FOOTNOTES:

^a To obtain the execution times reported in each table, the hydrocode was run for 200 integration timesteps, utilizing the grid resolution specified at the top of each column of the table.

^b For comparison (see Table 1 for details), running the same size problem (128 × 64²) on a single node of a Cray Y-MP requires 13.30 CPU seconds per integration timestep.

^c For comparison (see Table 1 for details), running the same size problem (128 × 64²) on a single node of a Cray C90 requires 4.01 CPU seconds per integration timestep, and on an 8,192-node MasPar MP-1 requires 4.74 CPU seconds per integration timestep.

