Early Experiences with the Cray T3E at the SDSC

Joel E. Tohline and John Cazes

Department of Physics & Astronomy
Louisiana State University


Part I: Background

Type of code ported:

Our research is primarily directed toward a more complete understanding of dynamical gas flows in astrophysical systems. In the context of star formation, we are interested in understanding why stars form preferentially in pairs and how global gas dynamical processes in galaxy disks drive star formation events. As the accompanying movie illustrates, recent simulations also have allowed us to examine how some binary star systems may merge dynamically during the late stages of their evolution.

MOVIE: The Coalescence of Two Neutron Stars (326K)

To carry out these investigations, we employ fairly traditional direct numerical simulation (DNS) techniques, i.e., explicit time integration. Our primary computational tool is a Fortran algorithm that is a finite-difference representation of the multidimensional equations governing the dynamics of inviscid, compressible (and frequently supersonic) gas flows. Specifically, the finite-difference algorithm is based on the van Leer monotonic interpolation scheme described by Stone and Norman (1992), extended to three dimensions on a uniform, cylindrical coordinate mesh.
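To make the flavor of this scheme concrete, here is a minimal one-dimensional sketch in Fortran 90 (illustrative only, not an excerpt from our production code) showing how the van Leer monotonic slope is used to build second-order upwind fluxes for a simple advection problem; the grid size, Courant number, and periodic boundary treatment are arbitrary choices.

    ! Minimal 1-D sketch of van Leer monotonic (harmonic-mean) slope limiting
    ! used to build second-order upwind fluxes, in the spirit of the scheme
    ! described by Stone & Norman (1992).  Illustrative only.
    program vanleer_advect
      implicit none
      integer, parameter :: n = 64
      real, parameter :: dx = 1.0/n, v = 1.0, cfl = 0.5
      real :: q(0:n+1), dq(0:n+1), flux(1:n+1), dt
      integer :: i, step

      ! initial condition: a square pulse
      q = 0.0
      q(n/4:n/2) = 1.0
      dt = cfl*dx/abs(v)

      do step = 1, 50
         q(0) = q(n);  q(n+1) = q(1)          ! periodic boundaries

         ! van Leer monotonic slope: harmonic mean of the adjacent
         ! differences, which falls to zero at local extrema
         do i = 1, n
            if ((q(i+1)-q(i))*(q(i)-q(i-1)) > 0.0) then
               dq(i) = 2.0*(q(i+1)-q(i))*(q(i)-q(i-1)) / (q(i+1)-q(i-1))
            else
               dq(i) = 0.0
            end if
         end do
         dq(0) = dq(n);  dq(n+1) = dq(1)

         ! upwind interface values -> fluxes (v > 0 assumed here)
         do i = 1, n+1
            flux(i) = v*( q(i-1) + 0.5*(1.0 - v*dt/dx)*dq(i-1) )
         end do

         ! conservative update
         do i = 1, n
            q(i) = q(i) - dt/dx*( flux(i+1) - flux(i) )
         end do
      end do

      print *, 'final profile min/max:', minval(q(1:n)), maxval(q(1:n))
    end program vanleer_advect

The monotonic slope guarantees that no new extrema are introduced by the interpolation, which is what keeps the scheme well behaved near the shocks that arise in supersonic flows.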

At each time step, the Poisson equation is solved in order to determine, in a self-consistent fashion, how the fluid is to be accelerated in response to its own self-gravity. Presently, the Poisson equation is solved using a combined Fourier transformation and ADI (alternating-direction implicit) scheme (cf. Cohl, Sun, & Tohline 1997).
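In such combined schemes, the Fourier transform in the azimuthal direction decouples the three-dimensional problem into a set of independent two-dimensional (R,z) problems, each of which can then be relaxed with an ADI iteration. The sketch below, which is generic Fortran 90 rather than our production solver, illustrates only the ADI half of that combination, applied to a simple two-dimensional Poisson problem with Dirichlet boundaries; the grid size, pseudo-timestep, and convergence test are arbitrary choices.

    ! Illustrative 2-D Poisson solve, del^2 u = f, by ADI iteration with
    ! Dirichlet (u = 0) boundaries.  A generic sketch, not our production code.
    program adi_poisson
      implicit none
      integer, parameter :: n = 33              ! grid points per direction
      real, parameter :: pi = 3.14159265
      real :: h, tau, a, err
      real :: u(n,n), us(n,n), f(n,n), rhs(n), sol(n)
      integer :: i, j, it

      h   = 1.0/(n-1)
      tau = 0.5*h                               ! pseudo-timestep
      a   = tau/h**2

      ! manufactured problem: u_exact = sin(pi x) sin(pi y)
      do j = 1, n
         do i = 1, n
            f(i,j) = -2.0*pi*pi*sin(pi*(i-1)*h)*sin(pi*(j-1)*h)
         end do
      end do
      u = 0.0;  us = 0.0

      do it = 1, 500
         ! sweep 1: implicit in the i-direction, explicit in j
         do j = 2, n-1
            do i = 2, n-1
               rhs(i) = u(i,j) + a*(u(i,j-1) - 2.0*u(i,j) + u(i,j+1)) - tau*f(i,j)
            end do
            call thomas(n-2, a, rhs(2:n-1), sol(2:n-1))
            us(2:n-1,j) = sol(2:n-1)
         end do
         ! sweep 2: implicit in the j-direction, explicit in i
         err = 0.0
         do i = 2, n-1
            do j = 2, n-1
               rhs(j) = us(i,j) + a*(us(i-1,j) - 2.0*us(i,j) + us(i+1,j)) - tau*f(i,j)
            end do
            call thomas(n-2, a, rhs(2:n-1), sol(2:n-1))
            err = max(err, maxval(abs(sol(2:n-1) - u(i,2:n-1))))
            u(i,2:n-1) = sol(2:n-1)
         end do
         if (err < 1.0e-5) exit
      end do

      print *, 'iterations:', it, '  u at grid center:', u(n/2+1,n/2+1), &
               ' (exact value 1.0)'

    contains

      subroutine thomas(m, ac, d, x)
        ! solve the constant-coefficient tridiagonal system
        ! (-ac) x(k-1) + (1+2*ac) x(k) + (-ac) x(k+1) = d(k),  x(0) = x(m+1) = 0
        integer, intent(in)  :: m
        real,    intent(in)  :: ac, d(m)
        real,    intent(out) :: x(m)
        real :: c(m), g(m), denom
        integer :: k
        c(1) = -ac/(1.0 + 2.0*ac)
        g(1) =  d(1)/(1.0 + 2.0*ac)
        do k = 2, m
           denom = (1.0 + 2.0*ac) + ac*c(k-1)
           c(k) = -ac/denom
           g(k) = (d(k) + ac*g(k-1))/denom
        end do
        x(m) = g(m)
        do k = m-1, 1, -1
           x(k) = g(k) - c(k)*x(k+1)
        end do
      end subroutine thomas

    end program adi_poisson

Each sweep requires only a set of independent tridiagonal solves along one coordinate direction, which is what makes the ADI approach attractive on parallel architectures.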

Parallel Platforms on which our code has been executed previously:

The parallel version of our CFD code originally was developed on a SIMD-architecture MasPar MP-1 computer. The code was written by a former graduate student, John Woodward, in mpf (MasPar Fortran), a language patterned after Fortran 90 but including certain extensions along the lines of HPF. Numerous simulations have been performed successfully over the past several years on the 8,192-node MP-1 system in the Concurrent Computing Laboratory for Materials Simulation (CCLMS) in the Department of Physics and Astronomy at Louisiana State University and on an MP-2 at the Scalable Computing Laboratory of the Ames Laboratory at Iowa State University.
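The short, hypothetical fragment below (not an excerpt from the mpf source) illustrates the data-parallel style that mpf, CM Fortran, and HPF share: a stencil update is expressed with whole-array Fortran 90 operations, and HPF-style directives declare how an array, here dimensioned to our standard 128 × 64 × 64 grid, is to be distributed in blocks across the processors.

    ! Hypothetical sketch of the data-parallel programming style; not taken
    ! from the actual mpf code.  The directives are comments to an ordinary
    ! Fortran 90 compiler, so the program also runs serially.
    program dataparallel_sketch
      implicit none
      real, dimension(128,64,64) :: rho, rhonew
    !HPF$ DISTRIBUTE rho(BLOCK,BLOCK,BLOCK)
    !HPF$ DISTRIBUTE rhonew(BLOCK,BLOCK,BLOCK)

      rho    = 1.0
      rhonew = rho

      ! a nearest-neighbour update written as whole-array operations; an mpf,
      ! CM Fortran, or HPF compiler generates the interprocessor communication
      ! implied by the shifted array sections
      rhonew(2:127,2:63,:) = 0.25*( rho(1:126,2:63,:) + rho(3:128,2:63,:)  &
                                  + rho(2:127,1:62,:) + rho(2:127,3:64,:) )

      print *, 'sample interior value:', rhonew(64,32,32)
    end program dataparallel_sketch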

As is shown in Table 1, below, on LSU's MP-1 system we have achieved execution speeds approximately three times that of a single-node Cray Y/MP, and on the MP-2 the code outperforms the Cray C90 by a factor of two.

Before moving to the T3E, we had compiled and successfully executed a basic version of our CFD code on two other parallel machines with MIMD architectures: the Thinking Machines CM-5 (using CM Fortran) and the IBM SP-2 (using XL HPF).

Table 1 documents the performance of our code on these machines. As the timings in the table illustrate, we have been unable to achieve a high degree of scalability on the SP-2 platform.

As we explain in more detail in an accompanying discussion, the CFD code we currently are using on the T3E was ported from the CM Fortran version of our code.

Table 1

Timings on Several Different Machines (a)

Machine        Compiler          Nodes    Total time   Seconds per   Y/MP     MP-1
                                          (sec)        timestep      ratio    ratio
------------------------------------------------------------------------------------
Cray Y/MP      f77                   1      2660.0        13.30       1.00     0.36
MasPar MP-1    mpf               8,192       947.4         4.74       2.81     1.00
Cray C90       Fortran 90            1       802.8         4.01       3.31     1.18
MasPar MP-2    mpf               8,192       388.6         1.94       6.84     2.44
   "            "                4,096       681.4         3.41       3.90     1.39
CM-5           cmf (Block3D)        32      1098.3         5.49       2.42     0.86
   "            "                   64       584.6         2.92       4.56     1.62
   "            "                  128       319.3         1.60       8.33     2.97
   "            "                  256       187.0         0.93      14.23     5.07
SP-2           XLHPF (Block3D)      16       982.4         4.91       2.71     0.96
   "            "                   64       471.4         2.36       5.64     2.01
   "            "                  128       374.6         1.87       7.10     2.53

FOOTNOTE:

(a) To obtain these execution times, the CFD code was run for 200 integration timesteps on a cylindrical-coordinate grid of resolution 128 × 64 × 64. It should be noted that the timing comparisons were obtained with a purely hydrodynamic version of the code; that is, the Poisson equation was not solved and, hence, the self-gravity of the fluid was not included. Only minor changes in the mpf code were required before it could be compiled and run successfully on the C90 and the CM-5. However, because a Fortran 90 compiler was not available on the Y/MP, we used VAST to convert the mpf code to f77 before compiling and running it on the Y/MP.
