A REQUEST FOR NPACI RESOURCES
Joel E. Tohline
Department of Physics & Astronomy
Louisiana State University
|This entire document, along with several illustrative animation sequences is available on-line at the address http://www.phys.lsu.edu/faculty/tohline/NPACI98/|
1. Summary of Proposed Research, Including Objectives
Recent observational investigations of the frequency of occurrence of pre-main-sequence (PMS) binary stars have reinforced earlier suspicions (Abt 1983; Duquennoy & Mayor 1991) that "binary formation is the primary branch of the star-formation process" (Mathieu 1994). That is to say, stars tend to form in pairs rather than as single objects. Every indication is that, even in the PMS phase of stellar evolution, binaries with total masses less than or equal to three solar masses exhibit a wide range of orbital periods (days to at least 105 yr) with a median period of a few hundred years corresponding to a median separation on the order of 50 AU (astronomical units). Why and how do stars preferentially form in pairs? This is the fundamental question that motivates the proposed research activities described here.
As Bodenheimer, Ruzmaikina, and Mathieu (1993) have reviewed, a number of different theoretical scenarios have been proposed to explain the preponderance of binary stars. The ones given most attention over the past two decades are: Capture; direct fragmentation; disk fragmentation; and fission. Because either three-body encounters or close encounters that increase the importance of tides are required to make the capture process work, in normal star formation environments capture alone does not appear to provide a satisfactory explanation. (Note, however, that the presence of disks may enhance capture rates in dense clusters as pointed out by Larson 1990 and Clarke & Pringle 1991.)
As Shu, Adams and Lizano (1987) have summarized, it seems likely that binary systems with orbital periods between a few hundred and a few million years result from the direct fragmentation of a protostellar cloud early in its phase of collapse (at cloud densities n ~ 103 - 1010 cm- 3). Hydrodynamic simulations performed, for example, by Bodenheimer, Tohline & Black (1980), Boss (1980, 1986), Bodenheimer & Boss (1983), Miyama, Hayashi, & Narita (1984), Monaghan & Lattanzio (1991), and Miyama (1989, 1992) illustrate how this might occur. But at higher densities, clouds are unable to cool efficiently upon contraction and, consequently, direct fragmentation becomes problematical. Because higher mean densities are directly associated with systems having shorter dynamical times, one is led to investigate mechanisms other than direct cloud fragmentation for forming binary systems with orbital periods less than a few hundred years.
Through his ongoing NSF-funded project (AST-9528424), the PI is carrying out a variety of different numerical simulation projects in an effort to understand whether disk fragmentation and/or the fission of rapidly rotating protostars produce short-period binary star systems under physical conditions that are expected to exist naturally in protostellar environments. For completeness, we review briefly here our recent work on disk fragmentation. Then our discussion will turn to the fission problem, which is the focus of the research proposed herein.
Because gas clouds have difficulty getting rid of excess angular momentum during a phase of dynamical collapse, there is reason to believe that all stars form with some sort of (accretion) disk surrounding them. Depending on the rate at which mass is lost to the central protostar via accretion and the rate at which mass is gained via infall from the surrounding interstellar cloud, a disk can grow to be substantial in size (e.g., Terebey, Shu & Cassen 1984; Cassen, Shu & Terebey 1985). From an observational perspective, disks certainly appear to be ubiquitous in the context of star formation. Not only did the solar system sport a disk at a young age, but it is now clear that most classical T Tauri stars are surrounded by disks (Rucinski 1985; Basri & Bertout 1993; Beckwith & Sargent 1993) and disks even appear to frequently accompany PMS binary systems (Mathieu 1994). Accumulating observational evidence suggests, in fact, that protostellar disks can contain a significant amount of mass (Adams et al. 1990; Mathieu et al. 1995). If a disk becomes sufficiently massive, compared to the central object that it surrounds, a gravitational instability in the system may cause the disk to accumulate into an off-axis, binary companion of the central object or to break into two or more pieces. In the latter case, one or more components would have to be ejected from the system before an isolated binary would result.
We recently have completed an exhaustive study of the onset of nonaxisymmetric instabilities in geometrically thick, self-gravitating protostellar disks (Woodward, Tohline & Hachisu 1994; hereafter WTH). Animation sequences illustrating two of the simulations from this study are available on-line (see the http address at the top of this document) in formats indicated in the captions to the two "Movie Icons" shown here. Movie-1 shows the nonlinear development of the "Papaloizou-Pringle" instability in a massless disk; through simulations of this type we have been able to demonstrate that our nonlinear simulations produce nonaxisymmetric growth rates that match the predictions of linear theory (Papaloizou and Pringle 1984; Frank and Robertson 1988; Kojima 1986) to very high precision. Movie-2 shows the development of an "m = 2" mode instability in a fully self-gravitating torus.
24-bit mpeg (515K)
Upon examining systems with a wide range of disk-to-central-object mass ratios, we have concluded that the mode that is most likely to go unstable in a disk of non- negligible mass is an "m = 1" mode very similar -- if not identical -- to the "eccentric mode" first discussed by Adams, Ruden & Shu (1989; hereafter ARS). As this long-wavelength, global mode amplifies in the disk, it resonates with a particular macroscopic motion of the central protostellar object, driving the central object into an orbit of increasing radius about the center of mass of the combined star-disk system. Apparently a variety of physical mechanisms in protostellar disks are likely to promote the growth of a global, m = 1 spiral mode if the central protostar's motion is taken properly into account (Lin & Papaloizou 1989; Shu et al. 1990; Papaloizou & Savonije 1991; Heemskerk, Papaloizou & Savonije 1992). ARS have suggested that the nonlinear growth of the "eccentric mode" is likely to lead in a very natural way to the formation of a binary star system.
To the extent that we have been able to follow the nonlinear growth of this mode in our own thick-disk simulations, we support the ARS conjecture. We have observed the development of a single, coherent, self-gravitating clump of material in the disk and, in concert with the breaking of the axisymmetry of the disk, we have watched the central star (treated as an idealized point mass in our simulations) spiral out away from the center-of-mass of the system. Unfortunately, we have not yet been able to determine whether a binary star system is created as a result of this instability. In every case, we have been forced to terminate our simulations before the nonlinear development of the disk's m = 1 mode has been completed because the spiraling trajectory of the central point mass invariably brings the central object in contact with the inner edge of the disk. Using an SPH simulation technique, Adams & Benz (1992) have provided tantalizing, but equally unconvincing evidence that this instability leads to the formation of a binary star system. As part of our ongoing NSF-sponsored research, we are conducting a series of more complete simulations that are designed to determine whether or not, in protostellar disk systems, the fully nonlinear development of the eccentric instability leads to the formation of a binary star system, but it is not for this work that we are requesting an allocation of NPACI resources.
Before discussing either the classical fission hypothesis for binary star formation, or what we consider to be a more reasonable, modified version of that hypothesis in view of recent simulations, it is useful to review what is known about allowable equilibrium configurations for rapidly rotating protostars. Consider, first, all rotating objects that are precisely ellipsoidal in shape, having principal axes a, b, and c. Furthermore, assign these labels such that a always refers to the object's longest axis and c is the axis about which the object rotates. Figure 1, with coordinate axes 0 £ b/a £ 1 and 0 £ c/a £ 1, then defines a two-dimensional parameter space that contains all such ellipsoidal figures. For example, the top-right corner of the box in Figure 1 (point S) defines a sphere ( b/a = 1; c/a = 1 ); the right-hand edge of the box ( b/a = 1; 0 £ c/a £ 1 ) identifies all possible (axisymmetric) oblate spheroids; and a diagonal line connecting point S with the bottom-left corner of the box identifies all (axisymmetric) prolate spheroids ( c/b = 1 ).
As Chandrasekhar (1969) has reviewed in detail (see also Durisen and Tohline 1985), if protostellar objects are assumed to be self-gravitating,
Physically meaningful evolutionary sequences also can be diagrammed within the parameter space defined by Figure 1. For example, as an initially slowly rotating (nearly spherical) protostellar gas cloud gravitationally contracts toward a mean density and size that is typical of a hydrogen-burning (main-sequence) star, it will become more and more rotationally flattened if it conserves its mass and angular momentum. The sequence of oblate spheroidal models along the right-hand edge of the box in Figure 1 (between points S and E, for example) describes well this phase of the protostar's early evolution. The curve in Figure 1 that connects the right-hand edge (at point E) to the lower-left corner of the box illustrates another possible evolutionary path (for a protostar that is already substantially flattened due to rotation) as it traces out a sequence of equilibrium configurations having constant vorticity. This sequence of genuinely triaxial ellipsoids is said to "bifurcate" from the axisymmetric sequence at point E. The comprehensive, 19th-century analytical work of Jacobi, Dedekind, and particularly Riemann (again, see Chandrasekhar 1969 for a review) has demonstrated that a variety of constant vorticity sequences bifurcate from the axisymmetric sequence at different locations. For our purposes here, the curve drawn in Figure 1 will serve to illustrate any one of a number of these different sequences. If, at the point of bifurcation, the ellipsoidal sequence defines a configuration that has the same mass and angular momentum, but a lower total energy than the adjacent configuration on the axisymmetric sequence, then after reaching point E the energetically preferred evolutionary path will be one that takes the contracting protostar away from the axisymmetric sequence.
A careful examination of equilibrium models along sequences of progressively more elongated ellipsoids (such as the constant vorticity sequence illustrated in Figure 1) shows that these sequences exhibit additional points of bifurcation. Each point of bifurcation identifies the first model belonging to a new sequence of objects that have "higher order" surface distortions. For example, analytical work on incompressible objects (Chandrasekhar 1969) clearly has identified bifurcation points that lead to equilibrium sequences of pear-shaped and dumbbell-shaped configurations. Point D in Figure 1 shows where bifurcation to a dumbbell sequence occurs along one illustrative, constant-vorticity sequence of ellipsoids. Figure 2 (taken from the review by Durisen and Tohline 1985) illustrates furthermore how the surface distortion of equilibrium figures becomes more and more extreme as one moves along the dumbbell sequence away from the initial point of bifurcation: There is a smooth transition from the initially ellipsoidal figure, to figures with a pronounced dumbbell shape, then to fully detached binary star structures.
(i) Classical Fission Hypothesis
It is easy to see how a "fission hypothesis for binary star formation" emerges from this classical work on ellipsoidal figures of equilibrium. Imagine that as a rotating protostar undergoes slow contraction, it evolves from point S, through point E, to point D along the equilibrium sequences illustrated in Figure 1. Then, from point D it contracts along the dumbbell sequence through each of the geometrical configurations (a --> d) illustrated in Figure 2 until a well-separated binary star system has been formed. This is certainly an aesthetically pleasing picture of how (predominantly equal-mass) binary star systems might form. If the outlined evolutionary path proves to be the energetically preferred one under a variety of different protostellar cloud conditions, the fission hypothesis would also explain naturally why the majority of stars form as binary systems.
In reality, however, the picture is not this clear. Even when this classical fission hypothesis was discussed by mathematicians and astronomers a century ago, several problems were recognized:
(ii) Recent Problems Promoting the Fission Hypothesis
Because of its significantly higher level of mathematical complexity, the extension of this important study to more realistic, compressible configurations -- as well as to systems with nontrivial internal motions -- was not possible before the advent of modern computers. (Indeed, even the construction of incompressible figures with significantly nonellipsoidal geometries, such as models along the dumbbell/binary sequence displayed in Figure 2, was not possible until very recently; see Hachisu and Eriguchi 1984.) Even with the availability of high-performance computers, however, progress on this problem has been pathetically slow. More importantly, the progress that has been made toward understanding the equilibrium and stability properties of rotating, compressible structures has not been particularly supportive of the fission hypothesis. Consider the following:
An animation sequence illustrating the nonlinear development of this two-armed, spiral-mode instability is provided online (see the http address at the top of this document) in a format indicated in the caption to the accompanying "Movie-3" icon. Several still frames from this movie are also displayed on the following page of this proposal, in Figure 3. After the nonaxisymmetric distortion reaches a nonlinear amplitude, the evolution proceeds as follows: Via the trailing spiral structure, gravitational torques are able to effectively redistribute angular momentum on a dynamical time scale; a relatively small amount of material carrying a relatively large fraction of the system's angular momentum is shed into an equatorial disk; and the central object (containing most of the initial object's mass) settles down into a new, centrally condensed, equilibrium configuration. That is, "fission" into a binary star system as depicted schematically by frames a --> d of Figure 2 does not occur.
Figure 3 (Caption)
The top six frames (read, in sequence, from left to right) have been taken directly from "Movie-3," an animation sequence illustrating the nonlinear development of the two-armed, spiral-mode instability in a rapidly rotating protostar. More precisely, the initial model (1st frame) is an axisymmetric, n = 3/2 polytrope with a rotation profile given by an n' = 0 sequence (Durisen et al. 1986), and an initial ratio of rotational to gravitational energy T/|W| = 0.30. The separation in time between the first frame and the second is 7.66 cirps (central initial rotation periods); all other frame pairs are separated in time by approximately 2.75 cirps. Frame six shows the "final" steady-state configuration, 18.7 cirps into the evolution. This 47,500 time-step simulation was performed with a spatial resolution of 1283 primarily on the Cray T3E_600 at the SDSC, although a portion of it was on the Cray T3E_900 at the NAVOCEANO's MSRC. The last two frames illustrate how, in the future, tracer particles may be placed in the fluid in order to permit visualization of internal, differential motions.
With regard to this last point, the results appear to be particularly significant. Simulations conducted with several independently developed gravitational hydrodynamics codes (Durisen, et al. 1986; Houser, Centrella, and Smith 1994) have each shown that the dynamical instability that arises in rapidly rotating, axisymmetric models of compressible gas clouds does not lead to the formation of a binary star system, as envisioned classically. Hence, as has been reviewed by Bodenheimer, Ruzmaikina, and Mathieu (1993), among the various ideas that have been proposed to explain the preponderance of binary stars, the "binary fission hypothesis" currently is out of favor on theoretical grounds.
Evidence that binary star formation may yet result from the "fission" of a rapidly rotating, protostellar gas cloud comes from several numerical simulation projects that have been conducted by the PI and his students in the context of his ongoing NSF-sponsored research.
(i) The Existence of a Robust, Ellipsoidal Structure with Nontrivial Internal Motions
As stated (and illustrated) above, through a fairly violent process, the two-armed spiral-mode instability is able to transform an initially isolated, axisymmetric gas cloud into an entirely new steady-state configuration with a much different radial distribution of angular momentum. As described above, at the end of its dynamical phase of evolution the system contains a new centrally condensed object that is surrounded by a disk of high specific angular momentum material. It is also clear from the simulation results presented here, however, (see Figure 3) that the central object is not axisymmetric! Instead, as was emphasized a number of years ago by Williams and Tohline (1988) and is most easily appreciated by watching the animation sequence labeled "Movie-3", the final object has a decidely ellipsoidal (even slightly dumbbell) shape. An analysis of the simulation data also clearly shows that this nonaxisymmetric, steady-state configuration contains a nontrivial internal flow. That is, although the configuration as a whole appears to be an ellipsoidal-shaped figure that is spinning coherently as a solid body, in reality the steady-state configuration also contains strong differential fluid motions internally.
(ii) Sequences of Equilibrium, Nonaxisymmetric Disks with Nontrivial Internal Motions
As mentioned above, approximately a decade ago Hachisu (1986b) developed a self-consistent-field technique that permitted him to construct nonaxisymmetric, equilibrium configurations of self-gravitating objects with nonellipsoidal shapes and nontrivial internal motions if the structures were assumed to be incompressible fluids. Working with the PI, LSU graduate student Saied Andalib recently has developed a technique by which nonaxisymmetric, equilibrium models of self-gravitating, compressible configurations can be constructed with nontrivial internal motions (Tohline and Andalib 1997). In order to accomplish this task, Andalib has adopted the following two simplifications: (a) The models are two-dimensional, in the sense that they are either infinite in extent or infinitesimally thin in the vertical direction; and (b) each system is assumed to have uniform vortensity, where vortensity is defined at each point in the fluid to be the ratio of vorticity to density (see, for example, Lin and Papaloizou 1989).
In constructing equilibrium sequences of this type, Andalib has discovered that he is able to construct compressible disks with a wide range of equatorial axis ratios b/a, except models having b/a close to 1. That is to say, unlike the incompressible models with simple rotation (see Figure 1), his nonaxisymmetric sequences cannot be connected to the axisymmetric sequence. This result has been illustrated schematically in Figure 5; a gap exists between point E on the axisymmetric sequence and the beginning of the sequence of nonaxisymmetric equilibrium objects. We conclude from this work that it is possible for rotating protostellar gas clouds to exist in equilibrium as nonaxisymmetric objects, but it is unlikely that they can make the transition from an axisymmetric structure to a nonaxisymmetric one gracefully.
With the NPACI resources being requested here, we propose to investigate thoroughly both of these conjectures.
(iii) The Merger (or Lack Thereof) of Close, Equal-Mass Binaries
Because an objective of the PI's ongoing research efforts is to understand how binary star systems form, it is important to demonstrate that the gravitational CFD simulation techniques being employed by the PI are capable of spatially resolving and accurately representing the equilibrium structure of binary stars. Recently, the PI has published the results of an extensive study of the stability of close, equal-mass binary stars systems that exhibit a variety of different degrees of gas compressibility (New and Tohline 1997). In essence, this was a numerical investigation designed to determine whether or not, in realistic astrophysical situations, evolution will ever occur backwards in Figure 2 (i.e., from frame d to frame a), resulting in the merger (or destruction) of a binary star system. In connection with this proposal, we stress two key results that have emerged from this published investigation:
With the NPACI resources being requested here, we propose to explore more completely the relationship between this binary merger problem and the fission problem.
We propose to perform the following well-defined simulation tasks in order to examine to what extent the ideas outlined above contribute to a more complete understanding of the binary star formation process in rotating protostellar gas clouds. If our educated speculations based on recent simulations prove to be correct, we will be in a position to demonstrate conclusively why and how stars preferentially form in pairs.
Proposed simulations utilizing NPACI resources:
A. Extend the "Movie-3" evolution, testing the stability of the final ellipsoidal configuration.
B. Examine the stability against merger of unequal-mass binaries
2. Computational Methodology/Algorithms
Our primary dynamical simulation tool is an algorithm designed to perform multidimensional computational fluid dynamic (CFD) simulations in astrophysical settings. It is a High-Performance-Fortran (HPF) algorithm that is a finite-difference representation of the multidimensional equations governing the dynamics of inviscid, self-gravitating, compressible gas flows. More specifically, the employed finite-difference algorithm is based on the second-order accurate, van Leer monotonic interpolation scheme described by Stone and Norman (1992), but extended to three dimensions on a uniform, cylindrical coordinate mesh. The fluid equations are advanced in time via an explicit integration scheme (often referred to as "direct numerical simulation") so an upper limit to the size of each integration time step is set by the familiar Courant condition. At each time step, the Poisson equation is solved in order to determine, in a self-consistent fashion, how the fluid is to be accelerated in response to the fluid's own self-gravity. Complete descriptions of our algorithm to solve the coupled CFD + Poisson equations can be found in WTH and in New (1996). In many of our recent simulations, the Poisson equation has been solved implicitly using a combined Fourier transformation and iterative ADI (alternating- direction, implicit) scheme. However, in the future we will be replacing the iterative ADI component of this algorithm with a fully direct solution as described by Cohl, Sun and Tohline (1997).
As discussed in ¤1 of this proposal, our ongoing NSF-funded research is aimed at understanding to what extent various rapidly rotating, self-gravitating structures -- such as protostars, accretion disks, and close binary stars -- are dynamically stable. Utilizing the gravitational CFD code described above, our broad objective is to examine the development of nonaxisymmetric structures that arise spontaneously as "normal modes" in such systems. It is important in such studies that we have the capability to construct accurate equilibrium structures as initial input into each dynamical simulation so that the simulations are not dominated by dynamical transients that depend heavily on initial conditions. Drawing on the work of Hachisu (1986a,b), we have successfully developed a robust self-consistent-field code (hereafter, HSCF code) that readily generates nonaxisymemtric as well as axisymmetric equilibrium structures as input to our gravitational CFD code. We have made an axisymmetric version of this code available to others via the world wide web ( http://www.phys.lsu.edu/ astro/ H_Book.current/ Applications/ Structure/ HSCF_Code/ HSCF.outline.shtml). To date, most of our investigations have focused on structures that have relatively simple angular momentum distributions, but as discussed above in connection with the fission problem, we are extending the capabilities of the HSCF code to include the generation of nonaxisymmetric equilibrium structures with nontrivial internal motions.
When performing multidimensional CFD simulations, it is always important to develop an efficient technique by which the numerical results can be readily analyzed for physical content. Because three-dimensional CFD simulations in particular generate enormous amounts of data, and the computational efficiency of modern high-performance-computers is quickly degraded by any significant I/O demands, generally it is preferable to analyze the data "on the fly" rather than trying to save the data to disk for future, post-processing. As described more fully below, we have constructed a heterogeneous computing environment through which we are able to routinely create volume-rendered images and 3D animation sequences of our evolving protostellar cloud configurations. (All of the "Movie" sequences referred to in ¤1 of this proposal have been produced in this fashion.) Relatively small amounts of data are regularly passed across the network from the parallel machine that is performing the CFD simulation to a machine that is more suited to visualization tasks; images that will become part of an extended animation sequence are created from this data; then the 3D data sets are immediately destroyed in order to minimize the impact on disk storage. At the SDSC, we have utilized the scriptable attributes of Alias/Wavefront (TM) software on middle- and high-end SGI systems in order to perform these critical visualization tasks. (The images displayed in Fig. 3 as well as the images associated with the animation sequences labeled Movie-1 and Movie-3, above, are products of this visualization tool.) This visualization tool and the heterogeneous computing environment in which it functions will continue to provide a critical component of our ongoing gravitational CFD simulations at the SDSC.
There is always room for improvement in connection with the numerical tools that are used to simulate and visualize the structure and evolution of complex physical systems. Recently we have invested a considerable amount of time developing two new tools that we expect to integrate into our program during the coming year. First, it is clear that one of the most computationally demanding components of our current gravitational CFD code is the algorithm used to calculate boundary values for the gravitational potential that must be fed into the Poisson solver in order to be able to solve for the gravitational potential throughout the volume of the fluid. Historically, we have used an algorithm based on a multipole expansion in terms of spherical harmonics (Tohline 1980). To improve the execution speed of this component of our code, we are developing an algorithm based on cylindrical Bessel functions because Bessel functions are much more naturally suited to the (cylindrical) geometry of our chosen computational grid.
Second, although we have been very pleased with the volume-rendering tool that we have developed using Alias/Wavefront software on the SGI platforms at the SDSC, this tool presently provides no mechanism for visualizing the differential fluid motions within the evolving protostellar cloud structure. Working with the visualization team at the NAVOCEANO MSRC (see ¤6, below), we are developing an algorithm through which Wavefront can follow the motion of a large number of "tracer particles" in the fluid at the same time as it renders various nested "surfaces" of the entire cloud. The last two frames shown in Figure 3 provide a simple example of how 10 tracer particles initially radially aligned with one another in the fluid can be followed to show, a short time later, a "spiral" pattern created by the underlying fluid's differential motion.
3. Local Computing Environment
In 1991, through an equipment grant from the Louisiana State Board of Regents, the LSU Department of Physics and Astronomy acquired an 8,192-node, SIMD-architecture MasPar MP-1 to support the development of parallel algorithms for a variety of computational physics and chemistry projects. The architectural topology of the MP-1 is that of a 2D torus characterized by a 128 x 64, two-dimensional grid of MasPar-produced processing elements. The MP-1 also incorporates a two-tier processor communication strategy which includes rapid nearest neighbor communication via X-Net communication and efficient global communication performance using a tree- nested global router. Each processing element contains 64 Kbytes of local dynamic RAM, that is, the total memory on LSU's MP-1 system is 0.5 GBytes. As described in ¤4, below, we have taken full advantage of this parallel hardware platform to perform a variety of astrophysical simulations at moderate spatial resolutions. Our computing strategy is to continue to use this machine to conduct "preliminary" calculations over a broad range of the available physical parameter space, then to study in detail the dynamical evolution of selected systems at higher spatial resolution on the Cray T3E at the SDSC.
The LSU Department of Physics and Astronomy also supports a large cluster of (over 45) Sun Microsystems workstations. Sparc 10 and Sparc 2 workstations that are integrated into this cluster, plus a NeXT Dimension workstation and a recently acquired SGI Octane provide the PI and his students with graphical unix interfaces into LSU's campuswide ethernet system and the internet. LSU is currently working through SURA (the Southeastern University Research Association) to upgrade its off-campus internet connections from a pair of T1 lines to a T3 bandwidth and Internet-2 status.
4. Preliminary Progress
Taking advantage of our acquisition of an 8K-node MasPar at LSU (see ¤3, above), in December 1991, we rewrote our CFD code in mpf (MasPar Fortran), which is a language patterned after Fortran 90 but includes certain extensions along the lines of high-performance Fortran (HPF). Utilizing the MP-1, we also have gained valuable experience in the development of parallel algorithms to efficiently solve the Poisson equation in curvilinear coordinate systems (Cohl, Sun & Tohline 1997). Numerous gravitational CFD simulations have been performed successfully over the past six years on the LSU MP-1 system and on an MP-2 platform at the Scaleable Computing Laboratory of the Laboratory at Iowa State University. Our algorithm has achieved a very high degree of parallelism on the MasPar. Because our algorithm involves primarily nearest-neighbor communications, we have been able to take particular advantage of the hardware's X-Net communications network. As described in Cohl, Sun and Tohline (1997), our global solution of the Poisson equation incorporates a matrix transpose technique that utilizes both the X-Net and global router networks efficiently (Flanders 1982).
It was at LSU as well that we initially developed a heterogeneous computing environment through which our gravitational CFD data can be visualized "on-the-fly." Using communication strategies, such as unix sockets and PVM (Parallel Virtual Machine), at LSU the computational work load is distributed such that workstations perform the rendering of our data simultaneously with any ongoing CFD simulation on the MasPar. For example, the animation sequences labeled Movie-2, Movie-4, and Movie-5, above, were generated at LSU using the IDL software package running on Sun Microsystems workstations.
Over the past four years, our parallel CFD algorithm that was developed in mpf on the MasPar has been ported to the Thinking Machines CM-5 at NCSA and to the IBM SP2 at the Cornell Theory Center. We also have been able to implement an f77 serial version on the Cray Y/MP at NCSA and an F90 version on the Cray C90 at SDSC. Table 1 in Attachment C2 documents the performance of our CFD code on these five different machine architectures. To date, performance results on the CM-5 and the SP2 have not been sufficiently good to warrant migration of our production runs away from LSU's MP-1 system. As discussed below, however, the Cray T3E architecture appears to be well-suited to our existing parallel algorithm so we have moved our primary simulation work to the SDSC.
Early in 1997, the PI applied for and was granted 19,200 SUs on the Cray T3E at the SDSC for a project entitled, "Dynamical Modeling of Star Formation in Galaxies." (The allocation was granted on April 1, 1997 and will expire on March 31, 1998. Our current request for NPACI resources is intended to be a direct extension of this successful, 1st-year SDSC project.) During the first couple of months of this project, we focused our efforts on porting our gravitational CFD code to the T3E and documenting its performance. Using the available Cray HPF compiler (PGHPF, developed by the Portland Group), we found that relatively few unexpected code modifications were required in order to successfully compile and execute our code on the SDSC system. (This was because we already had had experience successfully modifying the code to execute on the CM-5 using cmf and on the Cray C90 using the Cray Fortran90 compiler.) In June, 1997, we submitted a technical report entitled, "Early Experiences with the Cray T3E at the SDSC," to Jay Boisseau at the SDSC in which we fully documented how well our code was performing on the T3E. (This technical document is included as Attachment C2 to this proposal.) In summary, we found that:
On the down side, we also discovered that the PGHPF compiler was producing executable code that performed computational instructions on each T3E processor with extraordinarily low efficiency. That is, as a parallel application, the code was scaling well, but its overall efficiency measured relative to the published peak-performance capabilities of the machine was poor. By way of illustration, we show in Table 1a the measured MFlop rating of our CFD code on a single processor of the T3E_600 at the SDSC. For two different lattice sizes (one with and one without power-of-two arrays), the PGHPF-compiled code runs at about 13 MFlops, which is only 2% of the published peak rating of 600 MFlops. (The numbers in Table 1a were obtained using "pat," a performance monitoring tool that only became available to us at the SDSC in September. The report from which these numbers were drawn accompanies this proposal as Attachment C1.) Clearly there was plenty of room for improvement, but it was equally clear that our performance measures were not exceptional. Throughout its first year of general availability, users have been reporting "typical" T3E single-processor performance ratings of anywhere from 4 to 100 MFlops (John Levesque [APRI], private communication; see also "The Benchmarker's Guide to Single-processor Optimization for CRAY T3E Systems" published by the Benchmarking Group at Cray Research). Our recent progress on improving the code's performance is described below.
|Test||Compiler||Streams||Execution Speed (MFlops)|
64 × 32 × 32
67 × 34 × 32
Working with several staff members in the SDSC Vislab, during the summer we successfully established a heterogeneous computing environment at the SDSC through which the results of our gravitational CFD simulations can be routinely visualized. Extending the ideas that we first developed at LSU, we have established a procedure by which, at predetermined time intervals during a simulation, the CFD code running on the T3E writes out a 3D density file and sends a signal to the Vislab's SGI Power Onyx/Infinite Reality Engine indicating that a new data file is available for processing. Having received the signal from the T3E, the graphics engine executes a previously written shell script which (i) transforms the data from its cylindrical coordinate lattice to a cartesian lattice; (ii) searches through the data to locate four separate isodensity surfaces and generates a wire-frame structure (i.e., vertices and polygons) that corresponds to each of these 3D surfaces; (iii) executes an Alias/Wavefront script which renders the set of wire-frame structures as a multiply nested, reflective and colored transparent set of isosurfaces; (iv) stores on disk this 2D, rendered image of our evolving model at one particular time step in the simulation; then (v) deletes the original 3D data set and wire frame models to conserve disk space. This entire task is continually re-initiated as data is generated during an extended CFD simulation on the T3E. The net result of this process is a stored sequence of 2D images illustrating the evolution of our physical system. The series of 2D images is transferred via the internet to LSU where it can be displayed as a movie, allowing us to rapidly interpret the dynamics of the systems that we are examining. The animation sequences labeled as Movie-1 and Movie-3 in ¤1 of this proposal (see also Fig. 3) were generated in this manner. We consider the successful establishment of this environment which smoothly couples the high-end, SGI visualization tools at the SDSC with our simulation efforts on the T3E to have been a significant accomplishment this past year.
Throughout the fall, we used the SDSC facilities to study carefully two unstable physical systems: (1) A "massless" accretion disk surrounding a point-like protostellar object (see Movie-1, above). As noted above, this particular system has been examined previously through linear stability analyses and was simulated at this time in order to provide us with a quantitative check of the CFD code's performance on the T3E. (2) A rapidly rotating protostellar object that was subject to the two-armed, spiral-mode instability described above (see Movie-3 and Fig. 3). By tracking the nonlinear development of this instability at a spatial resolution that heretofore has been computationally unachievable, we have confirmed the general results of earlier simulation efforts and have produced a spatially well-resolved, rapidly rotating ellipsoidal-shaped structure whose relative stability can now be examined in the context of the classical fission hypothesis. The results of this work are currently being prepared for presentation at the "Numerical Astrophysics 1998" conference to be held during March in Tokyo, Japan and will be published in the proceedings of that meeting.
As is explained in ¤6, below, the PI recently has gained access to some of the computing facilities at the NAVOCEANO MSRC in Stennis, Mississippi. Via access to the T3E_900 and to several onsite workshops at the NAVO MSRC, the PI and his students have gained a much better appreciation of the way a single processor on the T3E handles the loading and storing of data arrays and the streaming of data into floating-point registers. By making adjustments in the stride of our principal data arrays and by taking advantage of some compiler/environment options, but without making any changes in the implemented HPF algorithm, we are able to generate executable code with a substantially higher execution efficiency than originally reported. As the data in Table 1b indicates, for example, by switching the STREAMS environment variable "on" and avoiding power-of-two array dimensions, through the Cray F90 compiler our code's single-processor execution speed can be increased from 7.74 MFlops to 38.2 MFlops, that is, by a factor of 5!
At present it is less obvious how improvements of this magnitude can be gained when using the PGHPF compiler. Hence, it is unclear how such speedups can be realized on parallel HPF applications. Based on the fact that, with relatively little effort, our current CFD code can achieve 38 MFlops on a single node of the T3E_900, we are optimistic that we will be able to increase the execution efficiency of our parallel applications on the T3E_600 at the SDSC by a factor of 2.5 over the next few months. With this in mind, we feel confident that a new allocation of 20,000 service units on the T3E will permit us to complete a number of substantial and significant simulations over the coming year.
|NAVOCEANO MSRC: T3E_900|
|Test||Compiler||Streams||Execution Speed (MFlops)|
64 × 32 × 32
67 × 34 × 32
5. Justification of the Number of Service Units Requested
We are requesting an allocation of 20,000 SUs on the Cray T3E at the SDSC as well as continued access to the SGI platforms in the SDSC Vislab. These platforms are suitable because, as has been detailed above in ¤4 and in Attachment C2, over the past year we have (i) demonstrated that our primary HPF simulation tool scales well on the T3E architecture; (ii) shown that we understand, in principal, how significant improvements may be made in the execution speed of our application code on the T3E; and (iii) developed a heterogeneous computing environment that smoothly integrates the unique capabilities of the T3E with those of the SGI platforms in the Vislab.
Code Requirements and Optimization Issues
Over the past year, we have been performing gravitational CFD simulations with lattice resolutions of 1283 and selecting the physical problems to be studied accordingly. At this resolution, our problem will not fit on fewer than 32-nodes of the SDSC T3E -- that is, the problem requires at least 4 GBytes of RAM. However, when the visualization tasks (being carried out simultaneously on the SGI Onyx) are considered as an integral component of each CFD simulation, a more ideal load-balancing is achieved if we run each simulation on 64-nodes of the T3E. In this configuration, typically 14 minutes of wall-clock time are required to evolve the system forward in time by 1/60 of a rotation period and slightly under 10 minutes of wall-clock time are required to complete the volume-rendering of one 3D data set. Hence, the SGI Onyx and the T3E are kept approximately equally busy generating 60 movie frames per rotation period and a comfortably smooth animation sequence is produced during each run.
When we double our current lattice size in order to handle the simulation tasks outlined in ¤1d of this proposal, we will not be able to run on fewer than 64-nodes of the T3E. Depending on how much the visualization tasks slow down as a result of this larger problem size, we may find it advisable to occasionally utilize 128-nodes; but it is unlikely that our planned simulations will require the full machine.
Proposed Production Runs
In ¤1 of this proposal we provided an extensive discussion of the particular scientific problem that we are investigating in connection with our ongoing, NSF-sponsored research. In ¤1d, in particular, we described in detail the simulation tasks that we would like to complete over the coming year in support of this research utilizing NPACI resources. Following up on that discussion -- as well as the discussion in ¤4 of our progress, to date -- our specific request for T3E resources at the SDSC is detailed in Table 2. The letter/number label that has been adopted for each proposed simulation in this table follows the outline used in our earlier discussion of the respective problems in ¤1d; the required number of system rotation periods (cirps) listed in the table also has been drawn directly from this earlier discussion. to
|A.||Test the stability of the ellipsoidal configuration that was formed at the end of the "Movie-3" simulation.|
|1.||Shift to rotating frame of reference.||6||10,800|
|3.||Cool the object slowly.||15||27,000|
|B.||Examine the stability against merger of unequal-mass binaries.|
|1.||Follow the evolution of two systems, each with a mass ratio q » 0.95, but with slightly different initial separations. One separation should be chosen so that both objects are inside their Roche lobe; this system is expected to be stable. The other separation should be chosen such that one object initially just fills its Roche lobe; this system is predicted to be unstable.||8||14,400|
In estimating the SU requirements for each extended run (refer to the last column of Table 2) we have adopted a default single-processor execution rating of 13 MFlops (see Table 1a). At this rate, and doubling the lattice size (that is, moving to a lattice of 128 ´ 128 ´ 256) in order to remove the p-symmetry constraint discussed in ¤1d, each system rotation period will require approximately 1,800 SUs. By default, then, the SUs required to complete all of these simulations totals 52,200.
Now, we argued in ¤4, above, that over the next few months we should be able to improve the execution efficiency of our gravitational CFD code by a factor of 2.5. If we achieve this goal, the SUs required to complete all of the simulation tasks outlined in Table 2 will be approximately 20,880. Hence, our formal request is for 20,000 SUs on the T3E_600 at the SDSC and the burden is on us to achieve the factor of 2.5 improvement in code efficiency.
Performance Monitor Results
As discussed earlier, utilizing "pat" -- a performance analysis tool that accesses the hardware performance monitors on the T3E -- we have been able to document the single-processor performance of our gravitational CFD code on the T3E. Numbers drawn directly from "pat" are shown in Tables 1a and 1b; they are also listed in Attachment C1 to this proposal.
As explained in detail above, we rely heavily on the SDSC Vislab facilities to routinely analyze our simulation results. In order for this project to be a success, therefore, it is essential that we continue to have access to those facilities. It is extremely important that the Alias/Wavefront software continue to be made available on the SGI Onyx as well.
We should point out that the manner in which disk storage is presently controlled and maintained at the SDSC is a problem for us. A single 3D data array in our gravitational CFD simulations is 16 MBytes and a "continuation file" which contains the minimum amount of information that is necessary to start or continue an evolution is approximately 100 MBytes. Therefore, in connection with the T3E, it is very difficult to work with a maximum personal storage allocation of only 10 MBytes. In the Vislab, data storage -- even on scratch disks -- appears to always be at a premium. We have worked very hard to develop a visualization tool (described in detail above) that preserves a bare minimum of data, but in order to make it work properly the tool must have access to a reasonable amount of scratch disk space on the SGI Onyx during each T3E simulation run. Frequently during an overnight run, the automatic visualization tool will fail to function simply because the scratch disk has been completely filled up by other users. We strongly encourage the SDSC staff to correct these problems with disk storage and maintenance.
6. Other Supercomputer Support
Having recently been designated a Major Shared Resource Center for the Department of Defense, during 1997 the Naval Oceanographic Offices (NAVOCEANO) at the Stennis Space Center in Mississippi began to purchase and install a variety of high- performance- computing platforms and visualization architectures. In particular, they have acquired a Cray T3E and an SGI Onyx with supporting compilers and software tools to create a supercomputing environment that resembles the one to which we have grown accustomed at the SDSC. In part because of LSU's close proximity to Stennis (the Stennis Space Center is only 2 hours away from LSU by car), in August, 1997, the PI and his students were invited to utilize the MSRC resources at a modest level of activity through the center's Programming Environment and Training (PET) program. Close interactions with the MSRC technical staff already are proving useful in connection with our efforts to gain better single-processor execution efficiencies on the T3E and to develop visualization tools that will permit us to study in detail the velocity flow-fields that develop during our dynamical simulations.
Because the PI does not have a DoD-funded research project, he has not been granted a priority allocation of time on the high-performance-computing hardware at the NAVOCEANO's MSRC. In connection with the PI's NSF-funded research activities, therefore, it is critically important that a guaranteed allocation of supercomputing time be secured through NPACI resources.
7. Qualifications of the Principal Investigator
A major component of the PI's research work over the past twenty years has involved the large-scale numerical simulation of multidimensional gas flows in astrophysical systems utilizing a variety of different supercomputing and high- performance- computing platforms. In the early 1980s, the PI held a postdoctoral research position in group T-6 at Los Alamos National Laboratories at which time he broadened his exposure to state-of-the-art CFD techniques and learned to develop efficient algorithms for vector supercomputers. From the mid-1980s through the early 1990s, Tohline was PI on a large account at the Cornell Theory Center during which time he and his students experimented with the parallel implementation of CFD algorithms on the IBM 3090 and various accompanying FPS Array Processors. When the 8K-node MasPar MP-1 system arrived at LSU (see ¤3, above), the PI abandoned the CTC facilities because his group was having considerable success utilizing the MasPar system for its largest simulation efforts.
As has been detailed elsewhere in this proposal, in recent years the PI and his students have gained experience developing gravitational hydrodynamic algorithms for execution on a variety of parallel platforms including MasPar's MP-2, the CM-5, the SP-2, and the Cray T3E. The group also has developed and implemented a heterogeneous computing environment (at LSU, at the SDSC, and at the NAVOCEANO's MSRC) that tightly couples their parallel simulation efforts to sophisticated visualization tools.
An accompanying attachment entitled, "Curriculum Vitae and Selected Publications," documents the PI's formal educational background and professional experience, and provides a list of ten significant publications that are particularly relevant to the project being proposed here.
8. Others Working on the Project
John E. Cazes (firstname.lastname@example.org): John Cazes is currently a 6th-year graduate student enrolled in the Ph.D. physics program at LSU. To date, John has been principally responsible for the migration and optimization of our CFD code, having successfully ported it to the CM-5, the SP-2, and the Cray T3E. He also has played the lead role in developing the heterogeneous computing environment through which our CFD simulations are routinely and automatically accompanied by 3D animation sequences. The astrophysics problem on which this project proposal is based also is the focus of John's Ph.D. dissertation research.
Howard S. Cohl (email@example.com): Howard Cohl is currently a 6th-year graduate student enrolled in the Ph.D. physics program at LSU. He has been principally responsible for developing, implementing, and optimizing the Poisson solver that is tightly integrated into our CFD simulations. Howie has worked closely with John Cazes on the development of the heterogeneous computing environment through which animation sequences of our CFD simulations are routinely created. He is also playing the lead role in developing techniques by which the our 3D velocity flow-fields can be readily visualized.
Patrick M. Motl (firstname.lastname@example.org): Patrick Motl is currently a 5th-year graduate student enrolled in the Ph.D. physics program at LSU. He recently has perfected the 3D self-consistent-field technique which will permit us to construct unequal-mass, equilibrium binary systems and then to test their dynamical stability with the dynamical simulation tools described herein.
|Curriculum Vitae and Selected Publications for the PI|
|Timing and Performance Data|