A REQUEST FOR NSF PACI RESOURCES
Joel E. Tohline
Department of Physics & Astronomy
Louisiana State University
[This entire document, along with several illustrative animation sequences, is available on-line at http://www.phys.lsu.edu/faculty/tohline/NRAC98.proposal/]
1. Summary of Proposed Research, Including Objectives and Goals
Observational investigations of the frequency of occurrence of pre-main-sequence (PMS) binary stars have reinforced earlier suspicions (Abt 1983; Duquennoy & Mayor 1991) that "binary formation is the primary branch of the star-formation process" (Mathieu 1994). That is to say, stars tend to form in pairs rather than as single objects. Every indication is that, even in the PMS phase of stellar evolution, binaries with total masses less than or equal to three solar masses exhibit a wide range of orbital periods (days to at least 10^5 yr) with a median period of a few hundred years, corresponding to a median separation on the order of 50 AU (astronomical units). Why and how do stars preferentially form in pairs? This is the fundamental question that motivates the proposed research activities described here.
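The quoted median period and separation are mutually consistent through Kepler's third law, as a quick check shows (the 300 yr period and 1.5 solar-mass total below are illustrative round numbers, not values taken from the surveys cited above):

```python
# Kepler's third law in solar units: a^3 = M_total * P^2, with a in AU,
# P in years, and M_total in solar masses.
def separation_au(period_yr, total_mass_msun):
    """Semi-major axis (AU) of a binary with the given period and total mass."""
    return (total_mass_msun * period_yr ** 2) ** (1.0 / 3.0)

a = separation_au(300.0, 1.5)   # about 51 AU, consistent with the ~50 AU median
```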
As Bodenheimer, Ruzmaikina, and Mathieu (1993) have reviewed, a number of different theoretical scenarios have been proposed to explain the preponderance of binary stars. The ones given most attention over the past two decades are: Capture; direct fragmentation; disk fragmentation; and fission. Because either three-body encounters or close encounters that increase the importance of tides are required to make the capture process work, in normal star formation environments capture alone does not appear to provide a satisfactory explanation. (Note, however, that the presence of disks may enhance capture rates in dense clusters as pointed out by Larson 1990 and Clarke & Pringle 1991.)
As Shu, Adams and Lizano (1987) have summarized, it seems likely that binary systems with orbital periods between a few hundred and a few million years result from the direct fragmentation of a protostellar cloud early in its phase of collapse (at cloud densities n ~ 10^3 - 10^10 cm^-3). Hydrodynamic simulations performed, for example, by Bodenheimer, Tohline & Black (1980), Boss (1980, 1986), Bodenheimer & Boss (1983), Miyama, Hayashi, & Narita (1984), Monaghan & Lattanzio (1991), Miyama (1989, 1992) and Truelove et al. (1997) illustrate how this might occur. But at higher densities, clouds are unable to cool efficiently upon contraction and, consequently, direct fragmentation becomes problematical. Because higher mean densities are directly associated with systems having shorter dynamical times, one is led to investigate mechanisms other than direct cloud fragmentation for forming binary systems with orbital periods less than a few hundred years.
Through his ongoing NSF-funded project (AST-9528424), the PI is carrying out a variety of different numerical simulation projects in an effort to understand whether disk fragmentation and/or the fission of rapidly rotating protostars produce short-period binary star systems under physical conditions that are expected to exist naturally in protostellar environments. The research for which we are requesting PACI resources focuses on the fission problem.
Before discussing either the classical fission hypothesis for binary star formation, or what we consider to be a more reasonable, modified version of that hypothesis in view of recent simulations, it is useful to review what is known about allowable equilibrium configurations for rapidly rotating protostars. Consider, first, all rotating objects that are precisely ellipsoidal in shape, having principal axes a, b, and c. Furthermore, assign these labels such that a always refers to the object's longest axis and c is the axis about which the object rotates. Figure 1, with coordinate axes 0 ≤ b/a ≤ 1 and 0 ≤ c/a ≤ 1, then defines a two-dimensional parameter space that contains all such ellipsoidal figures. For example, the top-right corner of the box in Figure 1 (point S) defines a sphere ( b/a = 1; c/a = 1 ); the right-hand edge of the box ( b/a = 1; 0 ≤ c/a ≤ 1 ) identifies all possible (axisymmetric) oblate spheroids; and a diagonal line connecting point S with the bottom-left corner of the box identifies all (axisymmetric) prolate spheroids ( c/b = 1 ).
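The geometry of this parameter space can be captured in a small sketch (the function name and tolerance below are ours, introduced only for illustration):

```python
# Classify a point (b/a, c/a) in the Figure-1 parameter space, where
# a >= b and c is the rotation axis.
def classify_ellipsoid(b_over_a, c_over_a, tol=1e-9):
    """Classify an ellipsoid with axes a >= b and rotation about c."""
    assert 0.0 < b_over_a <= 1.0 and 0.0 < c_over_a <= 1.0
    if abs(b_over_a - 1.0) < tol and abs(c_over_a - 1.0) < tol:
        return "sphere"              # point S, top-right corner
    if abs(b_over_a - 1.0) < tol:
        return "oblate spheroid"     # right-hand edge of the box
    if abs(c_over_a - b_over_a) < tol:
        return "prolate spheroid"    # diagonal line, c/b = 1
    return "triaxial ellipsoid"      # interior of the box
```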
As Chandrasekhar (1969) has reviewed in detail (see also Durisen and Tohline 1985), if protostellar objects are assumed to be self-gravitating,
Physically meaningful evolutionary sequences also can be diagrammed within the parameter space defined by Figure 1. For example, as an initially slowly rotating (nearly spherical) protostellar gas cloud gravitationally contracts toward a mean density and size that is typical of a hydrogen-burning (main-sequence) star, it will become more and more rotationally flattened if it conserves its mass and angular momentum. The sequence of oblate spheroidal models along the right-hand edge of the box in Figure 1 (between points S and E, for example) describes well this phase of the protostar's early evolution. The curve in Figure 1 that connects the right-hand edge (at point E) to the lower-left corner of the box illustrates another possible evolutionary path (for a protostar that is already substantially flattened due to rotation) as it traces out a sequence of equilibrium configurations having constant vorticity. This sequence of genuinely triaxial ellipsoids is said to "bifurcate" from the axisymmetric sequence at point E. The comprehensive, 19th-century analytical work of Jacobi, Dedekind, and particularly Riemann (again, see Chandrasekhar 1969 for a review) has demonstrated that a variety of constant vorticity sequences bifurcate from the axisymmetric sequence at different locations. For our purposes here, the curve drawn in Figure 1 will serve to illustrate any one of a number of these different sequences. If, at the point of bifurcation, the ellipsoidal sequence defines a configuration that has the same mass and angular momentum, but a lower total energy than the adjacent configuration on the axisymmetric sequence, then after reaching point E the energetically preferred evolutionary path will be one that takes the contracting protostar away from the axisymmetric sequence.
A careful examination of equilibrium models along sequences of progressively more elongated ellipsoids (such as the constant vorticity sequence illustrated in Figure 1) shows that these sequences exhibit additional points of bifurcation. Each point of bifurcation identifies the first model belonging to a new sequence of objects that has a "higher order" surface distortion. For example, analytical work on incompressible objects (Chandrasekhar 1969) clearly has identified bifurcation points that lead to equilibrium sequences of pear-shaped and dumbbell-shaped configurations. Point D in Figure 1 shows where bifurcation to a dumbbell sequence occurs along one illustrative, constant-vorticity sequence of ellipsoids. Figure 2 (taken from the review by Durisen and Tohline 1985) illustrates furthermore how the surface distortion of equilibrium figures becomes more and more extreme as one moves along the dumbbell sequence away from the initial point of bifurcation: There is a smooth transition from the initially ellipsoidal figure, to figures with a pronounced dumbbell shape, then to fully detached binary star structures.
(i) Classical Fission Hypothesis
It is easy to see how a "fission hypothesis for binary star formation" emerges from this classical work on ellipsoidal figures of equilibrium. Imagine that as a rotating protostar undergoes slow contraction, it evolves from point S, through point E, to point D along the equilibrium sequences illustrated in Figure 1. Then, from point D it contracts along the dumbbell sequence through each of the geometrical configurations (a --> d) illustrated in Figure 2 until a well-separated binary star system has been formed. This is certainly an aesthetically pleasing picture of how (predominantly equal-mass) binary star systems might form. If the outlined evolutionary path proves to be the energetically preferred one under a variety of different protostellar cloud conditions, the fission hypothesis would also explain naturally why the majority of stars form as binary systems.
In reality, however, the picture is not this clear. Even when this classical fission hypothesis was discussed by mathematicians and astronomers a century ago, several problems were recognized:
(ii) Recent Problems Plaguing the Fission Hypothesis
An animation sequence illustrating the nonlinear development of this two-armed, spiral-mode instability is provided online (see the http address at the top of this document) in a format indicated in the caption to the accompanying "Movie-1" icon. Several still frames from this movie are also displayed in Figure 3, following page 4 of this proposal. After the nonaxisymmetric distortion reaches a nonlinear amplitude, the evolution proceeds as follows: Via the trailing spiral structure, gravitational torques are able to effectively redistribute angular momentum on a dynamical time scale; a relatively small amount of material (5%) carrying a relatively large fraction of the system's angular momentum (20%) is shed into an equatorial disk; and the central object (containing most of the initial object's mass) settles down into a new, centrally condensed, equilibrium configuration. That is, "fission" into a binary star system as depicted schematically by frames a --> d of Figure 2 does not occur.
Figure 3 (Caption)
The top six frames (read, in sequence, from left to right) have been taken directly from "Movie-1," an animation sequence illustrating the nonlinear development of the two-armed, spiral-mode instability in a rapidly rotating protostar. More precisely, the initial model (1st frame) is an axisymmetric, n = 3/2 polytrope with a rotation profile given by an n' = 0 sequence (Durisen et al. 1986), and an initial ratio of rotational to gravitational energy T/|W| = 0.30. The separation in time between the first frame and the second is 7.66 cirps (central initial rotation periods); all other frame pairs are separated in time by approximately 2.75 cirps. Frame six shows the "final" steady-state configuration, 18.7 cirps into the evolution. This 47,500 time-step simulation was performed with a spatial resolution of 128^3 primarily on the Cray T3E_600 at the SDSC, although a portion of it was performed on the Cray T3E_900 at the NAVOCEANO's MSRC.
The bottom two frames are from a later evolution, discussed in §3c of this proposal.
With regard to this last point, the results appear to be particularly significant. Simulations conducted with several independently developed gravitational hydrodynamics codes (Durisen et al. 1986; Houser, Centrella, and Smith 1994) have each shown that the dynamical instability that arises in rapidly rotating, axisymmetric models of compressible gas clouds does not lead to the formation of a binary star system, as envisioned classically. Hence, as has been reviewed by Bodenheimer, Ruzmaikina, and Mathieu (1993), among the various ideas that have been proposed to explain the preponderance of binary stars, the "binary fission hypothesis" currently is out of favor on theoretical grounds.
New evidence suggesting that binary star formation may yet result from the "fission" of a rapidly rotating, protostellar gas cloud comes from several numerical simulation projects that have been conducted recently by the PI and his students in the context of his ongoing NSF-sponsored research.
(i) The Existence of a Robust, Ellipsoidal Structure with Nontrivial Internal Motions
As we describe more fully in §3c, below, by isolating this final ellipsoidal-shaped configuration from its surrounding disk and evolving it through an additional eight dynamical times (equal to two full pattern rotations), we have demonstrated that this object is not only a realizable steady-state configuration but is also dynamically stable. An analysis of the simulation data also shows that this nonaxisymmetric, steady-state configuration contains a nontrivial internal flow. That is, although the configuration as a whole appears to be an ellipsoidal-shaped figure that is spinning coherently as a solid body, in reality the steady-state configuration also contains strong differential fluid motions internally. An animation sequence illustrating the 3D flow-field of the object is provided online in a format indicated in the caption accompanying the "Movie-2" icon. In the first frame of Movie-2, a vertical sheet of test particles has been aligned with the major axis of the ellipsoidal configuration. The subsequent motion of these particles illustrates that there is relatively little vertical fluid motion, but the angular velocity of the fluid varies considerably with radius and azimuthal location.
(ii) Sequences of Equilibrium, Nonaxisymmetric Disks with Nontrivial Internal Motions
Approximately a decade ago, Hachisu (1986b) developed a self-consistent-field technique that permitted him to construct equilibrium configurations of self-gravitating objects with a wide variety of nonaxisymmetric shapes and nontrivial internal motions if the structures are assumed to be incompressible fluids. Working with the PI, LSU graduate student Saied Andalib recently has developed a technique by which nonaxisymmetric, equilibrium models of self-gravitating, compressible configurations can be constructed with nontrivial internal motions (Tohline and Andalib 1997; Andalib 1998). In order to accomplish this task, Andalib has adopted the following two simplifications: (a) The models are two-dimensional, in the sense that they are either infinite in extent or infinitesimally thin in the vertical direction; and (b) each system is assumed to have uniform vortensity, where vortensity is defined at each point in the fluid to be the ratio of vorticity to density (see, for example, Lin and Papaloizou 1989).
Tying this new work on nonaxisymmetric equilibrium structures to the studies discussed above in connection with the "two-armed, spiral-mode instability" that arises in rapidly rotating axisymmetric structures, we suspect the following (Tohline, Cazes, & Cohl 1998):
With the NSF PACI resources being requested here, we propose to investigate thoroughly both of these conjectures. (The details of our proposed effort are presented in §2b, below.) If we successfully demonstrate the viability of this second conjecture in particular, we will be in a position to explain for the first time how and why dense protostellar gas clouds are able to form binary stars during phases of slow, quasi-adiabatic contraction.
(iii) The Merger (or Lack Thereof) of Close, Equal-Mass Binaries
Because a primary objective of our ongoing research efforts is to understand how binary star systems form, it is important to demonstrate that the gravitational CFD simulation techniques being employed by the PI are capable of spatially resolving and accurately representing the equilibrium structure of binary stars. Recently, the PI has published the results of an extensive study of the stability of close, equal-mass binary star systems that exhibit a variety of different degrees of gas compressibility (New and Tohline 1997). In essence, this was a numerical investigation designed to determine whether or not, in realistic astrophysical situations, evolution will ever occur backwards in Figure 2 (i.e., from frame d to frame a), resulting in the merger (or destruction) of a binary star system. In connection with this proposal, we stress two key results that have emerged from this published investigation:
With the NSF PACI resources being requested here, we propose to explore more completely the relationship between this binary merger problem and the fission problem. (The details of our proposed effort are presented in §2b, below.)
2. Computational Methodology/Algorithms
Our primary dynamical simulation tool is an algorithm designed to perform multidimensional computational fluid dynamic (CFD) simulations in astrophysical settings. It is a High Performance Fortran (HPF) algorithm that is a finite-difference representation of the multidimensional equations governing the dynamics of inviscid, self-gravitating, compressible gas flows. More specifically, the employed finite-difference algorithm is based on the second-order accurate, van Leer monotonic interpolation scheme described by Stone and Norman (1992), but extended to three dimensions on a uniform, cylindrical coordinate mesh. The fluid equations are advanced in time via an explicit integration scheme (often referred to as "direct numerical simulation") so an upper limit to the size of each integration time step is set by the familiar Courant condition. At each time step, the Poisson equation is solved in order to determine, in a self-consistent fashion, how the fluid is to be accelerated in response to the fluid's own self-gravity. Complete descriptions of our algorithm to solve the coupled CFD + Poisson equations can be found in Woodward, Tohline, & Hachisu (1994) and in New (1996). In most of our recent simulations, the Poisson equation has been solved implicitly using a combined Fourier transformation and iterative ADI (alternating-direction implicit) scheme.
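The Courant limit mentioned above takes a simple generic form on a cylindrical mesh; the following sketch shows the idea (variable names and the safety factor are our own illustrative choices, not the production code's actual bookkeeping):

```python
# Courant limit for an explicit scheme: the fastest signal (bulk flow
# speed plus sound speed) must not cross the smallest grid cell in a
# single time step.
def courant_dt(dr, dz, r_dphi_min, cs_max, v_max, safety=0.5):
    """Largest stable explicit time step on a cylindrical (R, Z, phi) mesh."""
    dx_min = min(dr, dz, r_dphi_min)   # smallest cell dimension anywhere
    return safety * dx_min / (cs_max + v_max)
```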
As discussed in §1 of this proposal, our ongoing NSF-funded research is aimed at understanding to what extent various rapidly rotating, self-gravitating structures -- such as protostars, accretion disks, and close binary stars -- are dynamically stable. Utilizing the gravitational CFD code just described, our broad objective is to examine the development of nonaxisymmetric structures that arise spontaneously as "normal modes" in such systems. It is important in such studies that we have the capability to construct accurate equilibrium structures as initial input into each dynamical simulation so that the simulations are not dominated by dynamical transients that depend heavily on initial conditions. Drawing on the work of Hachisu (1986a,b), we have successfully developed a robust self-consistent-field code (hereafter, HSCF code) that readily generates nonaxisymmetric as well as axisymmetric equilibrium structures as input to our gravitational CFD code. We have made an axisymmetric version of this code available to others via the world wide web (http://www.phys.lsu.edu/astro/H_Book.current/Applications/Structure/HSCF_Code/HSCF.outline.shtml). To date, most of our investigations have focused on structures that have relatively simple angular momentum distributions, but as discussed above in connection with the fission problem, we are extending the capabilities of the HSCF code to include the generation of nonaxisymmetric equilibrium structures with nontrivial internal motions.
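To illustrate the self-consistent-field idea underlying the HSCF code, here is a toy spherical analogue of the density-potential-enthalpy iteration cycle. This is only a sketch under simplifying assumptions (no rotation, spherical symmetry, G = 1 units, arbitrary grid size and tolerance), not the HSCF algorithm itself, which handles rotation and nonaxisymmetric shapes:

```python
import numpy as np

# Toy SCF iteration for a spherical n = 3/2 polytrope in G = 1 units.
n, N = 1.5, 400
r = np.linspace(1e-6, 1.0, N)
dr = r[1] - r[0]
rho = np.ones(N)                         # initial density guess

for _ in range(200):
    # Gravitational field from the current density.
    m_enc = np.cumsum(4.0 * np.pi * r**2 * rho) * dr
    g = m_enc / r**2
    # Integrate inward from Phi(R) = -M/R to obtain the potential.
    phi = np.empty(N)
    phi[-1] = -m_enc[-1] / r[-1]
    for i in range(N - 2, -1, -1):
        phi[i] = phi[i + 1] - 0.5 * (g[i] + g[i + 1]) * dr
    # Enthalpy vanishes at the surface; density follows H^n.
    H = phi[-1] - phi
    rho_new = np.clip(H, 0.0, None) ** n
    rho_new /= rho_new.max()             # renormalize the central density
    if np.max(np.abs(rho_new - rho)) < 1e-10:
        rho = rho_new
        break
    rho = rho_new
```

The resulting density profile is centrally peaked and falls monotonically to zero at the surface, the behavior the full HSCF code reproduces for far more general configurations.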
When performing multidimensional CFD simulations, it is always important to develop an efficient technique by which the numerical results can be readily analyzed for physical content. Because three-dimensional CFD simulations in particular generate enormous amounts of data, and the computational efficiency of modern high-performance computers is quickly degraded by any significant I/O demands, generally it is preferable to analyze the data "on the fly" rather than trying to save the data to disk for future post-processing. As described more fully below, we have constructed a heterogeneous computing environment through which we are able to routinely create volume-rendered images and 3D animation sequences of our evolving protostellar cloud configurations. (All of the "Movie" sequences referred to in §1 of this proposal have been produced in this fashion.) Relatively small amounts of data are regularly passed across the network from the parallel machine that is performing the CFD simulation to a machine that is more suited to visualization tasks; images that will become part of an extended animation sequence are created from this data; then the 3D data sets are immediately destroyed in order to minimize the impact on disk storage. At the SDSC, we have utilized the scriptable attributes of Alias|Wavefront (TM) software on middle- and high-end SGI systems in order to perform these critical visualization tasks. This visualization tool and the heterogeneous computing environment in which it functions (see §3b for a more detailed description) will continue to provide a critical component of our ongoing gravitational CFD simulations at the SDSC.
We propose to perform the following well-defined simulation tasks in order to examine to what extent the ideas outlined in §1b, above, contribute to a more complete understanding of the binary star formation process in rotating protostellar gas clouds. If our educated speculations based on recent simulations prove to be correct, we will be in a position to demonstrate conclusively why and how stars preferentially form in pairs.
We propose to "cool" the object by imposing at each time step a very small reduction in the gas pressure uniformly across all fluid elements. The reduction will be at a rate such that, over a period of 8 cirps (central initial rotation periods), a spherically symmetric object of the same size would be expected to contract to half its initial radius. In order to ensure that the system is at no time far from equilibrium and that the model has ample opportunity to react in a dynamically correct way to any new instability that might arise, the virial energy of the system will be monitored continuously and, at several different times during the cooling evolution (for details, see the discussion associated with Table 2 in §4, below), we will turn off the cooling process for approximately one pattern rotation period (i.e., about 4 cirps). This technique should permit the object to contract slowly, conserving its total mass and angular momentum, and evolve quasistatically along an equilibrium dumbbell-binary sequence, if such a sequence is available to it energetically.
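The per-step pressure reduction implied by this prescription is easy to estimate. For an n = 3/2 polytrope of fixed mass, the equilibrium radius scales linearly with the polytropic constant K, so halving the pressure scale over 8 cirps halves the equivalent sphere's radius. The step count below is an assumed illustrative cadence, not the production run's actual value:

```python
# Per-step multiplicative pressure factor such that, after n_steps,
# the cumulative reduction equals total_factor.
def per_step_pressure_factor(total_factor, n_steps):
    """Uniform factor applied to the gas pressure at every time step."""
    return total_factor ** (1.0 / n_steps)

# Assuming ~2,500 integration steps per cirp (illustrative), halving the
# pressure scale over 8 cirps requires a tiny per-step reduction:
f = per_step_pressure_factor(0.5, 8 * 2500)
```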
This simulation will be performed initially in a coordinate frame that is rotating with the measured pattern speed of the ellipsoidal configuration, thereby introducing a minimal amount of numerical viscosity into the simulation. Each time the cooling process is temporarily stopped, the angular velocity of the coordinate frame will be modified to coincide with the central system's rotational pattern speed at that point in its evolution.
The simulation will be performed on a grid lattice that is twice the size of the 128^3 lattice that was utilized in our simulation of the two-armed, spiral-mode instability (shown above in Movie-1 and Figure 3). During the "Movie-1" simulation, a two-fold azimuthal symmetry was imposed on the physical system in order to take maximum advantage of the computational grid resolution. (Previous work on these types of systems had indicated that they were only unstable to azimuthal modes with two-fold symmetry, so this symmetry constraint was a reasonable one to impose.) This symmetry constraint will be removed during our cooling simulation (and, hence, a doubling of the lattice size will be required in the azimuthal direction) in order to test whether the evolving ellipsoidal-shaped object is susceptible to the growth of odd azimuthal mode distortions, such as a pear-mode distortion as discussed earlier in the context of incompressible systems.
We should emphasize that as our system slowly cools, it will be evolving with an adiabatic exponent (ratio of specific heats) γ = 5/3. Hence our simulation will not be susceptible to the "artificial Jeans instability" that has caused problems previously in many isothermal collapse calculations (Truelove et al. 1997). In our ellipsoidal-shaped equilibrium structure, the Jeans length is approximately equal to the radius of the object, so a grid lattice of 128 × 128 × 256 in (R, Z, θ) will ensure that the proper physical Jeans length is very well resolved throughout the simulation.
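The resolution argument can be made concrete with a quick check in the spirit of the Truelove et al. (1997) Jeans condition (their criterion requires at least four cells per Jeans length; the sample numbers below are ours, in arbitrary G = 1 code units):

```python
import math

# Jeans length for sound speed cs and density rho (G = 1 code units).
def jeans_length(cs, rho, G=1.0):
    """lambda_J = cs * sqrt(pi / (G * rho))."""
    return cs * math.sqrt(math.pi / (G * rho))

def cells_per_jeans_length(cs, rho, dx, G=1.0):
    """Number of grid cells spanning one Jeans length."""
    return jeans_length(cs, rho, G) / dx

# If the Jeans length is comparable to the object's radius and that
# radius spans tens of radial zones, the 4-cell criterion is met easily.
```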
Although this issue opens up a parameter space that is potentially very large, we are guided by the recent work of Andalib (1998) to consider next an axisymmetric model that has uniform vortensity. Using our self-consistent-field code, we propose to construct a sequence of rotationally flattened, n = 3/2 polytropes that are in axisymmetric equilibrium but have a variety of different values of the vortensity, measured as a ratio of the fluid vorticity to the flattened cloud's surface density. We will then select a model along this sequence that is only marginally unstable to an m = 2 (ellipsoidal or two-armed spiral) mode instability and follow its evolution through the nonlinear development of this instability.
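The vortensity diagnostic just described can be sketched for an axisymmetric flow on a uniform radial grid (function names are ours, introduced for illustration, not Andalib's):

```python
# Vertical vorticity (1/R) d(R v_phi)/dR at interior points of a
# uniformly spaced radial grid, via central differences.
def vorticity_axisym(R, v_phi):
    h = R[1] - R[0]
    L = [Ri * vi for Ri, vi in zip(R, v_phi)]   # specific angular momentum R*v_phi
    return [(L[i + 1] - L[i - 1]) / (2.0 * h) / R[i] for i in range(1, len(R) - 1)]

# Pointwise vortensity: vorticity divided by surface density.
def vortensity(zeta, sigma):
    return [z / s for z, s in zip(zeta, sigma)]

# Solid-body rotation (v_phi = Omega * R) has vorticity 2*Omega everywhere,
# so with uniform surface density the vortensity is uniform as well.
```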
If this system evolves in a manner that is analogous to the "Movie-1" evolution and forms a new, steady-state, nonaxisymmetric configuration, then we propose to follow its evolution through a subsequent phase of slow cooling, as described above.
The recent study by New and Tohline (1997) has indicated that common-envelope binary systems with a soft equation of state are stable against merger. As pointed out above, this may help explain why the binary state ultimately is the preferred physical state for protostellar gas clouds. However, New and Tohline only examined equal-mass binary systems. The study of unequal-mass systems quickly becomes more complicated, not only because an additional parameter is introduced into the choice of models to be studied -- the initial system mass ratio q -- but also because one object can easily fill its "Roche lobe" before the other and a situation quickly arises in which asymmetric mass exchange occurs. In systems with uniform specific entropy, it is expected that the less massive component will rapidly transfer most of its mass to the more massive component (or into a large common-envelope) thereby driving the system away from (rather than toward) the value q = 1.
If binary stars are formed via the modified fission scenario outlined above, then they will evolve through an early common-envelope (dumbbell-shaped configuration) phase and would be equally susceptible to a mass-transfer instability that drives the mass ratio away from unity. We therefore propose to extend the New and Tohline investigation to binary configurations that have values of q close to, but not equal to 1, in order to determine whether the observed stability of close binary systems is sensitive to the initial system mass ratio. Evolutions covering as few as 4 initial orbital periods should be sufficient to answer the question of dynamical stability.
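For context on when one component "fills its Roche lobe," the widely used Eggleton (1983) fitting formula gives the effective Roche-lobe radius as a fraction of the orbital separation. This is a standard result from the binary-star literature, not necessarily the exact diagnostic employed by New and Tohline (1997):

```python
import math

# Eggleton (1983) fit: effective Roche-lobe radius R_L / a for the star
# whose mass ratio is q (its mass divided by its companion's mass).
def roche_lobe_fraction(q):
    q23 = q ** (2.0 / 3.0)
    return 0.49 * q23 / (0.6 * q23 + math.log(1.0 + q ** (1.0 / 3.0)))

# The lobe of the less massive star is smaller, so for q slightly below
# unity it is the lighter component that fills its lobe first.
```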
3. Preliminary Results and Progress to Date
Taking advantage of our acquisition of an 8K-node MasPar at LSU (see §5, below), in December 1991, we rewrote our gravitational CFD code in mpf (MasPar Fortran), which is a language patterned after Fortran 90 but includes certain extensions along the lines of High Performance Fortran (HPF). Utilizing the MP-1, we also have gained valuable experience in the development of parallel algorithms to efficiently solve the Poisson equation in curvilinear coordinate systems (Cohl, Sun & Tohline 1997). Numerous gravitational CFD simulations have been performed successfully over the past six years on the LSU MP-1 system and on an MP-2 platform at the Scalable Computing Laboratory of the Ames Laboratory at Iowa State University. Our algorithm has achieved a very high degree of parallelism on the MasPar. Because our algorithm involves primarily nearest-neighbor communications, we have been able to take particular advantage of the hardware's X-Net communications network. As described in Cohl, Sun and Tohline (1997), our global solution of the Poisson equation incorporates a matrix transpose technique that utilizes both the X-Net and global router networks efficiently (Flanders 1982).
It was at LSU as well that we initially developed a heterogeneous computing environment through which our gravitational CFD data can be visualized "on-the-fly." Using communication strategies such as unix sockets and PVM (Parallel Virtual Machine), the computational workload at LSU is distributed so that workstations render our data simultaneously with any ongoing CFD simulation on the MasPar. For example, the animation sequences labeled Movie-3 and Movie-4, above, were generated at LSU using the IDL software package running on Sun Microsystems workstations.
Over the past four years, our parallel CFD algorithm has been ported to the Thinking Machines CM-5 at NCSA and to the IBM SP2 at the Cornell Theory Center. We also have been able to implement an f77 serial version on the Cray Y/MP at NCSA and an F90 version on the Cray C90 at SDSC. Table 1 in Attachment C2 documents the performance of our CFD code on these five different machine architectures. To date, performance results on the CM-5 and the SP2 have not been sufficiently good to warrant migration of our production runs away from LSU's MP-1 system. As discussed below, however, the Cray T3E architecture appears to be well-suited to our existing parallel algorithm.
(i) Porting the Gravitational CFD Code to the T3E
Early in 1997, the PI applied for and was granted 19,200 SUs on the Cray T3E at the SDSC. After review, on April 1 of this year, an additional 20,000 SUs was added to our allocation and the project was extended through March 31, 1999. (Our current request for PACI resources will build directly upon this successful, ongoing SDSC project.) During the first couple of months of this project, we focused our efforts on porting our gravitational CFD code to the T3E and documenting its performance. Using the available Cray HPF compiler (PGHPF, developed by the Portland Group), we found that relatively few unexpected code modifications were required in order to successfully compile and execute our code on the SDSC system. (This was because we already had had experience successfully modifying the code to execute on the CM-5 using cmf and on the Cray C90 using the Cray Fortran90 compiler.) In June, 1997, we submitted a technical report entitled, "Early Experiences with the Cray T3E at the SDSC," to Jay Boisseau at the SDSC in which we fully documented how well our code was performing on the T3E. This technical document is included as Attachment C2 to this proposal. In summary, we found that:
On the down side, we also discovered that the PGHPF compiler was producing executable code that performed computational instructions on each T3E processor with extraordinarily low efficiency. That is, as a parallel application, the code was scaling well, but its overall efficiency measured relative to the published peak-performance capabilities of the machine was poor. By way of illustration, we show in Table 1a the measured MFlop rating of our CFD code on a single processor of the T3E_600 at the SDSC. For two different lattice sizes (one with and one without power-of-two arrays), the PGHPF-compiled code runs at about 13 MFlops, which is only 2% of the published peak rating of 600 MFlops.
  Test            Compiler   Streams   Execution Speed (MFlops)
  64 × 32 × 32    PGHPF      --        ~13
  67 × 34 × 32    PGHPF      --        ~13
(The numbers in Table 1a were obtained using "pat," a performance monitoring tool that only became available to us at the SDSC in September, 1997. The report from which these numbers were drawn accompanies this proposal as Attachment C1.) Clearly there was plenty of room for improvement, but it was equally clear from discussions with SDSC operators and other T3E users that our performance measures were not unusual. Throughout its first year of general availability, users have been reporting "typical" T3E single-processor performance ratings of anywhere from 4 to 100 MFlops (John Levesque [APRI], private communication; see also "The Benchmarker's Guide to Single-processor Optimization for CRAY T3E Systems" published by the Benchmarking Group at Cray Research). As discussed below in subsection iii, however, we are investigating ways in which the code's performance can be improved.
(ii) Establishing a Heterogeneous Computing Environment
Working with several staff members in the SDSC Vislab during the summer of 1997, we successfully established a heterogeneous computing environment (HCE) at the SDSC through which the results of our gravitational CFD simulations can be routinely visualized. Extending the ideas that we first developed at LSU, we have established a procedure by which, at predetermined time intervals during a simulation, the CFD code running on the T3E writes out a 3D density file and sends a signal to the Vislab's SGI Power Onyx/Infinite Reality Engine indicating that a new data file is available for processing. Having received the signal from the T3E, the graphics engine executes a previously written shell script which (i) searches through the data to locate four separate isodensity surfaces and generates a wire-frame structure (i.e., vertices and polygons) that corresponds to each of these 3D surfaces; (ii) executes an Alias|Wavefront script which renders the set of wire-frame structures as a multiply nested, reflective and colored transparent set of isosurfaces; (iii) stores on disk this 2D, rendered image of our evolving model at one particular time step in the simulation; then (iv) deletes the original 3D data set and wire frame models to conserve disk space. This entire task is continually re-initiated as data is generated during an extended CFD simulation on the T3E. The net result of this process is a stored sequence of 2D images illustrating the evolution of our physical system. The series of 2D images is transferred via the internet to LSU where it can be displayed as a movie, allowing us to rapidly interpret the dynamics of the systems that we are examining.
The animation sequences labeled Movie-1, Movie-2, Movie-5 and Movie-6 in this proposal (see also Fig. 3) were generated in this manner. (An HTML-based article that describes this HCE in detail has been prepared for submission to Computers in Physics. Through this on-line document -- http: //www.phys.lsu.edu/ faculty/ tohline/ CIP/ article.html -- the interested reader may download a copy of the unix shell scripts that we have developed to seamlessly link the T3E to the SGI Onyx and carry out the tasks outlined here.) We consider the successful establishment of this environment which smoothly couples the high-end, SGI visualization tools at the SDSC with our simulation efforts on the T3E to have been a significant accomplishment this past year.
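The watcher loop at the heart of this environment can be sketched schematically as follows. This is a hypothetical Python rendition for illustration only; the production version consists of the unix shell scripts described above, and the file-naming scheme and the `process_fn` hook are invented for the sketch:

```python
import os
import time

def watch_and_render(workdir, process_fn, expected_frames, poll_seconds=0.05):
    """Mimic the T3E/Onyx hand-off: whenever the simulation drops a
    <step>.signal file next to a <step>.dat density file, render the data
    into a 2D <step>.frame image, then delete the bulky 3D data and the
    signal file to conserve scratch-disk space."""
    frames = []
    while len(frames) < expected_frames:
        signals = sorted(f for f in os.listdir(workdir) if f.endswith(".signal"))
        if not signals:
            time.sleep(poll_seconds)  # wait for the simulation's next signal
            continue
        for sig in signals:
            step = sig[:-len(".signal")]
            data = os.path.join(workdir, step + ".dat")
            frame = os.path.join(workdir, step + ".frame")
            process_fn(data, frame)            # stand-in for steps (i)-(iii)
            os.remove(data)                    # step (iv): reclaim disk space
            os.remove(os.path.join(workdir, sig))
            frames.append(frame)
    return frames
```

In the real environment, `process_fn` corresponds to the isosurface-extraction and Alias|Wavefront rendering stages, and the accumulated 2D frames are what get transferred to LSU for playback as a movie.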
(iii) Potential Improvements in Code Execution Efficiency
As is explained in §6, below, the PI recently gained access to some of the computing facilities at the NAVOCEANO MSRC in Stennis, Mississippi. Via access to the T3E_900 and to several onsite workshops at the NAVO MSRC, the PI and his students have gained a much better appreciation of the way a single processor on the T3E handles the loading and storing of data arrays and the streaming of data into floating-point registers. By adjusting the stride of our principal data arrays and by taking advantage of some compiler/environment options, but without making any changes in the implemented HPF algorithm, we are able to generate executable code with substantially higher execution efficiency than originally reported. As the data in Table 1b indicate, for example, by switching the STREAMS environment variable "on" and avoiding power-of-two array dimensions, the Cray F90 compiler increases our code's single-processor execution speed from 7.74 MFlops to 38.2 MFlops, that is, by nearly a factor of five.
|NAVOCEANO MSRC: T3E_900|
|Test||Compiler||Streams||Execution Speed (MFlops)|
|64 × 32 × 32||F90||off||7.74|
|67 × 34 × 32||F90||on||38.2|
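The rationale for avoiding power-of-two extents can be illustrated with a short sketch (generic Python used as a calculator here; our production code is HPF). In column-major (Fortran) storage, the memory stride along the second and third axes is a product of the leading extents, so power-of-two extents produce power-of-two strides that repeatedly collide in the same sets of a power-of-two-sized cache:

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def column_major_strides(dims):
    """Memory strides (in array elements) along each axis for a
    column-major (Fortran-ordered) array with extents (ni, nj, nk)."""
    ni, nj, nk = dims
    return (1, ni, ni * nj)

# The two test lattices from Tables 1a/1b:
for dims in [(64, 32, 32), (67, 34, 32)]:
    strides = column_major_strides(dims)
    print(dims, "strides:", strides,
          "power-of-two:", [is_power_of_two(s) for s in strides])
```

For the 64 × 32 × 32 lattice every stride (1, 64, 2048) is a power of two, so successive j- and k-neighbors map onto the same cache sets; padding the first two extents to 67 × 34 breaks that pattern at the cost of a few percent more memory.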
At present it is less obvious how improvements of this magnitude can be gained when using the PGHPF compiler, and hence how such speedups can be realized in parallel HPF applications. Nevertheless, based on the fact that, with relatively little effort, our current CFD code can achieve 38 MFlops on a single node of the T3E_900, we are optimistic that we will soon be able to increase the execution efficiency of our parallel application on the T3E_600 at the SDSC by a factor of two to three.
Following another line of attack, it has been clear to us for some time that one of the most computationally demanding components of our current gravitational CFD code is the algorithm used to calculate boundary values of the gravitational potential, which must be fed into the Poisson solver before the potential can be determined throughout the volume of the fluid. Historically, we have used an algorithm based on a multipole expansion in terms of spherical harmonics (Tohline 1980). In an effort to improve the execution speed of this component of our code, we have been developing an alternate algorithm based on cylindrical Bessel functions, which are much more naturally suited to the (cylindrical) geometry of our chosen computational grid. Within the past month we have completed the development and testing of a parallel version of this algorithm. It will be used in all of our future gravitational CFD simulations because it is both faster and more accurate than the spherical-harmonic expansion. Most importantly, its implementation will permit us to bring the top (and bottom) cylindrical grid boundary down closer to the rotationally flattened mass distribution and thereby reduce the computational lattice size by a factor of two or three in the vertical direction.
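To make the boundary-value step concrete, the following sketch evaluates the potential at a single boundary cell using only the monopole and dipole terms of a multipole expansion and checks the result against a direct sum. It is an illustrative toy (point masses, code units, two expansion terms), not our production algorithm, which retains higher-order spherical-harmonic terms (Tohline 1980):

```python
from math import sqrt

G = 1.0  # gravitational constant, code units

def direct_potential(masses, p):
    """Exact boundary potential: direct sum over every mass element."""
    px, py, pz = p
    return -G * sum(m / sqrt((px - x)**2 + (py - y)**2 + (pz - z)**2)
                    for m, (x, y, z) in masses)

def multipole_potential(masses, p):
    """Monopole + dipole truncation about the origin; the truncation
    error falls off as (d/r)^2 for a source of characteristic size d."""
    px, py, pz = p
    r = sqrt(px * px + py * py + pz * pz)
    M = sum(m for m, _ in masses)
    dip = sum(m * (x * px + y * py + z * pz) for m, (x, y, z) in masses) / r
    return -G * (M / r + dip / r**2)

# Compact, lopsided source near the origin; boundary cell well outside it:
masses = [(1.0, (0.3, 0.1, -0.2)), (0.5, (-0.4, 0.2, 0.1)), (0.8, (0.1, -0.3, 0.2))]
boundary_cell = (10.0, 4.0, 3.0)
exact = direct_potential(masses, boundary_cell)
approx = multipole_potential(masses, boundary_cell)
```

With the source confined to within a few tenths of a unit and the boundary cell at r ≈ 11, the two-term expansion already agrees with the direct sum to better than one percent. The practical trade-off is the same in production: the direct sum costs a pass over the full mass distribution per boundary cell, while an expansion reuses a small set of precomputed moments.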
With these improvements in mind, we are confident that the PACI resources being requested will be efficiently utilized and will permit us to complete a number of substantial and significant simulations over the coming year.
We began our first year of activity on the T3E at the SDSC by carefully modeling the following two unstable physical systems whose behavior had been fairly well studied previously: (1) a "massless" accretion disk surrounding a point-like protostellar object; (2) a rapidly rotating protostellar object that was subject to the two-armed, spiral-mode instability described above. [An animation sequence illustrating the results of the first of these simulations is available online in Quicktime format as "Movie-5". The second has been illustrated above in Fig. 3 and "Movie-1."] Because the first of these models was unstable to a "Papaloizou-Pringle" instability, we were able to use it to demonstrate that our nonlinear gravitational CFD code produces nonaxisymmetric growth rates that match the predictions of linear theory (Papaloizou & Pringle 1984; Frank & Robertson 1988; Kojima 1986; Andalib et al. 1997) to very high precision. By modeling the second system at a spatial resolution that heretofore has been computationally unachievable, we have confirmed the general results of earlier simulation efforts (Durisen et al. 1986) and have produced a spatially well-resolved, rapidly rotating ellipsoidal structure whose relative stability can now be examined in the context of the fission hypothesis for binary star formation.
In connection with our "proposed simulation A," described in §2b above, we already have isolated the central ellipsoidal configuration from its surrounding disk; placed the configuration in a coordinate grid that rotates with the figure's pattern speed and relaxed the two-fold azimuthal symmetry that had been enforced on the earlier portion of the evolution, thereby also doubling the size of the computational lattice; and evolved the isolated system through two full pattern rotations (8 cirps). As discussed in §1bi, above, this has demonstrated to us that the configuration displayed at the end of the "Movie-1" animation (see frame 6 of Fig. 3) is extremely robust and stable with respect to the development of any odd azimuthal mode distortions (such as the pear-mode instability). We also have carefully examined the velocity profile exhibited by the configuration's internal, differential fluid flow and compared it with the differential flow exhibited by the equilibrium models with uniform vortensity that have been constructed by Andalib (1998) using self-consistent-field techniques. (See our "Movie-2" animation and the discussion above in §1b.) As we explained recently in a conference presentation (Tohline, Cazes, & Cohl 1998), the flows are very similar, lending support to our conjecture that our robust ellipsoidal configuration is but one equilibrium figure along a sequence of more and more distorted configurations that may be reached through a slow, quasi-adiabatic cooling evolution.
Very recently, we have also made a first quick pass at cooling the configuration, and the results have been very tantalizing. The object maintains a dynamical equilibrium, and its geometric shape does indeed gradually become more distorted as the system slowly contracts. As the last two color frames of Fig. 3 show, at 32 cirps into the evolution a pair of off-axis density maxima have become well defined inside a somewhat dumbbell-shaped common envelope; then, at 35 cirps, the highest density region (colored in green) shifts entirely to one side of the configuration, suggesting the development of either a pear-mode instability or a binary with unequal masses. We are now anxious to follow this up with a much more methodical and gradual cooling evolution (as proposed above) in order to ascertain which of these two is the preferred, lowest-energy equilibrium state.
In connection with our "proposed simulation B," described in §2b above, using standard 2D self-consistent-field techniques we already have constructed a sequence of more and more rotationally flattened, axisymmetric objects that have an n = 3/2 polytropic equation of state and uniform vortensity. The model along this sequence with T/|W| = 0.30 has also been evolved far enough to indicate that it is unstable to a two-fold (bar-like or two-armed spiral) nonaxisymmetric structure, but the development of this instability has not yet been followed to nonlinear amplitudes.
In connection with our "proposed simulation C," using our HSCF code we have successfully constructed several equilibrium sequences of close, but detached binary polytropes having unequal masses. Initially, we have scrutinized most closely the properties of systems with mass ratio q = 0.81, synchronous rotation, and uniform specific entropy throughout both stellar components of the system. As a critical test of the capabilities of our techniques, one of these systems in which the less (more) massive component fills 95% (85%) of its Roche lobe was evolved hydrodynamically through two complete orbits. Reflecting the inevitable slight mismatch between the definition of an equilibrium in our initial model-generating (HSCF) code and in our fully hydrodynamical (gravitational CFD) code, a small amount of material was exchanged between the upper layers of both stars, but in all other respects the binary system maintained its steady, equilibrium state. (An animation sequence illustrating the results of this test simulation in a frame of reference rotating with the same frequency as the binary orbit is available online in Quicktime format as "Movie-6".)
Publications resulting directly from this work, to date, at the SDSC are:
4. Justification of Resources Requested
We are requesting an allocation of 80,000 SUs on the Cray T3E at the SDSC as well as continued access to the SGI platforms in the SDSC Vislab. These platforms are suitable because, as has been detailed above in §3 and in Attachment C2, over the past year we have (i) demonstrated that our primary HPF simulation tool scales well on the T3E architecture; (ii) shown that we understand, in principle, how significant improvements may be made in the execution speed of our application code on the T3E; and (iii) developed a heterogeneous computing environment that smoothly integrates the unique capabilities of the T3E with those of the SGI platforms in the Vislab.
Code Requirements and Optimization Issues
Over the past year, we have been performing gravitational CFD simulations with lattice resolutions of 128³ and selecting the physical problems to be studied accordingly. At this resolution, our problem will not fit on fewer than 32 nodes of the SDSC T3E -- that is, the problem requires at least 4 GBytes of RAM. However, when the visualization tasks (carried out simultaneously on the SGI Onyx) are considered as an integral component of each CFD simulation, a more ideal load balancing is achieved if we run each simulation on 64 nodes of the T3E. In this configuration, typically 14 minutes of wall-clock time are required to evolve the system forward in time by 1/60 of a rotation period, and slightly under 10 minutes of wall-clock time are required to complete the volume rendering of one 3D data set. Hence, the SGI Onyx and the T3E are kept approximately equally busy generating 60 movie frames per rotation period, and a comfortably smooth animation sequence is produced during each run. When we double our current lattice size in order to handle the simulation tasks outlined in §2b of this proposal, we will not be able to run on fewer than 64 nodes of the T3E. Depending on how much the visualization tasks slow down as a result of this larger problem size, we may find it advisable to occasionally utilize 128 nodes; but it is unlikely that our planned simulations will require the full machine.
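These figures are mutually consistent, as a quick back-of-the-envelope check shows. (Python is used here purely as a calculator; the 128-MByte-per-node figure is our inference from the 32-node/4-GByte statement, not a quoted machine specification.)

```python
# One 3D double-precision array on a 128^3 lattice:
bytes_per_array = 128**3 * 8
assert bytes_per_array == 16 * 2**20   # 16 MBytes, matching the quoted data-file size

# Minimum node count: 4 GBytes of RAM at an assumed 128 MBytes per node.
min_nodes = (4 * 2**30) // (128 * 2**20)
assert min_nodes == 32

# Load balancing on 64 nodes: the Onyx finishes each rendering (10 min)
# before the T3E delivers the next density file (14 min), so neither
# machine idles for long; one full rotation costs 60 such steps.
step_min, render_min, steps_per_rotation = 14, 10, 60
assert render_min < step_min
hours_per_rotation = step_min * steps_per_rotation / 60
assert hours_per_rotation == 14.0
```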
Proposed Production Runs
In §2b of this proposal we described in detail the simulation tasks that we would like to complete over the coming year utilizing PACI resources. Following up on that discussion -- as well as the discussion in §3 of our progress to date -- our specific request for T3E resources at the SDSC is detailed in Table 2. The letter label that has been adopted for each proposed simulation in this table follows the outline used in our earlier discussion of the respective problems in §2b. The column labeled SU/tcirp lists the approximate number of service units required per unit of physical time (measured here in cirps), as determined from prior experience and adopting a default single-processor execution rating of 13 MFlops (see Table 1a); the column labeled tcirp estimates how long each evolution will have to be run, in cirps; and the estimated number of SUs required to complete each proposed simulation task is obtained by multiplying the first of these columns by the second.
For simulation task "A," we propose to separate four evolutionary periods of slow cooling (of duration 4 cirps, 2 cirps, 1 cirp, and 1 cirp, respectively) with four periods (each of duration 4 cirps) during which the cooling will be turned off; hence the need to follow the evolution for 24 cirps. For simulation task "B," we expect to follow the initial phase of development of the spiral-mode instability through 22 cirps, by which point a new, nonaxisymmetric equilibrium configuration will have formed; this will be followed by an 8-cirp evolutionary phase in which the configuration is isolated in a rotating frame of reference and, finally, by a cooling evolution of exactly the type described for task "A." For simulation task "C," we propose to follow three separate binary systems (systems with two different mass ratios and, for one of those ratios, two different initial binary separations) through 4 orbital cycles (i.e., 4 cirps each), requiring a total of 12 cirps. Using our present simulation tools, then, the SUs required to complete all of these simulations total 159,000.
|Task||Description||SU/tcirp||tcirp||Estimated SUs|
|A.||Slowly cool ellipsoidal-shaped configuration.||2000||24||48,000|
|B.||Examine stability of uniform vortensity model.|
|-||Follow spiral-mode instability in initial axisymmetric object.||500||22||11,000|
|-||Shift to rotating frame of reference.||2000||8||16,000|
|-||Cool final configuration slowly.||2000||24||48,000|
|C.||Examine the stability of 3 unequal-mass binary systems.||3000||12||36,000|
|Total (default) requirement.||159,000|
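The arithmetic behind Table 2 can be verified directly (a simple check in Python; the task labels below are shorthand for the table rows above):

```python
# (SU/tcirp, tcirp) for each simulation task in Table 2:
tasks = {
    "A: cool ellipsoid slowly":       (2000, 24),
    "B: spiral-mode instability":     (500, 22),
    "B: shift to rotating frame":     (2000, 8),
    "B: cool final configuration":    (2000, 24),
    "C: three unequal-mass binaries": (3000, 12),
}
sus = {name: rate * t for name, (rate, t) in tasks.items()}
total = sum(sus.values())
assert sus["A: cool ellipsoid slowly"] == 48000
assert total == 159000                 # the default requirement

# Task A's 24 cirps: cooling intervals of 4+2+1+1 cirps interleaved
# with four 4-cirp intervals during which the cooling is turned off.
assert (4 + 2 + 1 + 1) + 4 * 4 == 24

# With the anticipated factor-of-two efficiency gain, the requirement
# rounds to the 80,000 SUs formally requested.
assert round(total / 2, -4) == 80000
```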
Now, as we argued in §3, very shortly we should be able to improve the execution efficiency of our gravitational CFD code by at least a factor of 2, either by adopting a more accurate and efficient technique for calculating the gravitational potential on the boundary of our computational lattice or through direct optimization of the current HPF code. If we achieve this goal, the SUs required to complete all of the simulation tasks outlined in Table 2 will be approximately 80,000. Hence, our formal request is for 80,000 SUs on the T3E_600 at the SDSC and the burden is on us to achieve the factor of two improvement in code efficiency.
Performance Monitor Results
As discussed earlier, utilizing "pat" -- a performance analysis tool that accesses the hardware performance monitors on the T3E -- we have been able to document the single-processor performance of our gravitational CFD code on the T3E. Numbers drawn directly from "pat" are shown in Tables 1a and 1b; they are also listed in Attachment C1 to this proposal.
As explained in detail above, we rely heavily on the SDSC Vislab facilities to routinely analyze our simulation results. In order for this project to be a success, therefore, it is essential that we continue to have access to those facilities. It is extremely important that the Alias|Wavefront software continue to be made available on the SGI Onyx as well.
We should point out that the manner in which disk storage is presently controlled and maintained at the SDSC is a problem for us. A single 3D data array in our gravitational CFD simulations is 16 MBytes and a "continuation file" which contains the minimum amount of information that is necessary to start or continue an evolution is approximately 100 MBytes. Therefore, in connection with the T3E, it is very difficult to work with a maximum personal storage allocation of only 10 MBytes. In the Vislab, data storage -- even on scratch disks -- appears to always be at a premium. We have worked very hard to develop a visualization tool (described in detail above) that preserves a bare minimum of data, but in order to make it work properly the tool must have access to a reasonable amount of scratch disk space on the SGI Onyx during each T3E simulation run. Frequently during an overnight run, the automatic visualization tool will fail to function simply because the scratch disk has been completely filled up by other users. We strongly encourage the SDSC staff to correct these problems with disk storage and maintenance.
5. Local Computing Environment
In 1991, through an equipment grant from the Louisiana State Board of Regents, the LSU Department of Physics and Astronomy acquired an 8,192-node, SIMD-architecture MasPar MP-1 to support the development of parallel algorithms for a variety of computational physics and chemistry projects. The architectural topology of the MP-1 is that of a 2D torus characterized by a 128 × 64, two-dimensional grid of MasPar-produced processing elements. The MP-1 also incorporates a two-tier processor communication strategy, which includes rapid nearest-neighbor communication via X-Net communication and efficient global communication using a tree-nested global router. Each processing element contains 64 KBytes of local dynamic RAM; that is, the total memory on LSU's MP-1 system is 0.5 GBytes. As described in §4, above, we have taken full advantage of this parallel hardware platform to perform a variety of astrophysical simulations at moderate spatial resolutions. Our computing strategy is to continue to use this machine to conduct "preliminary" calculations over a broad range of the available physical parameter space, then to study in detail the dynamical evolution of selected systems at higher spatial resolution on the Cray T3E at the SDSC.
The LSU Department of Physics and Astronomy also supports a large cluster of (over 45) Sun Microsystems workstations. Sparc 10 and Sparc 2 workstations that are integrated into this cluster, plus a NeXT Dimension workstation and a recently acquired SGI Octane, provide the PI and his students with graphical unix interfaces to LSU's campuswide ethernet system and the internet. Working with several institutions in SURA (the Southeastern Universities Research Association) and with NSF funding, LSU recently has upgraded its off-campus internet connections from a pair of T1 lines to T3 bandwidth. Within the past few weeks, LSU also has been notified by NSF that it is now a vBNS-certified institution. This certification brings LSU closer to Internet-2 status, which should significantly improve the PI's ability to perform the simulation tasks outlined in this proposal at the SDSC.
6. Other Supercomputer Support
Having recently been designated a Major Shared Resource Center for the Department of Defense, during 1997 the Naval Oceanographic Office (NAVOCEANO) at the Stennis Space Center in Mississippi began to purchase and install a variety of high-performance-computing platforms and visualization architectures. In particular, it has acquired a Cray T3E and an SGI Onyx with supporting compilers and software tools to create a supercomputing environment that resembles the one to which we have grown accustomed at the SDSC. In part because of LSU's close proximity to Stennis (the Stennis Space Center is only 2 hours from LSU by car), in August, 1997, the PI and his students were invited to utilize the MSRC resources at a modest level of activity through the center's Programming Environment and Training (PET) program. Close interactions with the MSRC technical staff already are proving useful in connection with our efforts to gain better single-processor execution efficiencies on the T3E and to develop visualization tools that will permit us to study in detail the velocity flow-fields that develop during our dynamical simulations.
Because the PI does not have a DoD-funded research project, he has not been granted a priority allocation of time on the high-performance-computing hardware at the NAVOCEANO's MSRC. In connection with the PI's NSF-funded research activities, therefore, it is critically important that a guaranteed allocation of supercomputing time be secured through PACI resources.
7. Project Team Qualifications
A major component of the PI's research work over the past twenty years has involved the large-scale numerical simulation of multidimensional gas flows in astrophysical systems utilizing a variety of different supercomputing and high-performance-computing platforms. In the early 1980s, the PI held a postdoctoral research position in group T-6 at Los Alamos National Laboratory, at which time he broadened his exposure to state-of-the-art CFD techniques and learned to develop efficient algorithms for vector supercomputers. From the mid-1980s through the early 1990s, Tohline was PI on a large account at the Cornell Theory Center, during which time he and his students experimented with the parallel implementation of CFD algorithms on the IBM 3090 and various accompanying FPS Array Processors. When the 8K-node MasPar MP-1 system arrived at LSU (see §5, above), the PI abandoned the CTC facilities because his group was having considerable success utilizing the MasPar system for its largest simulation efforts.
As has been detailed elsewhere in this proposal, in recent years the PI and his students have gained experience developing gravitational hydrodynamic algorithms for execution on a variety of parallel platforms including MasPar's MP-2, the CM-5, the SP-2, and the Cray T3E. The group also has developed and implemented a heterogeneous computing environment (at LSU, at the SDSC, and at the NAVOCEANO's MSRC) that tightly couples their parallel simulation efforts to sophisticated visualization tools. An accompanying attachment entitled, "Curriculum Vitae and Selected Publications," documents the PI's formal educational background and professional experience, and provides a list of ten significant publications that are particularly relevant to the project being proposed here.
John E. Cazes, Howard S. Cohl, and Patrick M. Motl are 5th- and 6th-year graduate students enrolled in the Ph.D. physics program at LSU. Each is expected to be heavily involved in the simulation work outlined in this proposal. To date, John Cazes has been principally responsible for the migration and optimization of our CFD code, having successfully ported it to the CM-5, the SP-2, and the Cray T3E. He also has played the lead role in developing the heterogeneous computing environment through which our CFD simulations are routinely and automatically accompanied by 3D animation sequences. The astrophysics problem on which this project proposal is based also is the focus of John's Ph.D. dissertation research. Howard Cohl has been principally responsible for developing, implementing, and optimizing the Poisson solver that is tightly integrated into our CFD simulations. He has worked closely with John Cazes on the development of the heterogeneous computing environment through which animation sequences of our CFD simulations are routinely created. He is also playing the lead role in developing techniques by which our 3D velocity flow-fields can be readily visualized. Patrick Motl recently has perfected the 3D self-consistent-field technique that will permit us to construct unequal-mass, equilibrium binary systems and then to test their dynamical stability with the dynamical simulation tools described herein.
|Form G: Request for NSF PACI Resources|
|Curriculum Vitae and Selected Publications for the PI|
|Timing and Performance Data|