A REQUEST FOR NSF PACI RESOURCES
Joel E. Tohline, Juhan Frank, & Patrick Motl
Department of Physics & Astronomy
Louisiana State University
tohline@physics.lsu.edu
This entire document, along with several illustrative animation sequences, is available online at the address http://www.phys.lsu.edu/faculty/tohline/NRAC99/
1. Summary of Proposed Research, Including Objectives and Goals
Many classes of interesting astronomical phenomena are known for which the proposed astrophysical scenarios involve dynamical or thermal mass transfer in a close binary star system, sometimes leading to a common envelope phase. These models are invoked to explain the origin of such systems, a particular phase of their evolution, their current state, or even their ultimate fate. We shall focus on relatively rapid phases of mass transfer, investigate the conditions required for stable/unstable mass transfer, and follow numerically the evolution of unstable cases. In many cases, the present-day systems are known to be undergoing long-term, stable mass transfer driven by angular momentum losses or nuclear evolution. Although such long-term, secular phases of evolution may be easier to study observationally, they are too slow to be amenable to present numerical simulations and are therefore beyond the scope of the present proposal. Well-known examples of the above are cataclysmic variables (CVs) (Warner 1995), low-mass X-ray binaries (LMXBs) (White et al. 1995; van Paradijs & McClintock 1995; Verbunt & van den Heuvel 1995), millisecond pulsars (Bhattacharya 1995), soft X-ray transients (SXTs) or X-ray novae (Chen, Shrader & Livio 1997), Algols (Batten 1989) and W Serpentis stars (Wilson 1989), contact binaries (W Ursae Majoris stars) (Rucinski 1989), some central stars of planetary nebulae (Iben & Livio 1993), double degenerate white dwarf (WD) binaries, perhaps leading to supernovae (SNe) of type Ia through a merger (Iben & Tutukov 1984) or to subdwarf sdO and sdB stars (Iben 1990), and double neutron star (NS) or NS-black hole binaries, perhaps yielding gamma-ray bursts in a fireball when the system coalesces (Mészáros & Rees 1992; Ruffert et al. 1997; Kluzniak & Lee 1997).
The dynamical mass transfer that ensues in unstable binaries rapidly reaches nonlinear amplitudes and may lead to the formation of a massive disk-like object, perhaps a common envelope phase (Iben & Livio 1993), some mass loss from the system, and, in some cases, a merger. These phenomena are truly three-dimensional in character, and large-scale numerical simulations are required to further our theoretical understanding of them. Accurate simulations of these rapid phases of binary interaction and mass transfer are crucial to our understanding of wide classes of objects that are under current observational investigation.
We plan to study the mass-transfer stability of close binaries in which the component stars obey a polytropic equation of state (EOS) but, in general, possess non-uniform entropy. As we argue below, these simple structural approximations enable us to treat, with some limitations, a very diverse set of objects including main sequence (MS) stars, WDs and NSs. Even in the case of a double NS binary, our simulations will adopt a Newtonian treatment of the potential but with some ad hoc relativistic corrections.
We distinguish two types of dynamical instability which can occur in close binaries:
The classical tidal instability, first studied by Darwin (1887), deals with the secular evolution away from synchronous rotation due to dissipation. Recent treatments of this secular instability include: an analysis of tidal stability of coplanar binaries (spin axes perpendicular to orbital plane) by Counselman (1973), who shows that in unstable systems evolution away from synchronism is accompanied by a secular decrease or increase in the orbital separation; and a more general treatment by Hut (1980) allowing for arbitrary orientations of spin and orbital angular momenta.
In a series of papers Lai, Rasio & Shapiro (1993a; 1993b; 1994a; 1994b) studied the equilibrium and stability of binaries with polytropic EOS, making use of analytic techniques and a variational energy method. They were able to extend their work to unequal masses and allow for internal motions. They showed that in addition to the classical secular instability, a new dynamical instability exists not far from the point of secular instability, which leads to rapid binary inspiral under certain conditions. They showed that the instability occurred before contact was reached for stiff EOS (see also New & Tohline 1997). More recently, Uryu & Eriguchi (1999) have pointed out that for compact binaries the system may reach the mass-transfer limit (contact) before the tidal instability limit. This drastically alters the proposed evolutionary scenarios based on the results of Lai, Rasio & Shapiro, and highlights the need to examine the fully three-dimensional evolution of compact binaries with mass ratios different from one.
In binaries with unequal components, in general one of the stars will reach contact with its Roche lobe first (the donor), and mass transfer to its companion star (the accretor) will commence either because the donor star is evolving or because the binary separation is reduced by systemic angular momentum losses. What happens next, i.e. the stability of binaries to mass transfer, depends crucially on the reaction of the donor star to mass loss, and on the changes in Roche geometry that ensue from the mass transfer. Important factors determining the evolution of the Roche geometry are: 1) whether all the mass transferred by the donor is accreted (conservative mass transfer); and 2) if some of the mass transferred is lost from the system, what are the consequential angular momentum losses (Webbink 1985; King & Kolb 1995). If upon contact and onset of mass transfer, the radius of the donor and the radius of the Roche lobe vary in such a way that contact becomes deeper, the mass transfer will proceed at an ever increasing rate and the situation is unstable. If, however, the contact becomes shallower, the donor will detach and mass transfer will cease. In the latter case, if some physical effect other than mass transfer insists on driving the system back into contact, stable mass transfer may be possible for as long as the driving is present. In CVs and LMXBs, systemic angular momentum losses due to gravitational radiation and magnetic braking, and in some cases the nuclear evolution of the donor, drive stable secular mass transfer from a less massive donor onto a compact object.
It has become customary to characterize the behavior of the donor and the Roche lobe upon mass transfer by introducing the following parameters (Webbink 1985; Ritter 1996): z_{R}, z_{e}, and z_{s}, which are respectively defined as the logarithmic derivatives of the donor's Roche radius, its thermal equilibrium radius, and its adiabatic radius with respect to the mass of the donor. So, for example,

    z_{s} = (d ln R / d ln M)_{s},

the fractional change in the donor's radius with fractional change in its mass at constant specific entropy (i.e., its adiabatic response).
It can be shown (Webbink 1985; Ritter 1996) that mass transfer is dynamically (or adiabatically) stable if z_{s} > z_{R}, whereas thermal stability requires z_{e} > z_{R}. Stars with deep convective envelopes, such as lower MS stars and giants, have z_{s} < z_{e}, indicating that the dynamical instability sets in first, whereas stars with radiative envelopes, such as upper main sequence stars, have z_{e} < z_{s}. In the latter case, if z_{R} < z_{e} < z_{s}, it is in principle possible for the star to shrink rapidly at first, avoiding a dynamical instability, and then expand slowly by thermal relaxation, yielding stable mass transfer at a rate controlled by thermal relaxation.
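These criteria are easy to make concrete. The short sketch below is our own illustration (not part of the production tools described later): it classifies the donor's response given the z parameters defined above, and estimates z_{R} for conservative transfer from the standard textbook result z_{R} = 2q - 5/3, which follows from Paczynski's approximation to the Roche-lobe radius.

```python
def zeta_roche_conservative(q):
    """z_R = d ln R_L / d ln M_donor for conservative mass transfer,
    with q = M_donor/M_accretor.  Uses Paczynski's approximation
    R_L ~ 0.462 a (M_donor/M_total)^(1/3), which gives z_R = 2q - 5/3."""
    return 2.0 * q - 5.0 / 3.0

def classify_mass_transfer(z_s, z_e, z_R):
    """Apply the stability criteria quoted in the text (Webbink 1985; Ritter 1996)."""
    if z_s <= z_R:
        return "dynamically unstable"
    if z_e <= z_R:
        return "thermally unstable (dynamically stable)"
    return "stable"

# An n = 3/2 homentropic donor has z_s = -1/3 (see Sec. 2); at q = 0.88,
# conservative transfer is already dynamically unstable.  (The z_e value
# here is a placeholder; it does not affect the dynamical verdict.)
print(classify_mass_transfer(-1.0/3.0, 0.0, zeta_roche_conservative(0.88)))
# -> dynamically unstable
```

This reproduces the expectation, discussed below, that our n = 3/2, q = 0.88 test model should be unstable toward mass transfer.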
The above stability considerations are one-dimensional in the sense that the state of the donor is represented by its instantaneous radius and the Roche geometry is reduced to an effective lobe radius. We plan to check the stability boundary with our numerical simulations, starting from equilibrium stellar models with both uniform and non-uniform entropy, in order to evaluate the effect of using self-consistent potential, density and angular momentum distributions.
It is worth mentioning that in stable mass-transfer situations, the depth of contact is relatively shallow, on the order of a few pressure scale heights, and does not change much with time. In those situations it is correct to adopt an isothermal atmosphere approximation in order to model the mass transfer and the structure of the stream. Recent simulations by Blondin (1998) confirm that such a model can reproduce the behavior of the stream predicted by Lubow & Shu (1975; 1976). In our case, however, we are interested in dynamically unstable situations which lead to deeper degrees of contact, and therefore a polytropic approximation for the envelope is better. It is well-known that in the absence of nonlinear effects which we expect to limit the growth (see below), a polytropic atmosphere of uniform entropy would yield a divergent mass-transfer rate in a finite time (e.g. Webbink 1985).
Before outlining the evolutionary scenarios that are likely to develop as we follow the nonlinear evolution of systems found to be unstable toward either a tidal or a mass-transfer instability, it is important to point out that throughout the execution of this project we also will make a concerted effort to demonstrate the degree to which our nonlinear simulation techniques are able to represent accurately the equilibrium properties of close binary systems that are dynamically stable.
The construction of such models is, in itself, a nontrivial task. To illustrate our early successes with this aspect of our investigation, we offer two movies, which may be accessed via the web by clicking on the "Movie1" and "Movie2" icons shown here (if you are reading this proposal online) or by accessing the following URLs: [www.phys.lsu.edu/astro/cazes/movies/stable.db.mov] and [www.phys.lsu.edu/faculty/tohline/NRAC99/stable_binary_top.mov].
Movie1 [taken from New & Tohline 1997] shows that, for a sufficiently soft equation of state, an equal-mass (q = M_{donor}/M_{accretor} = 1) binary system can be stable against the tidal instability even when the two stars are so close to one another that they are sharing a common envelope. Movie1 follows the evolution of one such system through 3.5 full orbits. Although the system is being viewed with its orbital plane face-on, orbital motion is not apparent because the evolution is being shown in a rotating reference frame. Movie2 demonstrates how well we are able to follow the orbital motion of an unequal-mass (q = 0.81) system in which the less massive component very nearly fills its Roche lobe. In this movie the system is also being viewed face-on, but from the inertial frame of reference.
In those cases where the mass transfer is initially unstable, we plan to follow the evolution in order to investigate how initial conditions such as the mass ratio, the entropy gradients, and the polytropic index influence the nonlinear growth of the mass transfer and the final outcome.
We expect that depending on the assumed conditions, some binaries may become stable after a certain mass excess is transferred, some may exchange mass, and still others may evolve through a common envelope phase. This schematic division is adopted for simplicity since there may well exist cases where a combination of the above processes operate sequentially.
We anticipate that if the mass ratio is not extreme and the polytropic indices are similar, it should be possible for the binary to transfer some mass and regain stability. Based on the one-dimensional linear stability theory, we expect models with steep entropy gradients near the surface to be stable or, if unstable initially, to return to a stable situation after the excess mass has either been accreted or lost from the system. Similar behavior may be obtained for homentropic donors if the mass ratio is reversed after a relatively modest mass-transfer event. Movie3 and Movie4 show the very early stages of evolution in our first test run of an n = 3/2 polytropic system that was expected to be unstable toward mild mass transfer. The illustrated system has a mass ratio q = 0.88, and at the beginning of the simulation the less massive component (shown on the right) is just marginally filling its Roche lobe. In both movies the system is viewed face-on and from a frame of reference that is rotating with the orbital frequency of the system. Movie4 displays density contours in the equatorial plane, highlighting the flow of even the very lowest density material; Movie3 provides a magnified 3D rendering of isodensity surfaces in the accretion stream.
In some cases, if the accretor is not far from filling its Roche lobe, and the mass-transfer stream does not dissipate much of its energy upon impact, part of the material transferred may make it back to the donor. Such hypothetical situations may well be relevant to the onset of mass transfer in Algols, and in W Serpentis stars in particular, but initially at least we do not plan to attempt any detailed modeling of these binaries. Nevertheless, even qualitative results in this area would be new and valuable as a starting point for future work. Note again that we are focusing on initial dynamical effects and are not concerned here with modeling the present secular, stable mass transfer in these binaries (Blondin, Richards & Malinowski 1995; Richards & Ratliff 1998).
Other questions related to the nonlinear dynamical phase are: 1) Under what conditions does a disk form? 2) How much mass is lost from the system to form a circumbinary disk? 3) What is the ultimate fate of these mass-transfer events: a stable binary with a different mass ratio, or a merged object?
When the donor is more massive than the accretor (q > 1) and has a deep convective envelope, the mass transfer almost certainly will be dynamically unstable. It is generally thought that once the rate of transfer is high enough, the accretor is swamped and driven out of thermal equilibrium, and the matter transferred forms a common envelope. Most calculations of the subsequent evolution have dealt with a relatively compact star (the accretor) orbiting or plunging into the convective (polytropic) atmosphere of a giant star (Rasio & Livio 1996; Terman, Taam & Hernquist 1994; 1995 and references therein). Some calculations specifically focus on the ejection of the envelope (Taam et al. 1997; Sandquist et al. 1998).
The reasons why it is important to investigate the onset and early stages of a common envelope evolution (CEE) are discussed by Rasio & Livio (1996): 1) the dynamical phase is probably the only phase that can be calculated accurately within the capabilities of current computers; 2) the results of this early phase can be used as an initial state for follow-up calculations using ad hoc approximations; and 3) in some cases, as discussed in §§ 1ci and 1cii, the fate of the system is resolved during the dynamical phase.
We plan to study the early dynamical phases of selected common envelope evolutions and investigate the conditions for the formation of a common envelope and, if resources permit, calculate the efficiency with which orbital energy is used to eject the envelope. 
While it would be interesting to compare the results obtained for common envelope evolutions using Smoothed Particle Hydrodynamics (SPH; the technique used by Rasio & Livio) and Finite-Difference Hydrodynamics (FDH; our selected technique), our intent is to explore some basic questions in a parameter regime quite different from what most CEE calculations assume. Therefore we are not, at least initially, aiming to reproduce the calculations of Rasio and Livio using a FDH code. In future developments (beyond the immediate scope of this proposed project) we would expect to incorporate more physics in such calculations (e.g., turbulent viscosity, radiation transport and convection). While this is relatively straightforward in FDH, it is not so in SPH.
Ultimately, the incorporation of additional physics (in particular, radiative transport or an ad hoc method of cooling the envelope material) is expected to be crucial to accurately model the CEE. Nevertheless, our proposed adiabatic simulation of the CEE represents, to our knowledge, the first attempt to simulate the dynamical evolution of two well-resolved and self-consistent binary components into a common envelope phase. As such, it will serve as an important basis for future research in this area.
The coalescence of double white dwarf and double neutron star binaries is important to examine since this process may produce a number of astrophysically interesting objects and events. Double white dwarf binary mergers have been suggested as precursors to some Type Ia Supernovae (Iben & Tutukov 1984; Iben 1988, 1991; Iben & Webbink 1989; Branch et al. 1995) and to long gamma-ray bursts (Katz & Canel 1996). White dwarf-white dwarf mergers may also lead to the formation of massive single white dwarfs or neutron stars (Colgate & Petschek 1982; Saio & Nomoto 1985; Iben & Tutukov 1986; Kawai, Saio, & Nomoto 1987; Chen & Leonard 1993); to the formation of subdwarf stars; or to the formation of hydrogen-deficient, highly luminous stars (Iben 1990 and references therein; Webbink 1984). Neutron star-neutron star mergers may result in bursts of gamma-rays and neutrinos, in the production of r-process elements, and in the formation of black holes (Eichler et al. 1989; Meyer 1989; Narayan, Paczynski, & Piran 1992; Rasio & Shapiro 1992; Davies, Benz, Piran, & Thielemann 1994; Katz & Canel 1996; Lipunov et al. 1995; Ruffert, Janka, & Schäfer 1996; but see the simulations of Shibata, Nakamura, and Oohara [1993] and Janka & Ruffert [1996] which cast doubt on the neutron star-neutron star merger scenario as a precursor to gamma-ray bursts).
[Movie5: Binary Merger (mpeg, 701K)]
The initial separation at which a particular binary model becomes unstable to merger, if one exists, can be estimated via a stability analysis of a set of binary models constructed in hydrostatic equilibrium, along a constant mass sequence of decreasing orbital separation. This sequence serves as an approximate representation of the evolution of the binary as its components are brought closer together by the effects of gravitational radiation. Analyses along these lines have recently been done by Lai, Rasio, & Shapiro (hereafter LRS), by Rasio & Shapiro (hereafter RS), and by Uryu & Eriguchi for binaries with polytropic EOSs. (For earlier related work see Gingold & Monaghan 1979; Hachisu & Eriguchi 1984a, b; Hachisu 1986.)
New & Tohline (1997) have recently performed stability analyses of equilibrium sequences of double white dwarf binaries constructed with the zero-temperature white dwarf equation of state, double neutron star binaries constructed with realistic neutron star equations of state and, for the sake of comparison with the work of LRS and RS, polytropic binaries with n = 0.5, 1.0, and 1.5. For simplicity, all binary models along these sequences were constructed as synchronously rotating systems having equal-mass (q = 1.0) components. Movie1 (shown earlier) shows the behavior of one system that was found to be dynamically stable against merger. Systems that were found to be dynamically unstable via the tidal instability, such as the one illustrated here in "Movie5", were followed through a nonlinear phase of evolution to merger. The properties of these merged systems have been detailed by New (1996). In many respects the stability analyses and dynamical simulations being proposed here can be viewed as an extension of the New (1996) and New & Tohline (1997) work. This published work on q = 1 systems, as well as more recent test runs on q ≠ 1 systems, demonstrates that we have the tools in hand to carry out the proposed simulations.
2. Computational Methodology/Algorithms
A primary focus of this research project is to carefully map out boundaries of stability for close binary systems. In order to accomplish this goal in a quantifiably meaningful fashion, it is important that we have the capability to construct accurate equilibrium structures as initial input into each dynamical simulation, so that the simulations are not dominated by dynamical transients that depend heavily on initial conditions. We have used Hachisu's Self-Consistent Field technique (hereafter HSCF) to construct three-dimensional equilibrium binary models. This iterative method was initially used by Hachisu (1986) to generate models of synchronously rotating white dwarf binaries with unequal masses, and to compare the total energies of white dwarf binaries with those of systems containing a white dwarf surrounded by a massive disk of white dwarf material.
To date, we have used the HSCF technique to generate synchronous, polytropic binaries with mass ratios 0.1 ≤ q ≤ 1.0 in which both stars have uniform (but different) entropy distributions (homentropic). To specify a model uniquely, we must provide the maximum density of each component and set three boundary points along the line of centers where the density distribution vanishes. These points correspond to the inner boundary points of each star and the outer boundary point for one of the stars. The cylindrical computational lattice contains 128 × 128 × 256 grid zones in the R, Z, and θ directions, respectively. Due to the symmetry of the tidal and centrifugal potentials about the equatorial plane, we need only calculate models in the hemisphere above the equatorial plane.
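The essence of the self-consistent-field idea can be illustrated with a toy, spherically symmetric analogue (our own sketch; the production HSCF code works on the full 3D binary problem with tidal and centrifugal terms, and the function name and grid choices below are ours). The iteration alternates between computing the potential from the current density and recomputing the density from the enthalpy, with the surface pinned at r = 1:

```python
import numpy as np

def scf_spherical_polytrope(n=1.0, nr=400, iters=100, G=1.0, rho_max=1.0):
    """Toy SCF iteration for a non-rotating spherical polytrope of index n.
    Repeat: (1) potential from the current density via the spherical Poisson
    integral; (2) enthalpy H = C - Phi, with the constant C chosen so that
    H = 0 at the pinned surface r = 1; (3) density from H through the
    polytropic relation, normalized so rho(0) = rho_max.  Returns (r, rho)."""
    r = np.linspace(0.0, 1.0, nr)
    dr = r[1] - r[0]
    rho = rho_max * (1.0 - r**2)                 # any reasonable initial guess

    for _ in range(iters):
        M_in = np.cumsum(4.0*np.pi * rho * r**2 * dr)        # mass inside r
        outer = np.cumsum((4.0*np.pi*G * rho * r * dr)[::-1])[::-1]
        phi = -G * M_in / np.maximum(r, dr) - outer          # Phi(0) = -outer[0]
        H = np.clip(phi[-1] - phi, 0.0, None)                # C = Phi(surface)
        rho = rho_max * (H / H[0])**n                        # polytropic rho(H)

    return r, rho
```

For n = 1 the converged profile can be checked against the analytic Lane-Emden solution, rho/rho_max = sin(pi r)/(pi r), which provides a convenient convergence test for this kind of iteration.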
In addition to polytropic binaries, we are able to construct self-consistent, Newtonian models of compact binaries by choosing an appropriate equation of state, such as the zero-temperature WD equation of state (Hachisu 1986) or any of several realistic NS equations of state as described by New & Tohline (1997). Binary models using homentropic equations of state are, however, inadequate for examining the stability of mass transfer in main sequence stars without deep convective envelopes (stars with masses greater than but on the order of one solar mass). For a homentropic polytrope of index n, z_{s} = (1 - n)/(3 - n), so that z_{s} = -1/3 when n = 3/2, for example. On the other hand, the proper main-sequence mass-radius relationship demands z_{s} ≈ 1. By allowing the specific entropy to be a function of the density, we are able to construct equilibrium configurations that are both convectively stable and that reproduce the desired mass-radius relationship.
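A quick numerical check of this point (illustrative only) shows why no homentropic index will do:

```python
def zeta_s_homentropic(n):
    """Adiabatic mass-radius exponent z_s = (1 - n)/(3 - n) for a
    homentropic polytrope of index n (from R ~ M^((1-n)/(3-n)))."""
    return (1.0 - n) / (3.0 - n)

# No homentropic index in 0 < n < 3 comes close to the main-sequence
# value z_s ~ +1, which is why density-dependent entropy is needed:
for n in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(n, zeta_s_homentropic(n))   # values run from 0.2 down to -3
```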
Our primary dynamical simulation tool is an mpi-based (message passing interface) algorithm designed to perform multidimensional, gravitational CFD (computational fluid dynamics) simulations efficiently on parallel computing platforms. The evolution of a system is governed through a finite-difference representation of the multidimensional equations that describe the dynamics of inviscid, self-gravitating, compressible gas flows. More specifically, the employed finite-difference algorithm is based on the second-order accurate, van Leer monotonic interpolation scheme described by Stone and Norman (1992), but extended to three dimensions on a uniform, cylindrical coordinate mesh. Most simulations are performed on a computational lattice containing 128×96×256 grid zones in the R, Z, and θ directions, respectively, with no assumed geometric symmetries. The fluid equations are advanced in time via an explicit integration scheme (often referred to as "direct numerical simulation"), so an upper limit to the size of each integration time step is set by the familiar Courant condition. At each time step, the Poisson equation is solved in order to determine, in a self-consistent fashion, how fluid elements are to be accelerated in response to the fluid's own self-gravity. Complete descriptions of our algorithm to solve the coupled CFD equations can be found in New (1996) and in Cazes (1999). The Poisson equation is solved implicitly using a combined Fourier transformation and iterative ADI (alternating direction implicit) scheme (Cohl, Sun & Tohline 1997), in conjunction with boundary values determined via a compact cylindrical Green's function technique (Cohl & Tohline 1999).
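For concreteness, the Courant limit on such a uniform cylindrical mesh can be sketched as follows. This is our own illustrative code, not an excerpt from the production algorithm; the polytropic sound speed and the CFL safety factor of 0.5 are assumptions made for the example.

```python
import numpy as np

def courant_dt(rho, vR, vz, vphi, dR, dz, dphi, R, gamma=5.0/3.0, K=1.0, cfl=0.5):
    """Largest explicit time step allowed on a uniform cylindrical (R, z, phi)
    grid: dt <= cfl * min over all zones and directions of
    (zone width)/(|v| + c_s), with polytropic sound speed
    c_s = sqrt(gamma * K * rho**(gamma - 1))."""
    cs = np.sqrt(gamma * K * rho**(gamma - 1.0))
    dt_R   = dR / (np.abs(vR) + cs)
    dt_z   = dz / (np.abs(vz) + cs)
    dt_phi = (R * dphi) / (np.abs(vphi) + cs)   # azimuthal zone width is R*dphi
    return cfl * min(dt_R.min(), dt_z.min(), dt_phi.min())

# Example on a small 8 x 8 x 16 lattice of uniform, static gas; the limit
# comes from the narrow azimuthal zones at the smallest radius:
NR, Nz, Nphi = 8, 8, 16
rho = np.ones((NR, Nz, Nphi))
v0  = np.zeros_like(rho)
R   = np.linspace(0.1, 1.0, NR).reshape(NR, 1, 1)   # cell-center radii
dt  = courant_dt(rho, v0, v0, v0, dR=0.1, dz=0.1, dphi=2*np.pi/Nphi, R=R)
```

The direction-by-direction minimum makes explicit why high azimuthal resolution near the rotation axis is the expensive part of a Courant-limited cylindrical-grid simulation.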
When performing multidimensional gravitational CFD simulations, it is always important to develop an efficient technique by which the numerical results can be readily analyzed for physical content. Because three-dimensional CFD simulations in particular generate enormous amounts of data, and the computational efficiency of modern high-performance computers is quickly degraded by any significant I/O demands, generally it is preferable to analyze the data "on the fly" rather than trying to save the data to disk for future post-processing. As described more fully below, we have constructed a heterogeneous computing environment through which we are able to routinely create volume-rendered images and 3D animation sequences of our evolving fluid configurations. (The 3D "Movie" sequences referred to in §1 of this proposal have been produced in this fashion.) Relatively small amounts of data are regularly passed across the network from the parallel machine that is performing the gravitational CFD simulation to a machine that is better suited to visualization tasks; images that will become part of an extended animation sequence are created from these data; then the 3D data sets are immediately destroyed in order to minimize the impact on disk storage. At the SDSC, we have utilized the scriptable attributes of Alias|Wavefront (TM) software on middle- and high-end SGI systems in order to perform these critical visualization tasks. This visualization tool, and the heterogeneous computing environment in which it functions (see § 3b for a more detailed description), will continue to provide a critical component of our ongoing gravitational CFD simulations at the SDSC.
We propose to perform the following well-defined simulation tasks in order to begin to examine to what extent the ideas outlined in §1, above, contribute to a more complete understanding of the tidal and mass-transfer instabilities in close binary systems.
We have laid the groundwork for these investigations by (a) constructing several equilibrium binary sequences with q close to but not equal to unity; (b) following through more than four orbits the dynamical evolution of one system that was expected to be stable because neither star was completely filling its Roche lobe (Movie2); and (c) following an early phase of the evolution of one n = 3/2, q = 0.88 model that was unstable toward mass transfer (Movies 3 & 4). In addressing Tasks I and II we propose initially to follow the n = 3/2, q = 0.88 model evolution through a large number of orbits in an effort to ascertain whether the initial mass-transfer event remains conservative and develops into a steady-state flow. If it does, we will accurately measure, among other things, its rate of mass transfer and thereby determine whether or not it is feasible to follow the accretion event to completion with a Courant-limited dynamical simulation. We also propose to carry out a number of shorter evolutionary simulations starting from initial models with different mass ratios q and different polytropic indices n. By varying n, we will ascertain whether the instability is sensitive to the EOS. By varying q we may be able to address the principal question raised in Task II even if we are unable to follow a single mass-transfer event to completion.
According to linear theory, a system with q > 1 is expected to be much more violently unstable toward mass transfer than systems in which the less massive star is the donor, so there is a greater likelihood that the nonlinear evolution will include a common-envelope phase (§ 1ciii). If this is the case, we expect to be able to determine the ultimate fate of such a system by following its evolution through only five or six orbits. However, we almost certainly will have to extend our radial and vertical grid a factor of 2-4 beyond its initial size in order to ensure that most of the envelope material remains on the computational lattice.
3. Preliminary Results and Progress to Date
Taking advantage of our acquisition of an 8K-node MasPar at LSU, in December 1991 we rewrote our gravitational CFD code in mpf (MasPar Fortran), which is a language patterned after Fortran 90 but includes certain extensions along the lines of high-performance Fortran (HPF). Utilizing the MP-1, we also gained valuable experience in the development of parallel algorithms to efficiently solve the Poisson equation in curvilinear coordinate systems (Cohl, Sun & Tohline 1997). It was at LSU as well that we initially developed a heterogeneous computing environment through which our gravitational CFD data can be visualized "on the fly" (Cazes, Tohline, Cohl, & Motl 1999; http://www.phys.lsu.edu/astro/cazes/dodconf/hce.html).
Over the past five years, our parallel CFD algorithm has been ported to the TMC CM5 at NCSA and to the IBM SP2 at the Cornell Theory Center. We also have been able to implement an f77 serial version on the Cray Y-MP at NCSA and an F90 version on the Cray C90 at SDSC. Table 1 in Attachment C2 documents the performance of our CFD code on these five different machine architectures. For a number of different reasons, we have found the Cray T3E hardware most suitable for our simulation tasks. In what follows we detail our experiences with code implementation and development on the T3E over the past few years.
i) Porting the Gravitational CFD Code to the T3E
Early in 1997, the PI applied for and was granted 19,200 SUs on the Cray T3E at the SDSC. After review, on April 1 of 1998 an additional 20,000 SUs was added to our allocation and the project was extended through March 31, 1999. Appreciating the broad scope of our investigation at that time, the reviewers also encouraged us to seek a larger allocation through NRAC. So, one year ago we requested and were awarded 80,000 SUs on NPACI T3Es (40,000 on the T3E_600 at SDSC and 40,000 on the T3E_600 at UT, Austin), an allocation that runs through October 31, 1999. Our current request for PACI resources will build directly upon this successful, ongoing NPACI project.
During the first couple of months of this project in 1997, we focused our efforts on porting our gravitational CFD code to the T3E and documenting its performance. Using the available Cray HPF compiler (PGHPF, developed by the Portland Group), we found that relatively few unexpected code modifications were required in order to successfully compile and execute our code on the SDSC system. (This was because we already had had experience successfully modifying the code to execute on the CM5 using cmf and on the Cray C90 using the Cray Fortran90 compiler.) In June, 1997, we submitted a technical report entitled, "Early Experiences with the Cray T3E at the SDSC," to Jay Boisseau at the SDSC in which we fully documented how well our code was performing on the T3E. This technical document is included as Attachment C2 to this proposal. In summary, we found that:
On the down side, we also discovered that the PGHPF compiler was producing executable code that performed computational instructions on each T3E processor with extraordinarily low efficiency. That is, as a parallel application, the code was scaling well, but its overall efficiency measured relative to the published peak-performance capabilities of the machine was poor. By way of illustration, we show in Table 1, below, the measured MFlop rating of our CFD code on a single processor of the T3E_600 at the SDSC. For two different lattice sizes (one with and one without power-of-two arrays), the PGHPF-compiled code runs at about 13 MFlops, which is only 2% of the published peak rating of 600 MFlops. Clearly this left plenty of room for improvement, but it was equally clear from discussions with SDSC operators and other T3E users that our performance measures were not unusual. Throughout its first year of general availability, users had been reporting "typical" T3E single-processor performance ratings of anywhere from 4 to 100 MFlops (John Levesque [APRI], private communication; see also "The Benchmarker's Guide to Single-processor Optimization for CRAY T3E Systems" published by the Benchmarking Group at Cray Research). It was at this level of performance that we were granted last year's NPACI allocation.
Table 1. SDSC: T3E_600

                                                       Execution Speed (MFlops)
Test  Compiler + Options                 Streams   Grid 64×32×32   Grid 67×32×32
A     PGHPF -O3                            OFF         12.79           13.06
B     f90 (mpi) -O3,aggress -lmfastv       OFF         15.61           24.83
C     f90 (mpi) -O3,aggress -lmfastv       ON          15.61           36.09
D     f90 (mpi) -O3,aggress -s default32   ON          21.73           43.06
The numbers in this table have been obtained using "pat," a performance monitoring tool that runs on the T3E. The report from which these numbers have been drawn accompanies this proposal as Attachment C1. 
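As a quick cross-check of the "only 2%" figure quoted above, the fraction of peak implied by each Table 1 entry is simple arithmetic (600 MFlops is Cray's published single-processor peak rating for the T3E_600):

```python
# Fraction of the T3E_600's published 600 MFlops peak achieved in each test.
# MFlop ratings are taken from Table 1 (67x32x32 grid column).
PEAK_MFLOPS = 600.0

measured = {
    "A (PGHPF)":            13.06,
    "B (f90/mpi)":          24.83,
    "C (f90/mpi, streams)": 36.09,
    "D (f90/mpi, 32-bit)":  43.06,
}

for test, mflops in measured.items():
    print(f"{test}: {100.0 * mflops / PEAK_MFLOPS:.1f}% of peak")
```

Even the fastest configuration (Test D) reaches only about 7% of the published peak, which is why cache-level tuning remains a priority.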
ii) Developing an mpi-based Gravitational CFD Algorithm
Over the past nine months we have successfully rewritten our entire gravitational CFD algorithm, incorporating explicit message-passing instructions via mpi. Exhaustive tests have convinced us that this new version of our code produces physical results identical to those generated with our well-tested HPF algorithm. In fact, the simulation illustrated above as Movie2 was performed entirely with the new mpi-based algorithm. As the numbers shown in Attachment C3 illustrate, the mpi-based code also runs roughly four times faster than its HPF predecessor.
As Table 1 documents, most of this improvement can be understood by looking at single-processor execution speeds. Using mpi (which, in turn, permits us to use f90 directly without passing through PGHPF), we gain a factor of approximately 2 by simply shifting to array sizes that do not have power-of-two dimensions, and another factor of approximately 1.5 by turning streams on. The final improvement, which gives us a factor of 4 speedup overall, comes from the parallel implementation. And, as the "pat" report indicates, this final speedup comes not so much from an overall improvement in communications efficiency as from the fact that the mpi-based code requires significantly fewer floating-point operations! (This last point came as a bit of a surprise to us.)
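The cumulative effect of the first two factors can be checked against Table 1 directly; applying the quoted (rounded) factors of 2 and 1.5 to the PGHPF baseline lands close to the measured Test C rating:

```python
# Apply the single-processor speedup factors quoted above to the
# ~13 MFlops PGHPF baseline (Test A of Table 1).
baseline = 13.06                      # PGHPF, 67x32x32 grid
after_arrays = baseline * 2.0         # non-power-of-two array dimensions
after_streams = after_arrays * 1.5    # hardware streams turned ON
print(f"predicted: {after_streams:.1f} MFlops (measured, Test C: 36.09)")
```

The prediction overshoots the measured 36.09 MFlops slightly only because the quoted factors are rounded.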
In last year's proposal we promised to seek ways in which the code's execution speed could be improved by a factor of two. By moving to an mpi-based algorithm, we have more than met this goal. We see additional opportunities for improvement because we have not yet focused on unrolling loops, etc., in order to achieve better utilization of the available cache on each processor. The mpi-based (f90) code provides us with an opportunity to pursue this type of tuning over the coming year. As Table 1 shows, we also will be able to run simulations more quickly in cases where double-precision floating-point calculations are not required.
(iii) A Significantly Improved Poisson Solver
It has been clear to us for quite some time that one of the most computationally demanding components of our current gravitational CFD code is the algorithm used to calculate boundary values of the gravitational potential, which must be fed into the Poisson solver in order to solve for the potential throughout the volume of the fluid. Historically, we have used an algorithm based on a multipole expansion in terms of spherical harmonics (Tohline 1980), but spherical harmonics are not a natural basis set when simulations are being conducted on a cylindrical coordinate mesh. During the past year we have discovered that the traditional Green's function expansion in cylindrical coordinates, which involves an infinite integral over products of Bessel functions, can be written in a much more compact form (Cohl & Tohline 1999) that is extremely well-suited for implementation on cylindrical coordinate meshes. We recently have implemented this "compact cylindrical Green's function" technique in parallel on the T3E. It will be used in all of our future gravitational CFD simulations because it is both faster and more accurate than the technique based on a multipole expansion in terms of spherical harmonics. Most importantly, though, its implementation will permit us to bring the top (and bottom) cylindrical grid boundary down closer to the rotationally flattened mass distribution and thereby reduce the computational lattice size in the vertical direction.
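For reference, and writing from our recollection of the Cohl & Tohline (1999) result rather than quoting it verbatim, the compact form replaces the infinite Bessel-function integral with a single azimuthal sum over Legendre functions of the second kind of half-integer degree, Q_{m-1/2}:

```latex
% Reciprocal distance between points (R,\phi,z) and (R',\phi',z')
% in cylindrical coordinates, in the compact cylindrical form:
\frac{1}{|\mathbf{x}-\mathbf{x}'|}
  = \frac{1}{\pi\sqrt{R R'}}
    \sum_{m=-\infty}^{\infty} Q_{m-1/2}(\chi)\, e^{\,im(\phi-\phi')},
\qquad
\chi \equiv \frac{R^{2} + R'^{2} + (z-z')^{2}}{2 R R'} .
```

Because each Q_{m-1/2}(χ) can be evaluated directly, the boundary potential on a cylindrical mesh requires no radial integration over Bessel-function products, which is what makes the technique both faster and more accurate in this geometry.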
(iv) Establishing a Heterogeneous Computing Environment
Working with several staff members in the SDSC Vislab during the summer of 1997, we successfully established a heterogeneous computing environment (HCE) at the SDSC through which the results of our gravitational CFD simulations can be routinely visualized (Cazes, Tohline, Cohl, & Motl 1999). Extending the ideas that we first developed at LSU, we have established a procedure by which, at predetermined time intervals during a simulation, the CFD code running on the T3E writes out a 3D density file and sends a signal to the Vislab's SGI Power Onyx/Infinite Reality Engine indicating that a new data file is available for processing. Having received the signal from the T3E, the graphics engine executes a previously written shell script which (i) searches through the data to locate four separate isodensity surfaces and generates a wireframe structure (i.e., vertices and polygons) corresponding to each of these 3D surfaces; (ii) executes an AliasWavefront script which renders the set of wireframe structures as a multiply nested, reflective, and colored transparent set of isosurfaces; (iii) stores on disk this 2D, rendered image of our evolving model at one particular time step in the simulation; and then (iv) deletes the original 3D data set and wireframe models to conserve disk space. This entire task is continually reinitiated as data is generated during an extended CFD simulation on the T3E. The net result of this process is a stored sequence of 2D images illustrating the evolution of our physical system. The series of 2D images is transferred via the internet to LSU, where it can be displayed as a movie, allowing us to rapidly interpret the dynamics of the systems that we are examining. All of the "movie" sequences referenced in §1 of this proposal have been generated in this manner.
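The four-step loop described above can be sketched schematically as follows; the function names and density levels below are hypothetical placeholders, not the actual site-specific isosurfacer or AliasWavefront tool names:

```python
# Schematic of the per-frame HCE pipeline on the SGI side.  All names and
# density levels here are illustrative stand-ins for site-specific tools.
def extract_isosurfaces(density_file, levels):
    # (i) locate the isodensity surfaces; return one wireframe per level
    return [f"{density_file}:wire@{rho:g}" for rho in levels]

def render(wireframes):
    # (ii) render the nested, transparent isosurfaces into one 2D image
    return f"frame rendered from {len(wireframes)} surfaces"

def process_frame(density_file, levels=(1e-2, 1e-3, 1e-4, 1e-5)):
    wires = extract_isosurfaces(density_file, levels)
    image = render(wires)
    # (iii) store `image` to disk; (iv) delete the 3D data and wireframes
    return image

print(process_frame("rho_0001.dat"))
# -> frame rendered from 4 surfaces
```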
We consider the successful establishment of this environment, which smoothly couples the high-end SGI visualization tools at the SDSC with our simulation efforts on the T3E, to have been a significant accomplishment.
In the proposal that we submitted requesting NPACI resources last year, we proposed to use our gravitational CFD and heterogeneous computing environment tools to study three specific problems. The third one on our list at the time was to begin to "examine the stability against merger of unequal-mass binaries." As our extensive discussion in §1 of this proposal has explained, and as our "Movie2," "Movie3," and "Movie4" animation sequences illustrate, we have made significant progress along these lines. Specifically, we have demonstrated that we are able to construct equilibrium configurations having a wide range of different mass ratios and polytropic indices as input to our dynamical investigations; we have demonstrated that our gravitational CFD code is capable of accurately modeling, in a fully self-consistent fashion, the orbital motion of close binaries that are stable against mass transfer; and we have demonstrated that at least the early phases of a conservative mass-transfer event can be modeled successfully. We are now prepared to focus our simulation efforts almost entirely on this problem.
Summarized briefly, the other two problems that we proposed to tackle last year were designed to help answer the question, "Why and how do stars preferentially form in pairs?" Consistent with the objectives outlined in connection with the second of these two stated problems, over the past year we have been able to construct, from two quite different initial equilibrium objects, steady-state models of triaxial (bar-like) configurations having compressible equations of state and nontrivial, supersonic internal motions. We also have demonstrated convincingly that these configurations are dynamically stable. By all accounts these models are compressible analogs of the classical, incompressible Riemann S-type ellipsoids.
With these dynamically stable, triaxial configurations in hand, we were then able to tackle the first, and most important, objective. Using our dynamical simulation tools, we were able to slowly "cool" one of these configurations and demonstrate, for the first time, that such a system is susceptible to "fission" into a close binary system. We consider this result to be a particularly significant one because the idea that a naturally occurring fission instability might lead to the formation of binary stars from rapidly rotating protostellar gas clouds has been around for over 100 years, but it has not been until this decade that we have had the computing capacity and skills to test the hypothesis.
Over the past year we have generated several different animation sequences illustrating the internal structure and dynamical stability properties of our compressible analogs of the Riemann S-type ellipsoids, and other movies illustrating the slow cooling evolution that ultimately leads to fission. These movies can be accessed via the following URL: www.phys.lsu.edu/astro/barmode. It is also our understanding that these most recent simulations are being featured in an upcoming issue of SDSC's "enVision" magazine. Publications resulting directly from this work, to date, on NPACI facilities are:
4. Justification of Resources Requested
i) Required Platforms
We are requesting an allocation of 65,000 SUs on the Cray T3E at the SDSC as well as continued access to the SGI workstations in the SDSC Vislab. From our past work at SDSC and elsewhere we feel these platforms are most suitable for our simulation work for the following reasons. First, through our past experience we have found that our gravitational CFD algorithm scales very well on the T3E architecture (for details, see Attachments C2 and C3). Second, it is on the T3E platform that our new mpi implementation has been developed and thoroughly tested, showing a performance improvement of roughly a factor of 4 over our HPF code. In addition, our familiarity with the T3E architecture suggests that further improvement in our code's performance can be realized by closely scrutinizing and optimizing the use of the cache system in critical subroutines within the overall program. Finally, we have developed a heterogeneous computing environment that smoothly integrates the unique capabilities of the T3E with those of the SGI platforms in the SDSC Vislab.
ii) Code Requirements and Optimization Issues
Our proposed simulations at a resolution of 128 × 96 × 256 will require at least 32 nodes on the SDSC T3E; that is, the problem requires at least 4 GBytes of RAM. For the higher resolution of 256 × 192 × 256 required to follow the development of the common-envelope phase (Tasks III and IV), we will need to run on at least 64 processors to accommodate the increased problem size. Given that there is a finite amount of time in which to perform the simulations we have outlined previously, we anticipate running on 64 (128) nodes for the lower- (enhanced-) resolution simulations, respectively. Since we have discussed code-optimization issues fairly thoroughly in earlier sections of this proposal, we will not comment further on them here.
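A back-of-envelope check of these figures (assuming, as the numbers imply, that the 4 GBytes must fit in the aggregate RAM of the 32 allocated nodes):

```python
# Memory arithmetic behind the minimum node count quoted above.
zones = 128 * 96 * 256                 # = 3,145,728 zones ("3 Megazones")
total_bytes = 4 * 1024**3              # the 4 GBytes of RAM quoted above
print(f"{zones/1e6:.1f} Mzones -> {total_bytes/zones:.0f} bytes/zone,")
print(f"{total_bytes / 32 / 2**20:.0f} MBytes per node across 32 nodes")
```

The implied 128 MBytes per node saturates the local memory of a T3E processing element, which is why 32 nodes is a floor rather than a choice.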
Table 2. Proposed Production Runs

Proposed Simulations                                     N_models  N_orbits      SU
A. Tasks I and II [Grid resolution 128×96×256 ==> 500 SU/orbit]
   Extended evolution of n = 3/2, q = 0.88 model               1        20  10,000
   Examine stability of models with different n and q         10         5  25,000
B. Tasks III and IV [Grid resolution 256×192×256 ==> 2000 SU/orbit]
   Extended evolution of models with q > 1                     3         5  30,000
Total requirement                                                           65,000
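The line items in the table above follow directly from the per-orbit costs:

```python
# Verify the Table 2 line items: SU = N_models * N_orbits * SU_per_orbit.
runs = [
    ("A1: n=3/2, q=0.88 extended",  1, 20,  500),
    ("A2: vary n and q",           10,  5,  500),
    ("B:  q > 1 extended",          3,  5, 2000),
]
total = 0
for name, n_models, n_orbits, su_per_orbit in runs:
    su = n_models * n_orbits * su_per_orbit
    total += su
    print(f"{name}: {su:,} SU")
print(f"total: {total:,} SU")
# -> total: 65,000 SU
```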
iii) Proposed Production Runs
In §2d of this proposal we described in detail the simulation tasks that we would like to complete over the coming year utilizing PACI resources. Following up on that discussion, our specific request for T3E resources at the SDSC is detailed in Table 2. The Task numbers identified in connection with each proposed simulation in this table follow the outline used in our earlier discussion of the respective problems in §2d. The column labeled N_{models} specifies the number of different initial model configurations that will be examined, and the column labeled N_{orbits} lists the number of orbits through which we estimate we will need to follow each binary system in order to complete the specified task. Along with each Task we have specified the (R×Z×q) grid resolution that will be used and the approximate number of SUs that will be required to simulate one complete orbit at that grid resolution. (This number has been determined as follows: The single-processor execution rating of 36 MFlops, as reported in Table 1 of §3b, corresponds to a single-processor execution speed of approximately 70 microseconds per zone per time step, as reported in Table 4 of Attachment C3. Also, from prior experience, we know that approximately 8000 Courant-limited time steps are required to complete one binary orbit during a mass-transferring event. Hence, on a grid containing 3 Megazones, each integration time step will require approximately 0.061 SUs, and one binary orbit will require approximately 500 SUs.) Finally, the number of SUs that we estimate will be required to complete each proposed simulation task (last column of Table 2) is obtained by multiplying the required number of SUs/orbit by N_{orbits} and by N_{models}.
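The parenthetical estimate above can be reproduced in a few lines (assuming, as the arithmetic implies, that 1 SU corresponds to one processor-hour):

```python
# Reproduce the ~500 SU/orbit estimate for the 128x96x256 grid.
zones = 128 * 96 * 256              # ~3 Megazones
sec_per_zone_step = 70e-6           # single-processor cost (Attachment C3)
steps_per_orbit = 8000              # Courant-limited steps per binary orbit

su_per_step = zones * sec_per_zone_step / 3600.0   # 1 SU = 1 processor-hour (assumed)
su_per_orbit = su_per_step * steps_per_orbit
print(f"{su_per_step:.3f} SU/step, ~{su_per_orbit:.0f} SU/orbit")
# -> 0.061 SU/step, ~489 SU/orbit (quoted above as ~500)
```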
Performance Monitor Results
As discussed earlier, utilizing "pat," a performance analysis tool that accesses the hardware performance monitors on the T3E, we have been able to document the single-processor performance of our gravitational CFD code on the T3E. Numbers drawn directly from "pat" are shown in Table 1 of §3, above; they also are listed in Attachment C1 to this proposal.
Special Requirements
As explained in detail above, we rely heavily on the SDSC Vislab facilities to routinely analyze our simulation results. In order for this project to be a success, therefore, it is essential that we continue to have access to those facilities. It is extremely important that the AliasWavefront software continue to be made available on the SGI Onyx as well.
5. Local Computing Environment
In 1991, through an equipment grant from the Louisiana State Board of Regents, the LSU Department of Physics and Astronomy acquired an 8,192-node, SIMD-architecture MasPar MP1 to support the development of parallel algorithms for a variety of computational physics and chemistry projects. The architectural topology of the MP1 is that of a 2D torus characterized by a 128 × 64, two-dimensional grid of MasPar-produced processing elements. The MP1 also incorporates a two-tier processor communication strategy, which includes rapid nearest-neighbor communication via XNet and efficient global communication via a tree-nested global router. Each processing element contains 64 Kbytes of local dynamic RAM; that is, the total memory on LSU's MP1 system is 0.5 GBytes. As described in §4, above, we have taken full advantage of this parallel hardware platform to train students in parallel algorithm development and to perform a variety of astrophysical simulations at moderate spatial resolutions.
Our department also supports a large cluster of Sun Microsystems workstations. Sparcstations integrated into this cluster, plus a NeXT Dimension workstation and an SGI Octane, provide the PI and his students with graphical unix interfaces into LSU's campus-wide ethernet system and the internet. Working with several institutions in SURA (the Southeastern Universities Research Association) and with NSF funding, LSU has become a vBNS-certified institution, which brings LSU closer to Internet2 status. Within the coming month, LSU expects to be connected directly into the vBNS network, which should significantly improve our ability to perform the simulation tasks outlined in this proposal at the SDSC.
6. Other Supercomputer Support
Having been designated a Major Shared Resource Center for the Department of Defense, during 1997 the Naval Oceanographic Office (NAVOCEANO) at the Stennis Space Center in Mississippi began to purchase and install a variety of high-performance computing platforms and visualization architectures. In particular, they have acquired a Cray T3E and an SGI Onyx, with supporting compilers and software tools, to create a supercomputing environment that resembles the one to which we have grown accustomed at the SDSC. In part because of LSU's close proximity to Stennis (the Stennis Space Center is only two hours from LSU by car), in August, 1997, the PI and his students were invited to utilize the MSRC resources at a modest level of activity through the center's Programming Environment and Training (PET) program. Close interactions with the MSRC technical staff are proving useful in connection with our efforts to gain better single-processor execution efficiencies on the T3E and to develop visualization tools that will permit us to study in detail the velocity flow-fields that develop during our dynamical simulations. (For purposes of comparison with the SDSC T3E_600, some of the performance measurements from "pat" that we have reported in Attachment C1 to this proposal were made on the NAVO T3E_900.)
Because the PI does not have a DoD-funded research project, he has not been granted a priority allocation of time on the high-performance computing hardware at NAVOCEANO's MSRC. In connection with the PI's NASA-funded research activities, therefore, it is critically important that a guaranteed allocation of supercomputing time be secured through PACI resources.
7. Project Team Qualifications
A major component of the PI's research work over the past twenty years has involved the large-scale numerical simulation of multidimensional gas flows in astrophysical systems utilizing a variety of different high-performance computing platforms. In the early 1980s, the PI held a postdoctoral research position in group T-6 at Los Alamos National Laboratory, at which time he broadened his exposure to state-of-the-art CFD techniques and learned to develop efficient algorithms for vector supercomputers. From the mid-1980s through the early 1990s, Tohline was PI on a large account at the Cornell Theory Center, during which time he and his students experimented with the parallel implementation of CFD algorithms on the IBM 3090 and various accompanying FPS Array Processors. When the 8K-node MasPar MP1 system arrived at LSU (see §5, above), the PI abandoned the CTC facilities because his group was having considerable success utilizing the MasPar system for its largest simulation efforts. As has been detailed elsewhere in this proposal, in recent years the PI and his students have gained experience developing gravitational CFD algorithms for execution on a variety of parallel platforms including MasPar's MP2, the CM5, the SP2, and the Cray T3E. The group also has developed and implemented a heterogeneous computing environment (at LSU, at the SDSC, and at the NAVO MSRC) that tightly couples their parallel simulation efforts to sophisticated visualization tools.
Co-PI J. Frank has broad experience in studies of the astrophysics of interacting binary systems, particularly when such interactions involve fluid flow and accretion events. He is, for example, lead author of the widely referenced Cambridge University Press monograph "Accretion Power in Astrophysics." His expertise within this project helps ensure that the results of the proposed complex, multidimensional, and nonlinear simulations are physically correct and quantitatively meaningful when placed in the context of the extensive literature on this subject. Frank also is lead investigator on the NASA-funded Astrophysics Theory Program project in support of which this request for NPACI resources is being made. (Tohline is Co-PI on this NASA-funded project.)
Attachment B entitled, "Curriculum Vitae and Selected Publications," documents the formal educational background and professional experience of both Tohline and Frank, and provides a list of significant publications that are particularly relevant to the project being proposed here.
Patrick M. Motl is a 6th-year graduate student enrolled in the Ph.D. physics program at LSU. He is expected to be heavily involved in the simulation work outlined in this proposal. Indeed, the astrophysics problem on which this project proposal is based is the focus of Patrick's Ph.D. dissertation research. Patrick has perfected the 3D self-consistent-field technique that permits us to construct unequal-mass, equilibrium binary systems and then to test their stability with the dynamical simulation tools described herein. He also has been solely responsible for migrating the gravitational CFD code to an mpi-based environment. Eric Barnes is a 4th-year graduate student enrolled in the Ph.D. physics program at LSU. He has just begun to gain experience with large-scale simulation tools, but will be expected to play a greater role in our ongoing gravitational CFD studies as the year goes along.
Form G: Request for NSF PACI Resources  
Bibliography  
Curriculum Vitae and Selected Publications  
Timing and Performance Data  