A REQUEST FOR NSF PACI RESOURCES
Joel E. Tohline, Juhan Frank, & Luis Lehner
Department of Physics & Astronomy,
Louisiana State University
and
Steve Liebling
Natural Science Division, Southampton College, Long Island University
This entire document, along with several illustrative movies, is available online at http://www.phys.lsu.edu/faculty/tohline/NRAC_2004/
1. Summary of Proposed Research, Including Objectives and Goals
Over half of all stars in the sky are in multiple star systems, and most of these are binary stars. We have recently completed a review of the mechanisms that have been proposed to explain the origin of binary star systems (Tohline 2002). Approximately one half of all binary stars are close enough to one another for mass to be exchanged between the components at some point in their evolution (Trimble 1983). The components of these interacting binaries influence one another in a variety of ways, including tidal forces, irradiation, and, in the most extreme manifestation of the interaction, mass transfer and mass exchange. As a result, the evolution and observational properties of the binary components and of the binary itself are dramatically different from the behavior expected in the absence of mass transfer, even when this transfer proceeds at a leisurely pace.
Interacting binaries of one type or another, especially those in which the accretor is a compact object (Frank, King & Raine 2002), play central roles in many of the astrophysical sources of high-energy photons, and are involved in the generation of gravitational and high-energy particle radiation. Most of the interacting binary types known observationally are presently undergoing long-term, stable mass transfer on timescales dictated by secular angular momentum losses (e.g., gravitational radiation and magnetic braking) and/or nuclear and/or thermal evolution. Well-known examples of the above are cataclysmic variables (CVs) [Warner 1995; Beuermann 2002], low-mass X-ray binaries (LMXBs) [White et al. 1995; van Paradijs & McClintock 1995; Verbunt & van den Heuvel 1995], soft X-ray transients (SXTs) or X-ray novae [Chen, Shrader & Livio 1997], Algols (Batten 1989; Vesper, Honeycutt & Hunt 2001; Zavala et al. 2002) and W Serpentis stars (Wilson 1989), contact binaries (W Ursae Majoris stars) [Rucinski 1989; Qian 2001], supersoft X-ray sources [Greiner 2000; Rappaport, Di Stefano & Smith 1994], and the ultraluminous X-ray sources in nearby galaxies [King et al. 2001]. In these systems, the fraction of the donor star's mass that is transferred during one orbit is typically ≈ 10^{-12} to 10^{-9}, many orders of magnitude less than what current numerical 3D hydrocodes can resolve. All of the above systems, however, must have descended from binaries in which the accretor of today was initially the more massive component, which evolved first off the main sequence. The mass transfer in these progenitor systems was in many instances dynamically unstable, yielding mass-transfer rates many orders of magnitude above the currently observed rates and leading in some cases to a common envelope phase (see, e.g., Warner 1995; Verbunt & van den Heuvel 1995; Nelson & Eggleton 2001) or a thermal mass-transfer phase (King & Schenker 2002).
In addition, there is a wide class of binary star systems not presently undergoing mass transfer for which the astrophysical scenarios that have been proposed to explain their origin or ultimate fate involve dynamical or thermal mass transfer in a close binary. Examples of such systems are millisecond pulsars (Bhattacharya 1995), some central stars of planetary nebulae (Iben & Livio 1993), double degenerate white dwarf binaries, perhaps leading to supernovae of Type Ia through a merger (Iben & Tutukov 1984), or subdwarf sdO and sdB stars (Iben 1990), and double neutron star binaries, perhaps yielding γ-ray bursts in a fireball when the neutron stars coalesce (Paczynski 1986; Mészáros & Rees 1992; Ruffert et al. 1997; Mészáros 2001). The evolutionary scenarios that are drawn upon to explain the existence of some of these systems call for events in which 10% or more of the donor's mass can be transferred during a single orbit.
We are conducting a multi-year project that focuses on these relatively rapid phases of mass transfer, investigating the conditions required for stable/unstable mass transfer and following numerically the evolution of unstable cases. The dynamical mass transfer that ensues in unstable binaries rapidly reaches nonlinear amplitudes and may lead to the formation of a massive disk-like object, perhaps a common envelope phase (Iben & Livio 1993), some mass loss from the system, and, in some cases, a merger. These phenomena are truly three-dimensional in character and require large-scale numerical simulations to further our theoretical understanding of them. While the direct observation of these phases – with the possible exception of gravitational wave emission from mergers – remains unlikely in the near future, these investigations will help us understand the origin of these binaries and their mutual relationships. With our present numerical simulation tools, we are in a position to study both the tidal and the mass-transfer stability of close binaries in which the component stars obey a polytropic equation of state (EOS) but, in general, possess nonuniform entropy. These simple structural approximations enable us to treat, with some limitations, a very diverse set of binary systems containing main sequence (MS) and white dwarf (WD) stars. In addition, we are embarking on a collaborative project to extend our numerical algorithms to handle relativistic fluid flows and to incorporate adaptive-mesh-refinement (AMR) capabilities. In the future, this expanded toolset will permit us to spatially resolve accretion disk structures as they form around various types of compact stellar objects, and to accurately simulate mass transfer in neutron star-neutron star (NS-NS), neutron star-black hole (NS-BH), and WD-BH systems – systems which are especially relevant to the LIGO and LISA experiments as sources of gravitational radiation.
In an effort to illustrate our early successes with this long-term, numerically intensive project, throughout this proposal we will reference a variety of short animation sequences (movies) that we have produced from the results of various extended simulations. If you are reading this proposal online, each movie can be accessed by clicking on the relevant "movie icon." The URL of each movie is also listed in the "Attachments" section at the end of the proposal text.
We distinguish two types of dynamical instability that can occur in close binaries: 1) a dynamical instability which arises, due to tidal forces, when extended binary components with sufficiently stiff EOSs are in close proximity, even before contact and the onset of mass transfer; and 2) a dynamical instability which arises upon contact, as mass is transferred from one component to the other, even when total angular momentum and mass are conserved. We shall refer to these instabilities as the tidal and mass-transfer instabilities, respectively.
MOVIE1: Tidal Instability (mpeg, 701K)
In binaries with unequal components, in general one of the stars will reach contact with its Roche lobe first (the donor) and mass transfer to its companion star (the accretor) will commence, either because the donor star is evolving or because the binary separation is being reduced by systemic angular momentum losses. What happens next, i.e., the stability of binaries to mass transfer, depends crucially on the reaction of the donor star to mass loss, and on the changes in Roche geometry that ensue from the mass transfer. Important factors determining the evolution of the Roche geometry are: 1) whether all the mass transferred by the donor is accreted (conservative mass transfer); and 2) if some of the mass transferred is lost from the system, what the consequential angular momentum losses are (Webbink 1985; King & Kolb 1995). If, upon contact and the onset of mass transfer, the radius of the donor and the radius of the Roche lobe vary in such a way that contact becomes deeper, mass transfer will proceed at an ever increasing rate and the situation is unstable. If, however, the contact becomes shallower, the donor will detach and mass transfer will cease. In the latter case, if some physical effect other than mass transfer insists on driving the system back into contact, stable mass transfer may be possible for as long as the driving is present. (As mentioned above, in CVs and LMXBs, systemic angular momentum losses drive stable secular mass transfer from a less massive donor onto a compact object.) In a series of simulations that we have completed over the past few years, we have focused our investigation on the effect that the initial system mass ratio q ≡ M_{donor}/M_{accretor} has on the stability of such mass-transfer events.
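The linear stability argument summarized above can be made concrete with a short calculation. The sketch below is our own illustration (not part of the production codes): it compares the donor's adiabatic mass-radius exponent with the response of its Roche-lobe radius under conservative mass transfer at fixed orbital angular momentum, using the simple Paczynski approximation for the Roche-lobe radius.

```python
def zeta_lobe(q):
    """d ln R_L / d ln M_donor for conservative mass transfer at fixed
    orbital angular momentum, with q = M_donor/M_accretor.  Uses the
    Paczynski approximation R_L/a ~ 0.462 [q/(1+q)]**(1/3)."""
    zeta_a = 2.0 * (q - 1.0)   # separation: a scales as (M1*M2)**-2 at fixed J and total M
    # d ln(R_L/a)/d ln q = 1/[3(1+q)] while d ln q/d ln M_donor = 1+q,
    # so the Roche-geometry term contributes a constant +1/3:
    return zeta_a + 1.0 / 3.0

def q_crit(zeta_donor, lo=0.01, hi=2.0):
    """Mass ratio above which mass transfer is dynamically unstable,
    i.e. where the Roche lobe shrinks faster than the donor star."""
    for _ in range(100):       # simple bisection; zeta_lobe is monotonic in q
        mid = 0.5 * (lo + hi)
        if zeta_lobe(mid) > zeta_donor:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# An n = 3/2 polytrope expands on adiabatic mass loss (d ln R/d ln M = -1/3),
# which recovers the linear-theory critical mass ratio q_crit = 2/3:
print(q_crit(-1.0 / 3.0))     # prints ~0.6667
```

With a more accurate Roche-lobe fit (e.g., Eggleton's) in place of the Paczynski form, the critical mass ratio shifts by a few percent, but the qualitative picture is unchanged.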
Before outlining what evolutionary scenarios are likely to develop as we follow the nonlinear evolution of systems that are found to be unstable toward either a tidal or a mass-transfer instability, it is important to point out that, throughout the execution of this project, we are also making a concerted effort to demonstrate the degree to which our nonlinear simulation techniques can accurately represent the equilibrium properties of close binary systems that are dynamically stable.
MOVIE2: q = 1 Contact Binary (QuickTime, 1,380K)
MOVIE3: Stable, q = 0.84 Binary (QuickTime, 8,451K)
In those cases where the mass transfer is initially unstable, we are able to follow the evolution through ~ 10 orbits to investigate how initial conditions (e.g., mass ratio, entropy gradients, or polytropic index) influence the nonlinear growth of the mass transfer and the final outcome. Depending on the assumed conditions, some binaries may become stable after a certain mass excess is transferred, some may exchange mass, and still others may evolve through a common envelope phase. This schematic division is adopted for simplicity since there may well exist cases where a combination of the above processes operates sequentially.
MOVIE4: q = 1.32 (QuickTime, 23 MB)
Movie4 shows results from our first investigation of an n = 3/2 polytropic system in which both stars have comparable radii, but the more massive star is the donor. Specifically, the illustrated system has an initial mass ratio q = 1.32 and, at the beginning of the simulation, the donor is just marginally filling its Roche lobe. Eventually, however, it develops a well-defined mass-transfer stream (as illustrated in the Movie4 icon) and the mass-transfer rate steadily increases thereafter. It is interesting to note that, in this case, the center of mass of the system (and, hence, the orbital axis of the system) is positioned to the right of the L1 Lagrange point and is actually inside the donor envelope. The evolution is shown in a frame of reference that is rotating with the initial angular velocity of the system and, as Movie4 illustrates, we have been able to follow this system evolution through 12 full orbits. Eventually the objects merge into a single, rotationally flattened object.
MOVIE5: q = 0.69 (QuickTime, 12 MB)
The coalescence of WD-WD and NS-NS binaries is important to examine since this process may produce a number of astrophysically interesting objects and events. Double white dwarf binary mergers have been suggested as precursors to some Type Ia supernovae (Iben & Tutukov 1984; Iben 1988, 1991; Iben & Webbink 1989; Branch et al. 1995) and to long γ-ray bursts (Katz & Canel 1996). WD-WD mergers may also lead to the formation of massive single white dwarfs or neutron stars (Colgate & Petschek 1982; Saio & Nomoto 1985; Iben & Tutukov 1986; Kawai, Saio, & Nomoto 1987; Chen & Leonard 1993); to the formation of subdwarf stars; or to the formation of hydrogen-deficient, highly luminous stars (Iben 1990 and references therein; Webbink 1984). NS-NS mergers may result in bursts of γ-rays and neutrinos, in the production of r-process elements, and in the formation of black holes (BHs) (Eichler et al. 1989; Meyer 1989; Narayan, Paczynski, & Piran 1992; Rasio & Shapiro 1992; Davies, Benz, Piran, & Thielemann 1994; Katz & Canel 1996; Lipunov et al. 1995; Ruffert, Janka, & Schäfer 1996; New & Tohline 1997; Swesty, Wang & Calder 2000; Rosswog, Davies, Thielemann & Piran 2000). Merging compact binaries are also expected to be relatively strong sources of gravitational radiation (Abramovici et al. 1992; Cutler et al. 1993; Thorne 1995).
Most of the numerical simulations of mergers that have been carried out in recent years have focused on the NS-NS or NS-BH problem because of their direct relevance to LIGO (and its sister instruments, like TAMA and VIRGO); and most, although not all (cf. Rasio & Shapiro 1994; Zhuge et al. 1996; Rosswog et al. 2000), have focused on equal-mass (q = 1) systems. Even when relativistic effects are ignored in such simulations, the stars usually merge on a dynamical timescale because one star (or both simultaneously, in the equal-mass case) encounters a tidal instability before either star fills its Roche lobe (Rasio & Shapiro 1994; New & Tohline 1997). This occurs, in turn, because the EOS of a NS is relatively stiff. As Rasio & Shapiro (1994) have summarized, the existence of a Roche limit and any associated mass-transfer instability has little importance for most NS binaries because this "dynamical instability always arises before the Roche limit." When mass transfer begins, the system is already in "a state of dynamical coalescence," so the mass-transfer rates can be much larger than 10% per orbit. One such evolution, taken from New & Tohline (1997), was highlighted above as Movie1.
Although we are interested in continuing to pursue this type of merger problem, the focus of our present research is on systems in which the secondary star has a relatively soft EOS (e.g., an n = 3/2 polytrope) and fills its Roche lobe before the system encounters a tidal instability. As we have discovered – see both Movie4 and Movie5, above – these mass-transfer events can also lead to a merger of the two stars.
2. Computational Methodology/Algorithms
In carrying out a number of additional binary mass-transfer simulations this year, as outlined in §2.d below, we will follow the same procedures as have led to the various simulation results discussed above and illustrated through Movies 1-5. For each simulation, we will initially utilize a 3D self-consistent field (SCF) technique to generate accurate equilibrium structures of synchronously rotating binaries in circular orbits. Each rotationally flattened and tidally distorted binary model will then serve as input into (initial conditions for) a dynamical simulation. Each dynamical simulation will be performed using a 3D, finite-difference hydrodynamic (FDH) technique that follows in a self-consistent fashion the motion of every fluid element in both stars. Our present SCF and FDH codes, as well as forerunners of these codes, have been used effectively to study gravitationally driven instabilities in protostellar gas clouds (Durisen et al. 1986; Woodward, Tohline & Hachisu 1994; Cazes & Tohline 2000); rapidly rotating, compact stellar objects (New, Centrella & Tohline 2000; Lindblom, Tohline & Vallisneri 2001, 2002); and close binary systems (New & Tohline 1997; Motl, Tohline, & Frank 2002). A detailed description of the present structure of both codes and their capabilities is given in Motl, Tohline, & Frank (2002). Here we briefly summarize the principal elements of both.
It is important that we have the capability to construct accurate equilibrium structures as initial input into each dynamical simulation so that the simulations are not dominated by dynamical transients that depend heavily on initial conditions. We use the iterative SCF technique described by Hachisu (1986a,b) in its most general form in order to construct unequal-mass polytropic binaries with the internal structure of both stellar components fully resolved. Models are constructed on a uniformly zoned, cylindrical mesh that conforms exactly to the mesh that is used in the FDH code (see below) so that no interpolation is needed when the converged initial state is introduced into the dynamical code. To date, we have used this SCF technique to generate a wide variety of synchronous, polytropic binaries with mass ratios 0.1 < q < 1.5 in which both stars have uniform (but different) entropy distributions (homentropic). We can readily build detached, semidetached or, in the limit of identical components, contact systems. These models have been constructed on computational grids ranging in size from ~ 128^{3} to ~ 256^{3} as necessary to conform to the chosen resolution of the FDH code. As measured by the global virial error, these models typically converge to better than one part in 10^{5}. As a further test, we have compared parameters from our equilibrium binaries with those calculated from the point mass Roche model by Mochnacki (1984). As expected, our results differ from Mochnacki's because we use a self-consistent gravitational potential as opposed to the point mass potential as is appropriate in the Roche model. Our results converge to Mochnacki's in cases where the mass ratio becomes small and/or the separation increases.
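To convey the flavor of the SCF iteration, the following toy version (our own simplified sketch, not the Hachisu binary production code) is restricted to a single non-rotating, spherical polytrope with G = R = ρ_c = 1. It alternates between solving the Poisson equation for the current density and resetting the density from the enthalpy field H = C - Φ, with C chosen so that H vanishes at the surface.

```python
import numpy as np

def scf_polytrope(n=1.5, N=400, iters=200, tol=1e-10):
    """Toy self-consistent-field iteration for a single, non-rotating,
    spherical polytrope (units G = R = rho_c = 1).  A sketch of the
    iteration loop only; the production code solves the full 3D binary."""
    r = np.linspace(0.0, 1.0, N)
    dr = np.diff(r)
    rho = 1.0 - r**2                      # any smooth guess with rho(R) = 0
    change = 1.0
    for _ in range(iters):
        # enclosed mass M(r); potential Phi(r) = -M(r)/r - 4 pi int_r^R rho r' dr'
        m = 4 * np.pi * np.concatenate(([0.0], np.cumsum(
            0.5 * (rho[1:] * r[1:]**2 + rho[:-1] * r[:-1]**2) * dr)))
        outer = np.concatenate((np.cumsum(
            (0.5 * (rho[1:] * r[1:] + rho[:-1] * r[:-1]) * dr)[::-1])[::-1], [0.0]))
        with np.errstate(invalid="ignore", divide="ignore"):
            phi = -m / r - 4 * np.pi * outer
        phi[0] = -4 * np.pi * outer[0]    # M(r)/r -> 0 at the centre
        # enthalpy vanishing at the surface; density reset with rho_c pinned at 1
        H = phi[-1] - phi
        rho_new = np.clip(H / H[0], 0.0, None) ** n
        change = np.max(np.abs(rho_new - rho))
        rho = 0.5 * (rho + rho_new)       # mild under-relaxation for robustness
        if change < tol:
            break
    return r, rho, change
```

For n = 1 the converged profile can be checked against the analytic Lane-Emden solution sin(πr)/(πr); the virial-error convergence quoted above refers, of course, to the full 3D binary models.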
In addition to polytropic binaries, we are able to construct self-consistent, Newtonian models of compact binaries by choosing an appropriate equation of state, such as the zero-temperature WD equation of state (Hachisu 1986) or any of several realistic NS equations of state as described by New & Tohline (1997). Binary models using homentropic equations of state are, however, inadequate for examining the stability of mass transfer in main sequence stars without deep convective envelopes (stars with masses of order one solar mass and greater). By allowing the specific entropy to be a function of the density, we are able to construct equilibrium configurations that are both convectively stable and that reproduce the desired mass-radius relationship. In the context of the present proposal, however, we will confine our study to homentropic binaries with n = 3/2 polytropic equations of state.
Our primary dynamical simulation tool is an MPI-based (Message Passing Interface) algorithm designed to perform multidimensional, gravitational CFD (computational fluid dynamics) simulations efficiently on parallel computing platforms. As has been described in detail by Motl et al. (2002), the evolution of a system is governed by a finite-difference representation of the multidimensional equations that describe the dynamics of inviscid, Newtonian, self-gravitating, compressible gas flows. More specifically, the employed finite-difference hydrodynamics (FDH) algorithm is based on the second-order accurate, van Leer monotonic interpolation scheme described by Stone and Norman (1992), but extended to three dimensions on a uniform, cylindrical coordinate mesh. Most of our simulations to date (e.g., those associated with Movies 4-6) have been performed on a computational lattice containing 162×98×256 grid zones in the R, Z, and θ directions, respectively, with no assumed geometric symmetries. This resolution has been sufficient to resolve both binary components, along with the mass-transfer stream. In order to properly resolve both stars in a system with a mass ratio q < 0.2 – and thereby test the dynamical stability criterion alluded to above – we will find it necessary to adopt a lattice containing significantly more grid zones, because in such a low-q system the volume of the Roche lobe surrounding the less massive (donor) star is significantly smaller than that of the Roche lobe of the more massive accretor. Based on preliminary test runs, we expect to utilize a lattice having dimensions ~ 194×130×512. The fluid equations are advanced in time via an explicit integration scheme, so an upper limit to the size of each integration time step is set by the familiar Courant condition.
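The Courant constraint just mentioned amounts to requiring that no signal cross more than a fraction of one zone per step. The sketch below is our own illustration of that bookkeeping on a uniform cylindrical mesh (the production code's exact limiter may differ); the velocity and sound-speed fields are assumed to be arrays of shape (NR, NZ, Nφ).

```python
import numpy as np

def courant_dt(v_r, v_z, v_phi, cs, dr, dz, dphi, r, cfl=0.5):
    """Largest stable explicit time step on a uniform cylindrical mesh:
    the fastest signal (|flow speed| + sound speed) must cross no more
    than a fraction `cfl` of one zone per step, in every direction.
    `v_*` and `cs` are (NR, NZ, Nphi) arrays; `r` holds the NR radial
    cell positions."""
    dt_r = dr / (np.abs(v_r) + cs)
    dt_z = dz / (np.abs(v_z) + cs)
    # the azimuthal zone width grows with radius: it is R * dphi
    dt_p = r[:, None, None] * dphi / (np.abs(v_phi) + cs)
    return cfl * min(dt_r.min(), dt_z.min(), dt_p.min())
```

For a static gas at unit sound speed the step is limited by the smallest azimuthal zone, i.e. the one at the innermost radius.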
At each time step, the Poisson equation is solved implicitly using a combined Fourier transform and iterative ADI scheme (Cohl, Sun & Tohline 1997) in order to determine, in a self-consistent fashion, how fluid elements are to be accelerated in response to the fluid's own Newtonian self-gravity. Boundary values for the Poisson equation are determined via a compact cylindrical Green's function technique (Cohl & Tohline 1999). By performing our simulations in a frame of reference that is rotating with an angular velocity Ω_{0} equal to the initial angular velocity of the binary orbit, we are able to minimize the effects of numerical viscosity (see New & Tohline 1997; Swesty, Wang and Calder 2000), which invariably plays a role in any FDH algorithm.
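The role that the Fourier transform plays in this solve can be illustrated in a simplified setting. The sketch below (our own illustration) solves ∇²Φ = 4πGρ spectrally on a fully periodic cube; the production solver instead Fourier-analyzes only the periodic azimuthal direction of the cylindrical mesh, finishing each azimuthal mode with an ADI iteration and imposing the Green's-function boundary values described above.

```python
import numpy as np

def poisson_periodic(rho, L=1.0, G=1.0):
    """Spectral solution of  laplacian(Phi) = 4 pi G rho  on a periodic
    cube of side L.  Each Fourier mode decouples: Phi_k = -4 pi G rho_k / k^2,
    which is the essence of the Fourier-decomposition step in the solver."""
    n = rho.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    rho_hat = np.fft.fftn(rho)
    k2_safe = np.where(k2 > 0.0, k2, 1.0)        # avoid dividing by zero at k = 0
    phi_hat = np.where(k2 > 0.0, -4.0 * np.pi * G * rho_hat / k2_safe, 0.0)
    return np.real(np.fft.ifftn(phi_hat))
```

A manufactured solution such as Φ = sin(2πx/L) is recovered to machine precision, since a single Fourier mode is represented exactly.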
We have constructed a heterogeneous computing environment through which we are able to routinely create volume-rendered images and 3D animation sequences of our evolving fluid configurations. Relatively small amounts of data are regularly passed across the network from the parallel machine that is performing the FDH simulation to a machine that is better suited to visualization tasks; images that will become part of an extended animation sequence are created from these data; then the 3D data sets are immediately destroyed in order to minimize the impact on disk storage. We are utilizing the scriptable attributes of Maya (formerly Alias|Wavefront) software on SGI systems in order to perform these critical visualization tasks. (The 3D "Movie" sequences referred to throughout this proposal have been produced in this fashion.) This visualization tool and the heterogeneous computing environment in which it functions will provide a critical component of the simulations being proposed here.
This is the computationally intensive segment of a multi-year research project that will provide a more complete understanding of the tidal and mass-transfer instabilities in close binary systems. With the allocation of PACI resources that we are requesting this year, we will try to ascertain whether any n = 3/2 polytropic binary system that is initially unstable toward mass transfer can return to stability before the system merges; as explained above, linear theory predicts that the critical mass ratio is q_{crit} = 2/3. However, neither of our model evolutions that started from a value of q > 2/3 (the models associated with Movie4 and Movie5) returned to stability as its mass ratio dropped below this critical value. Instead, both binary systems merged. It appears as though these relatively high-q simulations do not return to stability for two reasons. First, because the accretion stream directly impacts the accretor, angular momentum carried by the stream material is robbed from the orbit and deposited onto the accretor; hence, the assumption that orbital angular momentum is conserved during the mass-transfer event, which is built into the linear theory prediction, is invalidated. Second, the mass-transfer rates grow to such high (nonlinear) levels during the evolution of these dynamically unstable systems that the transfer is difficult to shut off, even when q drops below the level at which the binary separation starts to increase.
MOVIE6: q = 0.41 (QuickTime, 15 MB)
ii) Code Development
Through NSF's medium-ITR program, we have recently been awarded funds to significantly expand the capabilities of our existing numerical tools. A primary goal is to be able to couple our fluid-flow simulations to the general-relativistic (GR) Einstein equations, permitting us to accurately study accretion in NS-NS, NS-BH, and WD-BH systems. We have made a commitment to incorporate distributed adaptive mesh refinement (DAMR) techniques and the ability to transition to general relativistic settings by implementing the general relativistic hydrodynamic equations. We will build upon the experience, described above, of accurately simulating accretion flows in nonrelativistic, Newtonian gravitational fields, and will draw upon the expertise of the Co-PIs in the implementation of AMR and general relativity codes.
Perhaps the main problem faced by most general relativistic simulations in three dimensions is the lack of computational resources. Such evolutions require a tremendous amount of storage as well as a huge number of floating point operations. As an example, consider a model that solves the Einstein equations with roughly 100 field variables. To model a black hole of mass M, with boundaries at a distance of about 100M and a resolution of some small fraction of M, say M/10, requires roughly 1,000 points per side. The total storage required is then 100 fields × 1000^{3} points × 8 bytes ≈ 750 GB. Such an estimate is quite rough and assumes this is enough for a scheme to be capable of evolving the system in a stable way. Perhaps a much larger number of fields will be necessary, and a numerically stable scheme may ultimately require much improved resolution. Furthermore, the numerical problem involves a large disparity in scales: for example, one must simultaneously resolve the nonlinear distortions around a black hole (small fractions of M) as well as the outgoing gravitational waves (wavelengths of tens of M).
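The arithmetic of this estimate is trivial, but it is worth keeping parameterized, since both the field count and the resolution are uncertain:

```python
def grid_storage_bytes(n_fields=100, points_per_side=1000, bytes_per_value=8):
    """Back-of-the-envelope memory footprint for a unigrid 3D evolution,
    matching the estimate in the text (100 fields, 1000^3 points, 8 bytes)."""
    return n_fields * points_per_side**3 * bytes_per_value

print(grid_storage_bytes() / 2**30)  # ~745 GiB, i.e. the ~750 GB quoted above
```

Doubling the linear resolution multiplies this figure by eight, which is why adaptivity (below) is essential.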
These huge computational demands outstrip the ability of current conventional supercomputers and, even if they did not, such demands will likely increase at a rate faster than that of computer hardware. Add to this the fact that, once we have good codes, we have a huge parameter space to explore, and the turnaround for tests and experiments becomes a huge burden. Thankfully, two complementary computational techniques – adaptive mesh refinement and distributed computation – provide some hope. Distributed Adaptive Mesh Refinement: DAMR is an algorithm by which improved resolution (over the initial grid) is added where and when needed (Berger & Oliger 1984). The dynamical creation and destruction of fine subgrids limits the computational work so that it scales with the dynamics one models. Additionally, by breaking the computational problem into pieces – each of which can then be solved separately on a different computational node – one takes advantage of massively parallel architectures. Achieving efficient distributed AMR requires minimizing the communication between nodes and keeping them busy instead of waiting for information (so-called load balancing). These are nontrivial tasks that we will address as part of our ITR goals. Thus, we need access to large architectures both to test the algorithms and to use them in production runs.
We will build upon our previous experience with AMR in studies of global existence (Liebling 2002), in two-dimensional gravitational collapse (Choptuik et al. 2003), and in a recent AMR application within the characteristic formulation (Pretorius & Lehner 2003). The broadest ITR goal is to develop a distributed AMR code that enables us to evolve stably and consistently various systems of equations (of first and second order in time). Already developed is a simple implementation of AMR in flat space to solve the nonlinear sigma model (Liebling 2002, 2004). Such a model is a good starting point as it is nonlinear but otherwise relatively simple. Our existing AMR code is written completely in Fortran, which helps make it very readable, portable, and fast. The approach has been to utilize the organizational principles of object-oriented programming (e.g., C++) without the attendant overhead. Thus, the code is organized around a few basic objects. At a fundamental level are the various fields, which are themselves defined on grids at different resolutions. All grids at a given resolution form a level. In this spirit there are subroutines and functions ("methods") that act on the particular objects. The main driver only needs to operate on levels by, for example, time-stepping the levels, applying interpolation between two levels, and outputting levels. To time-step a given level requires acting on the various grids that form the level.
Through this organization, the code has been straightforwardly extended so that grids can be distributed to various processors using the Message Passing Interface (MPI). One processor acts as the master while all other processors enter a loop in which they simply listen for actions. The necessary communication involves telling the processors which grids to create, having the processor which owns the parent of a particular grid update its fields by receiving information from the owner of the child, and so on. Such communication can be hidden to some extent because it occurs with respect to grids, not levels; thus, someone familiar with how the single-processor code works would have no difficulty understanding the changes made to accommodate multiple processors. The success obtained in the application of DAMR to the nonlinear sigma model gives us confidence and a good starting point from which to tackle the hydrodynamics equations. We will first incorporate this technique into the Newtonian FDH code. This will require some further development of the infrastructure, although we foresee no problems beyond the testing involved in the search for an efficient load-balancing strategy. Once this is achieved, we will use this powerful tool in future production runs.
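The grid/level organization described above can be caricatured in a few lines. This schematic is ours (the actual code is Fortran and far more elaborate) and shows only the recursive Berger-Oliger time-stepping pattern, in which each finer level takes refine-factor sub-steps per coarse step before its data are injected back into the parent level.

```python
class Grid:
    """One uniform patch of a level; in the real code this owns field arrays."""
    def __init__(self, dx):
        self.dx = dx

class Level:
    """All grids sharing one resolution, plus an optional finer level."""
    REFINE = 2  # refinement factor in both space and time

    def __init__(self, grids, dt, finer=None):
        self.grids, self.dt, self.finer = grids, dt, finer

    def step(self, log):
        for g in self.grids:                  # advance every grid on this level
            log.append(("advance", g.dx))
        if self.finer is not None:            # Berger-Oliger subcycling: the finer
            for _ in range(self.REFINE):      # level takes REFINE smaller steps
                self.finer.step(log)          # (keeping the Courant number fixed),
            log.append(("restrict", self.finer.grids[0].dx))  # then is injected back

# A three-level hierarchy: one coarse step triggers 1 + 2 + 4 grid advances.
fine = Level([Grid(dx=0.25)], dt=0.25)
mid = Level([Grid(dx=0.5)], dt=0.5, finer=fine)
coarse = Level([Grid(dx=1.0)], dt=1.0, finer=mid)
log = []
coarse.step(log)
print(sum(1 for e in log if e[0] == "advance"))  # -> 7
```

The distributed version replaces the simple recursion over local grids with messages to the processors that own them, which is precisely where the load-balancing questions mentioned above arise.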
At the same time, we will work on modifying the Newtonian code to incorporate the general relativistic hydrodynamic equations. In the first stage this will be done assuming a fixed background spacetime metric (corresponding to a single black hole). This will serve as a preliminary project, with our ultimate, multi-year goal being to model accretion flows in fully dynamical spacetime settings. A significant amount of computer time is required to develop and test this implementation. Once this stage is passed, we will also incorporate distributed AMR techniques into the relativistic FDH code. Although this segment of our proposed project should be considered developmental work, a considerable amount of computer time will be required to test the scalability, performance, and efficiency of the distributed AMR techniques in a realistic setting (i.e., when dealing with the full equations rather than a toy model).
3. Preliminary Results and Progress to Date
Over the past seven years, we have been awarded a total of approximately 800,000 SUs at the SDSC and UT Austin to study dynamically evolving astrophysical systems in connection with our NSF- and NASA-sponsored research into the birth and destruction of binary star systems. Our current request for PACI resources will allow us to build directly upon the successful simulation work that has grown out of this earlier, multi-year support. It is important to note, as well, that over the past two years the State of Louisiana has provided generous support to LSU in the broad field of information technology, and through this support LSU has built a 2.2 TeraFlop Linux SuperCluster (called "SuperMike") based on Intel's 1.8 GHz Xeon DP processors and the Myrinet 2000 interconnect. (See §5, below, for a more complete description of SuperMike.) As we shall show in what follows, this Linux SuperCluster has generally performed as well as Blue Horizon (the IBM SP3) at the SDSC, so we have used it as a workhorse for our development and simulation work over the past year as well.
i) Parallel CFD Algorithm
Our initial simulations on the T3E at the SDSC were carried out with a version of our code that had been developed on other parallel platforms and written in High Performance Fortran (HPF). Using the available Cray HPF compiler (PGHPF, developed by the Portland Group), we found that relatively few unexpected code modifications were required in order to successfully compile and execute our code on the SDSC system. We discovered, however, that the PGHPF compiler was producing executable code that performed computational instructions on each T3E processor with extraordinarily low efficiency. In 1998, we successfully rewrote our entire FDH algorithm, incorporating explicit message-passing instructions via MPI. We found that this new, MPI-based code executed four to five times faster than the old HPF version of the code! We also found that the code scaled very well as the number of processors was increased from 4 to 128. (Details of the code's performance on the Cray T3E can be found in the attachments to our 1999 NRAC proposal, which is available online: 1999.)
Table 1. FDH Code Timings on the SDSC SP3
(nanoseconds per cell per integration timestep)

 Procs | 130×66×128 | 130^2×128 | 130^2×256 | 258×130×256 | 258^2×256 | 258^2×512
       | [1.1 M]    | [2.2 M]   | [4.3 M]   | [8.6 M]     | [17.0 M]  | [34.0 M]
-------+------------+-----------+-----------+-------------+-----------+----------
   16  |    870     |    938    |    740    |             |           |
   32  |    550     |    516    |    570    |     581     |           |
   64  |    392     |    315    |    308    |     312     |    306    |
  128  |            |    216    |    192    |     169     |    166    |    187
  256  |            |           |    150    |     124     |    101    |    100
  512  |            |           |           |      94     |     68    |     62
Several years ago, we migrated our MPI-based FDH code to the SP3 at SDSC. Table 1 illustrates the level of performance and degree of scalability of the code on the SDSC SP3 as it was configured in 2002. The first column lists the number of processors used in each test, and the numbers at the top of every other column specify each test's grid size as well as the total number of grid cells [in millions of cells]. The numbers in the body of the table specify the time required to take one integration timestep in a typical simulation; the values quoted are in units of nanoseconds per grid cell. (The wall-clock time required to take one timestep is obtained by multiplying the tabulated number by the number of grid cells; for example, running on 128 SP3 processors, a simulation with a grid resolution of 258^{2}×256 requires 166 ns/cell × 17 M cells = 2.8 seconds per timestep.) Listed in this manner, if our code scaled perfectly linearly with problem size, the numbers across a given row would all be the same; and if our code parallelized perfectly, the numbers down each column would drop by exactly a factor of two each time the number of processors is doubled. In both directions, the scaling gets closer to ideal as we take fuller advantage of the machine's available memory, that is, as we perform simulations with larger and larger grid sizes.
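As an aside on how Table 1's entries translate into run times, the following short Python sketch reproduces the wall-clock arithmetic quoted above and quantifies the degree of parallel scaling. The helper names are ours (illustrative only, not part of the FDH code), and all input numbers are copied directly from the table:

```python
# Reproduces the wall-clock arithmetic behind Table 1 and checks how close the
# FDH code comes to ideal (linear) parallel scaling on the SDSC SP3.
# Timing values below are copied directly from Table 1; nothing here is new data.

def wallclock_per_step(ns_per_cell: float, n_cells: float) -> float:
    """Seconds of wall-clock time per integration timestep."""
    return ns_per_cell * 1.0e-9 * n_cells

def parallel_efficiency(t_few: float, t_many: float, ratio: int) -> float:
    """Fraction of ideal speedup retained when the processor count grows by `ratio`."""
    return t_few / (t_many * ratio)

# 258^2 x 256 grid: 17.0 million cells (fifth data column of Table 1)
cells = 17.0e6
print(wallclock_per_step(166, cells))    # 128 procs: ~2.8 s/step, as quoted above

# Doubling from 128 to 256 processors: 166 -> 101 ns/cell
print(parallel_efficiency(166, 101, 2))  # ~0.82, i.e. 82% of ideal scaling
```

The same two-line check can be applied to any row pair in Tables 1 or 2.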
In the summer of 2001, our group participated in a short workshop held at NCSA that introduced users to an early Linux SuperCluster (512 nodes of 1.0 GHz Intel Xeon DP processors). Within two days, our FDH code was successfully ported to this cluster and achieved reasonably good run times. We were therefore in a position to take immediate advantage of SuperMike when it was installed at LSU in August of 2002. Table 2 shows the performance that we have achieved on SuperMike for problems of comparable size to the ones we had tested earlier on Blue Horizon at SDSC. The timings and the degree of scaling depicted in Table 2 are strikingly similar to those reported in Table 1, especially for simulations requiring large grids. For example, on 128 processors, both Blue Horizon and SuperMike require 170-180 ns/cell, and this number drops to approximately 100 ns/cell when twice as many processors are invoked.
Table 2. FDH Code Timings on LSU's Linux Cluster, SuperMike
(nanoseconds per cell per integration timestep)

 Procs | 66×130×128 | 130×98×128 | 130×98×256 | 162×322×128 | 258×162×256 | 258×162×512
       | [1.1 M]    | [1.6 M]    | [3.3 M]    | [6.7 M]     | [10.7 M]    | [21.4 M]
-------+------------+------------+------------+-------------+-------------+------------
   16  |            |            |    1074    |             |             |
   32  |            |            |            |     609     |             |
   64  |    378     |    349     |            |     341     |     302     |     326
  128  |            |    208     |            |     208     |     179     |     182
  256  |            |            |            |             |     104     |     106
ii) A Significantly Improved Poisson Solver
It has been clear to us for quite some time that one of the most computationally demanding components of our FDH code is the algorithm used to calculate boundary values of the gravitational potential, which must be fed into the Poisson solver in order to obtain the Newtonian gravitational potential throughout the volume of the fluid. Historically, we have used an algorithm based on a multipole expansion in terms of spherical harmonics (Tohline 1980; see also Stone & Norman 1992), but spherical harmonics are not a natural basis set when simulations are being conducted on a cylindrical coordinate mesh. Recently we discovered that the traditional Green's function expansion in cylindrical coordinates, which involves an infinite integral over products of Bessel functions, can be written in a much more compact form (Cohl & Tohline 1999) that is extremely well-suited for implementation on cylindrical coordinate meshes. A parallel implementation of this "compact cylindrical Green's function" technique is now part of our FDH code. It will be used in all of our future simulations because it is both faster and more accurate than the multipole expansion in spherical harmonics. In addition, its implementation has permitted us to bring the top (and bottom) cylindrical grid boundary down closer to the rotationally flattened mass distribution and thereby reduce the computational lattice size in the vertical direction.
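Because the azimuthally averaged (m = 0) piece of this compact Green's function reduces to a complete elliptic integral, the flavor of the technique can be conveyed in a few lines. The sketch below is not our production boundary solver – the FDH code carries the full sum over azimuthal modes m using half-integer-degree Legendre functions – and the function names and the thin-ring test body are purely illustrative:

```python
# Illustrative sketch (NOT the production FDH solver) of the m = 0 term of the
# compact cylindrical Green's function (Cohl & Tohline 1999):
#   integral of 1/|x - x'| over phi' = (2 / sqrt(R R')) * Q_{-1/2}(chi),
#   with chi = (R^2 + R'^2 + (z - z')^2) / (2 R R').
# For m = 0 the half-integer-degree Legendre function reduces to a complete
# elliptic integral:  Q_{-1/2}(chi) = k K(k),  where k^2 = 2 / (1 + chi).
import numpy as np
from scipy.special import ellipk  # NOTE: ellipk(m) takes the parameter m = k^2

def azimuthal_green(R, z, Rp, zp):
    """Integral of 1/|x - x'| over phi' in [0, 2*pi) for a source ring at (Rp, zp)."""
    chi = (R**2 + Rp**2 + (z - zp)**2) / (2.0 * R * Rp)
    m = 2.0 / (1.0 + chi)   # equals 4 R R' / ((R + R')^2 + (z - z')^2)
    return (2.0 / np.sqrt(R * Rp)) * np.sqrt(m) * ellipk(m)

def ring_potential(M, a, R, z, G=1.0):
    """Potential of a thin ring (mass M, radius a, hypothetical test body) at (R, z)."""
    line_density = M / (2.0 * np.pi * a)
    return -G * line_density * a * azimuthal_green(R, z, a, 0.0)
```

Evaluating a kernel of this form at every cell on the grid's outer surface, weighted by the interior mass distribution, is the essence of the boundary-value step; the production algorithm also carries the m > 0 azimuthal harmonics.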
iii) Establishing a Heterogeneous Computing Environment
As we have described above in §2.c, we have constructed a heterogeneous computing environment through which we are able to routinely create volume-rendered images and 3D animation sequences of our evolving fluid configurations. Movies 4-6, pointed to above, are products of this environment. Initially we developed this heterogeneous computing environment at the SDSC, routinely exchanging data between the Cray T3E and the SGI machines in SDSC's VizLab. We consider the successful establishment of this environment – which smoothly couples high-end SGI visualization tools with our simulation efforts on PACI (and now LSU) machines – to have been a significant accomplishment.
Our first two years of research utilizing NPACI facilities were focused on the problem of binary star formation. As detailed in the accompanying attachment entitled "Publications Resulting from NPACI Research," the results of this work have been summarized in a number of publications, including a feature article in the April-June 1999 issue of NPACI's EnVision Magazine.
In the spring of 2000, while on sabbatical at Caltech, the PI collaborated with K. Thorne and L. Lindblom to model the nonlinear growth of an unstable "r-mode" in isolated, rotating neutron stars, and to simulate the tidal disruption of a neutron star that is in orbit about a more massive Kerr black hole. Through this collaboration, the PI gained access to the HP Exemplar at CACR, a partner in the NPACI alliance. Our initial application of the PI's simulation tools (as described here in §2) to the "r-mode" problem has proven to be very successful (Lindblom, Tohline, & Vallisneri 2001, 2002). The results of this work (and images produced with our visualization tools) were featured in the April-June 2001 issue of NPACI & SDSC's EnVision Magazine.
Over the past three years, we have turned our attention to problems associated with mass transfer in binary star systems during the late stages of stellar evolution, as described in this proposal. Using NPACI facilities both at SDSC and in Austin, TX, we have invested considerable effort fine-tuning the capabilities of our SCF and FDH algorithms so that they will be able to perform the required modeling tasks with a high degree of quantitative accuracy. Motl et al. (2002) describe in detail how well our code presently performs on a variety of benchmark simulations. One of these simulations – a stable, detached binary system with mass ratio q = 0.84 – was evolved through more than five orbital periods on a grid having 130 × 98 × 256 zones in R, Z, and φ, respectively. This evolution is illustrated by Movie 3, above, which shows how the equatorial-plane density distribution of the system changes with time: only extraordinarily small changes are observed in this stable system. Even after five orbits, the centers of mass of the two stars and of the system as a whole have essentially not moved from their initial positions. And although a careful examination reveals very low-amplitude epicyclic motion, quantitative measurements show that the orbits of the stars are circular to the level of two parts in 10^{4}. This provides strong confirmation that our SCF code produces excellent initial states and that the hydrodynamical equations are being integrated forward in time in a physically realistic manner.
We are now in a position to study mass-transfer instabilities in unequal-mass binaries with a degree of quantitative accuracy that heretofore has not been possible. Not only have we improved on the simulation capabilities of New & Tohline (1997), but, as Motl et al. (2002) detail, the epicyclic amplitude of our binary orbits is an order of magnitude smaller, and angular momentum conservation an order of magnitude better, than in the recently published equal-mass binary simulations of Swesty, Wang & Calder (2000). As Motl et al. (2002) have concluded, with our current simulation tools we should be able to follow mass-transfer events, for approximately 20 orbits, in which the fraction of the donor's mass that is transferred per orbit is greater than a few × 10^{-5}. In particular, we are in a position to accurately determine mass-transfer instability boundaries of the type discussed elsewhere in this project proposal.
This past year we have begun to utilize this improved simulation tool in full production mode. Following the evolution of several semi-detached systems, we have begun our investigation of both stable and unstable mass-transfer systems. Movies 4-6, discussed above, were all produced from these simulations on LSU's SuperMike. (We originally planned to run these models on Blue Horizon, but SuperMike was far less heavily subscribed, so we realized much higher throughput on SuperMike and it became our computing platform of choice. We actually returned a large fraction of our 2002/03 NRAC allocation to the allocations committee because it became clear that our local LSU resources were adequate for the year.) Each of these simulations utilized 128 processors and a grid containing 4.06 M cells. Each integration timestep required approximately 0.8 seconds, and each binary orbit required approximately 100,000 timesteps; that is, each orbit required approximately 22 hours of wall-clock time. To follow one evolution through 10 full orbits (typical of the evolutionary times required to complete a dynamically unstable mass-transfer event) therefore required 9-10 full days of dedicated access to 128 processors. We expect to be able to continue to carry out 5 or 6 such simulations on SuperMike each calendar year.
However, in order to investigate the stability of binary systems whose initial mass ratio, q, is smaller than the ones discussed above, we will be forced to go to a higher-resolution grid. As detailed below, we estimate that we will need approximately 15 days of dedicated access to 512 nodes of NCSA's "Tungsten" Linux SuperCluster in order to complete the most important binary mass-transfer simulations.
4. Justification of Resources Requested
i) Required Platforms
To complete the tasks outlined for this project, we are requesting an allocation of 400,000 SUs on "Tungsten" (the Xeon Linux SuperCluster) and 100,000 SUs on "Titan" (the IA64 Linux SuperCluster) at NCSA (see Table 3 for details). Tungsten will be used for production runs designed to further our understanding of near-critical mass-transfer events in binary systems governed by Newtonian gravity; we propose to use Titan for a variety of code-development tasks. We feel that Tungsten is the best platform on which to conduct production runs this year because its basic architecture, operating system, and compilers are very similar to those of the LSU Linux SuperCluster – "SuperMike" – on which we have been carrying out the majority of our simulations over the past year. Tungsten and SuperMike are both Intel Xeon DP processor-based systems with Myrinet 2000 interconnects. It will be an advantage for us to use Tungsten (instead of SuperMike) to carry out our most demanding model simulations for two primary reasons: (1) Tungsten has 3.06 GHz Xeon DP processors whereas SuperMike has 1.8 GHz processors; hence, on Tungsten we expect to benefit from a roughly 70% speedup in execution times over the numbers posted for SuperMike in Table 2. (2) Because Tungsten has many more hardware nodes than SuperMike, we will be much more likely to secure 512 processors for dedicated runs on Tungsten and therefore complete our most demanding simulations in a reasonable number of days. In anticipation that the IA64 architecture will be the computing architecture of choice in the near future – especially in connection with the NSF TeraGrid project – we propose to conduct the majority of our NCSA code development work on Titan.
ii) Code Requirements and Optimization Issues
Since we have discussed code optimization issues fairly thoroughly in earlier sections of this proposal, we will not comment further on them here. From the information provided above – in §3.a.ii and Table 2, for example – we can estimate how much computing time will be required to make significant progress on this project. Typically, each binary simulation must be followed through approximately 10 orbits before the physical outcome of the evolution can be ascertained. Utilizing 128 processors and a grid resolution of 162×98×256 [4.1 M grid cells] – as has been typical of the simulations that we have completed over the past year on SuperMike – each integration timestep requires approximately (200 ns/cell × 4.1 M cells) = 0.82 seconds. Since, at this resolution, a single binary orbit requires approximately 100,000 integration timesteps, one complete simulation takes approximately (0.82 s/step × 100,000 steps × 10 orbits ÷ 3600 s/hour) = 225 wall-clock hours = 9.4 days. On NCSA's Tungsten, we expect each simulation to speed up by a factor of 1.7, and we expect to be able to utilize four times as many processors, i.e., 512 at a time. Hence, a simulation carried out with the same spatial resolution could be completed in approximately (225 hours ÷ 1.7 ÷ 4) = 33 hours = 1.4 days.
The particular low-"q" binary mass-transfer simulation that we need to pursue, however, will demand somewhat higher resolution in R and Z, and twice the azimuthal resolution (specifically, on SuperMike we have found that 194×130×512 = 12.9 M grid cells will suffice), and approximately 350,000 (instead of 100,000) timesteps per orbit. (More timesteps are required because the Courant-limited timestep is smaller.) Utilizing 128 processors on SuperMike, we would need (200 ns/cell/timestep × 12.9 M cells × 350,000 steps/orbit × 10 orbits ÷ 3600 s/hour) = 2500 wall-clock hours = 105 days (!) to follow this system through 10 orbits. Utilizing 512 processors on NCSA's Tungsten, however, we expect to gain the factor of 1.7 × 4 = 6.8 in execution time, so we estimate that one evolution can be completed in 15 days. This estimate is also summarized below in Table 3, where we conclude that one model evolution will demand approximately 189,000 SUs on Tungsten. Our proposed simulations at a resolution of 194×130×512 will require approximately 12 GB of RAM. Hence, when spread across 512 processors, there will be no problem fitting the code into Tungsten's available memory.
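These estimates are simple enough to reproduce with a short script. The sketch below uses only numbers quoted in the text above, and `evolution_cost` is a hypothetical helper name, not part of our simulation codes:

```python
# Back-of-the-envelope reproduction of the resource estimates in this section.
# All inputs (ns/cell, grid sizes, steps per orbit, the 1.7x clock speedup and
# 4x processor count on Tungsten) come directly from the surrounding text.

def evolution_cost(ns_per_cell, n_cells, steps_per_orbit, n_orbits, speedup=1.0):
    """Wall-clock hours for one full evolution."""
    seconds = ns_per_cell * 1.0e-9 * n_cells * steps_per_orbit * n_orbits / speedup
    return seconds / 3600.0

# SuperMike, 128 procs, 162 x 98 x 256 grid (4.1 M cells), 10 orbits:
hours_mike = evolution_cost(200, 4.1e6, 100_000, 10)   # ~228 h, i.e. the ~225 h / 9.4 days quoted

# Same model on Tungsten: 1.7x faster clocks, 512 instead of 128 processors:
hours_tungsten = evolution_cost(200, 4.1e6, 100_000, 10, speedup=1.7 * 4)  # ~33 h

# High-resolution run: 194 x 130 x 512 (12.9 M cells), 350,000 steps/orbit:
hours_hires = evolution_cost(200, 12.9e6, 350_000, 10, speedup=1.7 * 4)    # ~15 days
su_hires = hours_hires * 512   # service units charged for 512 dedicated processors

print(hours_mike, hours_tungsten, hours_hires, su_hires)  # su_hires ~ 189,000 SU
```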
iii) Proposed Production Runs
As indicated in Table 3, we propose to conduct four simulations on Tungsten in the coming year. Two relatively short (no more than 4 orbits) binary mass-transfer evolutions will be run at the same grid resolution as has been used on SuperMike, in order to ensure that we obtain the same results (and the estimated performance gains) on Tungsten. (Only 22,000 SUs are requested for these "trial" evolutions.) We then propose to run two high-resolution models through 10 orbits, utilizing 512 processors on Tungsten. One of these evolutions will start from a mass ratio q = 0.2, which our modified linear analysis estimates is the true critical value of q for n = 3/2 polytropic stars. The second high-resolution evolution will either be used to repeat an earlier run that was conducted at low resolution, as a measure of the convergence properties of the evolutionary code, or to identify the critical value of q for a binary system having stars with another polytropic index. Each of these evolutions will require approximately 189,000 SUs on Tungsten.
Table 3. Proposed Simulations on NCSA's Xeon Linux Supercluster (Tungsten)

                                                  N_models  N_orbits      SU
 A. Porting and Testing Code                          2         4      22,000
    Grid resolution of 194×130×256 [6.5 M cells] on 128 procs:
    each orbit requires 100,000 timesteps @ 120 ns/cell/timestep
    ⇒ 2,800 SU (22 wall-clock hours) per orbit.
 B. Searching for Critical Initial Mass Ratio         2        10     378,000
    Grid resolution of 194×130×512 [12.9 M cells] on 512 procs:
    each orbit requires 350,000 timesteps @ 29.4 ns/cell/timestep
    ⇒ 18,900 SU (36.9 wall-clock hours) per orbit.
 Total requirement                                                    400,000

 Proposed Effort on NCSA's IA64 Linux Supercluster (Titan)                 SU
 C. Development of DAMR and Relativistic Hydrodynamics Capabilities   150,000
As discussed earlier, the request for 150,000 SUs on Titan is being made in conjunction with our commitment (with NSF/ITR support) to expand the capabilities of our FDH code to handle general relativistic fluid flows on fixed spacetime metrics, and to include dynamic AMR capabilities.
v) Special Requirements
Our primary simulation tools (the SCF code and the FDH code) have been developed entirely by our group at LSU and require no special packages other than standard mathematics utilities and messagepassing libraries that are available on NCSA's Linux clusters. Our development work on Titan will be largely selfcontained as well, but we plan to build our relativistic hydro and adaptivemesh algorithms in such a way that they can be integrated smoothly into "Cactus." Hence, we will want to access Cactus utilities on Titan.
5. Local Computing Environment
Through LSU's Concurrent Computing Laboratory for Materials Simulation we have routine access to a dual-processor SGI Octane and an SGI Onyx that drives an ImmersaDesk system. On this SGI equipment we have installed Maya (TM), the Alias|Wavefront software that we use routinely to render 3D images and generate movie sequences from our dynamical simulations, as explained in §2.c. In LSU's Department of Physics & Astronomy, we also have access to a variety of Linux PC platforms for our daily serial computing tasks.
In 2001, the Louisiana legislature funded a new, statewide information technology initiative through which LSU received $7M in new, recurring funding. For two years, the PI (Tohline) served as interim director of the Center for Applied Information Technology and Learning (LSU CAPITAL) that was established at LSU through this initiative. A portion of the first year's funding was used to purchase a 1024-processor Linux SuperCluster, nicknamed "SuperMike": 1.8 GHz Intel P4 Xeon processors; a Myrinet 2000 network; 1 GByte of RAM per processor (see www.lsu.edu/lsucapital/beowulf.html for more details). When this hardware system was installed in August 2002, it was clocked at 2.2 TeraFlops on standard high-performance Linpack benchmarks and was ranked by Top500.org as the 11th fastest supercomputer in the world. As described elsewhere in this proposal, this system has proven to be a superb workhorse for our simulation efforts over the past year. We will continue to have access to this system over the coming year and expect to perform (lower resolution) simulations on SuperMike that will complement the project described herein.
6. Other Supercomputer Support
Our group does not presently have access to any other supercomputing resources, beyond the PACI and LSU hardware architectures mentioned elsewhere in this proposal.
7. Project Team Qualifications
A major component of the PI's research work over the past twenty years has involved large-scale numerical simulation of multidimensional gas flows in astrophysical systems utilizing a variety of high-performance computing platforms. In the early 1980s, the PI held a postdoctoral research position in group T-6 at Los Alamos National Laboratory, at which time he broadened his exposure to state-of-the-art CFD techniques and learned to develop efficient algorithms for vector supercomputers. From the mid-1980s through the early 1990s, Tohline was PI on a large account at the Cornell Theory Center, during which time he and his students experimented with the parallel implementation of CFD algorithms on the IBM 3090 and various accompanying FPS Array Processors. When an 8K-node MasPar MP-1 system arrived at LSU in 1991, the PI abandoned the CTC facilities because his group was having considerable success utilizing the MasPar system for its largest simulation efforts. As has been detailed elsewhere in this proposal, in recent years the PI and his students have gained experience developing gravitational CFD algorithms for execution on a variety of parallel platforms including MasPar's MP-2, the CM-5, the Cray T3E, the IBM SP2 & SP3, and large Linux clusters. The group also has developed and implemented a heterogeneous computing environment (at LSU and at the SDSC) that tightly couples its parallel simulation efforts to sophisticated visualization tools.
Co-PI J. Frank has broad experience in studies of the astrophysics of interacting binary systems, particularly when such interactions involve fluid flow and accretion events. He is, for example, lead author of a widely referenced technical Cambridge University Press publication entitled "Accretion Power in Astrophysics." His expertise within this project helps ensure that the results of the proposed complex, multidimensional, and nonlinear simulations are physically correct and quantitatively meaningful when placed in the context of the extensive literature on this subject.
Co-PI Lehner brings to the team nine years of experience in the implementation of the Einstein equations using finite-difference approximations; he received the Nicholas Metropolis Award in 1999 for his work on implementing the Einstein equations numerically. He has worked with different approaches for casting the Einstein equations in a form suitable for numerical simulation (Cauchy- and characteristic-based approaches). He has published over twenty papers on the subject of the present proposal, among them an invited review of the field of numerical relativity for Classical and Quantum Gravity. He was also a user on NPACI allocations in 1998 and 1999 under PIs Richard Matzner and Matthew Choptuik at The University of Texas at Austin. Lehner also is lead investigator on the NSF-funded ITR project in partial support of which this request for NPACI resources is being made. (Tohline and Liebling are Co-PIs on this NSF-funded project.)
Co-PI Liebling has experience modeling gravitational collapse, with an emphasis on black hole critical phenomena. These studies make essential use of computational techniques including adaptive mesh refinement. He has also begun development of a distributed adaptive mesh infrastructure using small allocations on both NPACI (AST030002) and NCSA (PHY030008N) machines. He was also a user on NPACI allocations in 1998 and 1999 under PIs Richard Matzner and Matthew Choptuik at The University of Texas at Austin.
Shangli Ou and Mario D'Souza are graduate students enrolled in the Ph.D. physics program at LSU. Both will be heavily involved in the simulation and code development work outlined in this proposal. Both students are in the fifth year of their graduate studies and will be relying upon the requested PACI resources to complete their dissertation research projects.