A REQUEST FOR NSF PACI RESOURCES
Joel E. Tohline, Juhan Frank, & Patrick Motl
Department of Physics & Astronomy,
Louisiana State University
This entire document, along with several illustrative movies is available online at http://www.phys.lsu.edu/faculty/tohline/NRAC_2005/ 
1. Summary of Proposed Research, Including Objectives and Goals
Over half of all stars in the sky are multiple star systems and most of these are binary stars. We recently have completed a review of the mechanisms that have been proposed to explain the origin of binary star systems (Tohline 2002). Approximately one half of the binary stars are close enough to one another for mass to be exchanged between the components at some point in their evolution (Trimble 1983). The components of these interacting binaries influence one another in a variety of ways including tidal forces, irradiation, and in the most extreme manifestation of the interaction, through mass transfer and mass exchange. As a result, the evolution and observational properties of the binary components and the binary itself are dramatically different from the behavior expected in the absence of mass transfer, even when this transfer proceeds at a leisurely pace.
Interacting binaries of one type or another, especially those in which the accretor is a compact object (Frank, King & Raine 2002), play central roles in many of the astrophysical sources of high energy photons, and are involved in the generation of gravitational and highenergy particle radiation. Most of the interacting binary types known observationally are presently undergoing longterm, stable masstransfer on timescales dictated by secular angular momentum losses (e.g. gravitational radiation and magnetic braking) and/or nuclear and/or thermal evolution. Wellknown examples of the above are cataclysmic variables (CVs) [Warner 1995, Beuermann 2002], lowmass Xray binaries (LMXBs) [White et al. 1995; van Paradijs & McClintock 1995; Verbunt & van den Heuvel 1995], soft Xray transients (SXTs) or Xray novae [Chen, Shrader & Livio 1997], Algols (Batten 1989, Vesper Honeycutt & Hunt 2001; Zavala et al. 2002) and W Serpentis stars (Wilson 1989), contact binaries (W Ursae Majoris stars) [Rucinski 1989; Qian 2001], supersoft Xray sources [Greiner 2000, Rappaport, Di Stefano & Smith 1994], and the ultraluminous Xray sources in nearby galaxies [King et al. 2001]. In these systems, the fraction of the donor star's mass that is transferred during one orbit is typically » 10^{12} to 10^{9}, many orders of magnitude less than what current numerical 3D hydrocodes can resolve. All of the above systems, however, must have descended from binaries in which the accretor of today was initially the more massive component who evolved first off the mainsequence. The mass transfer in these progenitor systems was in many instances dynamically unstable, yielding masstransfer rates many orders of magnitude above the currently observed rates and leading in some cases to a common envelope phase (see e.g. Warner 1995; Verbunt & van den Heuvel 1995; Nelson & Eggleton 2001) or a thermal masstransfer phase (King & Schenker 2002).
In addition, there is a wide class of binary star systems not presently undergoing masstransfer for which the astrophysical scenarios that have been proposed to explain their origin or ultimate fate involve dynamical or thermal masstransfer in a close binary. Examples of such systems are millisecond pulsars (Bhattacharya 1995), some central stars of planetary nebulae (Iben & Livio 1993), double degenerate white dwarf binaries, perhaps leading to supernovae of type Ia through a merger (Iben & Tutukov 1984), or subdwarf sd0, sdB stars (Iben 1990), and double neutron star binaries, perhaps yielding gray bursts in a fireball when the neutron stars coalesce (Paczynski 1986, Mészáros & Rees 1992, Ruffert et al 1997, Mészáros 2001). The evolutionary scenarios that are drawn upon to explain the existence of some of these systems call for events in which 10% or more of the donor's mass can be transferred during a single orbit.
We are conducting a multiyear project that focuses on these relatively rapid phases of masstransfer, investigating the conditions required for stable/unstable masstransfer and following numerically their evolution in unstable cases. The dynamical masstransfer that ensues in unstable binaries rapidly reaches nonlinear amplitudes and may lead to the formation of a massive disklike object, maybe a common envelope phase (Iben & Livio 1993), some mass loss from the system, and in some cases, end in a merger. These phenomena are truly threedimensional in character and require largescale numerical simulations to further our theoretical understanding of them. While the direct observation of these phases – with the possible exception of gravitational wave emission from mergers – remains unlikely in the near future, these investigations will help us understand the origin of these binaries and their mutual relationships. With our present numerical simulation tools, we are in a position to study both the tidal and the masstransfer stability of close binaries in which the component stars obey a polytropic equation of state (EOS) but, in general, possess nonuniform entropy. These simple structural approximations enable us to treat, with some limitations, a very diverse set of binary systems containing main sequence (MS) and white dwarf (WD) stars. In addition, we are embarking on a collaborative project to extend our numerical algorithms to handle relativistic fluid flows and to incorporate adaptivemeshrefinement (AMR) capabilities. In the future, this expanded toolset will permit us to spatially resolve accretion disk structures as they form around various types of compact stellar objects, and to accurately simulate masstransfer in neutron starneutron star (NSNS), neutron starblack hole (NSBH), and WDBH systems – systems which are especially relevant to the LIGO and LISA experiments as sources of gravitational radiation.
In an effort to illustrate our early successes with this longterm, numerically intensive project, throughout this proposal we will reference a variety of short animation sequences (movies) that we have produced from the results of various extended simulations. If you're reading this proposal online, each movie can be accessed by clicking on the relevant "movie icon." The URL of each movie is also listed in the "Attachments" section at the end of the proposal text.
We distinguish two types of dynamical instability that can occur in close binaries: 1) A dynamical instability which arises in close proximity of extended binary components with sufficiently stiff EOS due to the presence of tidal forces, even before contact and onset of masstransfer; and 2) A dynamical instability which arises upon contact, even when total angular momentum and mass are conserved, as mass is transferred from one component to the other. We shall refer to these instabilities as the tidal and masstransfer instabilities, respectively.
MOVIE1 Tidal Instability 



mpeg
(701K) 
In binaries with unequal components, in general one of the stars will reach contact with its Roche lobe first (the donor) and masstransfer to its companion star (the accretor) will commence either because the donor star is evolving or because the binary separation is reduced by systemic angular momentum losses. What happens next, i.e. the stability of binaries to masstransfer, depends crucially on the reaction of the donor star to massloss, and on the changes in Roche geometry that ensue from the masstransfer. Important factors determining the evolution of the Roche geometry are: 1) whether all the mass transferred by the donor is accreted (conservative masstransfer); and 2) if some of the mass transferred is lost from the system, what are the consequential angular momentum losses (Webbink 1985; King & Kolb 1995). If upon contact and onset of masstransfer, the radius of the donor and the radius of the Roche lobe vary in such a way that contact becomes deeper, masstransfer will proceed at an ever increasing rate and the situation is unstable. If, however, the contact becomes shallower, the donor will detach and masstransfer will cease. In the latter case, if some physical effect other than masstransfer insists on driving the system back into contact, stable masstransfer may be possible for as long as the driving is present. (As mentioned above, in CVs and LMXBs, systemic angular momentum losses drive stable secular masstransfer from a less massive donor onto a compact object.) In a series of simulations that we have completed over the past few years, we have focused our investigation on the effect that the initial system mass ratio q º M_{donor}/M_{accretor} has on the stability of such masstranfer events.
Before outlining what evolutionary scenarios are likely to develop as we follow the nonlinear evolution of systems that are found to be unstable toward either a tidal or masstransfer instability, it is important to point out that throughout the execution of this project we also are making a concerted effort to demonstrate the degree to which our nonlinear simulation techniques are able to accurately represent the equilibrium properties of close binary systems that are dynamically stable.
MOVIE2 q = 1 Contact Binary 



Quicktime (1,380K) 
MOVIE3 Stable, q = 0.84 Binary 



Quicktime (8,451K) 
In those cases where the masstransfer is initially unstable, we now are able to follow the evolution through ~ 20 orbits to investigate how initial conditions (e.g., mass ratio, entropy gradients, or polytropic index) influence the nonlinear growth of the masstransfer and the final outcome. Depending on the assumed conditions, some binaries may become stable after a certain mass excess is transferred, some may exchange mass, and still others may evolve through a common envelope phase. This schematic division is adopted for simplicity since there may well exist cases where a combination of the above processes operate sequentially.
When the donor is more massive than the accretor (q > 1) and has a deep convective envelope, the masstransfer almost certainly will be dynamically unstable. This is because, as mass is transferred from the more massive star to the less massive one, the orbital separation should actually decrease with time while the radius of the donor is increasing. It is generally thought that once the rate of transfer is high enough, the accretor is swamped and driven out of thermal equilibrium, and the matter transferred forms a common envelope. Most calculations of the subsequent evolution have dealt with a relatively compact star (the accretor) orbiting or plunging into the convective (polytropic) atmosphere of a giant star (Rasio & Livio 1996; Terman, Taam & Hernquist 1994; 1995 and references therein). Some calculations by other groups have specifically focused on the ejection of the envelope (Taam et al. 1997; Sandquist et al. 1998).
MOVIE4 q = 1.32 



Quicktime (137 MB) 
MOVIE5 q = 0.5 



Quicktime (46 MB) 
The coalescence of WDWD and NSNS binaries is important to examine since this process may produce a number of astrophysically interesting objects and events. Double white dwarf binary mergers have been suggested as precursors to some Type Ia Supernovae (Iben & Tutukov 1984; Iben 1988, 1991; Iben & Webbink 1989; Branch et al. 1995) and to long gray bursts (Katz & Canel 1996). WDWD mergers may also lead to the formation of massive single white dwarfs or neutron stars (Colgate & Petschek 1982; Saio & Nomoto 1985; Iben & Tutukov 1986; Kawai, Saio, & Nomoto 1987; Chen & Leonard 1993); to the formation of subdwarf stars; or to the formation of hydrogen deficient, highly luminous stars (Iben 1990 and references therein; Webbink 1984). NSNS mergers may result in bursts of grays and neutrinos, in the production of rprocess elements, and in the formation of black holes (BH) (Eichler et al. 1989; Meyer 1989; Narayan, Paczynski, & Piran 1992; Rasio & Shapiro 1992; Davies, Benz, Piran, & Thielemann 1994; Katz & Canel 1996; Lipunov et al. 1995; Ruffert, Janka, & Schäfer 1996; New & Tohline 1997; Swesty, Wang & Calder 2000; Rosswog, Davies, Thielemann & Piran 2000). Merging compact binaries are also expected to be relatively strong sources of gravitational radiation (Abramovici et al. 1992; Cutler et al. 1993; Thorne 1995).
Most of the numerical simulations of mergers that have been carried out in recent years have been focused on the NSNS or NSBH problem because of their direct relevance to LIGO (and its sister instruments, like TAMA and VIRGO); and most, although not all (cf. Rasio & Shapiro 1994; Zhuge et al. 1996; Rosswog et al. 2000), have focused on equalmass (q = 1) systems. Even when relativistic effects are ignored in such simulations, the stars usually merge on a dynamical timescale because one star (or both simultaneously, in the equalmass case) encounters a tidal instability before either star fills its Roche lobe (Rasio & Shapiro 1994; New & Tohline 1997). This occurs, in turn, because the EOS of a NS is relatively stiff. As Rasio & Shapiro (1994) have summarized, the existence of a Roche limit and any associated masstransfer instability has little importance for most NS star binaries because this "dynamical instability always arises before the Roche limit." When mass transfer begins, the system is already in "a state of dynamical coalescence," so the masstransfer rates can be much larger than 10% per orbit. One such evolution, taken from New & Tohline (1997), was highlighted above as Movie1.
Although we are interested in continuing to pursue this type of merger problem, the focus of our present research is on systems in which the secondary star has a relatively soft EOS (e.g., n = 3/2 polytrope) and the star fills its Roche lobe before the system encounters a tidal instability. As we have discovered – see, for example, Movie4, above – these masstransfer events can also lead to a merger of the two stars.
2. Computational Methodology/Algorithms
In carrying out a number of additional binary masstransfer simulations this year, as outlined in §2.d, below, we will follow the same procedures as have lead to the various simulation results discussed above and illustrated through Movies 15. For each simulation, we will initially utilize a 3D selfconsistent field (SCF) technique to generate accurate equilibrium structures of synchronously rotating binaries in circular orbits. Each rotationally flattened and tidally distorted binary model will then serve as input into (initial conditions for) a dynamical simulation. Each dynamical simulation will be performed using a 3D, finitedifference hydrodynamic (FDH) technique that follows in a selfconsistent fashion the motion of every fluid element in both stars. Our present SCF and FDH codes, as well as forerunners of these codes, have been effectively used to study gravitationally driven instabilities in protostellar gas clouds (Durisen et al. 1986; Woodward, Tohline & Hachisu 1994; Cazes & Tohline 2000); rapidly rotating, compact stellar objects (New, Centrella & Tohline 2000; Lindblom, Tohline & Vallisneri 2001, 2002; Ou & Tohline 2004); and close binary systems (New & Tohline 1997; Motl, Tohline, & Frank 2002). A detailed description of the present structure of both codes and their capabilities is given in Motl, Tohline, & Frank (2002). Here we briefly summarize the principal elements of both.
It is important that we have the capability to construct accurate equilibrium structures as initial input into each dynamical simulation so that the simulations are not dominated by dynamical transients that depend heavily on initial conditions. We use the iterative SCF technique described by Hachisu (1986a,b) in its most general form in order to construct unequalmass polytropic binaries with the internal structure of both stellar components fully resolved. Models are constructed on a uniformly zoned, cylindrical mesh that conforms exactly to the mesh that is used in the FDH code (see below) so that no interpolation is needed when the converged initial state is introduced into the dynamical code. To date, we have used this SCF technique to generate a wide variety of synchronous, polytropic binaries with mass ratios 0.1 < q < 1.5 in which both stars have uniform (but different) entropy distributions (homentropic). We can readily build detached, semidetached or, in the limit of identical components, contact systems. These models have been constructed on computational grids ranging in size from ~ 128^{3} to ~ 256^{3} as necessary to conform to the chosen resolution of the FDH code. As measured by the global virial error, these models typically converge to better than one part in 10^{5}. As a further test, we have compared parameters from our equilibrium binaries with those calculated from the point mass Roche model by Mochnacki (1984). As expected, our results differ from Mochnacki's because we use a selfconsistent gravitational potential as opposed to the point mass potential as is appropriate in the Roche model. Our results converge to Mochnacki's in cases where the mass ratio becomes small and/or the separation increases.
In addition to polytropic binaries, we are able to construct selfconsistent, Newtonian models of compact binaries by choosing an appropriate equation of state such as the zero temperature WD equation of state (Hachisu 1986) or any of several realistic NS equations of state as described by New & Tohline (1997). Binary models using homentropic equations of state are, however, inadequate for examining the stability of mass transfer in main sequence stars without deep convective envelopes (stars with masses greater than but on the order of one solar mass). By allowing the specific entropy to be a function of the density, we are able to construct equilibrium configurations that are both convectively stable and that reproduce the desired massradius relationship. In the context of this present proposal, however, we will confine our study to homentropic binaries with n = 3/2 polytropic equations of state.
Our primary dynamical simulation tool is an mpi (message passing interface) based algorithm designed to perform multidimensional, gravitational CFD (computational fluid dynamic) simulations efficiently on parallel computing platforms. As has been described in detail by Motl et al. (2002), the evolution of a system is governed through a finitedifference representation of the multidimensional equations that describe the dynamics of inviscid, Newtonian selfgravitating, compressible gas flows. More specifically, the employed finitedifference hydrodynamics (FDH) algorithm is based on the secondorder accurate, van Leer monotonic interpolation scheme described by Stone and Norman (1992), but extended to three dimensions on a uniform, cylindrical coordinate mesh. Most of our simulations, to date (e.g., those associated with Movies 47), have been performed on a computational lattice containing 162´98´256 grid zones in the R, Z, and q directions, respectively, with no assumed geometric symmetries. This has been sufficient to resolve both binary components, along with the masstransfer stream. In order to properly resolve both stars in a system with a mass ratio q < 0.2 – and thereby test the dynamical stability criterion alluded to above – we will find it necessary to adopt a lattice containing significantly more grid zones because in such a low q system the volume of the Rochelobe surrounding the lessmassive (donor) star is significantly smaller than the Rochelobe of the more massive, accretor. (As discussed further, below, a lattice having dimensions 194´130´512 appears to be sufficient.) The fluid equations are advanced in time via an explicit integration scheme so an upper limit to the size of each integration time step is set by the familiar Courant condition. At each time step, the Poisson equation is solved implicitly using a combined Fourier transformation and iterative ADI scheme (Cohl, Sun & Tohline 1997) in order to determine, in a selfconsistent fashion, how fluid elements are to be accelerated in response to the fluid's own Newtonian selfgravity. Boundary values for the Poisson equation are determined via a compact cylindrical Green's function technique (Cohl & Tohline 1999). By performing our simulations in a frame of reference that is rotating with an angular velocity W_{0} equal to the initial angular velocity of the binary orbit, we are able to minimize the effects of numerical viscosity (see New & Tohline 1997; Swesty, Wang and Calder 2000), which invariably plays a role in any FDH algorithm.
We have constructed a heterogeneous computing environment through which we are able to routinely create volumerendered images and 3D animation sequences of our evolving fluid configurations. Relatively small amounts of data are regularly passed across the network from the parallel machine that is performing the FDH simulation to a machine that is more suited to visualization tasks; images that will become part of an extended animation sequence are created from this data; then the 3D data sets are immediately destroyed in order to minimize the impact on disk storage. We are utilizing the scriptable attributes of Maya (formerly AliasWavefront) software on SGI systems in order to perform these critical visualization tasks. (The 3D "Movie" sequences referred to throughout this proposal have been produced in this fashion. This visualization tool and the heterogeneous computing environment in which it functions will provide a critical component of the simulations being proposed here.
This is a computationally intensive, multiyear research project that will provide a more complete understanding of tidal and masstransfer instabilities in close binary systems. With the allocation of PACI resources that we are requesting this year, we expect to ascertain whether any n = 3/2 polytropic binary system that is initially unstable toward masstransfer can return to stability before the system merges. As explained above, linear theory predicts that the critical mass ratio is q_{crit} = 2/3. However, simulations that we have completed that have started with a massratio near the value 2/3 have not regained stability; instead, they have each led to progressively higher mass transfer rates and, ultimately, to merger of the binary system. (These simulations each resembled the much higher massratio evolution referenced above as Movie4.) It appears as though these relatively highq simulations do not return to stability for two reasons: First, because the accretion stream directly impacts the accretor, angular momentum carried by the stream material is robbed from the orbit and deposited onto the accretor. Hence, the assumption that orbital angular momentum is conserved during the masstransfer event, which is built into the linear theory prediction, is invalidated. Second, the masstransfer rates grow to such high (nonlinear) levels during the evolution of these dynamically unstable systems that the transfer is difficult to shut off even when q drops to a level below which the binary separation starts to increase.
MOVIE6 q = 0.41 



Quicktime (60 MB) 
ii) Code Development
Through NSF's mediumITR program, we have been awarded funds to significantly expand the capabilities of our existing numerical tools. A primary goal is to be able to couple our fluidflow simulations to the generalrelativistic (GR) Einstein equations to permit us to accurately study accretion in NSNS, NSBH, and WDBH systems. We have made a commitment to incorporate adaptive mesh refinement (AMR) techniques that execute efficiently in parallel computing environments and the ability to transition to general relativistic settings by implementing the general relativistic hydrodynamic equations. We will build upon our own experience, described above, of accurately simulating accretion flows in nonrelativistic, Newtonian gravitational fields, and will draw upon the expertise of our relativity colleagues (who are CoPIs on the NSF/ITR grant) in the implementation of AMR and General Relativity codes.
Perhaps the main problem faced by most general relativistic simulations in three dimensions is the lack of computational resources. Such evolutions require a tremendous amount of storage as well as a huge number of floating point operations. As an example, consider a model to solve the Einstein equations with roughly 100 field variables. To model a black hole of mass M with boundaries at a distance about 100M away with a resolution of some small fraction of M, say M/10, requires roughly 1,000 points per side. The total storage required is then 100 fields × 1000^{3 (dimensions)}× 8 Bytes ~ 750 GB. Such an estimate is quite rough and assumes this is enough for a scheme to be capable of evolving the system in a stable way. Perhaps a much larger number of fields will be necessary; and a numerically stable scheme may ultimately require a much improved resolution. Furthermore, the numerical problem involves a large disparity in scales requiring improved resolution. For example, one must simultaneously resolve the nonlinear distortions around a black hole (small fractions of M) as well as the outgoing gravitational waves ~ 10's of M.
These huge computational demands outstrip the ability of current conventional supercomputers, and, even if they did not, such demands will likely increase at a rate faster than that of computer hardware. Add to such demands the fact that once we have good codes, we have a huge parameter space to explore and the turnaround for tests/experiments becomes a huge burden. Thankfully, two computational methods provide some hope. Adaptive Mesh Refinement: AMR is an algorithm by which improved resolution (over the initial grid) is added where and when needed (e.g., Berger & Oliger 1984). The dynamical creation and destruction of fine subgrids limits the computational work so that it scales with the dynamics one models. Additionally, by breaking the computational problem into pieces – each of which can then be solved separately on different computational nodes – one takes advantage of massively parallel architectures. Achieving efficient distributed AMR requires minimizing the communication between nodes and keeping them busy instead of waiting for information (socalled load balancing). These are nontrivial tasks that we will address as part of our ITR goals. We will draw from the experience that has been gained by other groups in the development of efficient AMR algorithms for a variety of problems – for example, the codes ENZO, CHOMBO, PARAMESH, and CARPET (a thorn of CACTUS) all utilize AMR and we have extensive experience utilizing ENZO for cosmological simulations.
At the same time, we will work on modifying the Newtonian code to incorporate the General Relativistic Hydrodynamic equations. In the first stage this will be done assuming a fixed background spacetime metric (corresponding to a single black hole). This will serve as a preliminary project with our ultimate, multiyear goal being to model accretion flows in fully dynamical spacetime settings. A significant amount of computer time is required to develop and test this implementation. Once this stage is passed we will also incorporate distributed AMR technique into the relativistic FDH code. This segment of our proposed project should be considered developmental work. Nevertheless, a considerable amount of computer time will be required to test the scalability, performance and efficiency of the distributed AMR techniques in a realistic setting (i.e. when dealing with the full equations rather than a toy model). Additionally, the development of general relativistic hydrodynamical codes requires substantial computer time for development and testing.
The development work described in the preceding few paragraphs is being done as we look toward future years of research when we will begin to focus more of our simulation efforts on relativistic systems. At the present time, we are not requesting PACI resources to directly support these development efforts as we anticipate that LSU's large Linux Cluster (SuperMike) will support the development work adequately. Nevertheless, we felt that it was important to include a brief outline of this work in the context of our request for an allocation of PACI resources because the efforts are tightly linked.
3. Preliminary Results and Progress to Date
Over the past eight years, we have been awarded a total of approximately 1,200,000 SUs at the SDSC, UTAustin, and most recently NCSA to study dynamically evolving astrophysical systems in connection with our NSF and NASAsponsored research into the birth and destruction of binary star systems. Our current request for PACI resources will allow us to build directly upon the successful simulation work that has grown out of this earlier, multiyear support. It is important to note, as well, that over the past several years the State of Louisiana has provided generous support to LSU in the broad field of information technology, and through this support LSU has built a 2.2 TeraFlop Linux SuperCluster (called "SuperMike") initially based on Intel's 1.8GHz Xeon DP processors and the Myrinet 2000 interconnect. (See §5, below, for a more complete description of SuperMike.) As we shall show in what follows, in 2002 this Linux SuperCluster generally performed as well as Blue Horizon (the IBM SP3) at the SDSC and, because it was dedicated to the support of LSU reseachers, a larger number of processors were routinely available to us on SuperMike than we could secure on Blue Horizon. So, in 2002 SuperMike became the system of choice for our development and simulation work, and our experience with this system has led us to select NCSA's "tungsten" as our preferred PACI hardware system.
i) Parallel CFD Algorithm
Our initial simulations on the Cray T3E at the SDSC were carried out with a version of our code that had been developed on other parallel platforms and written in highperformancefortran (HPF). Using the available Cray HPF compiler (PGHPF, developed by the Portland Group), we found that relatively few unexpected code modifications were required in order to successfully compile and execute our code on the SDSC system. We discovered, however, that the PGHPF compiler was producing executable code that performed computational instructions on each T3E processor with extraordinarily low efficiency. In 1998, we successfully rewrote our entire FDH algorithm, incorporating explicit message passing instructions via mpi. We found that this new, mpibased code executed four to five times faster than the old HPF version of the code! We also found that the code scaled very well as the number of processors was increased from 4 to 128.
Shortly thereafter, we migrated our mpibased FDH code to the SP3 at SDSC. Table 1 illustrates the level of performance and degree of scalability of the code on the SDSC SP3, as it was configured in 2002. The first column lists the number of processors used in each test; and the numbers at the top of every other column specify each test's grid size as well as the total number of grid cells [in millions of cells]. The numbers in the body of the table specify the amount of time that is required to take one integration timestep in a typical simulation; the values quoted are in units of nanoseconds per grid cell. (The wallclock time required to take one timestep is obtained by multiplying the tabulated number by the number of grid cells; for example, running on 128 SP3 processors, a simulation with a grid resolution of 258^{2}×256 requires 166 ns/cell × 17M cells = 2.8 seconds per timestep.) Listed in this manner, if our code scaled perfectly linearly with problem size, the numbers across a given row would all be the same; and if our code parallelized perfectly, the numbers down each column would drop by exactly a factor of two each time the number of processors is doubled. In both directions, the scaling gets closer to linear as we take fuller advantage of the machine's available memory, that is, as we perform simulations with larger and larger grid sizes.
FDH
Code Timings
on the SDSC SP3 in 2002
(nanoseconds per cell per integration timestep) 

130´66´128
[ 1.1 M cells ] 
130^{2}´128
[ 2.2 M cells ] 
130^{2}´256
[ 4.3 M cells ] 
258´130´256
[ 8.6 M cells ] 
258^{2}´256
[ 17.0 M cells ] 
258^{2}´512
[ 34.0 M cells ] 

32  550  516  570  581     
64  392  315  308  312  306   
128    216  192  169  166  187 
256      150  124  101  100 
In the summer of 2001, we participated in a short workshop held at NCSA that introduced users to an early Linux SuperCluster (512 nodes of 1.0GHz Intel Xeon DP processors). Within a matter of two days, our FDH code was successfully ported to this cluster and obtained reasonably good run times. We were therefore in a position to immediately take advantage of SuperMike, when it was installed at LSU in August of 2002. Table 2 shows the performance that we have achieved on SuperMike for problems of comparable size to the ones we had tested earlier on Blue Horizon at SDSC. The timings and the degree of scaling depicted in Table 2 are amazingly similar to the timings reported in Table 1, especially for simulations requiring large grids. For example, on 128 processors, both Blue Horizon and SuperMike require 170  180 ns/cell; and this number drops to approximately 100 ns/cell when twice as many processors are invoked.
FDH
Code Timings
on LSU's Linux Cluster, SuperMike, in 2003
(nanoseconds per cell per integration timestep) 

66´130´128
[ 1.1 M cells ] 
130´98´128
[ 1.6 M cells ] 
130´98´256
[ 3.3 M cells ] 
162´322´128
[ 6.7 M cells ] 
258´162´256
[ 10.7 M cells ] 
258´162´512
[ 21.4 M cells ] 

32        609      
64  378  349    341  302  326  
128    208    208  179  182  
256          104  106 
In April, 2004, we received an allocation of 400,000 SUs on NCSA's Linux SuperCluster, "tungsten." However, tungsten did not become fully available for production runs until the Fall. Since that time, we have made significant use of its resources, including one very successful weeklong dedicated run on 256 processors in early January, 2005. Table 3 contains the timing information that we have gathered this month (January, 2005) on tungsten. A comparison with data provided in Table 2 shows that, generally speaking, tungsten provides us with a 510% speedup over SuperMike. (The 133 ns/cell/timestep reported in the last row of Table 3 has been calculated from the dedicated production run that contained 12.9 M cells. It is a bit slower than what we would obtain in a standard test run because the production run included a significant amount of data I/O.)
FDH Code
Timings on NCSA's tungsten, in 2005
(nanoseconds per cell per integration timestep) 

130´66´128
[ 1.1 M cells ] 
130^{2}´128
[ 2.2 M cells ] 
130^{2}´256
[ 4.3 M cells ] 
258´130´256
[ 8.6 M cells ] 
258^{2}´256
[ 17.0 M cells ] 
258^{2}´512
[ 34.0 M cells ] 

32    401    571  496   
64    335    352  297   
128    190    158  160   
256        133   
(ii) A Significantly Improved Poisson Solver
It has been clear to us for quite some time that one of the most computationally demanding components of our FDH code is the algorithm used to calculate boundary values for the gravitational potential that must be fed into the Poisson solver in order to be able to solve for the Newtonian gravitational potential throughout the volume of the fluid. Historically, we have used an algorithm based on a multipole expansion in terms of spherical harmonics (Tohline 1980; see also Stone & Norman 1992), but spherical harmonics are not a natural basis set when simulations are being conducted on a cylindrical coordinate mesh. In the late '90s, we discovered that the traditional Green's function expansion in cylindrical coordinates which involves an infinite integral over products of Bessel Functions can be written in a much more compact form (Cohl & Tohline 1999) that is extremely wellsuited for implementation on cylindrical coordinate meshes. A parallel implementation of this "compact cylindrical Green's function" technique is now part of our FDH code. It will be used in all of our future Newtonian simulations because it is both faster and more accurate than the one based on a multipole expansion in terms of spherical harmonics. In addition, its implementation has permitted us to bring the top (and bottom) cylindrical grid boundary down closer to the rotationally flattened mass distribution and reduce the computational lattice size in the vertical direction.
(iii) Establishing a Heterogeneous Computing Environment
As we have described above in §2.c, we have constructed a heterogeneous computing environment through which we are able to routinely create volumerendered images and 3D animation sequences of our evolving fluid configurations. Movies 47, pointed to above, are products of this environment. Initially we developed this heterogeneous computing environment at the SDSC, routinely exchanging data between the Cray T3E and the SGI machines in SDSC's VizLab. We consider the successful establishment of this environment – which smoothly couples highend, SGI visualization tools with our simulation efforts on PACI (and now, LSU) machines – to have been a significant accomplishment.
Our first two years of research utilizing NPACI facilities were focused on the problem of binary star formation. As detailed in the accompanying attachment entitled, "Publications Resulting from NPACI Research," the results of this work have been summarized in a number of different publications, including one feature article in the AprilJune, 1999 issue of NPACI's EnVision Magazine. In the spring of 2000, while on sabbatical at Caltech, the PI collaborated with K. Thorne and L. Lindblom to model the nonlinear growth of an unstable "rmode" in isolated, rotating neutron stars, and to simulate the tidal disruption of a neutron star that is in orbit about a more massive, Kerr black hole. Through this collaboration, the PI gained access to the HP Exemplar at CACR, a partner in the NPACI alliance. Our initial application of the PI's simulation tools (as described here in §2) to the "rmode" problem proved to be very successful (Lindblom, Tohline, & Vallisneri 2001, 2002). The results of this work (and images produced with our visualization tools) were featured in the AprilJune 2001 issue of NPACI & SDSC's EnVision Magazine. For his dissertation work, LSU graduate student Shangli Ou has extended this work to examine the nonlinear development of the secular "barmode" instability in rotating neutron stars (Ou 2004; Ou & Tohline 2004).
Over the past several years, we have turned our primary attention to problems associated with masstransfer in binary star systems during the late stages of stellar evolution, as described in this proposal. Using NPACI facilities both at SDSC and Austin, TX, we invested a considerable amount of effort finetuning the capabilities of our SCF and FDH algorithms so that they would be able to perform the required modeling tasks with a high degree of quantitative accuracy. Motl et al. (2002) describe in detail how well our code presently performs on a variety of benchmark simulations. One of these simulations – a stable, detached binary system with mass ratio q = 0.84 – was evolved through over five orbital periods on a grid having 130 × 98 × 256 zones in R, Z, q respectively. This evolution is illustrated by Movie3, above, which shows how the equatorialplane density distribution of the system changes with time: Only extraordinarily small changes are observed in this stable system. Even after five orbits, the centersofmass of the two stars and of the system as a whole essentially have not moved from their initial positions. And although a careful examination reveals very lowamplitude, epicyclic motion, quantitative measurements show that the orbits of the stars are circular to the level of two parts in 10^{4}. This provides strong confirmation that our SCF code produces excellent initial states and that the hydrodynamical equations are being integrated forward in time in a physically realistic manner.
For the past two years, we have been in a position to study masstransfer instabilities in unequalmass binaries with a degree of quantitative accuracy that heretofore has not been possible. We not only have improved on the simulation capabilities of New & Tohline (1997), but as Motl et al. (2002) detail, the epicyclic amplitude of our binary orbits are an order of magnitude smaller and angular momentum conservation is an order of magnitude better than in the equalmass binary simulations of Swesty, Wang, and Calder (2000). As Motl et al. (2002) predicted, and as our most recent simulations have demonstrated, with our current simulation tools we are able to follow masstransfer events in which the fraction of the donor's mass that is transferred is greater than or on the order of a few × 10^{3} per orbit for at least 20 orbits! We are now in a position to accurately determine masstransfer instability boundaries of the type discussed elsewhere in this project proposal.
MOVIE7 q = 0.5 stream 



Quicktime (8.5 MB) 
In the q = 0.5 evolution, the binary separation decreased for a while but after 20 orbits it appears as though the separation is beginning to increase. In contrast to this, in the q = 0.4 evolution, the binary separation steadily increases throughout the evolution but the masstransfer rate has continued to increase as well. Neither model has reached a phase of evolution where the masstransfer rate is decreasing, so neither model has yet identified the massratio below which stability against masstransfer occurs. (Recall that linear theory predicted stability at any q < 2/3.) This past year we also have begun to follow an evolution that has been the most computationally demanding, to date. It is a binary system that starts with a very low massratio, q = 0.2, and is almost certain to be in the "stable" masstransfer regime. Due to the inherent properties of such a system, however, each time step is significantly shorter than in the q = 0.4 or q = 0.5 model evolutions. (See further discussion, below.) Even after a weeklong, dedicated run on 256 tungsten processors, this model has evolved through only 2.1 orbits. A major focus of our efforts this coming year will be to push this evolution to a point where its stability against masstransfer has been confirmed; this will require evolving the system through approximately 8 additional orbits. In addition, we propose to follow both the q = 0.4 and q = 0.5 models far enough in time so that their mass ratios drop below the critical stability limit.
4. Justification of Resources Requested
i) Required Platforms
To complete the tasks outlined for this project, we are requesting an allocation of 466,000 SUs on "tungsten" (the Xeon Linux SuperCluster) at NCSA (see Table 3 for details). We feel that Tungsten is the best platform on which to conduct production runs this year because we have demonstrated good performance of our simulation code on this machine, and its basic architecture, operating system, and compilers are very similar to the LSU Linux SuperCluster – "SuperMike" – so maintainance of our code will be minimized as we move back and forth between these machines. Tungsten and SuperMike both are Intel Xeon DP processorbased systems with Myrinet 2000 interconnects. It will be an advantage for us to use Tungsten (instead of SuperMike) to carry out our most demanding model simulations primarily for two reasons: (1) Tungsten has slightly faster Intel processors than SuperMike, and tungsten's recent hardware upgrade ensures robustness of all arithmetic operations. (2) Because Tungsten has many more hardware nodes than does SuperMike, we will be much more likely to secure 256512 processors for dedicated runs on Tungsten and therefore complete our most demanding simulations in a reasonable number of days.
ii) Code Requirements and Optimization Issues
Since we have discussed code optimization issues fairly thoroughly in earlier sections of this proposal, we will not comment further on them here. From the information provided above – in §3.a.ii and Table 3, for example – we can estimate how much computing time will be required to make significant progress on this project. On 128 processors, both our q = 0.4 and q = 0.5 models require approximately 160 ns/cell/timestep and both models have been run with a grid resolution of 162´98´256 [4.1 M grid cells]. Hence, each integration timestep requires approximately 0.65 seconds. Since, at this resolution, a single binary orbit for model q = 0.4 (q = 0.5) requires approximately 100,000 (45,000) integration timesteps, extending the q = 0.4 (q = 0.5) evolution through an additional 20 orbits will require approximately 360 (160) wallclock hours, that is, approximately 46,000 (20,000) SUs.
The particular, low "q" binary masstransfer simulation that we need to pursue, however, demands somewhat higher spatial resolution in R and Z, and twice the azimuthal resolution (specifically, we have found that 194´130´512 = 12.9 M grid cells will suffice) and approximately 400,000 time steps per orbit. (More timesteps are required because the Courantlimited timestep is smaller.) Utilizing 256 processors on tungsten, this evolution has required 133 ns/cell/step, that is, 1.7 s/timestep, so each additional orbit will require approximately 190 hours (8 days), or about 50,000 SUs. To evolve this system through 8 additional orbits will therefore require 400,000 SUs. (In order to reduce the number of weeks that are required to complete this model evolution, we may run the model on 512 processors, in which case 400,000 service units may only allow us to complete 6 or 7 additional orbits.)Our largest proposed simulation, at a resolution of 194×130×512 will require approximately 12 GB of RAM. Hence, when spread across 256 or 512 processors, there will be no problem fitting the code into Tungsten's available memory.
iii) Proposed Production Runs
Table 4 summarizes the SU requirements for the three key production simulations that we have just described.
Proposed Simulations on NCSA's Xeon Linux Supercluster (Tungsten)  N_{models}  N_{orbits}  SU  

A.  Extending the q = 0.4 and q = 0.5 Model Evolutions  
Grid resolution of 162×98×256
[4.1 M cells] on
128procs: Each orbit requires 100,000 (45,000) time steps @ 160 ns/cell/timestep ⇒ 2300 (1000) SU per orbit. 
2  20  66,000  
B.  Searching for Critical Initial MassRatio: q = 0.2  
Grid resolution of 194×130×512
[12.9 M cells] on
256procs: Each orbit requires 400,000 time steps @ 133 ns/cell/timestep ⇒ 50,000 SU per orbit. 
1  8  400,000  
Total requirement.  466,000 

v) Special Requirements
Our primary simulation tools (the SCF code and the FDH code) have been developed entirely by our group at LSU and require no special packages other than standard mathematics utilities and messagepassing libraries that are available on NCSA's Linux clusters.
5. Local Computing Environment
In the astrophysics group at LSU, we have access to two, dualprocessor SGI Octanes on which we have installed Maya (TM), the AliasWavefront software that is being used to routinely render 3D images and generate movie sequences from our dynamical simulations, as explained in §2.c. Within the group, we also have acquired a half a dozen Linux PC platforms on which most of our daytoday work is done and through which we gain access to all remote computing facilities.
In 2001, the Louisiana legislature funded a new, statewide information technology initiative through which LSU received $7M in new, recurring funding. For two years, the PI (Tohline) served as interim director of the Center for Applied Information Technology and Learning (LSU CAPITAL; now called "CCT") that has been established at LSU through this initiative. A portion of the first year's funding was used to purchase a 1024processor, Linux SuperCluster, nicknamed "SuperMike": 1.8 GHz Intel P4 Xeon processors; Myrinet 2000 network; 1 GByte RAM per processor (see www.cct.lsu.edu/hpcresources/resources/helixtd.php for more details.) When this hardware system was installed in August, 2002, it was clocked at 2.2 TeraFlops on standard highperformance linpack benchmarks and it was ranked by Top500.org as the 11th fastest supercomputer in the world. As described elsewhere in this proposal, this system has proven to be a superb workhorse for most of our development efforts and some of our simulation efforts over the past couple of years. We will continue to have access to this system over the coming year and expect to perform (lower resolution) simulations as well as conduct a lot of code development on SuperMike that will complement the project described herein.
6. Other Supercomputer Support
Our group does not presently have access to any other supercomputing resources, beyond the PACI and LSU hardware architectures mentioned elsewhere in this proposal.
7. Project Team Qualifications
A major component of the PI's research work over the past twentyfive years has involved the largescale numerical simulation of multidimensional gas flows in astrophysical systems utilizing a variety of different high performance computing platforms. In the early 1980s, the PI held a postdoctoral research position in group T6 at Los Alamos National Laboratories at which time he broadened his exposure to stateoftheart CFD techniques and learned to develop efficient algorithms for vector supercomputers. From the mid1980s through the early 1990s, Tohline was PI on a large account at the Cornell Theory Center during which time he and his students experimented with the parallel implementation of CFD algorithms on the IBM 3090 and various accompanying FPS Array Processors. When an 8Knode MasPar MP1 system arrived at LSU in 1991, the PI abandoned the CTC facilities because his group was having considerable success utilizing the MasPar system for its largest simulation efforts. As has been detailed elsewhere in this proposal, in recent years the PI and his students have gained experience developing gravitational CFD algorithms for execution on a variety of parallel platforms including MasPar's MP2, the CM5, the Cray T3E, the IBM SP2 & SP3, and large Linux Clusters. The group also has developed and implemented a heterogeneous computing environment (at LSU and at the SDSC) that tightly couples its parallel simulation efforts to sophisticated visualization tools.
CoPI J. Frank has broad experience in studies of the astrophysics of interacting binary systems, particularly when such interactions involve fluid flow and accretion events. He is, for example, lead author of a widely referenced technical CUP publication entitled, "Accretion Power in Astrophysics." His expertise within this project helps ensure that the results of the proposed complex multidimensional and nonlinear simulations are physically correct and quantitatively meaningful when placed in the context of the extensive literature on this subject.
CoPI Motl is presently a postdoc in the LSU Astrophysics group. He developed and tested the earliest mpibased version of our primary CFD simulation code while he was a graduate student in our group at LSU. Subsequently, while holding postdoctoral positions at Missouri and CASA (Colorado), he worked extensively with the ENZO code (developed principally by G. Bryan in M. Norman's group) to simulate cosmological fluid flows. He brings to the group broad experience in largescale fluidflow simulations that include AMR techniques.
Mario D'Souza, Xiaomeng Peng, Ilsoon Park, and Wes Even are graduate students enrolled in the Ph.D. physics program at LSU. D'Souza, in his 6^{th} year of graduate work, will be heavily involved in the simulation and code development work outlined in this proposal as his dissertation research focuses on masstransferring binary systems. The three other students have just begun to utilize our simulation tools, but expect to become experienced users by the end of this coming year.