Attachment C1

"MFlops, as Measured by 'pat'"

This performance report was generated using "pat," a performance analysis tool that accesses the hardware performance monitors on the T3E. Although "pat" can be used to probe a number of different aspects of an individual code's performance, here we have used the tool specifically to measure "FpOps," that is, information pertaining only to floating point operations. Given that the implementation of "pat" on the T3E-600 at SDSC has been reported to sometimes give erroneous results, we have also performed our experiments on another T3E. Performance measurements are reported here for the fluid dynamics code only, run on a single processor and compiled with either the Portland Group's HPF compiler or Cray's f90 compiler.

We examined the impact of the following factors on our single processor performance:

-- on two separate computing platforms (the T3E-600 at SDSC, and the T3E-900 at the NAVO MSRC);
-- on two different grid sizes: one using power-of-2 arrays (64x32x32) and another without power-of-2 arrays (67x34x32);
-- with the exception of the HPF code at SDSC, we performed experiments with both "streams = OFF" and "streams = ON";
-- for the f90 code, we examined the speedup gained by using 32 bit words as the default size, thus effectively doubling the cache size.

-----------------------------------------------------------------------------

SDSC (T3E_600)
--------------

PGHPF Compiler

size = 64x32x32
----------------
streams OFF
----------------
-O3
----------------
64 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE        cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0         317901.78    13553.05      12.79      5080.06    4.79

size = 67x34x32
----------------
streams OFF
----------------
-O3
----------------
64 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE        cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0         346503.19    15080.46      13.06      5693.37    4.93

Cray f90 Compiler

size = 64x32x32
--------------------
streams ON
--------------------
-O3,aggress -lmfastv
--------------------
64 bit word size
--------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     147062.13    7649.54       15.61      2513.46    5.13

size = 67x34x32
---------------------
streams ON
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     70995.41     8540.94       36.09      1851.38    7.82

size = 64x32x32
---------------------
streams OFF
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     147065.81    7649.54       15.61      2513.26    5.13

size = 67x34x32
---------------------
streams OFF
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     103220.18    8540.94       24.83      1839.07    5.35

size = 64x32x32
----------------
streams ON
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     104569.15    7574.19       21.73      2167.01    6.22

size = 67x34x32
----------------
streams ON
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     58931.29     8457.61       43.06      1331.20    6.78

size = 64x32x32
----------------
streams OFF
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     120580.14    7574.19       18.85      2172.66    5.41

size = 67x34x32
----------------
streams OFF
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     74439.10     8457.61       34.09      1334.34    5.38

----------------------------------------------------------------------------------

NAVOCEANO MSRC (T3E_900)
------------------------

PGHPF Compiler

size = 64x32x32
----------------
streams OFF
----------------
-O3
----------------
64 bit word size
----------------

Performance counters for FpOps
PE        cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0         404864.66    13553.05      15.07      4936.16    5.49

size = 64x32x32
----------------
streams ON
----------------
-O3
----------------
64 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE        cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0         264287.38    13553.05      23.08      4957.90    8.44

size = 67x34x32
----------------
streams OFF
----------------
-O3
----------------
64 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE        cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0         440551.66    15080.46      15.41      5523.72    5.64

size = 67x34x32
----------------
streams ON
----------------
-O3
----------------
64 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE        cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0         291136.25    15080.46      23.31      5536.29    8.56

Cray f90 Compiler

size = 64x32x32
---------------------
streams ON
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     144570.08    7652.70       23.82      2390.66    7.44

size = 67x34x32
---------------------
streams ON
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     81835.32     8543.85       46.99      1762.50    9.69

size = 64x32x32
---------------------
streams OFF
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     197358.50    7652.70       17.45      2399.36    5.47

size = 67x34x32
---------------------
streams OFF
---------------------
-O3, aggress -lmfastv
---------------------
64 bit word size
---------------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     137073.91    8543.85       28.05      1742.07    5.72

size = 64x32x32
----------------
streams ON
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     130190.28    7577.74       26.19      2142.20    7.41

size = 67x34x32
----------------
streams ON
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     64241.70     8460.40       59.27      1309.46    9.17

size = 64x32x32
----------------
streams OFF
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     155566.05    7577.74       21.92      2129.07    6.16

size = 67x34x32
----------------
streams OFF
----------------
-O3, aggress
----------------
32 bit word size
----------------

Performance counters for FpOps
Values given are in MILLIONS.
PE(id)    cycles       operations    ops/sec    dcache     misses/sec
                                                misses
0( 0)     91529.52     8460.40       41.60      1305.65    6.42
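
As a cross-check on the tables above, the short Python sketch below shows how the "ops/sec" and "misses/sec" columns appear to follow from the raw counters: the cycle count divided by the processor clock gives the elapsed time, and each counter divided by that time gives a rate. The clock rates (300 MHz for the T3E-600 and 450 MHz for the T3E-900) are an assumption, since they are not stated in this report, and the function and variable names are hypothetical. Under that assumption the recomputed rates seem to agree with the reported ones to within about 0.01.

```python
# Assumed clock rates in MHz; these are NOT stated in the report.
CLOCK_MHZ = {"T3E-600": 300.0, "T3E-900": 450.0}


def rates(machine, cycles_m, ops_m, misses_m):
    """Return (MFlops, millions of dcache misses per second).

    All three counters are given in millions, as in the 'pat' output.
    """
    # millions of cycles / MHz = elapsed seconds
    seconds = cycles_m / CLOCK_MHZ[machine]
    return ops_m / seconds, misses_m / seconds


# SDSC, PGHPF, 64x32x32, streams OFF, 64 bit (reported: 12.79 and 4.79)
mflops, miss_rate = rates("T3E-600", 317901.78, 13553.05, 5080.06)
print(f"SDSC PGHPF 64x32x32: {mflops:.2f} MFlops, {miss_rate:.2f} M misses/sec")

# NAVO, Cray f90, 67x34x32, streams ON, 32 bit (reported: 59.27 and 9.17)
mflops32, miss32 = rates("T3E-900", 64241.70, 8460.40, 1309.46)
print(f"NAVO f90 67x34x32:   {mflops32:.2f} MFlops, {miss32:.2f} M misses/sec")

# Speedup from streams on the same machine is just the ratio of cycle counts.
# NAVO, Cray f90, 67x34x32, 64 bit: streams OFF cycles / streams ON cycles.
streams_speedup = 137073.91 / 81835.32
print(f"streams ON speedup:  {streams_speedup:.3f}x")
```

Because the operation counts are nearly identical between the streams ON and streams OFF runs of a given case, the ratio of cycle counts is a reasonable measure of the speedup for that case.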