Attachment C1

" MFlops, as Measured by 'pat' "

This performance report was generated using "pat," a performance analysis
tool that accesses the hardware performance monitors on the T3E. Although
"pat" can be used to probe a number of different aspects of an individual
code's performance, here we have used the tool specifically to measure
"FpOps," that is, information pertaining only to floating point operations.

Performance measurements are reported here for the identical code run:

  -- on two separate computing platforms (the T3E_600 at SDSC, and the
     T3E_900 at the NAVO MSRC);
  -- on two different grid sizes: one using power-of-2 arrays (64x32x32)
     and another without power-of-2 arrays (67x34x32);
  -- with two separate compilers (F90 and PGHPF);
  -- at the NAVO MSRC, with "streams = OFF" and with "streams = ON".

-----------------------------------------------------------------------------

SDSC (T3E_600)
--------------
F90 Compiler

64x32x32
--------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     752089.75      18053.32       7.20          5378.23          2.15

67x34x32
--------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     279151.91      20079.69      21.58          3609.80          3.88

-----------------------------------------------------------------------------

SDSC (T3E_600)
--------------
PGHPF Compiler

64x32x32
--------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     317901.78      13553.05      12.79          5080.06          4.79

67x34x32
--------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     346503.19      15080.46      13.06          5693.37          4.93

-----------------------------------------------------------------------------

NAVOCEANO MSRC (T3E_900)
------------------------
F90 Compiler

size = 16x16x128
----------------

Streams off
-----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     227216.36      11638.58      23.05          2690.08          5.33

Streams on
----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     154847.20      11638.58      33.83          2685.32          7.80

size = 64x32x32
---------------

Streams off
-----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0    1049616.75      18053.32       7.74          5309.15          2.28

Streams on
----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     917836.50      18053.32       8.85          5307.61          2.60

size = 67x34x32
---------------

Streams off
-----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     363185.19      20079.69      24.88          3640.89          4.51

Streams on
----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     236645.63      20079.69      38.19          3633.72          6.91

-----------------------------------------------------------------------------

NAVOCEANO MSRC (T3E_900)
------------------------
PGHPF Compiler

size = 64x32x32
---------------

Streams off
-----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     404864.66      13553.05      15.07          4936.16          5.49

Streams on
----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     264287.38      13553.05      23.08          4957.90          8.44

size = 67x34x32
---------------

Streams off
-----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     440551.66      15080.46      15.41          5523.72          5.64

Streams on
----------
Performance counters for FpOps
Values given are in MILLIONS.

PE        cycles    operations    ops/sec    dcache misses    misses/sec
 0     291136.25      15080.46      23.31          5536.29          8.56

-----------------------------------------------------------------------------
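
The per-second figures reported above appear to follow from the raw counters: elapsed time is the cycle count divided by the processor clock rate, and each rate is the corresponding count divided by that time. The Python sketch below reproduces this arithmetic for a few of the reported rows. The clock rates (300 MHz for the T3E_600, 450 MHz for the T3E_900) are assumptions that are not stated in this report, although they match the reported rates to within rounding; the function names are hypothetical and are not part of "pat."

```python
# Assumed clock rates in MHz (not stated in the report itself).
CLOCK_MHZ = {"T3E_600": 300.0, "T3E_900": 450.0}


def rate_per_sec(count_millions, cycles_millions, clock_mhz):
    """Return millions of events per second.

    Both counters are in millions, as in the report, so
    cycles_millions / clock_mhz is the elapsed time in seconds.
    """
    seconds = cycles_millions / clock_mhz
    return count_millions / seconds


def streams_speedup(cycles_off, cycles_on):
    """Ratio of cycle counts for the same run with streams off vs. on."""
    return cycles_off / cycles_on


if __name__ == "__main__":
    # SDSC, F90, 64x32x32: cycles, operations, dcache misses (millions).
    cycles, ops, misses = 752089.75, 18053.32, 5378.23
    clock = CLOCK_MHZ["T3E_600"]
    print(f"MFlops     = {rate_per_sec(ops, cycles, clock):.2f}")
    print(f"misses/sec = {rate_per_sec(misses, cycles, clock):.2f}")

    # NAVO MSRC, F90, 67x34x32: cycle counts with streams off and on.
    print(f"speedup    = {streams_speedup(363185.19, 236645.63):.2f}")
```

Because the operation count is identical with streams off and on for a given compiler and grid size, the ratio of cycle counts is also the ratio of the reported ops/sec values.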