Improving the scalabiliy of neutron cross-section lookup codes on multicore NUMA system
We use the XSBench proxy application, a memory-intensive OpenMP program, to explore the source of on-node scalability degradation of a popular Monte Carlo (MC) reactor physics benchmark on non-uniform memory access (NUMA) systems. As background, we present the details of XSBench, a performance abstraction "proxy app" for the full MC simulation, as well as the internal design of the Linux kernel. We explain how the physical memory allocation inside the kernel affects the multicore scalability of XSBench. On a sixteen-core, two-socket NUMA testbed, the scaling efficiency is improved from a nonoptimized 70 optimized 95 nonoptimized version. In addition to the NUMA optimization we evaluate a page-size optimization to XSBench and observe a 1.5x performance improvement, compared with a nonoptimized one.
READ FULL TEXT