Codestitcher: Inter-Procedural Basic Block Layout Optimization
Modern software executes a large amount of code. Previous techniques of code layout optimization were developed one or two decades ago and have become inadequate to cope with the scale and complexity of new types of applications such as compilers, browsers, interpreters, language VMs and shared libraries. This paper presents Codestitcher, an inter-procedural basic block code layout optimizer which reorders basic blocks in an executable to benefit from better cache and TLB performance. Codestitcher provides a hierarchical framework which can be used to improve locality in various layers of the memory hierarchy. Our evaluation shows that Codestitcher improves the performance of the original program by 3% to 25% (on average, by 10%) on 5 widely used applications with large code sizes: MySQL, Clang, Firefox, Apache, and Python. It gives an additional improvement of 4% over LLVM's PGO and 3% over PGO combined with the best function reordering technique.
READ FULL TEXT