The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V

by   Christopher Celio, et al.

This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals. We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was designed as a very small ISA to support a wide range of implementations, and has a less mature compiler toolchain. However, we observe that on SPEC CINT2006 RV64G executes on average 16 than IA-32, 9 ARMv7. CISC x86 implementations break up complex instructions into smaller internal RISC-like micro-ops, and the RV64G instruction count is within 2 retired micro-op count. RV64GC, the compressed variant of RV64G, is the densest ISA studied, fetching 8 observed that much of the increased RISC-V instruction count is due to a small set of common multi-instruction idioms. Exploiting this fact, the RV64G and RV64GC effective instruction count can be reduced by 5.4 compressed RISC-V ISA extension with macro-op fusion provides both the densest ISA and the fewest dynamic operations retired per program, reducing the motivation to add more instructions to the ISA. This approach retains a single simple ISA suitable for both low-end and high-end implementations, where high-end implementations can boost performance through microarchitectural techniques.


Extending the RISC-V ISA for exploring advanced reconfigurable SIMD instructions

This paper presents a novel, non-standard set of vector instruction type...

Revec: Program Rejuvenation through Revectorization

Modern microprocessors are equipped with Single Instruction Multiple Dat...

InstructEval: Systematic Evaluation of Instruction Selection Methods

In-context learning (ICL) performs tasks by prompting a large language m...

Variable Instruction Fetch Rate to Reduce Control Dependent Penalties

In order to overcome the branch execution penalties of hard-to-predict i...

Realize special instructions on clustering VLIW DSP: multiplication-accumulation instruction

BWDSP is a 32bit static scalar digital signal processor with VLIW and SI...

Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures

We investigate fundamental decisions in the design of instruction set ar...

Puppeteer: A Random Forest-based Manager for Hardware Prefetchers across the Memory Hierarchy

Over the years, processor throughput has steadily increased. However, th...

Please sign up or login with your details

Forgot password? Click here to reset