GCC Newspaper
JUNE 15, 2026
gcc Performance Win

x86_64: Improves Cost Estimation for SSE 128-bit Rotations

Corrects the cost estimation for 128-bit rotations using SSE instructions on x86_64, fixing a performance regression in code generation.

The cost estimate for 128-bit rotations using SSE instructions on x86_64 has been corrected. The previous estimate was too low, which led to suboptimal code generation and caused a test case (gcc.target/i386/rotate-2.c) to fail when compiled with -march=cascadelake. The corrected cost reflects the number of instructions the rotation actually requires, resolving the performance regression.

In Details

This commit modifies compute_convert_gain in config/i386/i386-features.cc to correct the cost of 128-bit rotations using SSE. The previous cost, COSTS_N_INSNS(1), was likely a typo; it is now COSTS_N_INSNS(4), which reflects the 4-5 instructions actually required (shuffles, shifts, and an OR). The change fixes a regression introduced by recent improvements to the STV (scalar-to-vector) pass.
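To make the effect of the cost change concrete, here is a minimal sketch of a gain comparison in C. Only the COSTS_N_INSNS macro mirrors GCC (rtl.h defines it as N times 4). The helpers convert_gain and should_convert and the constant ASSUMED_SCALAR_ROTATE_COST are hypothetical names invented for this illustration, and the scalar cost of two instructions is an assumed value, not a figure from the commit. The real compute_convert_gain accumulates gains over a whole chain of instructions and is considerably more involved.

```c
/* Mirrors the definition in GCC's rtl.h: costs are scaled so that
   one instruction costs 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, simplified stand-in for a scalar-to-vector gain check:
   a positive gain means the vector form is estimated to be cheaper. */
static int convert_gain(int scalar_cost, int vector_cost)
{
    return scalar_cost - vector_cost;
}

/* Convert only when the estimated gain is strictly positive. */
static int should_convert(int scalar_cost, int vector_cost)
{
    return convert_gain(scalar_cost, vector_cost) > 0;
}

/* ASSUMPTION for illustration only: the scalar rotation is costed
   at two instructions. This number does not come from the commit. */
enum { ASSUMED_SCALAR_ROTATE_COST = COSTS_N_INSNS(2) };
```

Under that assumed scalar cost, should_convert(ASSUMED_SCALAR_ROTATE_COST, COSTS_N_INSNS(1)) reports a gain, so the rotation would be vectorized, while should_convert(ASSUMED_SCALAR_ROTATE_COST, COSTS_N_INSNS(4)) does not. This shows how an underestimated vector cost can tip the decision toward a slower sequence.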

For Context

This commit addresses a performance issue in how GCC generates code for x86 processors using SSE (Streaming SIMD Extensions) instructions. SSE allows the processor to perform the same operation on multiple data elements simultaneously. The change corrects the estimated cost of rotating 128-bit values using SSE, so that GCC makes better decisions about when to use these instructions. The result is faster code, particularly on newer microarchitectures.
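As a rough illustration of why a 128-bit rotation costs several SSE instructions, the sketch below rotates a 128-bit value left using SSE2 intrinsics: one shuffle, two shifts, and one OR. The function names (rotl128_sse2, rotl128_scalar, to_vec, from_vec) are hypothetical, the code assumes an x86_64 target and a compiler with the unsigned __int128 extension (GCC or Clang), and the rotation count is restricted to 0 < n < 64. It is not necessarily the sequence GCC itself emits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate the 128-bit value held in x left by n bits, for 0 < n < 64.
   Each 64-bit half is shifted left, and the bits that fall out of one
   half are taken from the other half via a swap and a right shift. */
static __m128i rotl128_sse2(__m128i x, int n)
{
    __m128i left    = _mm_cvtsi32_si128(n);       /* shift count in a register */
    __m128i right   = _mm_cvtsi32_si128(64 - n);
    __m128i swapped = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)); /* swap halves */
    __m128i shl     = _mm_sll_epi64(x, left);        /* per-half shift left  */
    __m128i shr     = _mm_srl_epi64(swapped, right); /* per-half shift right */
    return _mm_or_si128(shl, shr);                   /* combine */
}

/* Scalar reference using the compiler's 128-bit integer extension;
   valid for 0 < n < 128. */
static unsigned __int128 rotl128_scalar(unsigned __int128 x, int n)
{
    return (x << n) | (x >> (128 - n));
}

/* Helpers to move between unsigned __int128 and __m128i. */
static __m128i to_vec(unsigned __int128 v)
{
    return _mm_set_epi64x((long long)(uint64_t)(v >> 64),
                          (long long)(uint64_t)v);
}

static unsigned __int128 from_vec(__m128i v)
{
    uint64_t lo = (uint64_t)_mm_cvtsi128_si64(v);
    uint64_t hi = (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
    return ((unsigned __int128)hi << 64) | lo;
}
```

Counting the shuffle, the two shifts, and the OR gives four instructions before any setup of the shift counts, which is consistent with the 4-5 instruction figure quoted for the corrected cost.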

Filed Under: x86_64, sse, optimization, code generation