i386: Disable gather optimizations for Diamond Rapids

When tuning for Diamond Rapids, GCC no longer uses hardware gather instructions for 2- and 4-element vectors. It emulates these gathers instead, which improves pipeline utilization.

By liuhongt, committed May 26, 2026

On the Diamond Rapids (DMR) architecture, gather emulation achieves optimal pipeline utilization and parallelism for 2- and 4-element vectors. This commit therefore disables the use_gather_2parts and use_gather_4parts tunings for DMR. As a result, GCC emulates these small gathers instead of emitting a single hardware gather instruction, which should improve performance on DMR.
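To make "gather emulation" concrete, here is a minimal sketch in plain C. It uses the GCC/Clang vector extension (`vector_size`). Lane i of the result is `base[idx[i]]`. Each element is fetched with an ordinary scalar load, and the four values are then assembled into one vector. A hardware gather would do the same work in a single instruction. The function name `emulated_gather4` and its bounds-check parameter are hypothetical and exist only for this illustration. This is not the exact instruction sequence GCC emits.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang vector extension: four 32-bit ints in one 128-bit vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: emulated 4-element gather.
   Lane i of the result is base[idx[i]], fetched with a scalar load;
   the four loaded values are then combined into one vector. */
static v4si emulated_gather4(const int *base, size_t len, v4si idx) {
    /* Debug-only bounds check for this sketch; a hardware gather
       performs no such check. */
    for (int i = 0; i < 4; i++)
        assert(idx[i] >= 0 && (size_t)idx[i] < len);

    /* Four independent scalar loads, then vector assembly. */
    v4si r = { base[idx[0]], base[idx[1]], base[idx[2]], base[idx[3]] };
    return r;
}
```

The four loads are independent of one another, so an out-of-order core can issue them in parallel. That is the kind of parallelism the commit message refers to.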

In Detail

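This commit modifies x86-tune.def to disable X86_TUNE_USE_GATHER_2PARTS and X86_TUNE_USE_GATHER_4PARTS for m_DIAMONDRAPIDS. Each entry in x86-tune.def turns a code generation strategy on or off for a set of x86 microarchitectures. These two entries control whether the vectorizer uses a hardware gather instruction for vectors with two or four elements. When they are disabled, GCC emulates the gather with scalar loads whose results are combined into a vector. X86_TUNE_USE_GATHER_8PARTS, which covers vectors with eight or more elements, is not changed by this commit.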
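A loop with an indexed load is the typical source of a gather in vectorized code. The function below is a hypothetical example, not taken from the commit. With `double` elements and 256-bit vectors, each vector holds four elements, so a vectorized version of this loop would need a 4-element gather. Whether GCC actually vectorizes the loop, and at which vector width, depends on the compiler version and options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: indirect load out[i] = a[idx[i]].
   If GCC vectorizes this loop, each vector of out[] needs a gather:
   either a hardware gather instruction or an emulated sequence of
   scalar loads, depending on the active tuning flags. */
void gather_double(double *restrict out, const double *restrict a,
                   const int *restrict idx, size_t n) {
    /* Precondition check for this sketch only. */
    assert(n == 0 || (out != NULL && a != NULL && idx != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = a[idx[i]];
}
```

To compare the two strategies, compile with `gcc -O3 -march=diamondrapids -S`, using a GCC release that supports that `-march` value. Then add `-mtune-ctrl=use_gather_2parts,use_gather_4parts` to force the tunings back on, and check whether a `vgatherdpd` instruction appears in the assembly. Note that `-mtune-ctrl` is documented as intended for GCC developers.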

For Context

GCC can generate different code sequences depending on the x86 processor it is tuning for. The target processor is selected with -mtune or implied by -march. These CPU-specific choices are controlled by tuning definitions. For Diamond Rapids, this commit disables use_gather_2parts and use_gather_4parts, because on this microarchitecture the emulated sequence makes better use of the pipeline than a hardware gather for small vectors.
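The following is a minimal model of how per-processor tuning masks can be resolved. It assumes one bit per processor, which is similar in spirit to the m_* masks in x86-tune.def. All names and mask values here (`PROC_EXAMPLE`, `tune_mask`, `resolve_tunes`) are hypothetical and are not GCC's actual data. A tuning flag is on for the selected processor when that processor's bit is set in the flag's mask. Excluding Diamond Rapids from the mask, as this commit does for the two gather tunings, turns the flag off only for that processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical processor IDs for this model only. */
enum proc_id { PROC_EXAMPLE, PROC_DIAMONDRAPIDS, PROC_COUNT };

/* One bit per processor. */
#define M_PROC(p)       (UINT64_C(1) << (p))
#define M_DIAMONDRAPIDS M_PROC(PROC_DIAMONDRAPIDS)
#define M_ALL           (M_PROC(PROC_COUNT) - 1)

enum tune_id { TUNE_USE_GATHER_2PARTS, TUNE_USE_GATHER_4PARTS, TUNE_COUNT };

/* Illustrative masks: each flag is enabled for every modeled processor
   except Diamond Rapids. */
static const uint64_t tune_mask[TUNE_COUNT] = {
    [TUNE_USE_GATHER_2PARTS] = M_ALL & ~M_DIAMONDRAPIDS,
    [TUNE_USE_GATHER_4PARTS] = M_ALL & ~M_DIAMONDRAPIDS,
};

/* Resolve every tuning flag for the processor being tuned for. */
static void resolve_tunes(enum proc_id tune, bool out[TUNE_COUNT]) {
    assert((unsigned)tune < PROC_COUNT);
    uint64_t self = M_PROC(tune);
    for (int i = 0; i < TUNE_COUNT; i++)
        out[i] = (tune_mask[i] & self) != 0;
}
```

A bitmask per flag keeps every processor's setting for a strategy on one line. That makes a change like this commit a small, local edit.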

Filed Under: i386, optimization, diamond rapids, gather
