Enable AVX-512 epilogue optimisations for ZNVER6.

The compiler now applies its AVX-512 loop-epilogue tunings when targeting AMD znver6 CPUs, so the iterations left over after a 512-bit vectorised loop can also be handled with vector code.

By Richard Biener, committed April 28, 2026

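The AVX-512 masked-epilogue and two-epilogue tunings were enabled for the upcoming AMD znver6 core. Both are vectoriser tunings: they control how the compiler handles the remainder iterations that are left over after the main 512-bit vector loop, either with additional, narrower vector epilogues or with a masked 512-bit epilogue. "Epilogue" here means the loop epilogue produced by the vectoriser, not the function epilogue. For loops whose trip count is short or not a multiple of the vector width, this can reduce the time spent in scalar remainder code.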

In Details

The znver6 tuning now has the avx512_two_epilogues and avx512_masked_epilogues flags set. These are x86 tuning flags, available inside the compiler as TARGET_AVX512_TWO_EPILOGUES and TARGET_AVX512_MASKED_EPILOGUES. They are consulted by the x86 vectoriser cost model when it decides how to vectorise the epilogue of a loop that was vectorised with 512-bit vectors: whether to use two vector epilogues and whether to use a masked vector epilogue. They do not change function prologue or epilogue code generation and have nothing to do with zeroing registers on return.

For Context

Modern CPUs have special instructions that perform several calculations at once (SIMD). These instructions use wide registers that hold multiple values, and AVX-512 is the x86 SIMD extension with 512-bit registers. When the compiler vectorises a loop with AVX-512, the main loop processes a fixed number of elements per iteration, for example 16 single-precision floats. If the number of iterations is not a multiple of that, some elements are left over and are handled by a loop epilogue. The epilogue can be plain scalar code, one or more loops using narrower vectors, or a single 512-bit step in which a mask disables the unused lanes. This commit tells the compiler that the two-epilogue and masked-epilogue strategies are worthwhile on znver6, as they already are on earlier AMD cores that support AVX-512.
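In vectoriser terms, an epilogue is the code that handles the iterations left over after the main vector loop. The two strategies can be sketched in plain C, with no intrinsics so it builds anywhere. This is only an illustration of the loop shapes, not what GCC emits: the function names, the `add_chunk` helper, and the 16/8/4 widths (512-, 256- and 128-bit vectors of floats) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds `width` consecutive floats, standing in for
   a single vector instruction of that width. */
static void add_chunk(float *dst, const float *a, const float *b, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = a[i] + b[i];
}

/* Shape of a 512-bit loop followed by two narrower vector epilogues.
   Returns how many elements were left for the scalar tail. */
size_t add_two_epilogues(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: 16 floats per iteration (one 512-bit vector). */
    for (; i + 16 <= n; i += 16)
        add_chunk(dst + i, a + i, b + i, 16);

    /* First epilogue: 8 floats (256-bit), runs at most once. */
    if (i + 8 <= n) {
        add_chunk(dst + i, a + i, b + i, 8);
        i += 8;
    }

    /* Second epilogue: 4 floats (128-bit), runs at most once. */
    if (i + 4 <= n) {
        add_chunk(dst + i, a + i, b + i, 4);
        i += 4;
    }

    /* Scalar tail: at most 3 elements remain. */
    size_t scalar = n - i;
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
    return scalar;
}

/* Shape of a 512-bit loop followed by one masked 16-lane step. */
void add_masked_epilogue(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    for (; i + 16 <= n; i += 16)
        add_chunk(dst + i, a + i, b + i, 16);

    size_t remaining = n - i;
    assert(remaining < 16);
    if (remaining > 0) {
        /* One bit per lane; only the low `remaining` lanes are active. */
        unsigned mask = (1u << remaining) - 1u;
        for (size_t lane = 0; lane < 16; lane++)
            if (mask & (1u << lane))
                dst[i + lane] = a[i + lane] + b[i + lane];
    }
}
```

With n = 31, the first variant runs one 16-wide iteration, one 8-wide epilogue and one 4-wide epilogue, leaving 3 elements for scalar code. The second variant runs one 16-wide iteration and then a single masked step covering the remaining 15 elements.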

Filed Under: gcc, x86, avx-512, optimization
