Allow Single-Lane SLP Fallback When Limit Is Exhausted
GCC now allows single-lane SLP vectorization to proceed even when the multi-lane SLP discovery limit is exhausted, fixing missed optimizations in loops with co…
GCC’s SLP vectorizer now attempts single-lane vectorization even after the SLP discovery limit is reached. Previously, the limit check blocked SLP discovery entirely, including the single-lane fallback, which caused missed optimizations in code with multiple independent conditional reductions. On 731.astcenc_r built with -Ofast, the change improves performance by 3.8% on EMR and 1.4% on Znver5.
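To make "multiple independent conditional reductions" concrete, here is a minimal sketch of the loop shape in question. The function and type names are hypothetical, and this is not code from astcenc or from the GCC testsuite; whether this exact loop exhausts the discovery limit depends on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical example: three accumulators, each updated under its own
   condition and none depending on another.  Each one is an independent
   conditional reduction. */
struct sums {
    float pos;   /* sum of the positive elements */
    float neg;   /* sum of the negative elements */
    int   big;   /* count of elements above the threshold */
};

struct sums conditional_sums(const float *a, size_t n, float threshold)
{
    float pos = 0.0f, neg = 0.0f;
    int big = 0;

    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0.0f)
            pos += a[i];        /* reduction 1 */
        if (a[i] < 0.0f)
            neg += a[i];        /* reduction 2 */
        if (a[i] > threshold)
            big++;              /* reduction 3 */
    }

    struct sums s = { pos, neg, big };
    return s;
}
```

Floating-point reductions like `pos` and `neg` are generally only vectorized when reassociation is permitted, which is one reason the reported numbers are for -Ofast.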
In Details
SLP (superword-level parallelism) vectorization groups independent scalar operations of the same kind into vector operations. Previously, vect_analyze_slp_reduction bailed out early once the SLP discovery limit was exhausted, which also blocked single-lane SLP. The fix moves the limit check so that it guards only the chain analysis, since single-lane trees do not cause exponential growth in the discovery search. The single-lane fallback path is a nuance that toolchain developers working outside the SLP code may not be aware of.
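The change in control flow can be sketched with a toy model. None of the names below exist in GCC, and the real vect_analyze_slp_reduction is considerably more involved; the model also assumes, for simplicity, that the single-lane attempt always succeeds. It only shows where the limit check sits before and after the fix.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical and do not appear in GCC. */
enum slp_result { SLP_NONE, SLP_CHAIN, SLP_SINGLE_LANE };

/* Before the fix: an exhausted limit aborts the whole analysis, so the
   single-lane fallback is never reached. */
enum slp_result analyze_reduction_old(unsigned limit, bool chain_ok)
{
    if (limit == 0)
        return SLP_NONE;            /* early bail-out blocks everything */
    if (chain_ok)
        return SLP_CHAIN;           /* multi-lane chain discovery */
    return SLP_SINGLE_LANE;         /* fallback */
}

/* After the fix: the limit guards only the chain analysis, so the
   single-lane fallback is still attempted when the limit is exhausted. */
enum slp_result analyze_reduction_new(unsigned limit, bool chain_ok)
{
    if (limit > 0 && chain_ok)
        return SLP_CHAIN;
    return SLP_SINGLE_LANE;         /* assumed to succeed in this model */
}
```

With `limit == 0`, the old model returns `SLP_NONE` while the new one returns `SLP_SINGLE_LANE`; for a non-zero limit the two behave identically.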
For Context
SLP vectorization in GCC identifies opportunities to perform the same operation on multiple data elements at once using SIMD instructions, for example replacing four float additions with a single vector addition. The vectorizer limits how much it will search during SLP discovery. Before this commit, once that limit was reached it gave up even when a simpler, single-lane vectorization was possible. With this change, single-lane vectorization proceeds even after the limit is reached, which recovers the missed optimization.
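The "four float additions become one vector addition" idea can be written out by hand as a rough illustration. The sketch below uses the `vector_size` attribute, a GCC extension that Clang also supports; the function names are hypothetical, and the second function only approximates what the vectorizer produces automatically from the first.

```c
#include <string.h>

/* GCC vector extension: a vector of four floats (16 bytes). */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar form: four independent additions of the same kind. */
void add4_scalar(const float *a, const float *b, float *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}

/* Vector form: the same work expressed as a single vector addition.
   memcpy is used for the loads and stores so that no alignment is
   assumed for the input and output pointers. */
void add4_vector(const float *a, const float *b, float *out)
{
    v4sf va, vb, vc;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vc = va + vb;                   /* one element-wise vector add */
    memcpy(out, &vc, sizeof vc);
}
```

Both functions compute the same four sums; the second makes the grouping explicit.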