SLP vectorizer prioritizes operand swaps over scalar fallback

The SLP vectorizer in GCC now prioritizes attempting operand swaps during child discovery to prevent premature scalar fallback and improve vectorization succes…

By Zhongyao Chen, committed June 3, 2026

GCC’s SLP vectorizer now includes an improved strategy for handling failed child operand discovery. Previously, failures could lead to an immediate fallback to scalar operations, even if swapping operands might have resolved the issue. The compiler now tracks the distance to potential swap opportunities, ensuring that a retry with swapped operands is attempted before resorting to less efficient scalar code generation. This change enhances the vectorizer’s ability to optimize commutative operations within basic blocks, potentially leading to more vectorized code.

In Details

This commit refines the Superword Level Parallelism (SLP) vectorizer in tree-vect-slp.cc, specifically vect_build_slp_tree_2. While building operand zero of a commutative basic-block (BB) SLP node, a failed child discovery could trigger a scalar fallback too early, bypassing a fix that an operand swap would have provided. The new least_upthread_swappable_op_distance mechanism tracks how close the nearest operand swap opportunity is. By only allowing scalar fallback when this distance is greater than one, the vectorizer ensures that an operand swap retry is attempted first,…
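The gating rule described above can be sketched as a toy decision function. This is not GCC code: the enum, the function name, and the reading of the distance (1 meaning the immediate parent can still retry with swapped operands) are assumptions made for illustration. Only the "greater than one" threshold comes from the commit description.

```c
/* Toy model of the fallback gate; all names here are hypothetical. */
enum child_failure_action {
    PROPAGATE_FOR_SWAP_RETRY,  /* report failure upward so a swap can be tried */
    BUILD_FROM_SCALARS         /* give up on this child and use scalar operands */
};

/* swappable_op_distance: assumed to count how many levels up the nearest
   commutative node with an untried operand swap sits (1 = direct parent). */
enum child_failure_action
on_child_discovery_failure(unsigned swappable_op_distance)
{
    /* Scalar fallback is allowed only when the swap opportunity is
       more than one level away. */
    if (swappable_op_distance > 1)
        return BUILD_FROM_SCALARS;
    return PROPAGATE_FOR_SWAP_RETRY;
}
```

In this sketch a distance of 1 always yields a retry path, which mirrors the stated goal that a swap is attempted before scalar code is generated.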

For Context

Compilers often try to optimize code by using a technique called "vectorization." This involves taking operations that happen on individual data elements (scalars) and converting them into operations that work on multiple data elements simultaneously (vectors), which modern CPUs can execute much faster. One particular vectorization strategy is Superword Level Parallelism (SLP), which looks for groups of independent scalar operations that can be combined into a single vector instruction. This commit improves how GCC's SLP vectorizer handles situations where it's trying to combine operations but initially fails to find a suitable match for a part of the operation. Previously, it might have given up too quickly and just used slower, individual operations. Now, it's smarter: if it recognizes that simply swapping the order of the inputs to an operation could fix the problem, it will try that swap before resorting to the slower method. This means GCC can potentially vectorize more of your c…
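As a rough illustration of the kind of code where operand order matters, consider two adjacent stores whose additions list their operands in opposite orders. The function names below are hypothetical, and whether a given GCC version vectorizes either form depends on the target and flags; the sketch only shows why a swap makes the two lanes line up.

```c
/* Lane 1 lists its '+' operands in the opposite order from lane 0, so
   operand 0 is a load in one lane and a multiply in the other. */
void store_pair_as_written(int *a, const int *b, const int *c, int x, int y)
{
    a[0] = b[0] + x * c[0];   /* operand 0: load,     operand 1: multiply */
    a[1] = y * c[1] + b[1];   /* operand 0: multiply, operand 1: load     */
}

/* The same computation with lane 1's operands swapped by hand. Now
   operand 0 is a load and operand 1 is a multiply in both lanes. */
void store_pair_swapped(int *a, const int *b, const int *c, int x, int y)
{
    a[0] = b[0] + x * c[0];
    a[1] = b[1] + y * c[1];
}
```

Because addition is commutative, both functions compute the same values; the swap only changes how the operands pair up across lanes.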

Filed Under: optimization, vectorization, slp, performance
