GCC Newspaper
JUNE 15, 2026
gcc Performance Win

Simplifies vec_duplicates of vec_duplicates

This commit simplifies nested `vec_duplicate` operations in GCC's RTL, folding them into a single `vec_duplicate` so that later passes see a simpler expression.

GCC's RTL simplifier now folds a vec_duplicate whose operand is itself a vec_duplicate. Previously, such a nested pair could remain in the RTL unnecessarily. This change modifies simplify_context::simplify_unary_operation_1 to detect these nested operations and fold them into a single vec_duplicate, which makes the intermediate representation more compact and may affect downstream optimizations. The issue was primarily observed and fixed in AArch64 SVE-specific patterns.

In Details

The simplify_context::simplify_unary_operation_1 function in simplify-rtx.cc now handles nested vec_duplicate RTL expressions. Previously, a VEC_DUPLICATE whose source was itself a VEC_DUPLICATE might not be represented optimally. Because vec_duplicate_p was originally designed for duplicated scalar elements, the patch handles the vector-of-vector case explicitly and folds the nested structure into a single VEC_DUPLICATE operation. The simplification affects patterns such as *aarch64_vec_duplicate_subvector and gives later optimization passes cleaner RTL to work with.
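The fold can be pictured with a small toy model. The sketch below is not GCC code: the `Reg` and `VecDuplicate` classes, the `simplify_vec_duplicate` helper, and the mode names are hypothetical stand-ins chosen for illustration, assuming only that a duplicate of a duplicate of X holds the same elements as a duplicate of X.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reg:
    """Toy stand-in for a scalar register operand such as (reg:SI x)."""
    mode: str
    name: str


@dataclass(frozen=True)
class VecDuplicate:
    """Toy stand-in for (vec_duplicate:MODE operand)."""
    mode: str
    operand: "Rtx"


Rtx = Union[Reg, VecDuplicate]


def simplify_vec_duplicate(mode: str, operand: Rtx) -> Rtx:
    """Build a vec_duplicate in `mode`, skipping any nested vec_duplicate."""
    # Duplicating a vector that is itself a duplicate of X produces the same
    # elements as duplicating X directly, so the inner node can be dropped.
    while isinstance(operand, VecDuplicate):
        operand = operand.operand
    return VecDuplicate(mode, operand)


# Illustrative input: (vec_duplicate:VNx4SI (vec_duplicate:V4SI (reg:SI x)))
nested = VecDuplicate("VNx4SI", VecDuplicate("V4SI", Reg("SI", "x")))
folded = simplify_vec_duplicate(nested.mode, nested.operand)
print(folded)
```

In this model the outer mode is kept and only the inner wrapper disappears, which mirrors the description above of two nested operations becoming a single one.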

For Context

Inside GCC, code is represented in an intermediate language called RTL (Register Transfer Language). Suppose one operation builds a vector by copying a single value several times, and a second operation then copies that whole vector several times to make an even larger vector. The compiler may represent this as two separate "duplicate vector" operations, even though the result is just the original value repeated. This change teaches the compiler to recognize such nested duplications and combine them into a single "duplicate vector" operation. The simpler internal representation can help the compiler generate faster and smaller machine code, especially for vectorized operations like those found in AArch64 SVE.
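The equivalence behind the change can be checked on plain values. This is a minimal sketch using Python lists as stand-in vectors; the `duplicate` helper is hypothetical and only models the idea of repeating a scalar or a whole vector end to end.

```python
def duplicate(value, count):
    """Return a list holding `count` back-to-back copies of `value`.

    A scalar is copied as one element; a list is copied whole, end to end.
    """
    chunk = value if isinstance(value, list) else [value]
    return chunk * count


x = 7
small = duplicate(x, 4)         # [7, 7, 7, 7]
two_step = duplicate(small, 2)  # copy the 4-element vector twice
one_step = duplicate(x, 8)      # copy the scalar eight times directly

print(two_step == one_step)
```

Both routes yield the same eight-element vector, which is why the two-step form can be replaced by the one-step form without changing the result.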

Filed Under: optimization, rtl, aarch64