I386: Use vpaddq + vpermilpd for some non-const permutations
GCC now uses `vpaddq` and `vpermilpd` to implement certain non-constant two-element vector permutations on AVX-enabled x86, replacing a longer instruction sequence.
GCC now uses the vpaddq and vpermilpd instructions to implement non-constant permutations for V2DI and V2DF modes when targeting AVX-enabled x86. This replaces a longer, more complex instruction sequence with two instructions, which should yield faster code. New test cases have been added to verify the new expansion.
In Details
When TARGET_AVX is enabled, ix86_expand_vec_perm now expands one-operand V2DImode and V2DFmode shuffles as vpaddq followed by vpermilpd. This avoids the previous sequence involving vpunpcklqdq, vpand, vpsllq, vpshufb, and vpaddb. The change addresses PR125357 and adds two test cases, avx-pr125357-2.c and avx2-pr125357-2.c.
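Why an addition helps here can be sketched with a scalar model. The variable form of vpermilpd reads its selector from bit 1 of each 64-bit control element, whereas a permutation mask carries its index in bit 0. Adding the mask to itself moves bit 0 into bit 1. The C below models that reasoning only and is not the code GCC emits; the type v2di_model and the model_* functions are hypothetical names used for illustration, and the assumption that the commit adds the mask to itself is inferred from the instruction pair named above.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-bit lanes standing in for one 128-bit register (hypothetical model type). */
typedef struct { uint64_t lane[2]; } v2di_model;

/* Model of vpaddq: lane-wise 64-bit addition that wraps on overflow. */
static inline v2di_model model_vpaddq(v2di_model a, v2di_model b)
{
    v2di_model r = {{ a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

/* Model of 128-bit variable vpermilpd: only bit 1 of each control lane is read. */
static inline v2di_model model_vpermilpd(v2di_model src, v2di_model ctrl)
{
    v2di_model r = {{ src.lane[(ctrl.lane[0] >> 1) & 1],
                      src.lane[(ctrl.lane[1] >> 1) & 1] }};
    return r;
}

/* One-operand permutation: result lane i is src lane (mask lane i modulo 2).
   Doubling the mask (assumed to be what the vpaddq does) moves the index
   from bit 0 into bit 1, where vpermilpd looks for it. */
static inline v2di_model model_vec_perm(v2di_model src, v2di_model mask)
{
    v2di_model doubled = model_vpaddq(mask, mask);
    return model_vpermilpd(src, doubled);
}

/* Small self-check: a mask of {1, 0} swaps the two lanes. */
static inline void model_check_swap(void)
{
    v2di_model src = {{ 10, 20 }};
    v2di_model mask = {{ 1, 0 }};
    v2di_model r = model_vec_perm(src, mask);
    assert(r.lane[0] == 20 && r.lane[1] == 10);
}
```

Because only bit 1 of the doubled mask is consulted, mask values larger than 1 are reduced modulo 2 without a separate masking step, which matches the modulo semantics of a one-operand vector permutation.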
For Context
Vector permutation rearranges the elements of a vector register according to a mask; a non-constant permutation is one whose mask is only known at run time. x86 processors with AVX have dedicated instructions for this, but choosing the shortest sequence for a given vector mode is not always straightforward. This commit teaches GCC to use vpaddq (add packed 64-bit integers) and vpermilpd (permute in-lane pairs of double-precision floating-point values) for the two-element cases, which should result in shorter, faster code sequences.
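For illustration, the kind of source that reaches this expansion can be written with GCC's vector extensions. This is a minimal sketch, assuming GCC (the `__builtin_shuffle` builtin is GCC-specific); the function names permute_v2di and permute_v2df are hypothetical and are not taken from the commit's test cases.

```c
#include <stdint.h>

/* GCC vector extension types: two 64-bit lanes in 16 bytes (V2DI and V2DF). */
typedef int64_t v2di __attribute__((vector_size(16)));
typedef double  v2df __attribute__((vector_size(16)));

/* One-operand shuffle with a run-time mask: result[i] = v[mask[i] % 2]. */
v2di permute_v2di(v2di v, v2di mask)
{
    return __builtin_shuffle(v, mask);
}

/* Same permutation on doubles; the mask is still an integer vector. */
v2df permute_v2df(v2df v, v2di mask)
{
    return __builtin_shuffle(v, mask);
}
```

With a GCC that includes this change, compiling these functions with -O2 -mavx and inspecting the assembly would be the natural way to check for the vpaddq and vpermilpd pair; the exact output is not reproduced here.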