AArch64: Fix SVE vec_perm for VL2048 VNx16QI.
Fixes incorrect results from SVE byte-vector (VNx16QI) permutations at VL2048, caused by the element count (256) truncating to zero in an 8-bit element.
This commit addresses a correctness issue in the AArch64 SVE expansion of vec_perm for VL2048 vectors of byte elements (VNx16QI). The expansion has two forms: an optimized one for selectors that refer only to the first vector, and a general one that subtracts the number of elements (nunits) from the selector with a SUB instruction. At VL2048 a byte vector has 256 elements, and 256 truncates to zero in an 8-bit element, so the SUB has no effect and the general sequence produces incorrect results. An 8-bit index can be at most 255, so in this configuration every selector refers only to the first vector. This commit ensures the optimized expansion is used in this specific case, fixing the bug.
Details
The fix modifies aarch64_expand_sve_vec_perm in config/aarch64/aarch64.cc to check whether all indices of a variable selector refer to the first values vector. The SVE vec_perm pattern is restricted to constant VLs. When the selector refers only to the first values vector, the optimized expansion uses a single TBL instruction. Otherwise, the fallback expansion uses a five-instruction sequence that includes a SUB of nunits and two TBLs. The issue occurs when nunits is 256 (byte elements at VL2048): the value 256 does not fit in an 8-bit element, so the SUB operand is truncated to zero in the general case.
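The truncation can be reproduced with a small Python model of the two expansions. This is only a sketch, not the code in aarch64.cc: the function names are hypothetical, and the exact shape of the general sequence (an AND to wrap the selector, TBL, SUB, TBL, then an OR of the two results) is an assumption consistent with the five-instruction description above. The model relies on SVE TBL producing zero for out-of-range indices.

```python
LANE_BITS = 8                      # byte elements (VNx16QI)
LANE_MASK = (1 << LANE_BITS) - 1   # 0xFF

def tbl(table, indices):
    """Model of SVE TBL: an out-of-range index produces a zero lane."""
    return [table[i] if i < len(table) else 0 for i in indices]

def expand_single_tbl(first, sel):
    """Optimized form (hypothetical name): valid only if sel stays in `first`."""
    return tbl(first, sel)

def expand_general(first, second, sel):
    """General form (assumed shape): AND, TBL, SUB, TBL, OR."""
    nunits = len(first)
    wrap = (2 * nunits - 1) & LANE_MASK    # constants are stored in 8-bit lanes
    sub_imm = nunits & LANE_MASK           # 256 becomes 0 here
    sel = [s & wrap for s in sel]
    from_first = tbl(first, sel)
    sel_second = [(s - sub_imm) & LANE_MASK for s in sel]
    from_second = tbl(second, sel_second)
    return [a | b for a, b in zip(from_first, from_second)]

# 16 byte elements: the general form behaves as intended.
small = expand_general(list(range(1, 17)), list(range(101, 117)), [0, 15, 16, 31])
print(small)                       # [1, 16, 101, 116]

# 256 byte elements (VL2048): SUB subtracts 0, so both TBLs hit and are OR-ed.
first, second = [0x0F] * 256, [0xF0] * 256
sel = list(range(256))
print(expand_general(first, second, sel)[:3])     # [255, 255, 255]
print(expand_single_tbl(first, sel)[:3])          # [15, 15, 15]
```

With 256 elements the general form combines lanes from both inputs, while the single-TBL form returns the expected elements of the first vector, which matches the case the fix routes to the optimized expansion.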
For Context
SVE (Scalable Vector Extension) is an extension to the Arm architecture in which the vector length is chosen by the hardware implementation, from 128 to 2048 bits, rather than fixed by the instruction set. vec_perm is GCC's vector permutation operation: it builds a result vector by picking elements from two input vectors according to a selector vector of indices. This commit fixes a bug in how the compiler expands vec_perm for the longest SVE vectors (VL2048) with byte elements on AArch64. In that case the selector can only choose elements from the first input vector, but because the truncated SUB left the indices unchanged, the general expansion also picked up elements from the second input vector, causing incorrect results.
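As a reference point for what a correct permutation should return, here is a minimal Python sketch of the vec_perm semantics. The function name and the four-element vectors are illustrative only; selector indices address the concatenation of the two inputs and wrap modulo twice the element count.

```python
def vec_perm(first, second, sel):
    """Pick elements from first + second; indices wrap modulo 2 * nunits."""
    both = first + second
    return [both[i % len(both)] for i in sel]

first = [10, 11, 12, 13]
second = [20, 21, 22, 23]

# Indices 0..3 select from `first`, indices 4..7 select from `second`.
print(vec_perm(first, second, [0, 5, 2, 7]))   # [10, 21, 12, 23]
```

A selector whose indices are all below the element count, such as any byte selector at VL2048, never reaches the second input.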