Fold `popcount(x ^ (x - 1))` to `ctz(x) + 1` when supported
GCC now transforms `popcount(x ^ (x - 1))` into `ctz(x) + 1` when compiling for architectures that directly support the count trailing zeros instruction.
The compiler can now replace popcount(x ^ (x - 1)) with ctz(x) + 1 during optimization. This transformation applies when compiling for architectures that directly support the count trailing zeros (ctz) instruction. The change improves performance by utilizing a specialized instruction, where available, to compute the number of trailing zeros in x.
In Details
GCC's match.pd uses pattern matching to simplify expressions during compilation. This commit adds a rule to fold popcount(x ^ (x - 1)) to ctz(x) + 1 when x is nonzero and CTZ is directly supported. The transformation leverages the property that x ^ (x - 1) generates a mask consisting of the trailing zeros of x plus the least significant bit, meaning that the popcount of the result is equivalent to ctz(x) + 1.
For Context
Compilers often perform optimizations by recognizing common patterns in code and replacing them with equivalent but more efficient expressions. This commit introduces a new pattern-matching rule in GCC. Specifically, when the compiler encounters the C++ code pattern popcount(x ^ (x - 1)), it can now replace this with ctz(x) + 1. The popcount function counts the number of set bits, ^ is the exclusive OR operator, and ctz counts the number of trailing zeros. This optimization relies on the target architecture having an instruction to count trailing zeros, which is then one-upped.