x86 backend now prefers MOVSBL for sign extension
GCC's i386 backend now always emits MOVSBL for QImode-to-HImode sign extensions, avoiding partial register writes and the 0x66 prefix.
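The GCC i386 backend (which generates code for both 32-bit x86 and x86-64) has been updated to consistently use the MOVSBL instruction when sign-extending an 8-bit value (QImode) to a 16-bit value (HImode). MOVSBL writes the sign-extended result to the full 32-bit register. Code operating on an HImode value reads only the low 16 bits, so it sees exactly the value MOVSBW would have produced. The switch removes two costs of MOVSBW: a partial register write, which depending on the processor can cause a stall or a dependency on the register's previous contents, and the 0x66 operand-size prefix, which adds one byte to every such instruction. As a result, the generated code for these sign extensions is one byte shorter per instruction and potentially faster.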
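Why the switch is safe can be checked with a small software model of the two instructions acting on a 32-bit register. This is a sketch under stated assumptions: the function names are hypothetical and only describe the architectural effect of each instruction. They are not GCC internals or CPU intrinsics.

```c
#include <stdint.h>

/* Illustrative model only: hypothetical helpers describing the architectural
 * effect of each instruction on a 32-bit register (not GCC or CPU APIs). */

/* Portable 8-bit sign extension of a raw byte value. */
static int32_t sign_extend8(uint8_t src)
{
    return (src & 0x80u) ? (int32_t)src - 256 : (int32_t)src;
}

/* movsbw %al,%ax: writes only bits 0-15, so bits 16-31 of the previous
 * register value are carried over (the "merge" with the old contents). */
static uint32_t model_movsbw(uint32_t old_reg, uint8_t src)
{
    uint16_t low = (uint16_t)sign_extend8(src);
    return (old_reg & 0xFFFF0000u) | low;
}

/* movsbl %al,%eax: writes all 32 bits, so the result never depends on
 * the previous register value. */
static uint32_t model_movsbl(uint32_t old_reg, uint8_t src)
{
    (void)old_reg;
    return (uint32_t)sign_extend8(src);
}

/* A 16-bit (HImode) consumer reads only the low 16 bits. */
static uint16_t himode_view(uint32_t reg)
{
    return (uint16_t)(reg & 0xFFFFu);
}

/* Returns 1 if both instructions yield the same 16-bit value for all 256
 * byte inputs, given the supplied previous register contents. */
static int low_halves_agree(uint32_t old_reg)
{
    for (unsigned v = 0; v < 256; v++) {
        if (himode_view(model_movsbw(old_reg, (uint8_t)v)) !=
            himode_view(model_movsbl(old_reg, (uint8_t)v)))
            return 0;
    }
    return 1;
}
```

In this model, model_movsbw is the only function whose result depends on old_reg. That dependency is the software analogue of the merge the hardware has to perform for a partial register write, while the low 16 bits agree for every input.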
In Details
This commit optimizes the i386 backend's instruction selection for sign-extension from QImode (8-bit integer) to HImode (16-bit integer), specifically in config/i386/i386.md's extendqihi2 pattern. Previously, MOVSBW was used, which incurs a partial register write penalty because it only writes to the lower 16 bits of a 32-bit (or 64-bit) register, potentially requiring the CPU to merge the new value with the existing upper bits. Additionally, MOVSBW requires the 0x66 operand size prefix, adding an extra byte to the instruction. By switching to MOVSBL and targeting the 32-bit r…
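To make the one-byte difference concrete, here are the register-to-register encodings of both forms, written as C byte arrays. The register choice (%al into %ax or %eax) is an assumption for illustration. Other registers change the ModRM byte (and may add a REX prefix in 64-bit mode) but not the presence or absence of the 0x66 prefix.

```c
#include <stdint.h>
#include <string.h>

/* Hand-assembled encodings for 32-bit or 64-bit mode (assumed registers:
 * source %al, destination %ax / %eax). Both use opcode 0F BE /r
 * (MOVSX r, r/m8) with ModRM 0xC0 (register-direct, reg=0, rm=0). */
static const uint8_t movsbw_al_ax[]  = { 0x66, 0x0F, 0xBE, 0xC0 }; /* 0x66 selects 16-bit operand size */
static const uint8_t movsbl_al_eax[] = { 0x0F, 0xBE, 0xC0 };       /* default 32-bit operand size */

/* Returns 1 if the 16-bit form is exactly the 32-bit form plus a leading
 * 0x66 operand-size prefix byte. */
static int differs_only_by_prefix(void)
{
    return sizeof movsbw_al_ax == sizeof movsbl_al_eax + 1
        && movsbw_al_ax[0] == 0x66
        && memcmp(movsbw_al_ax + 1, movsbl_al_eax, sizeof movsbl_al_eax) == 0;
}
```

The opcode and ModRM bytes are identical; only the prefix changes the operand size, which is why dropping MOVSBW saves exactly one byte per instruction.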
For Context
When a program converts a small signed number (like an 8-bit byte) into a larger integer type (like a 16-bit short), it must perform "sign extension": the top bit of the small number, which marks it as negative, is copied into all the new upper bits so that the value is preserved. For example, the byte 0xFF means -1, and sign-extending it gives 0xFFFF, which is still -1 rather than 255. In GCC's backend for Intel and AMD x86 processors (called i386), several instructions can do this. This commit changes the compiler to always use one of them, MOVSBL, for the 8-bit to 16-bit case. Previously it used MOVSBW, which caused two problems: it writes only part of a CPU register, which can slow things down, and it needs an extra prefix byte in the instruction encoding. By using MOVSBL, the compiler generates machine code that is smaller and can run faster.
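The difference between sign extension and zero extension can be seen directly in C. The function names below are hypothetical, and the snippet only illustrates the arithmetic: which instruction GCC actually emits for widen_signed depends on the GCC version, optimization flags, and surrounding code.

```c
#include <stdint.h>

/* Hypothetical helper: sign extension keeps the value, so the byte 0xFF
 * (-1 as int8_t) becomes 0xFFFF, which is still -1 as int16_t. */
static int16_t widen_signed(int8_t b)
{
    return (int16_t)b;
}

/* Hypothetical helper, for contrast: zero extension fills the new upper
 * bits with 0, so the same bit pattern 0xFF becomes 255. */
static uint16_t widen_unsigned(uint8_t b)
{
    return (uint16_t)b;
}
```

Both functions start from the same 8-bit pattern; only the signed version must copy the top bit upward, which is the job MOVSBW and MOVSBL perform in hardware.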