GCC Newspaper
JUNE 15, 2026
gcc/arm Performance Win

Improved RTX costs for -mthumb on ARM.

Fixes wildly inaccurate instruction costs for Thumb-1, cutting generated code from 11 to 5 instructions for some shift sequences.

The ARM backend’s thumb1_rtx_costs function returned unrealistic costs for basic operations like addition, shifts, and bitwise ops, causing the optimiser to select terrible instruction sequences. This patch overhauls costs for PLUS, MINUS, shifts, and logical ops across SI/DI/HI/QImode, bringing them into line with what the Thumb-1 ISA actually supports. A 64-bit shift-add sequence that previously compiled to 11 instructions now emits 5, because the optimiser can finally see that a left-shift-by-one is cheaper than repeatedly adding and subtracting. The fix also addresses a host compiler warning in comp_not_to_clear_mask_str_un.

In Details

The ARM backend's thumb1_rtx_costs hook feeds the middle-end's cost model during RTL optimisation passes like combine and CSE. Thumb-1 is the original 16-bit encoding with an 8-register subset and restricted immediate forms; it lacks many Thumb-2 features (32-bit encodings, wide immediates, IT blocks). Bad costs here meant that expensive multi-instruction expansions looked cheaper than single Thumb-1 insns, breaking idiom recognition and causing the optimiser to emit redundant move/add/sub chains instead of simple shifts or add-with-carry.
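To make the failure mode concrete, here is a minimal toy cost model in C. It is only a sketch: the operation kinds, candidate sequences, and cost numbers are all invented for illustration and are not taken from GCC's thumb1_rtx_costs or from the patch. It shows how a skewed per-operation cost can make a long add/sub chain look cheaper than a single shift.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation kinds; GCC's real RTX codes (PLUS, MINUS,
   ASHIFT, ...) are far richer than this. */
enum op { OP_ADD, OP_SUB, OP_SHIFT, OP_COUNT };

/* A candidate expansion: how many of each operation it uses. */
struct candidate {
    const char *name;
    int uses[OP_COUNT];
};

/* Sum the per-operation costs over one candidate sequence. */
int sequence_cost(const struct candidate *c, const int cost[OP_COUNT])
{
    int total = 0;
    for (int i = 0; i < OP_COUNT; i++)
        total += c->uses[i] * cost[i];
    return total;
}

/* Return the index of the cheapest candidate (the first one wins ties). */
size_t pick_cheapest(const struct candidate *cands, size_t n,
                     const int cost[OP_COUNT])
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (sequence_cost(&cands[i], cost) < sequence_cost(&cands[best], cost))
            best = i;
    return best;
}

/* Two made-up expansions of the same computation. */
const struct candidate candidates[] = {
    { "single shift",       { 0, 0, 1 } },
    { "four add/sub pairs", { 4, 4, 0 } },
};

/* Made-up cost tables, in arbitrary units (order: add, sub, shift). */
const int skewed_costs[OP_COUNT]    = { 1, 1, 12 }; /* shift overpriced */
const int realistic_costs[OP_COUNT] = { 4, 4, 4 };  /* one insn each */
```

Calling pick_cheapest(candidates, 2, skewed_costs) selects the add/sub chain, while the same call with realistic_costs selects the single shift. The selection logic is identical in both cases; only the numbers fed into it change, which is why correcting a cost hook can change generated code without touching any optimisation pass.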

For Context

Modern compilers use a cost model to decide between equivalent instruction sequences: should we expand (a << 33) + a as a shift-left plus an add, or synthesise it with repeated additions? The backend's RTX cost hook assigns a numeric score to each intermediate operation so the optimiser can pick the cheapest path. ARM Thumb-1 is a compact 16-bit instruction encoding designed for microcontrollers; most of its instructions can address only the eight low registers (r0 to r7), and its immediate operands are limited compared to full 32-bit ARM or the later Thumb-2 hybrid. If the cost function returns wildly wrong numbers (say, claiming that a shift is more expensive than four add/subtract pairs), the optimiser will generate bloated sequences that the hardware never needed. This patch corrects those scores across all scalar integer modes (byte, halfword, word, doubleword), so that GCC's instruction selection matches what Thumb-1 silicon can actually execute efficiently.
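For readers who want to inspect the effect themselves, the sketch below wraps the (a << 33) + a expression in a small C function. This is a hypothetical test case: the article does not say which source expression produced the 11-to-5 instruction improvement, so the function names and the choice of expression are assumptions. The helper functions only check that the arithmetic identity holds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical test case: a 64-bit shift-add of the shape described in
   the text. The patch's own test input is not given in the article. */
uint64_t shift_add(uint64_t a)
{
    return (a << 33) + a;
}

/* The same value written as a multiplication:
   (a << 33) + a == a * (2^33 + 1), modulo 2^64. */
uint64_t shift_add_as_mul(uint64_t a)
{
    return a * ((UINT64_C(1) << 33) + 1);
}

/* Small self-check that both forms agree, including on wraparound. */
void check_shift_add(void)
{
    assert(shift_add(0) == 0);
    assert(shift_add(1) == (UINT64_C(1) << 33) + 1);
    assert(shift_add(UINT64_MAX) == shift_add_as_mul(UINT64_MAX));
}
```

Assuming an arm-none-eabi cross toolchain is installed, compiling this file with something like `arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m0 -S` writes the Thumb-1 assembly to a .s file, where the instructions emitted for shift_add can be counted before and after the patch. The actual counts will depend on the GCC version and options used.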

Filed Under: arm, thumb-1, rtx-costs, code-generation, optimization