pru: Inline muldi3 when optimizing for speed

Inlines 64-bit multiplication (muldi3) for PRU when optimizing for speed, avoiding a library call.

By Dimitar Dimitrov, committed March 25, 2026

When optimizing for speed, the compiler now expands a 64-bit multiplication inline as a sequence of 32-bit multiplication sub-operations instead of calling a library function. This saves instruction cycles at the cost of a larger text section. To do so, the commit defines new patterns for umulsidi3 and muldi3.

In Detail

This commit makes the PRU backend inline muldi3 when optimizing for speed. It does so with three changes. New patterns for umulsidi3 and muldi3 are added in pru.md. Constraint changes in constraints.md prevent r27 from being allocated as the SImode destination of the mulsi3 pattern. The register class MULDST_REGS is expanded in pru.h to accommodate DImode.
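The relationship between the three pattern names can be sketched in plain C. This is only an illustration of the arithmetic, not the code the backend emits: the helper names `widening_mul_u32` and `mul64_from_32` are hypothetical, and the sketch assumes the conventional GCC meanings of the names (mulsi3 as a 32-bit multiply, umulsidi3 as an unsigned 32 x 32 -> 64 widening multiply, muldi3 as a 64-bit multiply).

```c
#include <stdint.h>

/* Hypothetical stand-in for a umulsidi3-style operation:
   unsigned widening multiply, 32 x 32 -> 64 bits. */
static uint64_t widening_mul_u32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Hypothetical illustration of a muldi3-style operation:
   64 x 64 -> 64 bits, built only from 32-bit multiplications. */
static uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a;
    uint32_t a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);

    /* The low halves need the full 64-bit product. */
    uint64_t low = widening_mul_u32(a_lo, b_lo);

    /* The cross terms only affect the upper 32 bits of the result,
       so truncating 32-bit multiplies (mulsi3-style) are enough.
       The a_hi * b_hi term is shifted out entirely and is dropped. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return low + ((uint64_t)cross << 32);
}
```

Under these assumptions, one widening multiply and two ordinary 32-bit multiplies are enough for a full 64-bit product, which is why inlining costs several instructions of text per multiplication site rather than a single call.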

For Context

The PRU (Programmable Real-time Unit) is a coprocessor often used in embedded systems where speed is critical. A multiplication of 64-bit numbers can be compiled either as a call to a general-purpose library function or as multiplication instructions inserted directly into the code (inlining). This commit changes the compiler to insert the instructions directly when the programmer asks for the fastest possible code. The code runs faster, but the compiled program becomes slightly larger.

Filed Under: pru, optimization, inlining