RFC: GCC support for masked compress stores and VPCOMPRESS-style codegen

Project / Subsystem

gcc / rfc

Date

2026-05-14

Proposer

Raghesh Aloor <Raghesh.Aloor@amd.com>

Source type

public_inbox

Consensus

Proposed

Sentiment

/10

Technical tradeoffs

  • Increased compiler complexity to recognize and optimize the specific loop pattern.
  • Potential for increased code size if VPCOMPRESS is not profitable in all cases.
  • Requires careful tuning of heuristics to determine when VPCOMPRESS is beneficial.
  • May introduce new dependencies on the AVX-512 instruction set.

All attributes

project
gcc
subsystem
rfc
patch_id
discussion_id
20260514062146.3464666-1-Raghesh.Aloor@amd.com
source_type
public_inbox
title
RFC: GCC support for masked compress stores and VPCOMPRESS-style codegen
headline
RFC: GCC support for masked compress stores and VPCOMPRESS codegen
tldr
Proposes extending GCC's loop vectorizer to generate AVX-512 VPCOMPRESS instructions for predicated stores into a buffer.
proposer
Raghesh Aloor <Raghesh.Aloor@amd.com>
consensus
Proposed
outcome
proposed
sentiment_score
technical_tradeoffs
  • Increased compiler complexity to recognize and optimize the specific loop pattern.
  • Potential for increased code size if VPCOMPRESS is not profitable in all cases.
  • Requires careful tuning of heuristics to determine when VPCOMPRESS is beneficial.
  • May introduce new dependencies on the AVX-512 instruction set.
series_id
series_role
standalone
series_parts
[]
tags
  • gcc
  • loop vectorization
  • AVX-512
  • VPCOMPRESS
  • code generation
bugzilla_url
date
2026-05-14T00:00:00.000Z

This RFC proposes extending GCC’s loop vectorizer to recognize loops that store into a buffer under a predicate and advance the buffer offset under that same predicate, so that the backend can emit AVX-512 VPCOMPRESS instructions when doing so is profitable. The work addresses PR tree-optimization/91198, where GCC fails to generate AVX-512 compress/expand instructions in the relevant cases. An initial prototype patch for phase 1 is to be sent to the gcc-patches list soon; the goal is to improve performance on code that can benefit from masked compress stores.
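
For illustration only, the loop shape described above might look like the following C sketch. It is not taken from the RFC or from the prototype patch: the function names, the `> 0` predicate, and the 16-lane block size are assumptions chosen for the example. `compress_scalar` is the source-level pattern (store and offset increment guarded by the same predicate). `compress_blocked` is a portable emulation of what a vectorized version would do per block: build a lane mask, write the selected lanes contiguously, and advance the output offset by the number of set mask bits. On AVX-512 hardware the helper `compress_store_block` would correspond to a single masked compress store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar form of the targeted loop: the store to dst and the increment of
   the output offset j are both guarded by the same predicate. */
size_t compress_scalar(const int32_t *src, int32_t *dst, size_t n)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (src[i] > 0)          /* hypothetical predicate */
            dst[j++] = src[i];
    return j;
}

/* Assumed block width: 16 x 32-bit lanes, as in one 512-bit vector. */
enum { LANES = 16 };

/* Portable emulation of one masked compress store: lanes whose mask bit is
   set are written contiguously to dst, in lane order. Returns the number of
   elements written, which equals the population count of the mask. */
static size_t compress_store_block(int32_t *dst, uint32_t mask,
                                   const int32_t *lanes)
{
    size_t written = 0;
    for (int l = 0; l < LANES; l++)
        if (mask & (UINT32_C(1) << l))
            dst[written++] = lanes[l];
    return written;
}

/* Block-wise version of compress_scalar: one mask and one compress store
   per full block, followed by a scalar loop for the remaining elements. */
size_t compress_blocked(const int32_t *src, int32_t *dst, size_t n)
{
    size_t i = 0, j = 0;
    for (; i + LANES <= n; i += LANES) {
        uint32_t mask = 0;
        for (int l = 0; l < LANES; l++)
            mask |= (uint32_t)(src[i + l] > 0) << l;
        j += compress_store_block(dst + j, mask, src + i);
    }
    for (; i < n; i++)           /* scalar tail */
        if (src[i] > 0)
            dst[j++] = src[i];
    assert(j <= n);              /* output never exceeds the input length */
    return j;
}
```

Both functions write the same elements in the same order, so the blocked form can serve as a reference when checking vectorized output. Whether the vector form is actually faster depends on the target and on how many lanes the predicate selects, which is the profitability question raised in the tradeoffs.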