CVPR 2026

Tunable Soft Equivariance with Guarantee

A framework for injecting tunable approximate symmetry into any pretrained model. No additional parameters. Projects the weights through soft equivariant filters controlled by a single scalar b.

Md Ashiqur Rahman  ·  Lim Jun Hao  ·  Jeremiah Jiang  ·  Teck-Yian Lim  ·  Raymond A. Yeh
Five softness levels visualized: convolution kernels, feature maps of two rotated inputs, and equivariance error heatmaps
Each column represents one softness value b, from strict equivariance (b = 0, left) to the original pretrained model (b = 1, right). Top row shows patch embedding weights, 2nd and 3rd rows show final feature maps for two rotated versions of the same image, and bottom row shows the pointwise equivariance error (blue = low, red = high). At b = 0 the feature maps are nearly identical and the error is negligible; both degrade continuously and predictably as b increases.
The Problem

Two extremes, both costly

Pretrained vision models are powerful but symmetry-unaware: rotate the input and the output changes unpredictably. Strict equivariance fixes this, but it often imposes a compute burden, does not scale, and over-constrains the model in settings where only approximate symmetry holds.

Neither extreme is satisfactory: one ignores symmetry entirely; the other enforces it too rigidly at too high a cost.

Our Solution

Symmetry as a dial, not a switch

  • Project weights via soft equivariant filters.
  • Single scalar b tunes equivariance level with provable bound.
Contributions
  • General framework: plug-in soft equivariance for any model.
  • Theoretical bounds on equivariance error controlled by b.
  • Validated on ViT, DINOv2, ResNet, SegFormer; improves off-the-shelf models.
New parameters: None
Control range: b ∈ [0, 1]
Groups supported: All
Method

Weight projection with controllable equivariance error

A function F is equivariant to a group G if it satisfies F(ρX(g)x) = ρY(g)F(x) for all g ∈ G. This constraint is satisfied only by a small subspace of all possible functions, namely those with zero equivariance error. We build a projection operator that projects the weights of F into different subspaces, each with a different degree of equivariance error controlled by the parameter b.

Invariant filter — scalar outputs
For y = wᵀx, the Lie-algebra representation dρ(A) describes how an infinitesimal group action changes the input vector. Decompose dρ(A) = UΣVᵀ and keep only the directions uᵢ with small singular values σᵢ < b; these are the least action-sensitive directions. The invariant filter B_inv = Σ_{σᵢ<b} uᵢuᵢᵀ projects w into that subspace.
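A minimal NumPy sketch of the invariant filter described above, not the repository's implementation. The function name `invariant_filter`, the weight vector `w`, and the sign convention chosen for the z-rotation generator `A_z` are assumptions made for illustration.

```python
import numpy as np

def invariant_filter(d_rho: np.ndarray, b: float) -> np.ndarray:
    """Projector onto the left-singular directions of d_rho with singular value < b."""
    U, sigma, _ = np.linalg.svd(d_rho)
    keep = U[:, sigma < b]          # least action-sensitive directions u_i
    return keep @ keep.T            # B_inv = sum_i u_i u_i^T

# Assumed example: generator of rotations about the z-axis acting on R^3.
A_z = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 0.0]])

w = np.array([0.5, -1.0, 2.0])      # hypothetical pretrained weight vector
B_inv = invariant_filter(A_z, b=0.5)
w_proj = B_inv @ w                  # only the z-component survives
print(w_proj)
```

Here the singular values of `A_z` are 1, 1 and 0, so a cut-off of 0.5 keeps only the z direction, and `w_proj @ A_z` is zero: the projected scalar output no longer reacts to infinitesimal z-rotations. Raising `b` above 1 keeps every direction and returns `w` unchanged.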
Equivariant filter — vector outputs
For y = Wx, vectorising gives the Kronecker constraint L·vec(W) = 0 where L = dρXᵀ ⊗ I − I ⊗ dρY. The null space of L is the exact equivariant subspace. The equivariant filter B_eq is built from L by the same procedure as the invariant filter.
Stage 1 — Filter Construction  (once, offline)
# Build Kronecker constraint matrix:
L  =  dρ_Xᵀ ⊗ I_{d′}  −  I_d ⊗ dρ_Y       ∈ ℝ^{dd′ × dd′}
# SVD of L, singular values sorted ascending:
U, Σ, Vᵀ  =  svd(L)      σ₁ ≤ σ₂ ≤ … ≤ σ_{dd′}
# Build projection matrix B_eq (keep σᵢ < b):
B_eq  =  Σ_{σᵢ < b}  uᵢ uᵢᵀ     ∈ ℝ^{dd′ × dd′}
Stage 2 — Forward Pass  (every inference / training step)
# Flatten pretrained weights to a vector:
w_flat  =  vec(W)                  W ∈ ℝ^{d′×d} → ℝ^{dd′}
# Project into the equivariant subspace:
w_proj  =  B_eq · w_flat           low-sensitivity directions only
# Reshape back and run the layer:
W_b     =  reshape(w_proj,  d′, d)
y       =  W_b · x                  output is η-soft equivariant
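The two stages above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the repository's code: the helper names `equivariant_filter` and `project_weight` are hypothetical, vec(W) is taken column-major so that it matches the Kronecker form L = dρ_Xᵀ ⊗ I − I ⊗ dρ_Y, and the toy generator and weights are made up.

```python
import numpy as np

def equivariant_filter(d_rho_x: np.ndarray, d_rho_y: np.ndarray, b: float) -> np.ndarray:
    """Stage 1: B_eq from the SVD of L = d_rho_x^T (x) I_{d'} - I_d (x) d_rho_y."""
    d, d_out = d_rho_x.shape[0], d_rho_y.shape[0]
    L = np.kron(d_rho_x.T, np.eye(d_out)) - np.kron(np.eye(d), d_rho_y)
    U, sigma, _ = np.linalg.svd(L)
    keep = U[:, sigma < b]                      # directions with sigma_i < b
    return keep @ keep.T

def project_weight(W: np.ndarray, B_eq: np.ndarray) -> np.ndarray:
    """Stage 2: flatten (column-major), project, reshape back."""
    w_flat = W.reshape(-1, order="F")
    return (B_eq @ w_flat).reshape(W.shape, order="F")

# Toy setting (assumed): the 2D rotation generator acts on both input and output.
d_rho = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])                      # hypothetical pretrained weights

B_eq = equivariant_filter(d_rho, d_rho, b=0.5)  # built once, offline
W_b = project_weight(W, B_eq)
y = W_b @ np.array([1.0, 0.0])                  # forward pass with projected weights
print(W_b)
```

In this toy case the singular values of L are 0, 0, 2, 2, so b = 0.5 keeps only the exact null space and `W_b` commutes with the generator. Because the cut-off is a strict inequality, the sketch uses a small positive b rather than exactly 0 so that numerically zero singular values are retained.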
Theorem — Equivariance Guarantee
‖F(ρX(g)x) − ρY(g)F(x)‖ / (‖J_F‖_F · ‖x‖)  ≤  b · √(nG · rG) + εG
  • b: SVD cut-off, the softness dial. Lower b gives stricter equivariance.
  • nG: dimension of the Lie algebra
  • rG: injectivity radius of the group G
  • εG: normalized Taylor approximation error
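As a rough illustration of the quantity the theorem bounds, the sketch below measures the normalized equivariance error of a single linear map under a finite z-rotation. Several things are assumptions here: the normalizer is read as the Frobenius norm of the Jacobian of F (which equals W for a linear map), the same rotation representation is used on input and output, and the matrices are made up. It measures only the left-hand side and does not evaluate the bound itself.

```python
import numpy as np

def rot_z(theta: float) -> np.ndarray:
    """Rotation by theta about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def normalized_equivariance_error(W, g, x):
    """||W g x - g W x|| / (||W||_F ||x||) for the linear map F(x) = W x."""
    numerator = np.linalg.norm(W @ (g @ x) - g @ (W @ x))
    return numerator / (np.linalg.norm(W, "fro") * np.linalg.norm(x))

g = rot_z(0.7)
x = np.array([1.0, 2.0, 3.0])

W_equivariant = np.array([[ 3.0, 2.0, 0.0],    # commutes with every z-rotation
                          [-2.0, 3.0, 0.0],
                          [ 0.0, 0.0, 5.0]])
W_generic = np.array([[ 2.0, 3.0, 1.0],        # hypothetical unconstrained weights
                      [-1.0, 4.0, 0.0],
                      [ 0.0, 1.0, 5.0]])

err_equivariant = normalized_equivariance_error(W_equivariant, g, x)
err_generic = normalized_equivariance_error(W_generic, g, x)
print(err_equivariant, err_generic)
```

The exactly equivariant weights give an error at machine precision, while the generic weights give a clearly nonzero value; sweeping b between these two regimes is what the softness dial controls.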

The same recipe applies across architectures and groups:

  1. Load pretrained backbone: ViT, ResNet, DINOv2, SegFormer, GNN or MLP.
  2. Wrap target layers: wrap target modules with soft equivariant filters (see FilteredConv2d and FilteredLinear in the code). Projectors are computed once from the group structure.
  3. Set b, optionally fine-tune: choose b ∈ [0, 1] to fix the symmetry level. Optionally fine-tune with b held fixed to recover task performance at the chosen operating point.
Novel Contribution

Fast Schur Filter — Scalable Construction

The naive SVD approach for constructing equivariant projectors is intractable for large representations. Our Schur filter exploits real Schur decomposition to decouple the constraint matrix into small independent blocks — reducing cost by orders of magnitude while producing the identical projection. Below we explain the method with fully worked examples.

The scalability problem

The equivariance constraint vectorises to L · vec(Θ) = 0, where L = dρXᵀ ⊗ I − I ⊗ dρY. For a layer mapping d-dim input to d′-dim output, L is dd′ × dd′. SVD of L costs O((dd′)³).

For the 4th-order tensor representation T(4) of O(5): d = 5⁴ = 625 — L would have 390,625 rows × 390,625 columns. Naive SVD is completely infeasible.

The Schur insight

  1. Schur decompose separately: compute the real Schur decomposition of each Lie algebra representation independently, dρX = UX ΣX UXᵀ and dρY = UY ΣY UYᵀ, where each Σ is block-diagonal with 1×1 and 2×2 blocks. Cost: O(d³) + O(d′³).
  2. Change basis: transform weights into the Schur basis, Θ′ = UYᵀ Θ UX. The joint constraint becomes ΣYΘ′ = Θ′ΣX, where ΣX, ΣY are block-diagonal.
  3. Blocks decouple via Schur's lemma: each block Θ′lk is independently constrained by a small Sylvester equation, so no large SVD is needed. Total cost: O(max(d, d′)³).

Complexity: O(5) T(4)

For the 4th-order tensor representation of O(5), d = 625:

Naive SVD: ≈ 6 × 10¹⁶ ops. L ∈ ℝ^{390,625 × 390,625} is infeasible in both memory and compute.
Schur filter: ≈ 2,744 ops. Max irrep block dim = 14 → 14³ = 2,744 ops, tractable in seconds.

Both yield the identical projection matrix.
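The figures quoted for O(5) T(4) can be reproduced with a few lines of arithmetic. The sketch assumes d′ = d = 625, treats the SVD cost as a plain cubic with constants dropped, and adds a float64 storage estimate for L that is not stated on this page.

```python
d = 5 ** 4                       # dimension of the T(4) representation of O(5)
n = d * d                        # side length of L when d' = d
naive_svd_ops = n ** 3           # cubic cost of a dense SVD, constants dropped
dense_bytes = n * n * 8          # float64 storage needed just to hold L
schur_block_ops = 14 ** 3        # cubic cost of the largest irrep block quoted above

print(n)                         # 390625
print(f"{naive_svd_ops:.2e}")    # 5.96e+16
print(dense_bytes / 1e12)        # roughly 1.22 (terabytes)
print(schur_block_ops)           # 2744
```

Under these assumptions the constraint matrix alone would need over a terabyte of memory before any decomposition starts, which is why the block-wise route matters.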

Schur decomposition — block structure

The decomposition dρ = U Σ Uᵀ reveals a block-diagonal Σ. Each 2×2 block has Schur value λ = √(a² + b²):

Σ =
  [  a   b   0   0 ]
  [ −b   a   0   0 ]
  [  0   0   c   d ]
  [  0   0  −d   c ]

S₁ = [[a, b], [−b, a]] with λ = √(a²+b²);  S₂ = [[c, d], [−d, c]] with λ = √(c²+d²).

Real Schur Decomposition

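For any real square matrix M, there exists an orthogonal U such that UᵀMU = Σ is block upper-triangular with 1×1 and 2×2 diagonal blocks. When M is normal (for example skew-symmetric, as the Lie algebra representation of an orthogonal group action is), Σ is block-diagonal. The canonical 2×2 form:

[[a, b], [−b, a]]
Schur values a ± ib,  λ = √(a² + b²)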
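A short check of the canonical block, using only the standard library: the eigenvalues of [[a, b], [−b, a]] follow from its characteristic polynomial, and their modulus is the Schur value λ. The function name is hypothetical, and `b` here is the matrix entry of the block, not the softness parameter.

```python
import cmath
import math

def block_eigenvalues(a: float, b: float):
    """Eigenvalues of [[a, b], [-b, a]] from t^2 - trace*t + det = 0."""
    trace = 2 * a
    det = a * a + b * b
    root = cmath.sqrt(trace * trace - 4 * det)   # sqrt(-4 b^2), purely imaginary
    return (trace + root) / 2, (trace - root) / 2

a, b = 3.0, 4.0                                  # example entries, chosen arbitrarily
ev_plus, ev_minus = block_eigenvalues(a, b)
schur_value = math.hypot(a, b)                   # lambda = sqrt(a^2 + b^2)

print(ev_plus, ev_minus)                         # (3+4j) (3-4j)
print(schur_value)                               # 5.0
```

For a skew-symmetric generator the diagonal entry a is zero, so the Schur value reduces to |b|, the rotation frequency of that block.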

From equivariance to Sylvester equation

The equivariance condition for a linear layer W requires:

dρY(A) · W = W · dρX(A)
↓ Apply Schur: dρX = UXΣXUXᵀ,  dρY = UYΣYUYᵀ
UYΣYUYᵀ · W = W · UXΣXUXᵀ
↓ Let Θ′ = UYᵀ W UX
ΣY · Θ′ = Θ′ · ΣX  ← Sylvester equation
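The change of basis in this derivation can be checked numerically. The sketch below is illustrative only: it builds generators from assumed block-diagonal Schur forms with Schur values 1 and 2 and random orthogonal factors, then compares the constraint residual in the original basis with the residual in the Schur basis for an arbitrary W. Helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n: int) -> np.ndarray:
    """Orthogonal matrix from the QR factorisation of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def rotation_blocks(*freqs: float) -> np.ndarray:
    """Block-diagonal matrix with 2x2 blocks [[0, f], [-f, 0]]."""
    n = 2 * len(freqs)
    S = np.zeros((n, n))
    for i, f in enumerate(freqs):
        S[2 * i, 2 * i + 1] = f
        S[2 * i + 1, 2 * i] = -f
    return S

Sigma_X = rotation_blocks(1.0, 2.0)              # assumed Schur forms
Sigma_Y = rotation_blocks(1.0, 2.0)
U_X, U_Y = random_orthogonal(4), random_orthogonal(4)
d_rho_X = U_X @ Sigma_X @ U_X.T
d_rho_Y = U_Y @ Sigma_Y @ U_Y.T

W = rng.standard_normal((4, 4))                  # arbitrary, not equivariant
Theta = U_Y.T @ W @ U_X                          # weights in the Schur basis

res_original = np.linalg.norm(d_rho_Y @ W - W @ d_rho_X)
res_schur = np.linalg.norm(Sigma_Y @ Theta - Theta @ Sigma_X)
print(res_original, res_schur)
```

The two residuals agree because the orthogonal factors preserve the Frobenius norm, so a weight satisfies the original constraint exactly when its Schur-basis image satisfies the Sylvester equation.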

Schur's Lemma — the block rule

Since ΣX, ΣY are block-diagonal, this decouples into independent Sylvester equations per block pair (Tl, Sk):

Tl ≇ Sk Different Schur values

Θ′lk = 0 — forced to zero. No free parameters.

Tl ≃ Sk Matching Schur values

Θ′lk takes a constrained rotation form — 2 free parameters (α, β) per matching 2×2 pair, or 1 parameter (γ) per matching 1×1.
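The block rule can be checked by counting the free parameters of each small Sylvester equation directly. This is a sketch under assumptions: blocks are taken in the skew-symmetric form [[0, λ], [−λ, 0]], the helper names are hypothetical, and the count is the numerical null-space dimension of the per-block operator.

```python
import numpy as np

def rot_block(lam: float) -> np.ndarray:
    """2x2 Schur block with Schur value |lam| (diagonal entry a = 0)."""
    return np.array([[0.0, lam],
                     [-lam, 0.0]])

def free_parameters(T: np.ndarray, S: np.ndarray, tol: float = 1e-10) -> int:
    """Dimension of {Theta : T Theta = Theta S} for one block pair."""
    m, n = T.shape[0], S.shape[0]
    # Operator acting on the column-major vec(Theta), Theta of shape (m, n).
    op = np.kron(np.eye(n), T) - np.kron(S.T, np.eye(m))
    sigma = np.linalg.svd(op, compute_uv=False)
    return int(np.sum(sigma < tol))

zero_block = np.zeros((1, 1))                                    # 1x1 block, Schur value 0

matching_2x2 = free_parameters(rot_block(1.0), rot_block(1.0))   # same Schur values
mismatched_2x2 = free_parameters(rot_block(1.0), rot_block(2.0)) # different Schur values
mixed_2x1 = free_parameters(rot_block(1.0), zero_block)          # 2x2 against 1x1
matching_1x1 = free_parameters(zero_block, zero_block)

print(matching_2x2, mismatched_2x2, mixed_2x1, matching_1x1)     # 2 0 0 1
```

The counts match the rule stated above: two parameters for a matching 2×2 pair, one for a matching 1×1 pair, and none when the Schur values differ.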

Soft Projection

For blocks with λTl + λSk > b, Schur's lemma is followed strictly. Blocks below b remain unconstrained. This is equivalent to the SVD approach but computed block-by-block.

Block structure visualization

4×4 weight with two input blocks (S₁, S₂) and two output blocks (T₁, T₂):

Θ′ =
              S₁ (Schur values a±ib)      S₂ (Schur values c±id)
  T₁ (a±ib)   [[α₁, β₁], [−β₁, α₁]]       0
  T₂ (c±id)   0                           [[α₂, β₂], [−β₂, α₂]]

T₁ ≃ S₁ → 2 free params.  T₁ ≇ S₂ → forced zero.  T₂ ≇ S₁ → forced zero.  T₂ ≃ S₂ → 2 free params.

Dense 4×4 = 16 params. Equivariant = 4 params (α₁, β₁, α₂, β₂). Block sparsity is read directly from Schur value comparisons — no large SVD.

Problem: Consider a neural network layer processing 3D point cloud data. We want this layer to be equivariant with respect to rotation about the z-axis — if the input point cloud is rotated around z, the output should rotate accordingly. The weight matrix W ∈ ℝ³ˣ³ maps 3D input vectors to 3D output vectors. The z-rotation Lie algebra generator is Az.

SVD approach

Az =

Kronecker constraint L = Azᵀ ⊗ I₃ − I₃ ⊗ Az ∈ ℝ⁹ˣ⁹:

L =

Singular values: {0×3, 1×4, 2×2}. Null space → equivariant form:

Weq = [[α, β, 0], [−β, α, 0], [0, 0, γ]]  (3 free params)

With b = 1.5, the 7 vectors with σ ∈ {0, 1} are retained. Basis matrices V₁–V₇:

V₁–V₃: exactly equivariant (σ=0). V₄–V₇: mildly break equivariance (σ=1), coupling xy-plane to z-axis.
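The singular-value spectrum quoted for this example can be reproduced numerically. The sketch assumes the common sign convention for the z-rotation generator; flipping the sign leaves the singular values unchanged.

```python
import numpy as np

# Assumed form of the z-rotation generator (sign convention is a choice).
A_z = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 0.0]])
I3 = np.eye(3)

# Kronecker constraint for a 3x3 weight with the same action on input and output.
L = np.kron(A_z.T, I3) - np.kron(I3, A_z)

sigma = np.linalg.svd(L, compute_uv=False)
counts = {v: int(np.sum(np.isclose(sigma, v))) for v in (0.0, 1.0, 2.0)}
retained = int(np.sum(sigma < 1.5))      # directions kept at b = 1.5

print(counts)                            # {0.0: 3, 1.0: 4, 2.0: 2}
print(retained)                          # 7
```

Only the two directions with σ = 2 are discarded at b = 1.5, which is why seven basis matrices remain.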

Schur approach (same result, faster)

Az is already in real Schur form (UX=UY=I₃). Blocks:

S₁ = upper-left 2×2 block of Az, λ = 1
S₂ = lower-right 1×1 block [0], λ = 0

Block   Size   λT+λS   T≃S?   Equivariant form
Θ′₁₁    2×2    2       Yes    [[α,β],[−β,α]]
Θ′₁₂    2×1    1       No     0
Θ′₂₁    1×2    1       No     0
Θ′₂₂    1×1    0       Yes    γ (scalar)

Numerical projection (b = 1.5)

Θ′₁₁ has λ=2 ≥ 1.5 → symmetrized. All other blocks λ < 1.5 → unchanged.

Θ =
W =

α=(2+4)/2=3, β=(3+1)/2=2. Top-left 2×2 constrained; rest untouched.

Both SVD and Schur yield the identical projection. Schur avoids forming L — cost O(d³) vs O(d⁶).
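The agreement between the two routes can be checked for this example. In the sketch below, the top-left 2×2 block of the weight is chosen to match the arithmetic α = (2+4)/2 and β = (3+1)/2; the remaining entries, the sign convention of the generator, and the function names are assumptions made for illustration. Since Az is already in Schur form, Θ′ and W coincide.

```python
import numpy as np

A_z = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 0.0]])      # assumed sign convention
b = 1.5

def svd_projection(W, A, b):
    """Project vec(W) onto the singular directions of L with sigma < b."""
    n = A.shape[0]
    L = np.kron(A.T, np.eye(n)) - np.kron(np.eye(n), A)
    U, sigma, _ = np.linalg.svd(L)
    keep = U[:, sigma < b]
    w = keep @ (keep.T @ W.reshape(-1, order="F"))
    return w.reshape(W.shape, order="F")

def schur_projection(Theta):
    """Block-wise rule for this example: only the top-left 2x2 block exceeds b."""
    out = Theta.copy()
    alpha = (Theta[0, 0] + Theta[1, 1]) / 2
    beta = (Theta[0, 1] - Theta[1, 0]) / 2
    out[:2, :2] = [[alpha, beta], [-beta, alpha]]   # symmetrised rotation form
    return out                                      # other blocks are left untouched

Theta = np.array([[ 2.0,  3.0, 0.5],    # entries outside the top-left block are made up
                  [-1.0,  4.0, 1.5],
                  [ 0.7, -0.2, 6.0]])

W_svd = svd_projection(Theta, A_z, b)
W_schur = schur_projection(Theta)
print(W_schur)
```

Both routes give the same matrix here, with the top-left block equal to [[3, 2], [−2, 3]]; the Schur route gets there without ever forming the 9×9 matrix L.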

Setup: 4D input → 4D output

Input and output Lie algebra representations share the same Schur block structure. The full Schur forms are:

ΣX = ΣY = [[a, b, 0, 0], [−b, a, 0, 0], [0, 0, c, d], [0, 0, −d, c]]

Block-diagonal with two 2×2 blocks: S₁ = T₁ (Schur values a±ib) and S₂ = T₂ (Schur values c±id).

Block           Size   Match?   Constraint
Θ′₁₁ (T₁,S₁)    2×2    Yes      Both a±ib → [[α₁,β₁],[−β₁,α₁]]
Θ′₁₂ (T₁,S₂)    2×2    No       a±ib vs c±id → zero
Θ′₂₁ (T₂,S₁)    2×2    No       c±id vs a±ib → zero
Θ′₂₂ (T₂,S₂)    2×2    Yes      Both c±id → [[α₂,β₂],[−β₂,α₂]]

Resulting Θ′ (4×4) — block diagonal

Θ′ =
              S₁ (a±ib)                   S₂ (c±id)
  T₁ (a±ib)   [[α₁, β₁], [−β₁, α₁]]       0 (forced zero)
  T₂ (c±id)   0 (forced zero)             [[α₂, β₂], [−β₂, α₂]]

T₁ ≃ S₁ and T₂ ≃ S₂ each contribute 2 free params. A dense 4×4 has 16 parameters; the block-diagonal equivariant form has only 4: (α₁, β₁, α₂, β₂).

Recovering W from Θ′

After constraining Θ′ to its equivariant form, recover the projected weight in the original basis:

W = UY · Θ′ · UXᵀ

UX, UY are the orthogonal matrices from the Schur decomposition. This basis change is computed once; the resulting W is used directly in the forward pass.
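A small round-trip sketch for this 4D example: with Θ′ = UYᵀ W UX as defined earlier, inverting the orthogonal change of basis gives W = UY Θ′ UXᵀ. The Schur values (1 and 2), the four free parameters, the random orthogonal factors, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_orthogonal(n: int) -> np.ndarray:
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def rot(alpha: float, beta: float) -> np.ndarray:
    """Constrained rotation form [[alpha, beta], [-beta, alpha]]."""
    return np.array([[alpha, beta], [-beta, alpha]])

def block_diag2(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    out = np.zeros((4, 4))
    out[:2, :2], out[2:, 2:] = A, B
    return out

Sigma = block_diag2(rot(0.0, 1.0), rot(0.0, 2.0))      # shared Schur form (assumed values)
U_X, U_Y = random_orthogonal(4), random_orthogonal(4)
d_rho_X = U_X @ Sigma @ U_X.T
d_rho_Y = U_Y @ Sigma @ U_Y.T

Theta_eq = block_diag2(rot(0.3, -1.2), rot(2.0, 0.5))  # 4 free parameters, zeros elsewhere
W = U_Y @ Theta_eq @ U_X.T                              # back to the original basis

residual = np.linalg.norm(d_rho_Y @ W - W @ d_rho_X)
print(residual)
```

The recovered W is dense in the original basis, yet it satisfies the equivariance constraint to machine precision because all of its structure lives in the four parameters of Θ′.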

Interactive Demo

Drag each slider — watch the symmetry change

Each animation sweeps the input image through a full 360° rotation. The top row shows fb(x), the features of the input x. The bottom row shows Rθ⁻¹ fb(Rθ x), the inverse rotation applied to features of the rotated input. The bottom-left error map shows their mean discrepancy.

Rows match exactly → equivariant.  Rows diverge → symmetry broken. Each model has an independent slider — compare how different architectures respond to the same softness level.
Vision Transformer (ViT)
[Interactive figure: ViT final-layer feature maps under 90° rotation, with a softness slider over b = 0.0, 0.2, 0.6, 1.0.]
b = 0.0: Strict equivariance. Feature rows are nearly identical for every rotation angle; error map is near zero.
ResNet
[Interactive figure: ResNet final-layer feature maps under 90° rotation, with a softness slider over b = 0.0, 0.2, 0.6, 1.0.]
b = 0.0: Strict equivariance. Feature rows are nearly identical for every rotation angle; error map is near zero.
DINOv3-ViT
[Interactive figure: DINOv3 final-layer feature maps under 90° rotation, with a softness slider over b = 0.0, 0.2, 0.6, 1.0.]
b = 0.0: Strict equivariance. Feature rows match across all rotation angles; error map is uniformly black.
Beyond 2D

Examples: SO(3) and O(5)

The weight projection framework extends naturally to equivariant MLPs for scientific computing. Swapping in the group-specific Lie algebra generators (or forward differences for discrete groups) is all that changes.

Quick Start

Wrap any pretrained model in three lines

Specify the symmetry group and the control value b. No architecture changes, no new parameters, no modifications to the training objective.

Python github.com/ashiq24/soft-equivariance
from standalone.vit_soft_equivariance_standalone import monkeypatch_vitembeddings
from standalone.resnet_soft_equivariance_standalone import convert_cnn_to_filtered

filter_config = {
  "n_rotations": 4,
  "soft_thresholding": 0.2,
  "soft_thresholding_pos": 0.2,
  "group_type": "rotation",
}

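# For ViT embeddings (`model` is assumed to be an already-loaded pretrained
# model that exposes `model.vit.embeddings`)
monkeypatch_vitembeddings(model.vit.embeddings, filter_config)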

# For CNNs
# convert_cnn_to_filtered(model, filter_config)

Standalone single-file demos are in standalone/. Group-specific notebooks with full derivations are in notebooks/. In code the control parameter is named softness and corresponds to b here. See the repo for training scripts and filter factory docs.

Citation

BibTeX

If this work is useful to your research, please cite:

@InProceedings{rahman2026tunable,
  author    = {Rahman, Md Ashiqur and Hao, Lim Jun and Jiang, Jeremiah
               and Lim, Teck-Yian and Yeh, Raymond A},
  title     = {Tunable Soft Equivariance with Guarantee},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer
               Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}