Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Md Ashiqur Rahman¹, Chiao-An Yang¹, Michael N. Cheng¹, Lim Jun Hao², Jeremiah Jiang², Teck-Yian Lim², Raymond A. Yeh¹
¹ Purdue University ² DSO National Laboratories
Global vs. Local Scaling
🌍 Global Scaling
Uniformly resizes the entire image. Widely studied, but rarely encountered in isolation.

🎯 Local Scaling
Different objects change size independently. In this example, only the bird changes size; everything else remains the same.

Key Contributions
Local Scale Equivariance
We model realistic, spatially-varying scale changes rather than uniform global scaling.
Monotone Scaling Group
We approximate local scaling using invertible monotonic functions that form a transformation group.
Deep Equilibrium Canonicalizer (DEC)
We introduce a novel canonicalizer using deep equilibrium models instead of explicit optimization.
Plug-and-Play Boost
Our method improves scale robustness and accuracy across models with minimal overhead.
Monotone Scaling: A Clean Approximation
1D Intuition: Stretching a Rubber Band
Consider a function \( f(x) \) on \([0,1]\). Global scaling rescales the domain uniformly:
\( R_a[f](x) = f(a^{-1} x) \)
To capture non-uniform stretch, we introduce a strictly increasing warp \( l \) and define:
\( S(f; l)(x) = f(l^{-1}(x)) \)
Intuition: imagine a rubber band. Instead of pulling both ends evenly, you stretch some segments more than others, but never fold it. The derivative \( \frac{dl}{dx} \) gives the local rate of change: values above 1 stretch the signal locally, and values below 1 compress it.
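The 1D definition above can be sketched numerically. This is a minimal illustration with NumPy, not the authors' implementation: the function name `monotone_warp_1d`, the sample signal, and the particular warp \( l(x) = \tfrac{1}{2}(x + x^2) \) are all assumptions chosen for the example. Because \( l \) is strictly increasing, its inverse can be obtained by interpolating with the roles of input and output swapped.

```python
import numpy as np

def monotone_warp_1d(f_samples, l_samples, x):
    """Approximate S(f; l)(x) = f(l^{-1}(x)) on a grid x over [0, 1].

    f_samples: values of f on x.
    l_samples: values of a strictly increasing warp l on x, with l(0)=0, l(1)=1.
    """
    if np.any(np.diff(l_samples) <= 0):
        raise ValueError("l must be strictly increasing (no folding)")
    # l is strictly increasing, so l^{-1} is linear interpolation with axes swapped.
    l_inv = np.interp(x, l_samples, x)
    # Evaluate f at the pulled-back coordinates.
    return np.interp(l_inv, x, f_samples)

x = np.linspace(0.0, 1.0, 201)
f = np.sin(2 * np.pi * x)          # illustrative signal (assumption)
l = 0.5 * (x + x**2)               # illustrative warp (assumption): dl/dx = 0.5 + x > 0

warped = monotone_warp_1d(f, l, x)
local_stretch = np.gradient(l, x)  # numerical dl/dx: <1 compresses, >1 stretches
```

With this warp, the left half of the signal is compressed (derivative below 1) and the right half is stretched (derivative above 1), while the endpoints stay fixed.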


2D Extension: Warping an Image Grid
We extend 1D monotone scaling to 2D images \( I : [0,1]^2 \to \mathbb{R} \) by applying a smooth, invertible warp \( l(x, y) = (l_X(x, y), l_Y(x, y)) \):
\( S(I; l)(x, y) = I(l^{-1}(x, y)) \)
Here, \( l_X \) controls local horizontal scaling and \( l_Y \) controls vertical scaling. The result is a gentle deformation of the image where pixel order is preserved: grid lines bend smoothly, but never cross.
Unlike rigid global rescaling, this allows different parts of the image to stretch or compress independently, mimicking realistic viewpoint and depth-based distortions.
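As a rough sketch of the 2D case, the code below handles only a simplified, separable warp \( l(x, y) = (l_X(x), l_Y(y)) \), where each coordinate is warped by its own 1D monotone function. This restriction is an assumption made to keep the example short; the general warp in the text lets \( l_X \) and \( l_Y \) depend on both coordinates. The helper names `inverse_monotone` and `separable_warp_2d` and the test image are hypothetical.

```python
import numpy as np

def inverse_monotone(l_samples, grid):
    """Invert a strictly increasing 1D warp sampled on `grid`."""
    if np.any(np.diff(l_samples) <= 0):
        raise ValueError("warp must be strictly increasing")
    return np.interp(grid, l_samples, grid)

def separable_warp_2d(image, l_x, l_y):
    """Approximate S(I; l)(x, y) = I(l_X^{-1}(x), l_Y^{-1}(y)).

    image: array of shape (H, W) sampled on uniform grids over [0, 1].
    l_x:   warp values on the W horizontal grid points.
    l_y:   warp values on the H vertical grid points.
    """
    h, w = image.shape
    xs = np.linspace(0.0, 1.0, w)
    ys = np.linspace(0.0, 1.0, h)
    src_x = inverse_monotone(l_x, xs)  # where each output column samples from
    src_y = inverse_monotone(l_y, ys)  # where each output row samples from
    # Resample along x for every row, then along y for every column.
    rows = np.stack([np.interp(src_x, xs, row) for row in image])
    return np.stack([np.interp(src_y, ys, col) for col in rows.T], axis=1)

h, w = 64, 96
xs = np.linspace(0.0, 1.0, w)
ys = np.linspace(0.0, 1.0, h)
image = np.outer(ys, xs)           # illustrative image I(x, y) = x * y (assumption)
l_x = 0.5 * (xs + xs**2)           # illustrative horizontal warp (assumption)
l_y = 0.5 * (ys + ys**3)           # illustrative vertical warp (assumption)

warped = separable_warp_2d(image, l_x, l_y)
```

A separable warp has a diagonal Jacobian with positive entries, so grid lines stay axis-aligned and never cross; bending grid lines would require the general, non-separable form.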
Theorem. The set of such 2D monotone maps with commuting, symmetric positive-definite (SPD) Jacobians forms a group under composition. That is, transformations can be composed (closure) and undone (invertibility), and the set includes an identity (no change), which makes it a natural fit for equivariant modeling.
The Jacobian \( J_l(x, y) \) describes how the grid is scaled locally. If each local scaling stretches smoothly and consistently (i.e., the Jacobians are SPD and commute), then stacking such scalings remains valid and invertible.
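A short worked step shows why the commuting condition matters for closure. It uses only the chain rule and a standard linear-algebra fact; the symbols \( l_1, l_2, A, B \) are introduced here for illustration, and this covers only the pointwise Jacobian condition, not the full proof. For two warps \( l_1 \) and \( l_2 \):
\[ J_{l_1 \circ l_2}(x, y) = J_{l_1}\big(l_2(x, y)\big) \, J_{l_2}(x, y) = A B . \]
If \( A \) and \( B \) are SPD and commute, then
\[ (AB)^\top = B^\top A^\top = B A = A B , \]
so \( AB \) is symmetric. Commuting symmetric matrices share an eigenbasis, so each eigenvalue of \( AB \) is a product of a positive eigenvalue of \( A \) and a positive eigenvalue of \( B \), hence positive. The composed Jacobian is therefore again SPD. Without commutativity, \( AB \) is in general not even symmetric.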
Analogy: Imagine your image as an elastic fabric. Monotone scaling is like pulling this fabric gently in different directions without tearing or folding it. No matter how many times you apply such a stretch, the fabric remains smooth and untangled.


Deep Equilibrium Canonicalizer (DEC)
To handle local scaling equivariantly, we seek a canonical representation of features under spatial warps. Rather than solving an optimization per input, we use a Deep Equilibrium Model to find a fixed point:
\( \Phi_k = g_\theta(F_k, \Phi_k) \)
This fixed point approximates the solution to an energy minimization: \[ \Phi_k \approx \arg\min_\Phi \mathcal{E}(F_k; \Phi) \] where \( \Phi_k \) is a warp field that brings \( F_k \) into a canonical, scale-normalized frame.
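To make the fixed-point condition \( \Phi_k = g_\theta(F_k, \Phi_k) \) concrete, here is a toy sketch using plain fixed-point iteration. Everything specific in it is an assumption for illustration: the map `g_theta` is a small random `tanh` layer rather than the authors' network, the weights `W` and `U` are hypothetical, and practical deep equilibrium models typically use faster root-finding solvers than naive iteration. Scaling `W` to spectral norm 0.5 makes the map a contraction in \( \Phi \), which guarantees convergence in this toy setting.

```python
import numpy as np

def solve_fixed_point(g, feat, phi0, tol=1e-8, max_iter=500):
    """Iterate phi <- g(feat, phi) until successive iterates differ by < tol."""
    phi = phi0
    for i in range(max_iter):
        phi_next = g(feat, phi)
        if np.linalg.norm(phi_next - phi) < tol:
            return phi_next, i + 1
        phi = phi_next
    return phi, max_iter

rng = np.random.default_rng(0)
d = 8                                   # toy dimension (assumption)
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)         # spectral norm 0.5 => contraction in phi
U = rng.standard_normal((d, d))

def g_theta(feat, phi):
    """Hypothetical stand-in for the learned update g_theta(F_k, Phi_k)."""
    return np.tanh(W @ phi + U @ feat)

feat = rng.standard_normal(d)           # stand-in for a feature F_k
phi_star, n_iter = solve_fixed_point(g_theta, feat, np.zeros(d))
residual = np.linalg.norm(phi_star - g_theta(feat, phi_star))
```

At convergence, `residual` is close to zero, meaning `phi_star` satisfies the fixed-point equation up to the tolerance, and the result does not depend on the starting point because the map is a contraction.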
The network then performs its usual processing on the canonicalized feature \( \hat{F}_k = S^{-1}(F_k; \Phi_k) \), and re-applies \( \Phi_k \) afterward to preserve equivariance downstream.
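The canonicalize, process, re-warp pattern described above can be sketched in 1D. This is an illustration under strong assumptions, not the paper's pipeline: the warp `phi` is given directly (an oracle) instead of being predicted by DEC, the "network" is a hypothetical moving-average `layer`, and the function names are made up for the example. The point is that a layer which is not warp-equivariant on its own behaves equivariantly once it is wrapped this way.

```python
import numpy as np

def warp(f, l, x):
    """S(f; l)(x) = f(l^{-1}(x)) for a strictly increasing l sampled on x."""
    l_inv = np.interp(x, l, x)
    return np.interp(l_inv, x, f)

def unwarp(f, l, x):
    """S^{-1}(f; l)(x) = f(l(x)), which undoes warp(., l, x)."""
    return np.interp(l, x, f)

def layer(f):
    """Hypothetical stand-in for the network's usual processing."""
    kernel = np.ones(9) / 9.0
    return np.convolve(f, kernel, mode="same")

def canonicalized_layer(feat, phi, x):
    canon = unwarp(feat, phi, x)   # bring the feature into the canonical frame
    out = layer(canon)             # ordinary processing in that frame
    return warp(out, phi, x)       # re-apply the warp afterward

x = np.linspace(0.0, 1.0, 401)
base = np.sin(2 * np.pi * x)       # canonical signal (assumption)
phi = 0.5 * (x + x**2)             # oracle warp (assumption)
observed = warp(base, phi, x)      # locally scaled input

result = canonicalized_layer(observed, phi, x)
expected = warp(layer(base), phi, x)
max_gap = np.max(np.abs(result - expected))
```

Here `max_gap` stays small (limited by interpolation error), showing that processing the warped input gives approximately the warped version of processing the canonical input.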

Qualitative Results


For full quantitative results, see our ICCV 2025 paper.