autocast.encoders.dc


class DCEncoder(in_channels, out_channels, hid_channels=(64, 128, 256), hid_blocks=(3, 3, 3), kernel_size=3, stride=2, pixel_shuffle=True, norm='layer', attention_heads=None, ffn_factor=1, spatial=2, patch_size=1, periodic=False, dropout=None, checkpointing=False, identity_init=True, ffn_out_scale=None, saturation=None, saturation_scale=5.0)

Bases: EncoderWithCond

Deep Compressed (DC) encoder module.

Progressively downsamples the input to a latent representation using residual blocks with optional attention.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of output (latent) channels.
hid_channels (Sequence[int]) – Number of channels at each depth level.
hid_blocks (Sequence[int]) – Number of residual blocks at each depth level.
kernel_size (int | Sequence[int]) – Kernel size for convolutions.
stride (int | Sequence[int]) – Stride for downsampling operations.
pixel_shuffle (bool) – Whether to use pixel shuffling (patchify) for downsampling.
norm (str) – Type of normalization (‘layer’ or ‘group’).
attention_heads (dict[int, int] | None) – Dict mapping depth index to number of attention heads.
ffn_factor (int) – Channel expansion factor in FFN blocks.
spatial (int) – Number of spatial dimensions (2 for 2D, 3 for 3D).
patch_size (int | Sequence[int]) – Patch size for patchifying at the start.
periodic (bool) – Whether spatial dimensions are periodic (use circular padding).
dropout (float | None) – Dropout rate.
checkpointing (bool) – Whether to use gradient checkpointing.
identity_init (bool) – Initialize down/upsampling convolutions as identity.
ffn_out_scale (float | None) – Optional multiplicative scale applied to each ResBlock FFN output conv.
saturation (str | None) – Optional latent saturation mode. Supported: {“softclip2”, “softclip”, “tanh”, “arcsinh”, “rmsnorm”}.
saturation_scale (float) – Saturation scale B used by soft clipping/tanh variants.
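
The saturation modes listed above bound or compress latent magnitudes. The following is a minimal scalar sketch of what such functions commonly look like. The formulas are assumptions based on the usual definitions of these names, not taken from the library's source, and `saturate` and `rms_normalize` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import Optional, Sequence


def saturate(x: float, mode: Optional[str] = None, scale: float = 5.0) -> float:
    """Apply an assumed scalar saturation; `scale` plays the role of B."""
    if mode is None:
        return x  # no saturation
    if mode == "softclip":
        # Assumed form: x / (1 + |x| / B), bounded in (-B, B).
        return x / (1.0 + abs(x) / scale)
    if mode == "softclip2":
        # Assumed form: x / sqrt(1 + (x / B)^2), also bounded by B in magnitude.
        return x / math.sqrt(1.0 + (x / scale) ** 2)
    if mode == "tanh":
        # Assumed form: B * tanh(x / B).
        return scale * math.tanh(x / scale)
    if mode == "arcsinh":
        # Logarithmic compression; unbounded and independent of B.
        return math.asinh(x)
    raise ValueError(f"unsupported saturation mode: {mode!r}")


def rms_normalize(values: Sequence[float], eps: float = 1e-6) -> list:
    """Assumed 'rmsnorm' behaviour: divide a vector by its root mean square."""
    rms = math.sqrt(sum(v * v for v in values) / len(values) + eps)
    return [v / rms for v in values]


print(saturate(5.0, "softclip", 5.0))  # 5 / (1 + 1) = 2.5
print(saturate(100.0, "tanh", 5.0))    # close to the bound B = 5
```

Under these assumed forms, every mode except "arcsinh" and "rmsnorm" is approximately the identity near zero and approaches ±B for large inputs, which is consistent with saturation_scale being described as applying only to the soft clipping and tanh variants.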

Note

Based on the implementations from:

- Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models (Chen et al., 2024), https://arxiv.org/abs/2410.10733v1
- Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation (Rozet et al., 2025), https://arxiv.org/abs/2507.02608, PolymathicAI/lola

channel_axis: int = -1

encoder_model: Module

encode(batch)

Encode input batch to latent representation.

Parameters: batch (Batch) – Input batch containing input_fields with shape (B, T, spatial..., C_i).
Returns: Encoded latent tensor with shape (B, T, spatial_reduced..., C_o).
Return type: Float[Tensor, 'batch time spatial *spatial channel']
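
To make the shape contract concrete, the sketch below estimates the latent shape from an input shape. It rests on assumptions this page does not confirm: that one strided downsampling sits between consecutive depth levels (so `num_levels - 1` in total), that the initial patchify divides each spatial axis by `patch_size`, and that `stride` and `patch_size` are single integers shared by all axes. `latent_shape` is a hypothetical helper, not part of the library.

```python
from typing import Sequence, Tuple


def latent_shape(
    input_shape: Sequence[int],
    out_channels: int,
    num_levels: int = 3,   # len(hid_channels) in the constructor
    stride: int = 2,
    patch_size: int = 1,
) -> Tuple[int, ...]:
    """Estimate (B, T, spatial_reduced..., C_o) from (B, T, spatial..., C_i)."""
    batch, time, *spatial, _in_channels = input_shape
    # Assumed total reduction per spatial axis: patchify, then one strided
    # downsampling between each pair of consecutive depth levels.
    factor = patch_size * stride ** (num_levels - 1)
    reduced = []
    for size in spatial:
        if size % factor != 0:
            raise ValueError(f"spatial size {size} is not divisible by {factor}")
        reduced.append(size // factor)
    return (batch, time, *reduced, out_channels)


# 2D fields of 64 x 64 with 3 input channels, default depth and stride:
print(latent_shape((8, 4, 64, 64, 3), out_channels=16))  # (8, 4, 16, 16, 16)
```

Under these assumptions the default configuration reduces each spatial axis by a factor of 4, while the batch and time axes pass through unchanged and the channel axis stays last, matching channel_axis = -1.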

encode_tensor(x)

Forward pass through encoder (for direct tensor input).

Parameters: x (Float[Tensor, 'batch time spatial *spatial channel']) – Input tensor with shape (B, T, spatial..., C_i).
Returns: Encoded latent tensor.
Return type: Float[Tensor, 'batch time spatial *spatial channel']
