# Configuration

ESA OpenSR relies on YAML files to control every aspect of the training pipeline. This page documents the available keys and how they influence the underlying code. Use `opensr_srgan/configs/config_20m.yaml` and `opensr_srgan/configs/config_10m.yaml` as starting points.
## File structure

A typical configuration contains the following top-level sections:
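The exact key layout lives in the bundled configs; the skeleton below is a rough sketch assembled from the sections documented on this page (the optimiser, scheduler, and logging section names are not spelled out here, so confirm them against `config_10m.yaml`):

```yaml
Data:           # dataloaders and normalisation (see "Data")
Model:          # band count and checkpoint handling (see "Model")
Training:       # warm-up, EMA, and loss weights (see "Training")
Generator:      # generator architecture (see "Generator")
Discriminator:  # discriminator architecture (see "Discriminator")
# Optimiser, scheduler, and logging keys are documented further below;
# check the bundled configs for their exact top-level section names.
```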
Each section maps directly to parameters consumed inside `opensr_srgan/model/SRGAN.py`, the dataset factory, or the training script.
## Data

| Key | Default | Description |
|---|---|---|
| `train_batch_size` | 12 | Mini-batch size for the training dataloader. Falls back to `batch_size` if that key is set. |
| `val_batch_size` | 8 | Batch size for the validation dataloader. |
| `num_workers` | 6 | Number of worker processes for both dataloaders. |
| `prefetch_factor` | 2 | Additional batches prefetched by each worker. Ignored when `num_workers == 0`. |
| `dataset_type` | `ExampleDataset` | Dataset selector consumed by `opensr_srgan.data.dataset_selector.select_dataset`. |
| `normalization` | `sen2_stretch` | Normalisation strategy applied to input tensors. Accepts a string alias or a mapping (see below). |
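Put together, a `Data` block using the defaults above looks like the following sketch (adjust batch sizes and workers to your hardware):

```yaml
Data:
  train_batch_size: 12
  val_batch_size: 8
  num_workers: 6
  prefetch_factor: 2
  dataset_type: ExampleDataset
  normalization: sen2_stretch
```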
### Normalization policies

The `opensr_srgan.data.utils.normalizer.Normalizer` class centralises all normalisation logic. Pick one of the built-in aliases that matches your data:
| Method | Description |
|---|---|
| `sen2_stretch` | Multiply by 10/3 for a light Sentinel-2 contrast stretch. |
| `normalise_10k` / `reflectance` | Scale Sentinel-2-style 0–10000 reflectance values to [0, 1]. |
| `normalise_10k_signed` / `reflectance_signed` | Scale 0–10000 reflectance to [-1, 1] (`/ 5000 - 1`). |
| `normalise_s2` | Symmetric Sentinel-2 stretch used during training (maps to [-1, 1] and back). |
| `zero_one` | Clamp incoming values to [0, 1] without otherwise changing them. |
| `zero_one_signed` | Convert [0, 1] inputs to [-1, 1] via the common `tensor * 2 - 1` rule. |
| `identity` / `none` | Leave tensors unchanged (use when data is already normalised). |
Aliases such as `reflectance`, `sentinel2`, or `zero_to_one` map to the canonical entries above. Call `opensr_srgan.data.utils.normalizer.Normalizer.available_methods` to inspect the current list programmatically.
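In the simplest case the policy is just a string under `Data.normalization`; for example, switching to signed reflectance scaling only requires swapping the alias:

```yaml
Data:
  # any alias from the table above, e.g. sen2_stretch, normalise_10k, identity
  normalization: normalise_10k_signed
```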
### Using custom callables

When you need a bespoke policy, provide a mapping instead of a string. The normaliser will import and wrap your functions:
```yaml
Data:
  normalization:
    name: custom
    normalize: my_package.normalization:scale_to_unit
    denormalize: my_package.normalization:unit_to_scale
    # Optional keyword arguments applied when calling the functions
    normalize_kwargs:
      clip: true
```
The callables receive a single `torch.Tensor` argument and must return a tensor. If you need to reuse the same function for both directions (for example `opensr_srgan.utils.radiometrics.normalise_10k`), add `normalize_kwargs` / `denormalize_kwargs` with the appropriate `stage` parameter.
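For instance, reusing `opensr_srgan.utils.radiometrics.normalise_10k` in both directions could look like the sketch below; the exact values accepted by `stage` are an assumption here, so check the function's signature before copying:

```yaml
Data:
  normalization:
    name: custom
    normalize: opensr_srgan.utils.radiometrics:normalise_10k
    denormalize: opensr_srgan.utils.radiometrics:normalise_10k
    normalize_kwargs:
      stage: norm        # assumed value; verify against the function
    denormalize_kwargs:
      stage: denorm      # assumed value; verify against the function
```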
## Model

| Key | Default | Description |
|---|---|---|
| `in_bands` | 6 | Number of input channels expected by the generator and discriminator. |
| `continue_training` | `False` | Path to a Lightning checkpoint for resuming training (`Trainer.fit(resume_from_checkpoint=...)`). |
| `load_checkpoint` | `False` | Path to a checkpoint used solely for weight initialisation (no training state restored). |
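A minimal `Model` block with the documented defaults:

```yaml
Model:
  in_bands: 6
  continue_training: False   # or a path to a Lightning checkpoint to resume from
  load_checkpoint: False     # or a path used only to initialise weights
```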
## Training

### Warm-up and adversarial scheduling

| Key | Default | Description |
|---|---|---|
| `pretrain_g_only` | `True` | Enable generator-only warm-up before adversarial updates. |
| `g_pretrain_steps` | 10000 | Number of optimiser steps spent in the warm-up phase. |
| `adv_loss_ramp_steps` | 5000 | Duration of the adversarial weight ramp after the warm-up. |
| `label_smoothing` | `True` | Replaces the target value 1.0 with 0.9 for real examples to stabilise discriminator training. |
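Assuming these keys sit directly under `Training` (as the `Training.EMA` and `Training.Losses` sub-blocks below suggest), a warm-up sketch with the defaults looks like this:

```yaml
Training:
  pretrain_g_only: True
  g_pretrain_steps: 10000
  adv_loss_ramp_steps: 5000
  label_smoothing: True
```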
### Generator EMA (`Training.EMA`)

Maintaining an exponential moving average (EMA) of the generator smooths out sharp weight updates and usually yields sharper yet more stable validation imagery. The EMA is fully optional and controlled through the `Training.EMA` block:
| Key | Default | Description |
|---|---|---|
| `enabled` | `False` | Turns EMA tracking on/off. When enabled, the EMA weights automatically replace the live generator during evaluation/inference. |
| `decay` | 0.999 | Smoothing factor applied at every update. Values closer to 1.0 retain a longer history. |
| `update_after_step` | 0 | Defers EMA updates until the given optimiser step. Useful when you want the generator to warm up before tracking. |
| `device` | `null` | Stores EMA weights on a dedicated device (`"cpu"`, `"cuda:1"`, …). `null` keeps the weights on the same device as the generator. |
| `use_num_updates` | `True` | Enables PyTorch's bias correction so the EMA ramps in smoothly during the first few updates. |
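For example, enabling EMA tracking with the documented defaults:

```yaml
Training:
  EMA:
    enabled: True
    decay: 0.999
    update_after_step: 0
    device: null          # keep the EMA weights on the generator's device
    use_num_updates: True
```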
### Generator content loss (`Training.Losses`)

| Key | Default | Description |
|---|---|---|
| `adv_loss_beta` | 1e-3 | Target weight applied to the adversarial term after ramp-up. |
| `adv_loss_schedule` | `cosine` | Ramp shape (`linear` or `cosine`). |
| `adv_loss_type` | `bce` | Adversarial objective (`bce` for classic SRGAN logits, `wasserstein` for a non-saturating critic-style loss). |
| `r1_gamma` | 0.0 | Strength of the R1 gradient penalty applied to real images (useful with Wasserstein critics). |
| `l1_weight` | 1.0 | Weight of the pixelwise L1 loss. |
| `sam_weight` | 0.05 | Weight of the spectral angle mapper (SAM) loss. |
| `perceptual_weight` | 0.1 | Weight of the perceptual feature loss. |
| `perceptual_metric` | `vgg` | Backbone used for perceptual features (`vgg` or `lpips`). |
| `tv_weight` | 0.0 | Total variation regularisation strength. |
| `max_val` | 1.0 | Peak value assumed by PSNR/SSIM computations. |
| `ssim_win` | 11 | Window size for SSIM metrics. Must be an odd integer. |
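A `Training.Losses` block spelled out with the defaults from the table:

```yaml
Training:
  Losses:
    adv_loss_beta: 1e-3
    adv_loss_schedule: cosine
    adv_loss_type: bce
    r1_gamma: 0.0
    l1_weight: 1.0
    sam_weight: 0.05
    perceptual_weight: 0.1
    perceptual_metric: vgg
    tv_weight: 0.0
    max_val: 1.0
    ssim_win: 11
```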
## Generator

| Key | Default | Description |
|---|---|---|
| `model_type` | `SRResNet` | Generator family (`SRResNet`, `stochastic_gan`, or `esrgan`). |
| `block_type` | `standard` | SRResNet variant (`standard`, `res`, `rcab`, `rrdb`, `lka`). Ignored for `stochastic_gan`/`esrgan`. |
| `large_kernel_size` | 9 | Kernel size for the input/output convolution layers. |
| `small_kernel_size` | 3 | Kernel size for residual/attention blocks. |
| `n_channels` | 96 | Base number of feature channels (RRDB/ESRGAN trunk width). |
| `n_blocks` | 32 | Number of residual/attention blocks (RRDB count when `model_type: esrgan`). |
| `scaling_factor` | 8 | Super-resolution scale factor (2, 4, 8, ...). |
| `growth_channels` | 32 | ESRGAN-only: growth channels inside each RRDB block. |
| `res_scale` | 0.2 | Residual scaling used by the stochastic/ESRGAN variants. |
| `out_channels` | `Model.in_bands` | ESRGAN-only: override for the number of output bands. |
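For example, an 8× SRResNet generator with attention (RCAB) blocks, built only from the keys above:

```yaml
Generator:
  model_type: SRResNet
  block_type: rcab
  large_kernel_size: 9
  small_kernel_size: 3
  n_channels: 96
  n_blocks: 32
  scaling_factor: 8
```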
## Discriminator

| Key | Default | Description |
|---|---|---|
| `model_type` | `standard` | Discriminator architecture (`standard`, `patchgan`, or `esrgan`). |
| `n_blocks` | 8 | Number of convolutional blocks. PatchGAN defaults to 3 when unspecified (ignored by `esrgan`). |
| `base_channels` | 64 | ESRGAN-only: base number of feature maps. |
| `linear_size` | 1024 | ESRGAN-only: hidden dimension of the fully connected head. |
| `use_spectral_norm` | `False` | Apply spectral normalisation to the SRGAN discriminator layers for improved Lipschitz control. |
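A matching `Discriminator` block with the documented defaults:

```yaml
Discriminator:
  model_type: standard
  n_blocks: 8
  use_spectral_norm: False
```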
## Suggested settings

### Generator presets

The defaults in the YAML configs intentionally balance stability and fidelity for Sentinel-2 data. Start here before performing sweeps:

- Keep `n_channels` around 96 for residual-style backbones so feature widths match the initial convolution used by the flexible generator factory.
- Depth drives detail. Begin with `n_blocks = 32` for flexible variants and reduce to 16 when training budgets are tight or when using the conditional generator, which already injects stochasticity via latent noise.
- Set `scaling_factor` according to your target resolution (2×/4×/8×); all bundled generators support those values out of the box.
| Generator type | Recommended `n_channels` | Recommended `n_blocks` | Typical `scaling_factor` | Notes |
|---|---|---|---|---|
| SRResNet (`block_type: standard`) | 64 | 16 | 4× | Canonical baseline with batch-norm residual blocks; scale can be 2×/4×/8× as needed. |
| SRResNet (`block_type: res`) | 96 | 32 | 4×–8× | Lightweight residual blocks without batch norm; works well for high-scale (8×) Sentinel data. |
| SRResNet (`block_type: rcab`) | 96 | 32 | 4×–8× | Attention-enhanced residual blocks; keep depth high to exploit channel attention. |
| SRResNet (`block_type: rrdb`) | 96 | 32 | 4×–8× | Dense residual blocks expand the receptive field; expect higher VRAM use at 32 blocks. |
| SRResNet (`block_type: lka`) | 96 | 24–32 | 4×–8× | Large-kernel attention blocks stabilise at moderate depth; drop to 24 blocks if memory bound. |
| `stochastic_gan` | 96 | 16 | 4× | Latent-modulated residual stack; pair with the `noise_dim ≈ 128` and `res_scale ≈ 0.2` defaults. |
| `esrgan` | 64 | 23 | 4× | ESRGAN-style RRDB trunk; tune `growth_channels` (typically 32) and keep `res_scale ≈ 0.2` for stability. |
### Discriminator presets
Tune discriminator depth to match the generator capacity—too shallow and adversarial loss underfits, too deep and the training loop destabilises. These starting points mirror the architectures bundled with the repo:
| Discriminator type | Recommended depth parameter | Additional notes |
|---|---|---|
| `standard` | `n_blocks = 8` | Mirrors the original SRGAN CNN with alternating stride-1/stride-2 blocks before the dense head. |
| `patchgan` | `n_blocks = 3` | Maps to the 3-layer PatchGAN (a.k.a. `n_layers`); increase to 4–5 for larger crops or when the generator is particularly sharp. |
| `esrgan` | `base_channels = 64`, `linear_size = 1024` | Deep VGG-style discriminator from ESRGAN; keep the base width aligned with the generator feature count. |
When adjusting these presets, scale generator and discriminator together and monitor adversarial loss ramps defined in Training.Losses to keep training stable.
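As a concrete example, the ESRGAN rows from the two preset tables translate into a matched pair roughly like this sketch:

```yaml
Generator:
  model_type: esrgan
  n_channels: 64
  n_blocks: 23          # RRDB count
  growth_channels: 32
  res_scale: 0.2
  scaling_factor: 4

Discriminator:
  model_type: esrgan
  base_channels: 64
  linear_size: 1024
```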
Note
When you pick model_type: esrgan or stochastic_gan, SRResNet-only keys such as block_type, large_kernel_size, or small_kernel_size are automatically ignored. The model factory prints a console notice so you know which settings were overridden.
## Optimisers
The trainer instantiates independent Adam optimisers for the generator and discriminator and enables a Two-Time-Scale Update Rule (TTUR) setup by default. The discriminator learning rate automatically defaults to a slower schedule than the generator, which keeps adversarial updates balanced without extra configuration.
| Key | Default | Description |
|---|---|---|
| `optim_g_lr` | 1e-4 | Learning rate for the generator Adam optimiser. |
| `optim_d_lr` | `0.5 * optim_g_lr` | Learning rate for the discriminator. Falls back to half of the generator LR (TTUR) when not explicitly set. |
| `betas` | (0.0, 0.99) | GAN-friendly Adam momentum pair that favours a fast response from the second-moment term while removing generator bias from the first moment. |
| `eps` | 1e-7 | Lower epsilon that matches common GAN recipes and prevents plateau-induced numerical noise. |
| `weight_decay_g` | 0.0 | Weight decay applied to generator parameters that are not normalisation affine/bias terms. |
| `weight_decay_d` | 0.0 | Weight decay applied to discriminator parameters that are not normalisation affine/bias terms. |
| `gradient_clip_val` | 0.0 | Global gradient-norm clipping threshold applied to both optimisers (set to 0 to disable). |
Weight decay exclusions are handled automatically: batch/instance/group-norm layers and bias parameters are filtered into a no-decay group so regularisation only touches convolutional kernels and dense weights. This mirrors best practices for GAN training and keeps normalisation statistics stable.
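The keys above map onto a block roughly like the following sketch; the top-level section name is taken from this page's heading, so confirm the exact spelling against the bundled configs:

```yaml
Optimisers:               # exact section name: see config_20m.yaml / config_10m.yaml
  optim_g_lr: 1e-4
  # optim_d_lr left unset -> defaults to 0.5 * optim_g_lr (TTUR)
  betas: [0.0, 0.99]
  eps: 1e-7
  weight_decay_g: 0.0
  weight_decay_d: 0.0
  gradient_clip_val: 0.0
```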
## Schedulers

Both optimisers share the same configuration keys because they use `torch.optim.lr_scheduler.ReduceLROnPlateau`.
| Key | Default | Description |
|---|---|---|
| `metric` | `val_metrics/l1` | Validation metric monitored for plateau detection. |
| `metric_g` | — | Optional override for the generator scheduler monitor. |
| `metric_d` | — | Optional override for the discriminator scheduler monitor. |
| `patience_g` | 100 | Epochs with no improvement before reducing the generator LR. |
| `patience_d` | 100 | Epochs with no improvement before reducing the discriminator LR. |
| `factor_g` | 0.5 | Multiplicative factor applied to the generator LR upon plateau. |
| `factor_d` | 0.5 | Multiplicative factor applied to the discriminator LR upon plateau. |
| `cooldown` | 0 | Number of epochs to wait after an LR drop before resuming plateau checks. |
| `min_lr` | 1e-7 | Minimum learning rate allowed for both schedulers. |
| `verbose` | `True` | Enables scheduler logging messages. |
| `g_warmup_steps` | 2000 | Number of optimiser steps used for generator LR warmup. Set to 0 to disable. |
| `g_warmup_type` | `cosine` | Warmup curve for the generator LR (`cosine` or `linear`). |
`g_warmup_steps` applies a step-wise warmup through `torch.optim.lr_scheduler.LambdaLR` before handing over to the standard `ReduceLROnPlateau` schedule. Cosine warmup is smoother for most runs, but a linear ramp (especially for 1–5k steps) remains available for experiments that prefer a steady rise. Both generator and discriminator schedulers expose plateau parameters, including a shared `cooldown` period (epochs to wait before resuming plateau checks) and a `min_lr` floor so the learning rate never collapses to zero. Separate monitor keys (`metric_g`, `metric_d`) can be provided when the generator and discriminator use different validation metrics.
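A scheduler sketch with the documented defaults (again, verify the exact top-level key against the bundled configs):

```yaml
Schedulers:               # exact section name: see the bundled configs
  metric: val_metrics/l1
  patience_g: 100
  patience_d: 100
  factor_g: 0.5
  factor_d: 0.5
  cooldown: 0
  min_lr: 1e-7
  verbose: True
  g_warmup_steps: 2000
  g_warmup_type: cosine
```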
## Logging

| Key | Default | Description |
|---|---|---|
| `num_val_images` | 5 | Number of validation batches visualised and logged to Weights & Biases each epoch. |
## Tips for managing configurations
- Version control your YAML files. Tracking them alongside experiment logs makes it easy to reproduce results.
- Leverage OmegaConf interpolation. You can reference other fields (e.g., reuse a base path) to avoid duplication.
- Use descriptive filenames. Include dataset, scale, and generator type in the config name to keep experiments organised.
- Override selectively. When launching through scripts or notebooks, you can load a base config and override specific fields at runtime using `OmegaConf.merge` (see the sketch below).
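For instance, you could keep a small override file like the sketch below (the filename and values are illustrative) and combine it with a base config via `OmegaConf.load` and `OmegaConf.merge` before launching training:

```yaml
# overrides_8x.yaml - merged on top of config_10m.yaml at runtime
Generator:
  scaling_factor: 8
Training:
  EMA:
    enabled: True
```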
With a clear understanding of these fields, you can rapidly iterate on architectures, datasets, and training strategies without modifying the underlying code.