IEEE GRSL (Q1) PUBLISHED RESEARCH · VNU INTERNATIONAL SCHOOL

FORECAST THE RESERVOIR.
FROM ORBIT.
WITHIN 2.4 CENTIMETRES.

Colubrid-Net is a cross-modal deep learning framework that fuses Sentinel-2 satellite imagery with daily hydrological records to predict the next-day water level of the An Khe Reservoir — cutting forecast error by 79.3% against the previous published best.

True-color Sentinel-2 scene of the An Khe reservoir, February 2021 — SENTINEL-2 · TRUE COLOR · 2021_02

Binary water-extent mask of the An Khe reservoir derived by the model — BINARY WATER MASK · WHAT THE NETWORK SEES

2022 · OBSERVED vs FORECAST (solid line: observed · dashed line: predicted)

0.0242 m · MAE

0.969 · R²

0.972 · KGE

7.5M · PARAMETERS

WHY CENTIMETRES MATTER

Vietnam's monsoon delivers roughly 80% of the year's rainfall in just six months. The country's 818 hydroelectric facilities supply 28.4% of national generating capacity, and every one of them faces the same daily decision: hold water for power, or release it for safety.

The An Khe–Ka Nak complex (173 MW) on the Ba River guards the An Khe plain in Gia Lai Province. During the September–December flood season, mean daily inflow runs 3–4× the annual average — and operational forecasting errors of several tens of centimetres over multi-day horizons are currently accepted as unavoidable at Vietnamese plants.

SHALLOW SPATIAL ENCODERS

Existing multi-modal reservoir models rely on generic pre-trained CNNs (typically VGG) that were never designed for multispectral satellite data.

A FREQUENCY MISMATCH

Satellites observe monthly; gauges record daily. No prior work resolved this gap in an end-to-end trainable way.

SINGLE-METRIC BLINDNESS

Most studies report one accuracy number, masking whether a model gets the amplitude and timing of floods right.

“A forecast consistently within 2–3 cm of the observed level — with correct seasonal amplitude — enables earlier, more conservative spill decisions and better downstream flood warning.”

FOUR YEARS. TWO MODALITIES. ONE RESERVOIR.

Every input is freely available anywhere in Southeast Asia: open Copernicus satellite imagery and standard plant gauge records. January 2019 – December 2022.

CARD A · FROM ORBIT

›28 cloud-screened Sentinel-2 scenes (Copernicus Data Space Ecosystem, Level-2A surface reflectance)
›Gap-filled into 48 monthly composites at 320 × 320 px
›Bands: near-infrared, red, green — the most informative for water delineation
›Native: 13 bands · 10–60 m resolution · ~5-day revisit

CARD B · FROM THE GAUGE

›1,461 daily records aggregated from hourly plant measurements
›Water level (m a.s.l.) · upstream inflow (m³/s) · total release discharge (m³/s)
›Engineered features: 1–3-day lags of level & inflow, sin/cos day-of-year seasonal encoding, and a discharge-to-inflow ratio encoding the dam's operational state (>1 = drawdown, <1 = accumulation)
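The engineered gauge features listed above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function name `build_features`, the dictionary keys, the 365.25-day period, and the sample values are all assumptions made for the example.

```python
import math
from datetime import date, timedelta


def build_features(days, level, inflow, discharge, max_lag=3):
    """Return one feature dict per day that has a full lag history."""
    rows = []
    for t in range(max_lag, len(days)):
        doy = days[t].timetuple().tm_yday
        angle = 2.0 * math.pi * doy / 365.25  # assumed seasonal period
        row = {
            "level": level[t],
            "inflow": inflow[t],
            "discharge": discharge[t],
            # sin/cos pair keeps 31 Dec and 1 Jan close together
            "doy_sin": math.sin(angle),
            "doy_cos": math.cos(angle),
            # >1 = drawdown, <1 = accumulation; the floor guards zero inflow
            "q_ratio": discharge[t] / max(inflow[t], 1e-6),
        }
        for k in range(1, max_lag + 1):
            row[f"level_lag{k}"] = level[t - k]
            row[f"inflow_lag{k}"] = inflow[t - k]
        rows.append(row)
    return rows


# Made-up values, for illustration only.
days = [date(2022, 1, 1) + timedelta(days=i) for i in range(6)]
level = [427.0, 427.1, 427.3, 427.2, 427.0, 426.9]
inflow = [20.0, 25.0, 40.0, 30.0, 20.0, 15.0]
discharge = [18.0, 20.0, 20.0, 36.0, 30.0, 15.0]
features = build_features(days, level, inflow, discharge)
print(len(features), sorted(features[0]))
```

The first three days are dropped because they lack a complete 3-day lag history; every remaining day carries twelve features.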

STUDY SITE

An Khe Reservoir, Ba River, Gia Lai Province (~13°57′N, 108°39′E)

CATCHMENT

~1,360 km² of steep forested terrain

OPERATING BAND

≈ 426 – 429 m a.s.l. (full supply level ≈ 429 m)

FLOOD SEASON

September – December

INFLOW RANGE

< 20 m³/s baseflow → 531.68 m³/s recorded peak

TRAIN SPLIT

Jan 2019 – May 2021 · 882 days (~60%)

VALIDATION SPLIT

Jun – Dec 2021 · 214 days (~15%)

TEST SPLIT

Jan – Dec 2022 · 365 days (25%) · one full wet + dry cycle

LEAKAGE CONTROL

Strictly chronological split; scalers fit on training statistics only
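As a sanity check on the leakage control, the split boundaries can be reproduced from the calendar alone, and a scaler can be fitted on the training slice only. This is a hedged sketch: the level series is synthetic, and the helper names `fit_standardizer` and `standardize` are illustrative, not taken from the thesis code.

```python
import math
import statistics
from datetime import date, timedelta

start, end = date(2019, 1, 1), date(2022, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Strictly chronological: no shuffling, no overlap between splits.
train = [d for d in days if d <= date(2021, 5, 31)]
val = [d for d in days if date(2021, 6, 1) <= d <= date(2021, 12, 31)]
test = [d for d in days if d >= date(2022, 1, 1)]
print(len(days), len(train), len(val), len(test))


def fit_standardizer(values):
    """Mean and standard deviation of the TRAINING values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    return mean, std


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


# Synthetic stand-in for the gauge series (not real data).
levels = [427.5 + 1.5 * math.sin(2.0 * math.pi * i / 365.25) for i in range(len(days))]
train_levels = levels[:len(train)]
mean, std = fit_standardizer(train_levels)

# Validation and test data reuse the training statistics unchanged.
val_scaled = standardize(levels[len(train):len(train) + len(val)], mean, std)
test_scaled = standardize(levels[-len(test):], mean, std)
```

Fitting `mean` and `std` on the full series instead would let 2022 statistics leak into training, which is exactly what the chronological protocol rules out.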

OBSERVED WATER LEVEL · 2019–2022 (m a.s.l.)

OBSERVED · FLOOD SEASON

TRAIN (Jan 2019–May 2021) · VALIDATION (Jun–Dec 2021) · TEST (2022). SEP–DEC WINDOWS TINTED.

SENSE. FUSE. FORECAST.

Three purpose-built components, trained end-to-end with an MSE + L2 objective. Total: 7,505,920 parameters.

STEP 1

UMSFE / THE EYE

≈6.95M params · 92.6% of the network

A U-Net-based Multi-Scale Spatial Feature Extractor. Four encoder stages (64→512 channels) meet a C3 Swin Transformer bottleneck (window 7, 8 heads) that gives the network a global receptive field at linear cost. Every activation is a Dynamic Snake function — x + sin²(αx)/α with learnable α — a periodicity-aware nonlinearity built for the monsoon's annual rhythm. Seven multi-scale feature maps are compressed with learnable GeM pooling and projected to a 1024-d spatial embedding.
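Two of the building blocks named above have compact closed forms. The sketch below writes them for plain Python floats so the arithmetic is easy to inspect; in the network both α and the GeM exponent p would be learnable tensors, whereas here they are fixed arguments, and the default p = 3 is an assumption rather than a value from the paper.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation as stated in the text: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / alpha


def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-mean pooling over one feature map, flattened to a list.

    p = 1 is average pooling; a large p approaches max pooling.
    """
    clamped = [max(v, eps) for v in values]  # GeM needs positive inputs
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)


print(snake(0.0), snake(math.pi / 2))
print(gem_pool([1.0, 2.0, 3.0], p=1.0), gem_pool([1.0, 2.0, 3.0], p=50.0))
```

Because the derivative of Snake is 1 + sin(2αx), which never drops below zero, the function stays monotonic while adding a periodic ripple on top of the identity.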

STEP 2

MTFE / THE MEMORY

0.39M params

A 4-layer MLP that lifts each day's standardized hydrological feature vector — levels, lags, inflow, discharge, season, operational ratio — into a 1024-d temporal embedding.

STEP 3

TF-BiLSTM / THE FORECASTER

0.16M params

Spatial + temporal embeddings concatenate into a 2048-d fused vector per day. A 2-layer Bidirectional LSTM (hidden 128/direction) reads a 4-day lookback window and emits the next-day water level in metres above sea level.
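The fusion and windowing logic can be illustrated without any deep learning library. In this sketch the 1024-d embeddings are replaced by 2-d toy vectors to keep it light, and `fuse` and `make_windows` are hypothetical helper names, not functions from the project.

```python
def fuse(spatial, temporal):
    """Concatenate the per-day spatial and temporal embeddings."""
    return [s + t for s, t in zip(spatial, temporal)]


def make_windows(fused, targets, lookback=4):
    """Pair each run of `lookback` consecutive days with the NEXT day's level."""
    return [
        (fused[t - lookback:t], targets[t])
        for t in range(lookback, len(fused))
    ]


n_days = 1461
# Toy 2-d stand-ins for the 1024-d embeddings (values are meaningless).
spatial = [[float(d)] * 2 for d in range(n_days)]
temporal = [[float(d)] * 2 for d in range(n_days)]
levels = [427.0 + 0.001 * d for d in range(n_days)]

fused = fuse(spatial, temporal)
samples = make_windows(fused, levels)
print(len(samples), len(samples[0][0]), len(fused[0]))
```

With a 4-day lookback the first four days have no complete history, so 1,461 daily records yield 1,457 forecastable days.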

THE BRIDGE · MONTHLY → DAILY

The unsolved problem in prior work: satellites see the lake monthly, but operators need daily forecasts. Colubrid-Net's seasonal interpolation module expands each monthly spatial embedding to daily resolution with a deterministic day-of-year sinusoid (amplitude 0.05) plus calibrated Gaussian noise (σ = 0.02) — turning 28 scenes into 882 daily training signals while staying end-to-end differentiable.
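One plausible reading of the interpolation step is sketched below. The amplitude and σ come from the text; everything else is an assumption for illustration: that the sinusoid and the noise are added element-wise to the monthly embedding, the 365.25-day period, and the function name `expand_month`.

```python
import calendar
import math
import random
from datetime import date


def expand_month(embedding, year, month, amplitude=0.05, sigma=0.02, rng=None):
    """Expand one monthly embedding into one embedding per calendar day."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    n_days = calendar.monthrange(year, month)[1]
    daily = []
    for day in range(1, n_days + 1):
        doy = date(year, month, day).timetuple().tm_yday
        # Deterministic seasonal term, shared by every embedding dimension.
        seasonal = amplitude * math.sin(2.0 * math.pi * doy / 365.25)
        daily.append([v + seasonal + rng.gauss(0.0, sigma) for v in embedding])
    return daily


monthly = [0.1, 0.2, 0.3]  # toy 3-d stand-in for a 1024-d embedding
february = expand_month(monthly, 2021, 2)
print(len(february), len(february[0]))
```

Both added terms are independent of the embedding itself, so gradients pass through to the image branch unchanged. This is consistent with the end-to-end differentiability claimed above, though the exact formulation in the paper may differ.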

PyTorch 1.12 · RTX 4070 Ti Super · Adam lr 1e-3 · L2 λ=1e-4 · batch 32 · lookback τ=4 days · early stop @10 · dropout 0.2

2.4 cm · MEAN ABSOLUTE ERROR ON AN UNSEEN FULL YEAR

79.3% · ERROR REDUCTION VS. PREVIOUS PUBLISHED BEST ON THIS DATASET

0.969 · R², VARIANCE EXPLAINED

0.972 · KLING-GUPTA EFFICIENCY

1,461 · DAILY RECORDS · 2019–2022

28 · CLOUD-SCREENED SENTINEL-2 SCENES

7.5M · TRAINABLE PARAMETERS

173 MW · INSTALLED CAPACITY PROTECTED

ONE MODEL. EVERY METRIC.

Nineteen baselines — classical regressors, gradient-boosted trees, deep recurrent hybrids, and the Temporal Fusion Transformer — evaluated on the same held-out 2022 test year. Colubrid-Net is the only model that leads simultaneously on error magnitude (MAE/RMSE), variance explanation (R²), and hydrological reliability (KGE).

MODEL	SPATIAL	MAE (m)	RMSE (m)	R²	KGE
Linear Regression	✗	1.730	2.094	0.175	0.118
Ridge Regression	✗	1.730	2.094	0.175	0.118
Lasso Regression	✗	1.726	2.117	0.157	0.084
SV Regression	✗	0.419	0.605	0.931	0.972
Random Forest	✗	0.732	0.947	0.831	0.894
Gradient Boosting	✗	0.493	0.665	0.916	0.959
XGBoost	✗	2.941	4.275	-2.436	0.140
LightGBM	✗	0.577	0.744	0.895	0.950
CatBoost	✗	0.537	0.710	0.905	0.957
AdaBoost	✗	0.200	0.306	0.911	0.954
Decision Tree	✗	2.828	4.463	-2.746	-0.051
K-Nearest Neighbors	✗	3.025	4.282	-2.448	0.205
MLP	✗	4.744	6.768	-7.612	-0.665
VGG19 + GRU	✓	0.235	0.360	0.929	0.967
U-Net + GRU	✓	0.215	1.364	0.482	0.633
VGG19 + Bi-LSTM	✓	0.026	0.047	0.936	0.965
U-Net + Bi-LSTM	✓	0.032	0.049	0.967	0.965
Temporal Fusion Transformer	✗	0.026	0.030	0.381	0.703
Chau et al. (prior best, multi-modal)	✓	0.117	0.145	—	—
COLUBRID-NET (OURS)	✓	0.024	0.046	0.969	0.972
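For readers who want to recompute the table's error columns, the standard definitions of MAE, RMSE and R² are sketched below, together with the relative-reduction arithmetic behind the headline figure. The observed and forecast levels are made-up values; only the two MAE figures passed to `relative_reduction` come from this page.

```python
import math


def mae(obs, sim):
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r2(obs, sim):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline, new):
    """Percentage by which `new` undercuts `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Made-up levels in m a.s.l., for illustration only.
obs = [427.0, 427.5, 428.0, 428.5]
sim = [427.02, 427.48, 428.03, 428.47]
print(mae(obs, sim), rmse(obs, sim), r2(obs, sim))

# Prior best MAE 0.117 m vs Colubrid-Net MAE 0.0242 m.
print(round(relative_reduction(0.117, 0.0242), 1))  # 79.3
```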

WHY KGE MATTERS

The Temporal Fusion Transformer comes close to Colubrid-Net on MAE (0.026 m vs 0.024 m), but its variability ratio α ≈ 0.62 reveals systematic amplitude compression: it hedges toward the seasonal mean and underestimates flood magnitudes. Colubrid-Net's α ≈ 0.97 means its forecasts carry about 97% of the observed variability, so flood amplitude is reproduced almost exactly. That is the difference between a number that looks right and a forecast an operator can act on.
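To see how amplitude compression shows up in KGE, the sketch below uses the widely cited 2009 decomposition, KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²), with α as the ratio of standard deviations and β as the ratio of means. Whether the paper uses this form or a later variant is not stated on this page, so treat it as an assumption; the observed series is also made up.

```python
import math
import statistics


def kge_components(obs, sim):
    """Return (KGE, r, alpha, beta) for paired observed/simulated series."""
    mu_o, mu_s = statistics.fmean(obs), statistics.fmean(sim)
    sd_o, sd_s = statistics.pstdev(obs), statistics.pstdev(sim)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)   # timing / correlation
    alpha = sd_s / sd_o       # variability ratio (amplitude)
    beta = mu_s / mu_o        # bias ratio
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, alpha, beta


# Made-up levels in m a.s.l. with one flood peak.
obs = [426.0, 426.5, 427.5, 429.0, 428.0, 426.5]
mean_obs = statistics.fmean(obs)

# A forecast with perfect timing and mean, but swings shrunk to 62%.
compressed = [mean_obs + 0.62 * (o - mean_obs) for o in obs]
kge, r, alpha, beta = kge_components(obs, compressed)
print(round(kge, 3), round(r, 3), round(alpha, 3), round(beta, 3))
```

In this toy case the correlation and the bias ratio are both perfect, so the entire KGE penalty comes from the (α − 1)² term, which is the kind of error a single MAE figure can hide.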

EVERY DESIGN CHOICE, STRESS-TESTED.

A — FUSION DECODER

DECODER	MAE (m)	RMSE (m)
MLP (no recurrence)	0.1235	0.1296
GRU	0.0236	0.0523
Bi-GRU	0.0415	0.0595
LSTM	0.0255	0.0485
Bi-LSTM	0.0242	0.0464

Bidirectionality helps the LSTM but hurts the GRU, so the benefit is architecture-dependent. The unidirectional GRU has the lowest MAE (0.0236 m), but Bi-LSTM has the lowest RMSE (0.0464 m) and the best overall balance.

B — ACTIVATION FUNCTIONS (ENC / DEC)

ENC / DEC	MAE (m)	RMSE (m)
ReLU / ReLU	0.2481	0.2673
ReLU / Snake	0.2848	0.2971
Snake / ReLU	0.3015	0.3234
Snake / Snake	0.0242	0.0464

Mixing activations breaks skip-connection feature statistics. Applied symmetrically, Dynamic Snake cuts MAE by ~90% versus all-ReLU.

PICK A DATE. WATCH THE MODEL WORK.

Choose any date in the study period and the model forecasts that day's water level from the four preceding days of fused satellite + gauge features. Compare it against what the gauge actually recorded.

SELECT DATE · SIMULATED DATA

October 2022

SELECT A DATE TO RUN THE MODEL

Try the quick-picks — the 2022 dates are unseen by the model.

WHAT POWERS THIS DEMO — THE EXACT ARTIFACTS BEHIND THE 2.4 CM RESULT

tf_bilstm_state.pt · THE WEIGHTS

The trained Temporal-Fusion Bi-LSTM: 2 bidirectional layers, hidden size 128 per direction, reading a 4-day window of 2048-d fused vectors. These are the exact parameters that produced MAE 0.0242 m / R² 0.969 / KGE 0.972 on the unseen 2022 test year. Every forecast on this page is live inference through these tensors — nothing is hard-coded.

combined_features.npy · THE FEATURES

A 1,461 × 2048 matrix: for every day of the study, a 1024-d UMSFE satellite embedding (U-Net + C3 Swin Transformer + Dynamic Snake + GeM pooling) concatenated with a 1024-d MTFE hydrological embedding. The image branch is 92.6% of the network's 7.5M parameters — freezing its per-day output lets the forecaster run on a free CPU in milliseconds.

scaler_y.pkl · THE SCALER

The MinMax transform that maps the model's normalized output back to physical metres above sea level. It was fitted on training data only — the guarantee against data leakage. Without it, a raw output like 0.83 means nothing; with it, it becomes 428.91 m a.s.l.
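The mapping from normalized output back to metres is a one-line affine transform. The class below is a minimal stand-in for whatever object is stored in scaler_y.pkl; the class name and the training levels are hypothetical, so the printed value will not match the example figure in the text.

```python
class MinMaxTarget:
    """Minimal min-max scaler for the water-level target."""

    def fit(self, train_levels):
        # Bounds come from the training split only.
        self.lo = min(train_levels)
        self.hi = max(train_levels)
        return self

    def transform(self, level):
        return (level - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, y):
        return y * (self.hi - self.lo) + self.lo


# Made-up training levels in m a.s.l.
scaler = MinMaxTarget().fit([426.1, 427.4, 428.2, 429.0])
level = scaler.inverse_transform(0.83)
print(round(level, 3))
```

Because the bounds are frozen at training time, a test-year level above the training maximum simply maps to a normalized value greater than 1 instead of shifting the scale.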

predictions.json · THE FALLBACK

Every forecastable day (≈1,457 records), pre-computed offline from the same weights. If the live API is asleep or unreachable, the demo silently degrades to this file — same numbers, zero downtime. A forecast demo that crashes during a defense is worse than no demo.
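The degrade-to-file behaviour can be sketched as a small wrapper. The demo's real client code is not shown on this page, so the function name, the assumed JSON layout (ISO date mapped to level) and the stand-in API callable are all hypothetical.

```python
import json
import os
import tempfile


def forecast_with_fallback(day, fetch_live, fallback_path):
    """Try the live service first; on a network error, read the cached file."""
    try:
        return fetch_live(day), "live"
    except OSError:  # covers connection failures and timeouts
        with open(fallback_path, encoding="utf-8") as fh:
            cached = json.load(fh)
        return cached[day], "fallback"


def sleeping_api(day):
    """Stand-in for an unreachable service."""
    raise ConnectionError("service unreachable")


# Write a one-entry cache file with a made-up value.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "predictions.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump({"2022-10-15": 428.91}, fh)

level, source = forecast_with_fallback("2022-10-15", sleeping_api, path)
print(level, source)
```

Returning the source label alongside the value lets the page indicate whether a number came from live inference or from the cache.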

ONE RESERVOIR, EVERY SEASON.

The An Khe reservoir's dendritic shape — carved by the Ba River through the Central Highlands — expands and contracts with the monsoon. Hover any scene to see what the network sees: the binary water extent.

WHAT IT CAN'T DO YET.

Serious engineering names its failure modes. From the thesis error analysis:

FLOOD PULSES

MAE rises to ≈3.8 cm on days immediately after inflow surges above 150 m³/s (4 episodes in the test year). The 4-day lookback may be too short for multi-week antecedent conditions.
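A conditional error breakdown of this kind can be computed as sketched below. "Days immediately after a surge" is read here as any day whose previous-day inflow exceeded the threshold, which is an interpretation rather than the thesis definition. The function name and all series values are made up.

```python
def split_mae_by_surge(obs, sim, inflow, threshold=150.0):
    """Return (MAE on post-surge days, MAE on all other days)."""
    post, other = [], []
    for t in range(1, len(obs)):
        err = abs(obs[t] - sim[t])
        # A day counts as post-surge if YESTERDAY's inflow beat the threshold.
        if inflow[t - 1] > threshold:
            post.append(err)
        else:
            other.append(err)

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(post), mean(other)


# Made-up series: levels in m a.s.l., inflow in m³/s.
obs = [427.0, 427.2, 428.0, 428.4, 428.3]
sim = [427.0, 427.18, 427.96, 428.36, 428.28]
inflow = [30.0, 180.0, 220.0, 60.0, 40.0]

post_mae, other_mae = split_mae_by_surge(obs, sim, inflow)
print(post_mae, other_mae)
```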

CLOUDS

Optical Sentinel-2 imagery degrades exactly when forecasting matters most: the monsoon. A ~5 cm systematic underestimate appears during the cloudy full-pool plateau (Sep–mid-Nov).

ONE SITE

Trained and validated on a single reservoir; cross-catchment generalisation is unproven.

ROADMAP

SENTINEL-1 SAR (CLOUD-IMMUNE)

14–30 DAY LOOKBACK + RAINFALL FORECASTS

CROSS-ATTENTION FUSION

BAYESIAN UNCERTAINTY BANDS

KNOWLEDGE DISTILLATION FOR REAL-TIME EDGE DEPLOYMENT

MULTI-SITE TRANSFER ACROSS THE CENTRAL HIGHLANDS

PEER-REVIEWED. PUBLISHED. Q1.

Q1 · IF 4.4

Colubrid-Net: A Unified Cross-Modal Framework for Hydrological Forecasting in An Khe Reservoir, Vietnam

IEEE Geoscience and Remote Sensing Letters

vol. 23, pp. 1–5

DOI: 10.1109/LGRS.2025.3645672

VIEW ON IEEE XPLORE

AUTHOR

NGUYEN MINH ANH

PROGRAM: Graduation Project · Applied Information and Technology
INSTITUTION: Vietnam National University, Hanoi — International School (VNU-IS)
SUPERVISOR: Assoc. Prof., PhD Tran Thi Ngan
COLLABORATION: CoMI & CVR labs (VNU-IS) and the Center for Environmental Intelligence, VinUniversity.

GOT QUESTIONS?

The Live Forecast section calls a FastAPI service running the exact TF-BiLSTM weights trained in the thesis, over the same pre-computed fused feature tensors. Nothing is hard-coded.
