VSRELL

Abstract

We propose VSRELL, an integrated learning scheme for Video Super-Resolution and Enhancement in Low-Light environments, which aims to recover Well-Illuminated High-Resolution (WIHR) sequences from their Low-Light Low-Resolution (LLLR) counterparts. Because the two degradations are tightly coupled, this joint task has received relatively little attention. Our approach jointly models illumination enhancement and spatio-temporal super-resolution to disentangle the intertwined degradations. Specifically, we introduce an Illumination-Noise Co-Optimization (INCO) network that uses a dynamic window partitioning strategy to explicitly model physical priors on illumination variation and noise distribution within individual frames of a long sequence. This suppresses cross-frame noise accumulation and illumination flickering, so that motion compensation and brightness correction are optimized simultaneously. In addition, we introduce an Illumination-Sensitive Feature Propagation (ISFP) mechanism, in which a hierarchical illumination-sensing gating unit adaptively modulates feature channel responses. By adjusting the propagation strength of features and attenuating stored memory features, ISFP increases the weight of high-quality features, suppresses the propagation of accumulated errors, and improves propagation efficiency. Experiments show that VSRELL improves the brightness continuity and texture fidelity of the restored output while maintaining temporal consistency across the video. On REDS4, it reaches an average of 25.94 dB PSNR and 0.7813 SSIM with 6.29M parameters, outperforming every cascaded and All-in-One baseline in the table below on both average metrics.
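The abstract does not spell out how INCO partitions a frame. The NumPy sketch below illustrates one possible reading under our own assumptions: windows are square and non-overlapping, the window size is picked per frame by a hypothetical illumination-homogeneity test (use the largest window that is evenly lit), and per-window noise is estimated from horizontal pixel differences. The names `choose_window`, `window_illumination`, and `window_noise` are hypothetical and are not taken from the VSRELL code.

```python
import numpy as np


def _blocks(luma: np.ndarray, win: int) -> np.ndarray:
    """Crop an (H, W) map so that win x win windows tile it exactly and
    return an array of shape (H // win, W // win, win, win)."""
    h, w = luma.shape
    nh, nw = h // win, w // win
    crop = luma[: nh * win, : nw * win]
    return crop.reshape(nh, win, nw, win).transpose(0, 2, 1, 3)


def window_illumination(luma: np.ndarray, win: int) -> np.ndarray:
    """Per-window mean luminance, used here as a simple illumination prior."""
    return _blocks(luma, win).mean(axis=(2, 3))


def window_noise(luma: np.ndarray, win: int) -> np.ndarray:
    """Per-window noise level. Assumes i.i.d. Gaussian noise of std sigma on a
    locally flat signal, so horizontal differences have std sigma * sqrt(2)."""
    diffs = np.diff(_blocks(luma, win), axis=3)
    return diffs.std(axis=(2, 3)) / np.sqrt(2.0)


def choose_window(luma: np.ndarray, candidates=(32, 16, 8), tol=0.05) -> int:
    """Hypothetical dynamic partitioning rule: return the largest candidate
    window size for which splitting every window into 2 x 2 sub-windows
    changes the mean luminance by less than `tol` (i.e. windows are evenly lit)."""
    for win in candidates:
        coarse = window_illumination(luma, win)
        if coarse.size == 0:  # frame smaller than this window size
            continue
        fine = window_illumination(luma, win // 2)
        fine = fine[: 2 * coarse.shape[0], : 2 * coarse.shape[1]]
        upsampled = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
        if np.abs(fine - upsampled).max() < tol:
            return win
    return candidates[-1]


# Usage: a uniformly dark frame vs. the same frame with a small bright lamp.
dark = np.full((64, 64), 0.1)
lamp = dark.copy()
lamp[:8, :8] = 0.9
print(choose_window(dark), choose_window(lamp))  # 32 8
```

With this rule, evenly lit frames keep large windows, while a localized light source forces small windows so that illumination statistics are not averaged across the bright and dark regions.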

For code and technical details, please refer to VSRELL.zip and the Appendix.
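To illustrate how an illumination-conditioned channel gate and memory attenuation could interact in a recurrent propagation loop, the NumPy sketch below makes several assumptions of its own: the "hierarchical" input is mean illumination pooled on 1x1, 2x2 and 4x4 grids, the gating unit is a single linear layer followed by a sigmoid, and memory attenuation is a fixed decay factor on the hidden state. The names `pyramid_descriptor`, `channel_gate`, and `propagate`, and the update rule itself, are hypothetical and do not reproduce the released ISFP implementation.

```python
import numpy as np


def pyramid_descriptor(illum: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Mean illumination pooled over n x n grids for each n in `levels`,
    concatenated (length 1 + 4 + 16 = 21 for the default levels)."""
    h, w = illum.shape
    parts = []
    for n in levels:
        cells = illum[: h - h % n, : w - w % n].reshape(n, h // n, n, w // n)
        parts.append(cells.mean(axis=(1, 3)).ravel())
    return np.concatenate(parts)


def channel_gate(desc: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Sigmoid gate with one value per feature channel; weight has shape (C, D)."""
    return 1.0 / (1.0 + np.exp(-(weight @ desc + bias)))


def propagate(features, illums, weight, bias, decay=0.9):
    """Hypothetical gated recurrence over a sequence:
        h_t = g_t * f_t + (1 - g_t) * decay * h_{t-1}
    features: list of (C, H, W) arrays; illums: list of 2-D illumination maps.
    The gate g_t is computed from frame t's illumination and broadcast over H, W;
    `decay` < 1 attenuates the memory so stale (possibly erroneous) features fade."""
    hidden = np.zeros_like(features[0])
    outputs = []
    for feat, illum in zip(features, illums):
        g = channel_gate(pyramid_descriptor(illum), weight, bias)[:, None, None]
        hidden = g * feat + (1.0 - g) * decay * hidden
        outputs.append(hidden)
    return outputs


# Usage: five frames of 4-channel features with gradually brightening illumination.
rng = np.random.default_rng(0)
C, H, W, T = 4, 16, 16, 5
feats = [rng.standard_normal((C, H, W)) for _ in range(T)]
illums = [np.full((H, W), 0.1 * (t + 1)) for t in range(T)]
weight = 0.1 * rng.standard_normal((C, 21))
out = propagate(feats, illums, weight, np.zeros(C))
print(len(out), out[-1].shape)  # 5 (4, 16, 16)
```

With a zero weight and bias the gate is 0.5 everywhere, so a frame's contribution to the hidden state shrinks by a factor of (1 - 0.5) * 0.9 = 0.45 per step; a learned gate would instead let well-lit frames overwrite the memory more strongly.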

Comparison

Performance Gain

Performance Comparison

Performance comparison of cascaded and All-in-One methods on REDS4 [24]. * denotes retraining on the same training dataset as the proposed VSRELL. Red and blue indicate the best and second-best performance, respectively.

Methods	#Params (M)	Runtime (s)	CLIP 000 PSNR (dB)↑/SSIM↑	CLIP 011 PSNR (dB)↑/SSIM↑	CLIP 015 PSNR (dB)↑/SSIM↑	CLIP 020 PSNR (dB)↑/SSIM↑	Average PSNR (dB)↑/SSIM↑
Single Image Super-Resolution + Low-Light Enhancement
SwinIR [19]+ KinD [46]	11.90+8.54	1.045+0.346	20.44/0.7012	20.34/0.7641	19.51/0.8291	20.57/0.7863	20.22/0.7702
SwinIR [19]+ Zero-DCE [12]	11.90+0.07	1.045+0.002	21.39/0.7012	20.09/0.7589	18.68/0.8093	19.00/0.7573	19.79/0.7567
SwinIR [19]+ SCI [23]	11.90+0.00	1.045+0.001	18.75/0.7011	20.11/0.7713	16.58/0.8274	14.96/0.7534	17.60/0.7633
HAT [8]+ KinD [46]	20.80+8.54	1.469+0.135	20.49/0.7017	20.34/0.7723	19.69/0.8404	20.64/0.7905	20.29/0.7762
HAT [8]+ Zero-DCE [12]	20.80+0.07	1.469+0.002	21.51/0.7038	20.06/0.7739	18.71/0.8182	19.00/0.7623	19.82/0.7646
HAT [8]+ SCI [23]	20.80+0.00	1.469+0.001	18.87/0.7033	20.15/0.7694	16.57/0.8353	14.95/0.7584	17.64/0.7666
Single Image Low-Light Enhancement + Super-Resolution
KinD [46]+ SwinIR [19]	8.54+11.90	0.020+1.120	20.82/0.6103	20.29/0.7558	20.43/0.8130	20.75/0.7563	20.57/0.7339
KinD [46]+ HAT [8]	8.54+20.80	0.020+1.500	20.90/0.6155	20.28/0.7636	20.44/0.8165	20.75/0.7582	20.60/0.7385
Zero-DCE [12]+ SwinIR [19]	0.07+11.90	0.001+1.794	20.86/0.6933	21.08/0.7694	19.98/0.8205	20.64/0.7764	20.64/0.7649
Zero-DCE [12]+ HAT [8]	0.07+20.80	0.001+1.535	20.84/0.6943	21.02/0.7713	19.91/0.8163	20.64/0.7785	20.60/0.7651
SCI [23]+ SwinIR [19]	0.03+11.90	0.001+1.707	18.28/0.6744	19.42/0.7526	16.37/0.8016	14.62/0.7194	17.17/0.7370
SCI [23]+ HAT [8]	0.03+20.80	0.001+1.489	18.24/0.6754	19.36/0.7523	16.33/0.7986	14.61/0.7193	17.14/0.7364
Video Low-Light Enhancement + Super-Resolution
FastLLVE [18]+ SwinIR [19]	11.10+11.90	0.014+0.904	17.18/0.4343	18.42/0.5341	21.21/0.5914	18.20/0.4964	18.75/0.5141
FastLLVE [18]+ HAT [8]	11.10+20.80	0.014+1.420	17.05/0.4253	18.37/0.5153	20.99/0.5614	18.15/0.4834	18.64/0.4964
StableLLVE [45]+ SwinIR [19]	4.31+11.90	0.018+0.948	18.35/0.5673	16.76/0.6445	15.17/0.7096	14.85/0.6443	16.28/0.6414
StableLLVE [45]+ HAT [8]	4.31+20.80	0.018+1.425	18.37/0.5693	16.79/0.6522	15.18/0.7122	14.85/0.6451	16.30/0.6447
Video Super-Resolution + Low-Light Enhancement
BasicVSR++ [5]+ KinD [46]	31.42+8.54	3.926+0.133	14.91/0.4602	19.75/0.7273	17.94/0.7600	18.89/0.7731	17.87/0.6802
BasicVSR++ [5]+ Zero-DCE [12]	31.42+0.07	3.926+0.002	14.84/0.4533	19.50/0.7113	17.14/0.7383	17.73/0.7433	17.30/0.6616
BasicVSR++ [5]+ SCI [23]	31.42+0.00	3.926+0.001	13.33/0.4560	19.18/0.7266	15.46/0.7535	14.61/0.7357	15.65/0.6680
EDVR [36]+ KinD [46]	20.60+8.54	0.070+0.129	20.23/0.7024	20.32/0.7704	19.63/0.8356	20.76/0.7876	20.24/0.7740
EDVR [36]+ Zero-DCE [12]	20.60+0.07	0.070+0.002	21.41/0.7036	20.03/0.7646	18.65/0.8108	19.09/0.7754	19.80/0.7636
EDVR [36]+ SCI [23]	20.60+0.00	0.070+0.001	18.65/0.7034	19.98/0.7644	16.46/0.8284	14.89/0.7686	17.50/0.7662
IART [40]+ KinD [46]	13.40+8.54	0.552+0.132	20.84/0.7038	20.48/0.7759	19.83/0.8503	20.97/0.7914	20.53/0.7804
IART [40]+ Zero-DCE [12]	13.40+0.07	0.552+0.002	21.96/0.7046	20.27/0.7726	18.84/0.8384	19.39/0.7885	20.12/0.7760
IART [40]+ SCI [23]	13.40+0.00	0.552+0.001	19.27/0.7035	20.41/0.7747	16.61/0.8443	15.07/0.7911	17.84/0.7784
All-in-One
PromptIR* [25]	33.32	0.133	16.05/0.5642	19.78/0.6744	21.74/0.6882	18.48/0.5942	19.01/0.6303
AdaIR* [9]	29.09	0.364	16.20/0.5623	20.41/0.6863	22.38/0.7072	18.22/0.5941	19.30/0.6375
MoCE-IR* [43]	24.26	0.156	17.31/0.5876	20.56/0.6846	20.85/0.7025	17.81/0.5906	19.13/0.6413
Ours	6.29	0.069	24.43/0.7050	25.34/0.7776	28.37/0.8508	25.62/0.7920	25.94/0.7813
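The Average column appears to be the unweighted mean of the four per-clip scores. The short Python sketch below recomputes a few average PSNR values from the table and the margin of VSRELL over the strongest baseline by average PSNR, Zero-DCE [12] + SwinIR [19]. The `psnr` helper is the standard definition for images scaled to [0, max_val], not code taken from VSRELL.

```python
import numpy as np


def psnr(ref: np.ndarray, out: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Per-clip PSNR (dB) on REDS4 clips 000, 011, 015, 020, copied from the table.
clip_psnr = {
    "Ours": [24.43, 25.34, 28.37, 25.62],
    "Zero-DCE + SwinIR": [20.86, 21.08, 19.98, 20.64],
    "IART + KinD": [20.84, 20.48, 19.83, 20.97],
}

# Unweighted mean over the four clips, rounded like the table.
averages = {name: round(sum(v) / len(v), 2) for name, v in clip_psnr.items()}
best_baseline = max((n for n in averages if n != "Ours"), key=averages.get)
gain = round(averages["Ours"] - averages[best_baseline], 2)
print(averages)
print(best_baseline, gain)
```

The recomputed averages match the table (25.94, 20.64 and 20.53 dB), giving VSRELL a margin of about 5.30 dB in average PSNR over the best cascaded baseline.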

Visual Comparison

VSRELL: A Simple Baseline for Video Super-Resolution and Enhancement in Low-Light environment
