VSRELL: A Simple Baseline for Video Super-Resolution and Enhancement in Low-Light environment

Tianjin University
CVPR 2026, Denver
[Interactive video comparison of x2 and x4 low-quality inputs, restored outputs, and ground truth. Methods shown: MoCE-IR, PromptIR, StableLLVE + SwinIR, KinD + HAT, SwinIR + KinD, BasicVSR++ + KinD, Ours, and the ground-truth video.]

Abstract

We propose an integrated learning scheme for Video Super-Resolution and Enhancement in Low-Light environments, named VSRELL, which aims to recover Well-Illuminated High-Resolution (WIHR) sequences from their Low-Light Low-Resolution (LLLR) counterparts. Because the two degradations are tightly coupled, this joint task has received relatively little attention. Our approach jointly models illumination enhancement and spatial-temporal super-resolution to disentangle the intertwined degradations. Specifically, we introduce an Illumination-Noise Co-Optimization (INCO) network that employs a dynamic window partitioning strategy to explicitly model physical priors of illumination variation and noise distribution within individual frames of a long-term sequence. This suppresses cross-frame noise accumulation and illumination flickering while jointly optimizing motion compensation and brightness correction. In addition, we introduce an Illumination-Sensitive Feature Propagation (ISFP) mechanism, which uses a hierarchical illumination-sensing gating unit to adaptively modulate feature channel responses. By adjusting the feature propagation intensity and applying a memory feature attenuation strategy, ISFP increases the weight of high-quality features, suppresses the propagation of accumulated errors, and improves transmission efficiency. Experiments show that VSRELL strengthens the brightness continuity and texture fidelity of the restored output while maintaining temporal consistency across the video.
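
To make the idea of illumination-gated propagation with memory attenuation more concrete, the following is a minimal NumPy sketch, not the VSRELL implementation. It assumes a single scalar gate per frame derived from mean luminance (the abstract describes a hierarchical, channel-wise gating unit), and the names `illumination_gate`, `propagate`, `sharpness`, `threshold`, and `decay` are hypothetical choices made for illustration only.

```python
import numpy as np


def illumination_gate(frame, sharpness=8.0, threshold=0.2):
    """Map the mean luminance of a frame in [0, 1] to a gate in (0, 1).

    Hypothetical stand-in for a learned gating unit: brighter frames
    get a gate close to 1, darker frames a gate close to 0.
    """
    mean_luma = float(np.mean(frame))
    return 1.0 / (1.0 + np.exp(-sharpness * (mean_luma - threshold)))


def propagate(frames, features, decay=0.9):
    """Recurrently fuse per-frame features, trusting brighter frames more.

    frames:   array of shape (T, H, W), luminance in [0, 1]
    features: array of shape (T, C, H, W), per-frame features
    decay:    assumed attenuation factor applied to the carried memory
    """
    memory = np.zeros_like(features[0], dtype=float)
    outputs = []
    for frame, feat in zip(frames, features):
        gate = illumination_gate(frame)
        # Attenuate the carried memory, then blend in the current
        # features in proportion to how well lit the frame is.
        memory = gate * feat + (1.0 - gate) * decay * memory
        outputs.append(memory.copy())
    return np.stack(outputs)


if __name__ == "__main__":
    bright = np.full((4, 4), 0.8)
    dark = np.full((4, 4), 0.02)
    frames = np.stack([bright, dark])
    feats = np.stack([np.ones((2, 4, 4)), np.full((2, 4, 4), 5.0)])
    fused = propagate(frames, feats)
    print(fused.shape, illumination_gate(bright), illumination_gate(dark))
```

In this toy setup the dark frame receives a small gate, so its (possibly unreliable) features only partly overwrite the memory carried over from the brighter frame. A learned model would replace the fixed sigmoid and the constant `decay` with trainable components.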

For code and technical details, please refer to VSRELL.zip and the Appendix.

Architecture Overview

Comparison

LL SR
Performance Gain

Performance Comparison

Performance comparison of different methods, including cascaded and All-in-One approaches, on REDS4 [24]. * denotes a method retrained on the same training dataset as the proposed VSRELL. Red and blue indicate the best and second-best performance, respectively.

| Methods | #Params (M) | Runtime (s) | CLIP 000 (PSNR↑/SSIM↑) | CLIP 011 (PSNR↑/SSIM↑) | CLIP 015 (PSNR↑/SSIM↑) | CLIP 020 (PSNR↑/SSIM↑) | Average (PSNR↑/SSIM↑) |
|---|---|---|---|---|---|---|---|
| Single Image Super-Resolution + Low-Light Enhancement | | | | | | | |
| SwinIR [19] + KinD [46] | 11.90+8.54 | 1.045+0.346 | 20.44/0.7012 | 20.34/0.7641 | 19.51/0.8291 | 20.57/0.7863 | 20.22/0.7702 |
| SwinIR [19] + Zero-DCE [12] | 11.90+0.07 | 1.045+0.002 | 21.39/0.7012 | 20.09/0.7589 | 18.68/0.8093 | 19.00/0.7573 | 19.79/0.7567 |
| SwinIR [19] + SCI [23] | 11.90+0.00 | 1.045+0.001 | 18.75/0.7011 | 20.11/0.7713 | 16.58/0.8274 | 14.96/0.7534 | 17.60/0.7633 |
| HAT [8] + KinD [46] | 20.80+8.54 | 1.469+0.135 | 20.49/0.7017 | 20.34/0.7723 | 19.69/0.8404 | 20.64/0.7905 | 20.29/0.7762 |
| HAT [8] + Zero-DCE [12] | 20.80+0.07 | 1.469+0.002 | 21.51/0.7038 | 20.06/0.7739 | 18.71/0.8182 | 19.00/0.7623 | 19.82/0.7646 |
| HAT [8] + SCI [23] | 20.80+0.00 | 1.469+0.001 | 18.87/0.7033 | 20.15/0.7694 | 16.57/0.8353 | 14.95/0.7584 | 17.64/0.7666 |
| Single Image Low-Light Enhancement + Super-Resolution | | | | | | | |
| KinD [46] + SwinIR [19] | 8.54+11.90 | 0.020+1.120 | 20.82/0.6103 | 20.29/0.7558 | 20.43/0.8130 | 20.75/0.7563 | 20.57/0.7339 |
| KinD [46] + HAT [8] | 8.54+20.80 | 0.020+1.500 | 20.90/0.6155 | 20.28/0.7636 | 20.44/0.8165 | 20.75/0.7582 | 20.60/0.7385 |
| Zero-DCE [12] + SwinIR [19] | 0.07+11.90 | 0.001+1.794 | 20.86/0.6933 | 21.08/0.7694 | 19.98/0.8205 | 20.64/0.7764 | 20.64/0.7649 |
| Zero-DCE [12] + HAT [8] | 0.07+20.80 | 0.001+1.535 | 20.84/0.6943 | 21.02/0.7713 | 19.91/0.8163 | 20.64/0.7785 | 20.60/0.7651 |
| SCI [23] + SwinIR [19] | 0.03+11.90 | 0.001+1.707 | 18.28/0.6744 | 19.42/0.7526 | 16.37/0.8016 | 14.62/0.7194 | 17.17/0.7370 |
| SCI [23] + HAT [8] | 0.03+20.80 | 0.001+1.489 | 18.24/0.6754 | 19.36/0.7523 | 16.33/0.7986 | 14.61/0.7193 | 17.14/0.7364 |
| Video Low-Light Enhancement + Super-Resolution | | | | | | | |
| FastLLVE [18] + SwinIR [19] | 11.10+11.90 | 0.014+0.904 | 17.18/0.4343 | 18.42/0.5341 | 21.21/0.5914 | 18.20/0.4964 | 18.75/0.5141 |
| FastLLVE [18] + HAT [8] | 11.10+20.80 | 0.014+1.420 | 17.05/0.4253 | 18.37/0.5153 | 20.99/0.5614 | 18.15/0.4834 | 18.64/0.4964 |
| StableLLVE [45] + SwinIR [19] | 4.31+11.90 | 0.018+0.948 | 18.35/0.5673 | 16.76/0.6445 | 15.17/0.7096 | 14.85/0.6443 | 16.28/0.6414 |
| StableLLVE [45] + HAT [8] | 4.31+20.80 | 0.018+1.425 | 18.37/0.5693 | 16.79/0.6522 | 15.18/0.7122 | 14.85/0.6451 | 16.30/0.6447 |
| Video Super-Resolution + Low-Light Enhancement | | | | | | | |
| BasicVSR++ [5] + KinD [46] | 31.42+8.54 | 3.926+0.133 | 14.91/0.4602 | 19.75/0.7273 | 17.94/0.7600 | 18.89/0.7731 | 17.87/0.6802 |
| BasicVSR++ [5] + Zero-DCE [12] | 31.42+0.07 | 3.926+0.002 | 14.84/0.4533 | 19.50/0.7113 | 17.14/0.7383 | 17.73/0.7433 | 17.30/0.6616 |
| BasicVSR++ [5] + SCI [23] | 31.42+0.00 | 3.926+0.001 | 13.33/0.4560 | 19.18/0.7266 | 15.46/0.7535 | 14.61/0.7357 | 15.65/0.6680 |
| EDVR [36] + KinD [46] | 20.60+8.54 | 0.070+0.129 | 20.23/0.7024 | 20.32/0.7704 | 19.63/0.8356 | 20.76/0.7876 | 20.24/0.7740 |
| EDVR [36] + Zero-DCE [12] | 20.60+0.07 | 0.070+0.002 | 21.41/0.7036 | 20.03/0.7646 | 18.65/0.8108 | 19.09/0.7754 | 19.80/0.7636 |
| EDVR [36] + SCI [23] | 20.60+0.00 | 0.070+0.001 | 18.65/0.7034 | 19.98/0.7644 | 16.46/0.8284 | 14.89/0.7686 | 17.50/0.7662 |
| IART [40] + KinD [46] | 13.40+8.54 | 0.552+0.132 | 20.84/0.7038 | 20.48/0.7759 | 19.83/0.8503 | 20.97/0.7914 | 20.53/0.7804 |
| IART [40] + Zero-DCE [12] | 13.40+0.00 | 0.552+0.002 | 21.96/0.7046 | 20.27/0.7726 | 18.84/0.8384 | 19.39/0.7885 | 20.12/0.7760 |
| IART [40] + SCI [23] | 13.40+0.00 | 0.552+0.001 | 19.27/0.7035 | 20.41/0.7747 | 16.61/0.8443 | 15.07/0.7911 | 17.84/0.7784 |
| All-in-One | | | | | | | |
| PromptIR* [25] | 33.32 | 0.133 | 16.05/0.5642 | 19.78/0.6744 | 21.74/0.6882 | 18.48/0.5942 | 19.01/0.6303 |
| AdaIR* [9] | 29.09 | 0.364 | 16.20/0.5623 | 20.41/0.6863 | 22.38/0.7072 | 18.22/0.5941 | 19.30/0.6375 |
| MoCE-IR* [43] | 24.26 | 0.156 | 17.31/0.5876 | 20.56/0.6846 | 20.85/0.7025 | 17.81/0.5906 | 19.13/0.6413 |
| Ours | 6.290 | 0.069 | 24.43/0.7050 | 25.34/0.7776 | 28.37/0.8508 | 25.62/0.7920 | 25.94/0.7813 |
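
As a reading aid for the table, the sketch below shows the standard PSNR definition and checks that, for the "Ours" row, the Average column matches the plain mean of the four per-clip scores. The page does not state the exact evaluation protocol (for example, RGB versus luminance channel, or per-frame versus per-clip averaging), so the helper `psnr` and its `peak` parameter are illustrative assumptions rather than the authors' evaluation code.

```python
import numpy as np


def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    reference = np.asarray(reference, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


# Per-clip scores copied from the "Ours" row (clips 000, 011, 015, 020).
ours_psnr = [24.43, 25.34, 28.37, 25.62]
ours_ssim = [0.7050, 0.7776, 0.8508, 0.7920]

# Plain arithmetic mean over the four clips.
avg_psnr = float(np.mean(ours_psnr))
avg_ssim = float(np.mean(ours_ssim))

if __name__ == "__main__":
    print(f"mean PSNR over clips: {avg_psnr:.2f} dB")
    print(f"mean SSIM over clips: {avg_ssim:.5f}")
```

The mean PSNR agrees with the reported 25.94 dB, and the mean SSIM agrees with the reported 0.7813 up to rounding in the fourth decimal place.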
Visual Comparison