We propose an integrated learning scheme for Video Super-Resolution and Enhancement in Low-Light environments, named VSRELL, which aims to recover Well-Illuminated High-Resolution (WIHR) sequences from their Low-Light Low-Resolution (LLLR) counterparts. Because the two degradations are tightly coupled, this joint task has received relatively little attention. Our approach jointly models illumination enhancement and spatial-temporal super-resolution to disentangle the intertwined degradations. Specifically, we introduce an Illumination-Noise Co-Optimization (INCO) network that employs a dynamic window partitioning strategy to explicitly model physical priors of illumination variations and noise distributions within individual frames of a long-term sequence. This suppresses cross-frame noise accumulation and illumination flickering while optimizing motion compensation and brightness correction simultaneously. In addition, we introduce an Illumination-Sensitive Feature Propagation (ISFP) mechanism, which uses a hierarchical illumination-sensing gating unit to adaptively modulate feature channel responses. By adjusting the feature propagation intensity and applying a memory feature attenuation strategy, ISFP increases the weight of high-quality features, suppresses the propagation of accumulated errors, and improves transmission efficiency. Experiments show that VSRELL strengthens the brightness continuity and texture fidelity of the restored output while maintaining temporal consistency across the video.
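
To make the idea of illumination-gated propagation with memory attenuation more concrete, here is a toy sketch. It is not the ISFP implementation: the function name `gated_propagate`, the `decay` factor, the sigmoid slope of 4.0, and the use of a single scalar gate per frame (rather than a learned, hierarchical, per-channel gate) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_propagate(features, illumination, decay=0.8):
    """Toy recurrent propagation with an illumination-driven gate.

    features:     (T, C, H, W) per-frame feature maps
    illumination: (T, H, W) per-frame illumination estimates in [0, 1]
    decay:        hypothetical attenuation factor applied to the memory
    """
    num_frames, channels, height, width = features.shape
    memory = np.zeros((channels, height, width))
    outputs = []
    for t in range(num_frames):
        # Hypothetical gate: frames with higher mean illumination are trusted more.
        gate = sigmoid(4.0 * (illumination[t].mean() - 0.5))
        # Attenuate the old memory, then blend in the gated current features.
        memory = decay * (1.0 - gate) * memory + gate * features[t]
        outputs.append(memory.copy())
    return np.stack(outputs)


# Three frames: bright, dark, and mid-gray illumination.
feats = np.ones((3, 2, 4, 4))
illum = np.stack([np.full((4, 4), v) for v in (1.0, 0.0, 0.5)])
out = gated_propagate(feats, illum)
print(out.shape)
```

In this sketch a dark frame contributes little to the memory, and the memory carried over from earlier frames is damped by `decay`, so errors from poorly lit frames are less likely to build up along the sequence.
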
For code and technical details, please refer to VSRELL.zip and the Appendix.
Performance Comparison
Performance comparison of different methods, including cascaded pipelines and All-in-One models, on REDS4 [24]. * denotes a model retrained on the same training dataset as the proposed VSRELL. Red and blue colors indicate the best and second-best performance, respectively.
| Methods | #Params (M) | Runtime (s) | CLIP 000 (PSNR↑/SSIM↑) | CLIP 011 (PSNR↑/SSIM↑) | CLIP 015 (PSNR↑/SSIM↑) | CLIP 020 (PSNR↑/SSIM↑) | Average (PSNR↑/SSIM↑) |
|---|---|---|---|---|---|---|---|
| Single Image Super-Resolution + Low-Light Enhancement | |||||||
| SwinIR [19]+ KinD [46] | 11.90+8.54 | 1.045+0.346 | 20.44/0.7012 | 20.34/0.7641 | 19.51/0.8291 | 20.57/0.7863 | 20.22/0.7702 |
| SwinIR [19]+ Zero-DCE [12] | 11.90+0.07 | 1.045+0.002 | 21.39/0.7012 | 20.09/0.7589 | 18.68/0.8093 | 19.00/0.7573 | 19.79/0.7567 |
| SwinIR [19]+ SCI [23] | 11.90+0.00 | 1.045+0.001 | 18.75/0.7011 | 20.11/0.7713 | 16.58/0.8274 | 14.96/0.7534 | 17.60/0.7633 |
| HAT [8]+ KinD [46] | 20.80+8.54 | 1.469+0.135 | 20.49/0.7017 | 20.34/0.7723 | 19.69/0.8404 | 20.64/0.7905 | 20.29/0.7762 |
| HAT [8]+ Zero-DCE [12] | 20.80+0.07 | 1.469+0.002 | 21.51/0.7038 | 20.06/0.7739 | 18.71/0.8182 | 19.00/0.7623 | 19.82/0.7646 |
| HAT [8]+ SCI [23] | 20.80+0.00 | 1.469+0.001 | 18.87/0.7033 | 20.15/0.7694 | 16.57/0.8353 | 14.95/0.7584 | 17.64/0.7666 |
| Single Image Low-Light Enhancement + Super-Resolution | |||||||
| KinD [46]+ SwinIR [19] | 8.54+11.90 | 0.020+1.120 | 20.82/0.6103 | 20.29/0.7558 | 20.43/0.8130 | 20.75/0.7563 | 20.57/0.7339 |
| KinD [46]+ HAT [8] | 8.54+20.80 | 0.020+1.500 | 20.90/0.6155 | 20.28/0.7636 | 20.44/0.8165 | 20.75/0.7582 | 20.60/0.7385 |
| Zero-DCE [12]+ SwinIR [19] | 0.07+11.90 | 0.001+1.794 | 20.86/0.6933 | 21.08/0.7694 | 19.98/0.8205 | 20.64/0.7764 | 20.64/0.7649 |
| Zero-DCE [12]+ HAT [8] | 0.07+20.80 | 0.001+1.535 | 20.84/0.6943 | 21.02/0.7713 | 19.91/0.8163 | 20.64/0.7785 | 20.60/0.7651 |
| SCI [23]+ SwinIR [19] | 0.03+11.90 | 0.001+1.707 | 18.28/0.6744 | 19.42/0.7526 | 16.37/0.8016 | 14.62/0.7194 | 17.17/0.7370 |
| SCI [23]+ HAT [8] | 0.03+20.80 | 0.001+1.489 | 18.24/0.6754 | 19.36/0.7523 | 16.33/0.7986 | 14.61/0.7193 | 17.14/0.7364 |
| Video Low-Light Enhancement + Super-Resolution | |||||||
| FastLLVE [18]+ SwinIR [19] | 11.10+11.90 | 0.014+0.904 | 17.18/0.4343 | 18.42/0.5341 | 21.21/0.5914 | 18.20/0.4964 | 18.75/0.5141 |
| FastLLVE [18]+ HAT [8] | 11.10+20.80 | 0.014+1.420 | 17.05/0.4253 | 18.37/0.5153 | 20.99/0.5614 | 18.15/0.4834 | 18.64/0.4964 |
| StableLLVE [45]+ SwinIR [19] | 4.31+11.90 | 0.018+0.948 | 18.35/0.5673 | 16.76/0.6445 | 15.17/0.7096 | 14.85/0.6443 | 16.28/0.6414 |
| StableLLVE [45]+ HAT [8] | 4.31+20.80 | 0.018+1.425 | 18.37/0.5693 | 16.79/0.6522 | 15.18/0.7122 | 14.85/0.6451 | 16.30/0.6447 |
| Video Super-Resolution + Low-Light Enhancement | |||||||
| BasicVSR++ [5]+ KinD [46] | 31.42+8.54 | 3.926+0.133 | 14.91/0.4602 | 19.75/0.7273 | 17.94/0.7600 | 18.89/0.7731 | 17.87/0.6802 |
| BasicVSR++ [5]+ Zero-DCE [12] | 31.42+0.07 | 3.926+0.002 | 14.84/0.4533 | 19.50/0.7113 | 17.14/0.7383 | 17.73/0.7433 | 17.30/0.6616 |
| BasicVSR++ [5]+ SCI [23] | 31.42+0.00 | 3.926+0.001 | 13.33/0.4560 | 19.18/0.7266 | 15.46/0.7535 | 14.61/0.7357 | 15.65/0.6680 |
| EDVR [36]+ KinD [46] | 20.60+8.54 | 0.070+0.129 | 20.23/0.7024 | 20.32/0.7704 | 19.63/0.8356 | 20.76/0.7876 | 20.24/0.7740 |
| EDVR [36]+ Zero-DCE [12] | 20.60+0.07 | 0.070+0.002 | 21.41/0.7036 | 20.03/0.7646 | 18.65/0.8108 | 19.09/0.7754 | 19.80/0.7636 |
| EDVR [36]+ SCI [23] | 20.60+0.00 | 0.070+0.001 | 18.65/0.7034 | 19.98/0.7644 | 16.46/0.8284 | 14.89/0.7686 | 17.50/0.7662 |
| IART [40]+ KinD [46] | 13.40+8.54 | 0.552+0.132 | 20.84/0.7038 | 20.48/0.7759 | 19.83/0.8503 | 20.97/0.7914 | 20.53/0.7804 |
| IART [40]+ Zero-DCE [12] | 13.40+0.07 | 0.552+0.002 | 21.96/0.7046 | 20.27/0.7726 | 18.84/0.8384 | 19.39/0.7885 | 20.12/0.7760 |
| IART [40]+ SCI [23] | 13.40+0.00 | 0.552+0.001 | 19.27/0.7035 | 20.41/0.7747 | 16.61/0.8443 | 15.07/0.7911 | 17.84/0.7784 |
| All-in-One | |||||||
| PromptIR* [25] | 33.32 | 0.133 | 16.05/0.5642 | 19.78/0.6744 | 21.74/0.6882 | 18.48/0.5942 | 19.01/0.6303 |
| AdaIR* [9] | 29.09 | 0.364 | 16.20/0.5623 | 20.41/0.6863 | 22.38/0.7072 | 18.22/0.5941 | 19.30/0.6375 |
| MoCE-IR* [43] | 24.26 | 0.156 | 17.31/0.5876 | 20.56/0.6846 | 20.85/0.7025 | 17.81/0.5906 | 19.13/0.6413 |
| Ours | 6.290 | 0.069 | 24.43/0.7050 | 25.34/0.7776 | 28.37/0.8508 | 25.62/0.7920 | 25.94/0.7813 |
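
For readers who want to sanity-check the table, the sketch below shows the standard PSNR definition and reproduces an Average entry as the arithmetic mean of the four per-clip values. That the Average column is a plain mean over clips is an assumption inferred from the numbers, and the helper names `psnr` and `clip_average` are illustrative only; the page does not describe the evaluation code.

```python
import numpy as np


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = np.mean((reference - restored) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)


def clip_average(per_clip_values, digits=2):
    """Arithmetic mean over clips, rounded as in the table."""
    return round(sum(per_clip_values) / len(per_clip_values), digits)


# Per-clip PSNR of the "Ours" row (CLIP 000, 011, 015, 020).
ours_psnr = [24.43, 25.34, 28.37, 25.62]
ours_average = clip_average(ours_psnr)
print(ours_average)

# A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
example_psnr = psnr(np.zeros((8, 8)), np.full((8, 8), 0.1))
print(example_psnr)
```

The mean of the four per-clip PSNR values matches the 25.94 dB reported in the Average column for the proposed method.
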