# STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Rui Xie¹\*, Yinhong Liu¹\*, Chen Zhao¹, Penghao Zhou², Zhenheng Yang², Jun Zhou³, Kai Zhang¹, Zhenyu Zhang¹, Jian Yang¹, Ying Tai¹†

¹Nanjing University, ²ByteDance, ³Southwest University

### πŸ”† Updates
- **2024.12.01**: The pretrained STAR model (I2VGen-XL version) and inference code have been released.

## πŸ”Ž Method Overview
![STAR](assets/overview.png)

## πŸ“· Results Display
![STAR](assets/teaser.png)
![STAR](assets/real_world.png)

πŸ‘€ More visual results can be found on our [Project Page](https://nju-pcalab.github.io/projects/STAR) and in the [Video Demo](https://youtu.be/hx0zrql-SrU).

## βš™οΈ Dependencies and Installation
```
## git clone this repository
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR

## create an environment
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```

## πŸš€ Inference

#### Step 1: Download the pretrained model
Download the pretrained STAR model from [HuggingFace](https://huggingface.co/SherryX/STAR). We provide two versions: `heavy_deg.pt` for heavily degraded videos and `light_deg.pt` for lightly degraded videos (e.g., low-resolution videos downloaded from video websites). Put the weights into `pretrained_weight/`.
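If you prefer the command line, the snippet below is a minimal sketch of fetching both checkpoints with `huggingface-cli` (shipped with the `huggingface_hub` package). The file names follow this README; adjust the paths if the layout of the HuggingFace repository differs.

```
## sketch: download both checkpoints into pretrained_weight/
## (assumes the .pt files sit at the root of the SherryX/STAR repo)
pip install -U huggingface_hub
huggingface-cli download SherryX/STAR heavy_deg.pt light_deg.pt --local-dir pretrained_weight
```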
#### Step 2: Prepare testing data
Put the testing videos in `input/video/`. As for the prompt, there are three options:
1. No prompt.
2. Automatically generate a prompt [using Pllava](https://github.com/hpcaitech/Open-Sora/tree/main/tools/caption#pllava-captioning).
3. Manually write the prompt and put the txt file in `input/text/` (see the sketch after this list).
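For option 3, a minimal sketch of the expected layout; the file names are illustrative, and matching the prompt file name to the video name is an assumption rather than something this README specifies:

```
## place a test video and a hand-written prompt (illustrative names)
mkdir -p input/video input/text
cp /path/to/my_clip.mp4 input/video/
echo "A man is walking along a city street at night." > input/text/my_clip.txt
```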
#### Step 3: Change the paths
Edit the paths in `video_super_resolution/scripts/inference_sr.sh` to match your local setup, including `video_folder_path`, `txt_file_path`, `model_path`, and `save_dir`.
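As a sketch, the four variables might look like this after editing (illustrative values continuing the layout above; the rest of the script stays unchanged):

```
## inside video_super_resolution/scripts/inference_sr.sh (illustrative values)
video_folder_path='./input/video'
txt_file_path='./input/text/my_clip.txt'
model_path='./pretrained_weight/light_deg.pt'
save_dir='./results'
```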
#### Step 4: Run the inference command
```
bash video_super_resolution/scripts/inference_sr.sh
```

## ❀️ Acknowledgments
This project is based on [I2VGen-XL](https://github.com/ali-vilab/VGen), [VEnhancer](https://github.com/Vchitect/VEnhancer) and [CogVideoX](https://github.com/THUDM/CogVideo). Thanks for their awesome work.

## πŸŽ“ Citations
If our project helps your research or work, please consider citing our paper:
```
@misc{xie2024addsr,
      title={AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation},
      author={Rui Xie and Ying Tai and Kai Zhang and Zhenyu Zhang and Jun Zhou and Jian Yang},
      year={2024},
      eprint={2404.01717},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## πŸ“§ Contact
If you have any inquiries, please don't hesitate to reach out via email at `ruixie0097@gmail.com`.