# SwinTextSpotter

This is the PyTorch implementation of the paper "SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition" (CVPR 2022). The paper is available at [this link](https://arxiv.org/pdf/2203.10209.pdf).

- We use the models pre-trained on ImageNet. The ImageNet pre-trained [SwinTransformer](https://drive.google.com/file/d/1wvzCMLJtEID8hBDu3wLpPv4xm3Es8ELC/view?usp=sharing) backbone is obtained from [SwinT_detectron2](https://github.com/xiaohu2015/SwinT_detectron2).

## Models

[SWINTS-swin-english-pretrain [config]](https://github.com/mxin262/SwinTextSpotter/blob/main/projects/SWINTS/configs/SWINTS-swin-pretrain.yaml) \| [model_Google Drive](https://drive.google.com/file/d/1q3cNhJYPIZ8Sbk0-4i_gnQIF6z09rCKh/view?usp=sharing) \| [model_BaiduYun](https://pan.baidu.com/s/1INNghiHoI_K6m2t9YxVCIw) PW: 954t

[SWINTS-swin-Total-Text [config]](https://github.com/mxin262/SwinTextSpotter/blob/main/projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml) \| [model_Google Drive](https://drive.google.com/file/d/1o6LbT0NayfIzTtJpozAqtz50wrSNnKIJ/view?usp=sharing) \| [model_BaiduYun](https://pan.baidu.com/s/1fLqMa9r-Ea2wIT6I81bwhA) PW: tf0i

[SWINTS-swin-ctw [config]](https://github.com/mxin262/SwinTextSpotter/blob/main/projects/SWINTS/configs/SWINTS-swin-finetune-ctw.yaml) \| [model_Google Drive](https://drive.google.com/file/d/1LC7-JFuQIIYeUt_KaDH61ICGvVkRqkz7/view?usp=sharing) \| [model_BaiduYun](https://pan.baidu.com/s/1q7zZQ1Hnl6QPmwfJXal98Q) PW: 4etq

[SWINTS-swin-icdar2015 [config]](https://github.com/mxin262/SwinTextSpotter/blob/main/projects/SWINTS/configs/SWINTS-swin-finetune-ic15.yaml) \| [model_Google Drive](https://drive.google.com/file/d/15lDht7RtN092DeGggN5qoEyTTUkmIuGs/view?usp=sharing) \| [model_BaiduYun](https://pan.baidu.com/s/1bWTwmIrZOUNqEUqx5cKXng) PW: 3n82

[SWINTS-swin-ReCTS [config]](https://github.com/mxin262/SwinTextSpotter/blob/main/projects/SWINTS/configs/SWINTS-swin-chn_finetune.yaml) \| [model_Google Drive](https://drive.google.com/file/d/1FLW35M18tw4fYSBL1qGzEOkTaD2t6mXT/view?usp=sharing) \| [model_BaiduYun](https://pan.baidu.com/s/1BHsLuwqUs_D_CO54UIaNPQ) PW: a4be

[SWINTS-swin-vintext [config]](https://github.com/mxin262/SwinTextSpotter/blob/main/projects/SWINTS/configs/SWINTS-swin-finetune-vintext.yaml) \| [model_Google Drive](https://drive.google.com/file/d/1IfyPrYFnQOWoY8pPg-GIN5ofuALU15yD/view?usp=sharing) \| [model_BaiduYun](https://pan.baidu.com/s/1c5Xc9_lCun6mazhuxBk7sA) PW: slmp

## Installation

- Python=3.8
- PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
- OpenCV for visualization

## Steps

1. Install the repository (we recommend using [Anaconda](https://www.anaconda.com/) for installation).

```
conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop
```

2. Dataset path

```
datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......
```

Download the images:

- ICDAR2017-MLT [[image]](https://rrc.cvc.uab.es/?ch=8&com=downloads)
- Syntext-150k:
  - Part1: 94,723 [[dataset]](https://universityofadelaide.box.com/s/xyqgqx058jlxiymiorw8fsfmxzf1n03p)
  - Part2: 54,327 [[dataset]](https://universityofadelaide.box.com/s/e0owoic8xacralf4j5slpgu50xfjoirs)
- ICDAR2015 [[image]](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- ICDAR2013 [[image]](https://rrc.cvc.uab.es/?ch=2&com=downloads)
- Total-Text_train_images [[image]](https://drive.google.com/file/d/1idATPS2Uc0PAwTBcT2ndYNLse3yKtT6G/view?usp=sharing)
- Total-Text_test_images [[image]](https://drive.google.com/file/d/1P1mHAZN82HqR-YFui-wOTdp3zBY2N_lJ/view?usp=sharing)
- ReCTs [[images&label]](https://pan.baidu.com/s/1JC0_rNbsyz564YakptP6Ow) PW: 2b4q
- LSVT [[images&label]](https://pan.baidu.com/s/1j-zlH8SfmdTtH2OnuT9B7Q) PW: 9uh1
- ArT [[images&label]](https://pan.baidu.com/s/165RtrJVIsJ3QqDjesoX1jQ) PW: 2865
- SynChinese130k [[images]](https://drive.google.com/file/d/1w9BFDTfVgZvpLE003zM694E0we4OWmyP/view?usp=sharing) [[label]](https://drive.google.com/file/d/199sLThD_1e0vtDmpWrAEtUJyleS8DDTv/view?usp=sharing)
- Vintext_images [[image]](https://drive.google.com/file/d/1O8t84JtlQZE9ev4dgHrK3TLfbzRu2z9E/view?usp=sharing)

Download the labels: [[Google Drive]](https://drive.google.com/file/d/1wd_Z8UPNXRtnzU_qZCukKhxa_CDO5eaO/view?usp=sharing) [[BaiduYun]](https://pan.baidu.com/s/1bFTlChn92GdOvcF4TfjjIw) PW: wpaf

Download the lexicon [[Google Drive]](https://drive.google.com/file/d/1jNX0NQKtyMC1pnh_IV__0drgNwTnupca/view?usp=sharing) and place it in the corresponding dataset directory.

You can also prepare your custom dataset following the example scripts.
[[example scripts]](https://drive.google.com/file/d/1FE17GXyGPhDk5XI3EpbXwlOv1S8txOx2/view?usp=sharing)

## Totaltext

To evaluate on Total-Text, CTW1500, and ICDAR2015, first download the zipped annotations with

```
cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip "https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download"
wget -O gt_totaltext.zip "https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download"
# Note: the two Google Drive URLs below are share pages, so wget may save an
# HTML page instead of the archive. If the resulting .zip does not open,
# download the file in a browser or with a tool such as gdown.
wget -O gt_icdar2015.zip "https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing"
wget -O gt_vintext.zip "https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing"
```

3. Pretrain SWINTS (e.g., with Swin-Transformer backbone)

```
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml
```

4. Fine-tune the model on the mixed real dataset

```
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml
```

5. Fine-tune the model on the target dataset (Total-Text in this example)

```
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml
```

6. Evaluate SWINTS (e.g., with Swin-Transformer backbone)

```
python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth
```

7. Visualize the detection and recognition results (e.g., with Swin-Transformer backbone)

```
python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth
```

## Example results:

## Acknowledgement

[AdelaiDet](https://github.com/aim-uofa/AdelaiDet), [Detectron2](https://github.com/facebookresearch/detectron2), [ISTR](https://github.com/hujiecpp/ISTR), [SwinT_detectron2](https://github.com/xiaohu2015/SwinT_detectron2), [Focal-Transformer](https://github.com/microsoft/Focal-Transformer) and [MaskTextSpotterV3](https://github.com/MhLiao/MaskTextSpotterV3).

## Citation

If our paper helps your research, please cite it in your publications:

```BibTeX
@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and Yuliang Liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal = {arXiv preprint arXiv:2203.10209},
  year = {2022}
}
```

## Copyright

For commercial use, please contact Dr. Lianwen Jin: eelwjin@scut.edu.cn

Copyright 2019, Deep Learning and Vision Computing Lab, South China University of Technology. http://www.dlvc-lab.net
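
## Checking the dataset layout

Before launching training, it can help to confirm that the files listed in the dataset tree of step 2 are in place. The script below is a minimal sketch and not part of the repository: the `EXPECTED` table and the `missing_entries` helper are hypothetical names introduced here. It only covers the entries that the tree spells out, because the tree is elided after `mlt2017`, and it assumes it is run from the repository root so that `datasets` resolves correctly.

```python
from pathlib import Path

# Entries copied from the dataset tree in step 2. The tree is elided after
# mlt2017, so only the paths that are actually listed there are checked.
EXPECTED = {
    "totaltext": [
        "train_images",
        "test_images",
        "totaltext_train.json",
        "weak_voc_new.txt",
        "weak_voc_pair_list.txt",
    ],
    "mlt2017": [
        "train_images",
        "annotations/icdar_2017_mlt.json",
    ],
}


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [
        (Path(dataset) / entry).as_posix()
        for dataset, entries in EXPECTED.items()
        for entry in entries
        if not (root / dataset / entry).exists()
    ]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_entries("datasets")
    if missing:
        print("Missing under datasets/:")
        for path in missing:
            print("  " + path)
    else:
        print("All listed dataset entries are present.")
```

Extending the check to other datasets only requires adding their directory names and files to `EXPECTED`.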