VQA-vit5

Question:

  • Encoder: ViT5-base
  • Max length: 32
  • Pre-Processing: lowercase, remove special characters
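
The question pre-processing above could be sketched roughly as follows. This is only an illustration: the README does not say which characters count as "special", so the sketch assumes it means anything other than letters, digits, and whitespace. The function name `preprocess_question` is hypothetical, and the Unicode normalization step is an added assumption so that Vietnamese diacritics survive the filter.

```python
import re
import unicodedata

# Assumption: "special character" = anything that is not a letter, digit,
# or whitespace. The README does not define the exact character set.
_SPECIAL = re.compile(r"[^\w\s]|_")
_SPACES = re.compile(r"\s+")


def preprocess_question(text: str) -> str:
    """Lowercase a question and strip special characters (hypothetical helper)."""
    # Compose diacritics into single code points so that accented Vietnamese
    # letters are treated as word characters and not removed.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Replace each special character with a space, then collapse runs of spaces.
    text = _SPECIAL.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


if __name__ == "__main__":
    print(preprocess_question("  What's   THIS?! "))
```

The 32-token limit would normally be enforced afterwards by the tokenizer (for example with `max_length=32` and `truncation=True` in a Hugging Face tokenizer call) and is therefore not part of this sketch.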

Image:

  • Encoder: ViT-base
  • Pre-Processing: None

OCR:

  • Text Detection: PaddleOCR
  • Text Recognition: VietOCR
    • Threshold: 0.8
  • Max length: 128
  • Post-Processing: group layout, divide=4
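
As an illustration of the recognition threshold, the sketch below keeps only OCR strings whose confidence reaches 0.8. It makes several assumptions: that the threshold applies to the recognition confidence, that the comparison is inclusive, and that results arrive as `(text, confidence)` pairs. The name `filter_ocr` is hypothetical, and neither PaddleOCR nor VietOCR is actually called here. The layout grouping step (`divide=4`) is not reproduced because the README does not describe how it works.

```python
from typing import List, Tuple

CONF_THRESHOLD = 0.8  # value taken from the README


def filter_ocr(
    results: List[Tuple[str, float]],
    threshold: float = CONF_THRESHOLD,
) -> List[str]:
    """Keep recognized strings with confidence >= threshold (hypothetical helper).

    `results` is assumed to be a list of (text, confidence) pairs; the real
    output format of the OCR pipeline may differ.
    """
    return [text for text, conf in results if conf >= threshold]


if __name__ == "__main__":
    sample = [("cafe", 0.95), ("x1", 0.42), ("mo cua", 0.8)]
    print(filter_ocr(sample))
```

The surviving strings would then be joined and truncated to the 128-token limit by the tokenizer before being passed to the model.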

Answer:

  • Max length: 56

Result:

  • Dev:
    • CIDEr: 3.4616
    • BLEU: 0.4689