---
language:
  - en
tags:
  - align
  - clip
license: apache-2.0
datasets:
  - kakaobrain/coyo-700m
inference: false
---

Model Details

This is an unofficial implementation of ALIGN trained on COYO-700M. The official ALIGN was trained on its own dataset of 1.8B samples, which has not been released to the public, so we trained our implementation of the ALIGN model on COYO-700M instead.

It was developed by Kakao Brain to validate the performance of the COYO-700M dataset on a large-scale model.

Training took about 8 days on a TPU v3-512.

Model Date

April 2022

Model Type

This is a dual-encoder model in which:

  • the image encoder uses the EfficientNet-B7 architecture
  • the text encoder uses the BERT-base architecture
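
As a rough illustration of how a dual-encoder model is typically used, the sketch below scores image-text pairs by cosine similarity between the two towers' embeddings. It assumes the embeddings are L2-normalized and compared by dot product, which is common for ALIGN/CLIP-style models but is not stated in this card. The vectors are hypothetical stand-ins for encoder outputs, not values produced by this model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Return sim[i][j] = cosine similarity between image i and text j."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts]
            for img in images]

# Hypothetical encoder outputs. In a real pipeline these would come from the
# image tower (EfficientNet-B7) and the text tower (BERT-base).
image_embs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
text_embs = [[0.0, 3.0, 0.0], [4.0, 0.0, 0.0]]

sim = similarity_matrix(image_embs, text_embs)

# For each image, pick the index of the highest-scoring text.
best_text_for_image = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(best_text_for_image)
```

Because the two encoders are independent, image and text embeddings can be computed separately and cached, and retrieval reduces to a nearest-neighbor lookup over the similarity scores.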

Training data

This model is trained on the COYO-700M dataset.

Evaluation results

| Model | Dataset | ImageNet KNN | Flickr30k I2T R@1 | Flickr30k T2I R@1 | MsCOCO I2T R@1 | MsCOCO T2I R@1 |
|---|---|---|---|---|---|---|
| ALIGN-L2-Large (Google) | ALIGN 1.8B | 76.4 | 88.6 | 75.7 | 58.6 | 45.6 |
| ALIGN-B7-Base (Google) | ALIGN 1.8B | 69.3 | - | - | 55.4 | 41.7 |
| COYO-ALIGN-B7-Base (Kakao Brain) | COYO-700M | 68.6 | 88.1 | 73.2 | 61.2 | 43.1 |
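
For readers unfamiliar with the retrieval metrics, here is a minimal sketch of how R@1 is commonly computed for image-to-text (I2T) and text-to-image (T2I) retrieval from a matrix of similarity scores. It is not the evaluation script used for the numbers above. The function names, the score values, and the one-caption-per-image ground truth are all hypothetical; Flickr30k and MS COCO usually provide several captions per image, which the `ground_truth` sets can represent.

```python
def recall_at_k(sim, ground_truth, k=1):
    """Fraction of queries with at least one correct candidate in the top k.

    sim[q][c] is the score of candidate c for query q.
    ground_truth[q] is the set of candidate indices that are correct for q.
    """
    hits = 0
    for q, scores in enumerate(sim):
        # Rank candidate indices by descending score and keep the top k.
        top_k = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
        if any(c in ground_truth[q] for c in top_k):
            hits += 1
    return hits / len(sim)

def transpose(matrix):
    """Swap queries and candidates (image-by-text becomes text-by-image)."""
    return [list(col) for col in zip(*matrix)]

# Toy image-by-text scores (hypothetical): 3 images, 3 captions,
# where caption i is the correct match for image i.
sim_i2t = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
gt = [{0}, {1}, {2}]

i2t_r1 = recall_at_k(sim_i2t, gt, k=1)             # images as queries
t2i_r1 = recall_at_k(transpose(sim_i2t), gt, k=1)  # captions as queries
print(f"I2T R@1 = {100 * i2t_r1:.1f}, T2I R@1 = {100 * t2i_r1:.1f}")
```

The table reports these values as percentages, so a score of 61.2 corresponds to the correct match being ranked first for 61.2% of queries.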