---
language:
- en
tags:
- align
- clip
license: apache-2.0
datasets:
- kakaobrain/coyo-700m
inference: false
---
## Model Details
This model is an implementation of ALIGN trained on the COYO-700M dataset. The official ALIGN was trained on a proprietary dataset of 1.8B image-text pairs that has not been released to the public, so we trained our implementation of ALIGN on COYO-700M instead.

It was developed by Kakao Brain to validate the performance of the COYO-700M dataset on a large-scale model.
Training took about 10 days on a TPU V3-1024 with `batch_size=64k`.
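ALIGN-style training optimizes a symmetric contrastive (InfoNCE) objective over the in-batch image-text similarity matrix, which is why such a large batch helps: every other pair in the batch serves as a negative. A minimal sketch of that objective, assuming `logits` is the temperature-scaled image-to-text similarity matrix for one batch, with matching pairs on the diagonal:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(logits: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE over a [batch, batch] similarity matrix."""
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2
```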
## Model Date
April 2022
## Model Type

This is a dual-encoder model (sketched below) where:
- the image encoder uses the EfficientNet-B7 architecture
- the text encoder uses the BERT-base architecture
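As a rough illustration of how the two encoders fit together, here is a minimal PyTorch sketch of an ALIGN-style dual encoder. It uses torchvision's EfficientNet-B7 and Hugging Face's `bert-base-uncased` as stand-ins for the encoders above; the projection dimension and temperature below are illustrative choices, not the trained model's actual values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import efficientnet_b7
from transformers import BertModel

class DualEncoder(nn.Module):
    def __init__(self, embed_dim: int = 640):
        super().__init__()
        backbone = efficientnet_b7()
        backbone.classifier = nn.Identity()  # keep the 2560-d pooled features
        self.image_encoder = backbone
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_proj = nn.Linear(2560, embed_dim)
        self.text_proj = nn.Linear(768, embed_dim)
        self.temperature = nn.Parameter(torch.tensor(0.07))  # learnable scale

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.image_proj(self.image_encoder(pixel_values))
        txt = self.text_proj(
            self.text_encoder(input_ids, attention_mask=attention_mask).pooler_output
        )
        # L2-normalize so the dot product is a cosine similarity
        img = F.normalize(img, dim=-1)
        txt = F.normalize(txt, dim=-1)
        return img @ txt.t() / self.temperature
```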
## Training data

This model was trained on the COYO-700M dataset.
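COYO-700M is public on the Hugging Face Hub (see `kakaobrain/coyo-700m` in the metadata above), so the image-text pairs can be streamed with the `datasets` library. A minimal sketch, assuming the `url` and `text` columns of that dataset's schema:

```python
from itertools import islice
from datasets import load_dataset

# Stream the dataset rather than downloading all 700M rows up front
ds = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)
for sample in islice(ds, 3):
    print(sample["url"], sample["text"])
```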
## Evaluation results

| Model | Dataset | ImageNet KNN | Flickr30k I2T R@1 | Flickr30k T2I R@1 | MsCOCO I2T R@1 | MsCOCO T2I R@1 |
|---|---|---|---|---|---|---|
| ALIGN-L2-Large (Google) | ALIGN 1.8B | 76.4 | 88.6 | 75.7 | 58.6 | 45.6 |
| ALIGN-B7-Base (Google) | ALIGN 1.8B | 69.3 | - | - | 55.4 | 41.7 |
| COYO-ALIGN-B7-Base (Kakao Brain) | COYO-700M | 68.6 | 88.1 | 73.2 | 61.2 | 43.1 |
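For reference, I2T R@1 measures how often the top-ranked caption for each image is a correct match (and symmetrically for T2I). A minimal sketch of that computation, assuming L2-normalized embeddings with one ground-truth caption per image at the same row index, which is a simplification since Flickr30k and MsCOCO provide multiple captions per image:

```python
import torch

def recall_at_1(image_emb: torch.Tensor, text_emb: torch.Tensor) -> float:
    """I2T R@1 for L2-normalized embeddings with matched row indices."""
    sims = image_emb @ text_emb.t()  # cosine similarity matrix
    top1 = sims.argmax(dim=1)        # highest-scoring caption per image
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    return (top1 == targets).float().mean().item()
```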