kakaobrain/coyo-align-b7-base

	---
	language:
	- en
	tags:
	- align
	- clip
	license: apache-2.0
	datasets:
	- kakaobrain/coyo-700m
	inference: false
	---

	# Model Details

	This is an implementation of [ALIGN](https://arxiv.org/abs/2102.05918) trained on [COYO-700M](https://github.com/kakaobrain/coyo-dataset). The official ALIGN model was trained on its own dataset of 1.8B image-text pairs, which has not been released to the public, so we trained our implementation on COYO-700M instead.

	It was developed by Kakao Brain to validate the COYO-700M dataset on a large-scale model.

	Training took about 10 days on a TPU v3-1024 with batch_size=64k.

	## Model Date

	April 2022

	## Model Type

	This is a dual-encoder model in which:
	- the image encoder uses the EfficientNet-B7 architecture
	- the text encoder uses the BERT-base architecture
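
As a rough illustration of how a dual encoder of this kind is typically scored and trained, the sketch below computes cosine similarities between image and text embeddings and a symmetric contrastive loss over them. It is a minimal sketch, not the code used to train this model: the function names, the toy embeddings, and the temperature value are all assumptions made for illustration, and the real embeddings would come from the EfficientNet-B7 and BERT-base encoders.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Return sim[i][j] = cosine similarity between image i and text j."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts] for img in images]

def _cross_entropy_on_diagonal(rows, temperature):
    """Mean softmax cross-entropy where the correct column for row i is i."""
    total = 0.0
    for i, row in enumerate(rows):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(rows)

def contrastive_loss(sim, temperature=0.07):
    """Symmetric loss: image-to-text term plus text-to-image term.

    The temperature is an illustrative value, not one stated in this model card.
    """
    transposed = [list(col) for col in zip(*sim)]
    return (_cross_entropy_on_diagonal(sim, temperature)
            + _cross_entropy_on_diagonal(transposed, temperature))

# Toy stand-ins for encoder outputs (hypothetical values, not real embeddings).
image_embs = [[1.0, 0.0], [0.0, 2.0]]
text_embs = [[3.0, 0.1], [0.2, 1.0]]

sim = similarity_matrix(image_embs, text_embs)
loss = contrastive_loss(sim)
```

Matching pairs sit on the diagonal of `sim`, so the loss falls as each image scores highest against its own text and each text against its own image.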

	# Training data

	This model was trained on the [COYO-700M](https://github.com/kakaobrain/coyo-dataset) dataset.

	# Evaluation results

	| Model | Training dataset | ImageNet KNN | Flickr30k I2T R@1 | Flickr30k T2I R@1 | MsCOCO I2T R@1 | MsCOCO T2I R@1 |
	|--------------------------------|:----------:|:--------:|:---------:|:-------:|:-------:|:-------:|
	| ALIGN-L2-Large(Google) | ALIGN 1.8B | 76.4 | 88.6 | 75.7 | 58.6 | 45.6 |
	| ALIGN-B7-Base(Google) | ALIGN 1.8B | 69.3 | - | - | 55.4 | 41.7 |
	| COYO-ALIGN-B7-Base(Kakao Brain) | COYO-700M | 68.6 | 88.1 | 73.2 | 61.2 | 43.1 |
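
For readers unfamiliar with the retrieval columns, the sketch below shows one common way to compute image-to-text (I2T) and text-to-image (T2I) R@1 from a similarity matrix. It is a simplified illustration, not the evaluation code behind the numbers above: the function names and similarity values are made up, and it assumes exactly one matching caption per image at the same index, whereas Flickr30k and MsCOCO provide several captions per image.

```python
def recall_at_k(sim, k=1):
    """Fraction of queries (rows) whose matching item (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

def transpose(matrix):
    """Swap rows and columns so captions become the queries."""
    return [list(col) for col in zip(*matrix)]

# sim[i][j]: similarity between image i and caption j (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.3, 0.9],
]

i2t_r1 = recall_at_k(sim, k=1)             # images as queries, captions as candidates
t2i_r1 = recall_at_k(transpose(sim), k=1)  # captions as queries, images as candidates
```

In this toy matrix image 1 scores highest against caption 2, so I2T R@1 is 2/3, while every caption retrieves its own image first, so T2I R@1 is 1.0.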