# Visual semantic with BERT-CNN

This model assigns an object-to-caption semantic relatedness score, which is valuable for (1) diverse caption re-ranking, and (2) generating soft labels for caption filtering when scraping text-to-caption data from the internet.
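As a minimal illustration of use case (1), candidate captions can be re-ordered by their relatedness to the detected visual context. The `relatedness` scorer interface below is a hypothetical stand-in for the BERT-CNN model, not its actual API:

```python
def rerank_captions(candidates, visual_context, relatedness):
    """Re-rank candidate captions by semantic relatedness to the
    detected visual context (e.g., an object or scene label).

    `relatedness(caption, context)` is any scorer returning a float,
    higher meaning more related; it stands in for the BERT-CNN model.
    """
    return sorted(candidates,
                  key=lambda cap: relatedness(cap, visual_context),
                  reverse=True)

# Toy word-overlap scorer, used purely for illustration.
toy_score = lambda cap, ctx: len(set(cap.split()) & set(ctx.split()))
ranked = rerank_captions(["a man walks", "a dog runs in the park"],
                         "dog park", toy_score)
```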

To take advantage of the overlap between the visual context and the caption, and to extract global information from each visual cue (e.g., object, scene), we use BERT as an embedding layer followed by a shallow CNN with a tri-gram kernel (Kim, 2014).
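The architecture above can be sketched in a few lines of NumPy. The shapes, the ReLU choice, and the final pooling-to-score step are illustrative assumptions, not the released model's exact head or weights:

```python
import numpy as np

def trigram_cnn_score(token_embeddings, filters, bias):
    """Kim (2014)-style shallow CNN over contextual embeddings.

    token_embeddings: (seq_len, dim) array, e.g. BERT output for the
    caption together with the visual context.
    filters: (num_filters, 3 * dim) tri-gram convolution weights.
    bias: (num_filters,) bias vector.
    Returns a relatedness score in (0, 1).
    """
    seq_len, dim = token_embeddings.shape
    # Valid convolution: slide a 3-token window over the sequence.
    windows = np.stack([token_embeddings[i:i + 3].ravel()
                        for i in range(seq_len - 2)])           # (seq_len - 2, 3*dim)
    feature_maps = np.maximum(windows @ filters.T + bias, 0.0)  # ReLU
    pooled = feature_maps.max(axis=0)                           # max-over-time pooling
    logit = pooled.sum()  # stand-in for the final dense layer
    return 1.0 / (1.0 + np.exp(-logit))
```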
Please refer to the [GitHub repository](https://github.com/ahmedssabir/Visual-Semantic-Relatedness-Dataset-for-Image-Captioning) for more information.
For datasets with fewer than 100K samples, please have a look at our [shallow model](https://github.com/ahmedssabir/Semantic-Relatedness-Based-Reranker-for-Text-Spotting).
The model is trained with a strict filter: a 0.4 similarity-distance threshold between the object and its related caption.
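The filtering step can be sketched as follows. The source does not specify the exact distance measure, so cosine similarity with a 0.4 cutoff is an assumption here, and `embed` is a hypothetical placeholder for any sentence or word encoder:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_pairs(pairs, embed, threshold=0.4):
    """Keep only (object, caption) pairs whose embeddings clear the
    similarity threshold; `embed` maps a string to a vector."""
    return [(obj, cap) for obj, cap in pairs
            if cosine_similarity(embed(obj), embed(cap)) >= threshold]
```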
For a quick start, please have a look at this [demo](https://github.com/ahmedssabir/Textual-Visual-Semantic-Dataset/blob/main/BERT_CNN_Visual_re_ranker_demo.ipynb).
The [dataset](https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset) is available on the Hugging Face Hub.

```
pip install --upgrade tensorflow_hub==0.7.0
git clone https://github.com/gaphex/bert_experimental/
```

```python
import tensorflow as tf
import numpy as np
```