Update README.md
README.md
CHANGED
@@ -21,8 +21,10 @@ tags:

## Intended Usage & Model Info

-`jina-embeddings-v2-base-code` is an multilingual **embedding model** speaks English and
-
+`jina-embeddings-v2-base-code` is a multilingual **embedding model** that supports **English and 30 widely used programming languages**.
+Like the other models in the jina-embeddings-v2 series, it supports a **sequence length of 8192**.
+
+`jina-embeddings-v2-base-code` is based on a BERT architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
The backbone `jina-bert-v2-base-code` is pretrained on the [github-code](https://huggingface.co/datasets/codeparrot/github-code) dataset.
The model is further trained on Jina AI's collection of more than 150 million coding question-answer pairs and docstring-source code pairs.
These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
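
The symmetric bidirectional ALiBi variant mentioned in the added lines can be illustrated with a small sketch. This is not the JinaBert implementation; it only assumes the usual ALiBi formulation, in which each attention head adds a bias of `-slope * |i - j|` to the attention logit between positions `i` and `j`, with slopes forming a geometric sequence for a power-of-two head count. The function names `alibi_slopes` and `symmetric_alibi_bias` are hypothetical and chosen for illustration.

```python
def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence (assumes num_heads is a power of two)."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len, slope):
    """Bias matrix added to attention logits: -slope * |i - j|.

    Using the absolute distance makes the penalty identical in both
    directions, which is what suits a bidirectional encoder.
    """
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)                    # illustrative head count
bias = symmetric_alibi_bias(4, slopes[0])   # illustrative sequence length

for row in bias:
    print(row)
```

Because the bias depends only on the distance between positions and not on learned position embeddings, the same rule can be applied to sequences longer than those seen during training, which is the property the README text relies on for the longer sequence length.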