louisbrulenaudet
committed on
Commit 44efea8
1 Parent(s): 9327588
Update README.md
README.md
CHANGED
@@ -351,6 +351,10 @@ language:
 
 # Lemone-Embed: A Series of Fine-Tuned Embedding Models for French Taxation
 
+This sentence transformers model, specifically designed for French taxation, has been fine-tuned on a dataset comprising 43 million tokens, integrating a blend of semi-synthetic and fully synthetic data generated by GPT-4 Turbo and Llama 3.1 70B, which have been further refined through evol-instruction tuning and manual curation.
+
+The model is tailored to meet the specific demands of information retrieval across large-scale tax-related corpora, supporting the implementation of production-ready Retrieval-Augmented Generation (RAG) applications. Its primary purpose is to enhance the efficiency and accuracy of legal processes in the taxation domain, with an emphasis on delivering consistent performance in real-world settings, while also contributing to advancements in legal natural language processing research.
+
 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
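The README text in this diff says the model maps text to a 1024-dimensional dense vector space for uses such as semantic search. As a rough illustration of that retrieval step only, the sketch below ranks documents by cosine similarity to a query. It does not load the model: the vectors are random placeholders standing in for real embeddings, and the helper names `cosine_similarity` and `rank_documents` are hypothetical, not part of the repository. In practice the vectors would presumably come from the sentence-transformers encoding step.

```python
import math
import random

EMBEDDING_DIM = 1024  # dimensionality stated in the README text


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Placeholder vectors standing in for real embeddings (assumption: not model output).
rng = random.Random(0)
doc_vecs = [[rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)] for _ in range(3)]

# Build a query that is a slightly perturbed copy of document 1,
# so document 1 should be its nearest neighbour.
query_vec = [x + 0.1 * rng.gauss(0, 1) for x in doc_vecs[1]]

ranking = rank_documents(query_vec, doc_vecs)
print("best match index:", ranking[0])
```

A production RAG setup of the kind the README mentions would more likely use a vector index than this linear scan. The brute-force version is shown only because it keeps the similarity computation visible.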