|
--- |
|
extra_gated_heading: Access aimped/nlp-health-translation-base-en-fr on Hugging Face |
|
extra_gated_description: >- |
|
This is a form to enable access to this model on Hugging Face after you have |
|
been granted access by Aimped. Please visit the [Aimped
|
website](https://aimped.ai/) to Sign Up and accept our Terms of Use and |
|
Privacy Policy before submitting this form. Requests will be processed in 1-2 |
|
days. |
|
extra_gated_prompt: >- |
|
**Your Hugging Face account email address MUST match the email you provide on |
|
the Aimped website or your request will not be approved.** |
|
extra_gated_button_content: Submit |
|
extra_gated_fields: |
|
I agree to share my name, email address, and username with Aimped and confirm that I have already been granted download access on the Aimped website: checkbox |
|
license: cc-by-nc-4.0 |
|
language: |
|
- en |
|
- fr |
|
metrics: |
|
- bleu |
|
pipeline_tag: translation |
|
widget: |
|
- text: >- |
|
Objective: Physical traumas are one of the important causes of mortality and |
|
morbidity in childhood. Permanent disabilities resulting from traumas |
|
constitute significant losses for the individual and society. |
|
- text: >- |
|
Evidence is reported that a variety of chronic respiratory diseases, |
|
particularly COPD, asthma, bronchiectasis, lung cancer, interstitial lung |
|
diseases, and sarcoidosis, are significantly associated with poor clinical |
|
outcomes of COVID-19. |
|
tags: |
|
- medical |
|
- translation |
|
- medical translation |
|
datasets: |
|
- aimped/medical-translation-test-set |
|
--- |
|
<p> |
|
|
|
<p align="center"> |
|
<img src="https://raw.githubusercontent.com/ai-amplified/models/main/media/AimpedLogoDark.svg" alt="aimped logo" width="50%" height="50%"/> |
|
</p> |
|
|
|
### Description of the Model |
|
|
|
<p> |
|
Paper: <a href="https://arxiv.org/abs/2407.12126" style="text-decoration: underline; color: blue;">LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation</a> |
|
</p> |
|
|
|
</p> |
|
<p style="margin-bottom: 0in; text-align: justify; line-height: 1.3;"><span style="font-family: 'IBM Plex Sans', sans-serif; font-size: 16px;">The Medical Translation AI model is a specialized language model trained to translate medical documents accurately from English to French. It gives healthcare professionals, researchers, and others in the medical field a reliable tool for translating a wide range of medical documents.</span></p>
|
<p style="margin-bottom: 0in; text-align: justify; line-height: 1.3;"> |
|
<span style="font-family: 'IBM Plex Sans', sans-serif; font-size: 16px;">The model is built on the
<a href="https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/en-fr" style="text-decoration: underline; color: blue;">Helsinki-NLP/MarianMT</a> neural translation architecture and was trained for more than two days on an A100 (24G RAM) GPU. To build a high-quality training corpus, we combined publicly available and proprietary datasets, enriched them with carefully curated text collected from online sources, and added clinical and discharge reports from diverse healthcare institutions to increase the corpus's depth and diversity. This curation is central to the model's ability to produce accurate translations in the medical domain.<br><br>The model can translate a wide range of healthcare documents, including medical reports, patient records, medication instructions, research manuscripts, and clinical trial documents. It lets users obtain translations quickly and reliably, simplifying the often complex task of translation in the medical field.</span>
|
</p> |
|
<p style="margin-bottom: 0in; text-align: justify; line-height: 1.3;"><span style="font-family: 'IBM Plex Sans', sans-serif; font-size: 16px;">On our curated proprietary test set, the model matches or outperforms the translation systems from Google, DeepL, and Helsinki-NLP (Opus/MarianMT) on every metric reported in the table below.</span></p>
|
<p style="line-height: 1.3; margin-bottom: 0in; text-align: justify;"><br></p> |
|
|
|
<table style="border-collapse: collapse; width: 605px; height: 117px; border: 1px lightgray;"> |
|
<tbody> |
|
<tr> |
|
<td style="width: 19.5041%; border: 1px lightgray;"><br></td> |
|
<td style="width: 20.6612%; text-align: center; border: 1px lightgray; font-size: 16px;"><strong>ROUGE</strong></td> |
|
<td style="width: 20%; text-align: center; border: 1px lightgray; font-size: 16px;"><strong>BLEU</strong></td> |
|
<td style="width: 20%; text-align: center; border: 1px lightgray; font-size: 16px;"><strong>METEOR</strong></td> |
|
<td style="width: 20%; text-align: center; border: 1px lightgray; font-size: 16px;"><strong>BERTScore</strong></td>
|
</tr> |
|
<tr> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>Aimped</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.85</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.62</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.83</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.95</span></td> |
|
</tr> |
|
<tr> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>Google</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.84</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.61</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.82</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.95</span></td> |
|
</tr> |
|
<tr> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>DeepL</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.81</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.57</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.78</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.94</span></td> |
|
</tr> |
|
<tr> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>Opus/MarianMT</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.80</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.52</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.76</span></td> |
|
<td style="text-align: center; border: 1px lightgray; font-size: 16px;"><span>0.93</span></td> |
|
</tr> |
|
</tbody> |
|
</table> |
|
<p></p> |
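
To make the BLEU column easier to interpret, here is a minimal sketch of an unsmoothed sentence-level BLEU score on the same 0-1 scale as the table. It is an illustration only: this card does not state which implementation or tokenization produced the reported scores, so the whitespace tokenization, the lowercasing, and the function names `ngrams` and `sentence_bleu` are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU on a 0-1 scale (whitespace tokens)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))

    # The brevity penalty lowers the score of hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "le patient a de la fièvre depuis hier"
hypothesis = "le patient a de la fièvre depuis lundi"
print(round(sentence_bleu(reference, hypothesis), 2))
```

With a single wrong final word in an eight-word sentence, the sketch scores roughly 0.84. Corpus-level scores such as those in the table aggregate n-gram counts over the whole test set, so they are not simple averages of per-sentence values.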
|
|
|
## Why should you use the Aimped API?

To get started, you can use the open-source versions of our models for research purposes. The models served through the Aimped API, however, are retrained on new data every three months, so they keep up with ongoing developments in healthcare and with current medical terminology instead of being limited by a fixed knowledge cutoff. We also apply pre- and post-processing steps to improve translation quality, and our quality control ensures that each new version performs at least on par with previous versions.
|
|
|
## How to Use

For best results, translate with the `text_translate` function from the `aimped` package, following the steps below.
|
|
|
- Install requirements

```python
# Run in a notebook (Jupyter/Colab); in a shell, drop the leading "!".
!pip install transformers
!pip install sentencepiece
!pip install aimped

# Download the NLTK "punkt" sentence tokenizer data.
import nltk
nltk.download('punkt')
```
|
- Import libraries

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from aimped.nlp.translation import text_translate
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
```
|
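- Load the model

```python
# The model is gated: request access on the model page and authenticate
# with Hugging Face (for example via `huggingface-cli login`) first.
model_path = "aimped/nlp-health-translation-base-en-fr"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
```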
|
- Create the translation pipeline

```python
translater = pipeline(
    task="translation_en_to_fr",
    model=model,
    tokenizer=tokenizer,
    device=device,
    max_length=512,          # maximum length of the generated output, in tokens
    num_beams=7,             # beam search with 7 beams
    early_stopping=False,
    num_return_sequences=1,  # return only the best translation
    do_sample=False,         # deterministic decoding, no sampling
)
```
|
|
|
- Use the model

```python
sentence = "Conclusion: According to our findings, the most common causes of major injuries in childhood are falls and home accidents."

# text_translate takes a list of texts plus the pipeline created above.
translated_text = text_translate([sentence], source_lang="en", pipeline=translater)
print(translated_text)
```
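
Because the pipeline above caps output at `max_length=512` tokens, long documents are usually split into sentences before translation. The sketch below shows one common way to do that. It is not the implementation of `text_translate`, whose internals this card does not describe; the regex splitter and the names `split_sentences`, `translate_long_text`, and `translate_fn` are hypothetical, chosen only for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    A real tokenizer (such as NLTK punkt) handles abbreviations better.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(text, translate_fn, batch_size=8):
    """Translate text sentence by sentence and rejoin the results.

    translate_fn is a hypothetical callable mapping list[str] -> list[str].
    """
    sentences = split_sentences(text)
    translated = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        translated.extend(translate_fn(batch))
    return " ".join(translated)


# Stand-in translator so the sketch runs without downloading the model.
def fake_translate(batch):
    return [sentence.upper() for sentence in batch]


print(translate_long_text("Falls are common. Are they preventable?", fake_translate))
```

To use the real model instead of the stand-in, `translate_fn` could wrap the pipeline, for example `lambda batch: [out["translation_text"] for out in translater(batch)]`.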
|
## Test Set |
|
<p><span style="font-family: 'IBM Plex Sans', sans-serif; font-size: 16px;">Training data: Public and in-house datasets.</span></p>

<p><span style="font-family: 'IBM Plex Sans', sans-serif; font-size: 16px;">Test data: Public and in-house datasets, available <a href="https://github.com/ai-amplified/models/tree/main/medical_translation/test_data/en-fr pairs">here</a>.</span></p>