Use ```--model_name_or_path jaandoui/DNABERT2-AttentionExtracted``` instead of the original model path.
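For example, the model can also be loaded programmatically. This is a sketch assuming the standard Hugging Face API; DNABERT2 ships its own modeling code, hence ```trust_remote_code=True```:

```python
from transformers import AutoModel, AutoTokenizer

name = "jaandoui/DNABERT2-AttentionExtracted"
# DNABERT2 uses custom modeling code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True)
```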
Most of the modifications were done in ```Bert_Layer.py```.
It has been modified especially for fine-tuning and hasn't been tried for pretraining.
Before or next to each modification you can find ```"JAANDOUI"```; to see all modifications, search for ```"JAANDOUI"```.
```"JAANDOUI TODO"``` means that if that part is going to be used, something might still be missing there.

Now in ```Trainer``` (or ```CustomTrainer``` if overridden), in ```compute_loss(..)```, when calling the model:
```outputs = model(**inputs, return_dict=True, output_attentions=True)```
This activates the extraction of attention: ```output_attentions=True``` (and ```return_dict=True```).
You can now extract the attention in ```outputs.attentions```.
Read more about model outputs here: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/output#transformers.utils.ModelOutput
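The steps above can be sketched end to end with a tiny randomly initialized BERT, so nothing is downloaded. The helper name ```compute_loss_with_attentions``` and the toy configuration are illustrative, not part of the original repository:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

def compute_loss_with_attentions(model, inputs):
    """Forward pass that also returns the per-layer attention maps."""
    outputs = model(**inputs, return_dict=True, output_attentions=True)
    return outputs.loss, outputs.attentions

# Tiny random stand-in model so the sketch runs without downloads.
config = BertConfig(hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=64,
                    vocab_size=100, num_labels=2)
model = BertForSequenceClassification(config)

inputs = {"input_ids": torch.randint(0, 100, (2, 8)),
          "attention_mask": torch.ones(2, 8, dtype=torch.long),
          "labels": torch.tensor([0, 1])}
loss, attentions = compute_loss_with_attentions(model, inputs)

# attentions is a tuple with one tensor per layer, each of shape
# (batch_size, num_heads, seq_len, seq_len).
```

Inside an overridden ```compute_loss(..)``` the same two lines, the forward call with ```output_attentions=True``` and the read of ```outputs.attentions```, are all that is needed.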
I'm also not using Triton, so I cannot guarantee that the model will work with it.

I also read that there were some problems with extracting attention when using Flash Attention: https://github.com/huggingface/transformers/issues/28903
I'm not sure whether that is relevant here, since that issue is about Mistral models.
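If Flash Attention does turn out to interfere with attention extraction, one possible workaround (assuming a recent ```transformers``` release that supports the ```attn_implementation``` argument) is to force the eager attention path when loading. This is an untested sketch for this checkpoint:

```python
from transformers import AutoModel

# Untested sketch: "eager" asks transformers to use the plain (non-fused)
# attention implementation, which is the path that can return attention
# weights; whether the custom DNABERT2 code honors it is not verified here.
model = AutoModel.from_pretrained(
    "jaandoui/DNABERT2-AttentionExtracted",
    trust_remote_code=True,
    attn_implementation="eager",
)
```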
The official link to DNABERT2: [DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome](https://arxiv.org/pdf/2306.15006.pdf).