jaandoui committed
Commit
cc8cdbf
1 Parent(s): 7df9fc1

Update README.md

Files changed (1)
  1. README.md +7 -2
README.md CHANGED
@@ -14,8 +14,8 @@ Use ```--model_name_or_path jaandoui/DNABERT2-AttentionExtracted``` instead of
 
 Most of the modifications were done in Bert_Layer.py.
 It has been modified especially for fine-tuning and hasn't been tried for pretraining.
-Before or next to each modification, you can find "JAANDOUI"; to see all modifications, search for "JAANDOUI".
-"JAANDOUI TODO" means that if that part is going to be used, something might be missing.
+Before or next to each modification, you can find ```"JAANDOUI"```; to see all modifications, search for ```"JAANDOUI"```.
+```"JAANDOUI TODO"``` means that if that part is going to be used, something might be missing.
 
 Now in ```Trainer``` (or ```CustomTrainer``` if overwritten) in ```compute_loss(..)``` when defining the model:
 ```outputs = model(**inputs, return_dict=True, output_attentions=True)```
@@ -23,6 +23,11 @@ activate the extraction of attention: ```output_attentions=True``` (and ```retur
 You can now extract the attention in ```outputs.attentions```.
 Read more about model outputs here: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/output#transformers.utils.ModelOutput
 
+I'm also not using Triton, so I cannot guarantee that it will work with it.
+
+I also read that there were some problems with extracting attention when using Flash Attention here: https://github.com/huggingface/transformers/issues/28903
+Not sure if that is relevant here, since it's about Mistral models.
+
 The official link to DNABERT2 [DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
 ](https://arxiv.org/pdf/2306.15006.pdf).
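The ```--model_name_or_path jaandoui/DNABERT2-AttentionExtracted``` flag above refers to a training-script entry point. If you load the model directly instead, a minimal sketch might look like the following; it assumes this fork keeps the original DNABERT-2 loading convention, where the custom modeling code (including the modified Bert_Layer.py) is pulled in via ```trust_remote_code=True```, and the DNA string is just a placeholder:

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: the fork loads like the original DNABERT-2, i.e. trust_remote_code=True
# so that the repository's modified Bert_Layer.py is actually used.
model_name = "jaandoui/DNABERT2-AttentionExtracted"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGG"  # placeholder sequence
inputs = tokenizer(dna, return_tensors="pt")
```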
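Here is a rough sketch of what the ```CustomTrainer``` / ```compute_loss``` pattern described above could look like. The loss handling and the choice to stash only the last layer's attention are assumptions for illustration, not part of this repository:

```python
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Forward pass with attention extraction enabled, as described above.
        outputs = model(**inputs, return_dict=True, output_attentions=True)

        # outputs.attentions is a tuple with one tensor per layer, each of shape
        # (batch_size, num_heads, seq_len, seq_len).
        attentions = outputs.attentions
        if attentions is not None:
            # Assumption: keep only the last layer's attention for later inspection.
            self.last_attentions = attentions[-1].detach().cpu()

        # Assumes the batch contains labels, so the model returns a loss.
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss
```

After ```trainer.train()```, ```trainer.last_attentions``` would then hold the attention of the most recent batch; this is just one possible way to get the tensors out.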