dicta-il/dictabert-seg


Shaltiel committed on Aug 29, 2023

Commit b3075e8 (1 parent: b003ea2)

Upload BertForPrefixMarking.py

Files changed (1)

BertForPrefixMarking.py +1 -0


@@ -174,6 +174,7 @@ def encode_sentences_for_bert_for_prefix_marking(tokenizer: BertTokenizerFast, s
             next_tok_idx = tok_idx + 1
             while next_tok_idx < len(tokens) and tokens[next_tok_idx].startswith('##'):
                 token += tokens[next_tok_idx][2:]
+                next_tok_idx += 1
             # find all the possible prefixes - and mark them as 0 (and in the possible mark it as it's value for embed lookup)
             for pre_class in get_prefix_classes_from_str(token):
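
The added line matters because the `while` condition depends only on `next_tok_idx`: without the increment, the loop would never exit once a `##` continuation piece follows the current token, and `token` would grow without bound. Below is a minimal standalone sketch of the corrected merge loop. The helper name `merge_wordpieces` and the sample token list are hypothetical and are not part of `BertForPrefixMarking.py`; only the loop body mirrors the diff.

```python
from typing import List, Tuple


def merge_wordpieces(tokens: List[str], tok_idx: int) -> Tuple[str, int]:
    """Join tokens[tok_idx] with the '##' continuation pieces that follow it.

    Hypothetical helper for illustration; it isolates the loop from the diff.
    Returns the merged word and the index of the first token after it.
    """
    token = tokens[tok_idx]
    next_tok_idx = tok_idx + 1
    while next_tok_idx < len(tokens) and tokens[next_tok_idx].startswith('##'):
        token += tokens[next_tok_idx][2:]  # drop the '##' marker
        next_tok_idx += 1  # the line added by this commit; without it the loop never ends
    return token, next_tok_idx


# Made-up WordPiece output, used only to exercise the loop.
sample_tokens = ['[CLS]', 'play', '##ing', '##ly', 'now', '[SEP]']
word, next_idx = merge_wordpieces(sample_tokens, 1)
```

With the increment in place, `word` is the full surface form of the word and `next_idx` points at the first token that is not a continuation piece.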