LLM2Vec applied to DictaLM-2.0

This is a Hebrew text encoder created by applying the LLM2Vec method to DictaLM-2.0, trained on the HeDC4 dataset.

Usage

import torch
from llm2vec import LLM2Vec

def get_device() -> str:
    """Pick the best available device: Apple MPS, then CUDA, then CPU."""
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


# Load the MNTP base model, then apply the unsupervised-SimCSE PEFT weights on top of it
l2v = LLM2Vec.from_pretrained(
    base_model_name_or_path="omriel1/LLM2Vec-DictaLM2.0-mntp",
    peft_model_name_or_path="omriel1/LLM2Vec-DictaLM2.0-mntp-unsup-simcse",
    device_map=get_device(),
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Example Hebrew inputs ("Hey, what's up?" / "Is everything okay with you?")
texts = [
    "ื”ื™ื™ ืžื” ืงื•ืจื”?",
    "ื”ื›ืœ ื˜ื•ื‘ ืื™ืชืš?",
]

# Returns one embedding vector per input text
results = l2v.encode(texts)
print(results)
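The embeddings returned by `encode` can be compared with cosine similarity for tasks such as semantic search or clustering. A minimal sketch, using random tensors as a stand-in for real model outputs (the actual `results` above has the same shape, `(num_texts, hidden_dim)`; the hidden size of 4096 is an assumption based on the Mistral-7B architecture underlying DictaLM-2.0):

```python
import torch
import torch.nn.functional as F

# Stand-in for l2v.encode(texts): one embedding per text.
# Hidden size 4096 is assumed here for illustration.
embeddings = torch.randn(2, 4096)

# L2-normalize each embedding, then take the dot product:
# this yields the cosine similarity, a value in [-1, 1].
normalized = F.normalize(embeddings, p=2, dim=1)
similarity = normalized[0] @ normalized[1]
print(f"cosine similarity: {similarity.item():.4f}")
```

Higher similarity values indicate semantically closer texts; with real embeddings (unlike the random vectors above), related Hebrew sentences should score noticeably higher than unrelated ones.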