license: cc-by-sa-4.0
datasets:
  - procesaur/Vikipedija
  - procesaur/Vikizvornik
  - procesaur/ZNANJE
  - jerteh/SrpELTeC
  - procesaur/kisobran
language:
  - sr

Word2Vec Sr

Word2Vec embeddings trained on a Serbian-language corpus of 9.5 billion words.

The repository files include two models: a CBOW variant and a SkipGram variant.
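The two variants differ in what the network is asked to predict. As a minimal sketch (not the training code used for these models), the functions below build the training examples each variant would see for a toy token list. The function names, the fixed `window` parameter, and the token list are illustrative assumptions; real Word2Vec training also applies steps such as subsampling and negative sampling, which are omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the center word.

    Returns a list of (center, context) pairs.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: predict the center word from all of its context words at once.

    Returns a list of (context_words, center) examples.
    """
    examples = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(start, end) if j != i]
        if context:
            examples.append((context, center))
    return examples


# Toy token list (hypothetical, not a real sentence from the corpus)
tokens = ["ormar", "prozor", "zavesa"]

for center, context in skipgram_pairs(tokens, window=1):
    print("skip-gram:", center, "->", context)

for context, center in cbow_examples(tokens, window=1):
    print("CBOW:", context, "->", center)
```

Skip-gram yields one example per (center, context) pair, while CBOW yields one example per position with the whole context grouped together.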

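from gensim.models import Word2Vec

# Load a trained model from a local file named "TeslaSG"
model = Word2Vec.load("TeslaSG")

# Word pairs to score against "zavesa" (curtain)
examples = [
    ("dim", "zavesa"),        # smoke
    ("staklo", "zavesa"),     # glass
    ("ormar", "zavesa"),      # wardrobe
    ("prozor", "zavesa"),     # window
    ("draperija", "zavesa")   # drapery
]

# Print the cosine similarity of each pair
for first, second in examples:
    print(model.wv.similarity(first, second))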
0.5193785
0.5763144
0.59982747
0.6022524
0.7117646
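The scores above are cosine similarities between the two word vectors, which is what gensim's `similarity` method computes. A minimal sketch of that calculation in plain Python follows; the three short vectors are made up for illustration and are not taken from the model, and `cosine_similarity` is a hypothetical helper, not part of gensim.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors for illustration (real model vectors have many more dimensions)
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]   # same direction as vec_a
vec_c = [0.0, 0.0, 3.0]   # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1.0: same direction
print(cosine_similarity(vec_a, vec_c))  # 0.0: no shared direction
```

A higher score means the vectors point in more similar directions, which is why "draperija" scores closest to "zavesa" in the list above.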
Author: Mihailo Škorić

Computation: TESLA project


This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA.