Commit 86042cc, committed by RapMinerz
1 Parent(s): 0a03d85

update model

README.md ADDED
@@ -0,0 +1,137 @@
+ ---
+ language:
+ - fr
+ tags:
+ - music
+ - rap
+ - lyrics
+ - word2vec
+ library_name: gensim
+ ---
+ # Word2Bezbar: Word2Vec Models for French Rap Lyrics
+
+ ## Overview
+
+ __Word2Bezbar__ are __Word2Vec__ models trained on __French rap lyrics__ sourced from __Genius__. The lyrics were tokenized with the __NLTK__ `word_tokenize` function in French mode, after a preprocessing step that removes __French oral contractions__. The training dataset is __323MB__, corresponding to __77M tokens__.
+
+ The model captures the __semantic relationships__ between words in the context of __French rap__, providing a useful tool for studies of __French slang__ and __music writing__.
+
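The Overview mentions a preprocessing step for French oral contractions but does not list the rules that were applied. As a rough illustration only, one plausible reading is that contractions are expanded to their full forms before tokenization. The mapping `ORAL_CONTRACTIONS` and the helper `expand_oral_contractions` below are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical mapping of oral contractions to full forms. The actual rules
# used to build Word2Bezbar are not given in the README.
ORAL_CONTRACTIONS = {
    "j'suis": "je suis",
    "j'sais": "je sais",
    "t'es": "tu es",
    "y'a": "il y a",
}

# Match a known contraction only when it is not glued to other word characters.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in ORAL_CONTRACTIONS) + r")(?!\w)",
    flags=re.IGNORECASE,
)


def expand_oral_contractions(text):
    """Replace each known oral contraction with its (lowercase) full form."""
    return _PATTERN.sub(lambda m: ORAL_CONTRACTIONS[m.group(1).lower()], text)


sample = "J'suis au bendo, y'a la moula"
print(expand_oral_contractions(sample))
# The expanded text could then be passed to NLTK, for example
# nltk.tokenize.word_tokenize(text, language="french"),
# which requires the NLTK tokenizer data to be downloaded first.
```

A lookup table keeps the sketch easy to audit; a real pipeline would likely need a much longer list and a policy for preserving capitalization.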
+ ## Model Details
+
+ This is the __medium__ size of the model.
+
+ | Parameter      | Value        |
+ |----------------|--------------|
+ | Dimensionality | 100          |
+ | Window Size    | 5            |
+ | Epochs         | 10           |
+ | Algorithm      | CBOW         |
+
+ ## Versions
+
+ This model was trained with the following software versions:
+
+ | Requirement    | Version      |
+ |----------------|--------------|
+ | Python         | 3.8.5        |
+ | Gensim library | 4.3.2        |
+ | NLTK library   | 3.8.1        |
+
+ ## Installation
+
+ 1. **Install Required Python Libraries**:
+
+ ```bash
+ pip install gensim
+ ```
+
+ 2. **Clone the Repository**:
+
+ ```bash
+ git clone https://github.com/rapminerz/Word2Bezbar-medium.git
+ ```
+
+ 3. **Navigate to the Model Directory**:
+
+ ```bash
+ cd Word2Bezbar-medium
+ ```
+
+ ## Loading the Model
+
+ To load the Word2Bezbar Word2Vec model, use the following Python code:
+
+ ```python
+ import gensim
+
+ # Load the Word2Vec model
+ model = gensim.models.Word2Vec.load("word2vec.model")
+ ```
+
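Gensim stores large arrays in separate `.npy` files next to the main model file, so loading fails if they are missing from the directory. The `config.json` added in this commit lists the expected files. A small pre-flight check could look like the sketch below, where the helper name `missing_model_files` is hypothetical.

```python
import json
from pathlib import Path


def missing_model_files(model_dir):
    """Return the files listed in config.json that are absent from model_dir."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    return [name for name in config["files"] if not (model_dir / name).is_file()]


# Example usage from inside the cloned repository:
#   missing = missing_model_files(".")
#   if missing:
#       raise FileNotFoundError(f"Missing model files: {missing}")
```

The check reads only the `files` key, which is the only part of `config.json` this sketch relies on.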
+ ## Using the Model
+
+ Once the model is loaded, you can use it as shown:
+
+ 1. **To get the words most similar to a given word**
+
+ ```python
+ model.wv.most_similar("bendo")
+ # [('binks', 0.8920747637748718),
+ #  ('bando', 0.8460732698440552),
+ #  ('hood', 0.8299438953399658),
+ #  ('tieks', 0.8264378309249878),
+ #  ('hall', 0.817583441734314),
+ #  ('secteur', 0.8145656585693359),
+ #  ('barrio', 0.809047281742096),
+ #  ('block', 0.793493390083313),
+ #  ('bâtiment', 0.7826434969902039),
+ #  ('bloc', 0.7753982543945312)]
+
+ model.wv.most_similar("kichta")
+ # [('liasse', 0.878665566444397),
+ #  ('sse-lia', 0.8552991151809692),
+ #  ('kishta', 0.8535938262939453),
+ #  ('kich', 0.7646669149398804),
+ #  ('skalape', 0.7576569318771362),
+ #  ('moula', 0.7466527223587036),
+ #  ('valise', 0.7429592609405518),
+ #  ('sacoche', 0.7324921488761902),
+ #  ('mallette', 0.7247079014778137),
+ #  ('re-pai', 0.7060815095901489)]
+ ```
+
+ 2. **To find the word that doesn't match in a list of words**
+
+ ```python
+ model.wv.doesnt_match(["racli","gow","gadji","fimbi","boug"])
+ # 'boug'
+
+ model.wv.doesnt_match(["Zidane","Mbappé","Ronaldo","Messi","Jordan"])
+ # 'Jordan'
+ ```
+
+ 3. **To find the similarity between two words**
+
+ ```python
+ model.wv.similarity("kichta", "moula")
+ # 0.7466528
+
+ model.wv.similarity("bonheur", "moula")
+ # 0.16985293
+ ```
+
+ 4. **To get the vector representation of a word**
+
+ ```python
+ model.wv['ekip']
+ # array([ 1.4757039e-01, ... 1.1260221e+00],
+ #       dtype=float32)
+ ```
+
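The scores returned by `similarity` and `most_similar` are cosine similarities between word vectors. To make that concrete, the sketch below computes cosine similarity by hand on made-up 3-dimensional vectors; the helper `cosine_similarity` and the vectors are illustrative and are not part of the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors, chosen so the arithmetic is easy to follow.
a = [3.0, 4.0, 0.0]
b = [6.0, 8.0, 0.0]  # same direction as a
c = [0.0, 0.0, 2.0]  # orthogonal to a

print(cosine_similarity(a, b))  # same direction: 1.0
print(cosine_similarity(a, c))  # orthogonal: 0.0
```

Applied to two vectors from the loaded model, for example `model.wv["kichta"]` and `model.wv["moula"]`, the same function should agree with `model.wv.similarity` up to floating-point rounding.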
+ ## Purpose and Disclaimer
+
+ This model is designed for academic and research purposes only. It is not intended for commercial use. The creators of this model do not endorse or promote any specific views or opinions that may be represented in the dataset.
+
+ ## Contact
+
+ For any questions or issues, please contact the repository owner, __RapMinerz__, at rapminerz.contact@gmail.com.
config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "model": "Word2Bezbar-medium",
+   "files": [
+     "word2vec.model",
+     "word2vec.model.syn1neg.npy",
+     "word2vec.model.wv.vectors.npy"
+   ]
+ }
Word2Bezbar-medium.model → word2vec.model RENAMED
File without changes
Word2Bezbar-medium.model.syn1neg.npy → word2vec.model.syn1neg.npy RENAMED
File without changes
Word2Bezbar-medium.model.wv.vectors.npy → word2vec.model.wv.vectors.npy RENAMED
File without changes