# Bangla FastText Model (16 million tokens)

I am Uzzal Mondal (LinkedIn). This repository contains a FastText model trained on Bangla Wikipedia data, built specifically for Bangla NLP tasks such as word similarity, word embeddings, and semantic analysis of Bangla text. Check the accompanying Python script for practical examples.
- **Tokens Processed**: 16 million tokens from the training corpus
- **Vocabulary Size**: 120,332 unique words
- **Training Loss**: Average loss during training was 0.552678
- **Embedding Dimension**: 100
- **Training Configuration**: Epochs = 10
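FastText represents each word as the sum of its character n-gram vectors, which is how it can still produce embeddings for out-of-vocabulary Bangla words. The toy function below illustrates the n-gram extraction idea only (with the `<`/`>` boundary markers from the FastText paper); the real library additionally hashes n-grams into buckets:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams of a word, FastText-style.

    The word is wrapped in boundary markers '<' and '>' so that
    prefixes and suffixes get distinct n-grams.
    """
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]

# Example with an English word for readability; the same applies to Bangla.
print(char_ngrams("king", 3, 4))
# → ['<ki', 'kin', 'ing', 'ng>', '<kin', 'king', 'ing>']
```

An unseen word shares many of these n-grams with known words, so its summed n-gram vector lands near related vocabulary items.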
## How to Use the Model

### Load the Model in Your Code

Use the following code to download and load the model:
```python
from huggingface_hub import hf_hub_download
import fasttext

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="uzzalmondal/fasttext_wiki_bn_100d_16m",
    filename="fasttext_bn_wiki_100.bin",
)

# Load the model
model = fasttext.load_model(model_path)

# Find the 10 nearest neighbors of 'রাজা' (king).
# get_nearest_neighbors returns (similarity, word) pairs; for a true
# analogy query such as king - man + woman, use model.get_analogies.
neighbors = model.get_nearest_neighbors('রাজা', k=10)

print("Nearest neighbors of 'রাজা' (king):")
for similarity, word in neighbors:
    print(f"{similarity}: {word}")
```
### Output

```
Nearest neighbors of 'রাজা' (king):
0.8683507442474365: রাজার
0.8025670051574707: রাজায়
0.7848780751228333: রাজপুত্র
0.7837258577346802: রাজারাও
0.7768903374671936: সামন্তরাজা
0.7766559720039368: রাজসিংহাসনে
0.7681295275688171: রাজত্বের
0.7603954672813416: রাজপুত্রদের
0.7589437365531921: রাজাদের
0.7575206756591797: রাজপুত্রের
```
## 🔧 How Can You Contribute?
- Suggest improvements or new features
- Report any issues or bugs
- Contribute to the codebase or documentation
- Share your use cases or experiments
💬 Your feedback helps us:
- Make Bangla NLP tools more accessible
- Improve model performance
- Extend the model’s capabilities to more applications