# Bangla FastText Model (16 million tokens)

I am Uzzal Mondal (LinkedIn). This repository contains a FastText model trained on Bangla Wikipedia data, built specifically for Bangla NLP tasks such as word similarity, word embeddings, and semantic analysis of Bangla text. Check the accompanying Python script for practical examples.
- **Tokens Processed**: 16 million tokens from the training corpus
- **Vocabulary Size**: 120,332 unique words
- **Training Loss**: Average loss during training was 0.552678
- **Embedding Dimension**: 100
- **Training Configuration**: Epochs = 10
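FastText represents each word as the sum of its character n-gram vectors, which is how it can still produce embeddings for out-of-vocabulary Bangla words. The toy function below illustrates the n-gram extraction idea only (with the `<`/`>` boundary markers from the FastText paper); the real library additionally hashes n-grams into buckets:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams of a word, FastText-style.

    The word is wrapped in boundary markers '<' and '>' so that
    prefixes and suffixes get distinct n-grams.
    """
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]

# Example with an English word for readability; the same applies to Bangla.
print(char_ngrams("king", 3, 4))
# → ['<ki', 'kin', 'ing', 'ng>', '<kin', 'king', 'ing>']
```

An unseen word shares many of these n-grams with known words, so its summed n-gram vector lands near related vocabulary items.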
## How to Use the Model

### Load the Model in Your Code

Use the following code to download and load the model:
```python
from huggingface_hub import hf_hub_download
import fasttext

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="uzzalmondal/fasttext_wiki_bn_100d_16m",
    filename="fasttext_bn_wiki_100.bin",
)

# Load the model
model = fasttext.load_model(model_path)

# Find the 10 nearest neighbors of 'রাজা' (king).
# get_nearest_neighbors returns (similarity, word) pairs; for a true
# analogy query such as king - man + woman, use model.get_analogies.
neighbors = model.get_nearest_neighbors('রাজা', k=10)

print("Nearest neighbors of 'রাজা' (king):")
for similarity, word in neighbors:
    print(f"{similarity}: {word}")
```
### Output

```
Nearest neighbors of 'রাজা' (king):
0.8683507442474365: রাজার
0.8025670051574707: রাজায়
0.7848780751228333: রাজপুত্র
0.7837258577346802: রাজারাও
0.7768903374671936: সামন্তরাজা
0.7766559720039368: রাজসিংহাসনে
0.7681295275688171: রাজত্বের
0.7603954672813416: রাজপুত্রদের
0.7589437365531921: রাজাদের
0.7575206756591797: রাজপুত্রের
```
## 🔧 How Can You Contribute?
- Suggest improvements or new features
- Report any issues or bugs
- Contribute to the codebase or documentation
- Share your use cases or experiments
💬 Your feedback helps us:
- Make Bangla NLP tools more accessible
- Improve model performance
- Extend the model’s capabilities to more applications