---
language: tr
tags:
- text-classification
- customer-support
- Turkish
datasets:
- Turkish_Conversations
license: mit
model_name: bert-topic-classification-turkish
base_model: dbmdz/bert-base-turkish-cased
library_name: transformers
pipeline_tag: text-classification
---

# bert-topic-classification-turkish

## Model Description

This is a fine-tuned BERT model for topic classification of Turkish text. It was trained on **Turkish_Conversations**, a custom dataset of Turkish customer support conversations, and classifies text into the following 5 categories:

1. **Financial Services** (Finansal Hizmetler)
2. **Account Operations** (Hesap İşlemleri)
3. **Technical Support** (Teknik Destek)
4. **Products and Sales** (Ürün ve Satış)
5. **Returns and Exchanges** (İade ve Değişim)

The model achieves an accuracy of **93.51%** on the validation set.

---

## Usage

Below is an example of how to use the model for topic classification:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("GosamaIKU/bert-topic-classification-turkish")
model = AutoModelForSequenceClassification.from_pretrained("GosamaIKU/bert-topic-classification-turkish")

# Example conversation
dataset = [
    {"conversation_id": 1, "speaker": "customer", "text": "Siparişim eksik geldi."},  # "My order arrived incomplete."
    {"conversation_id": 1, "speaker": "representative", "text": "Hemen kontrol edip size bilgi vereceğim."},  # "I'll check right away and get back to you."
    {"conversation_id": 1, "speaker": "customer", "text": "Anlayışınız için teşekkür ederim."}  # "Thank you for your understanding."
]

# Combine the turns into a single input for topic analysis
combined_text = " ".join(item["text"] for item in dataset)
inputs = tokenizer(combined_text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# The highest-scoring logit is the predicted topic class
predicted_class = outputs.logits.argmax(dim=-1).item()
print(f"Predicted Topic Class ID: {predicted_class}")
```
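If the label names are stored in the checkpoint's `config.json`, the numeric ID can be mapped to a topic name via `model.config.id2label[predicted_class]`; otherwise only generic `LABEL_n` ids are available. For quick experiments, the high-level `pipeline` API should also work with this checkpoint and returns labels with scores directly. A minimal sketch (the example sentence is illustrative):

```python
from transformers import pipeline

# Build a text-classification pipeline from the fine-tuned checkpoint
classifier = pipeline("text-classification", model="GosamaIKU/bert-topic-classification-turkish")

# "Why was my credit card application declined?"
result = classifier("Kredi kartı başvurum neden reddedildi?")
print(result)  # e.g. [{"label": "...", "score": 0.97}]
```

The pipeline handles tokenization and converts logits to label/score pairs internally, so it is usually the simplest entry point.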
---

## Training Details

- **Base Model:** [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased)
- **Dataset:** **Turkish_Conversations** (custom dataset of Turkish customer support conversations)
- **Epochs:** 5
- **Batch Size:** 8
- **Learning Rate:** 5e-5
- **Validation Accuracy:** 93.51%
- **Framework:** PyTorch

---

## Limitations

- The model may not perform well on text that differs significantly from the training data (e.g., informal or slang-heavy language).
- It is designed for topic classification and may not generalize to other NLP tasks such as sentiment analysis or intent detection.
- Performance may degrade on very short or ambiguous texts.

---

## Model Files

This repository contains the following files:

- `config.json`: Model configuration.
- `model.safetensors`: Model weights.
- `special_tokens_map.json`: Special tokens used by the tokenizer.
- `tokenizer_config.json`: Tokenizer configuration.
- `vocab.txt`: Tokenizer vocabulary.

---

## Links and Resources

- **Base Model:** [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased)
- **Zero-Shot Model (Optional):** [xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli)
- **Fine-Tuned Model:** [GosamaIKU/bert-topic-classification-turkish](https://huggingface.co/GosamaIKU/bert-topic-classification-turkish)

---

## Dataset

The model was fine-tuned on **Turkish_Conversations**, a custom dataset of 2,695 Turkish customer support conversations labeled with the following categories:

- Financial Services
- Account Operations
- Technical Support
- Products and Sales
- Returns and Exchanges

The dataset is not currently distributed with this repository.

---

## License

This model is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
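---

## Fine-Tuning Sketch

The training script itself is not published in this repository. For readers who want to approximate the setup, here is a minimal, hypothetical sketch using the `Trainer` API with the hyperparameters listed under Training Details. The file name `turkish_conversations.csv`, its `text` and `label` columns (with integer labels 0-4), and the 90/10 split are assumptions for illustration only.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical: a local CSV with "text" and integer "label" (0-4) columns
raw = load_dataset("csv", data_files="turkish_conversations.csv")["train"]
splits = raw.train_test_split(test_size=0.1, seed=42)  # assumed 90/10 split

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-turkish-cased",
    num_labels=5,  # the five topic categories
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = splits.map(tokenize, batched=True)

# Hyperparameters from the Training Details section above
args = TrainingArguments(
    output_dir="bert-topic-classification-turkish",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; accuracy would need a compute_metrics hook
```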