---
license: mit
base_model: dbmdz/bert-base-turkish-cased
pipeline_tag: token-classification
library_name: transformers
tags:
- ner
- token-classification
- pytorch
- turkish
- tr
- dbmdz
- bert
- bert-base-cased
- bert-base-turkish-cased
widget:
- text: "Bağlarbaşı Mahallesi, Zübeyde Hanım Caddesi No: 10 / 3 34710 Üsküdar/İstanbul"
---

# address-extraction

![Next Geography](https://nextgeography.com/wp-content/uploads/2022/02/next-geo-logo-1.png)

This is a simple library for extracting address components from Turkish text, such as neighborhood (Mahalle), avenue (Cadde), street (Sokak), building number (Bina Numarası), postal code (Posta Kodu), district (İlçe), and province (İl). `train.py` contains the training code; it is included for reference only and is not meant to be run. The model was trained on our own dataset of addresses, which is not included in this repo. `predict.py` is a simple script that runs the model on a single address.

The model is based on [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased), available on [Hugging Face](https://huggingface.co/).

## Example Results

```
(g:\projects\address-extraction\venv) G:\projects\address-extraction>python predict.py
Osmangazi Mahallesi, Hoca Ahmet Yesevi Cd. No:34, 16050 Osmangazi/Bursa
Osmangazi Mahalle 98.80%
Hoca Ahmet Yesevi Cadde 98.55%
34 Bina Numarası 99.50%
16050 Posta Kodu 98.49%
Osmangazi İlçe 98.71%
Bursa İl 99.21%
Average Score: 0.9874102413654328
Labels Found: 6
----------------------------------------------------------------------
Karşıyaka Mahallesi, Mavişehir Caddesi No: 91, Daire 4, 35540 Karşıyaka/İzmir
Karşıyaka Mahalle 98.93%
Mavişehir Cadde 96.90%
91 Bina Numarası 99.25%
4 Bina Numarası 30.75%
35540 Posta Kodu 98.97%
Karşıyaka İlçe 98.84%
İzmir İl 98.86%
Average Score: 0.9173339426517486
Labels Found: 7
----------------------------------------------------------------------
Selçuklu Mahallesi, Atatürk Bulvarı No: 55, 42050 Selçuklu/Konya
Selçuklu Mahalle 98.53%
Atatürk Cadde 47.01%
55 Bina Numarası 99.49%
42050 Posta Kodu 98.78%
Selçuklu İlçe 98.74%
Konya İl 99.16%
Average Score: 0.9240859523415565
Labels Found: 6
----------------------------------------------------------------------
Alsancak Mahallesi, 1475. Sk. No:3, 35220 Konak/İzmir
Alsancak Mahalle 99.35%
1475 Sokak 97.71%
3 Bina Numarası 99.18%
35220 Posta Kodu 99.00%
Konak İlçe 98.90%
İzmir İl 98.95%
Average Score: 0.9881603717803955
Labels Found: 6
----------------------------------------------------------------------
Kocatepe Mahallesi, Yaşam Caddesi 3. Sokak No:4, 06420 Bayrampaşa/İstanbul
Kocatepe Mahalle 99.44%
Yaşam Cadde 92.45%
3 Sokak 70.61%
4 Bina Numarası 99.18%
06420 Posta Kodu 99.00%
Bayrampaşa İlçe 98.86%
İstanbul İl 98.90%
Average Score: 0.9558616995811462
Labels Found: 7
----------------------------------------------------------------------
```

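The sample output above includes a few low-confidence predictions (for example `4 Bina Numarası 30.75%` and `Atatürk Cadde 47.01%`). The following is a minimal sketch of how a caller might drop such predictions before using the results. It assumes the predictions are available as `(text, label, score)` tuples; that layout, the helper name `filter_by_confidence`, and the `0.5` threshold are illustrative assumptions and are not taken from `predict.py`.

```python
from typing import List, Tuple

# Hypothetical layout: (extracted text, label, confidence score in [0, 1]).
Prediction = Tuple[str, str, float]


def filter_by_confidence(
    predictions: List[Prediction], threshold: float = 0.5
) -> List[Prediction]:
    """Keep only predictions whose score is at least `threshold`."""
    return [p for p in predictions if p[2] >= threshold]


# Scores copied from the Karşıyaka example in the sample output.
predictions: List[Prediction] = [
    ("Karşıyaka", "Mahalle", 0.9893),
    ("Mavişehir", "Cadde", 0.9690),
    ("91", "Bina Numarası", 0.9925),
    ("4", "Bina Numarası", 0.3075),
    ("35540", "Posta Kodu", 0.9897),
    ("Karşıyaka", "İlçe", 0.9884),
    ("İzmir", "İl", 0.9886),
]

kept = filter_by_confidence(predictions)
for text, label, score in kept:
    print(f"{text} {label} {score:.2%}")
```

A suitable threshold depends on the application; a stricter value trades missed components for fewer wrong ones.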
## Installation & Usage

The `environment.yml` file defines the conda environment used to run the model. The environment is configured for CUDA-enabled GPUs but should also work without a GPU. To run the model, use the following commands:

```bash
conda env create -f environment.yml -p ./condaenv
conda activate ./condaenv

python predict.py
```

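For calling the model from your own code instead of through `predict.py`, the sketch below uses the generic `transformers` token-classification pipeline, which matches the `pipeline_tag` and `library_name` declared in the model card metadata. Whether `predict.py` works this way is not stated here, so treat this as an assumption. The function names and the `model_path` argument are hypothetical; pass the local directory or Hub ID where the model weights actually live.

```python
from typing import Any, Dict, List


def extract_address(text: str, model_path: str) -> List[Dict[str, Any]]:
    """Run a token-classification pipeline on one address string.

    `model_path` is a placeholder: a local directory or a Hub model ID.
    """
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_path,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )
    return ner(text)


def format_entities(entities: List[Dict[str, Any]]) -> List[str]:
    """Render aggregated pipeline output as 'text label score%' lines."""
    return [
        f"{e['word']} {e['entity_group']} {float(e['score']) * 100:.2f}%"
        for e in entities
    ]
```

A call such as `format_entities(extract_address("Alsancak Mahallesi, 1475. Sk. No:3, 35220 Konak/İzmir", model_path))` would then produce lines in roughly the same shape as the sample output, assuming the model's labels match those shown there.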
## License

This project is licensed under the terms of the MIT license.