metadata
tags:
  - bertopic
library_name: bertopic
license: gpl-3.0

topic_immigration_it

This is a BERTopic model built as part of the European project VALAWAI. It predicts the topic distribution of immigration-related content in Italian.
The model includes a pointer to the embedding model that SentenceTransformers loads, namely "sentence-transformers/paraphrase-multilingual-mpnet-base-v2".
The model was fine-tuned on a comprehensive set of tweets about immigration published by both political entities and news sources during the five-year period 2018-2022.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

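from bertopic import BERTopic

# Download and load the model from the Hugging Face Hub
topic_model = BERTopic.load("brema76/topic_immigration_it")

# Inspect all topics with their sizes and keywords
topic_model.get_topic_info()

# Predict the topic and the per-topic probabilities for a new text
example_text = "Questo è un esempio di testo sul topic immigrazione, subtopic sbarchi e accoglienza."
topics, probs = topic_model.transform(example_text)

# Normalize the probabilities so that they sum to 1
probs = probs / probs.sum()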
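
The normalization in the last step turns the raw scores into a distribution that sums to 1. As a minimal sketch of how one might then read off the most likely topics, the helper below ranks the scores of a single document. The name `top_k_topics` is hypothetical and not part of BERTopic. The sketch assumes that position i in the score vector corresponds to topic ID i, which is worth checking against `topic_model.get_topic_info()`. The toy values are illustrative, not model output.

```python
from typing import List, Sequence, Tuple


def top_k_topics(probs: Sequence[float], k: int = 3) -> List[Tuple[int, float]]:
    """Return the k most likely (topic_id, normalized_probability) pairs."""
    total = float(sum(probs))
    if total <= 0:
        return []  # no topic received any probability mass
    # Sort topic indices by score, highest first.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(topic_id, float(p) / total) for topic_id, p in ranked[:k]]


# Hypothetical scores for a single document (not real model output).
toy_probs = [0.05, 0.10, 0.25, 0.00]
print(top_k_topics(toy_probs, k=2))
```

With the loaded model, `top_k_topics(probs[0], k=3)` would apply the same ranking to the row of the first document.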

Topic overview

  • Number of topics: 36
  • Number of training documents: 159408
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| 0 | salvini - immigrato - ong - straniero - mare | 77801 | Crime |
| 1 | italia - sicilia - italiano - meloni - aquarius | 27045 | Reception |
| 2 | libia - libico - seawatch - tunisia - tunisino | 18643 | Sea rescue |
| 3 | ucraino - trump - profugo - polonia - russia | 10779 | Border crisis |
| 4 | sardegna - video - lampedusa - notte - algerino | 3478 | Landings |
| 5 | papa - chiesa - francesco - vescovo - papafrancesco | 3395 | The Church's view |
| 6 | coronavirus - positivo - virus - ospedale - contagio | 2308 | Coronavirus and its spread |
| 7 | hotspot - protestare - collasso - centro - lampedusa | 2298 | First reception centers |
| 8 | camion - arrestato - scafista - arresto - furgone | 1624 | Illegal immigration |
| 9 | incendio - baobab - tendopoli - moria - fiamma | 1046 | Shanty towns |
| 10 | reddito - pensione - cittadinanza - euro - investimento | 1028 | Economy |
| 11 | afghanistan - afgano - talebani - kabul - profugo | 1011 | Humanitarian corridors for refugees |
| 12 | turismo - turista - vacanza - estate - presenza | 798 | Tourism and vacations |
| 13 | incinta - bambino - donna - neonato - bimbo | 776 | Pregnancy, parenthood, and children |
| 14 | scuola - università - studente - tutore - lingua | 651 | Education and School-related themes |
| 15 | vaccino - vaccinato - vaccinale - vaccinare - covid | 650 | Vaccinations and EU digital Covid certificate |
| 16 | musumeci - ordinanza - islam - islamico - musulmano | 621 | Islam |
| 17 | alarmphone - egeo - naufragio - bambino - pericolo | 552 | Shipwrecks |
| 18 | stupro - sessuale - stuprato - violenza - stuprare | 519 | Sexual violence |
| 19 | multa - ong - decreto - amp - erostraniero | 517 | NGOs regulation |
| 20 | razzismo - razzista - odio - razziale - insulto | 422 | Racism and Hatred |
| 21 | agricoltura - schiavo - schiavitù - agricolo - schiavismo | 409 | Illegal work and exploitation |
| 22 | tubercoloso - malattia - tbc - salute - pandemia | 361 | Disease transmission |
| 23 | africa - africano - mission - continente - sviluppo | 353 | Cooperation and Development in Africa |
| 24 | asilo - giudice - richiedente - tribunale - ricorso | 350 | Right to asylum |
| 25 | dublino - regolamento - riforma - trattato - superare | 310 | EU regulation |
| 26 | fake - propaganda - fakenews - news - sapevatelo | 285 | Spread of fake news |
| 27 | droga - cocaina - hashish - eroina - marijuana | 266 | Drugs and Drug Dealing |
| 28 | film - miglior - oscar - sorrentino - dogman | 263 | Movies |
| 29 | gay - lgbt - omosessuale - cassazione - lgbti | 149 | LGBTQ+ and sexual minorities' rights |
| 30 | qatar - qatargate - panzeri - fifa - mondiale | 129 | Qatargate |
| 31 | perugia - università - suarez - rettore - universitàstranieri | 127 | Luis Suarez's Italian exam scam |
| 32 | brexit - regno - unito - inglese - qualificato | 120 | Brexit |
| 33 | basket - calcio - campionato - squadra - giocatore | 117 | Sports |
| 34 | profugo - volontario - gualzetti - sottoscrizione - accoglienza | 107 | Donations |
| 35 | matrimonio - combinato - finto - nozze - fittizio | 100 | Marriages and Fake Unions |
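
The labels in the overview can be attached to predicted topic IDs with a plain lookup. The sketch below copies only the five largest topics from the overview and reports each topic's share of the 159408 training documents. The names `TOPIC_TABLE` and `describe_topic` are hypothetical, and whether the labels are also stored inside the saved model is not stated here, so the lookup is kept separate from it.

```python
# Number of training documents, as stated in the topic overview.
N_TRAINING_DOCS = 159408

# Excerpt of the overview: topic ID -> (frequency, label) for the five largest topics.
TOPIC_TABLE = {
    0: (77801, "Crime"),
    1: (27045, "Reception"),
    2: (18643, "Sea rescue"),
    3: (10779, "Border crisis"),
    4: (3478, "Landings"),
}


def describe_topic(topic_id: int) -> str:
    """Return the label of a topic and its share of the training documents."""
    if topic_id not in TOPIC_TABLE:
        return f"Topic {topic_id}: not in this excerpt"
    frequency, label = TOPIC_TABLE[topic_id]
    share = frequency / N_TRAINING_DOCS
    return f"Topic {topic_id}: {label} ({share:.1%} of training documents)"


print(describe_topic(0))
```

Extending `TOPIC_TABLE` with the remaining rows would cover all 36 topics.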

Training hyperparameters

  • calculate_probabilities: True
  • language: None
  • low_memory: False
  • min_topic_size: 200
  • n_gram_range: (1, 1)
  • nr_topics: auto
  • seed_topic_list: None
  • top_n_words: 15
  • verbose: True
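
As a rough illustration of how these settings map onto the BERTopic constructor, the sketch below collects them as keyword arguments. It is a reconstruction under assumptions, not the original training script. The UMAP, HDBSCAN, and vectorizer settings are not listed here, so BERTopic's defaults stand in for them. The names `BERTOPIC_KWARGS`, `EMBEDDING_MODEL`, and `build_untrained_model` are hypothetical.

```python
# Values copied from the "Training hyperparameters" list.
BERTOPIC_KWARGS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 200,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 15,
    "verbose": True,
}

# Embedding model named in the model description.
EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def build_untrained_model():
    """Create a fresh, unfitted BERTopic instance with the listed settings.

    Assumption: UMAP, HDBSCAN, and vectorizer settings are not listed,
    so BERTopic's defaults are used for them.
    """
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=EMBEDDING_MODEL, **BERTOPIC_KWARGS)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of documents would train a new model. It would not reproduce this one exactly, since the training tweets and the unlisted settings are missing.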

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 2.0.3
  • Scikit-Learn: 1.3.0
  • Sentence-transformers: 2.2.2
  • Transformers: 4.33.1
  • Numba: 0.56.4
  • Plotly: 5.16.1
  • Python: 3.10.11
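
To compare a local environment against these versions, a small check along the following lines may help. It is a sketch that relies only on the standard library. The mapping from the names above to PyPI distribution names (for example, UMAP to `umap-learn`) is an assumption, and `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Assumed PyPI distribution names mapped to the listed versions.
PINNED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "2.0.3",
    "scikit-learn": "1.3.0",
    "sentence-transformers": "2.2.2",
    "transformers": "4.33.1",
    "numba": "0.56.4",
    "plotly": "5.16.1",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to the version found."""
    return {name: lookup(name) for name, want in pinned.items() if lookup(name) != want}


# Report every package that is missing or installed at a different version.
for name, found in find_mismatches(PINNED).items():
    print(f"{name}: expected {PINNED[name]}, found {found}")
```

A mismatch does not necessarily prevent the model from loading. The check only shows where the environment differs from the one listed.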