Commit a4d0823
Parent(s): f96bff8

Add BERTopic model

- README.md +223 -0
- config.json +14 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,223 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# hub_issues_topocs

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("davanstrien/hub_issues_topocs")

# Summarize every topic in a table (ID, document count, name)
topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 156
* Number of training documents: 6427

<details>
  <summary>Click here for an overview of all topics.</summary>

  | Topic ID | Topic Keywords | Topic Frequency | Label |
  |----------|----------------|-----------------|-------|
  | -1 | model - version - training - add - base | 10 | Outlier Topic |
  | 0 | yes - upscaling - embeddings - dir - 18 | 1785 | Yes Upscaling VAE Embeddings |
  | 1 | images - image - img2img - generated - black | 218 | Image Distortion Investigation |
  | 2 | languages - language - chinese - support - multilingual | 169 | Multilingual Language Support |
  | 3 | request - thesis - checker - request request - work | 103 | DOI request and thesis checker |
  | 4 | bloom - 176b - bloomz - bert - 7b1 | 95 | Bloom inference on BERT |
  | 5 | api - inference api - hosted - inference - hosted inference | 80 | Configuring Inference API |
  | 6 | report report - report - reports - look - awesome | 78 | Awesome Reports |
  | 7 | use model - run model - model run - model use - tune model | 73 | Use model instructions |
  | 8 | request access - access request - access - request - request requesting | 65 | Access Request Solution |
  | 9 | colab - google - google colab - model google - collab | 64 | "Running Galactica on Colab" |
  | 10 | json - config json - config - json file - file named | 62 | JSON configuration files |
  | 11 | load model - load - model working - unable load - unable | 60 | "Model loading issues" |
  | 12 | text - text generation - words - truncated - generation | 57 | Text Generation Techniques |
  | 13 | label - labels - tags - classifier - entity | 57 | Document Labels |
  | 14 | data - model dataset - dataset - train model - used train | 55 | Model Training Data |
  | 15 | issue report - issue - report - 论文 - artists | 55 | Ethical Issues in Artists' Legal Discussion |
  | 16 | loading - loading model - error loading - model error - load model | 55 | Model Loading Errors |
  | 17 | error error - error - 500 error - connection - unknown error | 49 | Error 500 Connection |
  | 18 | train model - train - trained - model did - model trained | 46 | Training models in Arabic |
  | 19 | stable diffusion - diffusion - stable - diffusion v1 - diffusion webui | 46 | Stable Diffusion Downloads |
  | 20 | question - answers - questions - tts - double | 45 | Question about Fig.2c |
  | 21 | length - max - maximum - limit - sequence length | 45 | Length Limits and Token Length |
  | 22 | model model - model architecture - generator - architecture - type | 42 | Model Architecture |
  | 23 | commercial - license - commercial use - license license - mit | 41 | Commercial Use License |
  | 24 | transformers - transformer - sentence transformers - sentence - using transformers | 40 | Issues with sentence transformers |
  | 25 | huggingface - hugging face - hugging - face - using hugging | 40 | Hugging Face model usage |
  | 26 | legal - legal issue - issue report - issue - report | 40 | Legal Issues Reports |
  | 27 | v2 - v3 - anime - wav2vec2 - virus | 40 | Anime Virus Detection Vae |
  | 28 | tutorials - thread - tricks - 26 - tips | 39 | Stable Diffusion 26+ Tutorials |
  | 29 | difference - fp16 - dpm - opus - opus mt | 39 | Difference between phase1 and phase2 |
  | 30 | tokenizer - using from_pretrained - loading - error loading - load | 37 | Tokenizer Loading Error |
  | 31 | output - extraction - truncated - summaries - outputs | 37 | Output Extraction |
  | 32 | attribute - object - attributeerror - typeerror - string | 36 | AttributeError in object attributes |
  | 33 | ckpt file - ckpt - file ckpt - file - ckpt files | 36 | CKPT file location |
  | 34 | dataset dataset - dataset - source dataset - datasets - source | 36 | dataset source semantic search |
  | 35 | size - mismatch - discrepancy - vocab size - dimensionality | 36 | Size Mismatch Discrepancy |
  | 36 | license - license license - permission - agreement - licence | 36 | License Agreement |
  | 37 | model card - card - card model - building model - building | 35 | Model Card Typos |
  | 38 | demo - space - spaces - gradio - cause | 35 | Troubleshooting Gradio Demo |
  | 39 | commercially - does model - commercial - model used - usable | 34 | Commercial Usability of AI Model |
  | 40 | automatic1111 - webui - automatic - ui - web ui | 33 | Automatic1111 WebUI |
  | 41 | import - transformers - module - failed - export | 33 | ImportError in Transformers Module |
  | 42 | example - examples - example use - prompt example - usage example | 33 | Example Usage |
  | 43 | audio - noise - spectrogram - second - speaker | 33 | Audio Transcription and Conversion |
  | 44 | cool - love - idea - amazing - great | 32 | "cool and amazing" |
  | 45 | language model - language - kenlm - lm - multilingual | 32 | Language Model Inference with KenLM |
  | 46 | really - nice - cool - love - amazing | 32 | amazing model |
  | 47 | sagemaker - endpoint - deployment - deploy - amazon | 32 | Deploying SageMaker Endpoints |
  | 48 | training training - training - training steps - general - video | 31 | "Training Steps Video" |
  | 49 | tokenizer - problems - masked - tokenizer tokenizer - tokens | 31 | Tokenizer Problems |
  | 50 | sd - sd2 - sd sd - does support - wd | 30 | Using SD with Different Versions |
  | 51 | test - testing - sampler - discussion - split | 30 | Testing Sampler Discussion |
  | 52 | argument - unexpected - keyword - typeerror - got | 30 | Unexpected keyword argument TypeError |
  | 53 | float - runtimeerror expected - runtimeerror - expected - type | 30 | RuntimeErrors with Float and Half Types |
  | 54 | dataset used - dataset - dataset dataset - used fine - used | 28 | Dataset Usage |
  | 55 | json - json file - model architecture - inconsistency - architecture | 28 | JSON file inconsistency |
  | 56 | usage - project - app - macos - usage questions | 28 | Usage with Sherpa |
  | 57 | reproduce - results - result - civitai - reproducing results | 28 | Reproduce Result Difficulty |
  | 58 | gene - cell - question generation - generation - geneformer | 27 | Gene Embedding Generation |
  | 59 | gpu - gpus - multiple - gpu run - model multiple | 27 | Multi-GPU Model Execution |
  | 60 | tokenizer use - wlop - mean - token - webui version | 26 | Tokenizer for Cantonese |
  | 61 | model fine - tuning model - fine tuning - fine - tuning | 26 | Fine-Tuning the Model |
  | 62 | model training - training model - training - redshift - model model | 26 | Model Training |
  | 63 | bot - discord - tesla - chat - character | 26 | Tesla Discord Bot 2021 |
  | 64 | work - doesn work - doesn - dont - does appear | 26 | Non-functional potty lora |
  | 65 | use use - use - best - way use - methods | 26 | Best ways to use |
  | 66 | report card - metadata - card - report - | 26 | Metadata Report Card |
  | 67 | guide - instructions - guidance - prompt - cost | 25 | Fine-tuning guide instructions |
  | 68 | code - finetuning code - finetuning - fine tuning - tuning | 25 | Fine-tuning Code Sample |
  | 69 | dataset - custom dataset - dataset fine - custom - fine tuning | 25 | Custom dataset fine-tuning |
  | 70 | safetensors - safetensor - version - version safetensors - safetensor version | 25 | SafeTensors Version Inquiry |
  | 71 | model based - task model - model changes - bring - v7 | 25 | Model Description and Changes |
  | 72 | weights - weight - flax - diffusers weights - load weights | 25 | Outdated Flax Weights |
  | 73 | style - modern - mode - new - dark mode | 24 | Style in Modern Technology |
  | 74 | convert - format - trying convert - safetensors - converter | 24 | Safetensors conversion error |
  | 75 | checkpoint - save - checkpoint file - checkpoints - restore | 24 | Checkpoint Safety Restore |
  | 76 | t5 - flan t5 - flan - google flan - xxl | 23 | T5 vs Flan-T5 Differences |
  | 77 | download model - model load - download - load - model download | 23 | "Model Download" |
  | 78 | access access - access - access need - need access - need | 23 | Access Request Assistance |
  | 79 | model details - details model - details - information model - model access | 23 | Model Details |
  | 80 | job - excellent - nice - great - congrats | 23 | Job Well Done |
  | 81 | onnx - conversion - onnx conversion - convert - torchscript | 22 | ONNX Conversion Implementation |
  | 82 | git - repository - repo - cloning - slow | 22 | Git repository cloning issues |
  | 83 | online - 50 - 200 - buy - annotator | 22 | Buy Medications Online |
  | 84 | access - request access - acces request - access request - request | 22 | Access Request |
  | 85 | cuda - cuda memory - memory - cuda error - memory cuda | 22 | CUDA memory out of error |
  | 86 | api model - api - inference api - model api - trying use | 22 | API Model Errors |
  | 87 | training data - data training - data - training dataset - training | 22 | Data Training Examples |
  | 88 | pipeline - valid - pipe - sentence similarity - similarity | 21 | Pipeline error analysis |
  | 89 | tensor - tensors - device - expected - size | 21 | Tensor size mismatch errors |
  | 90 | in_silico_perturber - eos_token_id - switch - 64 - encoder | 21 | Error in decoder generation |
  | 91 | pytorch_model - pytorch_model bin - bin - diffusion_pytorch_model bin - diffusion_pytorch_model | 21 | Missing pytorch_model.bin file |
  | 92 | 404 - url - https - https huggingface - resolve | 21 | 404 error Huggingface documents |
  | 93 | requirements - acess - feature request - request request - feature | 21 | System Requirements Access |
  | 94 | info - technical - details - information - detailed | 21 | Technical Details Inquiry |
  | 95 | hello - hi - good - translates - 100 | 20 | Greetings and Translations |
  | 96 | accuracy - drop - compatibility - precision - half precision | 20 | Accuracy Drop in Precision |
  | 97 | access request - request access - access - request - new | 20 | Access Request |
  | 98 | file missing - log - filenotfounderror - location - sorry | 20 | File Not Found |
  | 99 | model card - card - link model - link - example model | 20 | Broken link in model |
  | 100 | python - kernel - 10 - pytorch - talks | 20 | Python usage and errors |
  | 101 | bug - fix - racist - possible bug - thing | 19 | Bug Fix with Racist Bug |
  | 102 | training code - code training - code - share - share training | 19 | "Training Code Sharing" |
  | 103 | license - accept - license license - model accept - indication | 19 | Model License |
  | 104 | gpt - protgpt2 - 6b - jt - gpt jt | 19 | GPT-JT-6B-v1 Abilities |
  | 105 | report report - report - - - | 19 | Multiple Reports on Topic |
  | 106 | tuning fine - tune fine - fine - fine tuning - tuning | 18 | Fine-tuning for domain adaptation |
  | 107 | inpaint model - inpaint - ix - size model - model pruned | 18 | Inpaint Model |
  | 108 | config file - config - tokenizer config - files config - file | 18 | Config File Troubleshooting |
  | 109 | sample code - example - sample - copied - error example | 18 | Issues with sample code |
  | 110 | nsfw - nsfw content - content - disable - safety | 18 | NSFW Content Filtering |
  | 111 | length - summary - longformer - summary length - text length | 18 | Length of Summaries |
  | 112 | access download - access - download - access access - download working | 18 | Access Download |
  | 113 | thank - thanks - just want - pretty - request thank | 18 | Thank you efforts |
  | 114 | sd v1 - v1 - ema ckpt - sd - ema | 18 | Access to sd-v1-4-full-ema.ckpt |
  | 115 | padding_side - tokens - token - cls token - token id | 18 | Padding and token discrepancy |
  | 116 | amd - vram - gb - gpu - 448 | 17 | "AMD GPU compatibility" |
  | 117 | dataset - pretraining - dataset dataset - datasets - request dataset | 17 | Dataset Pretraining |
  | 118 | version - ggml version - version ggml - ggml - pytorch version | 17 | "Version Possibility" |
  | 119 | memory - leak - a100 - cuda memory - memory google | 17 | Memory-related Issues |
  | 120 | trigger - words - word - trigger word - semantic | 17 | Trigger words and semantic search |
  | 121 | result - results - output - score - ways | 16 | Visualizing Inference Results |
  | 122 | sd - tested - sd sd - lora training - ui | 16 | Stable Diffusion LORA Training |
  | 123 | ckpt file - bin - convert - weights - dreambooth | 16 | Convert Diffusion Diffusers to CKPT |
  | 124 | need help - help - help help - need - started | 16 | Need Help Getting Started |
  | 125 | keyerror - key - exception error - key error - codegen | 16 | KeyError Troubleshooting |
  | 126 | controlnet - control - a1111 - installed - model embedding | 16 | ControlNet not working |
  | 127 | implementation - issue - solved - np - experiencing | 16 | Implementation Issue Fix |
  | 128 | runtimeerror - time series - everytime - process runtimeerror - try run | 16 | Time Series Runtime Error |
  | 129 | use use - use - use readme - use diffusers - tk | 15 | How to use Diffusers |
  | 130 | training dataset - dataset used - used dataset - nli - used training | 15 | Training Dataset Used |
  | 131 | yaml files - colab pc - install run - diffusion google - train custom | 15 | Stable Diffusion Tutorials |
  | 132 | spam - deleted - removed - delete - contact | 15 | Removal of Spam Discussion |
  | 133 | details training - details - training - details details - details info | 14 | Training Details |
  | 134 | hyper parameters - hyper - parameters - provide - provide training | 14 | Hyperparameter Optimization |
  | 135 | fine tune - tune - ner - fine - emotions | 14 | Fine-tune Sentence Embeddings |
  | 136 | model using - using model - examples - question lora - models used | 14 | Inkpunk Diffusion model |
  | 137 | error running - running - running example - usage code - code | 14 | Error running example code |
  | 138 | difference - alpaca - model difference - original model - difference model | 14 | Model Differences |
  | 139 | install - locally - know install - run local - mini | 14 | "How to install locally" |
  | 140 | training script - script - script training - sharing training - midi | 13 | Training Script |
  | 141 | model file - missing model - corrupt - file model - file missing | 13 | Model File Issues |
  | 142 | error help - help error - help - solve - try | 13 | Error Help |
  | 143 | hardware - hardware requirements - requirements - gpu inference - requirements fine | 13 | Hardware Requirements for Inference |
  | 144 | update - updated - channel - expired - new update | 13 | update query status |
  | 145 | negative - negative prompt - negative prompts - prompts - prompt | 13 | "Negative Prompt Function" |
  | 146 | unable run - unable - run unable - run - human | 13 | Unable to run on local machine |
  | 147 | injection - nmkd gui - nmkd - tutorial videos - gui | 12 | Stable Diffusion Tutorial Videos |
  | 148 | download download - download - request acces - know download - fim | 12 | "Download Instructions" |
  | 149 | transformers - sentence transformers - huggingface transformers - different results - usage | 12 | Transformer Usage Discrepancy |
  | 150 | link - broken link - broken - documentation - expired | 11 | Broken links and documentation |
  | 151 | broke - padding - dead - kenlm - dropout | 11 | "Dead KenLM Finetuning" |
  | 152 | training question - question training - training process - question regarding - question | 11 | Training Process Question |
  | 153 | dataset training - training data - training dataset - data training - custom dataset | 11 | Training Data Quality |
  | 154 | download - download download - possible download - hd 18 - hd | 11 | Troubleshooting download errors |

</details>
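
The frequencies in the table can be put in context by relating them to the 6427 training documents. The following is a minimal sketch, not part of BERTopic: it parses a few rows copied from the table and computes each topic's share of the corpus. The helper names `parse_rows` and `document_share` are hypothetical and exist only for this illustration.

```python
import re

# A few rows copied verbatim from the topic table (the full table has 156 rows).
TABLE_ROWS = """\
| -1 | model - version - training - add - base | 10 | Outlier Topic |
| 0 | yes - upscaling - embeddings - dir - 18 | 1785 | Yes Upscaling VAE Embeddings |
| 1 | images - image - img2img - generated - black | 218 | Image Distortion Investigation |
| 2 | languages - language - chinese - support - multilingual | 169 | Multilingual Language Support |
"""

# "Number of training documents" from the topic overview.
TOTAL_DOCUMENTS = 6427


def parse_rows(text):
    """Turn markdown table rows into dictionaries (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip anything that is not a 4-column data row with an integer topic ID.
        if len(cells) != 4 or not re.fullmatch(r"-?\d+", cells[0]):
            continue
        topic_id, keywords, frequency, label = cells
        rows.append(
            {
                "topic_id": int(topic_id),
                "keywords": keywords.split(" - "),
                "frequency": int(frequency),
                "label": label,
            }
        )
    return rows


def document_share(row, total_documents):
    """Fraction of the training documents assigned to this topic."""
    return row["frequency"] / total_documents


for row in parse_rows(TABLE_ROWS):
    share = document_share(row, TOTAL_DOCUMENTS)
    print(f"{row['topic_id']:>3}  {share:6.1%}  {row['label']}")
```

Read this way, topic 0 alone accounts for more than a quarter of the training documents, while the outlier topic (-1) holds only 10 of them.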

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
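
To see how a local environment differs from the versions listed above, something like the following sketch may help. It uses only the standard library's `importlib.metadata`. The distribution names in `PINNED` (for example `umap-learn` for UMAP) are assumptions about how these packages are published on PyPI, and `compare_versions` is a hypothetical helper, not part of BERTopic.

```python
from importlib import metadata

# Versions from the list above, keyed by the assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(pinned):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, matches) in compare_versions(PINNED).items():
    status = "ok" if matches else "differs"
    print(f"{name}: expected {expected}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the sketch only reports differences from the environment recorded here.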
config.json
ADDED
@@ -0,0 +1,14 @@
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
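
The JSON above stores the same hyperparameters that the README lists, with one difference in representation: JSON has no tuple type, so `n_gram_range` appears as a two-element list rather than `(1, 1)`. The sketch below reads the configuration with the standard library and restores the tuple. The helper name `load_hyperparameters` is hypothetical, and whether BERTopic performs this conversion in the same way when loading is not shown in this commit.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
"""


def load_hyperparameters(text):
    """Parse the config and convert n_gram_range back to a tuple."""
    config = json.loads(text)
    # JSON arrays become Python lists; the README shows this value as (1, 1).
    config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


hyperparameters = load_hyperparameters(CONFIG_JSON)
for name, value in sorted(hyperparameters.items()):
    print(f"{name}: {value!r}")
```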
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6801b74dc91982ca99e81484f07c1f22c5333819ad2e961edc5b935a0cf4cd03
size 319576
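
These three lines are a Git LFS pointer, not the embeddings themselves: the repository stores the pointer, and the actual file of 319576 bytes is fetched from LFS storage. As a rough illustration, the pointer can be read as space-separated key/value pairs. The function name `parse_lfs_pointer` is hypothetical, and the sketch performs no validation beyond splitting the fields.

```python
# The pointer text as committed for topic_embeddings.safetensors.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6801b74dc91982ca99e81484f07c1f22c5333819ad2e961edc5b935a0cf4cd03
size 319576
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash, and size fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algorithm"], pointer["size"])
```

After downloading the real file, its SHA-256 digest and byte size can be compared against these values to confirm the download is intact.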
topics.json
ADDED
The diff for this file is too large to render.