GuillaumeSalouHF HF staff committed on
Commit
fe538cb
1 Parent(s): adf6e56

Upload Hugging Face models.ipynb

Files changed (1)
  1. Hugging Face models.ipynb +212 -0
Hugging Face models.ipynb ADDED
@@ -0,0 +1,212 @@
+ {
+  "cells": [
+   {
+    "cell_type": "markdown",
+    "id": "8edbb15c",
+    "metadata": {},
+    "source": [
+     "Author: Julien Simon <julsimon@huggingface.co>"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "3f8ff20d",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "import huggingface_hub\n",
+     "import pandas as pd"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "555233b4",
+    "metadata": {},
+    "source": [
+     "### Retrieve metadata on all public models"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "87037e3a",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "models = list(huggingface_hub.list_models(full=True))"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "aadd5d5a",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "models[0]"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "9e0fe1db",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "huggingface_hub.model_info('distilgpt2', securityStatus=True)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "a06997e7",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "models_df = pd.DataFrame(columns=['model_name', 'task_type', 'downloads'])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "91225693",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "for m in models:\n",
+     "    if getattr(m, 'downloads', None) is not None:\n",
+     "        downloads = m.downloads\n",
+     "    else:\n",
+     "        downloads = 0\n",
+     "    m_df = pd.DataFrame({'model_name': [m.modelId], 'task_type': [m.pipeline_tag], 'downloads': [downloads]})\n",
+     "    models_df = pd.concat([models_df, m_df], ignore_index=True)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "eaa0b6e7",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "models_df.head()"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "6a38785c",
+    "metadata": {},
+    "source": [
+     "### List task types"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "f690e417",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "task_types = models_df['task_type'].unique()\n",
+     "print(task_types)\n",
+     "print(len(task_types))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "865346cf",
+    "metadata": {},
+    "source": [
+     "### For each task type, find out the percentage of downloads that the top 'n' models represent"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "b8edf413",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "n = 20"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "3bcbcc8e",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "for t in task_types:\n",
+     "    if t is None:\n",
+     "        continue\n",
+     "    task_models_df = models_df[models_df['task_type']==t].sort_values('downloads', ascending=False)\n",
+     "    topn_downloads = task_models_df[:n]['downloads'].sum()\n",
+     "    all_downloads = task_models_df['downloads'].sum()\n",
+     "    if all_downloads!=0:\n",
+     "        print('{} ({} models): {:.1%}'.format(t, len(task_models_df), topn_downloads/all_downloads))"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "c44c3ef6",
+    "metadata": {},
+    "source": [
+     "### For each task type, list the repository of the top 'n' models"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "d77e65fc",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "BASE_URL = 'https://huggingface.co'\n",
+     "\n",
+     "for t in task_types:\n",
+     "    if t is None:\n",
+     "        continue\n",
+     "    task_models_df = models_df[models_df['task_type']==t].sort_values('downloads', ascending=False)\n",
+     "    topn_models = task_models_df[:n]['downloads']\n",
+     "    print('[{}]'.format(t))\n",
+     "    if len(task_models_df) < n:\n",
+     "        indexes = range(len(task_models_df))\n",
+     "    else:\n",
+     "        indexes = range(n)\n",
+     "    for i in indexes:\n",
+     "        print('{}/{}'.format(BASE_URL, task_models_df.iloc[i]['model_name']))"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "f893cd03",
+    "metadata": {},
+    "outputs": [],
+    "source": []
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "Python 3 (ipykernel)",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.9.7"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+ }
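
The notebook's central computation, the share of downloads held by the top 'n' models of each task type, can be tried without querying the Hub. The following is a minimal sketch on a small in-memory table, not the author's code: `build_models_df` and `topn_share` are hypothetical helper names, and the sample rows are made up rather than real Hub statistics. It assumes the same three columns as the notebook (`model_name`, `task_type`, `downloads`), builds the table in one pass instead of row by row, and ranks models by downloads before taking the top 'n'.

```python
import pandas as pd


def build_models_df(records):
    """Build the models table in one pass from (model_name, task_type, downloads) tuples."""
    df = pd.DataFrame(records, columns=['model_name', 'task_type', 'downloads'])
    # Treat a missing download count as zero, as the notebook does.
    df['downloads'] = df['downloads'].fillna(0).astype(int)
    return df


def topn_share(models_df, n):
    """Map each task type to the fraction of its downloads held by its n most-downloaded models."""
    shares = {}
    # groupby skips rows whose task_type is missing, which mirrors the
    # notebook's "if t is None: continue" check.
    for task, group in models_df.groupby('task_type'):
        total = group['downloads'].sum()
        if total == 0:
            continue  # avoid dividing by zero for tasks with no downloads
        top = group['downloads'].nlargest(n).sum()
        shares[task] = top / total
    return shares


# Hypothetical sample data, for illustration only.
sample = [
    ('org/a', 'text-generation', 600),
    ('org/b', 'text-generation', 300),
    ('org/c', 'text-generation', 100),
    ('org/d', 'fill-mask', 50),
    ('org/e', None, 10),
    ('org/f', 'translation', None),
]
df = build_models_df(sample)
shares = topn_share(df, n=2)
for task, share in shares.items():
    print('{}: {:.1%}'.format(task, share))
```

Building the table from a list of records avoids repeated copying of a growing DataFrame, and `nlargest` makes the ranking explicit instead of relying on the order in which models were listed.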