Pretrained models from the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling"
Zayd Muhammad Kawakibi Zuhri
zaydzuhri
AI & ML interests
I really like watching loss go down
Recent Activity
updated a model 3 days ago: zaydzuhri/dsmtp-7B-4096-model
published a model 3 days ago: zaydzuhri/dsmtp-7B-4096-model
Models (85)
zaydzuhri/dsmtp-7B-4096-model • 7B • 273
zaydzuhri/dsmtp-7B-4096-batch8x2-steps200000-20251016-114748
zaydzuhri/mtp-math-1.8B-4096-model • 2B • 46
zaydzuhri/mtp-math-1B-4096-batch16x1-steps40000-20251004-223317
zaydzuhri/top-math-1.8B-4096-model • 2B • 50
zaydzuhri/top-math-1B-4096-batch16x1-steps40000-20251003-211951
zaydzuhri/vanilla-math-1B-4096-batch16x1-steps40000-20251003-204935
zaydzuhri/dsmtp-7B-4096-batch8x2-steps200000-20250909-122923
zaydzuhri/vanilla-math-1.8B-4096-model • 2B • 32
zaydzuhri/vanilla-math-1B-4096-batch16x1-steps40000-20250915-145922
Datasets (9)
zaydzuhri/OpenMathInstruct-2-Text • 22M • 119
zaydzuhri/stack-edu-python • 25.3M • 20
zaydzuhri/stack-edu • 99.5M • 168
zaydzuhri/kreyol-mt-cleaned • 903k • 134
zaydzuhri/the_pile_tokenized_5percent_truncated_packed_v2 • 2.46M • 55
zaydzuhri/the_pile_tokenized_5percent_truncated_packed • 2.11M • 128
zaydzuhri/the_pile_tokenized_5percent_truncated • 6M • 137
zaydzuhri/the_pile_tokenized_5percent • 6M • 63
zaydzuhri/the_pile_tokenized_6k • 6k • 4