Ken Tsui
kenhktsui
AI & ML interests
ML engineer, researcher
VLM, LLM benchmark
Opinions are my own
Recent Activity
upvoted a paper 2 days ago
Less is More: Recursive Reasoning with Tiny Networks
upvoted a paper 8 days ago
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Organizations
textbook-quality-classifier
- kenhktsui/fineweb-edu-fasttext-classifier
  Text Classification • Updated • 2 • 4
- kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2
  Text Classification • Updated • 866 • 28
- kenhktsui/llm-data-textbook-quality-classifier-v1
  Text Classification • 0.3B • Updated • 1 • 9
- kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1
  Text Classification • Updated • 8 • 4
nano-phi
Small Language Model Trained with Textbook Quality Data - How Far Can It Go?
FastText Model for Pretraining Data Curation
- kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2
  Text Classification • Updated • 866 • 28
- kenhktsui/fineweb-edu-fasttext-classifier
  Text Classification • Updated • 2 • 4
- kenhktsui/code-natural-language-fasttext-classifier
  Text Classification • Updated • 369 • 2
- kenhktsui/math-fasttext-classifier
  Text Classification • Updated • 11 • 2
LongTalk
A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training
CoT
VLM Data