Shizhe Diao

shizhediao

https://shizhediao.github.io/

shizhediao's activity

authored a paper about 10 hours ago

Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published 2 days ago • 16

authored a paper about 1 month ago

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Paper • 2410.03290 • Published Oct 4 • 6

updated a dataset 2 months ago

Post-training-Data-Flywheel/function-calling-1.0

Updated Sep 20 • 31

updated a collection 3 months ago

flywheel

Collection

2 items • Updated Aug 29

updated a Space 3 months ago

Running

📊

README

upvoted a paper 3 months ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 55

updated a model 4 months ago

shizhediao/hf-lora

Updated Aug 4

liked a Space 4 months ago

Configuration error

🏃

Berkeley Function Calling Leaderboard

liked a model 4 months ago

nvidia/Minitron-4B-Base

Updated Aug 22 • 18 • 127

upvoted a paper 4 months ago

Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19 • 38

upvoted an article 4 months ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16 • 265

liked a Space 4 months ago

Running

💻

Merging Competition

upvoted a paper 5 months ago

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Paper • 2407.03203 • Published Jul 3 • 10

New activity in shizhediao/lmflow-sft 5 months ago

Dataset Viewer issue: JobManagerCrashedError

#1 opened 5 months ago by

shizhediao

liked a dataset 5 months ago

sablo/oasst2_curated

Viewer • Updated Jan 12 • 4.94k • 52 • 14

authored 3 papers 5 months ago

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Paper • 2306.12420 • Published Jun 21, 2023 • 2

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Paper • 2304.06767 • Published Apr 13, 2023 • 2

DetGPT: Detect What You Need via Reasoning

Paper • 2305.14167 • Published May 23, 2023