Li Tan PRO

tanliboy

https://github.com/tanliboy

AI & ML interests

None yet

Recent Activity

New activity 21 days ago

rombodawg/Rombos-LLM-V2.5-Qwen-72b:what is your "continuous finetuning"

New activity 21 days ago

google/gemma-2-9b-it:Batch Inference causes degraded performance

updated a model about 2 months ago

tanliboy/qwen2.5-7b-sft

tanliboy's activity

New activity in rombodawg/Rombos-LLM-V2.5-Qwen-72b 21 days ago

what is your "continuous finetuning"

#2 opened about 2 months ago by

New activity in google/gemma-2-9b-it 21 days ago

Batch Inference causes degraded performance

#43 opened 3 months ago by

New activity in Qwen/Qwen2.5-7B-Instruct about 2 months ago

Scorecard on popular benchmarks

#2 opened 2 months ago by

New activity in ContextualAI/ultrafeedback_clair_32k 2 months ago

Phi-2-Instruct-APO: aligned with Anchored Preference Optimization

#3 opened 2 months ago by

New activity in Qwen/Qwen2.5-Math-RM-72B 2 months ago

Preference Alignment

#6 opened 2 months ago by

New activity in meta-llama/Llama-3.1-8B 2 months ago

Text Classification with LLMs

#30 opened 4 months ago by

New activity in NousResearch/Hermes-3-Llama-3.1-8B 2 months ago

IFEVAL drop

#16 opened 2 months ago by

New activity in Alibaba-NLP/gte-Qwen2-7B-instruct 2 months ago

bfloat16 vs. float32

#34 opened 2 months ago by

New activity in Alibaba-NLP/gte-Qwen2-1.5B-instruct 2 months ago

Qwen 2.5 1.5B retrain?

#12 opened 2 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 2 months ago

GSM8K Evaluation Result: 84.5 vs. 76.95

#81 opened 4 months ago by

New activity in Qwen/Qwen2-VL-7B-Instruct 2 months ago

Finetuning script using HuggingFace (No llama-factory)

#32 opened 3 months ago by

2U1

New activity in meta-llama/Llama-3.1-8B-Instruct 3 months ago

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

#120 opened 3 months ago by

New activity in Qwen/Qwen2-VL-7B-Instruct 3 months ago

Have you deleted your GitHub page?

#10 opened 3 months ago by

New activity in google/gemma-2-9b-it 3 months ago

Sliding window vs. Global Attention

#41 opened 3 months ago by

New activity in google/gemma-2-2b 3 months ago

Gemma2-2b training uses much more memory!

#23 opened 3 months ago by

New activity in google/gemma-2b 3 months ago

GemmaSdpaAttention vs GemmaAttention

#71 opened 3 months ago by

New activity in meta-llama/Llama-3.1-70B-Instruct 3 months ago

Fix Llama 3.1 Chat Template to Properly Handle add_generation_prompt

#26 opened 3 months ago by

New activity in Qwen/Qwen2-VL-7B-Instruct 3 months ago

🍭 Fine-tuning support for Qwen2-VL-7B-Instruct

#1 opened 3 months ago by

New activity in google/recurrentgemma-9b-it 3 months ago

Evaluation Result

#15 opened 3 months ago by

New activity in meta-llama/Llama-3.1-8B-evals 3 months ago

How is this dataset supposed to be used to evaluate the model?

#1 opened 3 months ago by

realdanielbyrne