YuJuLin

nps798

AI & ML interests

None yet

Recent Activity

New activity about 2 months ago

stepfun-ai/GOT-OCR2_0

Organizations

None yet

nps798's activity

New activity in stepfun-ai/GOT-OCR2_0 about 2 months ago

is it available to change output text format to Markdown?

#21 opened about 2 months ago by

New activity in iiiorg/piiranha-v1-detect-personal-information 2 months ago

It also works with (Taiwan's) Traditional Chinese.

#3 opened 2 months ago by

New activity in K024/mt5-zh-ja-en-trimmed 6 months ago

Brilliant!

#3 opened 6 months ago by

New activity in ccrains/larson-gemma-2b-chinese-v0.1 8 months ago

Looking into training detail

#1 opened 8 months ago by

New activity in BAAI/bge-large-en 9 months ago

Why do you add a normalization layer at the end of the model? Does it affect fine-tuning results?

#13 opened 9 months ago by

New activity in yuuko-eth/Rain-2x7B-MoE-32k-v0.1 10 months ago

How do you build your own MoE model?

#1 opened 10 months ago by

New activity in openai/whisper-medium 10 months ago

The same audio. Why are the results on Hugging Face different from the results of the model on GitHub?

#31 opened 11 months ago by

New activity in charent/Phi2-Chinese-0.2B 11 months ago

Impressive! May I ask about the hardware resources needed for pretraining and how the training code works?

#2 opened 11 months ago by

New activity in stabilityai/japanese-stablelm-3b-4e1t-base 11 months ago

Code to continue pretrain

#1 opened 11 months ago by

New activity in mistralai/Mistral-7B-v0.1 about 1 year ago

Has a massive repetition problem

#29 opened about 1 year ago by

New activity in Vasanth/phi-1_5-finetuned-gsm8k about 1 year ago

finetuning

#1 opened about 1 year ago by

New activity in migtissera/SynthIA-7B-v1.3 about 1 year ago

Hi what did you train this model with, and what were hyperparams?

#1 opened about 1 year ago by

New activity in teknium/Mistral-Trismegistus-7B about 1 year ago

Excellent model! Asking about training details

#3 opened about 1 year ago by

New activity in mistralai/Mistral-7B-v0.1 about 1 year ago

QLoRA fine-tuning with a longer sequence length (max_length=2048, padding=True) causes RuntimeError: CUDA error: device-side assert triggered; shortening the length to 512 works!

#46 opened about 1 year ago by

New activity in mistralai/Mistral-7B-Instruct-v0.1 about 1 year ago

How is this model different from Llama 2-7B?

#8 opened about 1 year ago by

New activity in yentinglin/Taiwan-LLaMa-v1.0 over 1 year ago

Curious about how the model was trained to support Taiwan Chinese so well

#1 opened over 1 year ago by

New activity in TheBloke/Llama-2-70B-Chat-GGML over 1 year ago

Loading the model with oobabooga fails for 70B chat GGML Q2_k and Q3_K_S

#2 opened over 1 year ago by

New activity in mosaicml/mpt-7b over 1 year ago

After installing triton, running pipe() returns "fatal error: cuda.h: No such file or directory" and "CalledProcessError: Command '['/usr/bin/gcc'...."

#66 opened over 1 year ago by
