Emin Temiz's picture

Emin Temiz PRO

etemiz

·

https://pickabrain.ai

AI & ML interests

Alignment

Recent Activity

posted an update 1 day ago

which one is better for alignment? ORPO or GSPO? I think ORPO is pretty good and fast but GSPO makes it attack its own opinions, reflecting on itself, correcting itself. Although GSPO is much slower, it may still be pretty effective. And for GSPO you don't have to provide the whole reasoning corpus, you just provide the end result (One word maybe to answer a binary question). And GSPO may be better than GRPO because it is rewarding 'train of thoughts' whereas GRPO is rewarding single tokens. Alignment is mostly train of thoughts, not a single token like a math answer..

replied to their post 6 days ago

how to expand your dataset (of articles) without changing the ideas in it? i was doing CPT for a while and got decent results. but what if i want to go for perfection? cover all the areas of misalignment using limited datasets. i have to find a way to multiply the material to successfully combat the material of the rest of the internet. i want to generate SFT datasets but only on controversial topics, because i have to be efficient with limited resources. first i give a smart LLM a 'ground truth' text. then i give it the following prompts: ``` - You are a highly skilled academic analyst. - Analyze this text and find 3 bold claims that could cause controversy and division in public. List the claims and also state why they are debatable. Give numbers to the claims. - Convert these claims into binary questions (that could be answered by yes/no or this/that). - Now put these questions in a json format. Please also add the info about which of the answers concur with the original text and the question number. - Write some supporting arguments for 1st question, with respect to the original text, concurring and confirming the original text. There must be about 300 words. You should not mention the text, write it as if you are the one answering the question. ``` the result is questions and answers with more words along the same ideas. a few sentences of opinions in the beginning, is expanded to lots of words. using this method i can multiply billions of tokens to tens of billions probably and have a more effective training. next i should do RL maybe. LLMs seem to have all kinds of ideas already installed, yet they don't have the intuition to know which one is true. they can give you a ton of reasons to support anything. given the proper incentives, LLMs then should evolve towards supporting aligned ideas more. the rewards will be like guidance that will kick an LLM towards better answers.

replied to their post 8 days ago

how to expand your dataset (of articles) without changing the ideas in it? i was doing CPT for a while and got decent results. but what if i want to go for perfection? cover all the areas of misalignment using limited datasets. i have to find a way to multiply the material to successfully combat the material of the rest of the internet. i want to generate SFT datasets but only on controversial topics, because i have to be efficient with limited resources. first i give a smart LLM a 'ground truth' text. then i give it the following prompts: ``` - You are a highly skilled academic analyst. - Analyze this text and find 3 bold claims that could cause controversy and division in public. List the claims and also state why they are debatable. Give numbers to the claims. - Convert these claims into binary questions (that could be answered by yes/no or this/that). - Now put these questions in a json format. Please also add the info about which of the answers concur with the original text and the question number. - Write some supporting arguments for 1st question, with respect to the original text, concurring and confirming the original text. There must be about 300 words. You should not mention the text, write it as if you are the one answering the question. ``` the result is questions and answers with more words along the same ideas. a few sentences of opinions in the beginning, is expanded to lots of words. using this method i can multiply billions of tokens to tens of billions probably and have a more effective training. next i should do RL maybe. LLMs seem to have all kinds of ideas already installed, yet they don't have the intuition to know which one is true. they can give you a ton of reasons to support anything. given the proper incentives, LLMs then should evolve towards supporting aligned ideas more. the rewards will be like guidance that will kick an LLM towards better answers.

View all activity

Organizations

None yet

New activity in IslamQA/islamqa about 2 months ago

Curation

#1 opened about 2 months ago by

New activity in mradermacher/Ostrich-32B-Qwen3-251003-GGUF 3 months ago

i1?

#1 opened 3 months ago by

New activity in some1nostr/Nostr-Llama-3.1-8B 5 months ago

Update README.md

#2 opened 5 months ago by

New activity in unsloth/Qwen3-32B-unsloth-bnb-4bit 7 months ago

Fine tune

#3 opened 7 months ago by

New activity in ysdede/whisper-khanacademy-large-v3-turbo-tr 9 months ago

How to?

#1 opened 9 months ago by

New activity in selimc/whisper-large-v3-turbo-turkish 9 months ago

How to

#3 opened 9 months ago by

New activity in elifsorguc/whisper-medium-tr 9 months ago

Turkish or all the languages

#1 opened 9 months ago by

New activity in etemiz/Llama-3.1-405B-Inst-GGUF over 1 year ago

Larger quants please

#1 opened over 1 year ago by

New activity in DontPlanToEnd/UGI-Leaderboard over 1 year ago

Llama3

#7 opened over 1 year ago by

New activity in hiyouga/Llama-2-70b-AQLM-2Bit-QLoRA-function-calling almost 2 years ago

just for curiosity

#1 opened almost 2 years ago by

New activity in LiteLLMs/Mixtral-8x22B-v0.1-GGUF almost 2 years ago

ram usage

#1 opened almost 2 years ago by

Tom-Neverwinter

New activity in hiyouga/Llama-2-70b-AQLM-2Bit-QLoRA-function-calling almost 2 years ago

just for curiosity

#1 opened almost 2 years ago by

just for curiosity

#1 opened almost 2 years ago by