alkinun

AtAndDev

AI & ML interests

LLMs, Alignment, Merging, Unsloth, DPO, SFT, ORPO, SPIN..

Organizations

ESPnet, CVPR Demo Track, BigScience Biomedical Datasets, ONNXConfig for all, video-p2p-library, Gradio-Themes-Party, Gradio-Blocks-Party, scikit-learn, Open-Source AI Meetup, lora concepts library, OpenBuddy Community, ECCV 2022, Kornia AI, Tune a video concepts library, SIGGRAPH 2022, Interspeech2022, Stable Diffusion concepts library, SIGGRAPH Asia 2022 Demos, Stable Diffusion Dreambooth Concepts Library, Musika, Blog-explorers, OpenSky, ICCV2023, ICML2023, huggingPartyParis, Multi🤖Transformers, Team Tonic, That Time I got Reincarnated as a Hugging Face Organization, ZeroGPU Explorers, Pirates Party for all software open source, MLX Community, recipe research, Narra, Social Post Explorers, Cognitive Computations, M4-ai, Spinner-GPT-4, Dev Mode Explorers, Stable Diffusion Community (Unofficial, Non-profit), Hugging Face Discord Community, Nerdy Face, OpenEndedLM, open/ acc, Data Is Better Together Contributor

AtAndDev's activity

reacted to nicolay-r's post with 🔥 about 4 hours ago
📢 With the recent release of Gemma-3, if you're interested in playing with textual chain-of-thought, the notebook below is a wrapper over the model (native transformers inference API) for passing a predefined schema of prompts in batching mode.
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb

Limitation: the schema supports text only (for now), while Gemma-3 is text+image to text.

Model: google/gemma-3-1b-it
Provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_gemma3.py
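
For context, here's a minimal sketch of what such a wrapper boils down to with the native transformers API, assuming google/gemma-3-1b-it and a transformers release with Gemma 3 support (the prompt schema below is illustrative, not the notebook's):

```python
# Minimal sketch: batched chain-of-thought prompting with the native
# transformers API. Model id from the post; prompts are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

questions = [
    "A train travels 60 km in 45 minutes. What is its speed in km/h?",
    "If 3x + 5 = 20, what is x?",
]
# Wrap every question in the same predefined schema asking for reasoning steps.
chats = [
    [{"role": "user", "content": f"Think step by step, then answer.\n{q}"}]
    for q in questions
]
prompts = [
    tokenizer.apply_chat_template(c, tokenize=False, add_generation_prompt=True)
    for c in chats
]

tokenizer.padding_side = "left"  # left-pad so all generations start in sync
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the (padded) prompts and keep only the generated continuations.
answers = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
for a in answers:
    print(a, "\n---")
```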
reacted to burtenshaw's post with 🔥👍 about 6 hours ago
Still speed running Gemma 3 to think. Today I focused on setting up GPU-poor hardware to run GRPO.

This is a plain TRL and PEFT notebook that works on Apple silicon or a Colab T4. It uses the 1B variant of Gemma 3 and a reasoning version of the GSM8K dataset.

🧑‍🍳 There’s more still in the oven like releasing models, an Unsloth version, and deeper tutorials, but hopefully this should bootstrap your projects.

Here’s a link to the 1b notebook: https://colab.research.google.com/drive/1mwCy5GQb9xJFSuwt2L_We3eKkVbx2qSt?usp=sharing
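
If you just want the skeleton of that setup before opening the notebook, a minimal sketch with recent trl/peft releases might look like this (the GSM8K variant, reward function, and hyperparameters are illustrative stand-ins, not the notebook's actual choices):

```python
# Minimal sketch: GRPO with TRL + a PEFT LoRA adapter on Gemma 3 1B.
# Dataset, reward, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Stand-in for the post's "reasoning version of GSM8K".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def reward_long_enough(completions, **kwargs):
    # Toy reward: mildly prefer completions long enough to contain reasoning.
    return [min(len(c) / 200.0, 1.0) for c in completions]

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=reward_long_enough,
    args=GRPOConfig(
        output_dir="gemma3-1b-grpo",
        per_device_train_batch_size=4,
        num_generations=4,        # completions sampled per prompt
        max_completion_length=256,
    ),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```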
reacted to prithivMLmods's post with 🤗🔥 about 6 hours ago
reacted to pidou's post with 🤝👍😎 about 6 hours ago
testing post
reacted to thomwolf's post with 🚀🔥 1 day ago
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain where Anthropic has historically been really strong, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing everything: the training dataset, the new IOI benchmark, and more, in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing (see the loading sketch after this list):
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
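
All of these load straight from the Hub; a quick sketch (config/split names are assumed defaults, check each dataset card for the exact ones):

```python
# Peek at the released datasets; split/config names are assumed defaults,
# see the dataset cards on the Hub for the exact ones.
from datasets import load_dataset

for name in ["open-r1/codeforces", "open-r1/ioi", "open-r1/codeforces-cots"]:
    ds = load_dataset(name, split="train")
    print(name, ds.num_rows, ds.column_names)
```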
reacted to clefourrier's post with 👍🚀 1 day ago
The Gemma3 family is out! Reading the tech report, this section was really interesting to me from a methods/scientific-fairness point of view.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but in a potentially suboptimal way for a given model family (for example, some users interact with models suboptimally).

It also contains a cool section (6) on the training-data memorization rate! It's important to see whether your model will output its training data verbatim: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.
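
As a concrete illustration of that last point (my own sketch, not the report's protocol), a crude verbatim-memorization probe feeds the model a prefix of a candidate document and checks whether greedy decoding reproduces the true continuation:

```python
# Crude memorization probe (illustrative, not the Gemma3 report's method):
# does greedy decoding from a document prefix reproduce the continuation?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # assumed; any causal LM works here
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def is_memorized(text, prefix_tokens=50, continuation_tokens=50):
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    prefix = ids[:, :prefix_tokens]
    target = ids[:, prefix_tokens : prefix_tokens + continuation_tokens]
    out = model.generate(
        prefix,
        attention_mask=torch.ones_like(prefix),
        max_new_tokens=continuation_tokens,
        do_sample=False,  # greedy: memorized text should come back verbatim
    )
    return torch.equal(out[:, prefix_tokens : prefix_tokens + target.shape[1]], target)
```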
reacted to eliebak's post with 🔥 2 days ago
Google just dropped an exciting technical report for the brand-new Gemma3 model! 🚀 Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:

1) Architecture choices:
> No more softcapping, replaced by QK-Norm (see the sketch after this list)
> Both Pre AND Post Norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with a 5:1 ratio and a 1024-token window (very small, and there's a cool ablation in the paper!)
> No MLA to save KV cache; SWA does the job!
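
A minimal sketch of the QK-Norm idea mentioned above (my illustration of the common pattern, not Gemma3's exact code; requires PyTorch >= 2.4 for nn.RMSNorm):

```python
# QK-Norm sketch: RMSNorm queries and keys per head before the dot product,
# which keeps attention logits bounded without softcapping.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)  # PyTorch >= 2.4
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, time, head_dim).
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)  # the QK-Norm step
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))
```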

2) Long context
> Only increase the RoPE base in the global layers (to 1M)
> Confirmation that it's harder to do long context for smol models; no 128k for the 1B
> Pretrained with 32k context? Seems very high
> No YaRN nor Llama3-style RoPE extension

3) Distillation
> Only keep the teacher's top 256 logits (see the sketch after this list)
> Ablation on the teacher gap (tl;dr you need some "patience" to see that using a small teacher is better)
> On-policy distillation yeahh (by @agarwl_ et al.); not sure if the teacher gap behaves the same here, curious if someone has more info?
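
A minimal sketch of what keeping only the teacher's top 256 logits can look like in the distillation loss (my illustration: both distributions renormalized over the teacher's top-k ids; the report's exact weighting may differ):

```python
# Top-k distillation sketch: KL between student and teacher restricted to
# the teacher's top-k vocabulary entries (both renormalized over that set).
import torch
import torch.nn.functional as F

def topk_distill_loss(student_logits, teacher_logits, k=256, temperature=1.0):
    t_vals, t_ids = teacher_logits.topk(k, dim=-1)   # teacher's top-k per token
    s_vals = student_logits.gather(-1, t_ids)        # student at the same ids
    t_probs = F.softmax(t_vals / temperature, dim=-1)
    s_logp = F.log_softmax(s_vals / temperature, dim=-1)
    return F.kl_div(s_logp, t_probs, reduction="batchmean") * temperature**2
```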

4) Others
> Checkpoint with QAT, that's very cool
> RL using an improved version of BOND; WARM/WARP are a good excuse to look at @ramealexandre's papers
> Only uses ZeRO-3, no TP/PP, if I understand correctly?
> Training budget relatively similar to Gemma2
reacted to jasoncorkill's post with 🔥 2 days ago
Benchmarking Google's Veo2: How Does It Compare?

The results did not meet expectations. Veo2 struggled with style consistency and temporal coherence, falling behind competitors like Runway, Pika, Tencent, and even Alibaba. While the model shows promise, its alignment and quality are not yet there.

Google recently launched Veo2, its latest text-to-video model, through select partners like fal.ai. As part of our ongoing evaluation of state-of-the-art generative video models, we rigorously benchmarked Veo2 against industry leaders.

We generated a large set of Veo2 videos, spending hundreds of dollars in the process, and systematically evaluated them using our Python-based API for human and automated labeling.

Check out the ranking here: https://www.rapidata.ai/leaderboard/video-models

Rapidata/text-2-video-human-preferences-veo2