Makar Vlasov

Makar7

AI & ML interests

None yet

Recent Activity

reacted to thomwolf's post with 🔥 about 17 hours ago

We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder (https://huggingface.co/open-r1/OlympicCoder-7B and https://huggingface.co/open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain Anthropic has historically been really strong at, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing everything about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing:
- https://huggingface.co/datasets/open-r1/codeforces
- https://huggingface.co/datasets/open-r1/codeforces-cots
- https://huggingface.co/datasets/open-r1/ioi
- https://huggingface.co/datasets/open-r1/ioi-test-cases
- https://huggingface.co/datasets/open-r1/ioi-sample-solutions
- https://huggingface.co/datasets/open-r1/ioi-cots
- https://huggingface.co/datasets/open-r1/ioi-2024-model-solutions

reacted to clefourrier's post with 🚀 1 day ago

Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**. (Which everybody does, but people usually don't say so.)

For a tech report, it makes a lot of sense to report model performance when used optimally! On leaderboards, on the other hand, comparisons will be apples to apples, but in a potentially suboptimal way for a given model family (just as some users interact suboptimally with models).

It also contains a cool section (6) on training data memorization rate! It's important to see whether your model will output the training data it has seen verbatim: always an issue for privacy/copyright/... but also very much for evaluation! Because if your model knows its evals by heart, you're not testing for generalization.

reacted to Bils's post with 😎 5 days ago

Spatial sound experience! SonicOrbit features AI beat detection to auto-sync your rhythm. https://huggingface.co/spaces/Bils/SonicOrbit

Organizations

None yet

models

None public yet

datasets

None public yet