Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 14 days ago • 48
LLM2CLIP Collection LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA. • 7 items • Updated 2 days ago • 35
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published 17 days ago • 32
Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets Paper • 2410.01779 • Published Oct 2 • 1
AMD-OLMo Collection AMD-OLMo is a series of 1-billion-parameter language models trained by AMD on AMD Instinct™ MI250 GPUs, based on OLMo. • 4 items • Updated 21 days ago • 16
Sparsh Collection Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing • 15 items • Updated 29 days ago • 11
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 8 items • Updated 15 days ago • 95
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published 25 days ago • 37
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces Paper • 2410.09918 • Published Oct 13 • 3
SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems Paper • 2210.12547 • Published Oct 22, 2022 • 1
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21 • 47
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks Paper • 2407.05291 • Published Jul 7 • 1
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? Paper • 2403.07718 • Published Mar 12 • 1
steiner-preview Collection Reasoning models trained on synthetic data using reinforcement learning. • 3 items • Updated Oct 20 • 23