Hi HuggingFacers! 🤗 December is here, and for most of us it's time to wrap up our code projects and take stock of our 2024 contributions. To help with that, I made a small Gradio application, what-a-git-year, that scrapes information from your GitHub profile and summarizes it, producing some nice plots along the way. You can also find the GitHub repo here: https://github.com/AstraBert/what-a-git-year
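For a sense of what such a summary involves, here is a minimal sketch of aggregating a year of public-repo stats via the GitHub REST API. The `/users/{user}/repos` endpoint is real, but the aggregation logic below is an illustrative guess, not what-a-git-year's actual code:

```python
import json
from collections import Counter
from urllib.request import urlopen

def fetch_repos(user):
    """Fetch a user's public repositories from the GitHub REST API."""
    url = f"https://api.github.com/users/{user}/repos?per_page=100"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize_repos(repos):
    """Aggregate repo count, total stars, and most-used languages
    from a list of GitHub repo dicts."""
    stars = sum(r.get("stargazers_count", 0) for r in repos)
    langs = Counter(r["language"] for r in repos if r.get("language"))
    return {
        "repos": len(repos),
        "stars": stars,
        "top_languages": langs.most_common(3),
    }
```

A dict like this is easy to feed into Gradio plot components; per-month commit activity would need the events or commits endpoints instead.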
INTELLECT-1 is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code.
Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model
> Show Lab released ShowUI-2B: a new vision-language-action model for building GUI/web automation agents
> Rhymes AI released the base models of Aria: Aria-Base-64K and Aria-Base-8K, with their respective context lengths
> The ViDoRe team released ColSmolVLM: a new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct, a new dataset labelled using the Llava-CoT multimodal reasoning model
> Dataset: LLaVA-CoT-100k, the dataset used to train Llava-CoT, released by its creators
LLMs
> The Qwen team released QwQ-32B-Preview, a state-of-the-art open-source reasoning model that broke the internet
> Alibaba released Marco-o1, a new open-source reasoning model
> NVIDIA released Hymba 1.5B Base and Instruct, new state-of-the-art SLMs with a hybrid architecture (Mamba + transformer)
Image/Video Generation
> Qwen2VL-Flux: a new image generation model based on the Qwen2VL image encoder, T5, and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 resolution
> Dataset: Image Preferences, a new image generation preference dataset made with the DIBT community effort of Argilla
Audio
> OuteAI released OuteTTS-0.2-500M, a new multilingual text-to-speech model based on Qwen-2.5-0.5B, trained on 5B audio prompt tokens
Excited to share @LinkedIn's innovative approach to evaluating semantic search quality! As part of the Search AI team, we've developed an evaluation pipeline that changes how we measure search relevance.
>> Key Innovation: On-Topic Rate (OTR) This novel metric measures the semantic match between queries and search results, going beyond simple keyword matching. The system evaluates whether content is truly relevant to the query's intent, not just matching surface-level terms.
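The post doesn't spell out OTR's exact formula. A plausible reading, and it is only my assumption, is the fraction of retrieved results that the judge deems on-topic, optionally macro-averaged per query so that high-volume queries don't dominate:

```python
def on_topic_rate(judgments):
    """judgments: list of booleans, one per (query, result) pair,
    True if the judge deemed the result on-topic for its query."""
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)

def macro_otr(per_query_judgments):
    """Average OTR across queries so each query counts equally."""
    rates = [on_topic_rate(j) for j in per_query_judgments]
    return sum(rates) / len(rates) if rates else 0.0
```

For example, a query whose top results are judged [on-topic, on-topic, off-topic, on-topic] scores 0.75.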
>> Technical Implementation Details
Query Set Construction
• Golden Set: contains curated top queries and complex topical queries
• Open Set: includes trending queries and random production queries for diversity
Evaluation Pipeline Architecture
1. Query Processing:
 - Retrieves top 10 documents per query
 - Extracts post text and article information
 - Processes both primary content and reshared materials
2. GAI Integration:
 - Leverages GPT-3.5 with specialized prompts
 - Produces three key outputs:
   - Binary relevance decision
   - Relevance score (0-1 range)
   - Decision reasoning
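Those three outputs suggest the judge returns a structured response. Here is a sketch of how such a prompt and its parsing might look; the prompt wording and JSON schema are my assumptions, not LinkedIn's actual prompt:

```python
import json

JUDGE_PROMPT = """You are judging search relevance.
Query: {query}
Result: {post}
Reply in JSON with keys "relevant" (true/false),
"score" (a number from 0 to 1), and "reasoning" (one sentence)."""

def parse_judgment(raw):
    """Parse the judge's JSON reply into (decision, score, reasoning)."""
    out = json.loads(raw)
    return bool(out["relevant"]), float(out["score"]), str(out["reasoning"])
```

In production the formatted prompt would be sent to GPT-3.5 via a chat completion call; only the parsing step is shown so the sketch stays self-contained.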
Quality Assurance
• Validation achieved 94.5% accuracy on a test set of 600 query-post pairs
• Human evaluation showed 81.72% consistency with expert annotators
>> Business Impact
This system now serves as LinkedIn's benchmark for content search experiments, enabling:
• Weekly performance monitoring
• Rapid offline testing of new ML models
• Systematic identification of improvement opportunities
What are your thoughts on semantic search evaluation?
reacted to cfahlgren1's post · 1 day ago
It's 2025, you shouldn't be hand-writing SQL! This is a big step toward letting anyone do in-depth analysis on a dataset. Let us know what you think 🤗
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race)
- There will be big breakthroughs in AI for biology and chemistry
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price ✅ (Inflexion, AdeptAI, ...)
- Open-source LLMs will reach the level of the best closed-source LLMs ✅ with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry ✅ for video, 🔴 for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI ✅ monetary, 🔴 environmental (🟢)
- A popular media will be mostly AI-generated ✅ with NotebookLM by Google
- 10 million AI builders on Hugging Face leading to no increase in unemployment: currently 7M AI builders on Hugging Face
reacted to merve's post · 1 day ago
Small but mighty! You can fine-tune SmolVLM on an L4 with a batch size of 4, and it will take only 16.4 GB of VRAM. With gradient accumulation, the simulated batch size is 16. I made a notebook that includes all the goodies: QLoRA, gradient accumulation, and gradient checkpointing, with explanations of how they work: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
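The memory savings come from combining those techniques. Below is a hypothetical configuration consistent with the numbers in the post; the key names match `transformers`' `TrainingArguments` and `bitsandbytes` conventions, but the exact values in the notebook may differ:

```python
# Hypothetical QLoRA-style settings: a per-device batch size of 4 with
# 4 gradient-accumulation steps gives a simulated batch size of 16.
train_config = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,   # gradients summed over 4 micro-batches
    "gradient_checkpointing": True,     # recompute activations to save VRAM
    "optim": "paged_adamw_8bit",        # 8-bit optimizer states (QLoRA recipe)
    "bf16": True,                       # mixed-precision training
}

simulated_batch_size = (
    train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
)
```

Gradient accumulation trades wall-clock time for memory: the optimizer steps once per 16 examples while only 4 ever sit on the GPU at once.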
reacted to brunatrevelin's post · 1 day ago
Drag and drop your assets (images/videos/audio) to create any video you want using natural language!
It works by asking the model to output a valid FFmpeg command, which can get quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4; it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore. Let's go open weights!
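Once the model replies, the FFmpeg command still has to be pulled out of the (often markdown-fenced) response before it can be run. A minimal sketch of that step, where the extraction heuristic is my own and not the project's code:

```python
import re
import shlex

def extract_ffmpeg_command(model_output):
    """Pull the first ffmpeg invocation out of an LLM reply, whether
    it is wrapped in a markdown code fence or pasted as bare text."""
    fenced = re.search(r"```(?:bash|sh)?\s*(ffmpeg[^`]+)```", model_output, re.S)
    cmd = (fenced.group(1) if fenced else model_output).strip()
    if not cmd.startswith("ffmpeg"):
        raise ValueError("model reply contains no ffmpeg command")
    return shlex.split(cmd)  # argv list, ready for subprocess.run
```

Returning an argv list means the command can be executed with `subprocess.run` without `shell=True`, which avoids shell-injection risks from model-generated strings.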
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL!
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook!
- SmolVLM can be fine-tuned in a Google Colab, or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models on video benchmarks, despite not being trained on videos!
Would you like to get a high-quality dataset to pre-train LLMs in your language?
At Hugging Face we're preparing a collaborative annotation effort to build an open-source multilingual dataset as part of the Data is Better Together initiative.
Follow the link below, check if your language is listed and sign up to be a Language Lead!