Excited to announce the release of our new research paper, "LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment"! In this work, we introduce LLAVAGUARD, a family of cutting-edge Vision-Language Model (VLM) judges designed to enhance the safety and integrity of vision datasets and generative models. Our approach leverages flexible policies for assessing safety in diverse settings. This context awareness ensures robust data curation and model safeguarding alongside comprehensive safety assessments, setting a new standard for vision datasets and models. We provide three versions (7B, 13B, and 34B) and our data, see below.
This achievement wouldn't have been possible without the incredible teamwork and dedication of my great colleagues @LukasHug, @PSaiml, @mbrack. Together, we've pushed the boundaries of what's possible at the intersection of large generative models and safety.
Dive into our paper to explore:
- Innovative methodologies for dataset curation and model safeguarding.
- State-of-the-art safety assessments.
- Practical implications for AI development and deployment.
Find more at AIML-TUDA/llavaguard-665b42e89803408ee8ec1086 and https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html
🔥 What's New:
- Polars integration 🐻‍❄️
- fsspec support for conversion to JSON, CSV, and Parquet
- Mode parameter for Image feature
- CLI function to convert script-datasets to Parquet
- Dataset.take and Dataset.skip
Plus, a bunch of general improvements & bug fixes!
Yesterday, Mistral released their latest base model (via magnet link, of course) and the community quickly converted it to transformers format and pushed it to the Hub: mistral-community/Mixtral-8x22B-v0.1
Early evals of this model looked extremely strong, so we teamed up with Argilla and KAIST AI to cook up a Zephyr recipe with a few new alignment techniques that came out recently:
🧑‍🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm, developed by @JW17, @nlee-208, and @j6mes, does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO.
🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences developed by our friends at Argilla. To create this dataset, they took the excellent Capybara SFT dataset from @LDJnr (LDJnr/Capybara) and converted it into a preference dataset by augmenting the final turn with responses from new LLMs that were then ranked by GPT-4.
What we find especially neat about this approach is that training on 7k samples only takes ~1.3h on 4 H100 nodes, yet produces a model that is very strong on chat benchmarks like IFEval and BBH.
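For readers wondering why ORPO can skip a separate SFT stage, here is a minimal sketch of the per-example objective as we understand it from the ORPO paper: a standard negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty against the rejected one. The function names, the use of length-averaged log-probabilities as inputs, and the default weight `lam=0.1` are illustrative assumptions, not the training code used for this model.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the length-averaged log-probability of a
    response under the model, so it must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO objective for a single preference pair.

    lam is a hypothetical weighting of the odds-ratio term; the value
    here is only illustrative.
    """
    # SFT-style term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp

    # Log odds ratio between the chosen and the rejected response.
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)

    # -log(sigmoid(z)), written as a numerically stable softplus(-z).
    or_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

    return sft_term + lam * or_term
```

Because the likelihood term and the preference term are optimised together in one loss, there is no separate SFT pass and no frozen reference model to keep in memory, which is where the efficiency gain over DPO-style pipelines comes from.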