Last Week in Medical AI: Top Research Papers/Models 🔥 (December 7 – December 14, 2024)
Medical LLM & Other Models
- PediaBench: Chinese Pediatric LLM
  - Comprehensive pediatric dataset
  - Advanced benchmarking platform
  - Chinese healthcare innovation
- BiMediX: Bilingual Medical LLM
  - Multilingual medical expertise
  - Diverse medical knowledge integration
  - Cross-cultural healthcare insights
- MMedPO: Vision-Language Medical LLM
  - Clinical multimodal optimization
  - Advanced medical image understanding
  - Precision healthcare modeling

Frameworks and Methodologies
- TOP-Training: Medical Q&A Framework
- Hybrid RAG: Secure Medical Data Management
- Zero-Shot ATC Clinical Coding
- Chest X-Ray Diagnosis Architecture
- Medical Imaging AI Democratization

Benchmarks & Evaluations
- KorMedMCQA: Korean Healthcare Licensing Benchmark
- Large Language Model Medical Tasks
- Clinical T5 Model Performance Study
- Radiology Report Quality Assessment
- Genomic Analysis Benchmarking

Medical LLM Applications
- BRAD: Digital Biology Language Model
- TCM-FTP: Herbal Prescription Prediction
- LLaSA: Activity Analysis via Sensors
- Emergency Department Visit Predictions
- Neurodegenerative Disease AI Diagnosis
- Kidney Disease Explainable AI Model

Ethical AI & Privacy
- Privacy-Preserving LLM Mechanisms
- AI-Driven Digital Organism Modeling
- Biomedical Research Automation
- Multimodality in Medical Practice
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥
How? By combining step-wise reward models with tree search algorithms :)
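To make the idea concrete, here is a minimal sketch of reward-guided beam search on a toy task (pick numbers that sum to a target). It is not the recipe from the blog post: `propose_steps` and `step_reward` are hypothetical stand-ins for an LLM sampling candidate next steps and for a step-wise reward model scoring a partial solution.

```python
from typing import List, Tuple

Path = Tuple[int, ...]


def propose_steps(path: Path) -> List[int]:
    """Hypothetical stand-in for an LLM proposing candidate next steps."""
    return [1, 2, 3]


def step_reward(path: Path, target: int) -> float:
    """Hypothetical stand-in for a step-wise reward model.

    Scores a *partial* path: closer to the target is better,
    and overshooting the target is heavily penalised.
    """
    total = sum(path)
    return -abs(target - total) - (10.0 if total > target else 0.0)


def beam_search(beam_width: int = 2, max_depth: int = 4, target: int = 7) -> Path:
    """Keep the `beam_width` best partial paths at every step."""
    beams: List[Path] = [()]
    for _ in range(max_depth):
        # Expand every surviving path with every proposed next step.
        candidates = [p + (s,) for p in beams for s in propose_steps(p)]
        # Rank partial paths by the step-wise reward and prune.
        candidates.sort(key=lambda p: step_reward(p, target), reverse=True)
        beams = candidates[:beam_width]
        if sum(beams[0]) == target:  # toy stopping rule: target reached
            break
    return beams[0]
```

Calling `beam_search()` returns the highest-scoring path found. A larger `beam_width` or `max_depth` spends more compute at inference time, which is the knob being scaled here.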
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
- Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.
- Diverse Verifier Tree Search (DVTS): An unpublished extension of verifier-guided tree search that we developed. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
- Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
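The post does not spell out how DVTS works, so the following is only one plausible reading of "improves diversity": split the search budget into independent subtrees and expand each one greedily under the verifier, so the final candidates cannot all collapse onto one trajectory. The toy task, `propose_steps`, `step_reward`, and the choice of distinct first steps as subtree roots are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Path = Tuple[int, ...]


def propose_steps(path: Path, m: int) -> List[int]:
    """Hypothetical stand-in for sampling m candidate next steps from an LLM."""
    return list(range(1, m + 1))


def step_reward(path: Path, target: int) -> float:
    """Hypothetical stand-in for a verifier scoring a partial path."""
    total = sum(path)
    return -abs(target - total) - (10.0 if total > target else 0.0)


def greedy_subtree(root: Path, m: int, max_depth: int, target: int) -> Path:
    """Expand one subtree, keeping only the best-scoring step each time."""
    path = root
    for _ in range(max_depth):
        if sum(path) >= target:  # toy stopping rule
            break
        candidates = [path + (s,) for s in propose_steps(path, m)]
        path = max(candidates, key=lambda p: step_reward(p, target))
    return path


def diverse_tree_search(
    n_beams: int = 6, m: int = 3, max_depth: int = 5, target: int = 7
) -> Tuple[List[Path], Path]:
    """Split n_beams into n_beams // m independent subtrees (assumed scheme)."""
    n_subtrees = n_beams // m
    # Assumption: each subtree starts from a different first step.
    roots = [(s,) for s in range(1, n_subtrees + 1)]
    finals = [greedy_subtree(r, m, max_depth, target) for r in roots]
    best = max(finals, key=lambda p: step_reward(p, target))
    return finals, best
```

In this sketch the subtrees never compete with each other during the search, only at the end. That keeps the final candidates distinct even when one early step scores highest everywhere.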