Meta AI vision has been cooking @facebook! They shipped multiple models and demos for their papers at @ECCV.
Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation and more. All models have open weights, demos and even TorchScript checkpoints! A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is a state-of-the-art model for consistent 3D generation from images.
- Slower response times: o1 can take more than 10 seconds to answer some questions, as it spends more time "thinking" through problems. In my case, it took over 50 seconds.
- Less likely to admit ignorance: The models are reported to be less likely to admit when they don't know the answer to a question.
- Higher pricing: o1-preview is significantly more expensive than GPT-4o, costing 3x as much per input token and 4x as much per output token in the API. Since the extra thinking consumes more tokens, this could require houses to be mortgaged!
- Do we need this? While it's better than GPT-4o at complex reasoning, on many common business tasks its performance is merely equivalent.
- Not a big deal: no comparisons with Anthropic's models or Google DeepMind's Gemini are included.
- The model tries to think through and iterate on its response on its own! Think of it as built-in chain-of-thought (CoT) on steroids! I would love a technical paper on the training process.
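To make the pricing point concrete, here is a minimal sketch of how the 3x input / 4x output multipliers compound with extra "thinking" tokens. It assumes the hidden reasoning tokens are billed as output tokens; the GPT-4o per-token prices (`price_in`, `price_out`) and the `reasoning_factor` are hypothetical parameters, not published figures.

```python
# Price multipliers of o1-preview relative to GPT-4o, as stated in the post.
INPUT_MULTIPLIER = 3.0
OUTPUT_MULTIPLIER = 4.0


def relative_cost(input_tokens: int, output_tokens: int,
                  price_in: float, price_out: float,
                  reasoning_factor: float = 1.0) -> float:
    """Return how many times more an o1-preview call costs than a GPT-4o call.

    price_in / price_out: hypothetical GPT-4o per-token prices (any unit).
    reasoning_factor: hypothetical multiplier on output tokens to account for
    hidden "thinking" tokens, assumed to be billed as output.
    """
    gpt4o_cost = input_tokens * price_in + output_tokens * price_out
    o1_cost = (input_tokens * price_in * INPUT_MULTIPLIER
               + output_tokens * reasoning_factor * price_out * OUTPUT_MULTIPLIER)
    return o1_cost / gpt4o_cost


if __name__ == "__main__":
    # Hypothetical call: 1,000 input tokens, 500 visible output tokens,
    # output priced 3x input, and reasoning quadrupling the output tokens.
    print(relative_cost(1000, 500, price_in=1.0, price_out=3.0, reasoning_factor=4.0))
```

With those hypothetical numbers the call costs about 10.8x the GPT-4o equivalent, well above either headline multiplier, because the reasoning tokens are charged at the already higher output rate.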
Very insightful read!!! A RAG framework entirely inspired by natural intelligence: it is modeled after the hippocampal indexing theory of human long-term memory (which suggests that the hippocampus links and retrieves memory details stored in the cortex).
It outperforms current "cheat" RAG :) This is how we achieve human-level intelligence: by modeling natural intelligence correctly!