Angel Camilo Guillen Guzman

acamilogg88

AI & ML interests

Enhanced AI software development

Recent Activity

Reacted to m-ric's post with 🔥 about 2 months ago
Emu3: Next-token prediction conquers multimodal tasks 🔥

This is the most important research in months: we're now very close to having a single architecture to handle all modalities. The folks at the Beijing Academy of Artificial Intelligence (BAAI) just released Emu3, a single model that handles text, images, and videos all at once.

What's the big deal?
🌟 Emu3 is the first model to truly unify all these different types of data (text, images, video) using just one simple trick: predicting the next token. And it's only 8B, but really strong:
🖼️ For image generation, it's matching the best specialized models out there, like SDXL.
👁️ In vision tasks, it's outperforming top models like LLaVA-1.6-7B, which is a big deal for a model that wasn't specifically designed for this.
🎬 It's the first to nail video generation without using complicated diffusion techniques.

How does it work?
🧩 Emu3 uses a special tokenizer (SBER-MoVQGAN) to turn images and video clips into sequences of 4,096 tokens.
🔗 Then, it treats everything (text, images, and videos) as one long series of tokens to predict.
🔮 During training, it just tries to guess the next token, whether that's a word, part of an image, or a video frame (a toy sketch of this unified objective follows this post).

Caveats on the results:
👉 In image generation, Emu3 beats SDXL, but it's also much bigger (8B vs 3.5B). It would be more difficult to beat the real diffusion GOAT, FLUX-dev.
👉 In vision, the authors also don't show a comparison against current SOTA models like Qwen-VL or Pixtral.

This approach is exciting because it's simple (next-token prediction) and scalable (handles all sorts of data)!

Read the paper 👉 https://huggingface.co/papers/2409.18869
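As a rough illustration of the unified objective described in the post, here is a minimal PyTorch sketch (not Emu3's actual code): discrete visual tokens are offset into the same vocabulary as text tokens, both are concatenated into one sequence, and the model is trained with ordinary next-token cross-entropy. The vocabulary sizes, the tiny encoder standing in for a causal decoder-only LM, and the random "tokenizer" outputs are all illustrative assumptions.

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000    # assumed text vocabulary size (illustrative)
VISION_VOCAB = 32_768  # assumed visual codebook size (illustrative)
VOCAB = TEXT_VOCAB + VISION_VOCAB  # one unified token space

embed = nn.Embedding(VOCAB, 256)
# Tiny transformer with a causal mask, standing in for the real decoder-only LM.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = nn.Linear(256, VOCAB)

# Pretend tokenizers: text ids live in [0, TEXT_VOCAB); image ids are offset
# into [TEXT_VOCAB, VOCAB) so both modalities coexist in one sequence.
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = torch.randint(0, VISION_VOCAB, (1, 64)) + TEXT_VOCAB
seq = torch.cat([text_ids, image_ids], dim=1)  # "caption then image" as one stream

# Standard next-token prediction: shift by one and apply cross-entropy.
inputs = seq[:, :-1]
causal_mask = nn.Transformer.generate_square_subsequent_mask(inputs.size(1))
hidden = encoder(embed(inputs), mask=causal_mask)
logits = lm_head(hidden)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1)
)
loss.backward()
print(f"toy unified next-token loss: {loss.item():.3f}")
```

The point of the sketch is that nothing modality-specific appears in the loss: once images are discrete tokens in the shared vocabulary, the training objective is identical to a plain language model's.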
Reacted to singhsidhukuldeep's post with 🔥 about 2 months ago
Good folks at @PyTorch have just released torchao, a game-changing library for native architecture optimization.

How torchao Works (they threw the kitchen sink at it...)

torchao leverages several advanced techniques to optimize PyTorch models, making them faster and more memory-efficient. Here's an overview of its key mechanisms:

Quantization
torchao employs various quantization methods to reduce model size and accelerate inference:
• Weight-only quantization: converts model weights to lower-precision formats like int4 or int8, significantly reducing memory usage.
• Dynamic activation quantization: quantizes activations on the fly during inference, balancing performance and accuracy.
• Automatic quantization: the `autoquant` function intelligently selects the best quantization strategy for each layer in a model (see the sketch after this post).

Low-bit Datatypes
The library utilizes low-precision datatypes to speed up computations:
• float8: enables float8 training for linear layers, offering substantial speedups for large models like LLaMA 3 70B.
• int4 and int8: provide options for extreme compression of weights and activations.

Sparsity Techniques
torchao implements sparsity methods to reduce model density:
• Semi-sparse weights: combine quantization with sparsity for compute-bound models.

KV Cache Optimization
For transformer-based models, torchao offers KV cache quantization, leading to significant VRAM reductions for long context lengths.

Integration with PyTorch Ecosystem
torchao seamlessly integrates with existing PyTorch tools:
• Compatible with `torch.compile()` for additional performance gains.
• Works with FSDP2 for distributed training scenarios.
• Supports most PyTorch models available on Hugging Face out of the box.

By combining these techniques, torchao enables developers to significantly improve the performance and efficiency of their PyTorch models with minimal code changes and accuracy impact.
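For readers who want to try this, below is a minimal sketch of the two quantization entry points mentioned above, modeled on torchao's published examples. The exact import paths can shift between torchao versions, so treat the imports as assumptions to verify against your installed release; the toy model is illustrative.

```python
import torch
import torchao
# Import paths as documented in the torchao README at release time; verify
# against your installed version, since the API surface has been evolving.
from torchao.quantization import quantize_, int8_weight_only

# Any PyTorch model with Linear layers works here, e.g. one loaded from
# Hugging Face; a toy MLP keeps the sketch self-contained. Speedups are
# primarily targeted at CUDA devices.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).eval()

# Option 1: explicit weight-only int8 quantization, applied in place.
quantize_(model, int8_weight_only())

# Option 2 (alternative to the above): let autoquant benchmark and pick a
# strategy per layer, composed with torch.compile for further gains.
# model = torchao.autoquant(torch.compile(model, mode="max-autotune"))

x = torch.randn(8, 1024)
print(model(x).shape)  # torch.Size([8, 10])
```

Weight-only int8 is the conservative default when the goal is memory savings with minimal accuracy impact; `autoquant` trades a one-time calibration cost for per-layer strategy selection.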

Organizations

None yet

models

None public yet

datasets

None public yet