Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
singhsidhukuldeep 
posted an update Sep 9
Post
1221
Remember when @Google launched MediaPipe in an effort to create efficient on-device pipelines?

They've just unlocked the ability to run 7B+ parameter language models directly in your browser. This is a game-changer for on-device AI!

Yes, they are streaming 8.6 GB model files!

Currently, they have Gemma 2B/7B running, but imagine Dynamic LoRA, multimodal support, quantization, and you never leaving Chrome!

This is a significant technical advancement, especially in Memory Optimization:

- Redesigned the model-loading code to work around WebAssembly's 4 GB memory limit.
- Implemented asynchronous loading of transformer stack layers (28 for Gemma 1.1 7B).
- Reduced peak WebAssembly memory usage to less than 1% of previous requirements.

Cross-Platform Compatibility
- Compiled the C++ codebase to WebAssembly for broad browser support.
- Utilized the WebGPU API for native GPU acceleration in browsers.

Here's why this matters:

1. Privacy: No need to send data to remote servers.
2. Cost-Efficiency: Eliminates server expenses.
3. Offline Capabilities: Use powerful AI without an internet connection.

Blog: https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/