This means you can now generate high-quality segmentation masks for objects in a scene, directly in your browser! 🤯
Demo (+ source code): Xenova/segment-anything-web
Model: Xenova/slimsam-77-uniform
But how does this differ from Meta's original demo? 🤔 Didn't that also run in-browser?
Well, in their demo, the image embeddings are computed server-side, then sent to the client for decoding. Trying to do it all client-side with the original model would be completely impractical, taking minutes per image! 😵‍💫
That's where SlimSAM comes to the rescue! SlimSAM is a novel SAM compression method, able to shrink the model by over 100x (637M → 5.5M params) while still achieving remarkable results!
The best part? You can get started in a few lines of JavaScript code, thanks to Transformers.js! 🔥
// npm i @xenova/transformers
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';
// Load model and processor
const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');
// Prepare image and input points
const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
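// A single (x, y) point prompt, given in original image coordinates (the processor rescales it to the model's input size)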
const input_points = [[[340, 250]]];
// Process inputs and perform mask generation
const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);
// Post-process masks
const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// Visualize the predicted masks by converting them to an image and saving it
const image = RawImage.fromTensor(masks[0][0].mul(255));
await image.save('mask.png');
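Bonus: just like the hosted demo, you can make this interactive by computing the image embeddings once and then re-running only the lightweight mask decoder for each new click. Here's a minimal sketch of that pattern, reusing the model and processor from above together with SamModel's get_image_embeddings helper; the decode helper and the second point are just illustrative:
// Compute the (expensive) image embeddings just once per image
const image_inputs = await processor(raw_image);
const image_embeddings = await model.get_image_embeddings(image_inputs);
// Decode a mask for a new point prompt, reusing the cached embeddings
async function decode(points) {
  const prompt_inputs = await processor(raw_image, points);
  // Since image_embeddings are provided, only the mask decoder runs here
  const outputs = await model({ ...image_embeddings, input_points: prompt_inputs.input_points });
  return await processor.post_process_masks(
    outputs.pred_masks, prompt_inputs.original_sizes, prompt_inputs.reshaped_input_sizes,
  );
}
// e.g., segment at a different (arbitrary) point without re-encoding the image
const new_masks = await decode([[[200, 300]]]);
The image pre-processing still runs per call in this sketch, but the key saving is the same as in the demo: the heavy vision encoder is skipped for every prompt after the first.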
I can't wait to see what you build with it! 🤗