This means you can now generate high-quality segmentation masks for objects in a scene, directly in your browser! 🤯
Demo (+ source code): Xenova/segment-anything-web
Model: Xenova/slimsam-77-uniform
But how does this differ from Meta's original demo? 🤔 Didn't that also run in-browser?
Well, in their demo, the image embeddings are computed server-side, then sent to the client for decoding. Trying to do it all client-side with the original model would be completely impractical, taking minutes per image! 😵‍💫
That's where SlimSAM comes to the rescue! SlimSAM is a novel SAM compression method, able to shrink the model by over 100x (637M → 5.5M params) while still achieving remarkable results!
The best part? You can get started in a few lines of JavaScript code, thanks to Transformers.js! 🔥
// npm i @xenova/transformers
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';
// Load model and processor
const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');
// Prepare image and input points
const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
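// A single (x, y) point prompt, given in original image coordinates (the processor rescales it to the model's input size)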
const input_points = [[[340, 250]]];
// Process inputs and perform mask generation
const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);
// Post-process masks
const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// Visualize the predicted masks by converting them to an image and saving it
const image = RawImage.fromTensor(masks[0][0].mul(255));
await image.save('mask.png');
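Bonus: just like the hosted demo, you can make this interactive by computing the image embeddings once and then re-running only the lightweight mask decoder for each new click. Here's a minimal sketch of that pattern, reusing the model and processor from above together with SamModel's get_image_embeddings helper; the decode helper and the second point are just illustrative:
// Compute the (expensive) image embeddings just once per image
const image_inputs = await processor(raw_image);
const image_embeddings = await model.get_image_embeddings(image_inputs);
// Decode a mask for a new point prompt, reusing the cached embeddings
async function decode(points) {
  const prompt_inputs = await processor(raw_image, points);
  // Since image_embeddings are provided, only the mask decoder runs here
  const outputs = await model({ ...image_embeddings, input_points: prompt_inputs.input_points });
  return await processor.post_process_masks(
    outputs.pred_masks, prompt_inputs.original_sizes, prompt_inputs.reshaped_input_sizes,
  );
}
// e.g., segment at a different (arbitrary) point without re-encoding the image
const new_masks = await decode([[[200, 300]]]);
The image pre-processing still runs per call in this sketch, but the key saving is the same as in the demo: the heavy vision encoder is skipped for every prompt after the first.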
I can't wait to see what you build with it! 🤗