Maxi (maxiw)

AI & ML interests

Computer Agents | VLMs

Recent Activity

New activity in OS-Copilot/ScreenSpot-v2 (1 day ago)
Updated the collection GUI Models (1 day ago)
Liked the model OS-Copilot/OS-Atlas-Pro-7B (1 day ago)

maxiw's activity

posted an update 2 days ago
🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 for localizing GUI text and elements. It runs with ~150 ms inference time on an RTX 4080, so you can now start building fast on-device computer-use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1
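
For the curious, here is a minimal inference sketch. It assumes PTA-1 keeps the standard Florence-2 processor interface; the task prompt and post-processing call are assumptions rather than confirmed API, so check the model card for the exact usage:

```python
# Minimal sketch, assuming PTA-1 follows the standard Florence-2 interface.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "AskUI/PTA-1"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("screenshot.png")
# Florence-2 style: task token followed by the target description (assumed format).
prompt = "<OPEN_VOCABULARY_DETECTION>search bar"
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)

generated = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=128,
)
raw = processor.batch_decode(generated, skip_special_tokens=False)[0]
result = processor.post_process_generation(
    raw, task="<OPEN_VOCABULARY_DETECTION>", image_size=image.size
)
print(result)  # coordinates of the requested GUI element
```
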
reacted to rwightman's post with 🚀 2 days ago
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below. They aren't that different, but the Adopt run at 180 epochs lands closer to AdamW at 250 epochs than to AdamW at 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently across the 180 vs 250 epoch runs and the Adopt vs AdamW optimizers.
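
If you want to tinker, these load like any other timm model; a minimal sketch (the num_classes value is just an example):

```python
import timm

# Load the 250-epoch AdamW variant with a fresh head for your own dataset.
model = timm.create_model(
    "mobilenetv4_conv_medium.e250_r384_in12k",
    pretrained=True,
    num_classes=100,  # example: set to your dataset's class count
)

# Build the matching eval transform (these weights expect 384x384 inputs).
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)
```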

reacted to m-ric's post with 👀 2 days ago
Great feature alert: You can now use any Space as a tool for your transformers.agent! 🛠️🔥🔥

This lets you take the coolest spaces, like FLUX.1-dev, and use them in agentic workflows with a few lines of code! 🧑‍💻

In the video below, I generate fake vacation pictures where I'm awesome at surfing (I'm really not) 🏄

Head to the doc to learn this magic 👉 https://huggingface.co/docs/transformers/main/en/agents_advanced#import-a-space-as-a-tool-
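
The gist, as a short sketch based on that doc page (the Space ID, tool name, and description below are illustrative):

```python
from transformers import ReactCodeAgent, Tool

# Wrap any Space as a tool the agent can call.
image_generation_tool = Tool.from_space(
    "black-forest-labs/FLUX.1-dev",
    name="image_generator",
    description="Generates an image from a text prompt",
)

# With no llm_engine given, the agent defaults to the HF Inference API.
agent = ReactCodeAgent(tools=[image_generation_tool])
agent.run("Generate a picture of me surfing a huge wave")
```
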
reacted to merve's post with 👀 5 days ago
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
reacted to m-ric's post with ❤️ 7 days ago
The next big social network is not 🦋, it's Hub Posts! [INSERT STONKS MEME WITH LASER EYES]

See below: I've gotten 105k impressions since I started regularly posting Hub Posts, coming close to my 275k on Twitter!

⚙️ Computed with the great dataset maxiw/hf-posts
⚙️ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL query!
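
Roughly, the trick looks like this (a sketch using DuckDB's dot syntax for struct fields over the dataset's parquet files; the exact column and field names here are assumptions):

```python
import duckdb

# Struct/dict attributes are accessed with plain dot notation in DuckDB SQL.
top_authors = duckdb.query("""
    SELECT author.name AS author,
           SUM(totalUniqueImpressions) AS impressions
    FROM 'hf://datasets/maxiw/hf-posts/**/*.parquet'
    GROUP BY author.name
    ORDER BY impressions DESC
    LIMIT 5
""").df()
print(top_authors)
```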

cc @merve who's far ahead of me
reacted to merve's post with 😎 7 days ago
Microsoft released LLM2CLIP: a CLIP model with a longer context window for complex text inputs 🤯
All models with Apache 2.0 license here microsoft/llm2clip-672323a266173cfa40b32d4c

TL;DR: they replaced CLIP's text encoder with various LLMs fine-tuned on captioning, which yields better top-k retrieval accuracy.
This will enable better image-text retrieval, better zero-shot image classification, better vision language models 🔥
Read the paper to learn more: LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation (2411.04997)
replied to their post 8 days ago

I didn't know about the hub-stats dataset. Really cool, thx for sharing! 🤗

posted an update 8 days ago
I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
❤️: 7048 times
🔥: 5921 times
👍: 4856 times
🚀: 2549 times
🤗: 2065 times
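
A sketch of how a tally like the reaction counts above can be reproduced with the datasets library (the "reactions" column layout is an assumption; inspect the dataset for the real schema):

```python
from collections import Counter
from datasets import load_dataset

posts = load_dataset("maxiw/hf-posts", split="train")

# Tally every reaction emoji across all posts.
counts = Counter()
for post in posts:
    for reaction in post["reactions"]:
        counts[reaction["reaction"]] += reaction["count"]

for emoji, n in counts.most_common(5):
    print(f"{emoji}: {n} times")
```
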
posted an update 13 days ago
Exciting to see open-source models thriving in the computer agent space! 🔥
I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents — check it out here: maxiw/OS-ATLAS

The demo takes a screenshot plus an instruction as input and predicts bounding boxes.
reacted to cbensimon's post with ❤️ 2 months ago
Hello everybody,

We've rolled out a major update to ZeroGPU! All the Spaces are now running on it.

Major improvements:

1. GPU cold starts are about twice as fast!
2. RAM usage reduced by two-thirds, allowing more effective resource usage, meaning more GPUs for the community!
3. ZeroGPU initializations (cold starts) can now be tracked and displayed (use progress=gr.Progress(track_tqdm=True); see the sketch after this list)
4. Improved compatibility and PyTorch integration, increasing the number of ZeroGPU-compatible Spaces without requiring any modifications!
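
Here is a minimal sketch of point 3 in a ZeroGPU Space (the @spaces.GPU decorator is the standard ZeroGPU pattern; the demo function is a placeholder):

```python
import gradio as gr
import spaces

@spaces.GPU
def generate(prompt, progress=gr.Progress(track_tqdm=True)):
    # The ZeroGPU initialization (cold start) is surfaced through this
    # progress bar, alongside any tqdm loops inside the function.
    return prompt  # placeholder for the actual GPU workload

gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```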

Feel free to reply in the post if you have any questions.

🤗 Best regards,
Charles
replied to m-ric's post 2 months ago
replied to their post 2 months ago

@fridayfairy this is not fine-tuned. It's the base model just prompted to return bounding boxes in a specific format. The Qwen2-VL models must have been pre-trained on detection data.

reacted to rwightman's post with 👍 3 months ago
The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX 4090, RTX 3090, and two different CPUs, along with some NCHW/NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast, and originally intended for unit testing with real weights. They have awful ImageNet top-1 accuracy; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware, and you can fine-tune them well on small data. Could be the model you're looking for?
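
They load like any other timm model, so trying one on your own small dataset is a couple of lines (num_classes is an example value):

```python
import timm

# Tiny ViT test model with a fresh 10-class head, ready for fine-tuning.
model = timm.create_model("test_vit.r160_in1k", pretrained=True, num_classes=10)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M params")
```
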
replied to their post 3 months ago
posted an update 3 months ago
The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a 1000 x 1000 pixel reference frame and scale those boxes to the original image size.

You can try it out with my space maxiw/Qwen2-VL-Detection
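
The core of the trick, as a sketch (the prompt wording and output parsing below are assumptions; see the Space for the exact implementation):

```python
import re
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Detect the dog in this image. Output its "
                             "bounding box as (x1,y1),(x2,y2)."},
]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]

# The model answers in a 0-1000 reference frame; scale back to pixel space.
w, h = image.size
for m in re.finditer(r"\((\d+),(\d+)\),\((\d+),(\d+)\)", answer):
    x1, y1, x2, y2 = (int(v) for v in m.groups())
    print(x1 * w / 1000, y1 * h / 1000, x2 * w / 1000, y2 * h / 1000)
```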

posted an update 3 months ago
Just added the newly released xGen-MM v1.5 foundational Large Multimodal Models (LMMs), developed by Salesforce AI Research, to my xGen-MM HF Space: maxiw/XGen-MM