Tyler Williams
PRO
unmodeled-tyler
34 followers · 48 following
https://unmodeledtyler.com
unmodeled_tyler
unmodeled-tyler
AI & ML interests
AI researcher/engineer. The human behind VANTA Research. Feel free to reach out on X or Telegram (@unmodeledtyler), or by email: tyler@vantaresearch.xyz
Recent Activity
Reacted to melvindave's post with 🚀 · about 16 hours ago
Currently having a blast learning the transformers library. I noticed that model cards usually include Transformers code as usage examples, so I tried to figure out how to load a model using just the transformers library, without Ollama, LM Studio, or llama.cpp. I learned how to install the required dependencies, like PyTorch and CUDA, and used Conda to manage the Python environment. Once I had the model loaded and sample inference working, I built an API to serve it. I know this is very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy to have gotten it working!

Model used: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090

Here's the result of my experimentation.
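As a rough sketch of that workflow (this is not melvindave's actual code; the class names, chat-template behavior, and FastAPI wrapper below are assumptions based on recent transformers releases), loading Qwen3-VL with plain transformers and serving it over a small API might look like:

```python
# Illustrative sketch only: assumes a recent transformers release that exposes
# Qwen3-VL via AutoModelForImageTextToText, plus torch with CUDA, accelerate
# (for device_map="auto"), and fastapi/uvicorn installed.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_ID = "Qwen/Qwen3-VL-8B-Instruct"

# Load the weights in bf16 and let accelerate place them on the GPU.
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    image_url: str | None = None  # optional image, since this is a VL model

@app.post("/generate")
def generate(req: GenerateRequest):
    content = [{"type": "text", "text": req.prompt}]
    if req.image_url:
        content.insert(0, {"type": "image", "url": req.image_url})
    messages = [{"role": "user", "content": content}]

    # Apply the chat template and tokenize (the processor also preprocesses the image).
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256)

    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    text = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
    return {"response": text}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```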
Updated a Space vanta-research/README · about 18 hours ago
Reacted to Teen-Different's post with 👀 · about 19 hours ago
Interesting... looked into Apple's DiffuCoder, and the masked diffusion approach is actually hitting SOTA parity, basically proving global MDLM can work for code: https://arxiv.org/pdf/2506.20639

But then you look at the Tiny-A2D results and it's the complete opposite: BD3LM (block diffusion) totally outperforms MDLM, and both the MDLM and BD3LM models struggle hard compared to the AR baselines: https://github.com/ZHZisZZ/dllm/tree/main/examples/a2d

Digging into the why, I think it comes down to the adaptation method. Tiny-A2D just SFT'd an AR model to force it into diffusion. Asking a model wired for left-to-right causal attention to suddenly think bidirectionally is a massive shock; it struggles to unlearn that strong AR inductive bias. That explains why BD3LM worked better in their case: since it generates in chunks, it preserves some sequential order and acts like a bridge, a crutch that feels more natural to the original Qwen weights.

Contrast that with Apple: they didn't just SFT, they pre-trained/adapted on 130B tokens, fundamentally rewiring the model to understand global dependencies from the ground up.

My theory is that if we want MDLM to actually work, we can't just SFT. We need that heavy adaptation or full pre-training phase to break the causal priors; otherwise the model just gets confused.
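A toy way to visualize the contrast the post is drawing (purely illustrative, not the Tiny-A2D or BD3LM implementation; block size and the exact BD3LM masking scheme differ in practice): compare the attention patterns an AR model, a pure MDLM, and a block-diffusion model effectively use.

```python
import torch

def causal_mask(T: int) -> torch.Tensor:
    # AR baseline: position i attends only to positions <= i (left-to-right).
    return torch.tril(torch.ones(T, T, dtype=torch.bool))

def bidirectional_mask(T: int) -> torch.Tensor:
    # MDLM-style: every position attends to every other position (fully global).
    return torch.ones(T, T, dtype=torch.bool)

def block_diffusion_mask(T: int, block: int) -> torch.Tensor:
    # BD3LM-style middle ground: bidirectional attention inside each block,
    # causal across blocks, so generation still moves left-to-right chunk by chunk.
    block_ids = torch.arange(T) // block
    # mask[i, j] is True when token j's block is at or before token i's block.
    return block_ids.unsqueeze(0) <= block_ids.unsqueeze(1)

T, B = 8, 4
print(causal_mask(T).int())          # strictly sequential
print(bidirectional_mask(T).int())   # fully global, the big jump from AR weights
print(block_diffusion_mask(T, B).int())  # the "bridge" pattern
```

The point of the sketch: the block-diffusion mask only asks the adapted AR model to go bidirectional within a chunk while keeping order between chunks, which is a much smaller departure from the causal prior than the fully bidirectional MDLM mask.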
Organizations
unmodeled-tyler's Spaces (1)

Model Rank ⚡ (pinned · Running)
Browse the most popular models by category (base, quant, FT)