Ann Huang
PRO
erinys
27 followers · 26 following
https://huggingface.co/erinys
annintweetd
erinys
annux
AI & ML interests
None yet
Recent Activity
Reacted to elliesleightholm's post with 🤗 2 days ago
I made a beginner's guide to Hugging Face Spaces 🤗 I hope it's useful to some of you :)
YouTube video: https://www.youtube.com/watch?v=xqdTFyRdtjQ
Blog: https://www.marqo.ai/blog/how-to-create-a-hugging-face-space
Reacted to jsulz's post with 🔥 2 days ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in. Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:
- Only upload the chunks that changed.
- Download just the updates, not the whole file.
- We store your file as deduplicated chunks.
In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn't just a performance boost. It's a rethinking of how we manage models and datasets on the Hub. We're planning to bring our new storage backend to the Hub in early 2025. Check out our blog to dive deeper, and let us know: how could this improve your workflows? https://huggingface.co/blog/from-files-to-chunks
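To make the chunk-based idea in the post above concrete, here is a minimal Python sketch of content-defined chunking with a deduplicating store. It is an illustration under stated assumptions, not the Hub's or Xet's actual implementation: the gear-style rolling hash, the `GEAR` table, the function names (`cdc_chunks`, `upload`, `download`), and the chunk-size parameters are all hypothetical choices made for the example.

```python
import hashlib
import random

# Hypothetical table of pseudo-random 32-bit values, one per possible byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def cdc_chunks(data, bits=6, min_size=32, max_size=256):
    """Split data where a rolling hash of the most recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the 32-bit state.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size < min_size:
            continue
        # Cut when the top `bits` bits are zero, or when the chunk gets too long.
        if (h >> (32 - bits)) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def upload(data, store):
    """Store only unseen chunks; return the file's chunk recipe and bytes sent."""
    recipe, sent = [], 0
    for chunk in cdc_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = chunk
            sent += len(chunk)
        recipe.append(key)
    return recipe, sent


def download(recipe, store):
    """Rebuild a file from its ordered list of chunk hashes."""
    return b"".join(store[key] for key in recipe)


# Two versions of a file: v2 is v1 with a small insertion in the middle.
rng = random.Random(0)
v1 = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
v2 = v1[:32768] + b"a small edit in the middle" + v1[32768:]

store = {}
recipe_v1, sent_v1 = upload(v1, store)
recipe_v2, sent_v2 = upload(v2, store)
print(f"v1: sent {sent_v1} of {len(v1)} bytes")
print(f"v2: sent {sent_v2} of {len(v2)} bytes")
```

Because boundaries depend on the content rather than on fixed offsets, every chunk that ends before the insertion point is byte-identical in both versions and is not sent again. Chunks after the edit usually realign as well, which is the property that fixed-size blocks lose as soon as bytes are inserted.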
Reacted to reach-vb's post 4 days ago
What a brilliant week for Open Source AI!
- Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B / 32B (Base + Instruct) code generation LLMs, with 32B tackling giants like Gemini 1.5 Pro and Claude Sonnet https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
- LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17% https://huggingface.co/collections/microsoft/llm2clip-672323a266173cfa40b32d4c
- Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B, excels at Chat + Function Calling / JSON / Agents https://huggingface.co/collections/Nexusflow/athene-v2-6735b85e505981a794fb02cc
- Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc. - permissively licensed https://huggingface.co/datasets/microsoft/orca-agentinstruct-1M-v1
- Ultravox by FixieAI - 70B / 8B model approaching GPT4o level; pick any LLM, train an adapter with Whisper as Audio Encoder https://huggingface.co/collections/reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71
- JanusFlow 1.3 by DeepSeek - Next iteration of their Unified MultiModal LLM Janus with RectifiedFlow https://huggingface.co/deepseek-ai/JanusFlow-1.3B
- Common Corpus by PleIAs - 2,003,039,184,047 multilingual, commercially permissive and high quality tokens! https://huggingface.co/datasets/PleIAs/common_corpus
I'm sure I missed a lot, can't wait for the next week! Put down in comments what I missed! 🤗
Articles
From Files to Chunks: Improving Hugging Face Storage Efficiency
4 days ago • 32
Share your open ML datasets on Hugging Face Hub!
12 days ago • 20
Organizations
erinys's activity
New activity in xet-team/lfs-analysis about 1 month ago
Compressed -> Deduped column header (2)
#4 opened about 1 month ago by erinys
New activity in xet-team/lfs-analysis about 2 months ago
Suggested text changes (1)
#1 opened about 2 months ago by erinys
New activity in huggingface/data-measurements-tool 2 months ago
AttributeError: module 'matplotlib.cm' has no attribute 'register_cmap' (1)
#7 opened 2 months ago by erinys