
Sorokin Evgeny

DeathGodlike

AI & ML interests

None yet

Recent Activity

reacted to tomaarsen's post with 🔥 3 days ago
reacted to tomaarsen's post with ❤️ 3 days ago

Organizations

None yet

DeathGodlike's activity

reacted to tomaarsen's post with 🔥❤️ 3 days ago
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts of any length (note: longer texts may embed worse)
📏 Linear instead of quadratic complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1
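
A minimal usage sketch, assuming a recent sentence-transformers release where `truncate_dim` controls the Matryoshka truncation mentioned above; the query and document strings are placeholders:

```python
# Encode with the static English retrieval model; truncate_dim optionally
# keeps only the first 256 embedding dimensions (Matryoshka truncation).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1",
    truncate_dim=256,
)

queries = ["What is a static embedding model?"]
documents = [
    "Static embedding models replace the Transformer with a simple lookup table.",
    "Matryoshka embeddings can be truncated with little quality loss.",
]

query_emb = model.encode(queries)
doc_emb = model.encode(documents)

# Cosine similarities between the query and each document.
print(model.similarity(query_emb, doc_emb))
```
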
reacted to prithivMLmods's post with 👍🤗 about 2 months ago
Milestone for Flux.1 Dev 🔥

💢The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here's the 10,000th public adapter: 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page:
+ https://huggingface.co/strangerzonehf

💢 Collection:
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
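
A hedged sketch of loading one of these adapters on top of FLUX.1-dev with diffusers (assumes a diffusers version with Flux LoRA support and enough GPU memory; the prompt, step count, and guidance value are illustrative):

```python
# Load the base FLUX.1-dev pipeline and apply the 10,000th public adapter.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights("strangerzonehf/Flux-3DXL-Partfile-0006")

image = pipe(
    "a 3DXL-style rendered part, studio lighting",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_lora_example.png")
```
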
reacted to openfree's post with 👍 3 months ago
MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:

Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content.
User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.

Main applications of MixGen3:

Content creation
Design and illustration
Marketing and advertising
Education and learning

Value of MixGen3:

Enhancing creativity
Time-saving
Collaboration possibilities
Continuous development

Expected effects:

Increased content diversity
Lowered entry barrier for creation
Improved creativity
Enhanced productivity

MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at
https://openfree-mixgen3.hf.space

contacts: arxivgpt@gmail.com
reacted to singhsidhukuldeep's post with 👀 3 months ago
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.

The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
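
Based on that description, here is a rough PyTorch sketch of a single differential-attention head. It is illustrative only: the class and variable names are made up, and the paper's λ re-parameterization, multi-head layout, and FlashAttention kernels are omitted.

```python
# Rough sketch: two softmax attention maps, subtracted, scaled by a
# learnable lambda, then a per-head normalization (paper details omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Two sets of query/key projections, one shared value projection.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        self.lam = nn.Parameter(torch.tensor(0.8))  # learnable scalar lambda
        self.norm = nn.GroupNorm(1, d_head)         # per-head normalization
        self.scale = d_head ** -0.5

    def forward(self, x):  # x: (batch, seq, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)

        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * self.scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * self.scale, dim=-1)
        attn = a1 - self.lam * a2  # differential attention map

        out = attn @ v  # (batch, seq, d_head)
        # GroupNorm expects (batch, channels, length), so transpose around it.
        return self.norm(out.transpose(1, 2)).transpose(1, 2)
```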

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.

This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
reacted to Felladrin's post with 👍 3 months ago
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to nyuuzyou's post with ❤️👀 3 months ago
🎓 Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web

Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use

The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
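
A quick way to peek at the data with the 🤗 datasets library; this is a minimal sketch, and the split name and exact column names are assumptions based on the description above:

```python
# Load the dataset and inspect one record (column names such as "title"
# and "url" are assumed from the description and may differ).
from datasets import load_dataset

ds = load_dataset("nyuuzyou/doc4web", split="train")
print(ds)      # row count and column names
print(ds[0])   # first record: URL, title, download link, content, etc.
```
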
reacted to m-ric's post with 🔥 4 months ago
πŸ‡¨πŸ‡³β›΅οΈ ε‡Ίζ΅·: Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; take, for instance, the recent and excellent DeepSeek-V2.5 or the Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI and Anthropic models are not available in China, local companies are fighting for the market. A really good market - AI has much higher penetration there than in the rest of the world, both with companies and individual users!

💰 But since DeepSeek heavily cut prices in May 2024, this has spiraled into a price war that created a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must be licensed by a local censor (the Cyberspace Administration of China), which for instance requires models to refuse to answer certain questions about the CCP. Still, this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, vs. $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundation models to model + application, for instance PopAI from [01.AI](http://01.ai/), which has millions of users and high profitability.

⛏️ They also try to drill down into specific industries, but these niches are getting crowded too.

➡️ Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for the international market, "sailing abroad" (出海), the expression consecrated by Zheng He's legendary voyages in the early 1400s.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
reacted to TuringsSolutions's post with 😎👍 4 months ago
I solved the biggest math problem associated with the attention mechanism. It works better than I ever expected. Test it all yourself. Everything you need is linked from this video: https://youtu.be/41dF0yoz0qo

Sorry the audio quality sucks, I will buy a new microphone today. Why does some moron like me solve these things and not you? I know more about how computers work than you do, that's it. Swarm algorithms were big in the 90's and early 2000's. Computers were absolute dog doo doo then in one specific way, compared to now. That one way, which everyone overlooks, is the entire secret behind why swarm algorithms are so good.
reacted to bartowski's post with 👍 5 months ago
As some of you know, I try to convert models to either fp32 or bf16, depending on their size, before doing imatrix and quantization.

Today I decided to see if that matters, and the results have me... for lack of a better word, perplexed.

My setup:

Mistral Nemo Instruct 2407
- convert to FP32, calculate imatrix, quantize to Q8_0 and Q4_K_M
- convert to FP16, calculate imatrix, quantize to Q8_0 and Q4_K_M

I calculated the kld base from the FP32 model:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-f32.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld -ngl 35 -fa -sm row

then calculated the divergence itself for each like so:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-Q8_0.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld --kl-divergence -ngl 50 -fa -sm row

Q4_K_M from fp16 and fp32 were similar, trading blows across statistics; odd, since I expected fp32 to be strictly better, but it's not.

Q8_0 is where things get weird. Despite each file being slightly different size, and the sha256sum of course being different, they each get *completely identical* scores, down to 6 decimal places of precision on the statistics.

How is this possible? Is there something I don't understand about llama.cpp that makes it always convert to fp16 before it does quantization? Am I wasting time using FP32/BF16??
reacted to nyuuzyou's post with 🔥 6 months ago
Just released the GitVerse Code Dataset - nyuuzyou/gitverse-code.

📊 Dataset highlights:
- 30 GB of unique code extracted from over 400 GB of analyzed data
- 9,014 repositories
- 2,804,216 unique code files
- 419 different file types
- Multilingual: various programming languages

🌐 Sourced from GitVerse, a Russian GitHub alternative launched in 2024.

Let me know your thoughts.
reacted to mitkox's post with 🔥❤️ 7 months ago
Me: I want on device AI: fast, without latency, with real privacy, convenient for use and development.

Microsoft: The best I can do is Copilot+. You need a special Qualcomm chip and Windows 11 24H2. Today I can give you only Recall, taking screenshots and running a visual model to write context about what you are doing in the unencrypted Semantic Index database for embeddings. I'm giving you SLMs Phi Silica, accessible only via API and SDK. In the autumn I can give you the developer tools for C#/C++ and you can use them.

Apple: The best I can do is Apple Intelligence. You need a special Apple chip and macOS 15. Today I can give you only marketing. In the autumn I can give you mysterious on-device 3B SLMs quantized to 3.5-bit, plus diffusion models with LoRA adapters. We will have an encrypted Semantic Index database for embeddings and agentic flows with function calling. We will call all of them by different names. In the autumn I will give you the developer tools in Swift and you can use them.

Open Source: The best I can do is llama.cpp. You can run it on any chip and OS. Today you can run AI inference on device and add other open-source components to your solution. I can give you local AI models, SLMs/LLMs, from Qwen2-0.5B to Llama3-70B. You can have an encrypted local embeddings database with PostgreSQL/pgvector or SQLite-Vec. I can give you a wide choice of integrations and open-source components for your solution, from UIs to agentic workflows with function calling. Today I can give you the developer tools in Python/C/C++/Rust/Go/Node.js/JS/C#/Scala/Java and you can use them.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
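
For the open-source route, a minimal on-device sketch using the llama-cpp-python bindings for llama.cpp; the GGUF path is a placeholder for whichever local SLM/LLM quant you download (e.g. a Qwen2-0.5B or Llama 3 build):

```python
# Minimal on-device chat completion with llama-cpp-python; the model path
# and context size are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./qwen2-0.5b-instruct-q4_k_m.gguf", n_ctx=4096)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why does on-device AI matter?"}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```
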
reacted to maywell's post with 🚀 9 months ago
🔥 Transfer a model's chat capability, context length, and knowledge to another model in under a minute, without any training.

Imagine being able to create chat models, expand context, and transfer domain-specific knowledge to models, all within a matter of minutes. Our innovative approach, based on a combination of diff-based techniques and sigmoid ratio calculations, makes this possible.

By considering the diffs between the desired information model (long context or chat) and the base model, as well as the diffs between the base model and the target model, we can efficiently transfer features and expand context without the need for extensive training or resources.

Our method minimizes model degradation and ensures that only the desired information is captured, resulting in high-quality models that can be created with just a single click. Whether you need a chat model, expanded context, or domain-specific knowledge transfer, our approach offers a rapid and effective solution.

In the blog post below, we dive into the details of our method, provide code examples, and showcase the impressive results achieved using our approach. Get ready to revolutionize your model creation process and unlock new possibilities with this powerful technique.

Blog - https://huggingface.co/blog/maywell/llm-feature-transfer
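
The blog post has the actual recipe; the sketch below only illustrates the general diff-based idea (add the difference between a feature-bearing model and its base onto a target model) and does not reproduce the sigmoid-ratio weighting. The function name and the constant ratio are placeholders.

```python
# Illustrative only: target + ratio * (feature_model - base_model), applied
# per parameter tensor. The real method weights this differently.
import torch

def transfer_feature(target_sd, base_sd, feature_sd, ratio=1.0):
    """Return a new state dict with the feature diff added to the target."""
    out = {}
    for name, target_w in target_sd.items():
        if name in base_sd and name in feature_sd:
            out[name] = target_w + ratio * (feature_sd[name] - base_sd[name])
        else:
            out[name] = target_w.clone()
    return out

# Usage sketch (models loaded however you prefer, e.g. via transformers):
# new_sd = transfer_feature(target.state_dict(), base.state_dict(), chat.state_dict())
# target.load_state_dict(new_sd)
```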