非常感谢Yao的测试,FT这个技术我也是在研究中,从理论来说,针对输入的制作词会有更好的理解,不用去追询合理的语法顺序,从结果来看会出现更好的情况,但是很多情况看不出来,这就是理论和现实的差距,FT是个能研究的方向,还有个关键,你可以试试FT技术做的4B qwen模型当CLIP,关键是看输入的混乱情况的结果,FT技术主要解决的问题之一是从混乱输入给出合理的结果,制作词可以写的很乱,不用追询语法结构,前后搭配.
https://huggingface.co/aifeifei798/FeiFei_ComfyUI/resolve/main/text_encoders/QiMing-Polaris-Qwen3-4B-Instruct-2507_burden_trained-abliterated-Q8_0.gguf
aifeifei
aifeifei798
AI & ML interests
roleplay
Recent Activity
updated
a model
about 7 hours ago
aifeifei798/FeiFei_ComfyUI
replied to
their
post
about 7 hours ago
# 🚀 unsloth-gui: 像点外卖一样,训练你自己的AI模型
厌倦了复杂的代码和看不懂的参数?`unsloth-gui` 让你用最简单的方式,体验 LoRA 微调的全部乐趣。
**核心理念:三步搞定!**
## ✨ 凭什么说这是最简单的?
因为我们把所有复杂的东西都藏起来了。你只需要:
1. **👉 选模型**: 从列表里选一个你喜欢的基础模型,比如 Llama-3, Mistral...
2. **👉 选数据**: 选一份你想让它学习的“剧本”,比如我们内置的“孙悟空”风格数据。
3. **✅ 点开始训练**: 就这样。真的。
**然后?**
喝杯咖啡,回来后你就能在“**测试**”选项卡里,和你亲手训练好的、独一无二的AI模型聊天了。
---
## 🔧 两分钟上手指南
1. **下载 & 安装**
```bash
# 克隆仓库
git clone https://github.com/aifeifei798/unsloth-gui.git
cd unsloth-gui
# 安装依赖
pip install -r requirements.txt
```
2. **配置 (可选,可跳过)**
* 想训练自己的模型?编辑 `models.json`。
* 想用自己的数据?在 `datasets_config/` 目录下加一个 `.json` 文件。
* *...或者,直接用我们内置的孙悟空示例,跳过这一步!*
3. **启动!**
```bash
python app.py
```
在浏览器里打开 `http://127.0.0.1:7860`,开始你的“炼丹”之旅!
---
## 🌟 核心功能一览
* **极简训练界面**: 专为“一键启动”设计。
* **实时训练监控**: 内置 TensorBoard,看着你的模型一点点变聪明。
* **即时推理测试**: 训练完立刻就能聊,所见即所得。
* **断点续训**: 训练中断?不怕,接着来。
* **8GB 显卡优化**: 在你的游戏电脑上就能跑!
* **高度可定制**: JSON配置,想加什么模型、什么数据,你说了算。
---
觉得酷?给个 Star ⭐ 吧!
replied to
their
post
about 21 hours ago
# 🚀 unsloth-gui: 像点外卖一样,训练你自己的AI模型
厌倦了复杂的代码和看不懂的参数?`unsloth-gui` 让你用最简单的方式,体验 LoRA 微调的全部乐趣。
**核心理念:三步搞定!**
## ✨ 凭什么说这是最简单的?
因为我们把所有复杂的东西都藏起来了。你只需要:
1. **👉 选模型**: 从列表里选一个你喜欢的基础模型,比如 Llama-3, Mistral...
2. **👉 选数据**: 选一份你想让它学习的“剧本”,比如我们内置的“孙悟空”风格数据。
3. **✅ 点开始训练**: 就这样。真的。
**然后?**
喝杯咖啡,回来后你就能在“**测试**”选项卡里,和你亲手训练好的、独一无二的AI模型聊天了。
---
## 🔧 两分钟上手指南
1. **下载 & 安装**
```bash
# 克隆仓库
git clone https://github.com/aifeifei798/unsloth-gui.git
cd unsloth-gui
# 安装依赖
pip install -r requirements.txt
```
2. **配置 (可选,可跳过)**
* 想训练自己的模型?编辑 `models.json`。
* 想用自己的数据?在 `datasets_config/` 目录下加一个 `.json` 文件。
* *...或者,直接用我们内置的孙悟空示例,跳过这一步!*
3. **启动!**
```bash
python app.py
```
在浏览器里打开 `http://127.0.0.1:7860`,开始你的“炼丹”之旅!
---
## 🌟 核心功能一览
* **极简训练界面**: 专为“一键启动”设计。
* **实时训练监控**: 内置 TensorBoard,看着你的模型一点点变聪明。
* **即时推理测试**: 训练完立刻就能聊,所见即所得。
* **断点续训**: 训练中断?不怕,接着来。
* **8GB 显卡优化**: 在你的游戏电脑上就能跑!
* **高度可定制**: JSON配置,想加什么模型、什么数据,你说了算。
---
觉得酷?给个 Star ⭐ 吧!
Organizations
replied to
their
post
about 7 hours ago
replied to
their
post
about 21 hours ago
replied to
their
post
1 day ago
以后碰到lora是32位的,可以用使我这个转换成16位的小工具,我这样大意的作者,不在少数😔
https://huggingface.co/aifeifei798/Z-Image-Turbo-Booster-v1/resolve/main/fix_fp16_lora.py
replied to
their
post
1 day ago
我已经把lora转换到了fp16,需要您帮我测试下在fp8使用是否有效,我现在显卡是占满的,没办法测试,非常感谢.
https://huggingface.co/aifeifei798/Z-Image-Turbo-Booster-v1/resolve/main/Turbo_Booster_v1-fp16.safetensors
replied to
their
post
1 day ago
问题诊断 (The Diagnosis)
# --- train_zimage_lora.py 的最后几行 ---
# 保存权重
if accelerator.is_main_process:
transformer = accelerator.unwrap_model(transformer)
# 💥💥💥 问题就在这里!💥💥💥
transformer = transformer.to(torch.float32)
transformer_lora_state_dict = convert_state_dict_to_diffusers(
get_peft_model_state_dict(transformer)
)
# ... 后续代码 ...
save_file(new_state_dict, save_path)
罪魁祸首就是这行代码:transformer = transformer.to(torch.float32)
发生了什么?
- 训练时的精度: 在整个训练过程中,你的模型是在
fp16或者bf16(由mixed_precision参数决定)的混合精度下进行的。这是一个为了速度和显存优化的标准操作。 - 保存时的“好心办坏事”: 在保存 LoRA 权重之前,为了确保最高的精度和兼容性(这是一个老的、安全但过时的习惯),你的脚本强制将整个 Transformer 模型转换回了
fp32(32位单精度) 格式。 - 结果:
get_peft_model_state_dict从这个fp32模型中提取出来的 LoRA 权重(lora_A和lora_B矩阵),自然也就是fp32格式的。最终,你保存到.safetensors文件里的是一个fp32精度的 LoRA。
为什么 fp32 的 LoRA 在 fp8 上会失效?
这就像是试图把一张未经压缩的、巨大的 RAW 格式照片,直接用一个只为手机 HEIC 格式设计的简单工具去强行压缩。
- FP8 是“极限压缩”:
fp8(8位浮点数) 是一种极其激进的量化格式,它对权重的数据范围和分布非常敏感。它被设计用来处理那些已经是fp16或bf16的、“正常范围”的权重。 - FP32 是“高动态范围”:
fp32的数值范围比fp16大得多。一个在fp32下看起来很正常的权重,在fp16的世界里可能已经是一个需要特殊处理的“极大值”或“极小值”了。 - 转换失败: 当一个为
fp16 -> fp8设计的推理引擎,突然拿到一个fp32的 LoRA 权重时,它在量化过程中很容易出现“溢出” (Overflow) 或 **“下溢” (Underflow)**,导致权重信息大量丢失。结果就是模型输出的图像是黑的、花的,或者完全是噪音。
手术方案:在保存时维持训练精度 (The Surgical Fix)
解决方案很简单:我们只需要在保存时,将 LoRA 权重转换回训练时使用的 fp16 或 bf16 精度,而不是粗暴地转成 fp32。
请将你的脚本最后那个“保存权重”的部分,替换成下面这个“精准转换”的版本:
# === 【V2 - 精准精度保存方案】 ===
# 保存权重
if accelerator.is_main_process:
# 1. 正常解包模型
transformer = accelerator.unwrap_model(transformer)
# 2. 获取 LoRA 权重 state_dict。此时它可能还是 fp32 的,没关系。
transformer_lora_state_dict = get_peft_model_state_dict(transformer)
# 3. 准备一个新的 state_dict,用于存放转换后权重的
final_state_dict = {}
# 4. 【核心】遍历所有 LoRA 权重,并将它们手动转换回训练时使用的精度 (weight_dtype)
logger.info(f"Converting LoRA weights to {weight_dtype} before saving...")
for k, v in transformer_lora_state_dict.items():
# 将每个张量 v 转换成目标精度
final_state_dict[k] = v.to(dtype=weight_dtype)
# 5. 加上 'transformer.' 前缀以兼容 Diffusers 的加载习惯
diffusers_state_dict = convert_state_dict_to_diffusers(final_state_dict)
new_state_dict_for_saving = {}
for k, v in diffusers_state_dict.items():
new_state_dict_for_saving[f"transformer.{k}"] = v
# 6. 保存这个转换好精度的 state_dict
save_path = os.path.join(args.output_dir, "pytorch_lora_weights.safetensors")
save_file(new_state_dict_for_saving, save_path)
logger.info(f"Saved LoRA weights in {weight_dtype} to {save_path}")
accelerator.end_training()
总结:
| 旧方法 (FP32 保存) | 新方法 (精准保存) | |
|---|---|---|
| 保存前转换 | model.to(torch.float32) |
遍历 state_dict, tensor.to(weight_dtype) |
| 保存的精度 | 强制 fp32 |
与训练精度一致 (fp16/bf16) |
| FP8 兼容性 | 差,易出错 | 好,完美兼容 |
| 文件大小 | 较大 | 较小 (一半) |
把这个修改应用到你的 train_zimage_lora.py 脚本里,重新训练出来的 LoRA,去测试,这次在 fp8 环境下绝对能正常工作了!
这是一个非常高质量的反馈。👍
新的lora需要等我这个25小时的模型训练做完,请等待
replied to
their
post
1 day ago
FP8我还真没测试,理论来讲是可以的,但是我需要算损耗,我在做一个25小时的模型训练,等这个完成后,我在FP8上面再测试和计算下
replied to
their
post
1 day ago
reacted to
Locutusque's
post with 👍
5 months ago
Post
7112
🌲🍄 LLM Forest Orchestra: Turning Hidden States into Music
Hello everyone! I'm excited to introduce a new Space I've been developing called LLM Forest Orchestra. This project converts the hidden states and attention patterns of transformer models into layered MIDI compositions. The concept draws inspiration from mushrooms and mycelial networks in forests. Fungi create underground connections linking plants and trees, establishing what some call a "wood-wide web" where signals and nutrients travel. Researchers have discovered that these exchanges form patterns resembling rhythms and pulses. When translated appropriately, these patterns can become music.
Transformers operate through remarkably similar principles: tokens share signals via hidden states and attention heads. This Space transforms those invisible information flows into notes, chords, and rhythms, treating the model as a digital forest orchestra.
🎛 Features
* Two compute modes:
- Full model operates on a Hugging Face model (defaulting to unsloth/Qwen3-14B-Base).
- Mock latents provides a CPU-friendly option that simulates tensors for immediate experimentation.
* Musical controls: You can adjust scale selection, tempo grid, velocity range, instrument/role presets, and seed randomization.
* Output: The system generates .mid files compatible with DAWs and remixing workflows.
🌌 Why?
Neural networks already resemble unusual musical instruments: signals flow through them, patterns emerge organically, and careful observation reveals hidden melodies. This is analogous to the forest's secret orchestra of mushrooms and trees.
👉 Try it
Try the Space here: Locutusque/LLM-Forest-Orchestra. I'm excited to hear the sounds you can generate. Please share your created MIDIs or remixes in the comments. Let's explore how this hidden forest of transformers can sound together. 🌳🎶
Hello everyone! I'm excited to introduce a new Space I've been developing called LLM Forest Orchestra. This project converts the hidden states and attention patterns of transformer models into layered MIDI compositions. The concept draws inspiration from mushrooms and mycelial networks in forests. Fungi create underground connections linking plants and trees, establishing what some call a "wood-wide web" where signals and nutrients travel. Researchers have discovered that these exchanges form patterns resembling rhythms and pulses. When translated appropriately, these patterns can become music.
Transformers operate through remarkably similar principles: tokens share signals via hidden states and attention heads. This Space transforms those invisible information flows into notes, chords, and rhythms, treating the model as a digital forest orchestra.
🎛 Features
* Two compute modes:
- Full model operates on a Hugging Face model (defaulting to unsloth/Qwen3-14B-Base).
- Mock latents provides a CPU-friendly option that simulates tensors for immediate experimentation.
* Musical controls: You can adjust scale selection, tempo grid, velocity range, instrument/role presets, and seed randomization.
* Output: The system generates .mid files compatible with DAWs and remixing workflows.
🌌 Why?
Neural networks already resemble unusual musical instruments: signals flow through them, patterns emerge organically, and careful observation reveals hidden melodies. This is analogous to the forest's secret orchestra of mushrooms and trees.
👉 Try it
Try the Space here: Locutusque/LLM-Forest-Orchestra. I'm excited to hear the sounds you can generate. Please share your created MIDIs or remixes in the comments. Let's explore how this hidden forest of transformers can sound together. 🌳🎶
posted
an
update
6 months ago
Post
1805
# 🚀 unsloth-gui: 像点外卖一样,训练你自己的AI模型
厌倦了复杂的代码和看不懂的参数?
**核心理念:三步搞定!**
## ✨ 凭什么说这是最简单的?
因为我们把所有复杂的东西都藏起来了。你只需要:
1. **👉 选模型**: 从列表里选一个你喜欢的基础模型,比如 Llama-3, Mistral...
2. **👉 选数据**: 选一份你想让它学习的“剧本”,比如我们内置的“孙悟空”风格数据。
3. **✅ 点开始训练**: 就这样。真的。
**然后?**
喝杯咖啡,回来后你就能在“**测试**”选项卡里,和你亲手训练好的、独一无二的AI模型聊天了。
---
## 🔧 两分钟上手指南
1. **下载 & 安装**
2. **配置 (可选,可跳过)**
* 想训练自己的模型?编辑
* 想用自己的数据?在
* *...或者,直接用我们内置的孙悟空示例,跳过这一步!*
3. **启动!**
在浏览器里打开
---
## 🌟 核心功能一览
* **极简训练界面**: 专为“一键启动”设计。
* **实时训练监控**: 内置 TensorBoard,看着你的模型一点点变聪明。
* **即时推理测试**: 训练完立刻就能聊,所见即所得。
* **断点续训**: 训练中断?不怕,接着来。
* **8GB 显卡优化**: 在你的游戏电脑上就能跑!
* **高度可定制**: JSON配置,想加什么模型、什么数据,你说了算。
---
觉得酷?给个 Star ⭐ 吧!
厌倦了复杂的代码和看不懂的参数?
unsloth-gui 让你用最简单的方式,体验 LoRA 微调的全部乐趣。**核心理念:三步搞定!**
## ✨ 凭什么说这是最简单的?
因为我们把所有复杂的东西都藏起来了。你只需要:
1. **👉 选模型**: 从列表里选一个你喜欢的基础模型,比如 Llama-3, Mistral...
2. **👉 选数据**: 选一份你想让它学习的“剧本”,比如我们内置的“孙悟空”风格数据。
3. **✅ 点开始训练**: 就这样。真的。
**然后?**
喝杯咖啡,回来后你就能在“**测试**”选项卡里,和你亲手训练好的、独一无二的AI模型聊天了。
---
## 🔧 两分钟上手指南
1. **下载 & 安装**
bash
# 克隆仓库
git clone https://github.com/aifeifei798/unsloth-gui.git
cd unsloth-gui
# 安装依赖
pip install -r requirements.txt
2. **配置 (可选,可跳过)**
* 想训练自己的模型?编辑
models.json。* 想用自己的数据?在
datasets_config/ 目录下加一个 .json 文件。* *...或者,直接用我们内置的孙悟空示例,跳过这一步!*
3. **启动!**
bash
python app.py
在浏览器里打开
http://127.0.0.1:7860,开始你的“炼丹”之旅!---
## 🌟 核心功能一览
* **极简训练界面**: 专为“一键启动”设计。
* **实时训练监控**: 内置 TensorBoard,看着你的模型一点点变聪明。
* **即时推理测试**: 训练完立刻就能聊,所见即所得。
* **断点续训**: 训练中断?不怕,接着来。
* **8GB 显卡优化**: 在你的游戏电脑上就能跑!
* **高度可定制**: JSON配置,想加什么模型、什么数据,你说了算。
---
觉得酷?给个 Star ⭐ 吧!
reacted to
mlabonne's
post with 👍
9 months ago
Post
18521
✂️ AutoAbliteration
I made a Colab notebook to automatically abliterate models.
It's quite general, so you can do interesting stuff like blocking a given language in the model outputs.
💻 Colab: https://colab.research.google.com/drive/1RmLv-pCMBBsQGXQIM8yF-OdCNyoylUR1?usp=sharing
I made a Colab notebook to automatically abliterate models.
It's quite general, so you can do interesting stuff like blocking a given language in the model outputs.
💻 Colab: https://colab.research.google.com/drive/1RmLv-pCMBBsQGXQIM8yF-OdCNyoylUR1?usp=sharing
posted
an
update
9 months ago
Post
1502
how to load a dataset using the datasets library and save it to an SQLite database. It also includes a function to query the database and print the first five rows.
from datasets import load_dataset
import sqlite3
# Load the dataset
dataset = load_dataset('aifeifei798/song_lyrics_min', split='train')
# Define a function to save the dataset to an SQLite database
def save_dataset_to_sqlite(dataset, db_path='temp_dataset.db'):
# Connect to the SQLite database (creates a new database if it doesn't exist)
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Create a table to store the dataset
cursor.execute('''CREATE TABLE IF NOT EXISTS songs
(id INTEGER PRIMARY KEY, title TEXT, tag TEXT, lyrics TEXT)''')
# Insert each row of the dataset into the database table
for i, row in enumerate(dataset):
cursor.execute("INSERT INTO songs (id, title, tag, lyrics) VALUES (?, ?, ?, ?)",
(i, row['title'], row['tag'], row['lyrics']))
# Commit the transaction and close the connection
conn.commit()
conn.close()
# Save the dataset to the SQLite database
save_dataset_to_sqlite(dataset)
# Define a function to query the database
def query_database(db_path='temp_dataset.db'):
# Connect to the SQLite database
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Query the first five rows of the database
cursor.execute("SELECT * FROM songs LIMIT 5")
rows = cursor.fetchall()
# Print each row
for row in rows:
print(row)
# Close the connection
conn.close()
# Query the database
query_database()
reacted to
ritvik77's
post with 👍
10 months ago
Post
2354
ritvik77/ContributionChartHuggingFace
It's Ready!
One feature Hugging Face could really benefit from is a contribution heatmap — a visual dashboard to track user engagement and contributions across models, datasets, and models over the year, similar to GitHub’s contribution graph. Guess what, Clem Delangue mentioned idea about using HF API reference for it and we made it for use.
If you are a Hugging Face user add this Space in your collection and it will give you all stats about your contributions and commits nearly same as GitHub. It's still a prototype and still working on it as a product feature.
It's Ready!
One feature Hugging Face could really benefit from is a contribution heatmap — a visual dashboard to track user engagement and contributions across models, datasets, and models over the year, similar to GitHub’s contribution graph. Guess what, Clem Delangue mentioned idea about using HF API reference for it and we made it for use.
If you are a Hugging Face user add this Space in your collection and it will give you all stats about your contributions and commits nearly same as GitHub. It's still a prototype and still working on it as a product feature.
reacted to
samihalawa's
post with 👍
10 months ago
Post
3519
🧠 PROMPT FOR CONVERTING ANY MODEL IN REASONING "THINKING" MODEL🔥🤖
Convert any model to Deepseek R1 like "thinking" model. 💭
Convert any model to Deepseek R1 like "thinking" model. 💭
You're now a thinking-first LLM. For all inputs:
1. Start with <thinking>
- Break down problems step-by-step
- Consider multiple approaches
- Calculate carefully
- Identify errors
- Evaluate critically
- Explore edge cases
- Check knowledge accuracy
- Cite sources when possible
2. End with </thinking>
3. Then respond clearly based on your thinking.
The <thinking> section is invisible to users and helps you produce better answers.
For math: show all work and verify
For coding: reason through logic and test edge cases
For facts: verify information and consider reliability
For creative tasks: explore options before deciding
For analysis: examine multiple interpretations
Example:
<thinking>
[Step-by-step analysis]
[Multiple perspectives]
[Self-critique]
[Final conclusion]
</thinking>
[Clear, concise response to user]
replied to
Dragunflie-420's
post
10 months ago
说不如做,尝试一个你擅长的领域,在这个领域内做一个AI产品,然后把这个卖出去:)
reacted to
Dragunflie-420's
post with 👀
10 months ago
Post
2310
Hello community. My name is nikki and I am looking to form a team for a serious project build platform/design/idea/project's...Ive been creating AI professional personas with custom skill sets and divisions of expertise. I want to create a viable business. Ive been working hard but i admit theres so much i do not have time to learn to do. Its taken me three years to learn enough to be here. I dont have a big set up in fact im cloud and ide space trial enterprise here and there all for space. I suck at execution and thats because I dont know how really. I need help from a person. AI has done all it can without hands. Im blabbering at this point. Have nothing big techy to say other than I build and ideate all day hmu glad to meet some like minded individuals ...seriously! Teach me leave me feeling confident in our collaborations not the need to build security software....poor attemt at hacking humor...im neither a comedian or hacker lol....full stacker yep:)
posted
an
update
10 months ago
Post
3985
😊 This program is designed to remove emojis from a given text. It uses a regular expression (regex) pattern to match and replace emojis with an empty string, effectively removing them from the text. The pattern includes a range of Unicode characters that correspond to various types of emojis, such as emoticons, symbols, and flags. By using this program, you can clean up text data by removing any emojis that may be present, which can be useful for text processing, analysis, or other applications where emojis are not desired. 💻
import re
def remove_emojis(text):
# Define a broader emoji pattern
emoji_pattern = re.compile(
"["
u"\U0001F600-\U0001F64F" # emoticons
u"\U0001F300-\U0001F5FF" # symbols & pictographs
u"\U0001F680-\U0001F6FF" # transport & map symbols
u"\U0001F1E0-\U0001F1FF" # flags (iOS)
u"\U00002702-\U000027B0"
u"\U000024C2-\U0001F251"
u"\U0001F900-\U0001F9FF" # supplemental symbols and pictographs
u"\U0001FA00-\U0001FA6F" # chess symbols and more emojis
u"\U0001FA70-\U0001FAFF" # more symbols and pictographs
u"\U00002600-\U000026FF" # miscellaneous symbols
u"\U00002B50-\U00002B59" # additional symbols
u"\U0000200D" # zero width joiner
u"\U0000200C" # zero width non-joiner
u"\U0000FE0F" # emoji variation selector
"]+", flags=re.UNICODE
)
return emoji_pattern.sub(r'', text)
posted
an
update
10 months ago
Post
1204
一个加入水印的小程序
- 字体从https://fonts.google.com去找就可以了,程序都标注清楚了,自行修改
from PIL import Image, ImageDraw, ImageFont
def add_watermark(image):
watermark_text = "AI Generated by DarkIdol FeiFei"
# Ensure the input is an Image object
if not isinstance(image, Image.Image):
raise ValueError("Input must be a PIL Image object")
width, height = image.size
# Create a drawing object to draw on the image
draw = ImageDraw.Draw(image)
# Set the font size for the watermark text
font_size = 10 # Set font size to 10
try:
# Try to use a common font file
font = ImageFont.truetype("Iansui-Regular.ttf", font_size)
except IOError:
# Use the default font if the specified font file is not found
font = ImageFont.load_default()
# Calculate the width and height of the watermark text using textbbox
bbox = draw.textbbox((0, 0), watermark_text, font=font)
text_width = bbox[2] - bbox[0]
text_height = bbox[3] - bbox[1]
# Calculate the position for the watermark text (bottom-right corner)
x = width - text_width - 10 # 10 is the right margin
y = height - text_height - 10 # 10 is the bottom margin
# Add the watermark text to the image
draw.text((x, y), watermark_text, font=font, fill=(255, 255, 255, 128))
# Return the modified image object
return image- 字体从https://fonts.google.com去找就可以了,程序都标注清楚了,自行修改
reacted to
m-ric's
post with 👍
about 1 year ago
Post
2692
𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳
🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.
👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "
🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!
⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)
Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.
👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "
🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!
⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)
Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

