Post 3916: I am very sad to say that the budget for creating the SnowflakeCore-G1 1B and 7B MoE models ran out, and I can't pre-train them anymore.
Post 362: Training for SnowflakeCore-G1-1B and 7B will be restarted, because I have now implemented DeepSpeed and managed to use two GPUs.
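The post mentions switching to DeepSpeed for two-GPU training. As a minimal sketch of what that setup typically looks like (the config values and the `train.py` script name are illustrative assumptions, not the actual ones used for SnowflakeCore-G1):

```python
# Minimal DeepSpeed config sketch (hypothetical values, not the real
# SnowflakeCore-G1 settings): ZeRO stage 2 shards optimizer state across
# the two GPUs, fp16 cuts memory roughly in half.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

# Write the config so the DeepSpeed launcher can pick it up.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# A two-GPU run would then be launched with something like:
#   deepspeed --num_gpus=2 train.py --deepspeed_config ds_config.json
```

The `deepspeed` launcher spawns one process per GPU and handles the distributed initialization that would otherwise require `torchrun` plus manual process-group setup.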
i3-architecture
FlameF0X/i3-200m — Updated 4 days ago
FlameF0X/i3-22m — 22.6M params • Updated about 10 hours ago
FlameF0X/i3-12m — Text Generation • 12.7M params • Updated 3 days ago
FlameF0X/i3-tiny — Text Generation • 711k params • Updated 9 days ago
SnowflakeCore G1 Pre-Train
The base models of G1. All the Snowflake models are fully pre-trained, not fine-tunes of a pre-existing model.
FlameF0X/SnowflakeCore-G1-Tiny2 — Text Generation • Updated Sep 4
FlameF0X/SnowflakeCore-G1-Tiny — Text Generation • Updated Jul 30