samsja commited on
Commit
73588cb
·
verified ·
1 Parent(s): e9ca55b

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -19,8 +19,8 @@ pipeline_tag: text-generation
19
 
20
  **INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
21
  The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-perfomance training on unreliable, globally distributed workers.
22
- The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh` which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node
23
- The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead.
24
 
25
  For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).
26
 
 
19
 
20
  **INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
21
  The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-perfomance training on unreliable, globally distributed workers.
22
+ The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh` which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.
23
+ The model was trained using the [DiLoCo](https://arxiv.org/abs/2311.08105) algorithms with 100 inner steps. The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead by a factor 400x.
24
 
25
  For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).
26