ibm-granite/granite-3.0-3b-a800m-base

Text Generation


amezasor committed 25 days ago

Commit 275b906 • 1 Parent(s): a1d5513

typo fix

Files changed (1)

README.md +2 -2


@@ -255,8 +255,8 @@ output = tokenizer.batch_decode(output)
 print(output)
 ```
-**Model Architeture:**
-Granite-3.0-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts(MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss.
+**Model Architecture:**
+Granite-3.0-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss.
 | Model                        | 2B Dense | 8B Dense | 1B MoE   | 3B MoE       |
 | :--------                    | :--------| :--------| :--------| :--------    |