Why are the training carbon emissions still not available?
And when will they be? :)
Also, the section "environmental impact" is misleading because an environmental impact assessment goes far beyond CO2 emissions. Take for example the ecological (and societal!) disasters caused by mining the uranium used for the training.
What uranium is used in HPC?
What is HPC? I am not a native English speaker, so please avoid abbreviations.
High performance computing. It sounds like you think that uranium was used to train BLOOM? I’m confused as to why that is.
Thank you. So yes, this is from the Training section of the Hugging Face report: "The training supercomputer, Jean Zay (website), uses mostly nuclear energy"
Hello Charlo,
Thanks for your interest in the topic.
We are gathering data about carbon emissions related to BLOOM training as we speak.
We will provide information when all the data is collated and analysed by the team.
For now, you can find information about the training scheme and the hardware used in the model card.
Regards, S.
Thank you for your answer. What about the second part of the message? Why do you choose to consider only the part of your environmental impact that is artificially reduced, and hence misleading?
And why do you calculate it afterward?
You should have estimated it before running the experiment, to know whether it was ethically worth it. Spoiler: it is not.
Hi @Charlolegossbo – please keep the discussion civil and constructive. Thanks!
Hi! Alright, sorry. Let me rephrase.
Could you please be more complete and transparent about the "environmental impact" section?
Indeed, the name "environmental impact" is misleading because an environmental impact assessment goes far beyond CO2 emissions. Take for example the ecological (and societal!) disasters caused by mining the uranium used for the training.
This is something that people often don't know. By being misleading, you reinforce that ignorance. We are living at a time when people should be aware of the impact of the energy they use, so please be completely transparent about your environmental impact.
Hi Charlo,
Thanks for the extra context you provide here.
It is true that environmental impact assessment can encompass a number of things.
These range from what is closest to the consumption (GPU power, computing cluster, datacentre...) to what is further and further away (electric grid, source of energy, construction of infrastructure...).
Some of this information is easier to gather and quantify, and some is much harder.
We will provide all the information we can.
Regards, S.
Hi,
You are right, this is a highly complex topic and it is very difficult to assess every part of it. It requires expert knowledge that your company does not have right now.
However, evaluating such impacts is much more critical for the world now than another large language model (LLM).
The world, and especially the people who mine the uranium you are using, don't need another LLM, even if it might have some advantage over the other LLMs. They need companies that spend those 5 million $ much more wisely. For example, on recruiting an expert on the socio-environmental impact of energy consumption.
But I don't think this misleading environmental impact is only due to a lack of competence in your company. I believe it is intentional greenwashing of your model (you even named it after a flowering process...).
If I am wrong, please rename your "environmental impact" section. Or admit in it that this section is way way way incomplete.
Hi Charlo,
I feel the conversation is steering away from a constructive debate again.
Let's go back to your initial question: Why are the training carbon emissions still not available?
We are working on it, this information will be ready soon.
Regards, S.
My last message was constructive. It might make you uncomfortable (and I am sorry for that), but it only contains facts and arguments.
I agree with the user Charlolegossbo, the environmental impact section may be incomplete. So I have a suggestion for the team responsible for the carbon footprint analysis:
Just as airplane footprints are reported per passenger, this model's footprint should be reported per user. This model has the potential to help millions, if not more, directly by letting them create new products and services, and indirectly by advancing science itself, with incalculable advancements for all humanity that can range from curing diseases to solving energy generation issues. That would be a far more truthful measure of the real emission impact on the planet.
Keep creating awesome projects like this that benefit the whole world. Thank you.
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
see also the open metadata PR in the present repo: https://huggingface.co/bigscience/bloom/discussions/140
Closing as this discussion seems to have come to an end + @julien-c provided a PR linking to the carbon footprint paper. Feel free to re-open if you want to continue the discussion.