louisbrulenaudet posted an update Oct 17
🚨 I have $3,500 in Azure credits, including access to an H100 (96 GB), expiring on November 12, 2024.

I won’t be able to use it all myself, so I’m reaching out to the @huggingface community: are there any open-source projects with data ready for some compute power?

Let’s collaborate and make the most of it together 🔗

What do you mean by "data ready"? I have an idea for creating a synthetic knowledge dataset with LLMs, using a method I created called "No need for prompting": you just run the code and it randomly generates high-quality data.

Would you like to do this with me and create a dataset together?
You can find a sample dataset named "nbi" on my profile.

Let me know if we can make this happen.

I wanted to do this, but I couldn't afford the compute it required.


Hi @Pankaj8922 ,

Thank you for reaching out and sharing your project concept! For this collaboration, I'm specifically seeking projects that already have data prepared and ready for immediate use, as the Azure credits are limited and focused on applications that can be initiated without additional data generation steps.

If you have any projects with data fully prepared, feel free to submit details through the form here: https://tally.so/r/w2xe0A.

Best of luck with your synthetic dataset project!

Hey Louis, we are an independent organization building some cool stuff, and we would love to chat about the compute. You can reach me at pullakhandam.siddartha@gmail.com. Thanks!


Hello @Siddartha10 ,

Thank you for reaching out! I'm excited to hear about your work and the potential for collaboration.

To help assess how best to support your project, could you please share a bit more detail? Specifically:

  • Project Overview: A brief description of your project and its objectives.
  • Data Preparedness: Whether your data is ready for immediate use and the nature of this data.
  • Expected Outcomes: The goals or deliverables you anticipate achieving with this additional compute power.

Feel free to submit your details via this Tally form (https://tally.so/r/w2xe0A) so we can proceed efficiently.

Looking forward to learning more about your project and potentially collaborating!

Best regards,
Louis

Can you share some resources with me? I'm looking to train a model on Hugging Face.

Hey man, I have a strong-performing fine-tuning method in mind. I tested it on Qwen2.5-1.5B and got insane responses (I'm evaluating them right now), but I'm limited by my 3090. Let's talk! Here's my Discord: atanddev
Maybe we can iterate on the fine-tuning method together too, if you have the time and interest!


Hello,

Thank you for reaching out. I'm interested in learning more about your fine-tuning method's potential applications and the specifics of your dataset. To ensure we’re aligned on objectives and timelines, would you mind providing more detail on the following in the Tally form? (https://tally.so/r/w2xe0A)

  • Project Goals: What are the primary objectives for your model, and how do you envision deploying it?
  • Data and Compute Requirements: Could you outline the volume and nature of data you'd like to process and any specific requirements for H100 access?
  • Finetuning Method: I'd be interested to hear more about your finetuning approach. Do you have a plan for iterations or specific benchmarks in mind?

Please submit your responses via the form to streamline our discussion. Once we have the foundational details clarified, we can determine the next steps and see how best to leverage the Azure credits together.

Looking forward to exploring the possibilities.

Best regards, Louis

Hey man! I'm desperately trying to host the 70B reward model to score my datasets...
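Whether a 70B model fits on the 96 GB H100 mentioned in the original post comes down to simple arithmetic: parameter count × bytes per parameter. The sketch below is a rough, weight-only estimate under stated assumptions: it ignores the KV cache, activations, and framework overhead (so real requirements are higher), and the dtype table is illustrative rather than tied to any particular serving library.

```python
# Rough weight-only memory estimate for hosting a model.
# Assumption: only the weights are counted; KV cache, activations,
# and runtime overhead are ignored, so real usage is higher.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

GPU_MEMORY_GB = 96  # the H100 from the original post


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory needed to store the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "int8", "int4"):
        need = weight_memory_gb(70e9, dtype)
        print(f"{dtype}: ~{need:.0f} GB of weights; "
              f"weights alone fit in {GPU_MEMORY_GB} GB: {need < GPU_MEMORY_GB}")
```

In bf16 the weights alone come to about 140 GB, more than one 96 GB card, so serving a 70B model on a single H100 would typically require quantization or splitting the model across several GPUs.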

I have a project idea to build a reasoning model like o1. While many similar projects exist, mine is a bit different: first, I create some data manually, and then I train using a new formulation based on the REINFORCE algorithm (it's simple, by the way). I've included a sample of my data below. I could use help with:
1. fine-tuning models like Mistral Large or Llama 3 on this data
2. creating more data using existing models. I already have a prompt ready that instructs the model to follow the format and style of the samples I created manually. Note that this does not work for hard questions, which I sometimes have to solve manually. This is my longest sample:

: {
"user": "solve this sudoku: 000005920000902803000074000096210507057006000284050690670000409830000752040020316",
"model": {
"steps": {
"step1": "lets start with some one has most rows, 3x3 block or column filled",
"step2": "okay, 9. 3x3 block has only one number left",
"step3": "1, 2, 3, 4, 5, 6, 7, 9 are there",
…………….
"step105": "and they get filled by these numbers in order: 6 and 5",
"step106": "so full grid finished at end is: 763185924415962873928374165396218547157496238284753691672531489831649752549827316, and this is the answer."
},
"validity": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"answer": "The full grid as fully solved is: 763185924415962873928374165396218547157496238284753691672531489831649752549827316"
}
},
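The sample pairs an 81-digit puzzle string (0 marks an empty cell) with an 81-digit solved grid. When generating more samples with an existing model, as in point 2, a cheap automatic check can catch wrong final answers before they enter the dataset. This is a minimal sketch, not part of the author's pipeline: the helper names are hypothetical, and `validity_matches_steps` assumes (the post does not say so) that `validity` holds one label per reasoning step.

```python
import re


def parse_grid(text: str) -> str:
    """Return the first 81-digit run in `text` (0 marks an empty cell)."""
    match = re.search(r"\d{81}", text)
    if match is None:
        raise ValueError("no 81-digit grid found")
    return match.group(0)


def is_valid_solution(puzzle: str, solution: str) -> bool:
    """True if `solution` keeps every given of `puzzle` and obeys Sudoku rules."""
    if len(puzzle) != 81 or len(solution) != 81:
        return False
    # Every non-zero cell of the puzzle must be unchanged in the solution.
    if any(p != "0" and p != s for p, s in zip(puzzle, solution)):
        return False
    rows = [solution[9 * r: 9 * r + 9] for r in range(9)]
    cols = [solution[c::9] for c in range(9)]
    boxes = [
        "".join(rows[br + i][bc: bc + 3] for i in range(3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Each row, column, and 3x3 box must contain the digits 1-9 exactly once.
    return all(sorted(unit) == list("123456789") for unit in rows + cols + boxes)


def validity_matches_steps(sample: dict) -> bool:
    """Hypothetical check: one validity label per reasoning step."""
    return len(sample["model"]["validity"]) == len(sample["model"]["steps"])


PUZZLE_TEXT = ("solve this sudoku: 000005920000902803000074000096210507"
               "057006000284050690670000409830000752040020316")
ANSWER_TEXT = ("The full grid as fully solved is: 76318592441596287392837416539621854715"
               "7496238284753691672531489831649752549827316")

puzzle = parse_grid(PUZZLE_TEXT)
answer = parse_grid(ANSWER_TEXT)
print(is_valid_solution(puzzle, answer))
```

For the sample above, the final grid is consistent with the puzzle's givens and satisfies the row, column, and box constraints; a model-generated sample that fails this check could be routed to manual review instead.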