MATTHEW EYESAN

Mackin7

AI & ML interests

Vision models, Language Models, ML Algorithms and Solutions, Security and Data frameworks

Recent Activity

updated a dataset 14 days ago
Mackin7/my-distiset-5d7059ee
published a dataset 14 days ago
Mackin7/my-distiset-5d7059ee
updated a collection 23 days ago
FaceBChat_10

Organizations

None yet

Mackin7's activity

New activity in Mackin7/my-distiset-55a6b53b about 1 month ago
reacted to fdaudens's post with šŸ”„ 4 months ago
reacted to clem's post with šŸ”„ 5 months ago
Just crossed 200,000 free public AI datasets shared by the community on Hugging Face! Text, image, video, audio, time-series & many more... Thanks everyone!

http://hf.co/datasets
reacted to codelion's post with ā¤ļø 5 months ago
We recently worked with OpenAI to fine-tune gpt-4o and built the SOTA model for the patched-codes/static-analysis-eval benchmark. All the code and data (patched-codes/synth-vuln-fixes) showing how we did it are available on their GitHub - https://github.com/openai/build-hours/tree/main/5-4o_fine_tuning.

Here are some tips based on our experience:

ā†’ Establish baseline with "conditioning" / prompting

ā†’ Task-specific datasets are ideal for PEFT; hard to beat gpt-4o on "broad" tasks

ā†’ Add your best system prompt to each example

ā†’ Ensure training data distribution is similar to inference data

ā†’ Shorten instructions with concise prompts; may require more examples.

ā†’ Define clear evaluation metrics (seriously, please eval!)

You can see more details on the benchmark and process here - https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities
updated a Space 7 months ago