Choms

AI & ML interests

None yet

Recent Activity

liked a model 22 days ago
perplexity-ai/r1-1776
liked a Space 4 months ago
Qwen/Qwen2.5-Coder-Artifacts
liked a Space 4 months ago
stabilityai/stable-diffusion-3.5-large

Organizations

Amazon, ASYD Solutions

Choms's activity

replied to TuringsSolutions's post 8 months ago

If you really think the issue is not charging money, you are in for a surprise...

reacted to singh96aman's post with 🔥 9 months ago
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (2406.12624)

Can LLMs serve as reliable judges ⚖️?

We aim to identify the right metrics for evaluating Judge LLMs and understand their sensitivities to prompt guidelines, engineering, and specificity. With this paper, we want to caution ⚠️ against blindly using LLMs as a human proxy.

Blog - https://huggingface.co/blog/singh96aman/judgingthejudges
Arxiv - https://arxiv.org/abs/2406.12624
Tweet - https://x.com/iamsingh96aman/status/1804148173008703509

@singh96aman @kartik727 @Srinik-1 @sankaranv @dieuwkehupkes
New activity in nerijs/pixel-art-xl 9 months ago

License

13
#7 opened over 1 year ago by elie707