SoftMiner Group

SoftMINER-Group's activity

leowin

updated a dataset 9 days ago

SoftMINER-Group/HarmEval

Viewer • Updated 9 days ago • 550 • 53 • 3

rimahazra

authored a paper 10 days ago

Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions

Paper • 2501.01872 • Published 20 days ago • 2

leowin

updated a dataset 14 days ago

SoftMINER-Group/CulturalKaleidoscope_Preference

Viewer • Updated 14 days ago • 30k • 16 • 2

leowin

updated a dataset 2 months ago

SoftMINER-Group/TechHazardQA

Viewer • Updated Nov 16, 2024 • 7.75k • 35 • 4

rimahazra

authored a paper 2 months ago

Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models

Paper • 2410.12880 • Published Oct 15, 2024 • 3

leowin

updated a dataset 3 months ago

SoftMINER-Group/CulturalKaleidoscope

Preview • Updated Oct 20, 2024 • 41 • 7

rimahazra

updated a dataset 6 months ago

SoftMINER-Group/NicheHazardQA

Viewer • Updated Jul 28, 2024 • 388 • 45 • 5

caprion

authored 2 papers 7 months ago

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

Paper • 2402.15302 • Published Feb 23, 2024 • 4

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

Paper • 2406.11801 • Published Jun 17, 2024 • 16

rimahazra

authored a paper 7 months ago

Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance

Paper • 2406.11139 • Published Jun 17, 2024 • 13

caprion

authored a paper 7 months ago

Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance

Paper • 2406.11139 • Published Jun 17, 2024 • 13

leowin

authored 3 papers 7 months ago

Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance

Paper • 2406.11139 • Published Jun 17, 2024 • 13

SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

Paper • 2406.12274 • Published Jun 18, 2024 • 15

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

Paper • 2406.11801 • Published Jun 17, 2024 • 16

rimahazra

updated a dataset 7 months ago

SoftMINER-Group/TechHazardQA

Viewer • Updated Nov 16, 2024 • 7.75k • 35 • 4

rimahazra

authored 2 papers 7 months ago

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

Paper • 2406.11801 • Published Jun 17, 2024 • 16

SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

Paper • 2406.12274 • Published Jun 18, 2024 • 15

caprion

authored a paper 7 months ago

SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

Paper • 2406.12274 • Published Jun 18, 2024 • 15

rimahazra

posted an update 7 months ago

🔥 🔥 Releasing our new paper on AI safety alignment -- Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations 🎯 with Sayan Layek, Somnath Banerjee and Soujanya Poria.

👉 We propose Safety Arithmetic, a training-free framework that enhances LLM safety in three scenarios: base models, supervised fine-tuned (SFT) models, and edited models. It combines two components: Harm Direction Removal (HDR), which steers the model away from harmful content, and Safety Alignment, which promotes safe responses.

👉 Paper: https://arxiv.org/abs/2406.11801v1
👉 Code: https://github.com/declare-lab/safety-arithmetic

rimahazra

authored a paper 11 months ago

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

Paper • 2402.15302 • Published Feb 23, 2024 • 4