lodrick-the-lafted committed
Commit 0a6a982 · 1 Parent(s): dc131bc
Update README.md
README.md
CHANGED
@@ -3,7 +3,7 @@ v1 & v2 were destined for the bit bucket <br/>
 <br/>
 Each of these use different vectors and have some variations in where the new refusal boundaries lie. None of them seem totally jailbroken.

-Advantage: only need to
+Advantage: only need to alter down_proj for one layer, so there is usually very little brain damage. <br/>
 Disadvantage: using the difference of means method is precisely targetted, while this method requires filtering for interesting control vectors from a selection of prompts

 [https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v3](https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v3) <br/>
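
The added line says only `down_proj` for one layer needs altering. As a rough illustration of why editing a single matrix can be enough, the sketch below projects one direction out of a weight matrix so that the layer's output has no component along it. It assumes the alteration is an orthogonal projection and that the weight is stored as rows of shape (hidden, intermediate); the README states neither. `unit`, `ablate_direction`, and `matvec` are hypothetical helper names written in plain Python, not part of any library or of the author's code.

```python
import math


def unit(v):
    """Scale v to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate_direction(weight, direction):
    """Return (I - v v^T) @ weight, with v the normalised direction.

    weight:    list of rows, assumed shape (hidden, intermediate)
    direction: list of length hidden, e.g. a chosen control vector
    """
    v = unit(direction)
    hidden = len(weight)
    if len(v) != hidden:
        raise ValueError("direction length must match the number of rows")
    cols = len(weight[0])
    # Component of each weight column along v, i.e. the row vector v^T W.
    along = [sum(v[i] * weight[i][j] for i in range(hidden)) for j in range(cols)]
    # Subtract that component so every output is orthogonal to v.
    return [[weight[i][j] - v[i] * along[j] for j in range(cols)]
            for i in range(hidden)]


def matvec(weight, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]
```

After the edit, `matvec(ablate_direction(W, d), x)` has zero dot product with `d` for any input `x`, while every other weight in the model is left untouched. With a real checkpoint the same arithmetic would be applied to the chosen layer's tensor, which is not shown here.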