What exactly does xor_codec.py do?
I read through the code and didn't follow what you were trying to do. Thanks!
I won't speak for every part of it, but the general idea appears to be that OpenAssistant ships weights that aren't actually the weights, but become the weights when XOR'd against the original LLaMa weights. This script does it with numpy's bitwise_xor().
From the model card: "Due to the license attached to LLaMa models by Meta AI it is not possible to directly distribute LLaMa-based models. Instead we provide XOR weights for the OA models. ... To use OpenAssistant LLaMa-Based Models, you need to have a copy of the original LLaMa model weights and add them to a llama subdirectory here."
This is basically saying: we aren't allowed to give you the list [1, 2, 3, 4, 5] or our version of it [1, 2, 5, 6, 9], but we can give you [0, 0, 6, 2, 12], and if you XOR that against the original list (which we won't give you), you'll get our modified version.
In other words, they won't give you the LLaMa weights or the LLaMa-based oasst-sft-6-llama-30b weights, but they will give you a way to create the latter yourself if you already have the LLaMa weights.
Toy example:
llama_weights = [1, 2, 3, 4, 5]
oa_xor_weights = [0, 0, 6, 2, 12]
# the actual desired weights of the model are [1, 2, 5, 6, 9]
# zip the two weights and perform an XOR, 1^0, 2^0, 3^6, 4^2, 5^12
oasst_sft_6_llama_30b = [a^b for a, b in zip(llama_weights, oa_xor_weights)]
# oasst_sft_6_llama_30b = [1, 2, 5, 6, 9]
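And since XOR is its own inverse, the same operation is presumably how the distributed XOR file was produced in the first place. A sketch of that encode step (my reconstruction, not code from the repo):

# Encode side (reconstruction): XOR the fine-tuned weights against the
# LLaMa weights to get a payload that's useless without llama_weights.
oa_weights = [1, 2, 5, 6, 9]
oa_xor_weights = [a^b for a, b in zip(llama_weights, oa_weights)]
# oa_xor_weights = [0, 0, 6, 2, 12] -- safe to distribute on its own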
Back to the actual code. After all the initial steps in the model card instructions, xor_codec.py gets run from the command line like this:
python xor_codec.py oasst-sft-6-llama-30b/ oasst-sft-6-llama-30b-xor/ llama30b_hf/
This passes the paths to both the original LLaMa weights and the OA XOR "weights" to the xor_decode() function, unless one of the flags is changed from its default.
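The top level of the script is essentially a loop that walks the XOR directory and calls xor_decode() on each matching pair of files. Roughly like this (a paraphrase of the script's structure, not the verbatim code; xor_dir and its exact behavior are my reconstruction):

import os

# Rough sketch of the top-level loop: decode every file in the XOR
# directory against the same-named file in the LLaMa directory,
# writing the result into the destination directory.
def xor_dir(dst_dir, src_payload_dir, src_base_dir):
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_payload_dir):
        print(f'Processing {name}')
        xor_decode(os.path.join(dst_dir, name),
                   os.path.join(src_payload_dir, name),
                   os.path.join(src_base_dir, name))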
Here's xor_decode() itself, with some comments added:
import gzip
import numpy

def xor_decode(dst, src_payload, src_base, block_size=4096):
    # open the OA XOR payload (gzip-compressed) and the original LLaMa weights as bytes
    fp_payload = gzip.open(src_payload, 'rb')
    fp_base = open(src_base, 'rb')
    # create/open a file at the destination path
    with open(dst, 'wb') as fp:
        while True:
            # read both files in block_size (default 4096) byte chunks and wrap each chunk in a numpy uint8 array
            buf1 = numpy.array(bytearray(fp_payload.read(block_size)), dtype=numpy.uint8)
            buf2 = numpy.array(bytearray(fp_base.read(block_size)), dtype=numpy.uint8)
            # if the chunks differ in length, zero-pad or trim buf2 to match buf1
            padding = len(buf1) - len(buf2)
            if padding > 0: buf2 = numpy.pad(buf2, (0, padding), 'constant', constant_values=(0,))
            if padding < 0: buf2 = buf2[:len(buf1)]
            # XOR the two chunks element-wise; the result goes into buf
            buf = numpy.bitwise_xor(buf1, buf2)
            # write the result and keep looping until the payload is exhausted
            fp.write(buf)
            if len(buf1) < block_size: break
    fp_payload.close()
    fp_base.close()
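If you wanted to sanity-check a single file rather than run the whole script, you could call it directly (the shard filename here is made up for illustration):

# Hypothetical single-file invocation; the shard name is illustrative,
# but the argument order matches the command above.
xor_decode('oasst-sft-6-llama-30b/pytorch_model-00001-of-00007.bin',
           'oasst-sft-6-llama-30b-xor/pytorch_model-00001-of-00007.bin',
           'llama30b_hf/pytorch_model-00001-of-00007.bin')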
The end result is the byte-wise XOR of the original LLaMa weights and the OA XOR payload, which reconstructs the real weights for the model.
Hopefully that helps and I didn't misinterpret it
Or you know... Meta could just support open research.
Yep, but they don't, so for now we distribute XORs to comply with their license :)