Workaround for the corrupted JSONs.

#5
by qvrty - opened

I was running on WSL to try to avoid the encoding issues mentioned by the author, but couldn't seem to get the three JSON config files to XOR correctly. All the other files converted correctly and had the right hashes, but tokenizer_config.json, config.json and generation_config.json all ended up as corrupted gibberish. After a decent bit of trial and error, I took a shot in the dark and just used the original un-XORed copies of those files and threw them into oobabooga. After manually specifying the model type as llama, it loaded up and began producing output. I'm not sure what changes were originally intended for those files, but hopefully this can help someone else get the model running.
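For anyone curious what the XOR step actually involves, it's just a byte-wise XOR of each downloaded file against the matching base LLaMA file, followed by a hash check against the published checksums. A minimal sketch of that idea (the paths below are placeholders, not the release's actual conversion script):

```python
import hashlib

def xor_decode(xor_path, base_path, out_path):
    """Byte-wise XOR of the downloaded file against the base LLaMA file."""
    with open(xor_path, "rb") as f:
        xored = f.read()
    with open(base_path, "rb") as f:
        base = f.read()
    # zip() stops at the shorter file, so only that many bytes get decoded.
    decoded = bytes(a ^ b for a, b in zip(xored, base))
    with open(out_path, "wb") as f:
        f.write(decoded)

def sha256sum(path):
    """SHA-256 of a file, for comparing against the published checksums."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder paths -- substitute the real XOR'd file, base file, and output.
xor_decode("xor_encoded/config.json", "llama-hf/config.json", "out/config.json")
print(sha256sum("out/config.json"))
```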

I tried downloading on Ubuntu, macOS, and Windows after installing Git LFS, and the JSONs come out corrupt every time.
I don't think it's an OS issue. Please fix this.

Pygmalion org

Indeed - my bad, everyone. A couple of the JSONs seem to have contained local paths from my machine, so they became corrupted when XORed anywhere else, which was a stupid oversight on my part. I'll try to redo them over the weekend.

As a temporary workaround, you can just copy-paste the JSONs from the base LLaMA HF conversion and the model will work correctly. If you're using Kobold and want proper EOS behavior, just edit config.json to add "badwordsids": [[0]] inside the root object.
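A minimal sketch of that edit, assuming your local copy of config.json sits in the model folder (the path below is a placeholder):

```python
import json

# Placeholder path -- point this at your local model folder's config.json.
path = "pygmalion-7b/config.json"

with open(path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Add the EOS ban list Kobold expects, as described above.
config["badwordsids"] = [[0]]

with open(path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```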

If I copy-paste the JSONs from the base LLaMA HF conversion, would that affect quantization in any way? In other words, will I need to requantize when you release the fix?
Thanks!

Got the same issue on macOS.
Can you just release the JSONs as-is? I assume they shouldn't be copyrighted.

And thank you for your time!

Pygmalion org

Just pushed a commit that keeps the JSON files raw. They were originally XORed because that's how the OASST script does it - but I doubt Meta will give us any legal trouble over the JSONs.

I'll close this discussion for now, but please don't hesitate to open a new one if something's still broken.

11b changed discussion status to closed
