Commit History
fix double eos token for chatml (#1054) [skip ci]
651b7a3
winglian
Create preprocess CLI (#785)
e50ab07
casperhansen
improve: Enhance code readability of prompt_tokenizers.py (#707)
3a99495
seungduk
misc sharegpt fixes (#723)
f30afe4
winglian
don't strip the prompt for check since we don't strip to tokenize anymore (#650)
8662e8f
winglian
use fastchat conversations template (#578)
e7d3e2d
winglian
better handling and logging of empty sharegpt turns (#603)
a363604
winglian
split completion text to sequence_len (#616)
97d3776
winglian
improve handling for empty text on the tokenization step (#502)
1eebbd0
winglian
support custom field for completion from yml (#580)
f7a2263
winglian
improve llama pad token handling (#475)
cb9797e
winglian
gracefully handle empty input (#442)
9d629d8
winglian
better handling of empty input ids when tokenizing (#395)
85cf4f8
winglian
better handling since xgen tokenizer breaks with convert_tokens_to_ids
2a428e8
winglian
Adding logging enhancement
553a86b
theobjectivedad
Fix typing list
77bdb7d
Nanobit
initial wip to get sys prompt from dataset
8d20e0a
winglian
bugfix for potential off by one
7925ddc
winglian
Fix sharegpt prompt
25eeeeb
Nanobit
Fix security issue or ignore false positives
a1f9850
Nanobit
Apply isort then black
37293dc
Nanobit
Fix mypy typing
e9650d3
Nanobit
Fix unsupported operand type(s) for |
be22551
Nanobit
Refactor duplicate code between Prompter and Pygmalion
8e46c0f
Nanobit
Lint prompt_tokenizers
5d86137
Nanobit
refactor conversation plucking in sharegpt
21c8e2d
winglian
apply black formatting
ce34d64
winglian
tokenization fixes
4ea9a66
winglian
optionally be able to specify alpaca or chat style prompts
1d5ab84
winglian
concise multiple choice and tldr summarize
1365073
winglian
add alpaca multiple choice instruct dataset support
b46bc02
winglian
fix prompters, especially the sharegpt prompter
5e37144
winglian
black formatting
2bc1a5b
winglian
Rename variable to use same convention
174b74d
Nanobit
Add CompletionPrompt type
cf68153
Nanobit
Jeopardy bot! (#17)
a12fb0a
winglian
WIP large refactor to make finetune script a little more manageable (#3)
6045345
winglian
add support for alpaca reflect training (#2)
81de0ef
winglian
Tokenization open assistant (#1)
87d7825
winglian
support for alpaca-like instruction datasets without inputs
e107643
winglian
config chooser, update readme instructions, device config, llama flash attention, debug out the labels, fix config key checks, other bugfixes
f2a2029
winglian
black formatting
a6028d3
winglian
make it work with pythia in the cloud
8d959a7
winglian
WIP for axolotl trainer
ce24f5e
winglian