	Documentation for the scripts in the `scripts` directory, starting with `batch-caption.py`, which is used to run JoyCaption in bulk. Other scripts might be added in the future.

	# batch-caption.py

	## Basic Command

	To run the script, use the following command:

	```sh
	./batch-caption.py --glob "path/to/images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone."
	```

	This command will caption all the `.jpg` images in the specified directory using the provided prompt, writing `.txt` files alongside each image.

	## Command-Line Arguments

Note: You must provide images with one of `--glob`, `--filelist`, or `--input`, and a prompt with either `--prompt` or `--prompt-file`.

| Argument | Description | Default |
| ------------------ | ---------------------------------------------------------- | ------- |
| `--input` | Input images | N/A |
| `--glob` | Glob pattern to find images | N/A |
| `--filelist` | File containing a list of images | N/A |
| `--prompt` | Prompt to use for caption generation | N/A |
| `--prompt-file` | JSON file containing prompts | N/A |
| `--batch-size` | Batch size for image processing | 1 |
| `--greedy` | Use greedy decoding instead of sampling | False |
| `--temperature` | Sampling temperature (used only when sampling, i.e. without `--greedy`) | 0.6 |
| `--top-p` | Top-p (nucleus) sampling value | 0.9 |
| `--top-k` | Top-k sampling value | None |
| `--max-new-tokens` | Maximum length of the generated caption, in tokens | 256 |
| `--num-workers` | Number of workers that load images in parallel | 4 |
| `--model` | Pre-trained model to use | [John6666/llama-joycaption-alpha-two-hf-llava-nf4](https://huggingface.co/John6666/llama-joycaption-alpha-two-hf-llava-nf4) |
| `--bf16` | Load the model in `torch.bfloat16` | False |
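
The interface in the table can be approximated with Python's `argparse`. This is a hedged sketch reconstructed from the table and the note on required inputs, not the script's actual source. The helper names `build_parser` and `validate_args` are hypothetical, and `--input` is assumed to take a single path because the table does not say whether it accepts several.

```python
import argparse

DEFAULT_MODEL = "John6666/llama-joycaption-alpha-two-hf-llava-nf4"


def build_parser():
    """Hypothetical re-creation of the documented flags (not the real script's code)."""
    p = argparse.ArgumentParser(description="Caption images in bulk with JoyCaption.")
    # Image sources: at least one is required (enforced in validate_args).
    p.add_argument("--input", type=str, help="Input images (assumed: a single path)")
    p.add_argument("--glob", type=str, help="Glob pattern to find images")
    p.add_argument("--filelist", type=str, help="File containing a list of images")
    # Prompt sources: at least one is required.
    p.add_argument("--prompt", type=str, help="Prompt to use for caption generation")
    p.add_argument("--prompt-file", type=str, help="JSON file containing prompts")
    # Decoding and runtime options, with the defaults listed in the table.
    p.add_argument("--batch-size", type=int, default=1)
    p.add_argument("--greedy", action="store_true")
    p.add_argument("--temperature", type=float, default=0.6)
    p.add_argument("--top-p", type=float, default=0.9)
    p.add_argument("--top-k", type=int, default=None)
    p.add_argument("--max-new-tokens", type=int, default=256)
    p.add_argument("--num-workers", type=int, default=4)
    p.add_argument("--model", type=str, default=DEFAULT_MODEL)
    p.add_argument("--bf16", action="store_true")
    return p


def validate_args(parser, args):
    """Exit with a usage error unless an image source and a prompt source were given."""
    if not (args.input or args.glob or args.filelist):
        parser.error("specify --input, --glob, or --filelist")
    if not (args.prompt or args.prompt_file):
        parser.error("specify --prompt or --prompt-file")
    return args
```

For example, `validate_args(parser, parser.parse_args(["--prompt", "Describe."]))` exits with a usage error because no image source was given, while adding `--glob "images/*.png"` passes and leaves every other option at its table default.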


	### Examples

	1. Caption images with a specific prompt

	```sh
	./batch-caption.py --glob "images/*.png" --prompt "Write a descriptive caption for this image in a formal tone."
	```
	or
	```sh
	./batch-caption.py --input "images/dog.png" --prompt "Write a descriptive caption for this image in a formal tone."
	```

	2. Use a JSON file for prompts

	```sh
	python batch-caption.py --filelist "image_paths.txt" --prompt-file "prompts.json"
	```

3. Use greedy decoding

	```sh
	python batch-caption.py --glob "images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone." --greedy
	```
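
With `--greedy`, the model picks the most likely token at each step, so `--temperature`, `--top-p`, and `--top-k` no longer affect the output. The sketch below illustrates that relationship by mapping the flags onto keyword arguments that Hugging Face Transformers' `generate()` accepts (`max_new_tokens`, `do_sample`, `temperature`, `top_p`, `top_k`). It assumes the script forwards the flags this way; `generation_kwargs` is a hypothetical helper, not part of the script.

```python
def generation_kwargs(greedy=False, temperature=0.6, top_p=0.9, top_k=None, max_new_tokens=256):
    """Hypothetical helper: turn the CLI flags into generate() keyword arguments."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": not greedy}
    if not greedy:
        # Sampling settings only influence the output when do_sample=True;
        # top_k is passed through as given (None by default, per the table).
        kwargs.update(temperature=temperature, top_p=top_p, top_k=top_k)
    return kwargs


print(generation_kwargs(greedy=True))
# → {'max_new_tokens': 256, 'do_sample': False}
```

In a Transformers setup this would be used as `model.generate(**inputs, **generation_kwargs(greedy=True))`.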

	## Prompt Handling

	- For a list of prompts that the model understands, please refer to the project's root README.

- You can pass a single prompt directly with `--prompt`, or use `--prompt-file` to supply a JSON file containing a list of prompts, optionally weighted.

	- If multiple prompts are specified in the prompt file, the prompt used for each image will be randomly selected.

- Prompt file format: the JSON file should contain an array whose entries are either plain strings or objects with `prompt` and `weight` fields.

- Weighting: the `weight` field sets how likely a prompt is to be chosen relative to the other prompts; higher weights make a prompt more likely to be chosen. For example, if one prompt has a weight of 2.0 and another has a weight of 1.0, the first prompt will be twice as likely to be used.
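
Assuming selection is proportional to weight, as the example implies, the chance of picking prompt $i$ out of $n$ prompts with weights $w_1, \dots, w_n$ is its weight divided by the total. For the two weights above this gives:

```math
P(i) = \frac{w_i}{\sum_{j=1}^{n} w_j},
\qquad
P(1) = \frac{2.0}{2.0 + 1.0} = \frac{2}{3},
\qquad
P(2) = \frac{1.0}{2.0 + 1.0} = \frac{1}{3}.
```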

	Example `prompts.json`:

	```json
	[
	{ "prompt": "Describe the scene in detail.", "weight": 2.0 },
	{ "prompt": "Summarize the main elements of the image.", "weight": 1.0 }
	]
	```
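
A minimal sketch of reading such a file and drawing one prompt per image, using Python's `random.choices` for weighted selection. It assumes that a plain-string entry counts as weight 1.0, which the text above does not state; `load_prompts` and `pick_prompt` are hypothetical names, not the script's API.

```python
import json
import random


def load_prompts(text):
    """Parse a prompts.json payload into a list of (prompt, weight) pairs."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("prompt file must contain a JSON array")
    prompts = []
    for item in data:
        if isinstance(item, str):
            # Assumption: a bare string gets a default weight of 1.0.
            prompts.append((item, 1.0))
        elif isinstance(item, dict) and "prompt" in item and "weight" in item:
            prompts.append((str(item["prompt"]), float(item["weight"])))
        else:
            raise ValueError(f"invalid prompt entry: {item!r}")
    return prompts


def pick_prompt(prompts, rng=random):
    """Choose one prompt with probability proportional to its weight."""
    texts = [text for text, _ in prompts]
    weights = [weight for _, weight in prompts]
    return rng.choices(texts, weights=weights, k=1)[0]
```

With the example `prompts.json` above, repeated calls to `pick_prompt` return "Describe the scene in detail." about two-thirds of the time. Passing a seeded `random.Random` instance as `rng` makes the selection reproducible.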

	## Output

	- Captions are saved as `.txt` files in the same directory as the corresponding image.
	- If a `.txt` caption file already exists for an image, the script will skip that image.
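
The two output rules can be sketched as follows. This assumes each caption file shares the image's base name (for example `images/dog.png` → `images/dog.txt`), which is one reading of "alongside each image" rather than something stated above; `caption_path` and `images_to_caption` are hypothetical names.

```python
import glob
from pathlib import Path


def caption_path(image):
    """Caption file next to the image: same directory and stem, .txt suffix."""
    return Path(image).with_suffix(".txt")


def images_to_caption(pattern):
    """Expand a glob pattern, skipping images that already have a caption file."""
    return [Path(p) for p in sorted(glob.glob(pattern)) if not caption_path(p).exists()]
```

For example, `images_to_caption("images/*.png")` lists only the PNG files in `images/` that do not yet have a matching `.txt` file, so rerunning the script after an interruption picks up where it left off.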