[Stale/Deprecated] Experimental Pipeline Code
This subdirectory contains reproducibility artifacts for the experiments described in the paper. All code here is deprecated in favor of the implementation and demo in the root of the repository.
In effect, the file `watermark_processor.py` in the root of the repo is a clean, user-friendly reimplementation of the watermarking and detection logic from `watermark.py`. We suggest using the official release version over any code found in the `experiments` directory.
Overview
Unless stated otherwise, all files discussed here are in the `experiments` directory. The `bl` naming convention used across many variables and function definitions refers to "blacklist". Black/white was the original terminology used during development of the paper and was updated to green/red based on feedback from the community.
The implementation of the main experiments in the paper has two high-level steps:
- (1) generate watermarked samples
- (2) compute metrics
The code provided here implements these steps in the following files: `run_watermarking.py` and `process_rows.py`, where the core logic is implemented in `watermark.py`, a single-file library.
Generally speaking, the code implementing the watermark itself is a series of classes and functions based on the `LogitsProcessor` abstraction from huggingface/transformers, and the code that turns it into a workflow is based on the `dataset.map` functionality from huggingface/datasets.
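To make that pattern concrete, here is a minimal sketch of the idea, not the code in `watermark.py` or `watermark_processor.py`; the class name, argument names, and seeding scheme are illustrative only:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class GreenListLogitsProcessor(LogitsProcessor):
    """Toy soft watermark: boost a pseudorandom "green list" of tokens at each step.

    The list is re-seeded from the previous token (a "markov_1"-style dynamic seed).
    """
    def __init__(self, green_fraction: float = 0.5, bias: float = 2.0):
        self.green_fraction = green_fraction
        self.bias = bias

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        vocab_size = scores.shape[-1]
        for b in range(input_ids.shape[0]):
            gen = torch.Generator()
            gen.manual_seed(int(input_ids[b, -1].item()))  # seed from the previous token
            perm = torch.randperm(vocab_size, generator=gen)
            green_ids = perm[: int(self.green_fraction * vocab_size)].to(scores.device)
            scores[b, green_ids] += self.bias  # "soft" boost of green-list logits
        return scores

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
inputs = tokenizer("The watermark is added during decoding, so", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    logits_processor=LogitsProcessorList([GreenListLogitsProcessor()]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```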
The files `io_utils.py`, `submitit_utils.py` and `launch.py` contain utilities for file operations (mostly `jsonl`) and for hyperparameter sweeping via jobs launched on our compute cluster (managed using SLURM). The `submitit` workflow tool is an extra dependency only required if using `launch.py`.
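For context, the `jsonl` handling amounts to the usual one-object-per-line pattern; a minimal sketch with hypothetical function names (not necessarily those in `io_utils.py`):

```python
import json

def write_jsonl(path, rows):
    # one JSON object (one generation "row") per line
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def read_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```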
Generation (`run_watermarking.py`)
`run_watermarking.py` is a command-line script that:
- loads a huggingface `dataset` that will be used to create text prompts for the language model
- loads a huggingface language model that can perform text generation via `model.generate`, and prepares to call the generation method with a special `LogitsProcessor` that implements watermarking at the current hyperparameter values
- composes a series of functions that are applied to the dataset via `map`, which preprocess and tokenize the prompt data and generate completions to it via the model (see the sketch after this list)
- loads a second huggingface language model to be used as a perplexity "oracle" for evaluating the quality of the texts generated by the watermarked model
- computes the teacher-forced loss (and perplexity) of the oracle model on the generated outputs
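A rough sketch of that `map`-based generation step (helper names are illustrative; the watermark `LogitsProcessor` and the oracle perplexity pass are omitted here):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

def generate_completion(example):
    # Use the start of the source text as the prompt (the real script applies the
    # truncation/filtering strategies listed in the arguments below).
    inputs = tokenizer(example["text"], truncation=True, max_length=50, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
    completion = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return {"prompt": tokenizer.decode(inputs["input_ids"][0]), "completion": completion}

# c4/realnewslike as used by the script; hosted on the Hub as allenai/c4.
dataset = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)
dataset = dataset.map(generate_completion)  # a watermarked run would pass the LogitsProcessor to generate
```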
Here is an example of the argument set required to implement a single (representative) hyperparameter combination from the paper:
python run_watermarking.py \
--model_name facebook/opt-1.3b \
--dataset_name c4 \
--dataset_config_name realnewslike \
--max_new_tokens 200 \
--min_prompt_tokens 50 \
--limit_indices 500 \
--input_truncation_strategy completion_length \
--input_filtering_strategy prompt_and_completion_length \
--output_filtering_strategy max_new_tokens \
--dynamic_seed markov_1 \
--bl_proportion 0.5 \
--bl_logit_bias 2.0 \
--bl_type soft \
--store_spike_ents True \
--num_beams 1 \
--use_sampling True \
    --sampling_temp 0.7 \
--oracle_model_name facebook/opt-2.7b \
--run_name example_run \
    --output_dir ./all_runs
The result of each run is a directory with three files in it:
- `gen_table_meta.json` (hyperparameters passed from the cmdline)
- `gen_table.jsonl`
- `gen_table_w_metrics.jsonl`

`gen_table_w_metrics` = "generation table with metrics", meaning that it is the same as the first `jsonl` file in the lines/rows dimension, but contains more columns/features, such as perplexity.
If you run multiple hyperparameter combinations, we suggest storing each of the run directories with those output files within one enclosing directory such as `all_runs` to facilitate the next step.
Computing Metrics (`process_rows.py`)
.. and merging hyperparameter runs by concatenation.
After running a few combinations of hyperparameters (individual runs of the `run_watermarking.py` script), the result is a bunch of directories, each containing a file full of model outputs (`gen_table_w_metrics.jsonl`).
To prepare to analyze the performance of the watermark, we enrich each one of these generation sets with more metrics and derived features. The script that accomplishes this is `process_rows.py`; each prompt/output pair is considered a "row".
The script isn't fully command-line parameterized, but inside you can see that the main method looks into a directory (such as the `all_runs` suggested above) and collects all of the subdirectories that contain `gen_table_w_metrics.jsonl` files. Each set of generations is reloaded from `jsonl` into a huggingface `Dataset` object so that a metric computation function, `compute_bl_metrics`, can be applied to it.
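Schematically, that reload step looks like the following (the path is illustrative, following the `all_runs` layout suggested above; `compute_bl_metrics` itself lives in `watermark.py` and its exact arguments may differ):

```python
from datasets import load_dataset

# Reload one run's generations from jsonl into a huggingface Dataset.
ds = load_dataset(
    "json",
    data_files="all_runs/example_run/gen_table_w_metrics.jsonl",
    split="train",
)
print(ds.column_names)  # inspect the available columns/features

# The detection metrics are then added row by row, roughly:
#   ds = ds.map(compute_bl_metrics, ...)
```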
Applying `compute_bl_metrics` adds the critical fields like `w_bl_whitelist_fraction`, which represent the raw measurement of the watermark's presence. In the final analysis step, this is used to compute a z-score and perform the detection hypothesis test.
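For concreteness, a z-score of this kind can be computed from the whitelist fraction with a one-proportion z-test; this is a minimal sketch with illustrative argument names, not the exact code used in the analysis notebook:

```python
import math

def watermark_z_score(whitelist_fraction: float, num_tokens: int, gamma: float) -> float:
    """z-score for the null hypothesis "the text is not watermarked".

    whitelist_fraction: observed fraction of scored tokens in the white/green list
                        (e.g. the w_bl_whitelist_fraction column).
    num_tokens:         number of scored tokens in the generation.
    gamma:              expected whitelist fraction under the null
                        (for the run above with bl_proportion 0.5, this is 0.5).
    """
    observed_hits = whitelist_fraction * num_tokens
    expected_hits = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (observed_hits - expected_hits) / std

# Example: 200 scored tokens, 90% landed in the whitelist, null expectation 50%
print(watermark_z_score(0.9, 200, 0.5))  # ~11.3 -> strong evidence of the watermark
```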
Note: to clarify explicitly, `compute_bl_metrics` is therefore the old "detection" step of the pipeline. In this earlier version there was no dedicated class/subclass structure to share the logic of the watermark between a generation object and a detector object; it was simply located within the `score_sequence` function of the `watermark.py` file.
The final step in `process_rows.py` is a concatenation of these results. Each `gen_table_w_metrics.jsonl` from a hyperparameter run (within an `all_runs`) is transformed into a new dataset with the watermark detection measurement, and then all of these dataset objects are concatenated in the row dimension, forming one large dataset that has the generations and metrics from all of the different hyperparameter settings that were run.
This object is shaped like (rows, columns), where samples=rows and features=columns; for the paper it had a size of roughly (3e4, 25), since there were about 30 to 40 hyperparameter settings and between 500 and 1000 generations per setting. Huggingface datasets conveniently implements a `dataset.to_pandas()` function, which allows us to treat this result as a dataframe and slice and dice it however we like during the analysis phase.
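The concatenation itself follows the standard huggingface datasets pattern; a sketch, with the directory layout assumed to match the `all_runs` suggestion above:

```python
import os
from datasets import load_dataset, concatenate_datasets

run_dirs = [
    os.path.join("all_runs", d)
    for d in os.listdir("all_runs")
    if os.path.isfile(os.path.join("all_runs", d, "gen_table_w_metrics.jsonl"))
]

per_run = []
for run_dir in run_dirs:
    ds = load_dataset(
        "json",
        data_files=os.path.join(run_dir, "gen_table_w_metrics.jsonl"),
        split="train",
    )
    # ... apply compute_bl_metrics / add derived features here ...
    per_run.append(ds)

full = concatenate_datasets(per_run)  # stack all runs in the row dimension
df = full.to_pandas()                 # hand off to pandas for analysis
```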
Analysis
The result of the above steps is a fairly standard "data science" format, a `pandas.DataFrame`, and we suggest that you analyze it in whatever way you see fit. Since this part was very interactive and exploratory, there isn't a stable script version of this stage.
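Continuing from the `df` produced in the previous sketch, a typical slice might group by hyperparameters and summarize the detection measurement (the hyperparameter column names here are assumptions based on the cmdline arguments above):

```python
summary = (
    df.groupby(["bl_type", "bl_logit_bias", "bl_proportion"])["w_bl_whitelist_fraction"]
      .agg(["mean", "std", "count"])
)
print(summary)
```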
That said, the analysis code is in a notebook called `watermarking_analysis.ipynb`. Unfortunately, this notebook is monolithic. Pointers are included indicating which parts produce which figures; however, at this time there is no single run-all way to generate every chart and table from the paper.
A second notebook, `watermarking_example_finding.ipynb`, is solely for extracting some actual text prompts and outputs for tabulation in the paper.