[Stale/Deprecated] Experimental Pipeline Code
This subdirectory contains reproducibility artifacts for the experiments described in the paper. All code here is deprecated in favor of the implementation and demo in the root of the repository.
In effect, the file `watermark_processor.py` in the root of the repo is a clean, user-friendly reimplementation of the watermarking and detection logic from `watermark.py`. We suggest using the official release version over any code found in the `experiments` directory.
Overview
Unless stated otherwise, all files discussed here are in the `experiments` directory. The `bl` naming convention used across many variables and function definitions refers to "blacklist". Black/white was the original terminology used during development of the paper and was updated to green/red based on feedback from the community.
The implementation of the main experiments in the paper has two high-level steps:
- (1) generate watermarked samples
- (2) compute metrics
The code provided here implements these steps in the following files: `run_watermarking.py` and `process_rows.py`, where the core logic is implemented in `watermark.py`, a single-file library.
Generally speaking, the code implementing the watermark itself is a series of classes and functions based on the `LogitsProcessor` abstraction from huggingface/transformers, and the code that turns it into a workflow is based on the `dataset.map` functionality from huggingface/datasets.
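To make that pattern concrete, here is a minimal sketch of the idea, not the code in `watermark.py` or `watermark_processor.py`; the class name, argument names, and seeding scheme are illustrative only:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class GreenListLogitsProcessor(LogitsProcessor):
    """Toy soft watermark: boost a pseudorandom "green list" of tokens at each step.

    The list is re-seeded from the previous token (a "markov_1"-style dynamic seed).
    """
    def __init__(self, green_fraction: float = 0.5, bias: float = 2.0):
        self.green_fraction = green_fraction
        self.bias = bias

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        vocab_size = scores.shape[-1]
        for b in range(input_ids.shape[0]):
            gen = torch.Generator()
            gen.manual_seed(int(input_ids[b, -1].item()))  # seed from the previous token
            perm = torch.randperm(vocab_size, generator=gen)
            green_ids = perm[: int(self.green_fraction * vocab_size)].to(scores.device)
            scores[b, green_ids] += self.bias  # "soft" boost of green-list logits
        return scores

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
inputs = tokenizer("The watermark is added during decoding, so", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    logits_processor=LogitsProcessorList([GreenListLogitsProcessor()]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```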
The files `io_utils.py`, `submitit_utils.py` and `launch.py` contain utilities for file operations (mostly `jsonl`) and for hyperparameter sweeping via jobs launched on our compute cluster (managed using SLURM). The `submitit` workflow tool is an extra dependency only required if using `launch.py`.
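For context, the `jsonl` handling amounts to the usual one-object-per-line pattern; a minimal sketch with hypothetical function names (not necessarily those in `io_utils.py`):

```python
import json

def write_jsonl(path, rows):
    # one JSON object (one generation "row") per line
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def read_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```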
Generation (`run_watermarking.py`)
`run_watermarking.py` is a command-line script that:
- loads a huggingface `dataset` that will be used to create text prompts for the language model
- loads a huggingface language model that can perform text generation via `model.generate`, and prepares to call the generation method with a special `LogitsProcessor` that implements watermarking at the current hyperparameter values
- composes a series of functions that are applied to the dataset via `map`, which preprocess and tokenize the prompt data and generate completions to it via the model (see the sketch after this list)
- loads a second huggingface language model to be used as a perplexity "oracle" for evaluating the quality of the texts generated by the watermarked model
- computes the teacher-forced loss (and perplexity) of the oracle model on the generated outputs
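A rough sketch of that `map`-based generation step (helper names are illustrative; the watermark `LogitsProcessor` and the oracle perplexity pass are omitted here):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

def generate_completion(example):
    # Use the start of the source text as the prompt (the real script applies the
    # truncation/filtering strategies listed in the arguments below).
    inputs = tokenizer(example["text"], truncation=True, max_length=50, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
    completion = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return {"prompt": tokenizer.decode(inputs["input_ids"][0]), "completion": completion}

# c4/realnewslike as used by the script; hosted on the Hub as allenai/c4.
dataset = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)
dataset = dataset.map(generate_completion)  # a watermarked run would pass the LogitsProcessor to generate
```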
Here is an example of the argument set required to implement a single (representative) hyperparameter combination from the paper:
python run_watermarking.py \
--model_name facebook/opt-1.3b \
--dataset_name c4 \
--dataset_config_name realnewslike \
--max_new_tokens 200 \
--min_prompt_tokens 50 \
--limit_indices 500 \
--input_truncation_strategy completion_length \
--input_filtering_strategy prompt_and_completion_length \
--output_filtering_strategy max_new_tokens \
--dynamic_seed markov_1 \
--bl_proportion 0.5 \
--bl_logit_bias 2.0 \
--bl_type soft \
--store_spike_ents True \
--num_beams 1 \
--use_sampling True \
    --sampling_temp 0.7 \
--oracle_model_name facebook/opt-2.7b \
--run_name example_run \
    --output_dir ./all_runs
The result of each run is a directory with three files in it:
- `gen_table_meta.json` (hyperparameters passed from the cmdline)
- `gen_table.jsonl`
- `gen_table_w_metrics.jsonl`

`gen_table_w_metrics` = "generation table with metrics", meaning that it is the same as the first `jsonl` file in the lines/rows dimension, but contains more columns/features, such as perplexity.
If you run multiple hyperparameter combinations, we suggest storing each of the run directories with those output files within one enclosing directory such as `all_runs` to facilitate the next step.
Computing Metrics (`process_rows.py`)
.. and merging hyperparameter runs by concatenation.
After running a few combinations of hyperparameters (individual runs of the `run_watermarking.py` script), the result is a bunch of directories, each containing a file full of model outputs (`gen_table_w_metrics.jsonl`).
To prepare to analyze the performance of the watermark, we enrich each one of these generation sets with more metrics and derived features. The script that accomplishes this is `process_rows.py`; each prompt/output pair is considered a "row".
The script isn't fully command-line parameterized, but inside you can see that the main method looks into a directory (such as the `all_runs` suggested above) and collects all of the subdirectories that contain `gen_table_w_metrics.jsonl` files. Each set of generations is reloaded from `jsonl` into a huggingface `Dataset` object so that a metric computation function, `compute_bl_metrics`, can be applied to it.
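Schematically, that reload step looks like the following (the path is illustrative, following the `all_runs` layout suggested above; `compute_bl_metrics` itself lives in `watermark.py` and its exact arguments may differ):

```python
from datasets import load_dataset

# Reload one run's generations from jsonl into a huggingface Dataset.
ds = load_dataset(
    "json",
    data_files="all_runs/example_run/gen_table_w_metrics.jsonl",
    split="train",
)
print(ds.column_names)  # inspect the available columns/features

# The detection metrics are then added row by row, roughly:
#   ds = ds.map(compute_bl_metrics, ...)
```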
Applying `compute_bl_metrics` adds the critical fields like `w_bl_whitelist_fraction`, which represent the raw measurement of the watermark's presence. In the final analysis step, this is used to compute a z-score and perform the detection hypothesis test.
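For concreteness, a z-score of this kind can be computed from the whitelist fraction with a one-proportion z-test; this is a minimal sketch with illustrative argument names, not the exact code used in the analysis notebook:

```python
import math

def watermark_z_score(whitelist_fraction: float, num_tokens: int, gamma: float) -> float:
    """z-score for the null hypothesis "the text is not watermarked".

    whitelist_fraction: observed fraction of scored tokens in the white/green list
                        (e.g. the w_bl_whitelist_fraction column).
    num_tokens:         number of scored tokens in the generation.
    gamma:              expected whitelist fraction under the null
                        (for the run above with bl_proportion 0.5, this is 0.5).
    """
    observed_hits = whitelist_fraction * num_tokens
    expected_hits = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (observed_hits - expected_hits) / std

# Example: 200 scored tokens, 90% landed in the whitelist, null expectation 50%
print(watermark_z_score(0.9, 200, 0.5))  # ~11.3 -> strong evidence of the watermark
```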
Note: to clarify explicitly, `compute_bl_metrics` is therefore the old "detection" step of the pipeline. In this earlier version there was no dedicated class/subclass structure to share the logic of the watermark between a generation object and a detector object; it was simply located within the `score_sequence` function of the `watermark.py` file.
The final step in `process_rows.py` is a concatenation of these results. Each `gen_table_w_metrics.jsonl` from a hyperparameter run (within an `all_runs`) is transformed into a new dataset with the watermark detection measurement, and then all of these dataset objects are concatenated in the row dimension, forming one large dataset that has the generations and metrics from all of the different hyperparameter settings that were run.
This object is shaped like (rows, columns), where samples=rows and features=columns; for the paper it had a size of roughly (3e4, 25), since there were about 30 to 40 hyperparameter settings and between 500 and 1000 generations per setting. Huggingface datasets conveniently implements a `dataset.to_pandas()` function, which allows us to treat this result as a dataframe and slice and dice it however we like during the analysis phase.
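The concatenation itself follows the standard huggingface datasets pattern; a sketch, with the directory layout assumed to match the `all_runs` suggestion above:

```python
import os
from datasets import load_dataset, concatenate_datasets

run_dirs = [
    os.path.join("all_runs", d)
    for d in os.listdir("all_runs")
    if os.path.isfile(os.path.join("all_runs", d, "gen_table_w_metrics.jsonl"))
]

per_run = []
for run_dir in run_dirs:
    ds = load_dataset(
        "json",
        data_files=os.path.join(run_dir, "gen_table_w_metrics.jsonl"),
        split="train",
    )
    # ... apply compute_bl_metrics / add derived features here ...
    per_run.append(ds)

full = concatenate_datasets(per_run)  # stack all runs in the row dimension
df = full.to_pandas()                 # hand off to pandas for analysis
```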
Analysis
The result of the above steps is a fairly standard "data science" format, a `pandas.DataFrame`, and we suggest that you analyze it in whatever way you see fit. Since this part was very interactive and exploratory, there isn't a stable script version of this stage.
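Continuing from the `df` produced in the previous sketch, a typical slice might group by hyperparameters and summarize the detection measurement (the hyperparameter column names here are assumptions based on the cmdline arguments above):

```python
summary = (
    df.groupby(["bl_type", "bl_logit_bias", "bl_proportion"])["w_bl_whitelist_fraction"]
      .agg(["mean", "std", "count"])
)
print(summary)
```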
That said, the analysis code is in a notebook called `watermarking_analysis.ipynb`. Unfortunately, this notebook is monolithic. Pointers are included indicating which parts produce which figures; however, at this time there is no single run-all way to generate every chart and table from the paper.
A second notebook, `watermarking_example_finding.ipynb`, is solely for extracting some actual text prompts and outputs for tabulation in the paper.