## [Stale/Deprecated] Experimental Pipeline Code

This subdirectory contains reproducibility artifacts for the experiments described in the paper. All code here is deprecated in favor of the implementation and demo in the root of the repository. In effect, the file `/watermark_processor.py` in the root of the repo is a clean, user-friendly reimplementation of the watermarking and detection logic from `watermark.py`. We suggest using the official release version over any code found in the `experiments` directory.

## Overview

Unless stated otherwise, all files discussed here are in the `experiments` directory. The `bl` naming convention across many variables and function definitions refers to "blacklist". Black/white was the original language used during development of the paper and was updated to green/red based on feedback from the community.

The implementation of the main experiments in the paper has two high-level steps:

- **(1) generate watermarked samples**
- **(2) compute metrics**

The code provided here implements these steps in the files `run_watermarking.py` and `process_rows.py`, with the core logic implemented in `watermark.py`, a single-file library. Generally speaking, the code implementing the watermark itself is a series of classes and functions based on the `LogitsProcessor` abstraction from [huggingface/transformers](https://github.com/huggingface/transformers), and the code that turns it into a workflow is based on the `dataset.map` functionality from [huggingface/datasets](https://github.com/huggingface/datasets).

The files `io_utils.py`, `submitit_utils.py`, and `launch.py` contain utilities for file operations (mostly `jsonl`) and for hyperparameter sweeping via jobs launched on our compute cluster (managed using [SLURM](https://slurm.schedmd.com/documentation.html)). The [`submitit`](https://github.com/facebookincubator/submitit) workflow tool is an extra dependency only required if using `launch.py`.

## Generation (`run_watermarking.py`)

`run_watermarking.py` is a command line script that:

1. loads a huggingface `dataset` that will be used to create text prompts for the language model
2. loads a huggingface language model that can perform text generation via `model.generate`, and prepares to call the generation method with a special `LogitsProcessor` that implements watermarking at the current hyperparameter values
3. composes a series of functions that are applied to the dataset via `map`, which preprocess and tokenize the prompt data and generate completions to it via the model
4. loads a second huggingface language model to be used as a perplexity "oracle" for evaluating the quality of the texts generated by the watermarked model
5. computes the teacher-forced loss (and perplexity) of the oracle model on the generated outputs
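To make the generation step concrete, here is a minimal sketch of how a watermarking `LogitsProcessor` plugs into `model.generate`. The class below is a simplified, hypothetical stand-in for the actual processor defined in `watermark.py` (which, among other things, re-derives the blacklist at every decoding step rather than using a fixed list); the model name, bias, and sampling settings mirror the example command further down.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)


class ToyBlacklistLogitsProcessor(LogitsProcessor):
    """Toy stand-in that penalizes a fixed set of "blacklisted" token ids.

    The real processor in watermark.py reseeds the blacklist at each step
    (e.g. from the previous token under --dynamic_seed markov_1).
    """

    def __init__(self, blacklist_ids, bl_logit_bias=2.0):
        self.blacklist_ids = blacklist_ids
        self.bl_logit_bias = bl_logit_bias

    def __call__(self, input_ids, scores):
        # "soft" blacklist: subtract a bias from the blacklisted logits
        scores[:, self.blacklist_ids] -= self.bl_logit_bias
        return scores


tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

inputs = tokenizer("The watermark works by", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    logits_processor=LogitsProcessorList(
        [ToyBlacklistLogitsProcessor(blacklist_ids=[42, 1337])]
    ),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```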
Here is an example of the argument set required to run a single (representative) hyperparameter combination from the paper:

```
python run_watermarking.py \
    --model_name facebook/opt-1.3b \
    --dataset_name c4 \
    --dataset_config_name realnewslike \
    --max_new_tokens 200 \
    --min_prompt_tokens 50 \
    --limit_indices 500 \
    --input_truncation_strategy completion_length \
    --input_filtering_strategy prompt_and_completion_length \
    --output_filtering_strategy max_new_tokens \
    --dynamic_seed markov_1 \
    --bl_proportion 0.5 \
    --bl_logit_bias 2.0 \
    --bl_type soft \
    --store_spike_ents True \
    --num_beams 1 \
    --use_sampling True \
    --sampling_temp 0.7 \
    --oracle_model_name facebook/opt-2.7b \
    --run_name example_run \
    --output_dir ./all_runs
```

The result of each run is a directory with three files in it:

- `gen_table_meta.json` (the hyperparameters passed from the cmdline)
- `gen_table.jsonl`
- `gen_table_w_metrics.jsonl`

`gen_table_w_metrics` = "generation table with metrics", meaning that it is the same as the first `jsonl` file in the lines/rows dimension but contains more columns/features, such as perplexity.

If you run multiple hyperparameter combinations, we suggest storing each of the run directories with those output files within one enclosing directory, such as `all_runs`, to facilitate the next step.

## Computing Metrics (`process_rows.py`)

... and merging hyperparameter runs by concatenation.

After running a few combinations of hyperparameters (individual runs of the `run_watermarking.py` script), the result is a collection of directories, each containing a file full of model outputs (`gen_table_w_metrics.jsonl`). To prepare to analyze the performance of the watermark, we enrich each one of these generation sets with more metrics and derived features.

The script that accomplishes this is `process_rows.py`; each prompt/output pair is considered a "row". The script isn't fully command line parameterized, but inside you can see that the main method looks into a directory (such as the `all_runs` suggested above) and collects all of the subdirectories that contain `gen_table_w_metrics.jsonl` files. Each set of generations is reloaded from `jsonl` into a huggingface `Dataset` object so that a metric computation function, `compute_bl_metrics`, can be applied to it. This adds the critical fields, like `w_bl_whitelist_fraction`, that represent the raw measurement of the watermark's presence. In the final analysis step, this is used to compute a z-score and perform the detection hypothesis test.

**_Note_**: to clarify explicitly, `compute_bl_metrics` is therefore the old "detection" step of the pipeline. In this earlier version, there was no dedicated class/subclass structure to share the watermark logic between a generation object and a detector object; it was simply located within the `score_sequence` function of the `watermark.py` file.

The final step in `process_rows.py` is a concatenation of these results. Each `gen_table_w_metrics.jsonl` from a hyperparameter run (within an `all_runs`) is transformed into a new dataset with the watermark detection measurement, and then all of these dataset objects are concatenated in the row dimension, forming one large dataset that has the generations and metrics from all of the different hyperparameter settings that were run.
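As a rough sketch of that flow (the helper `add_detection_metrics` below is a hypothetical placeholder for the real `compute_bl_metrics` call; take its exact signature and outputs from `process_rows.py`):

```python
from pathlib import Path

from datasets import Dataset, concatenate_datasets


def add_detection_metrics(row):
    # Placeholder for compute_bl_metrics from process_rows.py, which re-scores
    # each generation and adds fields such as w_bl_whitelist_fraction.
    row["w_bl_whitelist_fraction"] = float("nan")  # dummy value for the sketch
    return row


enriched = []
for run_dir in sorted(Path("./all_runs").iterdir()):
    jsonl_path = run_dir / "gen_table_w_metrics.jsonl"
    if not jsonl_path.is_file():
        continue
    ds = Dataset.from_json(str(jsonl_path))   # reload one run's generations
    enriched.append(ds.map(add_detection_metrics))

# concatenate along the row dimension: one big table across all runs
all_rows = concatenate_datasets(enriched)
```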
This object is shaped like (rows, columns), where samples = rows and features = columns; for the paper it had a size of roughly (3e4, 25), since there were about 30 to 40 hyperparameter settings and between 500 and 1000 generations per setting. Huggingface datasets conveniently implements a `dataset.to_pandas()` function, which allows us to treat this result as a dataframe and slice and dice it however we like during the analysis phase.

## Analysis

The result of the above steps is a fairly standard "data science" format, a `pandas.DataFrame`, and we suggest that you analyze it in whatever way you see fit. Since this part was very interactive and exploratory, there isn't a stable script version of this stage. That said, the analysis code is in a notebook called `watermarking_analysis.ipynb`. Unfortunately, this notebook is monolithic. Pointers have been indicated as to which parts produce which figures; however, at this time, there is no one-click/run-all way to generate every chart and table from the paper. A second notebook, `watermarking_example_finding.ipynb`, is solely for extracting some actual text prompts and outputs for tabulation in the paper.
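For concreteness, here is a minimal sketch of the kind of analysis done in that notebook: computing a per-row detection z-score (a one-proportion test on the whitelist hit rate) from the dataframe. Only `w_bl_whitelist_fraction` is a column named in this README; the token-count column and the `gamma` value below are assumptions to be checked against your own data.

```python
import numpy as np

# `all_rows` is the concatenated dataset from the previous sketch
df = all_rows.to_pandas()

gamma = 0.5                               # expected whitelist fraction (1 - bl_proportion)
T = df["num_generated_tokens"]            # hypothetical column: tokens scored per row
hits = df["w_bl_whitelist_fraction"] * T  # observed number of whitelist tokens

# one-proportion z-test against the "no watermark" null hypothesis
df["z_score"] = (hits - gamma * T) / np.sqrt(T * gamma * (1 - gamma))
print(df["z_score"].describe())
```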