Alignment Whack-a-Mole:

Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Experiment types:

(1) Cross-author: the model is finetuned exclusively on books by Haruki Murakami, then tested on books by entirely different authors.

(2) Within-author: the model is finetuned on a subset of books by an author, then tested on a held-out book by the same author.

Metric — BMC@5 (Book Memorization Coverage at 5-grams):

The fraction of words in a test book B covered by at least one verbatim extracted span. For each book excerpt, 100 generations are sampled from the finetuned model. All contiguous spans of ≥ k (= 5) matching words between each generation and the entire book are identified. Spans that overlap with the prompt instruction are trimmed (with threshold m = 5). The remaining matched positions form a binary coverage mask over the book. The final score is:

BMC@k = ∑_i mask(i) / |B|, where mask is the binary coverage mask and |B| is the book length in words.
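The coverage computation above can be sketched as follows. This is a simplified illustration, not the project's actual implementation: the function name `bmc_at_k` is hypothetical, inputs are assumed to be whitespace-tokenized words, and the prompt-overlap trimming step (threshold m) is omitted. It relies on the fact that every maximal matching span of length ≥ k is exactly the union of its matching k-grams, so marking each matched k-gram's positions reproduces the span-level coverage mask.

```python
def bmc_at_k(book_words, generations, k=5):
    """Fraction of book word positions covered by a verbatim span of
    >= k consecutive words shared with at least one generation.
    (Sketch only: ignores prompt-overlap trimming with threshold m.)"""
    n = len(book_words)
    covered = [False] * n

    # Index every k-gram of the book by its starting positions.
    book_index = {}
    for i in range(n - k + 1):
        book_index.setdefault(tuple(book_words[i:i + k]), []).append(i)

    # Mark the positions of every book k-gram that appears verbatim
    # in any generation; the union of these equals the union of all
    # maximal matching spans of length >= k.
    for gen in generations:
        words = gen.split()
        for j in range(len(words) - k + 1):
            for start in book_index.get(tuple(words[j:j + k]), []):
                for pos in range(start, start + k):
                    covered[pos] = True

    return sum(covered) / n
```

For example, if a generation reproduces a 5-word stretch of an 8-word book verbatim, those 5 positions are marked and the score is 5/8.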

Click a model to view its colored book text below.
Highlighted = verbatim match (BMC@5, m = 5); underlined = the paragraph's longest matched span.

Book text shown for research purposes only. All books are copyrighted by their respective authors and publishers.

Generation text is truncated to memorized spans only; no full reproductions are displayed.