Alignment Whack-a-Mole :

Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Xinyue Liu

Niloofar Mireshghallah

Jane C. Ginsburg

Tuhin Chakrabarty

Paper Code

Experiment types

(1) Cross-author: the model is finetuned exclusively on books by Haruki Murakami, then tested on books by entirely different authors.

(2) Within-author: the model is finetuned on a subset of books by an author, then tested on a held-out book by the same author.

Metric: bmc@5 (Book Memorization Coverage at 5-grams)

The fraction of words in a test book B covered by at least one verbatim extracted span. For each book excerpt, 100 generations are sampled from the finetuned model. All contiguous spans of ≥ k (= 5) matching words between each generation and the entire book are identified. Spans that overlap with the prompt instruction are trimmed (with threshold m = 5). The remaining matched positions form a binary coverage mask over the book. The final score is:

bmc@k = ∑ covered / |B|

Citation

@misc{liu2026alignmentwhackamolefinetuning,
  title={Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models},
  author={Xinyue Liu and Niloofar Mireshghallah and Jane C. Ginsburg and Tuhin Chakrabarty},
  year={2026},
  eprint={2603.20957},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.20957}
}