Prompt changes, scoring tweaks, provider routing edits, and service-layer changes all silently affect documentation quality. Without a fixed benchmark, "I think this got better" is the entire ...