Metannoying”:

[W]hy do LLMs do [the tiresome “not X but Y” formulation]? Because of limitations on how models represent meaning. In vector space models, word meaning is defined by distributional context. Synonyms have high cosine similarity because they appear in similar sentences. Antonyms also have high cosine similarity, because they appear in identical sentences. “I like hot coffee” and “I like cold coffee” occupy the same distributional space. The models see that hot and cold are mathematically close. They do not inherently compute the oppositeness relation. One way to understand the “not X but Y” construction is as a workaround for the model’s inability to compute opposition the way humans do. By explicitly stating both the rejected term and the replacement, the model externalizes onto the page an operation it cannot perform internally.
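The distributional claim above can be seen in a toy sketch. It assumes the simplest possible count-based vectors (each word is represented by the words it shares a sentence with), not the learned dense embeddings a real LLM uses, and the corpus and the function names `context_vector` and `cosine` are made up for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: "hot" and "cold" appear in the same sentence frames.
corpus = [
    "i like hot coffee",
    "i like cold coffee",
    "she drinks hot tea",
    "she drinks cold tea",
    "the dog chased a ball",
]

def context_vector(target, sentences):
    """Count every other word that shares a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

hot, cold, dog = (context_vector(w, corpus) for w in ("hot", "cold", "dog"))

# The antonyms share all of their contexts, so they come out (near) identical.
print("hot vs cold:", cosine(hot, cold))
# An unrelated word shares no contexts with "hot" in this corpus.
print("hot vs dog: ", cosine(hot, dog))
```

In this sketch the antonym pair scores as high as any synonym pair could, because nothing in the counts records that hot and cold exclude each other. That is the gap the passage describes.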

The “corrective contrast” construction reduces ambiguity in the output space. Users want clarity. To the LLM, “not X but Y” is an insurance policy on clarity.

Luckily for the designers of LLMs, corrective contrast also sounds cool, memorable, and often profound, at least in moderation.

Classical rhetoric had a name for the deliberate version: metanoia, or correctio, the performed self-correction where a speaker revises mid-sentence to find the more precise or more forceful formulation. When Brutus tells the Roman crowd “Not that I loved Caesar less, but that I loved Rome more,” the audience holds “loved Caesar less” in mind, then suppresses it to receive the reframe. The delay and the cognitive cost are the point. Shakespeare knows the negated proposition will linger as a kind of understatement that makes the correction feel like an escalation.

But LLMs are not Shakespeare (yet). There’s no rhetorical reason for the construction and, worse, there’s no limiting function, which is why you can get “not X but Y” every other paragraph. LLMs are corrective-contrast-maxxing for maximum comprehension across the widest possible readership.

The more skilled you are as a reader, the more this construction costs you.

Psycholinguistic research on negation shows that when readers encounter a negated proposition, they first simulate the affirmed state of affairs (for a phrase such as “not a lack of resources,” that means building a model of “lack of resources”) before constructing a mental model of the actual, negated situation.

[T]he reader is paying twice: once to suppress, once to rebuild.

There’s a social cost on top of the cognitive cost. Linguists use the term “K− position” (read “K-minus”) to describe the status of presupposed ignorance, the stance a speaker assigns to a listener who is assumed not to know something. Every instance of “not X but Y” places the reader in the K− position. The construction implies the reader was holding X and needed correction. When you, the reader, were not holding X, you feel like you’re being talked down to.

The more trust, the more the prose rewards fast processors. The less trust, the more it feels like being walked through eighth-grade math when you’re ready for calculus.