Colloquium: "Transformers and the Format of Thought"
Abstract: Transformers are an extraordinarily powerful computational architecture, applicable across a range of domains. They are, notably, the computational foundation of contemporary Large Language Models (LLMs). LLMs’ facility with language has led many to draw analogies between LLMs and human cognitive processing. This paper investigates the format of the residual stream in transformers. I first give a definition of representational format in terms of the operations and guarantees supported by a representation in a computational system. Then, drawing out the consequences of what seems like an innocuous step (the need for positional encoding of the input to LLMs), I argue that transformers are so broadly applicable precisely because they have so little built-in representational structure; their format is far weaker than what has been proposed for the format of cognition. This naturally raises questions about the need for structured representations and about what advantage, if any, structured representations might have over the mere representation of structure.
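
Since the argument turns on why positional encoding is needed at all, a minimal sketch may help: self-attention on its own is permutation-equivariant, so reordering the input tokens merely reorders the outputs, and order information must be injected into the input. The toy example below is my illustration, not material from the talk; the dimensions, random weights, and standard sinusoidal scheme are assumptions made for the demo.

```python
# Toy demo: self-attention without positional encoding is permutation-equivariant;
# adding (assumed, standard) sinusoidal positional encodings breaks that symmetry.
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 8, 5

# Random token embeddings and a random permutation of the sequence.
x = rng.normal(size=(seq_len, d_model))
perm = rng.permutation(seq_len)

# Toy projection matrices for queries, keys, and values.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(x):
    # Single-head scaled dot-product attention, no positions.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sinusoidal_positions(seq_len, d_model):
    # Standard sine/cosine positional encodings.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)
    return pe

# Without positions: permuting the input only permutes the output rows.
print(np.allclose(self_attention(x)[perm], self_attention(x[perm])))           # True

# With positions added to the input: the symmetry is broken, so order matters.
pe = sinusoidal_positions(seq_len, d_model)
print(np.allclose(self_attention(x + pe)[perm], self_attention(x[perm] + pe))) # False
```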