Corpus
A large, structured collection of text used for linguistic research or model training. A training corpus might include web pages, books, and code. The composition and quality of the corpus strongly shape a language model's capabilities and biases, because the model learns its vocabulary, knowledge, and style from that text.
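
As a rough illustration of what "composition" means in practice, the sketch below measures what share of a corpus comes from each source type. The toy documents, the source labels, and the `composition` helper are all hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

# Hypothetical toy corpus: (source_type, text) pairs invented for illustration.
corpus = [
    ("web", "Language models learn from text."),
    ("book", "The old house stood at the end of the lane."),
    ("code", "def add(a, b): return a + b"),
    ("web", "Corpus quality matters."),
]


def composition(docs):
    """Return each source type's share of whitespace-separated tokens."""
    counts = Counter()
    for source, text in docs:
        # Assumption: splitting on whitespace approximates token counts.
        counts[source] += len(text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


shares = composition(corpus)
for source, share in sorted(shares.items()):
    print(f"{source}: {share:.0%}")
```

A model trained on this mix would see more book text than code, which is one simple way the makeup of a corpus can tilt what a model learns.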