Token
In AI and natural language processing (NLP), a token is the smallest unit into which text is split for processing. Depending on the tokenizer, a token may be a whole word, a symbol, a punctuation mark, or a part of a word (a subword). For example, given the English sentence "AI is amazing!", a model might split it into four tokens: "AI", " is", " amazing", and "!".

Because the token is the unit in which text is fed to an AI model, it is closely tied to the model's input limit (maximum token count) and to its pricing structure. For example, large language models (LLMs) such as ChatGPT and Claude count usage in tokens (for instance, "X tokens per message"), and there is also an upper limit on the number of tokens in the output.

Key concepts related to tokens:
• Tokenizer: A pre-processor that splits text into tokens
• Subword units: Breaking complex or unknown words into smaller pieces for greater flexibility
• Token limit: When the total number of input and output tokens exceeds the model's limit, errors or truncation occur

Note that "token" is also a term used in Web3 and security contexts, but as an AI term it specifically means **the basic unit of text processing**. In AI-powered applications, correctly understanding and managing token counts is essential for reasoning about processing accuracy, cost, and performance.
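
The three concepts above (tokenizer, subword units, token limit) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_tokenize`, `split_subwords`, and `fits_limit` are hypothetical helpers, the regular expression is a crude stand-in for a real tokenizer, and the vocabulary and limits are made-up values. Real LLM tokenizers use learned subword vocabularies and will generally split text differently.

```python
import re

# Toy pattern (assumption): a word with an optional leading space,
# or a single punctuation mark. Not how production tokenizers work.
_TOKEN_PATTERN = re.compile(r" ?\w+|[^\w\s]")


def toy_tokenize(text: str) -> list[str]:
    """Split text into word-like and punctuation tokens."""
    return _TOKEN_PATTERN.findall(text)


def split_subwords(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split of a word into pieces found in vocab.

    Characters that match no vocabulary entry become one-character pieces.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in vocab or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


def fits_limit(input_tokens: int, max_output_tokens: int, token_limit: int) -> bool:
    """Check that input plus reserved output tokens stay within the limit."""
    return input_tokens + max_output_tokens <= token_limit


tokens = toy_tokenize("AI is amazing!")
print(tokens)  # ['AI', ' is', ' amazing', '!']

# Hypothetical two-entry vocabulary, for illustration only.
print(split_subwords("tokenization", {"token", "ization"}))  # ['token', 'ization']

# Hypothetical limit of 128 tokens with 100 tokens reserved for output.
print(fits_limit(len(tokens), max_output_tokens=100, token_limit=128))  # True
```

In practice, token counts should come from the tokenizer that belongs to the model being used, since the same text can yield different counts under different tokenizers.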