A token is a symbol of the vocabulary of a language: a single atomic unit of the language.
The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize tokens.
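As a minimal sketch of this idea, the pattern below describes a typical identifier token as a regular expression (an assumed, common identifier rule, not the rule of any specific language); Python's `re` module compiles it into a matcher that accepts or rejects a candidate lexeme.

```python
import re

# A typical identifier token rule: a letter or underscore,
# followed by letters, digits, or underscores (assumed rule).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(s):
    """Return True if the whole string is a valid identifier token."""
    return IDENTIFIER.fullmatch(s) is not None

print(is_identifier("sum"))    # True
print(is_identifier("3sum"))   # False: cannot start with a digit
```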
The process of finding and categorizing tokens in an input stream is called “tokenizing” and is performed by a lexer (lexical analyzer), which parses the document down to the atomic elements of the language.
See also Natural Language - Token (Word|Term)
A token might be a keyword, an identifier, an operator, a literal, or a punctuation symbol.
Example:
Consider the following programming expression:
```
sum = 3 + 2;
```

It is tokenized in the following table:
| Lexeme | Lexeme type |
|---|---|
| sum | Identifier |
| = | Assignment operator |
| 3 | Integer literal |
| + | Addition operator |
| 2 | Integer literal |
| ; | End of statement |
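The tokenization above can be sketched with a small regular-expression-based lexer. The token names and patterns are assumptions chosen to match the table (a real lexer would cover many more token types and report errors on unmatched input).

```python
import re

# Assumed token set matching the example expression `sum = 3 + 2;`.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INTEGER",    r"\d+"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),   # whitespace separates tokens but is not one
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (lexeme, token type) pairs from the input stream."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.group(), match.lastgroup

print(list(tokenize("sum = 3 + 2;")))
# [('sum', 'IDENTIFIER'), ('=', 'ASSIGN'), ('3', 'INTEGER'),
#  ('+', 'PLUS'), ('2', 'INTEGER'), (';', 'SEMICOLON')]
```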
A token that has a name is called an identifier.

A symbol table is a table of all tokens that have a name (i.e. identifiers).
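A symbol table can be sketched as a mapping built while scanning the token stream: every identifier is recorded once, while unnamed tokens (operators, literals, separators) are skipped. The `(lexeme, type)` tuple shape and the attribute fields are illustrative assumptions.

```python
# A minimal sketch of a symbol table. Tokens are assumed to be
# (lexeme, lexeme type) pairs, as in the example table above.
def build_symbol_table(tokens):
    table = {}
    for lexeme, lexeme_type in tokens:
        # Only named tokens (identifiers) enter the symbol table.
        if lexeme_type == "Identifier" and lexeme not in table:
            table[lexeme] = {"name": lexeme}  # attributes added in later phases
    return table

tokens = [("sum", "Identifier"), ("=", "Assignment operator"),
          ("3", "Integer literal"), ("+", "Addition operator"),
          ("2", "Integer literal"), (";", "End of statement")]
print(build_symbol_table(tokens))   # only "sum" has a name
```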