The GenericSymbolState class allows you to add multi-character symbols and obtain a symbol token from a scanner.
Description
The GenericSymbolState class allows you to add multi-character symbols and obtain a symbol token from a scanner.
Important points
- The idea of a symbol is a character that stands on its own, such as an ampersand or a parenthesis.
- For example, when tokenizing the expression (isReady)& (isWilling), a typical tokenizer would return 7 tokens, including one for each parenthesis and one for the ampersand. Thus a series of symbols such as )&( becomes three tokens, while a series of letters such as isReady becomes a single word token.
- Multi-character symbols are an exception to the rule that a symbol is a standalone character.
- For example, a tokenizer may want less-than-or-equals to tokenize as a single token. This class provides a method for establishing which multi-character symbols an object of this class should treat as single symbols. This allows, for example, “cat <= dog” to tokenize as three tokens, rather than splitting the less-than and equals symbols into separate tokens.
- By default, this state recognizes the following multi-character symbols: !=, :-, <=, >=
Constructors
NewGenericSymbolState
Creates new instance of the component
NewGenericSymbolState() *GenericSymbolState
Methods
Add
Adds a multi-character symbol.
(c *GenericSymbolState) Add(value string, tokenType int)
- value: string - symbol to add, such as "=:="
- tokenType: int - type of token (TokenType)
NextToken
Returns a symbol token from a scanner.
(c *GenericSymbolState) NextToken(scanner IScanner, tokenizer ITokenizer) *Token
- scanner: IScanner - text string to be tokenized.
- tokenizer: ITokenizer - tokenizer class that controls the process.
- returns: *Token - next token from the top of the stream.