Description
The GenericWordState class implements a WordState that reads a word from a scanner and returns it as a token.
Important points
- As with other states, the tokenizer hands the job of reading over to this state based on the initial character it encounters.
- This state determines which characters may appear as the second or a later character in a word. The set of valid initial characters (decided by the tokenizer) is typically different from this set. In particular, digits commonly appear inside a word but not as its first character.
- By default, the following characters may appear in a word (the SetWordChars() method allows customizing this):
  From   To
  'a'    'z'
  'A'    'Z'
  '0'    '9'
  as well as the minus sign, underscore, and apostrophe.
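The default set can be pictured as a list of inclusive rune ranges. The Go sketch below is illustrative only: the `runeRange` type, the `defaultWordChars` table, and the `isWordChar` function are hypothetical and not part of the library. It simply checks a rune against the default ranges listed above.

```go
package main

import "fmt"

// runeRange is a hypothetical helper type: an inclusive interval of runes.
type runeRange struct{ from, to rune }

// defaultWordChars mirrors the documented defaults: a-z, A-Z, 0-9,
// plus minus sign, underscore and apostrophe. This is a sketch, not the
// library's internal table.
var defaultWordChars = []runeRange{
	{'a', 'z'}, {'A', 'Z'}, {'0', '9'},
	{'-', '-'}, {'_', '_'}, {'\'', '\''},
}

// isWordChar reports whether r may appear after the first character of a word.
func isWordChar(r rune) bool {
	for _, rr := range defaultWordChars {
		if r >= rr.from && r <= rr.to {
			return true
		}
	}
	return false
}

func main() {
	// Print the classification of a few sample runes.
	for _, r := range "aZ9-_'. " {
		fmt.Printf("%q -> %v\n", r, isWordChar(r))
	}
}
```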
Constructors
NewGenericWordState
Constructs a word state with a default set of characters that are admissible inside a word (as described above).
NewGenericWordState() *GenericWordState
Methods
ClearWordChars
Clears all word-character definitions.
(c *GenericWordState) ClearWordChars()
NextToken
Reads a word from the scanner, starting with the current character and continuing while characters are valid word characters, and returns it as a word token.
(c *GenericWordState) NextToken(scanner IScanner, tokenizer ITokenizer) *Token
- scanner: IScanner - textual string to be tokenized.
- tokenizer: ITokenizer - tokenizer class that controls the process.
- returns: *Token - next token from the top of the stream.
SetWordChars
Establishes characters in the given range as valid characters for part of a word after the first character. Note that the tokenizer must determine which characters are valid as the beginning character of a word.
(c *GenericWordState) SetWordChars(fromSymbol rune, toSymbol rune, enable bool)
- fromSymbol: rune - first character index of the interval.
- toSymbol: rune - last character index of the interval.
- enable: bool - true if characters in the given range should be treated as word characters; false if they should not.
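SetWordChars and ClearWordChars can be modeled as edits to a table of rune ranges. In this hypothetical sketch (`wordChars` and `wordCharRule` are not library types), each SetWordChars call appends a rule, and lookups check the newest rule first so that a later call overrides an earlier one for overlapping ranges. That precedence is an assumption of the sketch, not documented library behavior.

```go
package main

import "fmt"

// wordCharRule records one SetWordChars call (hypothetical representation).
type wordCharRule struct {
	from, to rune
	enable   bool
}

// wordChars is a minimal sketch of the state's character table.
type wordChars struct {
	rules []wordCharRule
}

// SetWordChars marks the inclusive range [from, to] as valid (enable=true)
// or invalid (enable=false) word characters.
func (w *wordChars) SetWordChars(from, to rune, enable bool) {
	w.rules = append(w.rules, wordCharRule{from, to, enable})
}

// ClearWordChars removes every definition, so no rune is a word char.
func (w *wordChars) ClearWordChars() { w.rules = nil }

// IsWordChar checks rules from newest to oldest; the first match wins.
func (w *wordChars) IsWordChar(r rune) bool {
	for i := len(w.rules) - 1; i >= 0; i-- {
		rule := w.rules[i]
		if r >= rule.from && r <= rule.to {
			return rule.enable
		}
	}
	return false
}

func main() {
	var w wordChars
	w.SetWordChars('a', 'z', true)
	w.SetWordChars('x', 'z', false) // carve x..z back out
	fmt.Println(w.IsWordChar('a'), w.IsWordChar('y')) // true false
	w.ClearWordChars()
	fmt.Println(w.IsWordChar('a')) // false
}
```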