AbstractTokenizer

Defines a general tokenizer.

Implements: ITokenizer

Description

The AbstractTokenizer class defines a general tokenizer and serves as the base class for concrete tokenizer implementations.

Fields

_last_token_type

Last token type

_last_token_type: TokenType = TokenType.Unknown

_next_token

Next token

_next_token: Token

_scanner

Scanner

_scanner: IScanner

comment_state

Comment state

comment_state: ICommentState

decode_strings

Boolean that defines the option to decode strings.

decode_strings: bool

merge_whitespaces

Boolean that defines the option to merge white spaces.

merge_whitespaces: bool

number_state

Number state

number_state: INumberState

quote_state

Quote state

quote_state: IQuoteState

skip_comments

Boolean that defines the option to skip comments.

skip_comments: bool

skip_eof

Boolean that defines the option to skip the end-of-file (EOF) token.

skip_eof: bool

skip_unknown

Boolean that defines the option to skip unknown tokens.

skip_unknown: bool

skip_whitespaces

Boolean that defines the option to skip white spaces.

skip_whitespaces: bool

symbol_state

Symbol state

symbol_state: ISymbolState

unify_numbers

Boolean that defines the option to unify numbers.

unify_numbers: bool

whitespace_state

White space state.

whitespace_state: IWhitespaceState

word_state

Word state.

word_state: IWordState
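
Taken together, the state fields plug in the handling logic for each character class, while the boolean fields control post-processing of the produced token stream. A minimal configuration sketch, assuming tokenizer is an instance of some concrete AbstractTokenizer subclass (AbstractTokenizer itself is not instantiated directly):

    # tokenizer: instance of a concrete AbstractTokenizer subclass (assumed)
    tokenizer.skip_whitespaces = True     # drop whitespace tokens
    tokenizer.skip_comments = True        # drop comment tokens
    tokenizer.skip_eof = True             # drop the end-of-file token
    tokenizer.skip_unknown = False        # keep tokens of unknown type
    tokenizer.merge_whitespaces = True    # merge adjacent whitespace tokens
    tokenizer.unify_numbers = True        # unify number tokens
    tokenizer.decode_strings = False      # leave quoted strings as written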

Properties

scanner

Gets or sets the scanner to be tokenized.

scanner(): IScanner

scanner(value: IScanner)

Instance methods

clear_character_states

Clears all character states.

clear_character_states()

get_character_state

Gets the state for a given character.

get_character_state(symbol: int): ITokenizerState
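
A brief sketch of a state lookup (tokenizer again stands for a concrete subclass instance; the exact state class returned depends on how the subclass configured its character table):

    # Look up the state that handles the double-quote character (code 34).
    state = tokenizer.get_character_state(ord('"'))
    print(type(state).__name__)   # typically an IQuoteState implementation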

has_next_token

Finds out if the tokenizer has a next token.

has_next_token(): bool

  • returns: bool - true if it has a next token, false otherwise.

next_token

Gets the next token.

next_token(): Token

  • returns: Token - next token
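
has_next_token and next_token support a pull-style reading loop. A sketch, assuming tokenizer is a concrete subclass instance and scanner is some IScanner over the input (both assumptions):

    # Pull tokens one at a time until the input is exhausted.
    tokenizer.scanner = scanner           # scanner: some IScanner over the input
    while tokenizer.has_next_token():
        token = tokenizer.next_token()
        print(token.type, token.value)    # assumes Token exposes type and value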

read_next_token

Reads the next token.

read_next_token(): Token

  • returns: Token - next token

set_character_state

Sets the state for a range of characters.

set_character_state(from_symbol: int, to_symbol: int, state: ITokenizerState)

  • from_symbol: int - first symbol
  • to_symbol: int - last symbol
  • state: ITokenizerState - tokenizer state
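
For example, calling code (or a subclass) can remap a character range to one of the state objects already held by the tokenizer. A sketch that routes the underscore character to the word state (assuming a concrete tokenizer instance):

    # Treat '_' as a word character by mapping it to the existing word state.
    tokenizer.set_character_state(ord('_'), ord('_'), tokenizer.word_state)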

tokenize_buffer

Tokenizes a string buffer into a list of tokens.

tokenize_buffer(buffer: str): List[Token]

  • buffer: str - buffer
  • returns: List[Token] - list of tokens
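
A short sketch of one-shot tokenization of an in-memory string (tokenizer is a concrete subclass instance, an assumption):

    tokens = tokenizer.tokenize_buffer("price = 10.5 // comment")
    for token in tokens:
        print(token.type, token.value)    # assumes Token exposes type and value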

tokenize_buffer_to_strings

Tokenizes a string buffer into a list of token values.

tokenize_buffer_to_strings(buffer: str): List[str]

  • buffer: str - buffer
  • returns: List[str] - list of token values
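
The same call, but when only the raw token values are needed:

    values = tokenizer.tokenize_buffer_to_strings("price = 10.5")
    print(values)    # e.g. ['price', '=', '10.5'] when whitespaces are skipped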

tokenize_stream

Tokenizes data read from a scanner into a list of tokens.

tokenize_stream(scanner: IScanner): List[Token]

  • scanner: IScanner - scanner
  • returns: List[Token] - list of tokens
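
tokenize_stream is the scanner-based counterpart of tokenize_buffer. A sketch, where StringScanner stands for some IScanner implementation over an in-memory string (the class name is an assumption; any IScanner works):

    scanner = StringScanner("1 + 2")              # assumed IScanner implementation
    tokens = tokenizer.tokenize_stream(scanner)   # tokenizer: concrete subclass instance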

tokenize_stream_to_strings

Tokenizes data read from a scanner into a list of token values.

tokenize_stream_to_strings(scanner: IScanner): List[str]

  • scanner: IScanner - scanner
  • returns: List[str] - list of token values