Constructs a word state with a default idea of what characters are admissible inside a word (as described in the class comment).
Clears definitions of word chars.
Reads a word from the scanner and returns it as the tokenizer's next token.
A textual string to be tokenized.
A tokenizer class that controls the process.
The next token from the top of the stream.
Establishes the characters in the given range as valid within a word after its first character. Note that the tokenizer, not this state, determines which characters may begin a word.
First character index of the interval.
Last character index of the interval.
true if this state should use characters in the given range.
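A minimal sketch of how range-based word characters could be stored, assuming a flag table indexed by character code. The class name `WordCharTable`, its private field, and `isWordChar()` are hypothetical and used only for illustration; only `setWordChars()` and the idea of clearing word characters come from the documentation above.

```typescript
// Hypothetical sketch: names other than setWordChars are illustrative,
// not the library's own API.
class WordCharTable {
  // One flag per character code 0..255; true means "allowed inside a word".
  private readonly flags: boolean[] = [];

  constructor() {
    for (let i = 0; i < 256; i++) {
      this.flags.push(false);
    }
  }

  // Mark every character code in [from, to] as allowed or disallowed.
  setWordChars(from: number, to: number, enable: boolean): void {
    const lo = Math.max(0, from);
    const hi = Math.min(this.flags.length - 1, to);
    for (let code = lo; code <= hi; code++) {
      this.flags[code] = enable;
    }
  }

  // Forget every previously established range.
  clearWordChars(): void {
    for (let i = 0; i < this.flags.length; i++) {
      this.flags[i] = false;
    }
  }

  // Report whether a character may appear after the first character of a word.
  isWordChar(ch: string): boolean {
    const code = ch.charCodeAt(0);
    return code < this.flags.length && this.flags[code] === true;
  }
}

const table = new WordCharTable();
table.setWordChars("a".charCodeAt(0), "z".charCodeAt(0), true);
table.setWordChars("0".charCodeAt(0), "9".charCodeAt(0), true);
```

With these two calls, lowercase letters and digits are accepted inside a word, while uppercase letters are rejected until a further range is enabled.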
A wordState returns a word from a scanner. Like other states, a tokenizer transfers the job of reading to this state, depending on an initial character. Thus, the tokenizer decides which characters may begin a word, and this state determines which characters may appear as a second or later character in a word. These are typically different sets of characters; in particular, it is typical for digits to appear as parts of a word, but not as the initial character of a word.
By default, a standard set of character ranges may appear in a word, as well as the minus sign, underscore, and apostrophe. The method setWordChars() allows customizing this.
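The division of labor described above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual implementation: `isDefaultWordChar()` and `readWord()` are hypothetical helpers, and the default set is assumed here to be ASCII letters and digits plus the minus sign, underscore, and apostrophe.

```typescript
// Assumed default set for illustration: letters, digits, '-', '_' and '\''.
function isDefaultWordChar(ch: string): boolean {
  return /^[A-Za-z0-9_'-]$/.test(ch);
}

// Hypothetical helper: reads one word from `text`, beginning at `start`.
// The caller (the tokenizer) has already decided that text[start] may
// begin a word, so the first character is accepted without checking.
function readWord(text: string, start: number): { word: string; next: number } {
  let end = start + 1;
  // Consume second and later characters while they are valid word chars.
  while (end < text.length && isDefaultWordChar(text.charAt(end))) {
    end++;
  }
  return { word: text.substring(start, end), next: end };
}

const first = readWord("don't-stop x9", 0);
// first.word is "don't-stop" and first.next is 10 (the index of the space)

const second = readWord("don't-stop x9", 11);
// second.word is "x9": the digit is accepted as a later character
```

A digit is accepted here only because it follows a letter; whether a digit may start a word is left to the tokenizer that dispatches to this state.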