OK, I think I have finished the tokenizer class. I don't think it's perfect, far from it, so I thought I would post it here; if anyone has any input, feel free to give it. It is meant to be part of a simple scripting language I am making. The tokenizer divides a source file into a stream of tokens, which can then be used to build an abstract syntax tree. I have some issues of my own with the current implementation. For example, I use some static inner classes and then create static instances of them, which feels odd (they are stateless, so I figured it's better to have one shared instance than to create a new one every time one is needed, which is a lot). Also, I am rather terrible at comments.
package growse.parser;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.NoSuchElementException;
import java.util.Objects;
/**
* Used to read a series of tokens from a reader
*
* @author anders
*/
class Tokenizer {
private Reader in;
private Token tok;
private int currentCh;
private int lineNum;
/**
* Constructor, accepts a reader object
*
* @param in the reader object used as input
*/
public Tokenizer(BufferedReader in) throws IOException {
Objects.requireNonNull(in);
this.in = in;
lineNum = 1;
nextCoI(true);
}
/**
* Moves on to the next token and returns whether the operation
* was successful
*
* @return true if there was another token, false at end of input
*/
public boolean next() throws IOException, IllegalCharException {
// Get the next character
ignoreCoW(false);
if (currentCh == EOF) {
tok = null;
return false;
}
// Get the next token
if (CharUtils.isWord(currentCh)) {
String word = buildString(WORD_COND);
TokenType tokType = Keywords.isKeyword(word) ? TokenType.KEYWORD : TokenType.IDENTIFIER;
tok = new Token(tokType, word, lineNum);
}
else if (CharUtils.isNumeric(currentCh)) {
tok = new Token(TokenType.NUMBER, buildString(NUMBER_COND), lineNum);
}
else if (CharUtils.isOperator(currentCh)) {
tok = new Token(TokenType.OPERATOR, buildString(new OperatorCondition(currentCh)), lineNum);
}
else if (CharUtils.isPunctuation(currentCh)) {
tok = new Token(TokenType.PUNCTUATION, String.valueOf((char)currentCh), lineNum);
if (currentCh == '\n')
nextCoI(true);
else
nextCoI(false);
}
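else if (currentCh == '"') {
// Skip the opening quote
nextChar();
// An empty literal ("") or end of input right after the quote has no characters to collect
String text = (currentCh == '"' || currentCh == EOF) ? "" : buildString(STRING_COND);
tok = new Token(TokenType.STRING, text, lineNum);
// Skip the closing quote
nextChar();
}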
else {
throw new IllegalCharException((char)currentCh);
}
// Returns that there are more tokens
return true;
}
/**
* Returns the current token
*
* @return the current token
*/
public Token value() {
if (tok == null)
throw new NoSuchElementException("There are no more tokens");
return tok;
}
/**
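* Builds a string starting at the current character and continuing
* for as long as the given condition accepts the following characters
*
* @param cond the condition used to check where the string should end
* @return the string that was built
* @throws IOException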
*/
private String buildString(Condition cond) throws IOException {
StringBuilder word = new StringBuilder();
word.append((char)currentCh);
nextChar();
while (cond.accept(currentCh)) {
word.append((char)currentCh);
nextChar();
}
return word.toString();
}
/**
* Moves to the next char of interest, i.e. the next character that is
* neither whitespace nor part of a comment. The character is stored in
* currentCh, which is -1 if the end of file was encountered.
*
* @param ignoreNewln whether newlines should be skipped along with other whitespace
* @throws IOException
*/
private void nextCoI(boolean ignoreNewln) throws IOException {
nextChar();
ignoreCoW(ignoreNewln);
}
/**
* Called to ignore comments and whitespace characters
*
* @param ignoreNewln whether newlines should be skipped along with other whitespace
* @throws IOException
*/
private void ignoreCoW(boolean ignoreNewln) throws IOException {
boolean loop = true;
while (loop) {
if (currentCh == '#') {
while (currentCh == '#') {
nextChar();
while (currentCh != '\n' && currentCh != EOF)
nextChar();
}
}
else if (CharUtils.isWhitespace(currentCh) || (ignoreNewln && currentCh == '\n')) {
nextChar();
}
else {
loop = false;
}
}
}
/**
* Moves on to the next character and stores it in currentCh
*
* @throws IOException
*/
private void nextChar() throws IOException {
currentCh = in.read();
if (currentCh == '\n')
++lineNum;
}
/**
* =====================
* === INNER CLASSES ===
* =====================
*/
// Used to build a word
private static class WordCondition implements Condition {
@Override
public boolean accept(int codePoint) {
return CharUtils.isWord(codePoint) || CharUtils.isNumeric(codePoint) || codePoint == '_';
}
}
private static final Condition WORD_COND = new WordCondition();
// Used to build a number
private static class NumberCondition implements Condition {
@Override
public boolean accept(int codePoint) {
return CharUtils.isNumeric(codePoint) || codePoint == '_';
}
}
private static final Condition NUMBER_COND = new NumberCondition();
// Used to build an operator
private static class OperatorCondition implements Condition {
int count;
int firstCh;
public OperatorCondition(int codePoint) {
count = 0;
firstCh = codePoint;
}
@Override
public boolean accept(int codePoint) {
++count;
return (count < 2) && codePoint == '=';
}
}
// Used to build a string literal
private static class StringCondition implements Condition {
@Override
public boolean accept(int codePoint) {
// Also stop at end of input, otherwise an unterminated string never ends
return codePoint != '"' && codePoint != EOF;
}
}
private static final Condition STRING_COND = new StringCondition();
/**
* =================
* === CONSTANTS ===
* =================
*/
private static final int EOF = -1;
}
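On the stateless inner classes: one option, assuming Java 8 or later and that Condition has a single accept(int) method (as the inner classes above suggest), is to write the shared conditions as lambdas held in constants. This is only a sketch. The Condition interface and the Conditions class below are stand-ins I made up for illustration, and Character.isLetter/Character.isDigit stand in for the CharUtils checks, which are not shown in this post.

```java
// Stand-in for the tokenizer's Condition interface (assumed shape).
interface Condition {
    boolean accept(int codePoint);
}

// Hypothetical holder class, used here only for illustration.
class Conditions {
    static final int EOF = -1;

    // Stateless conditions can be shared constants, no named class needed.
    // Character.isLetter/isDigit stand in for CharUtils.isWord/isNumeric.
    static final Condition WORD =
            cp -> Character.isLetter(cp) || Character.isDigit(cp) || cp == '_';
    static final Condition NUMBER =
            cp -> Character.isDigit(cp) || cp == '_';
    // Stops at the closing quote or at end of input.
    static final Condition STRING =
            cp -> cp != '"' && cp != EOF;

    // The operator condition keeps a counter, so it is created per use:
    // it accepts at most one '=' after the first operator character.
    static Condition operator() {
        int[] count = {0};
        return cp -> ++count[0] < 2 && cp == '=';
    }
}
```

The stateful operator condition still needs a fresh instance each time, which is why it is a factory method here rather than a constant.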
Originally posted by bgroenks96:
How you go about handling this task depends on what the goal is. What do you want to do with the parsed information? Write it to another file?
If you want to keep it in memory, I recommend StringBuilder (java.lang.StringBuilder)
I want to keep it in memory. I am creating a primitive scripting language (for training purposes; it's nothing serious). I guess StringBuilder is the way to go for that.
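A minimal sketch of the StringBuilder approach, assuming the whole source is small enough to hold in memory. The class name SourceLoader and the method readAll are made up for this example; a StringReader is used in main so the snippet runs without a file on disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the tokenizer above.
class SourceLoader {
    // Reads everything from the reader into a StringBuilder and
    // returns the collected text as a String.
    static String readAll(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        // read(char[]) returns the number of chars read, or -1 at end of input
        while ((n = in.read(buffer)) != -1) {
            text.append(buffer, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String source = readAll(new StringReader("x = 1\ny = 2\n"));
        System.out.print(source);
    }
}
```

For a real file, the same method should work with a FileReader or BufferedReader passed in instead of the StringReader.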
Originally posted by piulitza:
There is one more way to read from a file: using the Scanner class. You can use this constructor:
Scanner in = new Scanner (new File(url));
After that you can read Strings, ints, or doubles from the file, just as you would when using this class to read from the keyboard. The class also has a hasNext() method, which checks whether there is more input to read. I find this the easiest way to read from a .txt file.
I can check it out, but I am not sure I will use it, since I may need more in-depth control over how things are read from the file (and because I kind of like doing things the hard way :p).
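For reference, the Scanner approach suggested above might look roughly like this. It is a sketch only: the class name ScannerDemo and the method words are made up, and the Scanner reads from a String here so the snippet runs without a file. With new Scanner(new File(path)) the constructor can throw FileNotFoundException, which would need handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, used only to illustrate the suggestion.
class ScannerDemo {
    // Splits the input on whitespace (Scanner's default delimiter)
    // and returns the pieces in the order they were read.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(input)) {
            while (in.hasNext()) {
                result.add(in.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("let x = 42"));
        // Without spaces the whole expression comes back as one piece
        System.out.println(words("x=42"));
    }
}
```

Because the default delimiter is whitespace, something like x=42 comes back as a single piece, which is part of why a hand-written tokenizer gives more control for a scripting language.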