I'm supposed to be making a parser for grammars, or basically parsing a simple language definition.
Basically I could be parsing either number or a symbol (+, *, (, ), or lambda, maybe lambda anyway, etc)
I want it to read in 123 as "123", not as 1, then 2, then 3 as three separate chars.
I want it only to advance the tokens thought based upon certain conditions. Not always.
I've started the actual code of this thing, though haven't gotten far.
import java.util.StringTokenizer; import java.io.*; import java.util.*; public class LL1Parser { private StringTokenizer tokenizer; private boolean valid; public LL1Parser(String input) { tokenizer = new StringTokenizer(input); } public boolean isValid(StringTokenizer st) { } public static void main(String[] args) { LL1Parser parser; try { parser = new LL1Parser(args[0]); } catch(NullPointerException npe) { System.out.println("args[0] doesn't exist!"); System.exit(0); } }
The pseudo code for the parsing is:
E(): switch(token) { case n: T(); E'(); build E -> TE'; break; case +: error; break; case *: error; break; case (: T(); E'(); build E ->TE'; break; case ): error; break; case $: error; break; } E'(): Switch(token) { case n: error; break; case +: get(+);T(); E'(); break; case *; error; break; case (: error; break; case ): build E' ->lambda; break; case $: build E' -> lambda; break; } T(): Switch(token) { case n: build T->FT'; break; case +: error; break; case *: error; break; case (: build T->FT'; break; case ): error; break; case $: error; break; } T'(): Switch(token) { case n: error; break; case +: build T' -> lambda; break; case *: get(*); F(); T'(); build T'-> *FT'; break; case (: error; break; case ): build T'->lambda; break; case $: build T'->lambda; break; } F(): Switch(token) { case n: get(n); build F->n; break; case +: error; break; case *: error; break; case (: get('(');E(); get(')'); build F->(E); break; case ): error; break; case $: error; break; }
Basically it's kind of recursive. I'm not so much concerned about what to write right now as how to work a Tokenizer.
If you're confused, which I have a strange feeling you are, I'm going to try and clarify:
It will read in a 1. As the original starting part is an E, it will go into the E part.
Since 1 is n where n is any integer less than 32767 it will go to the n case in the E part and will check out the
T(); E'(); part. Since it's going from left to right, it checks the T part first, and still has the same token as it hasn't removed it yet.
It goes to case n of the T part.
It tells it to go to FT' so now it's
F(); T'(); E'();
it checks out the F part first with the same token still.
It goes to case n and asks to get(n)
If this token WEREN'T a number, it would have something saying the grammar was incorrect.
As it is, this test of mine has it as a number so it removes that token from the input string and then checks out
T'(); E'();
The next token now is a +. It will check out the + case for T' which tells it to go to lambda, basically just getting rid of the T' but not changing the Token. Now it checks + in E'
get(+);T(); E'();
break;
which removes the + token and then has evaluates the next token, (, as in T.
so now it's
F(); T'(); E'();
get('(');E(); get(')')
The ( is in quotes so it won't get confused with the regular parenthesis.
This removes the ( and the checks
E(); get(')'); T'(); E'();
the next token happens to be ( so now it's
T(); E'(); get(')'); T'(); E'();
F(); T'(); E'(); get(')'); T'(); E'();
get('(');E(); get(')'); T'(); E'(); get(')'); T'(); E'();
so it removes the ( and goes to
E(); get(')'); T'(); E'(); get(')'); T'(); E'();
The next token is now a 23
T(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
F(); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
get(n); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
so remove the 23 and then + is the new token.
T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
It goes to lambda so now it's
E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
get(+);T(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
That removes the + and 2 is the next token.
F(); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
get(n); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
so now the 2 is gone and * is the next token.
T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
get; F(); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
So * is gone and 3 is the next token.
F(); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
get(n); T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
so now 3 is gone and ) is the new token.
T'(); E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
So T' goes to lambda and now it's
E'(); get(')'); T'(); E'(); get(')'); T'(); E'();
E' goes to lambda so it's
get(')'); T'(); E'(); get(')'); T'(); E'();
So now the ) is gone.
Now ) is the next token.
T'(); E'(); get(')'); T'(); E'();
T' goes to lambda so it's
E'(); get(')'); T'(); E'();
E' goes to lambda too so now it's
get(')'); T'(); E'();
so the ) is gone and now, say, ) is the next token. (This will ruin the grammar so I'll show a correct part later.)
T'(); E'();
Ok, I've messed up somewhere in there, but you get the point.
I don't know how to use a Tokenizer to do this.