
Thread: What is the best way to read from a text file?

  1. #1
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default What is the best way to read from a text file?

    Hi all.

    I have been playing around with parsing text files. Basically, what I want to do is create a tokenizer (see "Lexical analysis" on Wikipedia), but I am unsure what the best way to do it is. What I am doing now is reading one character at a time and building the tokens from that, but I am not sure that is very efficient. Does anyone have an idea? What is the best way to do this for a beginner? This is something I am quite new to.

    Also, I have been wanting to try the java.nio package. Would this be a good use of it?

    And if you are wondering about code: mine is so far from functional that it wouldn't serve any purpose to post it. Quite literally, it does nothing at all.

    Take care,
    Kerr


  2. #2
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,424
    My Mood
    Hungover
    Thanks
    144
    Thanked 636 Times in 540 Posts

    Default Re: What is the best way to read from a text file?

    Useful links: How to Ask Questions the Smart Way | Use Code Tags | Java Tutorials
    Static Void Games - Play indie games, learn from game tutorials and source code, upload your own games!

  3. #3
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    To clarify, though, I didn't mean that I am new to IO, but to parsing text files. I am really not sure how to do it (and I am of course talking about the IO part of parsing a text file). But looking at that link, I think it will still help me. Thanks! I will return if I have more questions.

  4. #4
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,162
    Thanks
    65
    Thanked 2,725 Times in 2,675 Posts

    Default Re: What is the best way to read from a text file?

    "the IO part of parsing a text file"
    Do you mean you want to use the file I/O to move around in the text on disk vs reading the text into a buffer/array/String and moving around in the text in memory?

  5. #5
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    The second one, I guess. Let me take an example. If the first character the tokenizer finds is a letter, it will create a string containing that letter and all the letters that follow. When it encounters something that is not a letter, it will stop building the string and return it. That string is a token. The next time the tokenizer is called, it will continue from where it stopped. This time it may encounter a digit, so it will create a new string from that digit and all the digits that directly follow it, until it encounters a character that is not a digit. And so on, until the end of the file is reached.

    Just not sure what the best way to do something like this is. Or if this is the best approach to the solution.
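The approach described above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name `RunTokenizer`, its methods, and the choice to skip whitespace and to return any other character as a one-character token are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits input into runs of letters, runs of digits,
// and single other characters. Whitespace is skipped (an assumption).
final class RunTokenizer {
    private final Reader in;
    private int current; // one character of lookahead, or -1 at end of input

    RunTokenizer(Reader in) throws IOException {
        this.in = in;
        this.current = in.read();
    }

    /** Returns the next token, or null once the input is exhausted. */
    String nextToken() throws IOException {
        while (current != -1 && Character.isWhitespace(current)) {
            current = in.read();
        }
        if (current == -1) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        if (Character.isLetter(current)) {
            // Collect the letter and every letter that directly follows it
            while (current != -1 && Character.isLetter(current)) {
                sb.append((char) current);
                current = in.read();
            }
        } else if (Character.isDigit(current)) {
            // Collect the digit and every digit that directly follows it
            while (current != -1 && Character.isDigit(current)) {
                sb.append((char) current);
                current = in.read();
            }
        } else {
            // Anything else becomes a one-character token
            sb.append((char) current);
            current = in.read();
        }
        return sb.toString();
    }

    /** Convenience helper for trying the tokenizer on an in-memory string. */
    static List<String> tokenize(String text) {
        try {
            RunTokenizer t = new RunTokenizer(new BufferedReader(new StringReader(text)));
            List<String> tokens = new ArrayList<>();
            for (String tok = t.nextToken(); tok != null; tok = t.nextToken()) {
                tokens.add(tok);
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the tokenizer keeps exactly one character of lookahead in `current`, each call to `nextToken()` continues from where the previous one stopped, and nothing has to be pushed back into the reader.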

  6. #6
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,162
    Thanks
    65
    Thanked 2,725 Times in 2,675 Posts

    Default Re: What is the best way to read from a text file?

    That sounds like a reasonable approach.
    The String class has methods for looking at the contents of a String and for extracting parts of the String.

  7. #7
    Think of me.... Mr.777's Avatar
    Join Date
    Mar 2011
    Location
    Pakistan
    Posts
    1,136
    My Mood
    Grumpy
    Thanks
    20
    Thanked 82 Times in 78 Posts
    Blog Entries
    1

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Kerr View Post
    The second one, I guess. Let me take an example. If the first character the tokenizer finds is a letter, it will create a string containing that letter and all the letters that follow. When it encounters something that is not a letter, it will stop building the string and return it. That string is a token. The next time the tokenizer is called, it will continue from where it stopped. This time it may encounter a digit, so it will create a new string from that digit and all the digits that directly follow it, until it encounters a character that is not a digit. And so on, until the end of the file is reached.

    Just not sure what the best way to do something like this is. Or if this is the best approach to the solution.
    What's your approach? What constraints do you need to follow? How developers think about this varies.

  8. #8
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Norm View Post
    That sounds like a reasonable approach.
    The String class has methods for looking at the contents of a String and for extracting parts of the String.
    I suppose I could just read the entire file into a string and then move through it. Would that be better than using a BufferedReader (like I am now)?
    Last edited by Kerr; December 26th, 2011 at 08:29 AM.

  9. #9
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Mr.777 View Post
    What's your approach? What constraints do you need to follow? How developers think about this varies.
    I am not sure, hence the question. I am quite new to this, and my way of learning tends to be to jump in head first and see what works and what does not. So you could say that this is just one giant training exercise for me :p.

    The closest thing to an approach is the one I describe above, though.

  10. #10
    Think of me.... Mr.777's Avatar
    Join Date
    Mar 2011
    Location
    Pakistan
    Posts
    1,136
    My Mood
    Grumpy
    Thanks
    20
    Thanked 82 Times in 78 Posts
    Blog Entries
    1

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Kerr View Post
    I suppose I could just read the entire file into a string and then move through it. Would that be better than using a BufferedReader (like I am now)?
    Well, reading the entire file into a String can be quite an expensive operation. Suppose the file is 600 MB (I know that is huge, but just suppose).

  11. #11
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Mr.777 View Post
    Well, reading the entire file into a String can be quite an expensive operation. Suppose the file is 600 MB (I know that is huge, but just suppose).
    I know, which is one of the reasons I hesitate to do that. I don't think it is too much of a problem, since this is not a serious project, but I prefer to do it in a good way. At the moment I am using a BufferedReader, and I think I may stick with that.

  12. #12
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,162
    Thanks
    65
    Thanked 2,725 Times in 2,675 Posts

    Default Re: What is the best way to read from a text file?

    You can do some buffering yourself. Just as the getToken method moves token by token through a String you have already read, you can call the BufferedReader methods to fetch the next line once the current String in memory has been used up, and so work through the file line by line.
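A rough sketch of that idea, under some assumptions: the class name `LineBufferedTokenizer` is made up for illustration, and tokens are simply whitespace-separated here so that the refill logic stays visible. `getToken()` walks through the line held in memory and only calls `readLine()` when that line has been used up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keeps one line in memory and refills it on demand.
final class LineBufferedTokenizer {
    private final BufferedReader in;
    private String line = ""; // current line; null once the file is exhausted
    private int pos = 0;      // position of the next unread character in line

    LineBufferedTokenizer(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next whitespace-separated token, or null at end of input. */
    String getToken() throws IOException {
        while (line != null) {
            // Skip whitespace inside the current line
            while (pos < line.length() && Character.isWhitespace(line.charAt(pos))) {
                pos++;
            }
            if (pos < line.length()) {
                int start = pos;
                while (pos < line.length() && !Character.isWhitespace(line.charAt(pos))) {
                    pos++;
                }
                return line.substring(start, pos);
            }
            // The current line is used up: fetch the next one
            line = in.readLine();
            pos = 0;
        }
        return null;
    }

    /** Convenience helper for trying the tokenizer on an in-memory string. */
    static List<String> tokens(String text) {
        try {
            LineBufferedTokenizer t =
                    new LineBufferedTokenizer(new BufferedReader(new StringReader(text)));
            List<String> out = new ArrayList<>();
            for (String tok = t.getToken(); tok != null; tok = t.getToken()) {
                out.add(tok);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One consequence of this design is that a token can never span two lines, since `readLine()` strips the line terminator. That is fine for many simple languages but would need extra handling for multi-line string literals.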

  13. #13
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Norm View Post
    You can do some buffering yourself. Just as the getToken method moves token by token through a String you have already read, you can call the BufferedReader methods to fetch the next line once the current String in memory has been used up, and so work through the file line by line.
    I don't think I have to read the file line by line, to be honest. BufferedReader has a rather large internal buffer (8192 characters) that it appears to fill, so that should be more than enough.

  14. #14
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,162
    Thanks
    65
    Thanked 2,725 Times in 2,675 Posts

    Default Re: What is the best way to read from a text file?

    I don't know how you get at the data in the BufferedReader's buffer without calling a read method which removes the data from the buffer. One of the constructors allows you to specify the size of the buffer.
    I don't see how the amount that is buffered is relevant to your project.
    You could use the mark and reset methods to move around in the contents of the file.
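As a small illustration of mark and reset, here is a sketch of a one-character "peek" built on `BufferedReader.mark(int)` and `reset()`. The class `MarkResetDemo` and its method names are hypothetical and exist only for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: looking at the next character without consuming it.
final class MarkResetDemo {
    /** Returns the next character without consuming it, or -1 at end of input. */
    static int peek(BufferedReader in) throws IOException {
        in.mark(1);          // remember the current position; allow 1 char of read-ahead
        int ch = in.read();
        in.reset();          // go back to the remembered position
        return ch;
    }

    /** Consumes and returns the run of digits at the current position. */
    static String readDigits(BufferedReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int ch = peek(in);
            if (ch == -1 || !Character.isDigit(ch)) {
                break; // the non-digit stays in the reader for the next caller
            }
            sb.append((char) in.read());
        }
        return sb.toString();
    }

    /** Reads leading digits, then reports which character comes next. */
    static String demo(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String digits = readDigits(in);
            int next = in.read();
            return digits + "|" + (next == -1 ? "EOF" : String.valueOf((char) next));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this helper, `MarkResetDemo.demo("123abc")` yields `123|a`: the digits are consumed, and the `a` that ended the run is still available to the next read.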

  15. #15
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Norm View Post
    I don't know how you get at the data in the BufferedReader's buffer without calling a read method which removes the data from the buffer. One of the constructors allows you to specify the size of the buffer.
    I don't see how the amount that is buffered is relevant to your project.
    You could use the mark and reset methods to move around in the contents of the file.
    I think I misunderstood your post, lol. I thought you meant that I should read an entire line (the "readLine" method) and then go through it, rather than just going through the input directly (the "read" method).

    May have to confess that my brain is, at the moment, not functioning that well because I am rather tired.

  16. #16
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,162
    Thanks
    65
    Thanked 2,725 Times in 2,675 Posts

    Default Re: What is the best way to read from a text file?

    I'm sure there are many different ways to scan through the characters coming from a file.
    readLine gives you a String and skips over the end of line characters.
    read() would give you a single character (or an array full with the other read() method)

    It's up to you.
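For comparison, the three reading styles mentioned above could look roughly like this. The class `ReadStyles` and its method names are invented for illustration; only `readLine()`, `read()`, and `read(char[])` are actual `BufferedReader`/`Reader` methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch comparing readLine(), read(), and read(char[]).
final class ReadStyles {
    // readLine(): one String per line, with the line terminator removed
    static List<String> byLine(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        return lines;
    }

    // read(): one character at a time, -1 at end of input
    static int countChars(BufferedReader in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // read(char[]): fills part or all of the array and returns how many
    // characters were stored, or -1 at end of input
    static String byChunk(BufferedReader in, int chunkSize) throws IOException {
        char[] buf = new char[chunkSize];
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    private static BufferedReader open(String text) {
        return new BufferedReader(new StringReader(text));
    }

    /** Runs all three styles over the same text and summarizes the results. */
    static String demo(String text) {
        try {
            return byLine(open(text)) + " " + countChars(open(text)) + " "
                    + byChunk(open(text), 4).equals(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `read()` and `read(char[])` hand back the newline characters, while `readLine()` drops them, so a tokenizer that treats newlines as tokens has to account for that if it reads by line.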

  17. #17
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    Quote Originally Posted by Norm View Post
    I'm sure there are many different ways to scan through the characters coming from a file.
    readLine gives you a String and skips over the end of line characters.
    read() would give you a single character (or an array full with the other read() method)

    It's up to you.
    Since I would have to step through the characters anyway I just figured I might just as well use the read() method.

  18. #18
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,162
    Thanks
    65
    Thanked 2,725 Times in 2,675 Posts

    Default Re: What is the best way to read from a text file?

    It makes sense to do it whichever way you find easiest.

  19. #19
    Member
    Join Date
    Jun 2011
    Posts
    182
    My Mood
    Where
    Thanks
    15
    Thanked 8 Times in 8 Posts

    Default Re: What is the best way to read from a text file?

    How you go about handling this task depends on what the goal is. What do you want to do with the parsed information? Write it to another file?

    If you want to keep it in memory, I recommend StringBuilder (java.lang.StringBuilder)

  20. #20
    Member
    Join Date
    Jul 2011
    Posts
    53
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Default Re: What is the best way to read from a text file?

    There is one more way to read from a file: using the Scanner class. You can use this constructor:
    Scanner in = new Scanner (new File(url));

    After that you can read Strings, ints, or doubles from the file, just as you would when using this class to read input from the keyboard. The class also has a hasNext() method, which checks whether there is more input left to read. I find this the easiest way to read from a .txt file.
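A short sketch of the Scanner approach, with the caveat that the class `ScannerDemo` and its methods are hypothetical names for this example. It labels each whitespace-separated token as an int or a word, using `hasNext()`, `hasNextInt()`, `nextInt()`, and `next()`.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch: classifying whitespace-separated tokens with Scanner.
final class ScannerDemo {
    /** Labels every remaining token as either an int or a word. */
    static List<String> describe(Scanner in) {
        List<String> out = new ArrayList<>();
        while (in.hasNext()) {          // false once no tokens are left
            if (in.hasNextInt()) {
                out.add("int:" + in.nextInt());
            } else {
                out.add("word:" + in.next());
            }
        }
        return out;
    }

    /** Reads from a file, as in the constructor shown above. */
    static List<String> describeFile(File file) throws FileNotFoundException {
        try (Scanner in = new Scanner(file)) {
            return describe(in);
        }
    }

    /** Same logic on an in-memory string, which is handy for experiments. */
    static List<String> describeText(String text) {
        try (Scanner in = new Scanner(text)) {
            return describe(in);
        }
    }
}
```

Scanner splits on whitespace by default, so input such as `x+42` comes back as a single token. That is the kind of fine-grained control a hand-written tokenizer gives you and Scanner does not, at least without changing its delimiter.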

  21. #21
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: What is the best way to read from a text file?

    OK, I may have finished the tokenizer class now. I don't think it's perfect, far from it, so I thought I would post it here, and if anyone has any input, feel free to give it. It is meant to be part of a simple scripting language I am making. The tokenizer is used to divide a source file into a stream of tokens, which can then be used to build an abstract syntax tree. I have some issues of my own with the current implementation of the class. For example, I use some static inner classes and then create static instances of them, which feels odd (they are stateless, so I figured it's better to have one instance rather than to create a new one every time they are needed... which is a lot). Also, I am rather terrible at comments.
    package growse.parser;
     
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.util.NoSuchElementException;
    import java.util.Objects;
     
    /**
     * Used to read a series of tokens from a reader
     *
     * @author anders
     */
    class Tokenizer {
        private Reader in;
        private Token tok;
        private int currentCh;
        private int lineNum;
     
        /**
         * Constructor, accepts a reader object
         * 
         * @param in the reader object used as input
         */
        public Tokenizer(BufferedReader in) throws IOException {
            Objects.requireNonNull(in);
            this.in = in;
            lineNum = 1;
            nextCoI(true);
        }
     
        /**
         * Moves on to the next token and return if the operation
         * was successful
         * 
         * @return if there was more tokens
         */
        public boolean next() throws IOException, IllegalCharException {
            // Get the next character
            ignoreCoW(false);
            if (currentCh == EOF) {
                tok = null;
                return false;
            }
     
            // Get the next token
            if (CharUtils.isWord(currentCh)) {
                String word = buildString(WORD_COND);
                TokenType tokType = Keywords.isKeyword(word) ? TokenType.KEYWORD : TokenType.IDENTIFIER;
                tok = new Token(tokType, word, lineNum);
            }
            else if (CharUtils.isNumeric(currentCh)) {
                tok = new Token(TokenType.NUMBER, buildString(NUMBER_COND), lineNum);
            }
            else if (CharUtils.isOperator(currentCh)) {
                tok = new Token(TokenType.OPERATOR, buildString(new OperatorCondition(currentCh)), lineNum);
            }
            else if (CharUtils.isPunctuation(currentCh)) {
                tok = new Token(TokenType.PUNCTUATION, String.valueOf((char)currentCh), lineNum);
                if (currentCh == '\n')
                    nextCoI(true);
                else
                    nextCoI(false);
            }
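            else if (currentCh == '"') {
                nextChar(); // skip the opening quote
                // buildString always appends the current character, so an
                // empty literal ("") or a quote at EOF must be handled here
                String text = (currentCh == '"' || currentCh == EOF) ? "" : buildString(STRING_COND);
                tok = new Token(TokenType.STRING, text, lineNum);
                nextChar(); // skip the closing quote
            }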
            else {
                throw new IllegalCharException((char)currentCh);
            }
     
            // Returns that there are more tokens
            return true;
        }
     
        /**
         * Returns the current token
         * 
         * @return the current token
         */
        public Token value() {
            if (tok == null)
                throw new NoSuchElementException("There are no more tokens");
            return tok;
        }
     
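        /**
         * Builds a string starting at the current character and continuing
         * for as long as the given condition accepts the next character
         * 
         * @param cond the condition used to check where the string should end
         * @return the string that was built
         * @throws IOException 
         */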
        private String buildString(Condition cond) throws IOException {
            StringBuilder word = new StringBuilder();
            word.append((char)currentCh);
            nextChar();
            while (cond.accept(currentCh)) {
                word.append((char)currentCh);
                nextChar();
            }
            return word.toString();
        }
     
        /**
         * Moves to the next char of interest, i.e. any character that is
         * not whitespace and not on a commented line. The character is stored
         * in currentCh, which is -1 if the end of file was encountered.
         * 
         * @param ignoreNewln if a new line should be ignored as well as the whitespaces or not
         * @throws IOException 
         */
        private void nextCoI(boolean ignoreNewln) throws IOException {
            nextChar();
            ignoreCoW(ignoreNewln);
        }
     
        /**
         * Called to ignore comments and whitespace characters
         * 
         * @param ignoreNewln if a new line should be ignored as well as the whitespaces or not
         * @throws IOException 
         */
        private void ignoreCoW(boolean ignoreNewln) throws IOException {
            boolean loop = true;
            while (loop) {
                if (currentCh == '#') {
                    while (currentCh == '#') {
                        nextChar();
                        while (currentCh != '\n' && currentCh != EOF)
                            nextChar();
                    }
                }
                else if (CharUtils.isWhitespace(currentCh) || (ignoreNewln && currentCh == '\n')) {
                    nextChar();
                }
                else {
                    loop = false;
                }
            }
        }
     
        /**
         * Moves on to the next character and stores it in currentCh
         * 
         * @throws IOException 
         */
        private void nextChar() throws IOException {
            currentCh = in.read();
            if (currentCh == '\n')
                ++lineNum;
        }
     
        /**
         * =====================
         * === INNER CLASSES ===
         * =====================
         */
     
        // Used to build a word
        private static class WordCondition implements Condition {
            @Override
            public boolean accept(int codePoint) {
                return CharUtils.isWord(codePoint) || CharUtils.isNumeric(codePoint) || codePoint == '_';
            }
        }
        private static final Condition WORD_COND = new WordCondition();
     
        // Used to build a number
        private static class NumberCondition implements Condition {
            @Override
            public boolean accept(int codePoint) {
                return CharUtils.isNumeric(codePoint) || codePoint == '_';
            }
        }
        private static final Condition NUMBER_COND = new NumberCondition();
     
        // Used to build an operator
        private static class OperatorCondition implements Condition {
            int count;
            int firstCh;
     
            public OperatorCondition(int codePoint) {
                count = 0;
                firstCh = codePoint;
            }
     
            @Override
            public boolean accept(int codePoint) {
                ++count;
                return (count < 2) && codePoint == '=';
            }
        }
     
        // Used to build a string literal; also stops at EOF so that an
        // unterminated literal cannot loop forever
        private static class StringCondition implements Condition {
            @Override
            public boolean accept(int codePoint) {
                return codePoint != '"' && codePoint != EOF;
            }
        }
        private static final StringCondition STRING_COND = new StringCondition();
     
        /**
         * =================
         * === CONSTANTS ===
         * =================
         */
        private static final int EOF = -1;
    }

    Quote Originally Posted by bgroenks96 View Post
    How you go about handling this task depends on what the goal is. What do you want to do with the parsed information? Write it to another file?

    If you want to keep it in memory, I recommend StringBuilder (java.lang.StringBuilder)
    I want to keep it in memory. I am creating a primitive scripting language (for training purposes, it's nothing serious). StringBuilder is the way to go when it comes to that, I guess.

    Quote Originally Posted by piulitza View Post
    There is one more way to read from a file: using the Scanner class. You can use this constructor:
    Scanner in = new Scanner (new File(url));

    After that you can read Strings, ints, or doubles from the file, just as you would when using this class to read input from the keyboard. The class also has a hasNext() method, which checks whether there is more input left to read. I find this the easiest way to read from a .txt file.
    I can check it out, but I am not sure I will use it, since I may need more in-depth control over how things are read from the file (and because I kind of like doing things the hard way :p).
    Last edited by Kerr; January 4th, 2012 at 05:50 PM.
