
Thread: How to use MappedByteBuffer efficiently with extremely large files

  1. #1 Junior Member (Join Date: Sep 2012, Posts: 2)

    How to use MappedByteBuffer efficiently with extremely large files

    I am currently working on database-like software that is meant to process very large amounts of data. Most of this data is kept in a number of files on disk. One thing to note is that these files exist only temporarily: they are created when the program is executed and deleted when the Java VM shuts down. Another thing to note is that they get extremely big, in the range of 2-20 GB for practical applications of the software.
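
    To give an idea of the setup, here is a minimal sketch of that file lifecycle (simplified, with made-up names; the real code does more than this):

    [code]
    import java.io.File;
    import java.io.IOException;

    public class ScratchFiles {
        // Scratch files are created when the program starts and are
        // removed automatically when the JVM shuts down normally.
        public static File createScratchFile() throws IOException {
            File scratch = File.createTempFile("dbscratch", ".bin");
            scratch.deleteOnExit();
            return scratch;
        }
    }
    [/code]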

    My supervisor told me to use a MappedByteBuffer to map parts of each file into memory, saying that it would be a lot more efficient than simply using a FileChannel. So I read the Java API documentation for MappedByteBuffer and did some experiments to figure out how it works.
    Since the files are far too big to ever fit into memory, my basic idea was to map a specific part of each file into memory, ranging from a point A to a point B. Every time the file is accessed by a read or write operation, I check whether the operation falls inside that window. If it doesn't, I map a different part of the file to the buffer so that the starting point of the requested operation ends up in the middle between A and B.
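
    To make this more concrete, here is a stripped-down sketch of the windowing idea (class and method names are made up for this post; bounds handling past the end of the file is omitted, and since FileChannel.map can only map up to Integer.MAX_VALUE bytes at a time, the window size has to stay below 2 GB):

    [code]
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class WindowedFileBuffer {
        private final FileChannel channel;
        private final long windowSize;
        private long windowStart = -1;    // point A: file offset where the mapping begins
        private MappedByteBuffer window;  // covers [windowStart, windowStart + capacity)

        // The file must be opened in "rw" mode for a READ_WRITE mapping.
        public WindowedFileBuffer(RandomAccessFile file, long windowSize) {
            this.channel = file.getChannel();
            this.windowSize = windowSize;
        }

        // Re-map if 'position' falls outside the current window, centering the
        // new window on the requested position; returns the offset of
        // 'position' inside the mapped buffer.
        private int offsetOf(long position) throws IOException {
            if (window == null || position < windowStart
                    || position >= windowStart + window.capacity()) {
                long newStart = Math.max(0, position - windowSize / 2);
                long length = Math.min(windowSize, channel.size() - newStart);
                window = channel.map(FileChannel.MapMode.READ_WRITE, newStart, length);
                windowStart = newStart;
            }
            return (int) (position - windowStart);
        }

        public byte get(long position) throws IOException {
            int offset = offsetOf(position); // may re-map the window first
            return window.get(offset);
        }

        public void put(long position, byte value) throws IOException {
            int offset = offsetOf(position); // may re-map the window first
            window.put(offset, value);
        }
    }
    [/code]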

    I ran some tests with this and got mixed results. Some test cases run nearly twice as fast as they do when I simply use a FileChannel. In others, it's extremely slow, or the program simply crashes with an IOException.

    Before I get into specific details, I'd like to ask if there are any tutorials or similar resources that deal with this scenario. I have searched for quite a while now, but all I found were tutorials and examples that use a pre-existing file which is mapped into memory in its entirety. I also don't know anybody with MappedByteBuffer experience whom I could ask.
    Since I wasn't able to find any examples of what I'm trying to do, I'm also wondering whether using a MappedByteBuffer even makes sense in my case, or whether I should scrap the idea and simply use a FileChannel instead. However, the performance increase I get in some test cases is very hard to ignore.

    Any help would be much appreciated. If you want more information, I'll be happy to provide it.


  2. #2 Super Moderator helloworld922 (Join Date: Jun 2009, Posts: 2,895)

    Re: How to use MappedByteBuffer efficiently with extremely large files

    What kind of access patterns do you have? Check to make sure you're not swapping buffers too often (you can probably do a few tests to determine what "too often" means). If possible, try re-arranging your data set so you have to swap buffers as little as possible.
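
    For instance, you could instrument the re-mapping path and count how many accesses trigger a re-map (just a sketch, names made up):

    [code]
    public class RemapStats {
        private long accesses;
        private long remaps;

        // Call this once per read/write; pass true when the access
        // forced the window to be re-mapped.
        public void recordAccess(boolean causedRemap) {
            accesses++;
            if (causedRemap) {
                remaps++;
            }
        }

        // Fraction of accesses that required a re-map; if this is high,
        // the buffer is being swapped "too often" for the access pattern.
        public double remapRatio() {
            return accesses == 0 ? 0.0 : (double) remaps / accesses;
        }
    }
    [/code]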


  3. #3 Junior Member (Join Date: Sep 2012, Posts: 2)

    Re: How to use MappedByteBuffer efficiently with extremely large files

    Quote Originally Posted by helloworld922:
    What kind of access patterns do you have? Check to make sure you're not swapping buffers too often (you can probably do a few tests to determine what "too often" means). If possible, try re-arranging your data set so you have to swap buffers as little as possible.
    It depends on the test case. In some cases, the file is accessed from start to finish in an orderly fashion; in others, random positions in the file are accessed in rapid succession. The cases that go through the file sequentially are the ones where I get a vast performance increase, whereas the cases that access the file randomly take unreasonably long.

    I could construct my test cases in such a way that the buffers aren't swapped "too often", but that probably wouldn't be a good idea in the long term, since in practical applications the behaviour of the program will depend heavily on user input.

    I think I'll ditch the MappedByteBuffers and stick to using a FileChannel.

  4. #4 Super Moderator helloworld922 (Join Date: Jun 2009, Posts: 2,895)

    Re: How to use MappedByteBuffer efficiently with extremely large files

    I didn't mean that you should format your test cases so they give you the best performance, but that you should format them to best represent how users are likely to use the program. I don't know what kind of data you have, but your customers' usage may follow some pattern that you can exploit for performance. You could have customers test the software and provide usage statistics, which you can then analyze to determine how best to lay out your data so you get good performance as often as possible.

    You could also provide the generic random-access commands using a FileChannel, and add some "high capacity" commands, designed for accessing large contiguous blocks, that use MappedByteBuffers. That way you get decent performance for random access and still get the large performance boost for contiguous access.
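
    Something along these lines (a rough sketch with made-up names, assuming read-only access for the contiguous case):

    [code]
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class HybridReader {
        private final FileChannel channel;

        public HybridReader(FileChannel channel) {
            this.channel = channel;
        }

        // Generic random access: read a small chunk at an arbitrary position.
        public ByteBuffer readAt(long position, int length) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(length);
            channel.read(buf, position);
            buf.flip();
            return buf;
        }

        // "High capacity" access: map one large contiguous block directly.
        public MappedByteBuffer mapBlock(long position, long length) throws IOException {
            return channel.map(FileChannel.MapMode.READ_ONLY, position, length);
        }
    }
    [/code]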
