I am currently working on database-like software that is meant to process very large amounts of data. Most of this data is kept in a number of files on disk. Note that these files exist only temporarily: they are created when the program runs and deleted when the Java VM shuts down. They also get extremely big, in the range of 2-20 GB for practical applications of the software.
My supervisor told me to use a MappedByteBuffer to map parts of these files into memory, and said that this is a lot more efficient than simply using a FileChannel. So I read the Java API documentation for MappedByteBuffer and did some experiments to figure out how it works.
Since the files are far too big to ever fit into memory, my basic idea was to map a specific window of each file, ranging from a point A to a point B, into memory. Every time the file is accessed by a read or write operation, I check whether the operation falls inside that window. If it doesn't, I map a different part of the file so that the starting point of the requested operation is in the middle between A and B.
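A minimal sketch of the windowing scheme described above, under stated assumptions: the class name WindowedFile, its methods, and the choice of 8-byte long values are hypothetical and only serve as illustration. It keeps a single READ_WRITE mapping of fixed size and re-centres it whenever an access falls outside the current window [A, B).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: keeps one sliding window of a large file mapped. */
final class WindowedFile implements AutoCloseable {
    private final FileChannel channel;
    private final long windowSize;     // length of the window, B - A
    private MappedByteBuffer window;   // null until the first access
    private long windowStart;          // file offset of point A

    WindowedFile(Path path, long windowSize) throws IOException {
        // A single mapping cannot exceed Integer.MAX_VALUE bytes.
        if (windowSize < 2L * Long.BYTES || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("unsupported window size: " + windowSize);
        }
        this.windowSize = windowSize;
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
    }

    /** Remaps the window if [position, position + length) is not fully inside it. */
    private void ensureMapped(long position, int length) throws IOException {
        if (position < 0 || length > windowSize / 2) {
            throw new IllegalArgumentException("bad access: " + position + "/" + length);
        }
        boolean inside = window != null
                && position >= windowStart
                && position + length <= windowStart + windowSize;
        if (inside) {
            return;
        }
        // Put the start of the requested operation in the middle of the window,
        // clamped so the window never starts before the beginning of the file.
        windowStart = Math.max(0L, position - windowSize / 2);
        // Assumption: with READ_WRITE, mapping past the current end of the file
        // grows the file. The old mapping is only released by the garbage collector.
        window = channel.map(FileChannel.MapMode.READ_WRITE, windowStart, windowSize);
    }

    void putLong(long position, long value) throws IOException {
        ensureMapped(position, Long.BYTES);
        window.putLong((int) (position - windowStart), value);
    }

    long getLong(long position) throws IOException {
        ensureMapped(position, Long.BYTES);
        return window.getLong((int) (position - windowStart));
    }

    @Override
    public void close() throws IOException {
        window = null;   // drop the reference; unmapping happens at GC time
        channel.close();
    }
}
```

With this sketch, something like new WindowedFile(path, 64L * 1024 * 1024) followed by putLong and getLong calls at arbitrary file offsets triggers a remap only when an access leaves the current window. An access pattern that jumps around a lot will therefore remap on almost every call, which may be one reason for slow test cases.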
I ran some tests with this and got mixed results. Some test cases run nearly twice as fast as when I simply use a FileChannel. In others, it's extremely slow or simply crashes with an I/O exception.
Before I get into specific details, I'd like to ask if there are any tutorials or similar resources that deal with this scenario. I have searched for things like this for quite a while now, but all I found were tutorials and examples that use a pre-existing file, which is mapped into memory in its entirety. I also don't know anybody with experience in using a MappedByteBuffer that I could talk to.
Since I wasn't able to find any examples of what I'm trying to do, I'm also wondering whether using a MappedByteBuffer even makes sense in my case, or if I should just scrap the idea and simply use a FileChannel instead. However, the increase in performance I get in some test cases is very hard to ignore.
Any help would be greatly appreciated. If you want more information, I'll be happy to provide it.