Modern Java applications do a lot of string manipulation: web service API calls (e.g. JSON, REST, SOAP, …), calls to external data sources (SQL queries, data returned from the DB, …), text parsing, text building, and so on. As a result, String objects can easily occupy 30% or more of the heap, and typically a large share of those String objects are duplicates, so a considerable amount of memory is wasted. To reduce the memory wasted by duplicate String objects, JEP 192 (JEP stands for JDK Enhancement Proposal) was implemented. JEP 192 is a welcome enhancement to Java.
What does JEP 192 do?
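When the G1 GC algorithm runs, it removes garbage objects from memory. Along with that, it can also deduplicate strings: when it finds String objects with identical content, it makes them share a single backing character array, and the redundant copies of that array are reclaimed. The String objects themselves remain; only their duplicated contents are eliminated. The technical term for this is “String deduplication”. This feature can be activated by passing the following JVM arguments: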
-XX:+UseG1GC -XX:+UseStringDeduplication
Note 1: In order to use this feature, you need to run on Java 8 update 20 or a later version.
Note 2: In order to use ‘-XX:+UseStringDeduplication’, you need to be using the G1 GC algorithm.
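If you are unsure whether these flags actually took effect (for example, when JVM options are set in a start-up script), you can query them at runtime. The sketch below assumes a HotSpot-based JDK, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name DedupFlagCheck and the helper isFlagEnabled are illustrative names, not part of any library.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlagCheck {

    // Reads the current value of a boolean HotSpot VM flag, e.g. "UseG1GC".
    // getVMOption throws IllegalArgumentException if the flag does not exist.
    static boolean isFlagEnabled(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Boolean.parseBoolean(bean.getVMOption(flagName).getValue());
    }

    public static void main(String[] args) {
        System.out.println("UseG1GC = " + isFlagEnabled("UseG1GC"));
        System.out.println("UseStringDeduplication = " + isFlagEnabled("UseStringDeduplication"));
    }
}
```

Started with java -XX:+UseG1GC -XX:+UseStringDeduplication DedupFlagCheck, it should report both flags as true.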
Let’s study with an example
Let’s validate this feature with a simple program. We chose this example specifically to study how the JVM handles duplicate strings.
```java
import java.util.ArrayList;
import java.util.List;

public class StringDeduplicationExample {

    // Static list keeps every string reachable, so none of them is garbage-collected.
    public static List<String> myStrings = new ArrayList<>();

    public static void main(String[] args) throws Exception {
        for (int counter = 0; counter < 200; ++counter) {
            for (int secondCounter = 0; secondCounter < 1000; ++secondCounter) {
                // Add it 1000 times.
                myStrings.add("Hello World-" + counter);
            }
            System.out.println("Hello World-" + counter + " has been added 1000 times");
        }
    }
}
```
This program creates:
1000 instances of “Hello World-0” strings
1000 instances of “Hello World-1” strings
1000 instances of “Hello World-2” strings
:
:
:
1000 instances of “Hello World-199” strings
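To see why these count as duplicates: each "Hello World-" + counter concatenation is evaluated at run time and creates a new String object, so the 1,000 strings for each value are equal in content but are separate objects. The sketch below (DuplicateCounter is an illustrative name) rebuilds the same 200 × 1,000 strings and counts them once by content and once by object identity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DuplicateCounter {

    // Builds the same 200 x 1000 strings as StringDeduplicationExample.
    static List<String> buildStrings() {
        List<String> strings = new ArrayList<>();
        for (int counter = 0; counter < 200; ++counter) {
            for (int secondCounter = 0; secondCounter < 1000; ++secondCounter) {
                strings.add("Hello World-" + counter);
            }
        }
        return strings;
    }

    public static void main(String[] args) {
        List<String> strings = buildStrings();

        // Distinct by content (equals/hashCode).
        Set<String> distinctValues = new HashSet<>(strings);

        // Distinct by object identity (==): every runtime concatenation made a new object.
        Set<String> distinctObjects =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        distinctObjects.addAll(strings);

        System.out.println("Total strings:    " + strings.size());         // 200000
        System.out.println("Distinct values:  " + distinctValues.size());  // 200
        System.out.println("Distinct objects: " + distinctObjects.size()); // 200000
    }
}
```

Only 200 distinct values exist, yet 200,000 separate String objects hold them; that gap is exactly what String deduplication targets.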
We ran this program twice, with two different sets of JVM arguments.
Run #1:
The first time, we ran the program with the ‘-XX:+UseStringDeduplication’ JVM argument, i.e.:
-Xmx20M -XX:+UseG1GC -XX:+UseStringDeduplication
Run #2:
The second time, we ran the same program without the ‘-XX:+UseStringDeduplication’ argument:
-Xmx20M -XX:+UseG1GC
During both runs, we captured heap dumps and analyzed them with the heap dump analysis tool HeapHero.io. HeapHero.io can detect the amount of memory wasted due to various inefficient programming practices, including the memory wasted due to duplicate strings.
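Heap dumps like these can be captured in several ways. One option, sketched below under the assumption of a HotSpot-based JDK, is to trigger a dump from inside the application through HotSpotDiagnosticMXBean.dumpHeap; HeapDumper is an illustrative name. Another common option is the JDK’s jmap tool, e.g. jmap -dump:live,format=b,file=heap.hprof <pid>.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {

    // Writes an HPROF heap dump of the current JVM to the given file.
    // live = true dumps only objects that are still reachable.
    static void dumpHeap(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap fails if the target file already exists; we also use a ".hprof"
        // name, which recent JDK versions require.
        Files.deleteIfExists(file);
        bean.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("heapdump").resolve("heap.hprof");
        dumpHeap(file, true);
        System.out.println("Heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

The resulting .hprof file is the kind of input a heap dump analyzer such as HeapHero.io works on.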
Here are the reports generated by HeapHero.io:
1. Heap Dump analysis report for Run #1
2. Heap Dump analysis report for Run #2
Here are a few interesting observations from the reports:
Even though the same code was executed both times, the overall heap size in Run #1 (‘-XX:+UseStringDeduplication’ passed) is 7.94 MB, whereas in Run #2 (‘-XX:+UseStringDeduplication’ not passed) it is 15.89 MB, roughly twice as large.
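Although both runs contain the same number of String objects (about 206k), the memory wasted due to duplicate strings is 5.6 MB in Run #1 versus 13.81 MB in Run #2. The object count is unchanged because deduplication does not delete String objects; it makes equal strings share one backing array, so the savings come from the duplicate arrays that get reclaimed.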
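Putting the reported figures side by side: the heap shrank by 15.89 - 7.94 = 7.95 MB, and the wasted memory by 13.81 - 5.6 = 8.21 MB. The small sketch below (ReductionCalc is an illustrative name) turns those figures into percentage reductions.

```java
public class ReductionCalc {

    // Percentage reduction when a value drops from `before` to `after`.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures reported by HeapHero.io for Run #2 (before) and Run #1 (after), in MB.
        System.out.printf("Heap size reduced by %.1f%%%n", reductionPercent(15.89, 7.94));     // 50.0%
        System.out.printf("Wasted memory reduced by %.1f%%%n", reductionPercent(13.81, 5.6));  // 59.4%
    }
}
```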
This dramatic reduction in memory consumption was made possible by the ‘-XX:+UseStringDeduplication’ argument, which eliminated a significant amount of duplicated string data from the application’s heap.
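G1’s deduplication happens inside the JVM and works on the arrays backing String objects, so application code cannot call it directly. To make the idea of “keep one copy per distinct value” concrete, the sketch below swaps in a different, application-level technique: canonicalizing strings through a HashMap, similar in spirit to String.intern(). This is not how G1 implements JEP 192 (G1 keeps every String object and shares only the arrays, whereas this sketch keeps one String object per value); StringCanonicalizer is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringCanonicalizer {

    // Maps each distinct string value to the single instance we keep for it.
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the shared instance for s, registering s if its value is new.
    String canonicalize(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int distinctCount() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringCanonicalizer canonicalizer = new StringCanonicalizer();
        List<String> myStrings = new ArrayList<>();
        for (int counter = 0; counter < 200; ++counter) {
            for (int secondCounter = 0; secondCounter < 1000; ++secondCounter) {
                // Store the shared instance; the freshly built duplicate is no longer
                // referenced and becomes eligible for garbage collection.
                myStrings.add(canonicalizer.canonicalize("Hello World-" + counter));
            }
        }
        // Prints: 200000 entries, 200 distinct objects
        System.out.println(myStrings.size() + " entries, "
                + canonicalizer.distinctCount() + " distinct objects");
    }
}
```

The trade-off is that the application must route strings through the helper explicitly, while G1 deduplication needs no code changes at all.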
We therefore encourage you to take advantage of ‘-XX:+UseG1GC -XX:+UseStringDeduplication’ to reduce the memory wasted by duplicate strings. This change has the potential to reduce your application’s overall memory footprint.