Welcome to the Java Programming Forums


Thread: Micrometrics for CI/CD pipeline

    Continuous Integration/Continuous Deployment (CI/CD) has become central to software development. To ensure high-quality software releases, smoke tests, regression tests, performance tests, static code analysis and security scans are run in the CI/CD pipeline. Despite all these quality measures, applications still face OutOfMemoryError, CPU spikes, unresponsiveness and degraded response times in the production environment.

    These sorts of performance problems surface in production because the CI/CD pipeline studies only macro-level metrics such as static code quality metrics, test/code coverage, CPU utilization, memory consumption and response time. In this article, let’s review the micrometrics that should be studied in the CI/CD pipeline to deliver high-quality releases to production. We will also learn how to source these micrometrics and integrate them into the CI/CD pipeline.

    How are Tsunamis forecasted?

    You might wonder how Tsunami forecasting is related to this article. There is a relationship :-). A normal sea wave travels at a speed of 5 – 60 miles/hr, whereas Tsunami waves travel at a speed of 500 – 600 miles/hr. Even though Tsunami waves travel 10x – 100x faster than normal waves, they are very hard to forecast. Thus, modern technologies use micrometrics to forecast Tsunami waves. Multiple DART (Deep-ocean Assessment and Reporting of Tsunami) devices are installed throughout the world.

    Fig: DART Device to detect Tsunami

    DART contains two parts:

    a. Surface Buoy: Device that floats on the surface of the ocean
    b. Seabed Monitor: Device that is stationed at the bottom of the ocean

    Deep ocean water is about 6000 meters deep (about 20x the height of San Francisco’s tallest building, the Salesforce Tower). Whenever the sea level rises by more than 1 mm, DART automatically detects it and transmits this information to a satellite. This 1 mm rise in sea level is a lead indicator of a Tsunami’s origination. Pause here for a second and visualize 1 mm on the scale of a 6000-meter sea depth. It’s negligible. But this micrometric analysis is what is used for forecasting Tsunamis.

    How to forecast Performance Tsunamis through Micrometrics?

    Similarly, there are a few micrometrics that you can monitor in your CI/CD pipeline. These micrometrics are lead indicators of several performance problems that you will face in production. A rise or drop in the values of these micrometrics is a strong indicator that a performance problem is originating.

    1. Garbage Collection Throughput
    2. Average GC pause time
    3. Maximum GC pause time
    4. Object creation rate
    5. Peak heap size
    6. Thread Count
    7. Thread States
    8. Thread Groups
    9. Wasted Memory
    10. Object Count
    11. Class Count

    Let’s study each micrometric in detail:

    1. GARBAGE COLLECTION THROUGHPUT

    Garbage Collection throughput is the percentage of time your application spends processing customer transactions, as opposed to the time it spends doing garbage collection.

    Let’s say your application has been running for 60 minutes. Of these 60 minutes, 2 minutes are spent on GC activities.
    It means the application has spent 3.33% of its time on GC activities (i.e. 2 / 60 * 100).
    It means Garbage Collection throughput is 96.67% (i.e. 100 – 3.33).

    A degradation in GC throughput is an indication of some sort of memory problem. Now the question is: What is the acceptable throughput %? It depends on the application and business demands. Typically, one should target more than 98% throughput.
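
    The arithmetic above can be written as a small helper. This is only a sketch: the class and method names are made up for illustration, and both durations are assumed to be in the same unit.

```java
import java.util.Locale;

// Minimal sketch: GC throughput is the share of run time NOT spent in garbage collection.
final class GcThroughput {
    private GcThroughput() {}

    /** Returns GC throughput in percent; both arguments must use the same time unit. */
    static double throughputPercent(double totalRunTime, double totalGcTime) {
        if (totalRunTime <= 0 || totalGcTime < 0 || totalGcTime > totalRunTime) {
            throw new IllegalArgumentException("invalid durations");
        }
        double gcPercent = totalGcTime / totalRunTime * 100.0; // e.g. 2 / 60 * 100 = 3.33
        return 100.0 - gcPercent;
    }

    public static void main(String[] args) {
        // 60-minute run with 2 minutes of GC, as in the example above.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", throughputPercent(60, 2))); // 96.67%
    }
}
```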

    2. AVERAGE GARBAGE COLLECTION PAUSE TIME

    When a stop-the-world Garbage Collection event runs, the entire application pauses. Garbage Collection has to mark every object in the application and see whether it is still referenced; objects that nothing references are evicted from memory. Then the fragmented memory is compacted. To do all these operations, the application is paused. Thus, when Garbage Collection runs, customers experience pauses/delays, and one should always aim for a low average GC pause time.

    3. MAX GARBAGE COLLECTION PAUSE TIME

    Some Garbage Collection events might take a few milliseconds, whereas others might take several seconds or even minutes. You should measure the maximum Garbage Collection pause time to understand the worst possible impact on the customer. Proper tuning (and, if needed, application code changes) is needed to reduce the maximum Garbage Collection pause time.
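
    To see why the average and the maximum are tracked separately, here is a small sketch. The pause values are invented for illustration and the class name is hypothetical; a single long pause barely shows in a typical event but dominates the maximum.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Minimal sketch: average and maximum over a set of GC pause durations (milliseconds).
final class GcPauseStats {
    private GcPauseStats() {}

    static DoubleSummaryStatistics summarize(double... pausesMs) {
        return DoubleStream.of(pausesMs).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical pauses: four short events and one long event.
        DoubleSummaryStatistics stats = summarize(12.0, 8.0, 15.0, 2100.0, 10.0);
        System.out.println("average ms = " + stats.getAverage()); // 429.0
        System.out.println("max ms     = " + stats.getMax());     // 2100.0
    }
}
```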

    4. OBJECT CREATION RATE

    Object creation rate is the average amount of objects created by your application per unit of time. Maybe as of your previous code commit the application was creating 100 MB/sec, and starting from the most recent commit it creates 150 MB/sec. This additional object creation can trigger a lot more GC activity, CPU spikes, potential OutOfMemoryError and memory leaks when the application runs for a longer period.
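
    A commit-to-commit comparison like the one above could be sketched as follows. The class name, method names and the tolerance parameter are assumptions made for this example, not part of any tool.

```java
// Minimal sketch: flag a build when the object creation rate grows beyond a tolerance.
final class ObjectCreationRateCheck {
    private ObjectCreationRateCheck() {}

    /** Relative change of the current rate against the baseline, in percent. */
    static double percentChange(double baselineMbPerSec, double currentMbPerSec) {
        if (baselineMbPerSec <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (currentMbPerSec - baselineMbPerSec) / baselineMbPerSec * 100.0;
    }

    /** True when the increase exceeds the (hypothetical) tolerance chosen by the team. */
    static boolean hasRegressed(double baselineMbPerSec, double currentMbPerSec, double tolerancePercent) {
        return percentChange(baselineMbPerSec, currentMbPerSec) > tolerancePercent;
    }

    public static void main(String[] args) {
        // 100 MB/sec in the previous commit, 150 MB/sec in the latest one: a 50% increase.
        System.out.println(percentChange(100, 150));
        System.out.println(hasRegressed(100, 150, 20));
    }
}
```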

    5. PEAK HEAP SIZE

    Peak heap size is the maximum amount of memory consumed by your application. If the peak heap size goes beyond a limit, you must investigate it. Maybe there is a memory leak in the application, or newly introduced code (or 3rd party libraries/frameworks) is consuming a lot of memory. Maybe the use is legitimate, in which case you will have to change your JVM arguments to allocate more memory.

    Garbage collection throughput, average GC pause time, maximum GC pause time, object creation rate and peak heap size micrometrics can be sourced only from garbage collection logs. No other tools can be used for this purpose.
    As part of your CI/CD pipeline, you need to run a regression test suite or (ideally) a performance test. Garbage Collection logs generated from the test should be passed to GCeasy’s REST API. This API analyzes the garbage collection logs and responds with the above-mentioned micrometrics. To learn where these micrometrics are sent in the API response and the JSON path expressions for them, refer to this article. If any value is breached, the build can be failed. The GCeasy REST API has the intelligence to detect various other garbage collection problems such as: memory leaks, user time > sys + real time, sys time > user time, invocation of System.gc() API calls,… Any detected GC problems will be reported in the ‘problem’ element of the API response. You might want to track this element as well.
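
    The "if any value is breached, fail the build" step could look roughly like the sketch below. It assumes the metric values have already been extracted from the API response; the metric names, limits and class names are all hypothetical, and the JSON parsing is deliberately left out because the response layout is described in the referenced article, not here. For the input side, GC logging is typically enabled with -Xlog:gc*:file=gc.log on Java 9+ or -XX:+PrintGCDetails -Xloggc:gc.log on Java 8.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of a build gate: compare measured micrometrics against allowed ranges.
final class MicrometricGate {
    private MicrometricGate() {}

    /** Inclusive allowed range for one metric. */
    static final class Range {
        final double min;
        final double max;

        Range(double min, double max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Returns one message per metric that is missing or outside its allowed range. */
    static List<String> violations(Map<String, Double> measured, Map<String, Range> limits) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Range> e : limits.entrySet()) {
            Double value = measured.get(e.getKey());
            Range r = e.getValue();
            if (value == null) {
                out.add(e.getKey() + ": missing");
            } else if (value < r.min || value > r.max) {
                out.add(e.getKey() + ": " + value + " outside [" + r.min + ", " + r.max + "]");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical values and limits; real ones would come from the API response.
        Map<String, Double> measured = new LinkedHashMap<>();
        measured.put("gcThroughputPercent", 96.67);
        measured.put("maxGcPauseMs", 850.0);

        Map<String, Range> limits = new LinkedHashMap<>();
        limits.put("gcThroughputPercent", new Range(98.0, 100.0));
        limits.put("maxGcPauseMs", new Range(0.0, 1000.0));

        List<String> failed = violations(measured, limits);
        failed.forEach(System.err::println);
        if (!failed.isEmpty()) {
            System.exit(1); // a non-zero exit code fails most CI steps
        }
    }
}
```

    Returning the list of violations, rather than a plain boolean, lets the pipeline log exactly which micrometric broke the build.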


    6. THREAD COUNT

    Thread count is another key metric to monitor. If the thread count goes beyond a limit, it can cause CPU and memory problems. Too many threads can cause ‘java.lang.OutOfMemoryError: unable to create new native thread’ in a long-running production environment.
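
    For a quick in-process look at this number (for example inside a test harness), the standard ThreadMXBean exposes the live thread count. This is just a sketch with a hypothetical limit; it only sees the JVM it runs in and is not a replacement for analyzing thread dumps of the application under test.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Minimal sketch: read the live thread count of the current JVM and compare it to a limit.
final class ThreadCountCheck {
    private ThreadCountCheck() {}

    static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount(); // live threads, daemon and non-daemon
    }

    static boolean exceeds(int liveThreads, int maxAllowed) {
        return liveThreads > maxAllowed;
    }

    public static void main(String[] args) {
        int maxAllowed = 500; // hypothetical limit, to be chosen per application
        int live = liveThreadCount();
        System.out.println("live threads: " + live + ", over limit: " + exceeds(live, maxAllowed));
    }
}
```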

    7. THREAD STATES

    Application threads can be in different thread states. To learn about the various thread states, refer to this quick video clip. Too many RUNNABLE threads can cause a CPU spike. Too many BLOCKED threads can make the application unresponsive. If the number of threads in a particular thread state crosses a certain threshold, you may consider generating appropriate alerts/warnings.
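
    Counting threads per state can be sketched with the standard Thread API, as below. The class name is made up, and the snapshot covers only the JVM that runs the code.

```java
import java.util.EnumMap;
import java.util.Map;

// Minimal sketch: count threads per Thread.State.
final class ThreadStateCounts {
    private ThreadStateCounts() {}

    static Map<Thread.State, Integer> count(Iterable<Thread> threads) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : threads) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Snapshot of all live threads in the current JVM. */
    static Map<Thread.State, Integer> countLive() {
        return count(Thread.getAllStackTraces().keySet());
    }

    public static void main(String[] args) {
        // Prints one line per state that currently has at least one thread.
        countLive().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```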

    8. THREAD GROUPS

    A thread group represents a collection of threads performing similar tasks. There could be a servlet container thread group that processes all the incoming HTTP requests. There could be a JMS thread group, which handles all the JMS sending and receiving activity. There could be other sensitive thread groups in the application as well. You might want to track the size of those sensitive thread groups. You want their size neither to drop below a threshold nor to go beyond one. Too few threads in a thread group can stall its activities. Too many threads can lead to memory and CPU problems.

    Thread count, thread states and thread groups micrometrics can be sourced from thread dumps. As part of your CI/CD pipeline, you need to run a regression test suite or (ideally) a performance test. Three thread dumps, 10 seconds apart, should be captured while the tests are running. The captured thread dumps should be passed to FastThread’s REST API. This API analyzes the thread dumps and responds with the above-mentioned micrometrics.
    To learn where these micrometrics are sent in the API response and the JSON path expressions for them, refer to this article. If any value is breached, the build can be failed. The FastThread REST API has the intelligence to detect several threading problems such as: deadlocks, CPU spiking threads, prolonged blocking threads, … Any detected problems will be reported in the ‘problem’ element of the API response. You might want to track this element as well.
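
    One plausible way to approximate such groups is to strip the trailing counter from each thread name, so that "http-nio-8080-exec-1" and "http-nio-8080-exec-12" land in the same group. This naming heuristic is an assumption made for the sketch, not necessarily how any particular analysis tool groups threads, and the thread names in the example are invented.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch: group thread names by prefix and check group sizes against bounds.
final class ThreadGroupSizes {
    private ThreadGroupSizes() {}

    /** Drops a trailing counter such as "-12" or " #3" from a thread name. */
    static String groupOf(String threadName) {
        return threadName.replaceFirst("[-_#\\s]*\\d+$", "");
    }

    static Map<String, Integer> sizes(Iterable<String> threadNames) {
        Map<String, Integer> sizes = new TreeMap<>();
        for (String name : threadNames) {
            sizes.merge(groupOf(name), 1, Integer::sum);
        }
        return sizes;
    }

    /** True when the group size lies within the inclusive bounds. */
    static boolean withinBounds(int size, int min, int max) {
        return size >= min && size <= max;
    }

    public static void main(String[] args) {
        // Hypothetical thread names, as they might appear in a thread dump.
        Map<String, Integer> sizes = sizes(Arrays.asList(
                "http-nio-8080-exec-1", "http-nio-8080-exec-12", "JMS-Listener-3", "main"));
        System.out.println(sizes); // {JMS-Listener=1, http-nio-8080-exec=2, main=1}
    }
}
```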
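
    The "three dumps, 10 seconds apart" step can be sketched as a small sampler. In a real pipeline the dump source would normally be an external tool such as jstack or jcmd Thread.print run against the application under test; the in-process source below is only a stand-in, and ThreadInfo.toString() abbreviates deep stack traces. All class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Minimal sketch: capture a fixed number of thread dumps at a fixed interval.
final class ThreadDumpSampler {
    private ThreadDumpSampler() {}

    static List<String> capture(int count, long intervalMillis, Supplier<String> dumpSource) {
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                    break;
                }
            }
            dumps.add(dumpSource.get());
        }
        return dumps;
    }

    /** Stand-in dump source: a textual snapshot of the current JVM's threads. */
    static String inProcessDump() {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            sb.append(info);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three dumps, 10 seconds apart, as described above.
        List<String> dumps = capture(3, 10_000, ThreadDumpSampler::inProcessDump);
        System.out.println("captured " + dumps.size() + " dumps");
    }
}
```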


    9. WASTED MEMORY

    In the modern computing world, a lot of memory is wasted because of poor programming practices such as: duplicate object creation, duplicate string creation, inefficient collection implementations, sub-optimal data type definitions, inefficient finalizations, … The HeapHero API detects the amount of memory wasted due to all these inefficient programming practices. This can be a key metric to track. If the amount of wasted memory goes beyond a certain percentage, the CI/CD build can be failed, or warnings can be generated.

    10. OBJECT COUNT

    You might also want to track the total number of objects that are present in the application’s memory. Object count can spike because of inefficient code or newly introduced 3rd party libraries and frameworks. Too many objects can cause OutOfMemoryError, memory leaks and CPU spikes in production.

    11. CLASS COUNT

    You might also want to track the total number of classes that are present in the application’s memory. Sometimes class count can spike because of newly introduced 3rd party libraries or frameworks. A spike in class count can cause problems in the Metaspace/PermGen space of the memory.

    Wasted memory size, object count and class count micrometrics can be sourced from heap dumps. As part of your CI/CD pipeline, you need to run a regression test suite or (ideally) a performance test. Heap dumps should be captured after the test run is complete. The captured heap dumps should be passed to HeapHero’s REST API. This API analyzes the heap dumps and responds with these micrometrics.
    To learn where these micrometrics are sent in the API response and the JSON path expressions for them, refer to this article. If any value is breached, the build can be failed. The HeapHero REST API has the intelligence to detect several memory-related problems such as: memory leaks, object finalization, … Any detected problems will be reported in the ‘problem’ element of the API response. You might want to track this element as well.
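
    As a rough in-process cross-check, the standard ClassLoadingMXBean reports how many classes the current JVM has loaded. The sketch below compares that number against a baseline; the baseline and the allowed growth are hypothetical values a team would have to choose for itself.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: read the loaded class count and compare it against a baseline.
final class ClassCountCheck {
    private ClassCountCheck() {}

    static int loadedClassCount() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return classes.getLoadedClassCount(); // classes currently loaded in this JVM
    }

    /** True when the current count is more than allowedGrowthPercent above the baseline. */
    static boolean exceedsBaseline(int current, int baseline, double allowedGrowthPercent) {
        return current > baseline * (1.0 + allowedGrowthPercent / 100.0);
    }

    public static void main(String[] args) {
        int baseline = 10_000; // hypothetical count recorded from an earlier build
        int current = loadedClassCount();
        System.out.println("loaded classes: " + current
                + ", spike: " + exceedsBaseline(current, baseline, 10.0));
    }
}
```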
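
    Capturing the heap dump itself can be done in several ways. One option on HotSpot-based JVMs is the HotSpotDiagnosticMXBean, sketched below for a dump of the JVM running the code. The class name and output path are hypothetical; the target file must not exist yet, and recent JDKs expect a file name ending in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Minimal sketch: write a heap dump of the current JVM (HotSpot-specific API).
final class HeapDumper {
    private HeapDumper() {}

    /** Dumps the heap to the target file; liveOnly=true keeps only reachable objects. */
    static File dump(File target, boolean liveOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(target.getAbsolutePath(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Hypothetical output location; the timestamp avoids clashing with an earlier dump.
        File out = new File("after-test-run-" + System.currentTimeMillis() + ".hprof");
        System.out.println("heap dump written to " + dump(out, true).getAbsolutePath());
    }
}
```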

    Last edited by Ram Lakshmanan; July 11th, 2018 at 01:51 AM.
