Build your own Application Performance Monitoring (APM) solution with ELK and JMX – part 1

Application Performance Monitoring is one key area to implement in projects to ensure proper and smooth operations from day 1. APM solutions provide development and operations teams with near real-time insights on how applications and services are performing in production, allowing for proactive tuning of services, as well as for early detection of possible production issues.

There are many APM solutions on the market, with a diverse range of capabilities, from the inner workings of the JVM (threads in use, memory pools, garbage collection and what not) to user experience in web and mobile applications.

APM solutions can also be expensive, which means that many projects, small or short-lived, simply cannot afford one.

In this series of two blog posts we will show how to leverage FOSS tools to build an APM solution, which although more limited in capabilities, may be of use for small projects.

We will be leveraging ELK – Elasticsearch, Logstash, Kibana – the Java Management Extensions (JMX) and the Spring Framework support for JMX to easily expose application data out of the JVM.

NOTE: The following guide is written for ELK stack 5.2.1. Other versions may need slight adjustments to commands and/or configuration.

Introduction – What we need

The first step in this journey is, logically, to have data to collect. For our APM solution we are going to gather three different types of data:

  • Application events logged with a specific, known format by a logging framework.
  • Application data which is stored by the business services in one or multiple beans (beans with shared state at the JVM level, e.g. singleton).
  • JVM performance data like memory pool usage, CPU usage, etc.

Logging should be pretty familiar. Using a logging framework decouples the action of logging an event in application code from the actual logging implementation: which filters to apply (and the ability to modify them at runtime without code changes), how to persist the log statements (disk, queues, database), etc. In terms of logging there is little to uncover, but we need to pay attention to a couple of things that are of interest to the APM we are building: a) where the log files are located; and b) what the format of each log statement is, including which information is stored in it (e.g. timestamp, log severity, logger name/class, etc.).

Application data implies two different things to consider: a) which business/application information would be useful to monitor the performance and health; and b) how to expose that information out of the runtime as a time series.

JVM performance data is pretty straightforward in the case of an application running on the JVM, as the JVM already instruments many of its inner aspects, and collecting them simply requires activating their publication.
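
As a quick illustration of what the JVM already exposes, the platform MXBeans can be read with a few lines of standard Java. This snippet is not part of the APM itself, just a local peek at the same data that will later be pulled remotely through JMX:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.RuntimeMXBean;

public class JvmMetricsPeek {

    public static void main(String[] args) {
        // heap usage, also published through the java.lang:type=Memory MBean
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("heap used (bytes): " + heap.getUsed());
        System.out.println("heap max (bytes): " + heap.getMax());

        // uptime, also published through the java.lang:type=Runtime MBean
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println("uptime (ms): " + runtime.getUptime());
    }
}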

For all three types of data we will need some mechanism to gather the data into a central location, to index it so it is easily searchable, and finally to present it to users in an attractive, useful way.

The technical solution that we will leverage to build the APM will be composed of:

  • Event logging through slf4j and log4j 2 – although any other logging framework (either standalone or as a combination of wrapper and implementation) will suffice.
  • Spring support for JMX.
  • Good old JMX.
  • Logstash to collect the data coming from all sources.
  • Elasticsearch to index the data and provide the data access capabilities.
  • Kibana as the dashboard where data will be visualized.

Step 1 – Preparing the Application

For this example we will leverage the Spring Pet Clinic application: simple enough not to require too many resources or too much knowledge to master, and complex enough to demonstrate typical use cases of Java/web applications. With the latest version you may even choose from a variety of frontend technologies, although we will just leverage the ‘classic’ one, adapted to work with more modern Spring and Tomcat versions. This version, with all the updated configurations, can be found on GitHub here: https://github.com/deors/deors.demos.petclinic

In Spring Pet Clinic, the application is already logging events using slf4j and log4j 2, and collecting some application data – the number of service calls and the average time they take to complete – which is stored in a bean and published via JMX.

We will reflect for a bit on how Spring Pet Clinic achieves that, as it can be seen as a set of best practices for real-world applications.

1.1 Logging like the grown-ups!

Let’s assume that it is required to log certain events occurring in the application. To do things properly, a logging framework is set up – or maybe just JDK logging is chosen – instead of relying on the System.out and System.err streams.

Once the logging framework is ready, the developers working on the project will start using the logging framework API to log events, leveraging a hierarchy of loggers (typically one logger per class) and one or several priority levels (typically FATAL, ERROR, WARN, INFO, DEBUG, TRACE or similar, ordered here from the highest priority to the lowest, the latter reserved for fine-grained debugging).
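
As an illustration, a typical per-class logger with slf4j looks like the sketch below (OwnerService is a hypothetical class, used only to show the API):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OwnerService {

    // one logger per class, named after the class itself
    private static final Logger logger = LoggerFactory.getLogger(OwnerService.class);

    public void findOwner(int ownerId) {
        // low-priority statement, typically filtered out in production
        logger.debug("looking up owner {}", ownerId);
        try {
            // ...business logic...
        } catch (RuntimeException e) {
            // high-priority statement, including the exception stack trace
            logger.error("owner lookup failed for id {}", ownerId, e);
            throw e;
        }
    }
}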

However, is that the most optimal way of logging events? Although many log statements require a developer’s wit to be created – developers know best where a log statement is relevant and which information should be logged – there are many others that could be bulk-logged, e.g. an event every time a service is called, or an event every time an exception is thrown.

To do that, Aspect Oriented Programming is a perfect fit, and AspectJ is arguably the best library for doing AOP in Java.

In the case of Pet Clinic, let’s leverage the CallMonitoringAspect that already exists, located in the package org.springframework.samples.petclinic.aspects.

First, let’s define the logger object, using the class name as the name of the logger:

@Aspect
public class CallMonitoringAspect {

    private static final Logger logger = LoggerFactory.getLogger(CallMonitoringAspect.class);

This aspect has an invoke() method which is defined to be called every time a method in a class annotated with @Service stereotype is called.

Therefore, in the invoke method, let’s add a log statement which will be recorded every time a service is called:

@Around("within(@org.springframework.stereotype.Service *)")
public Object invoke(ProceedingJoinPoint joinPoint) throws Throwable {
    if (this.isEnabled) {
    StopWatch sw = new StopWatch(joinPoint.toShortString());

    sw.start("invoke");
    try {
        if (logger.isInfoEnabled()) {
            logger.info("service call {}.{}()",
                joinPoint.getSignature().getDeclaringTypeName(),
                joinPoint.getSignature().getName());
        }
        return joinPoint.proceed();
    }
...rest of method
}

By leveraging the methods in the ProceedingJoinPoint and Signature classes from the AspectJ API, we can write a log statement that precisely identifies the service being invoked, by logging the service class name and method name.

This approach is very elegant, and very productive as it does not require a lot of effort to trace every single service call in the application.

We also need to take note of the log format itself, as we will need it later to align the parsing of logged messages with this format. In this case, the log format configuration used is the following:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration monitorInterval="5">
    <Appenders>
        <Console name="console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d %-5level [%logger{3.}] - %msg%n" />
        </Console>
        <RollingFile name="file"
            fileName="${sys:catalina.base}/logs/petclinic.log"
            filePattern="${sys:catalina.base}/logs/petclinic-%i.log.gz">
            <PatternLayout pattern="%d %-5level [%logger{3.}] - %msg%n"/>
            <Policies>
                <SizeBasedTriggeringPolicy size="512 KB"/>
            </Policies>
            <DefaultRolloverStrategy max="3"/>
        </RollingFile>
    </Appenders>
    <Loggers>
        <Root level="INFO">
            <AppenderRef ref="console" />
            <AppenderRef ref="file" />
        </Root>
    </Loggers>
</Configuration>
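
With this pattern, a statement produced by the aspect above will land in petclinic.log looking roughly like the following line (values are illustrative, and the logger name is abbreviated by the %logger{3.} precision specifier):

2017-02-27 10:15:32,123 INFO  [org.spr.sam.pet.asp.CallMonitoringAspect] - service call org.springframework.samples.petclinic.service.ClinicServiceImpl.findOwnerById()
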
1.2 Exposing Application Data

Now it’s time to expose application data to the outside world. This mechanism allows for easy instrumentation of applications and the ability to get insights into nearly every aspect of the application in real time. In the case of Pet Clinic, the application is already doing that in the very same class, CallMonitoringAspect, that was modified in the previous step. The curious may have already noticed that the CallMonitoringAspect class is annotated as a JMX managed resource bean:

@ManagedResource("petclinic:type=CallMonitor")
@Aspect
public class CallMonitoringAspect {

This annotation, provided by Spring JMX support, will instruct the framework to expose the class as a JMX MBean; accordingly, methods annotated with either @ManagedAttribute or @ManagedOperation will become available in any JMX-aware tool, as attributes or operations, respectively.
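
Note that for @ManagedResource to take effect, an MBean exporter must be registered in the Spring context. Pet Clinic already ships with the necessary configuration; if you are adapting your own application and using Java config, a minimal sketch would be the following (the XML equivalent is the <context:mbean-export/> element):

import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.EnableMBeanExport;

@Configuration
@EnableMBeanExport
public class JmxConfiguration {
    // registers an AnnotationMBeanExporter so that beans annotated with
    // @ManagedResource are published to the MBean server
}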

Pet Clinic is already calculating the two metrics that we want to instrument: the total number of service calls, and the average time of service calls. These metrics are calculated in the invoke method we already know, so there is no need to calculate them in every single service (which would be not only unproductive, but also error-prone if the calculations were not consistent across calls).

The metric values are stored in bean properties and exposed as attributes:

@ManagedAttribute
public int getCallCount() {
    return callCount;
}

@ManagedAttribute
public long getCallTime() {
    return (this.callCount > 0 ? this.accumulatedCallTime / this.callCount : 0);
}

There are other attributes and operations available, for example to reset the metric values or to enable/disable the calculation.
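
As an illustration, a reset operation exposed through JMX can be as simple as the sketch below (the actual Pet Clinic code may differ slightly):

@ManagedOperation(description = "Reset the call statistics")
public void reset() {
    // clear the accumulated metrics so a new measurement window starts
    this.callCount = 0;
    this.accumulatedCallTime = 0;
}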

1.3 Enabling JMX in Apache Tomcat

The remote JMX interface is not enabled in Apache Tomcat by default. As we are leveraging Tomcat to run Pet Clinic, it is necessary to explicitly enable JMX when Tomcat is started.

To do that, without needing to update the startup script, you may simply leverage the JAVA_OPTS environment variable which is automatically pulled in by the script and its content passed to the JVM running Tomcat:

JAVA_OPTS=-Dcom.sun.management.jmxremote.port=28089 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

In this example, the JMX interface will be available on port 28089, and security is not enabled. Please note this is not recommended for a production system unless the instance is properly secured behind a firewall, inside an overlay network or through some other mechanism. Take into consideration that the JMX interface will be exposing internal JVM and application information (which may even be sensitive) as well as operations that may affect the way applications behave.
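
If the JMX port must remain reachable from outside a trusted network, the standard JVM options for authentication can be enabled instead; for example (the file paths below are placeholders, and enabling SSL would additionally require a keystore to be configured):

JAVA_OPTS=-Dcom.sun.management.jmxremote.port=28089 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/path/to/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/path/to/jmxremote.access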

Once Tomcat is launched with JMX enabled, you may check with any JMX-aware tool that information is exposed as expected. You may check with jconsole, VisualVM, Mission Control or Hawt.io, to name a few options.
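
For instance, jconsole can connect directly from the command line (replace localserver with your server name):

jconsole localserver:28089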

[Screenshot: connecting to the JMX interface, initial view and Pet Clinic data, with jconsole.]

[Screenshot: connecting to the JMX interface, initial view and Pet Clinic data, with Java Mission Control.]

Step 2 – Running Elasticsearch

The next step is to configure and launch Elasticsearch. Elasticsearch is the element that will ingest and index all the data. In addition to that, the Elasticsearch API allows for an easy, natural way of asking for data and getting results back to any consuming frontend.

Elasticsearch can be downloaded here: https://www.elastic.co/downloads/elasticsearch

Extract the Zip contents to any folder of your choice. We will refer to it as ELASTICSEARCH_HOME if needed.

The file ELASTICSEARCH_HOME/config/elasticsearch.yml is used to configure Elasticsearch settings. In case you have multiple network adapters, it is recommended to configure which IP or name the Elasticsearch server will bind to. In our case, there is a DNS alias named localserver:

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: localserver
#

Please note that the default Elasticsearch port is 9200. The rest of the configuration defaults will work well for the APM we are building.

Next, launch Elasticsearch by running the command:

ELASTICSEARCH_HOME/bin/elasticsearch

If everything is OK, you should be able to confirm that Elasticsearch is up and running by accessing this URL in any browser (replace localserver with your server name or IP address):

http://localserver:9200
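
Alternatively, the same check can be done from a console with curl, if available; either way, the response is a small JSON document with the node name, cluster name and Elasticsearch version:

curl http://localserver:9200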

[Screenshot: Elasticsearch responding to the ‘hello world’ API call.]

Step 3 – Configuring Logstash for log processing

The next piece in the puzzle is Logstash. Logstash can play many different roles, as its plug-in oriented architecture enables the tool to collect, filter and pre-process information coming from many disparate sources and in all possible formats.

Logstash can be downloaded here: https://www.elastic.co/downloads/logstash

Extract the Zip contents to any folder of your choice. We will refer to it as LOGSTASH_HOME if needed.

Before running Logstash, we need to create the configuration file with instructions on where to look for data, which filters to apply, expected formats and other settings.

The Logstash configuration file will include the instructions for all the information sources: the application log, a text file which will be parsed to extract individual tokens from the actual log message, and JMX, source for performance and application metrics.

There are three sections in any Logstash configuration file: input, filter and output.

  • Input contains the information on the data sources.
  • Filter contains the configuration of filters and parsers to obtain the relevant information pieces from the sources.
  • Output contains the information on the ways to publish out the collected data.

Let’s begin with setting up the input for the log file:

input {
    file {
        path => "c:\java\servers\apache-tomcat-8.0.32\logs\petclinic.log"
        type => "log"
        add_field => [ "app-name", "petclinic" ]
        codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => "previous"
        }
    }
}

With this configuration, we are telling Logstash:

  • Where the application log file can be found.
  • The type name assigned to this source of data. This is useful to separate different data sources (e.g. app log, server log, performance metrics), and also to aggregate disparate sources of the same type (e.g. app1 log, app2 log,… appN log).
  • A custom field containing the application name. Ideally, every single application or service in a system will have a distinct app-name field. When services have multiple instances (like in a high-availability cluster, or in a microservices architecture) or simply can move from one server to another, aggregating by app-name will collate all events and metrics regardless of the physical source.
  • That some log messages may contain more than one line, for which the multiline codec is configured. A new event is considered to begin when a line starts with a timestamp in ISO8601 format; in any other case, the line is appended to the previous one to build a multiline event (see the example right after this list).
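
To see why the multiline codec matters, consider a logged exception: only the first line starts with a timestamp, so the stack trace lines get folded into the same event. The class and trace below are purely illustrative:

2017-02-27 10:20:01,456 ERROR [org.spr.sam.pet.web.OwnerController] - unexpected failure while saving owner
java.lang.IllegalStateException: something went wrong
    at org.springframework.samples.petclinic.web.OwnerController.processCreationForm(OwnerController.java:42)
    ... 42 common frames omitted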

Next, we configure the filter section, using the grok plug-in to parse the text in every logged message:

filter {
    if [type] == "log" {
        grok {
            match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel}\s? \[%{JAVACLASS:class}\] - %{GREEDYDATA:logmessage}" }
        }
    }
 
    date {
        match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
    }
}

The if statement is there to ensure we apply the specific log message format only to inputs of type “log”, which is the type we configured for the input file. Different services may have different log formats, so it is always good practice to do this, reducing the risk of parse failures.

Obviously, the matcher regular expression must be aligned with the format used by log4j 2.

The date entry instructs Logstash to use the timestamp field parsed from the log message as the timestamp for the event. By default, Logstash uses as timestamp the moment in which the event is processed, which may differ from the time when the event was actually produced. It usually differs only by a few milliseconds, and it is also common to have multiple events with the same timestamp. This way, the timestamp is firmly connected to the source of the event, producing fewer misalignments when visualizing the logged events.
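
After the grok and date filters have run, each event stored in Elasticsearch carries the parsed fields alongside the original message and the metadata added by Logstash (host, path, etc.). Roughly, and with illustrative values, an event looks like this:

{
  "@timestamp": "2017-02-27T10:15:32.123Z",
  "type": "log",
  "app-name": "petclinic",
  "loglevel": "INFO",
  "class": "org.spr.sam.pet.asp.CallMonitoringAspect",
  "logmessage": "service call org.springframework.samples.petclinic.service.ClinicServiceImpl.findOwnerById()",
  "message": "2017-02-27 10:15:32,123 INFO  [org.spr.sam.pet.asp.CallMonitoringAspect] - service call ..."
}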

Next, let’s configure the output section, which is the simplest of the three as we only need to tell Logstash where to send the collected data:

output {
    elasticsearch {
        hosts => "localserver"
    }
}

With these together, let’s launch Logstash to see that everything is working fine and the log statements are read, parsed and sent to Elasticsearch. Let’s assume that the three bits of configuration above are together in a file named petclinic.conf in a folder represented by PATH_TO_CONF.

Launch Logstash by running the command:

LOGSTASH_HOME/bin/logstash -f PATH_TO_CONF/petclinic.conf

Assuming that Pet Clinic is up and running in Tomcat, and that you have accessed the application and navigated a bit to produce some logs, you should be able to see the collected logs by asking Elasticsearch for them. In the browser, access this URL:

http://localserver:9200/logstash-*/log/_search?q=*&size=5&pretty=true

In the URL above, “logstash-*” is the index name (we are simply using the default one), and “log” is the type of the events. As we are asking for everything (“q=*”) and size is restricted to five, you should see in the browser five logged events formatted in a JSON data structure. For example:

[Screenshot: Elasticsearch returning data parsed and sent by Logstash from Pet Clinic logs.]

Step 4 – Configuring Logstash for application metrics processing

In Step 3 we configured Logstash only to read, parse and publish log messages, so we still need to add the JMX bits to it.

First, we need to install the JMX plug-in. To do that, issue this command in a console:

LOGSTASH_HOME/bin/logstash-plugin install logstash-input-jmx

Once the plug-in is installed, we need to add a new input source to the configuration file:

input {
    ...
    jmx {
        path => "c:\java\monitor\conf\jmx"
        polling_frequency => 15
        nb_thread => 4
        type => "jmx"
        add_field => [ "app-name", "petclinic"]
    }
}

The settings above should not require further explanation, except maybe for path. The path setting points not to a file, but to a directory where Logstash expects to find one or more JSON files with the JMX-specific configuration. To keep things tidy, we will have two files: one for JVM performance metrics, and another one for application-specific metrics.

As we are not interested in every single metric exposed by the JVM, we will select which ones to collect in the configuration file:

{
    "host": "localserver",
    "port": 28089,
    "alias": "jvm",
    "queries": [
        {
            "object_name": "java.lang:type=OperatingSystem",
            "object_alias": "${type}"
        }, {
            "object_name": "java.lang:type=Memory",
            "object_alias": "${type}"
        }, {
            "object_name": "java.lang:type=Runtime",
            "attributes": [ "Uptime", "StartTime" ],
            "object_alias": "${type}"
        }, {
            "object_name": "java.lang:type=GarbageCollector,name=*",
            "attributes": [ "CollectionCount", "CollectionTime" ],
            "object_alias": "${type}.${name}"
        }, {
            "object_name": "java.lang:type=MemoryPool,name=*",
            "attributes": [ "Usage", "PeakUsage" ],
            "object_alias": "${type}.${name}"
        }, {
            "object_name": "java.nio:type=BufferPool,name=*",
            "object_alias": "${type}.${name}"
        }
    ]
}

The second file is simpler, as it needs to get only application metrics, which are fewer:

{
    "host": "localserver",
    "port": 28089,
    "alias": "app",
    "queries": [
        {
            "object_name": "petclinic:type=CallMonitor",
            "object_alias": "${type}"
        }
    ]
}

Note the host name and port, which are aligned to the configuration in Tomcat.
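
For reference, the JMX input plug-in typically emits one event per attribute read, with a metric_path field composed of the alias, the object alias and the attribute name, and the value stored in metric_value_number or metric_value_string. A heap usage reading would therefore arrive roughly like this (values illustrative):

{
  "type": "jmx",
  "app-name": "petclinic",
  "metric_path": "jvm.Memory.HeapMemoryUsage.used",
  "metric_value_number": 123456789
}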

Once both files are created and saved in the expected folder, re-launch Logstash, give it a few seconds to start collecting data, and then query Elasticsearch to verify everything is working as expected:

http://localserver:9200/logstash-*/jmx/_search?q=*&size=5&pretty=true

This is an example output showing the first metric returned:

[Screenshot: Elasticsearch returning JMX metric values from Pet Clinic.]

End of part 1

We have covered a lot in this post, but there are still a few things to do before completing our APM. In part 2, we will add the last piece of the puzzle: Kibana, the visualization tool and dashboard that we will configure, step by step, to easily consume all the data produced by our applications.
