Note: the Vietnamese version of this article is available at the link below.

https://duongnt.com/meterfilter-datadog-vie

Use MeterFilter to change Resilience4j metrics

Circuit Breaker is a well-known pattern to prevent an application from performing an operation that is likely to fail. While developing Kotlin applications, I often use the Resilience4j library to implement this pattern. The library also provides a Micrometer module, which makes it easier to integrate with most popular monitoring systems.

Today, we will take a look at some issues that arise when integrating Resilience4j running on Amazon EC2 with Datadog, and see how the MeterFilter interface can help us solve those problems.

Note: this article assumes some familiarity with the Resilience4j library, especially how to create a CircuitBreakerRegistry and how to use that registry to create CircuitBreaker objects.

Use Resilience4j with Micrometer and Datadog

As mentioned above, Micrometer is a vendor-neutral facade to simplify the integration between our application and a monitoring system. However, we will only consider Datadog in this article. The code to bind a CircuitBreakerRegistry object to a MeterRegistry is very simple.

val circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults()
TaggedCircuitBreakerMetrics
    .ofCircuitBreakerRegistry(circuitBreakerRegistry)
    .bindTo(meterRegistry)

After that, we can use the CircuitBreakerRegistry to create CircuitBreaker objects, and all calls made through those breakers will automatically emit Datadog metrics. Please see this link for the full list of supported metrics.

However, the default metrics are not always suitable for our needs. In the following sections, we will go through these two issues and find solutions for them.
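To see which metrics the binding produces without a Datadog account, here is a minimal sketch. It assumes Micrometer's in-memory SimpleMeterRegistry as a stand-in for the Datadog registry. The function name emittedBreakerMetrics and the string returned by the protected call are made up for illustration.

```kotlin
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry
import io.github.resilience4j.micrometer.tagged.TaggedCircuitBreakerMetrics
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Returns "metricName{name=<breaker>}" for every Resilience4j meter in the registry.
fun emittedBreakerMetrics(): List<String> {
    // Assumption: SimpleMeterRegistry stands in for the Datadog registry.
    val meterRegistry = SimpleMeterRegistry()
    val circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults()
    TaggedCircuitBreakerMetrics
        .ofCircuitBreakerRegistry(circuitBreakerRegistry)
        .bindTo(meterRegistry)

    // Breakers created through the bound registry get their meters registered.
    val serviceA = circuitBreakerRegistry.circuitBreaker("serviceA")
    circuitBreakerRegistry.circuitBreaker("serviceB")

    // Hypothetical protected call; a real one would hit a remote service.
    serviceA.executeSupplier { "response from service A" }

    return meterRegistry.meters
        .filter { it.id.name.startsWith("resilience4j.circuitbreaker") }
        .map { "${it.id.name}{name=${it.id.getTag("name")}}" }
        .distinct()
        .sorted()
}

fun main() {
    emittedBreakerMetrics().forEach(::println)
}
```

Each printed line pairs a metric name with the breaker it belongs to, which is the same name tag we will group by in Datadog.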

  • Collision of the name tag in various metrics.
  • Limiting the number of metrics that Resilience4j emits.

Collision of the "name" tag

In this scenario, our application uses two breakers (named serviceA and serviceB) to protect calls made to two services. It runs on Amazon EC2 clusters managed by Kubernetes.

A simple dashboard to display the total number of calls

As we can see in this link, the resilience4j.circuitbreaker.calls metric can be used to monitor the total number of calls made via circuit breakers. It also supports a name tag to differentiate between services using different breakers. But when we try to group the call count by service using that name tag, an application that uses only two breakers can produce a graph like this.

Calls count grouped by name

It’s obvious that no one can get any useful information from such a graph. But why are there so many lines? Shouldn’t we have only two entries for our two services? Let’s take a look at the Overview tab of this dashboard.

Calls count grouped by name, details

We do see the values serviceA and serviceB in the name tag. But they are mixed with a bunch of seemingly random hostnames. Where do they come from?

It turns out that Datadog automatically emits its own name tag, along with many other tags, when we integrate it with an application running on Amazon EC2. That tag records the names of all EC2 instances in our cluster.

Modify a tag in Resilience4j metrics with MeterFilter

In a perfect world, the best way to solve this issue is to stop Datadog from emitting its own name tag. Unfortunately, as of April 2nd 2023, I haven’t found a way to do that. Instead, we will use a MeterFilter to rename the name tag in Resilience4j metrics to servicename. That way, it won’t collide with the tag emitted by Datadog.

To do that, we need to use the renameTag method. It takes three parameters.

  • meterNamePrefix: the prefix of the metrics whose tag we want to rename. MeterFilter will look for the given tag in all metrics with this prefix and rename them.
  • fromTagKey: the original tag we want to rename.
  • toTagKey: the new name for our tag.

Thus, the code to rename all name tags in Resilience4j metrics to servicename is below.

meterRegistry.config().meterFilter(
    MeterFilter.renameTag(
        "resilience4j.circuitbreaker", // Modify all Resilience4j metrics, not just resilience4j.circuitbreaker.calls
        "name",
        "servicename"
    )
)

We can then bind our CircuitBreakerRegistry object to this meterRegistry, and our dashboard will look much cleaner.
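The rename can also be checked locally before looking at the dashboard. The sketch below makes a few assumptions: SimpleMeterRegistry stands in for the Datadog registry, the counters are hand-registered stand-ins for what Resilience4j would emit, and http.requests is a hypothetical metric outside the prefix. The filter is added before any meter is registered, because Micrometer applies filters at registration time.

```kotlin
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Maps each registered meter name to the set of tag keys it ends up with.
fun tagKeysByMeter(): Map<String, Set<String>> {
    // Assumption: SimpleMeterRegistry stands in for the Datadog registry.
    val meterRegistry = SimpleMeterRegistry()

    // Configure the filter first; it only affects meters registered afterwards.
    meterRegistry.config().meterFilter(
        MeterFilter.renameTag("resilience4j.circuitbreaker", "name", "servicename")
    )

    // Hand-registered stand-in for a metric emitted by Resilience4j.
    meterRegistry.counter("resilience4j.circuitbreaker.calls", "name", "serviceA")
    // Hypothetical metric outside the prefix; its tag should be left alone.
    meterRegistry.counter("http.requests", "name", "checkout")

    return meterRegistry.meters.associate { meter ->
        meter.id.name to meter.id.tags.map { it.key }.toSet()
    }
}

fun main() {
    tagKeysByMeter().forEach { (meter, keys) -> println("$meter -> $keys") }
}
```

Only the metric under the resilience4j.circuitbreaker prefix has its tag renamed, so other metrics that legitimately use a name tag are not affected.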

Calls count grouped by servicename

Calls count grouped by servicename, details

Stop Resilience4j from emitting selected metrics

As mentioned in this link, Resilience4j emits a total of seven metrics, each with multiple tags. The name tag (or servicename if we rename it as described in the previous section) is particularly troublesome. This is because Datadog bills custom metrics by the number of unique combinations of metric name and tag values, which means that every new breaker incurs an additional charge.

Fortunately, we can mitigate this issue by limiting which metrics Resilience4j can emit. There are two different approaches.

  • Create a blacklist to deny metrics we don’t need.
  • Create a whitelist to only allow metrics we need.

Create a blacklist with MeterFilter

If there are only a few metrics we wish to omit, we can use the deny method. This method receives a predicate to test if a metric should be omitted or not.

// Stop emitting the "resilience4j.circuitbreaker.buffered.calls" metric
meterRegistry.config().meterFilter(
    MeterFilter.deny { it.name == "resilience4j.circuitbreaker.buffered.calls" }
)

Or we can use the denyNameStartsWith method. This method only checks if the name of a metric starts with the given prefix.

// Omit every metric whose name starts with this prefix, such as "resilience4j.circuitbreaker.state" and "resilience4j.circuitbreaker.slow.call.rate"
meterRegistry.config().meterFilter(
    MeterFilter.denyNameStartsWith("resilience4j.circuitbreaker.s")
)
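A prefix can match more than we intend, so it is worth checking which names survive a deny filter. The following is a minimal sketch under some assumptions: SimpleMeterRegistry stands in for the Datadog registry, the metrics are hand-registered counters rather than the meters Resilience4j would create, and the helper name namesAfterDeny is made up for this example.

```kotlin
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hand-registered stand-ins for some of the metric names Resilience4j emits.
val breakerMetricNames = listOf(
    "resilience4j.circuitbreaker.calls",
    "resilience4j.circuitbreaker.state",
    "resilience4j.circuitbreaker.slow.calls",
    "resilience4j.circuitbreaker.slow.call.rate",
    "resilience4j.circuitbreaker.buffered.calls"
)

// Registers every stand-in metric in a registry configured with `filter`
// and returns the names that were not denied.
fun namesAfterDeny(filter: MeterFilter): List<String> {
    val meterRegistry = SimpleMeterRegistry()
    meterRegistry.config().meterFilter(filter)
    breakerMetricNames.forEach { meterRegistry.counter(it, "name", "serviceA") }
    return meterRegistry.meters.map { it.id.name }.sorted()
}

fun main() {
    // Exact-name predicate: only buffered.calls is dropped.
    println(namesAfterDeny(MeterFilter.deny { it.name == "resilience4j.circuitbreaker.buffered.calls" }))
    // Prefix match: state, slow.call.rate and slow.calls are all dropped.
    println(namesAfterDeny(MeterFilter.denyNameStartsWith("resilience4j.circuitbreaker.s")))
}
```

Note that the prefix resilience4j.circuitbreaker.s also removes resilience4j.circuitbreaker.slow.calls. If that metric should stay, an exact-name predicate with deny is the safer choice.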

Create a whitelist with MeterFilter

On the other hand, if we only want to emit a few metrics, we can use the denyUnless method to create a whitelist. This method receives a predicate to test if a metric is allowed or not.

// Only emit metrics whose name starts with this prefix, such as "resilience4j.circuitbreaker.state" and "resilience4j.circuitbreaker.slow.call.rate"
meterRegistry.config().meterFilter(
    MeterFilter.denyUnless { it.name.startsWith("resilience4j.circuitbreaker.s") }
)

In all cases, we can bind our CircuitBreakerRegistry object to the newly configured meterRegistry.
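One thing to watch with denyUnless is that the predicate sees every metric registered on the meterRegistry, not just the ones from Resilience4j. The sketch below illustrates this and shows one possible way to scope the whitelist. It assumes SimpleMeterRegistry as a stand-in for the Datadog registry and uses hand-registered counters. The helper names scopedWhitelist and namesAfterWhitelist are hypothetical, and jvm.memory.used only stands for any metric outside Resilience4j.

```kotlin
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

const val BREAKER_PREFIX = "resilience4j.circuitbreaker"

// Allows everything outside Resilience4j; inside it, allows only the listed names.
fun scopedWhitelist(allowed: Set<String>): MeterFilter =
    MeterFilter.denyUnless { !it.name.startsWith(BREAKER_PREFIX) || it.name in allowed }

// Registers a few stand-in metrics in a registry configured with `filter`
// and returns the names that were not denied.
fun namesAfterWhitelist(filter: MeterFilter): List<String> {
    val meterRegistry = SimpleMeterRegistry()
    meterRegistry.config().meterFilter(filter)
    listOf(
        "$BREAKER_PREFIX.calls",
        "$BREAKER_PREFIX.state",
        "$BREAKER_PREFIX.slow.call.rate",
        "jvm.memory.used" // stand-in for a metric that does not come from Resilience4j
    ).forEach { meterRegistry.counter(it) }
    return meterRegistry.meters.map { it.id.name }.sorted()
}

fun main() {
    // Plain whitelist: the non-Resilience4j metric is denied as well.
    println(namesAfterWhitelist(MeterFilter.denyUnless { it.name.startsWith("$BREAKER_PREFIX.s") }))
    // Scoped whitelist: only Resilience4j metrics are restricted.
    println(namesAfterWhitelist(scopedWhitelist(setOf("$BREAKER_PREFIX.state"))))
}
```

If the same meterRegistry also carries JVM or HTTP metrics, the scoped variant keeps them flowing while still limiting what Resilience4j emits.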

Conclusion

In most situations, we write the metric-emitting code ourselves and have full control over how we emit metrics. But when we need to modify metrics emitted by third-party libraries, the MeterFilter interface is a powerful tool.

A software developer from Vietnam, currently living in Japan.
