Note: the Vietnamese version of this article is available at the link below.
https://duongnt.com/meterfilter-datadog-vie
Circuit Breaker is a well-known pattern that prevents an application from performing an operation that is likely to fail. While developing Kotlin applications, I often use the Resilience4j library to implement this pattern. This library also provides a Micrometer module, which makes it easier to integrate with most popular monitoring systems.
Today, we will take a look at some issues that come up when integrating Resilience4j running on Amazon EC2 with Datadog, and see how the MeterFilter interface can help us solve those problems.
Note: this article assumes some familiarity with the Resilience4j library, especially how to create a CircuitBreakerRegistry and how to use that registry to create CircuitBreaker objects.
Use Resilience4j with Micrometer and Datadog
As mentioned above, Micrometer is a vendor-neutral facade to simplify the integration between our application and a monitoring system. However, we will only consider Datadog in this article. The code to bind a CircuitBreakerRegistry object to a MeterRegistry is very simple.
// meterRegistry is the application's existing MeterRegistry (for example, one backed by Datadog)
val circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults()
TaggedCircuitBreakerMetrics
    .ofCircuitBreakerRegistry(circuitBreakerRegistry)
    .bindTo(meterRegistry)
After that, we can use the CircuitBreakerRegistry to create CircuitBreaker objects, and all calls made through those breakers will automatically emit Datadog metrics. Please see this link for the full list of supported metrics.
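To see which meters the binding registers without sending anything to Datadog, it can be tried against an in-memory registry. The following is a minimal sketch, assuming the resilience4j-circuitbreaker, resilience4j-micrometer and micrometer-core artifacts are on the classpath. SimpleMeterRegistry stands in for the Datadog-backed registry, and the function name is hypothetical.

```kotlin
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry
import io.github.resilience4j.micrometer.tagged.TaggedCircuitBreakerMetrics
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hypothetical helper: binds a breaker registry to an in-memory MeterRegistry
// and returns the distinct names of the meters that end up registered.
fun registeredCircuitBreakerMeterNames(): Set<String> {
    // Assumption: SimpleMeterRegistry replaces the real Datadog registry for local inspection
    val meterRegistry = SimpleMeterRegistry()
    val circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults()

    TaggedCircuitBreakerMetrics
        .ofCircuitBreakerRegistry(circuitBreakerRegistry)
        .bindTo(meterRegistry)

    // Creating a breaker after binding registers its meters
    val breaker = circuitBreakerRegistry.circuitBreaker("serviceA")
    // Route one successful call through the breaker
    breaker.executeSupplier { "ok" }

    return meterRegistry.meters.map { it.id.name }.toSet()
}
```

Printing the returned set is a quick way to check which metric names a given Resilience4j version emits before building dashboards on top of them.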
However, the default metrics are not always suitable for our needs. In the following sections, we will go through two issues and find solutions for them.
- Collision of the name tag in various metrics.
- Limiting the number of metrics that Resilience4j emits.
Collision of the "name" tag
In this scenario, our application uses two breakers (named serviceA and serviceB) to protect calls made to two services, and it runs on Amazon EC2 clusters managed by Kubernetes.
A simple dashboard to display the total number of calls
As we can see in this link, the resilience4j.circuitbreaker.calls metric can be used to monitor the total number of calls made via circuit breakers. It also supports a name tag to differentiate between services using different breakers. But when we try to group call counts by service using that name tag, an application that only uses two breakers can create a graph like this.
It’s obvious that no one can get any useful information from such a graph. But why are there so many lines? Shouldn’t we have only two entries for our two services? Let’s take a look at the Overview tab of this dashboard.
We do see the values serviceA and serviceB in the name tag. But they are mixed with a bunch of seemingly random hostnames. Where do they come from?
Turns out this tag is just one of many tags automatically emitted by Datadog when we integrate it with an application running on Amazon EC2. It records the names of all EC2 instances in our cluster.
Modify a tag in Resilience4j metrics with MeterFilter
In a perfect world, the best way to solve this issue would be to stop Datadog from emitting its own name tag. Unfortunately, as of April 2nd 2023, I haven’t found a way to do that. Instead, we will use a MeterFilter to rename the name tag in Resilience4j metrics to servicename. That way, it won’t collide with the tag emitted by Datadog.
To do that, we need to use the renameTag method. Let’s look at its parameters.
- meterNamePrefix: the prefix of the metrics whose tag we want to rename. MeterFilter will look for the given tag in all metrics with this prefix and rename it.
- fromTagKey: the original tag we want to rename.
- toTagKey: the new name for our tag.
Thus, the code to rename all name tags in Resilience4j metrics to servicename is below.
meterRegistry.config().meterFilter(
    MeterFilter.renameTag(
        "resilience4j.circuitbreaker", // Modify all Resilience4j metrics, not just resilience4j.circuitbreaker.calls
        "name",
        "servicename"
    )
)
We can then bind our CircuitBreakerRegistry object with this meterRegistry, and our dashboard will look much cleaner.
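The effect of the rename can be checked locally. The sketch below assumes only micrometer-core on the classpath. It uses a SimpleMeterRegistry and two hand-registered counters as stand-ins, one named like a Resilience4j meter and one hypothetical myapp.requests meter outside the prefix. The function name is also hypothetical.

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hypothetical helper: returns the tags of a counter inside the prefix
// and of a counter outside it, both registered after the filter is installed.
fun renamedTagsDemo(): Pair<Map<String, String>, Map<String, String>> {
    val registry = SimpleMeterRegistry()

    // Install the filter first: filters are applied when a meter is registered
    registry.config().meterFilter(
        MeterFilter.renameTag("resilience4j.circuitbreaker", "name", "servicename")
    )

    // Stand-in for a meter that Resilience4j would register
    val breakerCalls = Counter.builder("resilience4j.circuitbreaker.calls")
        .tag("name", "serviceA")
        .register(registry)

    // Hypothetical application meter whose name does not match the prefix
    val other = Counter.builder("myapp.requests")
        .tag("name", "checkout")
        .register(registry)

    fun tagsOf(counter: Counter) = counter.id.tags.associate { it.key to it.value }
    return tagsOf(breakerCalls) to tagsOf(other)
}
```

Because Micrometer applies filters at registration time, the filter should be configured before the CircuitBreakerRegistry is bound. Meters that are already registered keep their original tags.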
Stop Resilience4j from emitting selected metrics
As mentioned in this link, Resilience4j emits a total of seven metrics, each with multiple tags. The name tag (or servicename, if we renamed it as described in the previous section) is particularly troublesome. This is because Datadog bills custom metrics by the number of distinct combinations of metric name and tag values, which means that every new breaker incurs an additional charge.
Fortunately, we can mitigate this issue by limiting which metrics Resilience4j can emit. There are two different approaches.
- Create a blacklist to deny metrics we don’t need.
- Create a whitelist to only allow metrics we need.
Create a blacklist with MeterFilter
If there are only a few metrics we wish to omit, we can use the deny method. This method takes a predicate that decides whether a metric should be omitted.
// Stop emitting the "resilience4j.circuitbreaker.buffered.calls" metric
meterRegistry.config().meterFilter(
    MeterFilter.deny { it.name == "resilience4j.circuitbreaker.buffered.calls" }
)
Or we can use the denyNameStartsWith method. This method only checks whether the name of a metric starts with the given prefix.
// Omit every metric whose name starts with "resilience4j.circuitbreaker.s",
// such as "resilience4j.circuitbreaker.state" and "resilience4j.circuitbreaker.slow.call.rate"
meterRegistry.config().meterFilter(
    MeterFilter.denyNameStartsWith("resilience4j.circuitbreaker.s")
)
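A prefix filter drops every meter whose name starts with the prefix, so it is worth checking what a short prefix actually matches. The sketch below assumes only micrometer-core. It registers plain counters as stand-ins for Resilience4j meters in a SimpleMeterRegistry and returns the names that survive a given filter. The name resilience4j.circuitbreaker.slow.calls is my recollection of the Resilience4j metric list and should be treated as an assumption. The helper and the list are hypothetical.

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Stand-in metric names; the last one is an assumption, not taken from this article
val sampleCircuitBreakerMetricNames = listOf(
    "resilience4j.circuitbreaker.calls",
    "resilience4j.circuitbreaker.buffered.calls",
    "resilience4j.circuitbreaker.state",
    "resilience4j.circuitbreaker.slow.call.rate",
    "resilience4j.circuitbreaker.slow.calls"
)

// Hypothetical helper: installs the filter, registers one counter per name,
// and returns the sorted names of the meters the registry actually kept.
fun survivingNames(filter: MeterFilter): List<String> {
    val registry = SimpleMeterRegistry()
    registry.config().meterFilter(filter)
    sampleCircuitBreakerMetricNames.forEach { Counter.builder(it).register(registry) }
    // Denied meters are replaced by no-op meters and never appear in registry.meters
    return registry.meters.map { it.id.name }.sorted()
}
```

Calling survivingNames(MeterFilter.denyNameStartsWith("resilience4j.circuitbreaker.s")) leaves only the calls and buffered.calls stand-ins, so a third "s" metric would be dropped along with the two named in the comment. When that is not intended, a deny predicate with an explicit set of names is the safer choice.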
Create a whitelist with MeterFilter
On the other hand, if we only want to emit a few metrics, we can use the denyUnless method to create a whitelist. This method takes a predicate that decides whether a metric is allowed.
// Only emit metrics whose names start with "resilience4j.circuitbreaker.s",
// such as "resilience4j.circuitbreaker.state" and "resilience4j.circuitbreaker.slow.call.rate"
meterRegistry.config().meterFilter(
    MeterFilter.denyUnless { it.name.startsWith("resilience4j.circuitbreaker.s") }
)
In all cases, we can bind our CircuitBreakerRegistry object with the newly configured meterRegistry.
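One caveat with denyUnless is that the filter applies to every meter in the registry, not only to Resilience4j ones. If the same meterRegistry also carries other metrics, a whitelist scoped to the Resilience4j prefix may be preferable. The sketch below assumes only micrometer-core. The helper names and the jvm.memory.used stand-in are hypothetical, and plain counters replace the real meters.

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hypothetical helper: denies meters under `prefix` unless their name is in `allowed`;
// meters outside the prefix are left untouched.
fun scopedAllowList(prefix: String, allowed: Set<String>): MeterFilter =
    MeterFilter.deny { id -> id.name.startsWith(prefix) && id.name !in allowed }

// Hypothetical helper: registers one counter per name and returns the names that were kept.
fun namesKeptBy(filter: MeterFilter, names: List<String>): List<String> {
    val registry = SimpleMeterRegistry()
    registry.config().meterFilter(filter)
    names.forEach { Counter.builder(it).register(registry) }
    return registry.meters.map { it.id.name }.sorted()
}
```

With the names resilience4j.circuitbreaker.state, resilience4j.circuitbreaker.calls and jvm.memory.used, the denyUnless filter from the article keeps only the state meter. scopedAllowList("resilience4j.circuitbreaker", setOf("resilience4j.circuitbreaker.state")) keeps both the state meter and jvm.memory.used.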
Conclusion
In most situations, we write the metric-emitting code ourselves and have full control over how metrics are emitted. But when we need to modify metrics emitted by third-party libraries, the MeterFilter interface is a powerful tool.