Note: the Vietnamese version of this article is available at the link below.
https://duongnt.com/redis-raw-bytes-vie
Normally, when we use RedisTemplate to interact with Redis, it converts our data into its string representation before serializing that string as UTF-8 bytes. Although this approach is good enough in most cases, we can squeeze out extra memory by sending the data to Redis as raw bytes. But is that something worth doing? In today's article, we will try implementing that solution and see what downsides it has.
You can download all sample code from the link below.
https://github.com/duongntbk/redis-raw-bytes-demo
Prerequisites
We need a Redis instance to run the test code in this article. To keep things simple, we will use Docker. Run the following command and replace <yourpassword> with a password of your choice.
docker run -d --name redis -p 6379:6379 redis:latest redis-server --requirepass <yourpassword>
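On the application side, we also need a Spring Data Redis connection factory pointing at this instance. Below is a minimal sketch of such a bean, assuming Redis runs on localhost:6379 with the password chosen above; adjust it to your own setup (the demo repo may configure this differently).

@Bean
fun redisConnectionFactory(): LettuceConnectionFactory {
    // Assumed host, port and password; replace them with your own values
    val config = RedisStandaloneConfiguration("localhost", 6379)
    config.setPassword("yourpassword")
    return LettuceConnectionFactory(config)
}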
Our problem
Let’s say we have to implement a caching solution using Redis, with the following requirements.
- The key is a Long object with 10 ~ 11 digits on average.
- The value is a Double object with 17 decimal digits (the maximum allowed by the Double type).
- It should support pipelining, so that we can send multiple entries and set their expiration using just one connection.
- We don’t want to manually serialize/deserialize data every time we interact with Redis.
Use RedisTemplate with built-in serializers
Setting up RedisTemplate bean
Here is how we set up a RedisTemplate<Long, Double>. The interesting part is below, where we use GenericToStringSerializer as our serializers.
redisTemplate.keySerializer = GenericToStringSerializer(Long::class.java)
redisTemplate.valueSerializer = GenericToStringSerializer(Double::class.java)
The class GenericToStringSerializer transforms the key and value into their string representations before performing the serialization.
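For reference, the full bean definition could look roughly like the sketch below (the actual bean in the demo repo may differ in details); it assumes the connection factory from the prerequisites section and uses the bean name we look up next.

@Bean("redisTemplateSerialize")
fun redisTemplateSerialize(factory: RedisConnectionFactory): RedisTemplate<Long, Double> {
    val redisTemplate = RedisTemplate<Long, Double>()
    redisTemplate.connectionFactory = factory
    // Keys and values are converted to their string representations before being sent
    redisTemplate.keySerializer = GenericToStringSerializer(Long::class.java)
    redisTemplate.valueSerializer = GenericToStringSerializer(Double::class.java)
    return redisTemplate
}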
We can then get an instance of the bean from the app context.
val redisTemplateSerialize = context.getBean(
"redisTemplateSerialize", RedisTemplate::class.java
) as RedisTemplate<Long, Double>
Size of an entry when using the built-in serializer
Below is the code to send two entries to Redis.
redisTemplateSerialize.opsForValue().multiSet(
    mapOf(
        6359284517L to 0.5238106733071787,
        Long.MAX_VALUE to 0.6238106733071787,
    )
)
Here are the entries inside Redis. Both the keys and values have been converted into String.
127.0.0.1:6379> get "6359284517"
"0.5238106733071787"
127.0.0.1:6379> get "9223372036854775807"
"0.6238106733071787"
Now let’s look at their size.
127.0.0.1:6379> memory usage "6359284517"
(integer) 80
127.0.0.1:6379> memory usage "9223372036854775807"
(integer) 88
As we can see, the entry keyed by 9223372036854775807 is bigger than the one keyed by 6359284517. This is because the former has more digits, which results in a longer String, and those String objects take up more space than the original Long objects. Similarly, because our values have many digits, their String form also takes up more space than a Double object would.
Use custom serializers to send raw bytes to Redis
Set up a RedisTemplate with custom serializers
We need to replace the GenericToStringSerializer to stop RedisTemplate from converting our entries into String objects. Unfortunately, Spring Data Redis does not support Long <-> ByteArray and Double <-> ByteArray serializers out of the box. But we can create them ourselves by implementing the RedisSerializer<T> interface. We can find such implementations in the serializer folder of the demo repo.
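The repo holds the actual implementations; the sketch below shows what the Long serializer could roughly look like, using a big-endian ByteBuffer (the version in the repo may pick a different byte order, which is fine as long as serialize and deserialize agree). The Double version is analogous, using putDouble/getDouble.

import java.nio.ByteBuffer
import org.springframework.data.redis.serializer.RedisSerializer

class LongToByteArraySerializer : RedisSerializer<Long> {
    // Write the 8 bytes of the Long directly, without any string conversion
    override fun serialize(value: Long?): ByteArray? =
        value?.let { ByteBuffer.allocate(Long.SIZE_BYTES).putLong(it).array() }

    // Read the 8 bytes back into a Long
    override fun deserialize(bytes: ByteArray?): Long? =
        bytes?.let { ByteBuffer.wrap(it).long }
}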
Then we can simply use those custom serializers to set up a new bean, as can be seen here.
redisTemplate.keySerializer = LongToByteArraySerializer()
redisTemplate.valueSerializer = DoubleToByteArraySerializer()
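The full bean definition mirrors the previous one, only with the serializers swapped out. A sketch (again, details may differ from the demo repo):

@Bean("redisTemplateByteArray")
fun redisTemplateByteArray(factory: RedisConnectionFactory): RedisTemplate<Long, Double> {
    val redisTemplate = RedisTemplate<Long, Double>()
    redisTemplate.connectionFactory = factory
    // Keys and values are written as raw 8-byte sequences instead of strings
    redisTemplate.keySerializer = LongToByteArraySerializer()
    redisTemplate.valueSerializer = DoubleToByteArraySerializer()
    return redisTemplate
}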
And we can get an instance of that bean from the app context.
val redisTemplateByteArray = context.getBean(
"redisTemplateByteArray", RedisTemplate::class.java
) as RedisTemplate<Long, Double>
Size of an entry as raw bytes
The code using redisTemplateByteArray to send data to Redis is identical to the one in the previous section, only with the other template bean. But let's look at the entries in Redis.
127.0.0.1:6379> keys *
1) "%\xfb\n{\x01\x00\x00\x00"
2) "\xff\xff\xff\xff\xff\xff\xff\x7f"
127.0.0.1:6379> get "%\xfb\n{\x01\x00\x00\x00"
"?\xe0\xc3\x0e\x99\xe4\xcde"
127.0.0.1:6379> get "\xff\xff\xff\xff\xff\xff\xff\x7f"
"?\xe3\xf6A\xcd\x18\x00\x98"
All the keys and values are now in ByteArray format. Now let’s check their size.
127.0.0.1:6379> memory usage "%\xfb\n{\x01\x00\x00\x00"
(integer) 72
127.0.0.1:6379> memory usage "\xff\xff\xff\xff\xff\xff\xff\x7f"
(integer) 72
As we can see, both entries are now 72 bytes. That is a 10% reduction in our typical case (from 80 bytes) and an 18% reduction in the extreme case (from 88 bytes), where the key has the maximum number of digits.
Use the connection to send raw bytes to Redis
A RedisTemplate object encapsulates the serialization process and is generally the recommended way to interact with Redis. However, it does not support batching commands before sending them. To use batching, we need to use the underlying LettuceConnection.
The steps are below.
- Create a bean of type RedisConnectionFactory.
- Retrieve the factory from the AppContext.
- Get a connection from the factory, use it to open a pipeline, send the data, then close it (code); see the sketch after this list.
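Below is a rough sketch of that last step, assuming entries is a Map<Long, Double> of the data we want to cache and a 60-second expiration; the exact code lives in the repo, and the byte conversion must match whatever we use when reading the data back.

val factory = context.getBean(RedisConnectionFactory::class.java)
val connection = factory.connection

try {
    connection.openPipeline()
    entries.forEach { (key, value) ->
        // At this level we have to produce the raw bytes ourselves
        val rawKey = ByteBuffer.allocate(Long.SIZE_BYTES).putLong(key).array()
        val rawValue = ByteBuffer.allocate(Double.SIZE_BYTES).putDouble(value).array()
        // SETEX writes the value and its expiration in a single command
        connection.stringCommands().setEx(rawKey, 60L, rawValue)
    }
    connection.closePipeline() // all queued commands are flushed in one round trip
} finally {
    connection.close()
}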
As we can see, we need to take care of the conversion from Long/Double to ByteArray ourselves. However, the ability to batch multiple requests in one connection can be worth the trouble.
The downsides
At 10 ~ 18%, the amount of memory we managed to save might seem lower than expected. After all, the string "6359284517" is 10 bytes and the string "0.5238106733071787" is 18 bytes, so a naive calculation would put a String-converted entry at 28 bytes, compared to 16 bytes for the combination of a Long and a Double. However, Redis also needs to allocate memory for the data structure of the entry, as well as to store its expiration date (if any), and that fixed overhead is the same in both cases, which dilutes the relative saving.
Another downside is that sending raw bytes to Redis actually costs more memory when the numbers are short (have few digits). This is because a UTF-8 String can be as small as 1 byte, while a Long or Double object always occupies 8 bytes, no matter how many digits it has.
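We can see this on the client side as well; the quick check below (assuming UTF-8 and a ByteBuffer-based conversion) shows how the two representations scale.

println("1".toByteArray(Charsets.UTF_8).size)                          // 1: the string form shrinks with the digit count
println(ByteBuffer.allocate(Long.SIZE_BYTES).putLong(1L).array().size) // 8: the raw form is always 8 bytes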
Below is the size of the entry { 1: 1.0 }, stored first through the string serializer and then as raw bytes.
127.0.0.1:6379> memory usage "1"
(integer) 56
127.0.0.1:6379> memory usage "\x01\x00\x00\x00\x00\x00\x00\x00"
(integer) 72
Not only did we not save any memory, we actually increased memory usage by almost 30% per entry. Unless we are sure that our keys and values always have many digits, perhaps we should stick to the default serializer.
Can we use a pass-through serializer?
What if we use a serializer that does nothing and perform the conversion to/from ByteArray ourselves? Here is how we set it up. Note that the type of the template is now RedisTemplate<ByteArray, ByteArray>.
fun redisTemplate(): RedisTemplate<ByteArray, ByteArray> {
    //...
    redisTemplate.keySerializer = RedisSerializer.byteArray()
    redisTemplate.valueSerializer = RedisSerializer.byteArray()
    //...
}
And here is how we use it.
val key = // code to convert 6359284517L to ByteArray
val value = // code to convert 0.5238106733071787 to ByteArray
redisTemplate.opsForValue().set(key, value)
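For illustration, the conversion could be done with a ByteBuffer, something like the sketch below (the byte order just has to match on the write and read sides).

val key = ByteBuffer.allocate(Long.SIZE_BYTES).putLong(6359284517L).array()
val value = ByteBuffer.allocate(Double.SIZE_BYTES).putDouble(0.5238106733071787).array()
redisTemplate.opsForValue().set(key, value)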
Functionally, this approach works just as well as the previous one using custom serializers. But I don't see any reason to do it: we would litter our code base with byte-conversion logic that a serializer class could encapsulate, and we still wouldn't be able to take advantage of batching through a pipeline.
Conclusion
Sending data to Redis as raw bytes can save us some memory. But we should perform detailed benchmarks before making this change, as it can backfire if our data doesn't follow a very specific format.