Spring Data Redis — Simple, yet Challenging

Published on: December 3, 2021 | Categories: Tech

We recently migrated our session management from MongoDB to Redis. The migration was motivated by our experience with MongoDB, which didn't handle highly frequent updates and even more frequent reads particularly well. Redis, on the other hand, is proven storage for exactly that use case.

Database migrations aren't always easy, because we need to learn the patterns, best practices, and quirks of another service. We aimed at keeping our Java service layer as simple as possible, so that it would be stable and future-proof: session management is one of those services with a quite stable feature set, and its code won't be touched very often. So keeping it simple and comprehensible for anybody peeking into it after several years is an important aspect.

Due to our, well, naïve approach we faced two issues:

  1. Spring Data's concept of implementing secondary indices and how it works together with EXPIRE
  2. Redis' scope of atomicity and Spring Data's update mechanism

This article summarizes our learnings from adopting Redis with a thin Java service using Spring Data as the persistence layer.

Spring Data Redis with Secondary Indices and EXPIRE/TTL

Adopting Spring Data with Redis starts straightforwardly: all you need are the dependencies for your Gradle or Maven build, along with an @EnableRedisRepositories annotation in a Spring Boot app. Most Spring Boot defaults make sense and get you running with a Redis instance quite smoothly.

An actual implementation of the repository isn't required, because Spring Data lets you declare a simple interface and provides a generic instance at runtime.

Our repository started like this:

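The original listing is lost here; a minimal sketch of such a repository declaration, assuming a SessionData entity keyed by a String sessionId, could look like this:

```java
import org.springframework.data.repository.CrudRepository;

// Spring Data generates the implementation at runtime; no class of our own needed.
public interface SessionDataCrudRepository extends CrudRepository<SessionData, String> {
}
```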

Our entities managed by that repository also started as simple as it might get:


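A sketch of such an entity; the @RedisHash keyspace name and the exact fields are assumptions:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.TimeToLive;

@RedisHash("sessions")
public class SessionData {

    @Id
    private String sessionId;

    private String userId;

    // Mapped to Redis EXPIRE: the entry vanishes once the TTL elapses.
    @TimeToLive
    private Long ttl;

    // getters and setters omitted for brevity
}
```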

You'll notice that we opted to model the TimeToLive as a ttl property, which is translated into EXPIRE for the entity. We didn't want to track expiring sessions manually, but wanted Redis to remove expired sessions transparently. The ttl is regularly refreshed to its initial value during user activity; otherwise a user might be logged out in the midst of working with our platform.

What happens when a user actually pushes the logout button, and how could we disable a user account and invalidate a running session? Easy: we also have a userId as part of the SessionData and can perform a query to find every session for that userId. The required changes to the classes above look like this:

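Sketches of the changed classes; apart from the names SessionDataCrudRepository and SessionData, the details are assumptions:

```java
import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.TimeToLive;
import org.springframework.data.redis.core.index.Indexed;
import org.springframework.data.repository.CrudRepository;

interface SessionDataCrudRepository extends CrudRepository<SessionData, String> {

    // Backed by the secondary index that Spring Data maintains for @Indexed userId.
    List<SessionData> findByUserId(String userId);
}

@RedisHash("sessions")
class SessionData {

    @Id
    private String sessionId;

    // @Indexed makes Spring Data create and maintain a secondary index on userId.
    @Indexed
    private String userId;

    @TimeToLive
    private Long ttl;

    // getters and setters omitted for brevity
}
```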


The innocent-looking @Indexed annotation triggers special behaviour in Spring Data. The annotation tells Spring Data to create and maintain an additional index on the entities, so that we can query a whole list of SessionData for a given userId. The combination of a secondary index and automated expiry of entities makes the setup a bit more complex, though. Redis won't automatically update the secondary index when a referenced entity is deleted, so Spring Data needs to handle that case. Yet, Spring Data doesn't constantly query Redis for expiring entities (keys), which is why it relies on Redis keyspace notifications for expiring keys, along with so-called phantom copies:

When the expiration is set to a positive value, the corresponding EXPIRE command is run. In addition to persisting the original, a phantom copy is persisted in Redis and set to expire five minutes after the original one. This is done to enable the Repository support to publish RedisKeyExpiredEvent, holding the expired value in Spring's ApplicationEventPublisher whenever a key expires, even though the original values have already been removed.

There’s a little detail to notice in the next paragraph:

By default, the key expiry listener is disabled when initializing the application. The startup mode can be adjusted in @EnableRedisRepositories or RedisKeyValueAdapter to start the listener with the application or upon the first insert of an entity with a TTL. See EnableKeyspaceEvents for possible values.

Sadly, we didn't read that far. That's why we experienced the effects of enabling EXPIRE with disabled key expiry listeners, combined with an ever-growing secondary index. Long story short: we observed an ever-growing number of keys and growing memory usage, until Redis' memory limit was reached.

Inspecting the Redis keys made it obvious where to find the configuration error, which ultimately made us fix the @EnableRedisRepositories annotation to enable keyspace events. We also disabled the automated server configuration of the notify-keyspace-events property, because we had enabled that setting server-side:

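A sketch of such a configuration; the class name is illustrative, the annotation attributes are Spring Data Redis' own:

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.core.RedisKeyValueAdapter.EnableKeyspaceEvents;
import org.springframework.data.redis.repository.configuration.EnableRedisRepositories;

@Configuration
@EnableRedisRepositories(
        // start the key expiry listener together with the application
        enableKeyspaceEvents = EnableKeyspaceEvents.ON_STARTUP,
        // empty value: don't CONFIG SET notify-keyspace-events from the client,
        // because the setting is already configured server-side (e.g. "Ex" in redis.conf)
        keyspaceNotificationsConfigParameter = ""
)
public class RedisConfig {
}
```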

We also had to clean up the stale data manually, so let's also mention that you should always prefer SCAN over KEYS when working with large data sets. Netflix's nf-data-explorer might help if you don't fancy working with the native redis-cli.
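With redis-cli, the non-blocking variant looks like this (the key pattern is illustrative):

```shell
# SCAN iterates the keyspace incrementally instead of blocking the server
redis-cli --scan --pattern 'sessions:*'

# avoid this on large data sets: KEYS walks the whole keyspace in one blocking call
# redis-cli KEYS 'sessions:*'
```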

Missing Entities During Concurrent Reads and Writes

With the issue of ever-growing memory usage fixed, we eventually made the new service the primary source for our sessions.

When requests hit our security chain, we always verify that the user's session is valid. Those verifications are simple lookups of a sessionId at the session management. Usually, a status 404 NOT FOUND from the session management indicates either that the sessionId is invalid (unknown) or that the session has expired (and been deleted by Redis).

Along with some related changes in our applications consuming the new API, we observed another strange behaviour: some sessions couldn't be found, although we were 100% sure that the sessions should still be valid (known and not expired). After a failed session lookup, most retries succeeded, so we knew that the data wasn't lost; it simply couldn't be found.

We couldn't actively reproduce the erroneous behaviour, and collecting logs, metrics, and traces didn't give us a lead. Along the way we added caching and other workarounds, with some changes being improvements to the overall behaviour, but we didn't actually fix the issue.

If you carefully read the first part of this article, you might remember the little detail about us refreshing the ttl. We not only refresh the ttl, but also a lastResponse timestamp as part of the SessionData:

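A sketch of the extended entity; the field type and keyspace name are assumptions:

```java
import java.time.Instant;
import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.TimeToLive;
import org.springframework.data.redis.core.index.Indexed;

@RedisHash("sessions")
public class SessionData {

    @Id
    private String sessionId;

    @Indexed
    private String userId;

    // refreshed on user activity, together with the TTL
    private Instant lastResponse;

    @TimeToLive
    private Long ttl;

    // getters and setters omitted for brevity
}
```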

So, let's have a more detailed look at the request processing around the session management. The user sends a request along with a sessionId, indicating that they are logged in. We perform a lookup with that sessionId to verify the user's session. If the session is considered valid, the application can proceed with the requested action. After the application has processed the request, the security chain regularly updates the session, resetting the ttl and writing the current lastResponse timestamp. Usually, the user performs several requests; probably not the actual human, but a frontend application running in the browser. That frontend application doesn't truly care how frequently it sends new requests, so we can assume that several requests might hit our backends at the same time.

Several requests being verified. Several requests triggering a session refresh along with the write operation on the SessionData.

We were still using Spring Data's CrudRepository for reading and updating sessions, with the following code:

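The original listings aren't preserved; a minimal sketch of both operations, with an illustrative service class and TTL constant:

```java
import java.time.Instant;
import java.util.Optional;

class SessionService {

    private static final long DEFAULT_TTL_SECONDS = 1800L; // assumed value

    private final SessionDataCrudRepository repository;

    SessionService(SessionDataCrudRepository repository) {
        this.repository = repository;
    }

    // reading: look up the session by its id to verify it
    Optional<SessionData> getSession(String sessionId) {
        return repository.findById(sessionId);
    }

    // updating: reset the TTL, record the last activity, and save the whole entity
    void refreshSessionTtl(SessionData session) {
        session.setTtl(DEFAULT_TTL_SECONDS);
        session.setLastResponse(Instant.now());
        repository.save(session);
    }
}
```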


Sometimes, repository.findById(...) didn't yield anything, so we focused on that part. The problem was triggered by the repository.save(...) call, though. After several weeks of googling and staring at logs and traces, we found a correlation between refreshSessionTtl and getSession calls.

Many articles on the internet had already trained us to think of Redis as a single-threaded service, performing every request sequentially. Googling with "spring data redis concurrent writes" as part of the search query led us to Stack Overflow and the issue at spring-projects/spring-data-redis/issues/1826, where our problem was described and even explained, along with a fix.

Long story short: Spring Data implements updates as a sequence of DEL and HMSET, without any transactional guarantees. In other words: updating entities via CrudRepositories doesn't provide atomicity. Our HGETALL requests sometimes happened exactly between DEL and HMSET, resulting in an empty result, or sometimes in a result with a negative ttl.
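Schematically, the race looks like this (an illustrative trace, not actual MONITOR output):

```
refreshSessionTtl            getSession
-----------------            ----------
DEL sessions:42
                             HGETALL sessions:42   -> empty: "session not found"
HMSET sessions:42 ...
```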

Our issue could now be reproduced with an integration test and fixed using PartialUpdate. So the implementation above changed to:

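A sketch of the fixed update path; the service wrapper is illustrative, PartialUpdate and RedisKeyValueTemplate are Spring Data Redis' own API:

```java
import java.time.Instant;
import org.springframework.data.redis.core.PartialUpdate;
import org.springframework.data.redis.core.RedisKeyValueTemplate;

class SessionService {

    private final RedisKeyValueTemplate template;

    SessionService(RedisKeyValueTemplate template) {
        this.template = template;
    }

    // updating: write only the changed property and refresh the TTL,
    // instead of deleting and re-writing the whole hash
    void refreshSessionTtl(String sessionId) {
        PartialUpdate<SessionData> update = new PartialUpdate<>(sessionId, SessionData.class)
                .set("lastResponse", Instant.now())
                .refreshTtl(true);
        template.update(update);
    }
}
```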

Summary

A combination of expiring keys, secondary indices, and delegating all the magic to Spring Data Redis requires proper configuration of keyspace event listeners. Otherwise your memory usage grows over time due to the phantom copies. Consider using a configuration like @EnableRedisRepositories(enableKeyspaceEvents = ON_STARTUP) in your app.

In an environment with concurrent reads and updates, beware that Spring Data's CrudRepository implements updates as a two-step process of DEL and HMSET. If you observe sporadically missing keys or results with a negative TTL, you might have hit a concurrency issue. Check your write operations and consider updating only the changed properties with a PartialUpdate and Spring Data's RedisKeyValueTemplate#update method.