You know those pages you receive in the middle of the night? Not a full-blown fire, mind you, but rather a slow-burning panic? Let me tell you one of those stories that changed the way my team built software forever. It was 2 a.m., and the graphs looked bad. Not dead, mind you, but sick. Our microservices were still talking, but P95 latencies were rising high in the sky, like a lazy balloon. And retries were starting to cascade. The whole system felt like it was in a swamp.
So what was the problem? A “safe” configuration change to our API gateway, a new rate limit, and slight change of routing. It turned out that this change and a previous deploy of an unrelated service that occurred at least an hour earlier had collided in some silent serpentine handshake. The result was a slow, luscious, and irresistible drain on performance.
Read More from DZone.com Feed
