New Relic’s “Request Queuing” metric is a badly named and often misunderstood figure that confuses and worries application developers and DevOps engineers. In 2013, after fielding the umpteenth support request about this metric while working on Engine Yard’s Application Support team, I took some time and wrote this guide to show that “request queuing” is actually request latency: it does not indicate that your application server’s request queue is backed up, but rather measures the time elapsed between the front-end proxy receiving a request and the New Relic agent code executing in your app.

Understanding New Relic Queuing: Engine Yard Blog (PDF)

(The PDF is self-hosted in case Engine Yard ever takes down this article or their past blog posts.)

Request queuing in New Relic is often misinterpreted as “there’s a bunch of unserved HTTP requests from users piled up in the (Puma/Thin/Unicorn/Passenger) queue! Oh no!”. Thankfully - and somewhat confusingly - that’s not the case at all. At Engine Yard, we added a custom header in nginx that marked the point in time when nginx itself received the HTTP request. By then, the request had already been routed from the user, across the internet, to the first entry point of the application (wherever your A or AAAA record points), through the load balancer (in our case, haproxy), and finally to nginx. Later, once the request reached the application server in use - whatever it may have been - New Relic’s agent code in your application would look for this header and subtract its timestamp from the point in time the agent code executed. Hence, this isn’t a queue at all; it’s latency.
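For illustration, here’s a minimal sketch of the nginx side of this arrangement. New Relic’s agents conventionally look for an X-Request-Start header carrying a timestamp in “t=seconds.milliseconds” form; whether Engine Yard’s header matched this exactly is an assumption on my part, as is the upstream name below:

```nginx
# Stamp each proxied request with the moment nginx received it.
# $msec is nginx's current time as seconds.milliseconds, which matches
# the "t=<timestamp>" format New Relic agents conventionally parse.
location / {
  proxy_set_header X-Request-Start "t=${msec}";
  proxy_pass http://app_upstream;  # hypothetical upstream name
}
```

The agent then subtracts this timestamp from the time it begins executing and reports the difference as “request queuing”.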

The article goes into more detail and shows how to use various tools for both Phusion Passenger and Unicorn - the latter requiring some custom code that I’ve since distilled into a tool - to find out how many requests are ACTUALLY piled up under the hood in the global request queue.

The resulting tool for checking the Unicorn request queue (for real), unicorn_status.rb, was a joint effort between Adam Holt, Christopher Rigor, and myself.
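The core of that approach relies on the raindrops gem, which reads kernel-level listener statistics on Linux. Here’s a minimal, hedged sketch of the idea; the default listener address is an assumption, and the real unicorn_status.rb does more than this:

```ruby
#!/usr/bin/env ruby
# Minimal sketch of checking Unicorn's real request queue via raindrops
# (the approach unicorn_status.rb builds on). Pass the TCP address or
# Unix socket your Unicorn master actually binds to; the default below
# is hypothetical.
require 'raindrops'

addr = ARGV[0] || '127.0.0.1:8080'

stats = if addr.start_with?('/')
  # Unix domain socket listener (e.g. /var/run/unicorn.sock)
  Raindrops::Linux.unix_listener_stats([addr])
else
  Raindrops::Linux.tcp_listener_stats([addr])
end

stats.each do |listener, info|
  # "active" = connections currently being served by workers;
  # "queued" = connections the kernel has accepted but no worker
  # has picked up yet - the real backlog.
  puts "#{listener}: #{info.active} active, #{info.queued} queued"
end
```

The “queued” figure is the one that reflects a genuine backlog, as opposed to the latency New Relic reports.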

People using Phusion’s excellent Passenger application server can get this information much more easily: simply run passenger-status on the VM that’s processing your application’s requests.

How is request latency useful?

If this figure gets unreasonably high (what’s “unreasonable” depends on your application and architecture, but in general around 50-100ms of latency is “normal”), it could be a signal that something is delaying requests between your reverse proxy (e.g. nginx) and the point where your application code actually begins processing them.

Here’s a non-exhaustive list of possible causes to consider when you’re seeing high request latency in your application environment: