Searching for a “stack trace”

Situation 1: upstream server did not reply to our SYN within proxy_connect_timeout

... upstream timed out (110: Connection timed out) while connection to upstream, ...

First, proxy_connect_timeout is used in ngx_http_upstream_connect() to set a timer on c->write.

    if (rc == NGX_AGAIN) {
        ngx_add_timer(c->write, u->conf->connect_timeout);
        return;
    }

Search for the string "upstream timed out" returns 4 matches in src/http/ngx_http_upstream.c, and we’d like to find out which function handles such timeouts.

Among the 4 functions, ngx_http_upstream_process_upstream() and ngx_http_upstream_process_non_buffered_upstream() are called in ngx_http_upstream_send_response(), which is only called after reading the response header.

ngx_http_upstream_process_upgraded() is called for WebSocket upgrade, so we ignore that as well.

Then we focus on the remaining ngx_http_upstream_next().

    if (ft_type == NGX_HTTP_UPSTREAM_FT_TIMEOUT) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, NGX_ETIMEDOUT,
                      "upstream timed out");
    }

A new keyword NGX_HTTP_UPSTREAM_FT_TIMEOUT, this brings our focus to ngx_http_upstream_send_request_handler().

    if (c->write->timedout) {
        ngx_http_upstream_next(r, u, NGX_HTTP_UPSTREAM_FT_TIMEOUT);
        return;
    }

On the other hand, let’s search for "connecting to upstream", which is set in c->log->action and can be seen from the error log. There are 3 matches, but two of them has a different error text, e.g. "connect() failed".

The last match is ngx_http_upstream_connect(). This function calls ngx_event_connect_peer() to establish a connection, and sets u->write_event_handler to ngx_http_upstream_send_request_handler.

Therefore, the function call chain is

ngx_http_upstream_connect() ngx_event_connect_peer() ^> ngx_add_timer(c->write, u->conf->connect_timeout) ^>

event loop in ngx_worker_process_cycle() ngx_process_events_and_timers() ngx_event_expire_timers() ev->handler =

c->write->handler ngx_http_upstream_handler() u->write_event_handler ngx_http_upstream_send_request_handler() ngx_http_upstream_next()

  1. u->peer.free ngx_http_upstream_free_round_robin_peer() ^>
  2. ngx_log_error() ^>
  3. (if no more tries) ngx_http_upstream_finalize_request()
  4. ngx_close_connection() ngx_close_socket() i.e. close() ^>
  5. ngx_http_upstream_connect() (retry next server)

Note: ^> represents subsequent execution after returning to the caller function, and is a function call or pointer dereference.

If connection succeeded, the timer will be reset to send_timeout or cleared.

        if (!c->write->ready || u->request_body_blocked) {
            ngx_add_timer(c->write, u->conf->send_timeout);
 
        } else if (c->write->timer_set) {
            ngx_del_timer(c->write);
        }

Sequence of events

  1. ngx_event_connect_peer(): Connection is attempted.
  2. ngx_http_upstream_handler(): Event loop detects the timeout and calls this handler.
  3. ngx_http_upstream_free_round_robin_peer(): fails is accounted for and compared with max_fails to temporarily disable upstream servers.
  4. ngx_log_error(): “upstream timed out” is logged.
  5. ngx_close_connection(): old connection to upstream is closed.

Situation 2: upstream server did not reply to our SYN within OS timeout

When ngx_event_connect_peer() is called, both c->read and c->write are registered to the event loop.

    rc = connect(s, pc->sockaddr, pc->socklen);
 
    if (ngx_add_event(rev, NGX_READ_EVENT, event) != NGX_OK) {
        goto failed;
    }
 
    if (rc == -1) {
 
        /* NGX_EINPROGRESS */
 
        if (ngx_add_event(wev, NGX_WRITE_EVENT, event) != NGX_OK) {
            goto failed;
        }
 
        return NGX_AGAIN;
    }

Let’s see what connect(2) in System Calls Manual says about the comment NGX_EINPROGRESS.

EINPROGRESS

The socket is nonblocking and the connection cannot be completed immediately. (UNIX domain sockets failed with EAGAIN instead.) It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).

In this case, epoll receives the error as a write event and calls its handler. ngx_http_upstream_send_request_handler() then calls ngx_http_upstream_send_request(), which calls ngx_http_upstream_test_connect() and detects the failure.

            if (flags & NGX_POST_EVENTS) {
                queue = rev->accept ? &ngx_posted_accept_events
                                    : &ngx_posted_events;
 
                ngx_post_event(rev, queue);
 
            } else {
                rev->handler(rev);
            }

In ngx_http_upstream_test_connect(), error is logged as “connect() failed” on Linux.

        if (err) {
            c->log->action = "connecting to upstream";
            (void) ngx_connection_error(c, err, "connect() failed");
            return NGX_ERROR;
        }

Therefore, the function call chain is

ngx_http_upstream_connect() ngx_event_connect_peer() ^> ngx_add_timer(c->write, u->conf->connect_timeout) ^>

event loop in ngx_worker_process_cycle() ngx_process_events_and_timers() ngx_process_events() = ngx_epoll_process_events() rev->handler =

c->write->handler ngx_http_upstream_handler() u->write_event_handler ngx_http_upstream_send_request_handler()

ngx_http_upstream_send_request()

  • ngx_http_upstream_test_connect() ngx_connection_error() ^>
  • ngx_http_upstream_next()

Sequence of events

  1. ngx_event_connect_peer(): Connection is attempted.
  2. ngx_http_upstream_handler(): Event loop detects the error and calls this handler.
  3. ngx_http_upstream_test_connect(): check for failure and log “connect() failed”.
  4. ngx_http_upstream_free_round_robin_peer(): fails is accounted for and compared with max_fails to temporarily disable upstream servers.
  5. ngx_close_connection(): old connection to upstream is released.

Notes

A possibly better way to do this: set log level to debug, and simulate the connect timeout with a very low proxy_connect_timeout.

References