Troubleshooting unexpected 502 errors after configuring proxy_next_upstream in nginx

2023-11-07 15:00:52

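When nginx proxies requests to several gateway instances, and the GET endpoint being called fails in a way that proxy_next_upstream can match (error, timeout, invalid_header, http_500, http_502, http_503, http_504), nginx sometimes responds with a 502 status code.

My understanding until then was that nginx simply forwards the backend's response and generally does not change its status code.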

The nginx configuration is as follows:

worker_processes  1;
daemon off;
master_process off; 
error_log  logs/error.log  debug; 
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    log_format apm '[$time_local]\tclient=$remote_addr\t'
               'upstream_addr=$upstream_addr\t'
               'upstream_status=$upstream_status\t'
               'document_root="$document_root"\t'
               'fastcgi_script_name="$fastcgi_script_name"\t'
               'request_filename="$request_filename"\t'
               'request_time=$request_time\t'
               'upstream_response_time=$upstream_response_time\t'
               'upstream_connect_time=$upstream_connect_time\t'
               'upstream_header_time=$upstream_header_time\t';
    access_log  logs/access.log  apm;
    sendfile        on; 
    keepalive_timeout  65;
    upstream gateway {
        # each server implicitly uses the defaults max_fails=1 fail_timeout=10s
        server 192.168.2.102:12012;
        server 192.168.2.102:12011;
    }
    server {
        listen       80;
        server_name  localhost; 
        location / {
            root   html;
            index  index.html index.htm;
        }
        location /api/ {
            proxy_pass http://gateway/;
            proxy_next_upstream error http_503 http_502;
        } 
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        } 
    }
}

Test endpoint used for the example:

    // Simulated failing endpoint: waits 200 ms, then always answers 503
    @GetMapping("/excep503")
    public ResponseEntity<String> excep503(HttpServletRequest request, Integer times) throws InterruptedException {
        Thread.sleep(200);
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body("Service unavailable");
    }

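How it was tested:

Send repeated GET requests to the failing endpoint through nginx.

Observed behavior:

Some requests fail with 502, others with 503.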


When 503 is returned

In access_log, upstream_addr contains two entries: 192.168.2.102:12012, 192.168.2.102:12011
error_log shows a request to each of the two gateways:
first, connect to 192.168.2.102:12011;
102:12011 returns 503 Service Unavailable, and nginx logs:

upstream server temporarily disabled while reading response header from upstream

nginx then moves on and connects to 192.168.2.102:12012;
102:12012 also returns 503 Service Unavailable.

When 502 is returned

In access_log, upstream_addr contains only one entry: upstream_addr=192.168.2.102:12011

error_log shows only one request to a gateway:
connect to 192.168.2.102:12011;
102:12011 returns 503 Service Unavailable, and nginx logs:

upstream server temporarily disabled while reading response header from upstream,
no live upstreams while connecting to upstream,
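To tell the two cases apart over many requests, it is enough to count the entries in upstream_addr. A minimal C sketch, assuming the apm log_format above (tab-separated fields) and ignoring the " : " separator nginx uses for internal redirects between upstream groups; count_upstream_attempts is a hypothetical helper, not part of nginx:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count how many upstream servers a request tried,
 * based on the upstream_addr=... field of one "apm" access-log line.
 * Assumes the field ends at a tab or at the end of the line, as in the
 * log_format above. Internal redirects (" : " separators) are ignored.
 * Returns 0 when the field is missing, empty, or logged as "-".
 */
static size_t count_upstream_attempts(const char *line)
{
    static const char key[] = "upstream_addr=";
    const char *p;
    size_t attempts;

    assert(line != NULL);

    p = strstr(line, key);
    if (p == NULL) {
        return 0;
    }
    p += sizeof(key) - 1;                      /* skip "upstream_addr=" */

    if (*p == '\0' || *p == '\t'
        || (p[0] == '-' && (p[1] == '\0' || p[1] == '\t')))
    {
        return 0;                              /* nothing recorded */
    }

    attempts = 1;
    for (; *p != '\0' && *p != '\t'; p++) {
        if (*p == ',') {
            attempts++;                        /* ", " separates servers */
        }
    }
    return attempts;
}
```

Applied to the two cases above, lines from the 503 case yield 2 and lines from the 502 case yield 1.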

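Why 502 is returned

After reading up on it and checking the source:

In the 502 case, ngx_http_upstream_next is called with ft_type 0x40000000 (NGX_HTTP_UPSTREAM_FT_NOLIVE, the "no live upstreams" case). That value matches none of the explicit cases and falls into default, so the final status code is NGX_HTTP_BAD_GATEWAY, i.e. 502.

nginx-1.24.0\src\http\ngx_http_upstream.c (ngx_http_upstream_next), line 4370: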

    switch (ft_type) {

    case NGX_HTTP_UPSTREAM_FT_TIMEOUT:
    case NGX_HTTP_UPSTREAM_FT_HTTP_504:
        status = NGX_HTTP_GATEWAY_TIME_OUT;
        break;

    case NGX_HTTP_UPSTREAM_FT_HTTP_500:
        status = NGX_HTTP_INTERNAL_SERVER_ERROR;
        break;

    case NGX_HTTP_UPSTREAM_FT_HTTP_503:
        status = NGX_HTTP_SERVICE_UNAVAILABLE;
        break;

    /*
     * NGX_HTTP_UPSTREAM_FT_BUSY_LOCK and NGX_HTTP_UPSTREAM_FT_MAX_WAITING
     * never reach here
     */

    default:
        status = NGX_HTTP_BAD_GATEWAY;
    }
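The mapping can be reproduced in isolation. A sketch, assuming the FT_* values below match those in ngx_http_upstream.h for 1.24.0 (FT_NOLIVE = 0x40000000 is the 40000000 mentioned above); the shortened macro names and status_for_ft_type are hypothetical stand-ins for the NGX_HTTP_UPSTREAM_FT_* constants and the switch:

```c
#include <assert.h>
#include <stdint.h>

/* Failure-type bits as defined in src/http/ngx_http_upstream.h (1.24.0);
 * only the values used below are reproduced. */
#define FT_ERROR     0x00000002u
#define FT_TIMEOUT   0x00000004u
#define FT_HTTP_500  0x00000010u
#define FT_HTTP_502  0x00000020u
#define FT_HTTP_503  0x00000040u
#define FT_HTTP_504  0x00000080u
#define FT_NOLIVE    0x40000000u

/*
 * Status ngx_http_upstream_next() finalizes the request with when it
 * gives up on the given failure type (reduced to the cases quoted above).
 */
static int status_for_ft_type(uint32_t ft_type)
{
    /* the caller passes exactly one FT_* bit */
    assert(ft_type != 0 && (ft_type & (ft_type - 1)) == 0);

    switch (ft_type) {
    case FT_TIMEOUT:
    case FT_HTTP_504:
        return 504;          /* NGX_HTTP_GATEWAY_TIME_OUT */
    case FT_HTTP_500:
        return 500;          /* NGX_HTTP_INTERNAL_SERVER_ERROR */
    case FT_HTTP_503:
        return 503;          /* NGX_HTTP_SERVICE_UNAVAILABLE */
    default:
        return 502;          /* NGX_HTTP_BAD_GATEWAY: FT_ERROR, FT_HTTP_502, FT_NOLIVE, ... */
    }
}
```

Passing the 503 bit yields 503, while the no-live-upstreams bit, which has no case of its own, yields 502.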

Where the 502 and 503 paths diverge:

nginx-1.24.0\src\http\ngx_http_upstream_round_robin.c (ngx_http_upstream_get_round_robin_peer), line 449:

    peers = rrp->peers;
    ngx_http_upstream_rr_peers_wlock(peers);

    if (peers->single) {
        peer = peers->peer;

        if (peer->down) {
            goto failed;
        }

        if (peer->max_conns && peer->conns >= peer->max_conns) {
            goto failed;
        }

        rrp->current = peer;

    } else {

        peer = ngx_http_upstream_get_peer(rrp);

        if (peer == NULL) {
            goto failed;
        }

        ngx_log_debug2(NGX_LOG_DEBUG_HTTP, pc->log, 0,
                       "get rr peer, current: %p %i",
                       peer, peer->current_weight);
    }

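The single flag indicates whether the upstream group is configured with exactly one server; it is decided by the configuration, not by how many entries end up in upstream_addr. With the two servers configured here, single is 0, so the else branch runs: when ngx_http_upstream_get_peer finds no usable peer it returns NULL, the code jumps to failed, and ngx_http_upstream_get_round_robin_peer returns NGX_BUSY. nginx then logs "no live upstreams" and calls ngx_http_upstream_next with NGX_HTTP_UPSTREAM_FT_NOLIVE, which produces the 502. A single upstream_addr entry therefore means only one server was actually tried before no usable peer was left.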

So the real question is:

why does upstream_addr sometimes contain two entries and sometimes only one?

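Debugging the nginx source

At startup, nginx gives every server in the upstream group the default parameters max_fails=1 and fail_timeout=10s.

When a request to a server fails in a way that proxy_next_upstream covers (here, a 503), the server is marked as failed, and the selection loop skips it until fail_timeout has elapsed:

nginx-1.24.0/src/http/ngx_http_upstream_round_robin.c (ngx_http_upstream_get_peer), line 522: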

    for (peer = rrp->peers->peer, i = 0;
         peer;
         peer = peer->next, i++)
    {
        n = i / (8 * sizeof(uintptr_t));
        m = (uintptr_t) 1 << i % (8 * sizeof(uintptr_t));

        if (rrp->tried[n] & m) {
            continue;
        }

        if (peer->down) {
            continue;
        }

        if (peer->max_fails
            && peer->fails >= peer->max_fails
            && now - peer->checked <= peer->fail_timeout)
        {
            continue;
        }

        if (peer->max_conns && peer->conns >= peer->max_conns) {
            continue;
        }

        peer->current_weight += peer->effective_weight;
        total += peer->effective_weight;

        if (peer->effective_weight < peer->weight) {
            peer->effective_weight++;
        }

        if (best == NULL || peer->current_weight > best->current_weight) {
            best = peer;
            p = i;
        }
    }
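The skip conditions in this loop can be isolated into one predicate. A sketch, assuming only down, the per-request tried bitmap and the max_fails/fail_timeout window matter (max_conns and the weight bookkeeping are left out); rr_peer, mark_tried and peer_selectable are hypothetical names, not nginx API:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Reduced, hypothetical stand-in for ngx_http_upstream_rr_peer_t: only the
 * fields the selection loop above reads (max_conns is left out). */
typedef struct {
    int       down;           /* "down" parameter of the server directive */
    unsigned  fails;          /* failures in the current window */
    unsigned  max_fails;      /* default 1 */
    time_t    checked;        /* second of the last failure or check */
    time_t    fail_timeout;   /* default 10 (seconds) */
} rr_peer;

#define BITS_PER_WORD (8 * sizeof(uintptr_t))

/* Same indexing as the loop: peer i is bit i % BITS_PER_WORD of word
 * i / BITS_PER_WORD in the per-request "tried" bitmap. */
static void mark_tried(uintptr_t *tried, unsigned i)
{
    tried[i / BITS_PER_WORD] |= (uintptr_t) 1 << (i % BITS_PER_WORD);
}

/* 1 if the loop would consider peer i at second `now`, 0 if it skips it. */
static int peer_selectable(const rr_peer *peer, const uintptr_t *tried,
                           unsigned i, time_t now)
{
    assert(peer != NULL && tried != NULL);

    if (tried[i / BITS_PER_WORD] & ((uintptr_t) 1 << (i % BITS_PER_WORD))) {
        return 0;             /* already used by this request */
    }
    if (peer->down) {
        return 0;
    }
    if (peer->max_fails
        && peer->fails >= peer->max_fails
        && now - peer->checked <= peer->fail_timeout)
    {
        return 0;             /* still inside the fail_timeout window */
    }
    return 1;
}
```

Because now comes from ngx_time() in whole seconds, fail_timeout is also stored in seconds, and the test uses <=, a peer that failed at second t is skipped through t+10 and becomes selectable again at t+11.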

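Verification

Requesting the endpoint continuously shows that the 503 comes back roughly every 10 seconds, which matches the hypothesis.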
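The whole cycle can be reproduced with a small simulation. It is a simplified model, not nginx code, and assumes: both backends always answer 503; a client request arrives every 500 ms and completes instantly; the first eligible peer in configuration order is picked (nginx actually uses smooth weighted round robin); a 503 on a non-final attempt marks the peer failed, while the 503 on the final attempt is passed to the client without touching the peer's counters, which is consistent with the error_log above, where only the first server is reported as temporarily disabled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define NPEERS        2     /* the two gateway servers */
#define MAX_FAILS     1     /* nginx default */
#define FAIL_TIMEOUT  10    /* nginx default, in seconds */

typedef struct {
    unsigned fails;          /* failures recorded so far */
    time_t   checked;        /* second of the last recorded failure */
} sim_peer;

typedef struct {
    int    status;           /* status the client receives */
    size_t attempts;         /* servers contacted (entries in upstream_addr) */
} sim_result;

static int sim_selectable(const sim_peer *p, time_t now)
{
    return !(p->fails >= MAX_FAILS && now - p->checked <= FAIL_TIMEOUT);
}

/* One request at second `now`; every backend answers 503. */
static sim_result sim_request(sim_peer *peers, time_t now)
{
    int        tried[NPEERS] = {0};
    unsigned   tries = NPEERS;       /* round robin allows one try per server */
    sim_result r = {0, 0};

    for (;;) {
        size_t i, pick = NPEERS;

        for (i = 0; i < NPEERS; i++) {
            if (!tried[i] && sim_selectable(&peers[i], now)) {
                pick = i;
                break;
            }
        }

        if (pick == NPEERS) {
            r.status = 502;          /* no live upstreams -> FT_NOLIVE -> 502 */
            return r;
        }

        tried[pick] = 1;
        tries--;
        r.attempts++;

        if (tries == 0) {
            r.status = 503;          /* last try: backend's 503 reaches the client */
            return r;
        }

        /* 503 matches proxy_next_upstream: record a failure, try the next */
        peers[pick].fails++;
        peers[pick].checked = now;
    }
}

/* Issue n requests, one every step_ms milliseconds starting at time 0. */
static void simulate(sim_result *out, size_t n, long step_ms)
{
    sim_peer peers[NPEERS] = {{0, 0}, {0, 0}};
    size_t   k;

    assert(out != NULL && step_ms > 0);

    for (k = 0; k < n; k++) {
        /* whole seconds, like ngx_time() */
        time_t now = (time_t) ((long) k * step_ms / 1000);
        out[k] = sim_request(peers, now);
    }
}
```

Running simulate(r, 46, 500) gives 503 with two attempts for requests 0, 22 and 44 (seconds 0, 11 and 22), 502 with one attempt right after each of them, and 502 without reaching any backend in between, while both peers are disabled. In this model the 503 therefore reappears once per fail_timeout window, every 11 s because of the <= comparison on whole seconds.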