nginx upstream DNS resolution problem
Tags: nginx, upstream, dns

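Recently I noticed that several websites hosted on my Hong Kong server were intermittently unreachable. The server runs trojan in front of nginx: all traffic enters through trojan, which forwards part of it onward as proxy traffic and hands the rest, the website traffic, to nginx. That gives me both a traffic proxy and web hosting on the same machine.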

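I had seen these web access failures on the Hong Kong node a few times before, and restarting nginx fixed them every time. This time I decided to find out what was actually going on. I logged in to the node and checked the trojan log first, which showed traffic being forwarded normally. Then I looked at the nginx log: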

2022/11/14 01:27:11 [error] 22#22: *473474 upstream timed out (110: Connection timed out) while connecting to upstream, client: 61.177.173.46, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/11/14 01:27:27 [error] 22#22: *473486 upstream timed out (110: Connection timed out) while connecting to upstream, client: 203.205.141.115, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/11/14 01:28:36 [error] 22#22: *473490 upstream timed out (110: Connection timed out) while connecting to upstream, client: 203.205.141.115, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/11/14 01:29:19 [error] 22#22: *473492 upstream timed out (110: Connection timed out) while connecting to upstream, client: 61.177.173.52, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
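
To see at a glance which upstream addresses are timing out, log lines like the ones above can be tallied with a short script. This is only a sketch, assuming the error-log format shown above; the function name `count_upstream_timeouts` is hypothetical and not part of nginx.

```python
import re
from collections import Counter

# Captures the quoted upstream address from nginx error-log lines such as:
#   ... upstream timed out (110: Connection timed out) ... upstream: "36.36.106.166:23000", ...
UPSTREAM_RE = re.compile(r'upstream timed out.*?upstream: "([^"]+)"')


def count_upstream_timeouts(lines):
    """Count 'upstream timed out' errors per upstream address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '2022/11/14 01:27:11 [error] 22#22: *473474 upstream timed out '
        '(110: Connection timed out) while connecting to upstream, '
        'client: 61.177.173.46, server: 0.0.0.0:22, '
        'upstream: "36.36.106.166:23000", bytes from/to client:0/0, '
        'bytes from/to upstream:0/0',
    ]
    for address, hits in count_upstream_timeouts(sample).items():
        print(address, hits)
```

If every timeout points at the same address, as it does in the log above, the problem is more likely with that one backend address than with nginx as a whole.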

The nginx log shows connection timeouts, so I used telnet to check whether the upstream was really unreachable:

➜  trojan git:(master) telnet 36.36.106.166 23000
Trying 36.36.106.166...
^C
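
The same reachability check can be scripted with a timeout instead of waiting and pressing Ctrl-C. A minimal sketch using only the Python standard library; the helper name `can_connect` and the 3-second default are assumptions made for this example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("36.36.106.166", 23000)` performs the same test as the telnet command above, returning `False` if the connection attempt fails or times out.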

Sure enough, the connection fails. My website configuration is as follows:

server {
    listen 10110 ssl http2;
    server_name xxx.duguying.net;

    root /usr/share/nginx/html;
    index index.php index.html;
    ssl_certificate /data/certs/_.duguying.net.crt; 
    ssl_certificate_key /data/certs/_.duguying.net.key;
    ssl_stapling on;
    ssl_stapling_verify on;
    add_header Strict-Transport-Security "max-age=31536000";

    location / {
        include git.deny;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host            $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://jxx.duguying.net:****;
        proxy_buffering    off;
        proxy_buffer_size  128k;
        proxy_buffers 100  128k;
    }
}

The website traffic is forwarded by the Hong Kong node to jxx.duguying.net, so I ran dig on that name:

➜  ~ dig jxx.duguying.net

; <<>> DiG 9.11.5-P4-5.1+deb10u7-Debian <<>> jxx.duguying.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25787
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;jxx.duguying.net.             IN      A

;; ANSWER SECTION:
jxx.duguying.net.      60      IN      A       222.248.21.219

;; Query time: 22 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: 一 11月 14 11:17:24 CST 2022
;; MSG SIZE  rcvd: 51
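
The address in the dig answer can be checked against the upstream address from the nginx error log. A minimal sketch of that comparison, assuming the system resolver returns the same answer as dig; the helper names `resolve_ipv4` and `is_stale` are hypothetical.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return {info[4][0] for info in infos}


def is_stale(upstream_addr, current_ips):
    """True if the IP of an 'ip:port' upstream string is not among current_ips."""
    ip = upstream_addr.rsplit(":", 1)[0]
    return ip not in current_ips
```

With the values from this post, `is_stale("36.36.106.166:23000", {"222.248.21.219"})` returns `True`: the address in the nginx log is not among the addresses the name currently resolves to.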

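The address nginx is forwarding to (36.36.106.166) is not the address the upstream domain currently resolves to (222.248.21.219). In other words, the DNS record for the upstream domain had been updated, but nginx was still using the old resolution. After some searching I learned that when a domain name is used as an upstream, nginx does not refresh the resolved address in real time. The possible solutions: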

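Option 1: restart nginx every time the DNS record changes (this is why restarting nginx fixed the earlier failures)
Option 2: use the nginx resolver directive
Option 3: use Nginx-upstream-dynamic-server (an nginx module)
Option 4: use ngx_upstream_jdomain (an nginx module)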
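
Option 1 can be automated with a small watcher that polls DNS and reloads nginx when the answer changes. This is only a sketch under several assumptions: that `nginx -s reload` is enough to make nginx re-resolve the upstream name, that the script runs with permission to signal nginx, and that a 60-second poll interval is acceptable. The function names and parameters are hypothetical.

```python
import socket
import subprocess
import time


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def watch_and_reload(hostname, interval=60, max_checks=None,
                     resolve=resolve_ipv4, run=subprocess.run, sleep=time.sleep,
                     reload_cmd=("nginx", "-s", "reload")):
    """Poll DNS for hostname and run reload_cmd whenever the answer changes.

    Returns the number of reloads performed. With max_checks=None it loops forever.
    resolve, run and sleep are injectable so the loop can be exercised offline.
    """
    last = resolve(hostname)
    reloads = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)
        checks += 1
        try:
            current = resolve(hostname)
        except OSError:
            continue  # transient DNS failure: keep the last known answer
        if current != last:
            # Assumption: a reload makes nginx resolve the upstream name again
            run(list(reload_cmd), check=True)
            last = current
            reloads += 1
    return reloads
```

Calling `watch_and_reload("jxx.duguying.net")` would run until interrupted. Polling adds up to one interval of downtime after each DNS change, which is one reason the resolver-based approach is more attractive.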

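This post covers option 2. Add the resolver directives and change the nginx configuration as shown below. Note that nginx only uses the resolver at request time when proxy_pass contains a variable; a literal hostname in proxy_pass is resolved once when the configuration is loaded. For that reason the hostname is placed in a variable here.

server {
    listen 10110 ssl http2;
    server_name xxx.duguying.net;

    resolver 127.0.0.1 valid=60s;    # DNS server to query; cache each answer for 60s
    resolver_timeout 3s;             # timeout for DNS resolution

    root /usr/share/nginx/html;
    index index.php index.html;
    ssl_certificate /data/certs/_.duguying.net.crt;
    ssl_certificate_key /data/certs/_.duguying.net.key;
    ssl_stapling on;
    ssl_stapling_verify on;
    add_header Strict-Transport-Security "max-age=31536000";

    location / {
        include git.deny;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host            $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # A variable in proxy_pass makes nginx resolve the name at request time
        # through the resolver above. $backend_host is a name chosen for this example.
        set $backend_host jxx.duguying.net;
        proxy_pass http://$backend_host:****;
        proxy_buffering    off;
        proxy_buffer_size  128k;
        proxy_buffers 100  128k;
    }
}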