記一次DNS問題排查

2023-06-29 15:00:30

一、問題:域名flow.nzkong.com解析很慢:

排查過程

抓包分析:tcpdump -i eth0 -n -s 500 port domain

 1 14:40:44.548553 IP 10.13.21.38.29551 > 10.13.255.1.domain: 18621+ A? flow.nzkong.com. (33)
 2 14:40:44.549297 IP 10.13.255.1.domain > 10.13.21.38.29551: 18621| 0/0/0 (33)
 3 14:40:44.549395 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [S], seq 472124968, win 64952, options [mss 1412,sackOK,TS val 217687079 ecr 0,nop,wscale 8], length 0
 4 14:40:44.549913 IP 10.13.255.1.domain > 10.13.21.38.51700: Flags [S.], seq 4192295067, ack 472124969, win 24160, options [mss 1220,sackOK,TS val 2624377172 ecr 217687079,nop,wscale 9], length 0
 5 14:40:44.549949 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [.], ack 1, win 254, options [nop,nop,TS val 217687079 ecr 2624377172], length 0
 6 14:40:44.550135 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [P.], seq 1:71, ack 1, win 254, options [nop,nop,TS val 217687080 ecr 2624377172], length 7018621+ A? flow.nzkong.com. (68)
 7 14:40:44.550260 IP 10.13.255.1.domain > 10.13.21.38.51700: Flags [.], ack 71, win 48, options [nop,nop,TS val 2624377173 ecr 217687080], length 0
 8 14:40:44.558209 IP 10.13.255.1.domain > 10.13.21.38.51700: Flags [P.], seq 1:52, ack 71, win 48, options [nop,nop,TS val 2624377180 ecr 217687080], length 5118621 1/0/0 A 106.75.178.212 (49)
 9 14:40:44.558224 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [.], ack 52, win 254, options [nop,nop,TS val 217687088 ecr 2624377180], length 0
10 14:40:44.791651 IP 10.13.255.1.domain > 10.13.21.38.51700: Flags [P.], seq 52:161, ack 71, win 48, options [nop,nop,TS val 2624377414 ecr 217687088], length 10939640 0/1/0 (107)
11 14:40:44.791698 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [.], ack 161, win 254, options [nop,nop,TS val 217687321 ecr 2624377414], length 0
12 14:40:44.792061 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [F.], seq 71, ack 161, win 254, options [nop,nop,TS val 217687321 ecr 2624377414], length 0
13 14:40:44.792766 IP 10.13.255.1.domain > 10.13.21.38.51700: Flags [F.], seq 161, ack 72, win 48, options [nop,nop,TS val 2624377415 ecr 217687321], length 0
14 14:40:44.792774 IP 10.13.21.38.51700 > 10.13.255.1.domain: Flags [.], ack 162, win 254, options [nop,nop,TS val 217687322 ecr 2624377415], length 0
15 
16 
17 
18 14:41:10.737255 IP 10.13.21.38.21895 > 10.13.255.1.domain: 35380+ A? flow.nzkong.com. (33)
19 14:41:10.737760 IP 10.13.255.1.domain > 10.13.21.38.21895: 35380 1/0/0 A 106.75.178.212 (49)
20 14:41:10.737848 IP 10.13.21.38.54115 > 10.13.255.1.domain: 19011+ AAAA? flow.nzkong.com. (33)
21 14:41:10.738131 IP 10.13.255.1.domain > 10.13.21.38.54115: 19011| 0/0/0 (33)
22 14:41:10.738300 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [S], seq 4213947779, win 64952, options [mss 1412,sackOK,TS val 217713268 ecr 0,nop,wscale 8], length 0
23 14:41:10.738668 IP 10.13.255.1.domain > 10.13.21.38.51704: Flags [S.], seq 1145570450, ack 4213947780, win 24160, options [mss 1220,sackOK,TS val 2624403361 ecr 217713268,nop,wscale 9], length 0
24 14:41:10.738693 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [.], ack 1, win 254, options [nop,nop,TS val 217713268 ecr 2624403361], length 0
25 14:41:10.738855 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [P.], seq 1:71, ack 1, win 254, options [nop,nop,TS val 217713268 ecr 2624403361], length 7035380+ A? flow.nzkong.com. (68)
26 14:41:10.738980 IP 10.13.255.1.domain > 10.13.21.38.51704: Flags [.], ack 71, win 48, options [nop,nop,TS val 2624403361 ecr 217713268], length 0
27 14:41:10.739073 IP 10.13.255.1.domain > 10.13.21.38.51704: Flags [P.], seq 1:52, ack 71, win 48, options [nop,nop,TS val 2624403361 ecr 217713268], length 5135380 1/0/0 A 106.75.178.212 (49)
28 14:41:10.739081 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [.], ack 52, win 254, options [nop,nop,TS val 217713269 ecr 2624403361], length 0
29 14:41:10.747067 IP 10.13.255.1.domain > 10.13.21.38.51704: Flags [P.], seq 52:161, ack 71, win 48, options [nop,nop,TS val 2624403369 ecr 217713269], length 10919011 0/1/0 (107)
30 14:41:10.747076 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [.], ack 161, win 254, options [nop,nop,TS val 217713277 ecr 2624403369], length 0
31 14:41:10.747175 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [F.], seq 71, ack 161, win 254, options [nop,nop,TS val 217713277 ecr 2624403369], length 0
32 14:41:10.747387 IP 10.13.255.1.domain > 10.13.21.38.51704: Flags [F.], seq 161, ack 72, win 48, options [nop,nop,TS val 2624403370 ecr 217713277], length 0
33 14:41:10.747394 IP 10.13.21.38.51704 > 10.13.255.1.domain: Flags [.], ack 162, win 254, options [nop,nop,TS val 217713277 ecr 2624403370], length 0
34 
35 
36 
37 14:41:20.125630 IP 10.13.21.38.32649 > 10.13.255.1.domain: 4265+ A? flow.nzkong.com. (33)
38 14:41:20.125851 IP 10.13.255.1.domain > 10.13.21.38.32649: 4265 1/0/0 A 106.75.178.212 (49)
39 14:41:20.125935 IP 10.13.21.38.28137 > 10.13.255.1.domain: 63155+ AAAA? flow.nzkong.com. (33)

 

  1.抓包分析發現其會解析該域名的AAAA記錄,然後卡住
  2.判斷是因為沒有ipv6的地址導致的原因
  3.考慮新增失敗快取

原因

  • 該域名沒有ipv6的地址,但是dns先會查詢ipv6,如果沒有的話再查ipv4,
  • ipv6查詢不到的話會有個超時以及重試時長,然後這個時長在我們的機器上預設差不多是2到3秒,總的來說就是快取的問題,失敗快取時長設定小了

解決方案    

  設定bind引數:min-ncache-ttl 600  # 設定否定答案的快取時長


二、每次ping域名rec.vnugc.com的第一次迴應都很慢,大於2s

排查過程

抓包分析:tcpdump -i eth0 -n -s 500 port domain

 1 18:19:59.737280 IP 10.35.170.87.38128 > 10.35.255.200.domain: 15552+ A? rec.vnugc.com. (31)
 2 18:19:59.737296 IP 10.35.170.87.38128 > 10.35.255.200.domain: 24775+ AAAA? rec.vnugc.com. (31)
 3 18:19:59.738138 IP 10.35.255.200.domain > 10.35.170.87.38128: 24775 0/1/0 (107)
 4 18:19:59.755957 IP 10.35.255.200.domain > 10.35.170.87.38128: 15552 7/0/0 CNAME cloudbase-sg.vnugc.com., A 118.194.235.163, A 118.194.233.238, A 118.194.235.140, A 118.194.233.192, A 118.194.234.167, A 118.194.235.150 (154)
 5 18:19:59.758176 IP 10.35.170.87.42698 > 10.35.255.200.domain: 21552+ PTR? 163.235.194.118.in-addr.arpa. (46)
 6 18:20:01.010025 IP 10.35.255.200.domain > 10.35.170.87.42698: 21552 ServFail 0/0/0 (46)
 7 18:20:01.010324 IP 10.35.170.87.58235 > 10.35.255.200.domain: 21552+ PTR? 163.235.194.118.in-addr.arpa. (46)
 8 18:20:01.010665 IP 10.35.255.200.domain > 10.35.170.87.58235: 21552 ServFail 0/0/0 (46)
 9 18:20:08.268323 IP 10.35.170.87.41102 > 10.35.255.200.domain: 28888+ A? rec.vnugc.com. (31)
10 18:20:08.268339 IP 10.35.170.87.41102 > 10.35.255.200.domain: 64223+ AAAA? rec.vnugc.com. (31)
11 18:20:08.268660 IP 10.35.255.200.domain > 10.35.170.87.41102: 28888 7/0/0 CNAME cloudbase-sg.vnugc.com., A 118.194.234.167, A 118.194.233.192, A 118.194.235.150, A 118.194.235.163, A 118.194.233.238, A 118.194.235.140 (154)
12 18:20:08.268703 IP 10.35.255.200.domain > 10.35.170.87.41102: 64223 1/1/0 CNAME cloudbase-sg.vnugc.com. (134)
13 18:20:08.270051 IP 10.35.170.87.39988 > 10.35.255.200.domain: 5211+ PTR? 167.234.194.118.in-addr.arpa. (46)
14 18:20:09.220012 IP 10.35.255.200.domain > 10.35.170.87.39988: 5211 ServFail 0/0/0 (46)
15 18:20:09.220288 IP 10.35.170.87.47297 > 10.35.255.200.domain: 5211+ PTR? 167.234.194.118.in-addr.arpa. (46)
16 18:20:09.220560 IP 10.35.255.200.domain > 10.35.170.87.47297: 5211 ServFail 0/0/0 (46)

  1.抓包分析發現DNS會進行反解
  2.發現該反解地址如果不存在,會導致servfail
  3.考慮設定servfail的快取時長

原因

  • ping的時候會進行dns解析,然後dns會對該地址進行反解,而該地址沒有PTR記錄,導致servFail,超時時間有點長,需要servFail的快取

解決方案

  設定bind引數:servfail-ttl 30 # 設定servfail的快取時長