Notes on a production bug: crontab unexpectedly emptied

2023-07-04 12:00:32

Problem overview

A colleague reported that the crontab on one of our servers had been emptied.

The cron log shows where things went wrong:

Jul  3 10:01:24 10-10-65-235 crontab[19333]: (root) REPLACE (root)

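The crontab was replaced at that moment, and no jobs ran afterwards.

We restored it from a backup as an emergency fix, then went through the logs to find any entries the backup was missing.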
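
The log check above can be sketched as a small helper. This is only a sketch: find_crontab_changes is a name made up for this example, and the log path in the commented call is an assumption (/var/log/cron is common on RHEL-family systems; other distributions write cron messages elsewhere). It matches REPLACE entries like the one above, plus DELETE entries, which is what crontab -r leaves behind.

```shell
#!/bin/sh
# find_crontab_changes LOGFILE
# Print log lines where the crontab command replaced or deleted a table.
# (Hypothetical helper name; adjust the pattern if your log format differs.)
find_crontab_changes() {
    grep -E 'crontab\[[0-9]+\]: \([^)]+\) (REPLACE|DELETE) \(' "$1"
}

# Example call; the log path is an assumption and varies by distribution.
# find_crontab_changes /var/log/cron
```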

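Troubleshooting

With the time of the failure pinned down, I searched the shell history for the command that had been run.

However, the history contained no crontab -r.

A Google search then turned up similar cases, caused by things like remote logins or a stray space (https://cloud.tencent.com/developer/article/2222953).

After reading that, I realized this one had to be my fault, and that two other machines were affected as well.

Reproducing the problem

I had run the command directly over ssh from a remote server. When it hung, I interrupted it with Ctrl+C, and the crontab was emptied.

Steps to reproduce:

  • Create a job (make sure the crontab has content and is not empty).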

    • ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:29:52 ─╮
      ╰─ crontab -l                                                                              ─╯
      # Minute   Hour   Day of Month       Month          Day of Week        who      Command
      # (0-59)  (0-23)     (1-31)    (1-12 or Jan-Dec)  (0-6 or Sun-Sat)
      
      20 17 * * 1-5 open "https://tengyun.qianxin-inc.cn/home/workspace/worklog"
      
      
      ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:29:54 ─╮
      ╰─                                                                                         ─╯
      
  • Over ssh, run a command that hangs, such as crontab or crontab -

    • ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:29:54 ─╮
      ╰─ ssh [email protected] crontab                                                           ─╯
      
      
  • The command hangs. At this point crontab -l still shows the job.

    • ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:30:59 ─╮
      ╰─ crontab -l                                                                              ─╯
      # Minute   Hour   Day of Month       Month          Day of Week        who      Command
      # (0-59)  (0-23)     (1-31)    (1-12 or Jan-Dec)  (0-6 or Sun-Sat)
      
      20 17 * * 1-5 open "https://tengyun.qianxin-inc.cn/home/workspace/worklog"
      
      ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:31:02 ─╮
      ╰─                                                                                         ─╯
      
      
  • Interrupt it with Ctrl+C, then check the crontab again.

    • ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:29:54 ─╮
      ╰─ ssh [email protected] crontab                                                           ─╯
      ^C%                                                                                           
      ╭─ ~/cmd ▓▒░··························································░▒▓ ✔  55s  10:31:30 ─╮
      ╰─ crontab -l                                                                              ─╯
      
      ╭─ ~/cmd ▓▒░·······························································░▒▓ ✔  10:31:33 ─╮
      ╰─                                                                                         ─╯
      

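The problem was reproduced.

Other tests

  • I then ran the same commands directly on the local machine, and none of them emptied the crontab. It is only emptied when the command goes through ssh.
  • Over ssh, the command ssh XXX@XXX bash -c "crontab -l" also hangs.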
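
The hang in the second test has a likely cause in how ssh handles its arguments: it joins the command arguments with spaces and hands the resulting string to the remote shell. The quotes are lost along the way, so the remote side runs bash -c crontab -l, which is a bare crontab with -l as $0. The sketch below imitates this with echo instead of crontab, so it is safe to run. fake_ssh is a hypothetical stand-in, not a real tool.

```shell
#!/bin/sh
# Stand-in for `ssh host "$@"`: ssh joins its command arguments with spaces
# and hands the resulting string to the remote user's shell.
# fake_ssh is a hypothetical helper for this demonstration only.
fake_ssh() {
    sh -c "$*"
}

# Intended: run `echo hello world` inside a nested shell.
# Actual: the nested shell receives `-c echo`, with "hello" as $0 and
# "world" as $1, so echo runs without arguments and prints an empty line.
unquoted=$(fake_ssh sh -c "echo hello world")
echo "unquoted: [$unquoted]"

# Quoting the whole remote command as one argument keeps it together.
quoted=$(fake_ssh 'sh -c "echo hello world"')
echo "quoted:   [$quoted]"
```

Passing the whole remote command as a single quoted argument, as in the second call, avoids the accidental bare crontab.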

Summary

I now understand which operation caused the problem, but I still have not worked out the underlying mechanism.
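
A possible explanation, not confirmed on the affected hosts: POSIX specifies that crontab with no file operand reads the new table from standard input. When the command runs over ssh without a terminal, Ctrl+C presumably kills only the local ssh client. The remote crontab then sees end-of-file on its stdin and, on implementations that behave this way, installs the empty input as the new table. Run locally, Ctrl+C delivers the interrupt to crontab itself, which exits before installing anything. The sketch below imitates the first case with a hypothetical fake_crontab function and a temporary file, so it never touches a real crontab.

```shell
#!/bin/sh
# Hypothetical stand-in for a bare `crontab`: read a new table from stdin
# and replace the stored one. The "table" here is just a temporary file.
table=$(mktemp)
echo '20 17 * * 1-5 echo hello' > "$table"

fake_crontab() {
    cat > "$table.new" && mv "$table.new" "$table"
}

before_size=$(wc -c < "$table" | tr -d ' ')

# /dev/null yields an immediate end-of-file, like the stdin of the remote
# command once the ssh client has gone away.
fake_crontab < /dev/null

after_size=$(wc -c < "$table" | tr -d ' ')
echo "table size before: $before_size bytes, after: $after_size bytes"

rm -f "$table"
```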

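How to avoid this:

  • Back up the crontab on a schedule.
  • If a command hangs, do not interrupt it right away: back up first, then stop it.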
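
Both points above come down to saving the output of crontab -l somewhere safe. A minimal sketch follows, assuming a backup directory of your choosing. backup_crontab and the file naming scheme are made up for this example. It refuses to keep an empty result, so a backup taken after the table is already gone does not look like a valid copy.

```shell
#!/bin/sh
# backup_crontab DIR
# Save the current user's crontab to a timestamped file in DIR and print
# the path of the backup. Returns non-zero if there is nothing to save.
backup_crontab() {
    dir=$1
    mkdir -p "$dir" || return 1
    out="$dir/crontab.$(date +%Y%m%d-%H%M%S).bak"
    # Write to a temporary file first so that a failed or empty
    # `crontab -l` never leaves a misleading backup behind.
    if crontab -l > "$out.tmp" 2>/dev/null && [ -s "$out.tmp" ]; then
        mv "$out.tmp" "$out"
        echo "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}

# Example call; the directory is an assumption.
# backup_crontab "$HOME/crontab-backups"
```

When a command is hung, the same function can be run from a second session before the hung one is interrupted.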

Another experience I will not forget.