Attempted to get executor loss reason for executor id 17 at RPC address 192.168.48.172:59070, but got no response. Marking as slave lost.
java.io.IOException: Failed to send RPC 9102760012410878153 to /192.168.48.172:59047: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237) ~[spark-network-common_2.11-2.2.0.jar:2.2.0]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
driver端顯示紀錄檔內容為RPC通訊錯誤,從而認為心跳超時,執行器被yarn殺掉,該問題有兩種解決思路
driver所在伺服器與executor所在伺服器之間的時間相差較多,相差1分鐘以上就應該及時修改時間了,究其根本原因也很簡單,兩臺伺服器時間相差過大,造成本來就1ms內完成的通訊,由於兩個java程序計算的時間戳不同,造成driver認為響應超時,目前看大部分文章給的解決方式都是第一種,直接加executor記憶體,未必能解決問題,我們大部分叢集都做了時鐘同步,為什麼還會造成時間相差很大呢,此時需要檢視伺服器是否開啟了chronyd,如果你使用的是ntp,chronyd會對ntp有干擾,可以關閉chronyd
關閉chronyd方法
systemctl disable chronyd
systemctl stop chronyd
systemctl enable ntpd
systemctl start ntpd