Trino on YARN

2023-08-29 15:17:39

I. Preface

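I have recently been looking into running Trino on YARN. Most of what is available online covers Presto on YARN, and there is very little material on Trino on YARN. The two are essentially the same, although a few things have to be modified. The main pain point is debugging, where Slider is not very convenient, so I am sharing what I did in practice.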

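If your Trino cluster has no need for elastic scaling, or you already have a mature Kubernetes deployment, you can skip this feature. The end result is that Slider deploys Trino automatically and adjusts the number of Trino nodes, so the cluster can be scaled out and in quickly. The resources consumed by the queries themselves have little to do with YARN; they still depend on the resources configured for the Trino cluster. When cluster resources are tight, this lets you shift the allocation between time periods. For example, at night, when there are few query requests, you can release some nodes to Flink or Spark for computation, which is quite practical.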
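As a rough illustration of the time-of-day scaling described above, the sketch below picks a worker count by hour and builds a command line for Slider's `flex` action. The application name `presto-query` and the component name `WORKER` are the ones used in the configuration in section III; the night-time window, the worker counts, and the helper names are assumptions made up for this example. The command is only printed, never executed.

```python
from datetime import datetime, time

# Hypothetical schedule: fewer workers at night so YARN can hand the
# freed containers to Flink or Spark jobs.
NIGHT_START = time(23, 0)
NIGHT_END = time(6, 0)
DAY_WORKERS = 3
NIGHT_WORKERS = 1


def desired_workers(now):
    """Return the worker count for a given time of day."""
    in_night = now >= NIGHT_START or now < NIGHT_END
    return NIGHT_WORKERS if in_night else DAY_WORKERS


def flex_command(app_name, workers, slider_bin="../bin/slider"):
    """Build (but do not run) a `slider flex` command line."""
    return [slider_bin, "flex", app_name, "--component", "WORKER", str(workers)]


if __name__ == "__main__":
    count = desired_workers(datetime.now().time())
    print(" ".join(flex_command("presto-query", count)))
```

Run from cron or any scheduler, a script like this would only need the printed command replaced by an actual invocation once the values suit your cluster.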

II. Environment Preparation

Build apache-slider-0.92.0-incubating

    1. Download: https://archive.apache.org/dist/incubator/slider/
    2. Modify PythonExecutor.py, otherwise running the Python scripts fails. See: https://issues.apache.org/jira/browse/SLIDER-1254

  def python_command(self, script, script_params):
    #we need manually pass python executable on windows because sys.executable will return service wrapper
    python_binary = os.environ['PYTHON_EXE'] if 'PYTHON_EXE' in os.environ else sys.executable
    python_command = [python_binary, "-S", script] + script_params

    #if Python binary location is not found then fall back to generic Python path
    if not python_binary:
      logger.warn("Python binary not found in this environment. Using /usr/bin/python")
      python_binary = "/usr/bin/python"
      python_command = [python_binary, script] + script_params
    return python_command
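To see what the patched method does without a running Slider agent, here is a standalone sketch of the same selection logic. The function name `build_python_command` and its `env` and `executable` parameters are introduced only so the behavior can be exercised in isolation; they are not part of Slider.

```python
import sys


def build_python_command(script, script_params, env, executable=sys.executable):
    """Mirror the interpreter selection of the patched python_command()."""
    # Prefer an explicit PYTHON_EXE, otherwise the running interpreter.
    python_binary = env["PYTHON_EXE"] if "PYTHON_EXE" in env else executable
    if not python_binary:
        # Neither source gave a usable path: fall back to the generic
        # location and drop the -S flag, as the patched method does.
        return ["/usr/bin/python", script] + list(script_params)
    return [python_binary, "-S", script] + list(script_params)


if __name__ == "__main__":
    # Simulate an environment where no interpreter path is available.
    print(build_python_command("presto_worker.py", ["INSTALL"], {}, executable=""))
```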

  

Build trino-yarn

    1. GitHub repository: https://github.com/prestodb/presto-yarn.git

    2. Modify the pom file in the root directory

    3. Modify the dependencies in the presto-yarn-package pom file

 

 

III. Deployment and Installation

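   1. Once the build finishes, copy the two files to the server and configure appConfig-default.json and resources-default.json from trino-yarn-package. Anyone who knows Trino or Presto will recognize most of these settings. My configuration, for reference (appConfig-default.json first, then resources-default.json):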

{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "site.global.app_user": "presto",
    "site.global.user_group": "presto",
    "site.global.data_dir": "/data/trino/data",
    "site.global.config_dir": "/data/trino/etc",
    "site.global.app_name": "trino-server-418",
    "site.global.app_pkg_plugin": "${AGENT_WORK_ROOT}/app/definition/package/plugins/",
    "site.global.singlenode": "true",
    "site.global.coordinator_host": "192.168.2.182",
    "site.global.presto_query_max_memory": "27GB",
    "site.global.presto_query_max_memory_per_node": "4GB",
    "site.global.presto_query_max_total_memory_per_node": "9GB",
    "site.global.presto_server_port": "8089",
    "site.global.catalog": "{'tpch': ['connector.name=system']}",
    "site.global.jvm_args": "['-server', '-Xmx50G', '-XX:InitialRAMPercentage=80', '-XX:MaxRAMPercentage=80', '-XX:G1HeapRegionSize=32M', '-XX:+ExplicitGCInvokesConcurrent', '-XX:+ExitOnOutOfMemoryError', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:-OmitStackTraceInFastThrow', '-XX:ReservedCodeCacheSize=512M', '-XX:PerMethodRecompilationCutoff=10000', '-XX:PerBytecodeRecompilationCutoff=10000', '-Djdk.attach.allowAttachSelf=true', '-Djdk.nio.maxCachedBufferSize=2000000', '-XX:+UnlockDiagnosticVMOptions', '-XX:+UseAESCTRIntrinsics', '-XX:+UseG1GC']",
          
    "site.global.log_properties": "['io.trino=INFO']",
    "application.def": ".slider/package/trino/trino-yarn.zip",
    "java_home": "/home/presto/presto/zulu17.42.21-ca-crac-jdk17.0.7-linux_x64/bin/java"
  },    
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "128M"
    }
  } 
}

resources-default.json:

{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "yarn.vcores": "1"
  },
  "components": {
    "slider-appmaster": {
    },
    "WORKER": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "3",
      "yarn.component.placement.policy": "1",
      "yarn.memory": "1500"
    }
  }
}

      For details, see: https://prestodb.io/presto-yarn/installation-yarn-configuration-options.html#appconfig-json
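The `site.global.catalog` value is a Python-style dictionary stored inside a JSON string, mapping each catalog name to the lines of its properties file. The sketch below shows one way such a value could be expanded into per-catalog file contents. It assumes the value is parsed as a Python literal, and `catalog_files` is a hypothetical helper written for this post, not code taken from the package.

```python
import ast
import json


def catalog_files(app_config_text):
    """Map '<catalog>.properties' to file content from an appConfig JSON string."""
    config = json.loads(app_config_text)
    raw = config["global"]["site.global.catalog"]
    # The value uses single quotes, so it is a Python literal, not JSON.
    catalogs = ast.literal_eval(raw)
    return {
        name + ".properties": "\n".join(lines) + "\n"
        for name, lines in catalogs.items()
    }


if __name__ == "__main__":
    sample = json.dumps(
        {"global": {"site.global.catalog": "{'tpch': ['connector.name=system']}"}}
    )
    for filename, content in catalog_files(sample).items():
        print(filename)
        print(content, end="")
```

Checking the expanded output this way before launching can catch quoting mistakes in the catalog string, which are otherwise only visible in the container logs.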

   2. Start Slider

  ../bin/slider package --install --name trino --package trino-yarn.zip --replacepkg
  ../bin/slider create presto-query --template appConfig-default.json --resources resources-default.json

          Screenshot after a successful launch

 

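        For more detail, see this very thorough post on qq_2368521029's CSDN blog: "PrestoOnYarn搭建及其問題解決方案總結" (PrestoOnYarn setup and troubleshooting summary). I mainly cover my debugging notes here, since there is less material on that.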

 

IV. Debugging and Troubleshooting

   After deploying to YARN you will run into plenty of problems, and debugging them is somewhat awkward. Here is my debugging approach, for reference.

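   What the program actually does is dynamically distribute the trino-server files inside the presto-yarn package and generate Trino's configuration files automatically; Slider itself is a general-purpose framework for executing commands. From the logs we can see the actual working directory and the exact Python script command that gets executed.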

 

               Comment out the statement in Slider that executes this step, so that the program runs idle and the script is never actually executed.

 

    Deploying one worker node takes just three steps: INSTALL ---> START ---> STATUS

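    Using the commands printed in the log, switch to the AGENT_WORK_ROOT directory manually and run the script by hand. You can then debug against the real errors. The arguments are copied straight from the log output. An example: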

 

[root@gpmaster scripts]# export PYTHONPATH=/opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/filecache/10/slider-agent.tar.gz/slider-agent/jinja2:/opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/filecache/10/slider-agent.tar.gz/slider-agent
[root@gpmaster scripts]# python presto_worker.py INSTALL /opt/softinstall/hadoop-3.2.3/logs/userlogs/application_1692453841261_0117/container_1692453841261_0117_01_000002/command-1.json /opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/filecache/11/trino-yarn.zip/package  /opt/softinstall/hadoop-3.2.3/logs/userlogs/application_1692453841261_0117/container_1692453841261_0117_01_000002/structured-out-1.json INFO /opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/container_1692453841261_0117_01_000002
2023-08-28 17:04:39,453 - Directory['/opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/container_1692453841261_0117_01_000002/app/install'] {'action': ['delete']}
[root@gpmaster scripts]# python presto_worker.py START /opt/softinstall/hadoop-3.2.3/logs/userlogs/application_1692453841261_0117/container_1692453841261_0117_01_000002/command-1.json /opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/filecache/11/trino-yarn.zip/package  /opt/softinstall/hadoop-3.2.3/logs/userlogs/application_1692453841261_0117/container_1692453841261_0117_01_000002/structured-out-1.json INFO /opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/container_1692453841261_0117_01_000002
2023-08-28 17:04:39,453 - Directory['/opt/softinstall/hadoop-3.2.3/data/tmp/nm-local-dir/usercache/root/appcache/application_1692453841261_0117/container_1692453841261_0117_01_000002/app/install'] {'action': ['delete']}
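The manual commands above follow a fixed pattern, so they can be assembled from a handful of paths copied out of the log. The sketch below builds the `PYTHONPATH` value and the argument list for one step. The argument order is taken from the transcript above, while the function name, the parameter names, and the shortened example paths are placeholders for illustration. Reusing the same command and output files for all three steps in the demo loop is a simplification.

```python
import os


def debug_invocation(command, agent_dir, command_json, package_dir,
                     structured_out, work_root, script="presto_worker.py",
                     log_level="INFO"):
    """Return (env, argv) for re-running one Slider agent step by hand."""
    env = dict(os.environ)
    # Same order as in the transcript: bundled jinja2 first, then the agent.
    env["PYTHONPATH"] = ":".join([os.path.join(agent_dir, "jinja2"), agent_dir])
    argv = ["python", script, command, command_json, package_dir,
            structured_out, log_level, work_root]
    return env, argv


if __name__ == "__main__":
    # Placeholder paths: copy the real ones from your own container log.
    for step in ("INSTALL", "START", "STATUS"):
        env, argv = debug_invocation(
            step,
            agent_dir="/path/to/slider-agent",
            command_json="/path/to/command-1.json",
            package_dir="/path/to/trino-yarn.zip/package",
            structured_out="/path/to/structured-out-1.json",
            work_root="/path/to/container",
        )
        print("PYTHONPATH=" + env["PYTHONPATH"], " ".join(argv))
        # To really run a step, pass argv and env to subprocess.call()
        # from inside the package's scripts directory.
```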

 

 

V. Resolving Conflicts Between Multiple Nodes on One Machine

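    1. File conflicts: generate Trino's etc (configuration) and data directories under AGENT_WORK_ROOT. This resolves the clash over these two directories when several nodes are scheduled onto the same machine.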

The main change is in params.py.

 

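    2. Port conflicts: add a random port setting. Comment out http-server.http.port={{presto_work_port}} in the config.properties-WORKER.j2 template and write a randomly chosen port into the configuration instead.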

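     For a concrete implementation, see "實現trino on yarn排程到同一機器上多範例埠衝突問題處理" (handling port conflicts when multiple Trino on YARN instances are scheduled onto the same machine) on qq_2368521029's CSDN blog.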
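Both fixes come down to deriving per-container values instead of fixed ones. The sketch below shows the idea in isolation: directories rooted at the container's work directory, and a port obtained by asking the operating system for a free one. The sub-directory names and function names are assumptions for illustration, not the actual contents of params.py or the template. Note that a port found this way can in principle be taken by another process before Trino binds it.

```python
import os
import socket


def container_dirs(agent_work_root):
    """Place etc and data under the container's own work directory."""
    return {
        "config_dir": os.path.join(agent_work_root, "trino", "etc"),
        "data_dir": os.path.join(agent_work_root, "trino", "data"),
    }


def pick_free_port():
    """Ask the OS for a currently unused TCP port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("", 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]
    finally:
        sock.close()


def http_port_line():
    """Render the line that replaces the fixed port in config.properties."""
    return "http-server.http.port=%d" % pick_free_port()


if __name__ == "__main__":
    # Placeholder work root; in a container this would be AGENT_WORK_ROOT.
    print(container_dirs("/path/to/container"))
    print(http_port_line())
```

Because each YARN container gets its own work directory, two workers on the same host end up with distinct etc and data paths without any extra coordination.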

 

VI. Packaging and Deploying a Custom Trino Server Build

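  We have modified some of Trino's functionality, so the package produced by a normal build does not suit our environment. We need to package our own Trino build instead.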

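    1. In a normally built package, the trino-yarn/package/files directory holds the trino-server archive. Remove the etc and data directories from our own Trino build, package it, and use it to replace that archive.

    2. Modify params.py, configure.py, and the config.properties-WORKER.j2 template so that they generate the configuration you need.

    3. Repackage and upload it again to the designated HDFS directory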

      ../bin/slider package --install --name trino --package trino-yarn.zip --replacepkg

    4. Specify the JDK. At the moment we package the JDK directly alongside the trino-server directory and adjust the start command accordingly.
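Step 1 of the list above, stripping etc and data from a custom build before it goes into the package, can be scripted. The sketch below copies a trino-server tarball while skipping those two sub-directories. It assumes the archive is a .tar.gz with a single top-level directory; the function names are made up for this example.

```python
import tarfile

EXCLUDED = ("etc", "data")


def is_excluded(member_name):
    """True for <top>/etc, <top>/data and anything beneath them."""
    parts = [p for p in member_name.split("/") if p not in ("", ".")]
    return len(parts) >= 2 and parts[1] in EXCLUDED


def repack_without_state(src_path, dst_path):
    """Copy a .tar.gz, leaving out the etc and data directories."""
    kept = []
    with tarfile.open(src_path, "r:gz") as src, \
            tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            if is_excluded(member.name):
                continue
            # Regular files need their data stream; other entries do not.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
            kept.append(member.name)
    return kept
```

Calling `repack_without_state(source, target)` with your own archive paths returns the list of entries that were kept, which is a quick way to confirm that nothing under etc or data slipped into the package.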