gRPC(Java) keepAlive機制研究

2022-11-18 15:00:18

基於java gRPC 1.24.2 分析

結論

  1. gRPC keepAlive是grpc框架在應用層面連線保活的一種措施。即當grpc連線上沒有業務資料時,是否傳送pingpong,以保持連線活躍性,不因長時間空閒而被Server或作業系統關閉
  2. gRPC keepAlive在client與server都有,client端預設關閉(keepAliveTime為Long.MAX_VALUE), server端預設開啟,keepAliveTime為2小時,即每2小時向client傳送一次ping
// io.grpc.internal.GrpcUtil
public static final long DEFAULT_SERVER_KEEPALIVE_TIME_NANOS = TimeUnit.HOURS.toNanos(2L);
  1. KeepAlive的管理使用類io.grpc.internal.KeepAliveManager, 用於管理KeepAlive狀態,ping任務排程與執行.

Client端KeepAlive

使用入口

  1. 我們在使用io.grpc框架建立grpc連線的時候,可以設定keeplive, 例如下面:
NettyChannelBuilder builder = NettyChannelBuilder.forTarget(String.format("grpc://%s", provider)) //
      .usePlaintext() //
      .defaultLoadBalancingPolicy(props.getBalancePolicy()) //
      .maxInboundMessageSize(props.getMaxInboundMessageSize()) //
      .keepAliveTime(1,TimeUnit.MINUTES)
      .keepAliveWithoutCalls(true)
      .keepAliveTimeout(10,TimeUnit.SECONDS)
      .intercept(channelManager.getInterceptors()); //
  1. 其中與keepAlive相關的引數有三個,keepAliveTime,keepAliveTimeout,keepAliveWithoutCalls。這三個變數有什麼作用呢?
  • keepAliveTime: 表示當grpc連線沒有資料傳遞時,多久之後開始向server傳送ping packet
  • keepAliveTimeout: 表示當傳送完ping packet後多久沒收到server迴應算超時
  • keepAliveTimeoutCalls: 表示如果grpc連線沒有資料傳遞時,是否keepAlive,預設為false

簡要時序列表

Create & Start

NettyChannelBuilder
   -----> NettyTransportFactory
   ---------> NettyClientTransport
   -------------> KeepAliveManager & NettyClientHandler

響應各種事件
當Active、Idle、DataReceived、Started、Termination事件發生時,更改KeepAlive狀態,排程傳送ping任務。

Server端KeepAlive

使用入口

// 只擷取關鍵程式碼,詳細程式碼請看`NettyServerBuilder`
ServerImpl server = new ServerImpl(
    this,
    buildTransportServers(getTracerFactories()),
    Context.ROOT);
for (InternalNotifyOnServerBuild notifyTarget : notifyOnBuildList) {
  notifyTarget.notifyOnBuild(server);
}
return server;

// 在buildTransportServers方法中建立NettyServer
List<NettyServer> transportServers = new ArrayList<>(listenAddresses.size());
for (SocketAddress listenAddress : listenAddresses) {
  NettyServer transportServer = new NettyServer(
      listenAddress, resolvedChannelType, channelOptions, bossEventLoopGroupPool,
      workerEventLoopGroupPool, negotiator, streamTracerFactories,
      getTransportTracerFactory(), maxConcurrentCallsPerConnection, flowControlWindow,
      maxMessageSize, maxHeaderListSize, keepAliveTimeInNanos, keepAliveTimeoutInNanos,
      maxConnectionIdleInNanos, maxConnectionAgeInNanos, maxConnectionAgeGraceInNanos,
      permitKeepAliveWithoutCalls, permitKeepAliveTimeInNanos, getChannelz());
  transportServers.add(transportServer);
}

簡要時序列表

Create & Start

NettyServerBuilder
    ---> NettyServer
    ---------> NettyServerTransport
    -------------> NettyServerHandler
    -----------------> KeepAliveEnforcer

連線準備就緒
呼叫 io.netty.channel.ChannelHandler的handlerAdded方法,關於此方法的描述:

Gets called after the ChannelHandler was added to the actual context and it's ready to handle events.
NettyServerHandler(handlerAdded)
   ---> 建立KeepAliveManager物件

響應各種事件
同Client

KeepAliveEnforcer

在上面Server端的簡要時序圖中,可以看見,server端有一個特有的io.grpc.netty.KeepAliveEnforcer
此類的作用是監控clinet ping的頻率,以確保其在一個合理範圍內。

package io.grpc.netty;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import java.util.concurrent.TimeUnit;
import javax.annotation.CheckReturnValue;

/** Monitors the client's PING usage to make sure the rate is permitted. */
class KeepAliveEnforcer {
  @VisibleForTesting
  static final int MAX_PING_STRIKES = 2;
  @VisibleForTesting
  static final long IMPLICIT_PERMIT_TIME_NANOS = TimeUnit.HOURS.toNanos(2);

  private final boolean permitWithoutCalls;
  private final long minTimeNanos;
  private final Ticker ticker;
  private final long epoch;

  private long lastValidPingTime;
  private boolean hasOutstandingCalls;
  private int pingStrikes;

  public KeepAliveEnforcer(boolean permitWithoutCalls, long minTime, TimeUnit unit) {
    this(permitWithoutCalls, minTime, unit, SystemTicker.INSTANCE);
  }

  @VisibleForTesting
  KeepAliveEnforcer(boolean permitWithoutCalls, long minTime, TimeUnit unit, Ticker ticker) {
    Preconditions.checkArgument(minTime >= 0, "minTime must be non-negative");

    this.permitWithoutCalls = permitWithoutCalls;
    this.minTimeNanos = Math.min(unit.toNanos(minTime), IMPLICIT_PERMIT_TIME_NANOS);
    this.ticker = ticker;
    this.epoch = ticker.nanoTime();
    lastValidPingTime = epoch;
  }

  /** Returns {@code false} when client is misbehaving and should be disconnected. */
  @CheckReturnValue
  public boolean pingAcceptable() {
    long now = ticker.nanoTime();
    boolean valid;
    if (!hasOutstandingCalls && !permitWithoutCalls) {
      valid = compareNanos(lastValidPingTime + IMPLICIT_PERMIT_TIME_NANOS, now) <= 0;
    } else {
      valid = compareNanos(lastValidPingTime + minTimeNanos, now) <= 0;
    }
    if (!valid) {
      pingStrikes++;
      return !(pingStrikes > MAX_PING_STRIKES);
    } else {
      lastValidPingTime = now;
      return true;
    }
  }

  /**
   * Reset any counters because PINGs are allowed in response to something sent. Typically called
   * when sending HEADERS and DATA frames.
   */
  public void resetCounters() {
    lastValidPingTime = epoch;
    pingStrikes = 0;
  }

  /** There are outstanding RPCs on the transport. */
  public void onTransportActive() {
    hasOutstandingCalls = true;
  }

  /** There are no outstanding RPCs on the transport. */
  public void onTransportIdle() {
    hasOutstandingCalls = false;
  }

  /**
   * Positive when time1 is greater; negative when time2 is greater; 0 when equal. It is important
   * to use something like this instead of directly comparing nano times. See {@link
   * System#nanoTime}.
   */
  private static long compareNanos(long time1, long time2) {
    // Possibility of overflow/underflow is on purpose and necessary for correctness
    return time1 - time2;
  }

  @VisibleForTesting
  interface Ticker {
    long nanoTime();
  }

  @VisibleForTesting
  static class SystemTicker implements Ticker {
    public static final SystemTicker INSTANCE = new SystemTicker();

    @Override
    public long nanoTime() {
      return System.nanoTime();
    }
  }
}

  1. 先來看pingAcceptable方法,此方法是判斷是否接受client ping。
  • lastValidPingTime是上次client valid ping的時間, 連線建立時此時間等於KeepAliveEnforcer物件建立的時間。當client ping有效時,其等於當時ping的時間
  • hasOutstandingCalls其初始值為false,當連線activie時,其值為true,當連線idle時,其值為false。如果grpc呼叫為阻塞時呼叫,則呼叫時連線變為active,呼叫完成,連線變為idle.
  • permitWithoutCalls其值是建立NettyServer時傳入,預設為false.
  • IMPLICIT_PERMIT_TIME_NANOS其值為常數,2h
  • minTimeNanos其值是建立NettyServer時傳入,預設為5min.
  • MAX_PING_STRIKES其值為常數2
  1. resetCounters方法是當grpc當中有資料時會被呼叫,即有grpc呼叫時lastValidPingTime和pingStrikes會被重置。
  2. 如果client要想使用keepAlive,permitWithoutCalls值需要設定為true,而且cient keepAliveTime需要>=minTimeNanos