聊聊 RocketMQ 訊息軌跡

2023-11-03 21:03:05

這篇文章,我們聊一聊 RocketMQ 的訊息軌跡設計思路。

查詢訊息軌跡可作為生產環境中排查問題強有力的資料支援 ,也是研發同學解決線上問題的重要武器之一。

1 基礎概念

訊息軌跡是指一條訊息從生產者傳送到 Broker , 再到消費者消費,整個過程中的各個相關節點的時間、狀態等資料匯聚而成的完整鏈路資訊。

當我們需要查詢訊息軌跡時,需要明白一點:訊息軌跡資料是儲存在 Broker 伺服器端,我們需要定義一個主題,在生產者,消費者端定義軌跡勾點

2 開啟軌跡

2.1 修改 Broker 組態檔

# 開啟訊息軌跡
traceTopicEnable=true

2.2 生產者設定

public DefaultMQProducer(final String producerGroup, boolean enableMsgTrace) 

public DefaultMQProducer(final String producerGroup, boolean enableMsgTrace, final String customizedTraceTopic) 

在生產者的建構函式裡,有兩個核心引數:

  • enableMsgTrace:是否開啟訊息軌跡
  • customizedTraceTopic:記錄訊息軌跡的 Topic , 預設是: RMQ_SYS_TRACE_TOPIC

執行如下的生產者程式碼:

public class Producer {
    public static final String PRODUCER_GROUP = "mytestGroup";
    public static final String DEFAULT_NAMESRVADDR = "127.0.0.1:9876";
    public static final String TOPIC = "example";
    public static final String TAG = "TagA";

    public static void main(String[] args) throws MQClientException, InterruptedException {
        DefaultMQProducer producer = new DefaultMQProducer(PRODUCER_GROUP, true);
        producer.setNamesrvAddr(DEFAULT_NAMESRVADDR);
        producer.start();
        try {
            String key = UUID.randomUUID().toString();
            System.out.println(key);
            Message msg = new Message(
                    TOPIC,
                    TAG,
                    key,
                    ("Hello RocketMQ ").getBytes(RemotingHelper.DEFAULT_CHARSET));
            SendResult sendResult = producer.send(msg);
            System.out.printf("%s%n", sendResult);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // 這裡休眠十秒,是為了非同步傳送軌跡訊息成功。
        Thread.sleep(10000);
        producer.shutdown();
    }
}

在生產者程式碼中,我們指定了訊息的 key 屬性, 便於對於訊息進行高效能檢索。

執行成功之後,我們從控制檯檢視軌跡資訊。

從圖中可以看到,訊息軌跡中儲存了訊息的 儲存時間 儲存伺服器IP傳送耗時

2.3 消費者設定

和生產者類似,消費者的建構函式可以傳遞軌跡引數:

public DefaultMQPushConsumer(final String consumerGroup, boolean enableMsgTrace);

public DefaultMQPushConsumer(final String consumerGroup, boolean enableMsgTrace, final String customizedTraceTopic);

執行如下的消費者程式碼:

public class Consumer {
    public static final String CONSUMER_GROUP = "exampleGruop";
    public static final String DEFAULT_NAMESRVADDR = "127.0.0.1:9876";
    public static final String TOPIC = "example";

    public static void main(String[] args) throws InterruptedException, MQClientException {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer(CONSUMER_GROUP , true);
        consumer.setNamesrvAddr(DEFAULT_NAMESRVADDR);
        consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_FIRST_OFFSET);
        consumer.subscribe(TOPIC, "*");
        consumer.registerMessageListener((MessageListenerConcurrently) (msg, context) -> {
            System.out.printf("%s Receive New Messages: %s %n", Thread.currentThread().getName(), msg);
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        });
        consumer.start();
        System.out.printf("Consumer Started.%n");
    }
}

3 實現原理

軌跡的實現原理主要是在生產者傳送、消費者消費時新增相關的勾點。 因此,我們只需要瞭解勾點的實現邏輯即可。

下面的程式碼是 DefaultMQProducer 的建構函式。

public DefaultMQProducer(final String namespace, final String producerGroup, RPCHook rpcHook,
    boolean enableMsgTrace, final String customizedTraceTopic) {
    this.namespace = namespace;
    this.producerGroup = producerGroup;
    defaultMQProducerImpl = new DefaultMQProducerImpl(this, rpcHook);
    // if client open the message trace feature
    if (enableMsgTrace) {
        try {
            //非同步軌跡分發器
            AsyncTraceDispatcher dispatcher = new AsyncTraceDispatcher(producerGroup, TraceDispatcher.Type.PRODUCE, customizedTraceTopic, rpcHook);
            dispatcher.setHostProducer(this.defaultMQProducerImpl);
            traceDispatcher = dispatcher;
            // 傳送訊息時新增執行勾點
            this.defaultMQProducerImpl.registerSendMessageHook(
                new SendMessageTraceHookImpl(traceDispatcher));
            // 結束事務時新增執行勾點
            this.defaultMQProducerImpl.registerEndTransactionHook(
                new EndTransactionTraceHookImpl(traceDispatcher));
        } catch (Throwable e) {
            log.error("system mqtrace hook init failed ,maybe can't send msg trace data");
        }
    }
}

當是否開啟軌跡開關開啟時,建立非同步軌跡分發器 AsyncTraceDispatcher ,然後給預設的生產者實現類在傳送訊息的勾點 SendMessageTraceHookImpl

//傳送訊息時新增執行勾點
this.defaultMQProducerImpl.registerSendMessageHook(new SendMessageTraceHookImpl(traceDispatcher));

我們把生產者傳送訊息的流程簡化如下程式碼 :

//DefaultMQProducerImpl#sendKernelImpl
this.executeSendMessageHookBefore(context);
// 發生訊息
this.mQClientFactory.getMQClientAPIImpl().sendMessage(....)
// 生產者傳送訊息後會執行
this.executeSendMessageHookAfter(context);

進入SendMessageTraceHookImpl 類 ,該類主要有兩個方法 sendMessageBefore sendMessageAfter

1、sendMessageBefore 方法

public void sendMessageBefore(SendMessageContext context) {
    //if it is message trace data,then it doesn't recorded
    if (context == null || context.getMessage().getTopic().startsWith(((AsyncTraceDispatcher)  localDispatcher).getTraceTopicName())) {
        return;
    }
    //build the context content of TuxeTraceContext
    TraceContext tuxeContext = new TraceContext();
    tuxeContext.setTraceBeans(new ArrayList<TraceBean>(1));
    context.setMqTraceContext(tuxeContext);
    tuxeContext.setTraceType(TraceType.Pub);
    tuxeContext.setGroupName(NamespaceUtil.withoutNamespace(context.getProducerGroup()));
    //build the data bean object of message trace
    TraceBean traceBean = new TraceBean();
    traceBean.setTopic(NamespaceUtil.withoutNamespace(context.getMessage().getTopic()));
    traceBean.setTags(context.getMessage().getTags());
    traceBean.setKeys(context.getMessage().getKeys());
    traceBean.setStoreHost(context.getBrokerAddr());
    traceBean.setBodyLength(context.getMessage().getBody().length);
    traceBean.setMsgType(context.getMsgType());
    tuxeContext.getTraceBeans().add(traceBean);
}

傳送訊息之前,先收集訊息的 topic 、tag、key 、儲存 Broker 的 IP 地址、訊息體的長度等基礎資訊,並將訊息軌跡資料儲存在呼叫上下文中。

2、sendMessageAfter 方法

public void sendMessageAfter(SendMessageContext context) {
    // ...省略部分程式碼 
    TraceContext tuxeContext = (TraceContext) context.getMqTraceContext();
    TraceBean traceBean = tuxeContext.getTraceBeans().get(0);
    int costTime = (int) ((System.currentTimeMillis() - tuxeContext.getTimeStamp()) / tuxeContext.getTraceBeans().size());
    tuxeContext.setCostTime(costTime);
    if (context.getSendResult().getSendStatus().equals(SendStatus.SEND_OK)) {
        tuxeContext.setSuccess(true);
    } else {
        tuxeContext.setSuccess(false);
    }
    tuxeContext.setRegionId(context.getSendResult().getRegionId());
    traceBean.setMsgId(context.getSendResult().getMsgId());
    traceBean.setOffsetMsgId(context.getSendResult().getOffsetMsgId());
    traceBean.setStoreTime(tuxeContext.getTimeStamp() + costTime / 2);
    localDispatcher.append(tuxeContext);
}

跟蹤物件裡會儲存 costTime (訊息傳送時間)、success (是否傳送成功)、regionId (傳送到 Broker 所在的分割區) 、 msgId (訊息 ID,全域性唯一)、offsetMsgId (訊息物理偏移量) ,storeTime (儲存時間 ) 。

儲存時間並沒有取訊息的實際儲存時間,而是估算出來的:使用者端傳送時間的一般的耗時表示訊息的儲存時間。

最後將跟蹤上下文新增到本地軌跡分發器:

localDispatcher.append(tuxeContext);

下面我們分析下軌跡分發器的原理:

public AsyncTraceDispatcher(String group, Type type, String traceTopicName, RPCHook rpcHook) {
    // 省略程式碼 ....   
    this.traceContextQueue = new ArrayBlockingQueue<TraceContext>(1024);
    this.appenderQueue = new ArrayBlockingQueue<Runnable>(queueSize);
    if (!UtilAll.isBlank(traceTopicName)) {
        this.traceTopicName = traceTopicName;
    } else {
        this.traceTopicName = TopicValidator.RMQ_SYS_TRACE_TOPIC;
    }
    this.traceExecutor = new ThreadPoolExecutor(//
            10, 
            20, 
            1000 * 60, 
            TimeUnit.MILLISECONDS, 
            this.appenderQueue, 
            new ThreadFactoryImpl("MQTraceSendThread_"));
    traceProducer = getAndCreateTraceProducer(rpcHook);
}
public void start(String nameSrvAddr, AccessChannel accessChannel) throws MQClientException {
        if (isStarted.compareAndSet(false, true)) {
            traceProducer.setNamesrvAddr(nameSrvAddr);
            traceProducer.setInstanceName(TRACE_INSTANCE_NAME + "_" + nameSrvAddr);
            traceProducer.start();
        }
        this.accessChannel = accessChannel;
        this.worker = new Thread(new AsyncRunnable(), "MQ-AsyncTraceDispatcher-Thread-" + dispatcherId);
        this.worker.setDaemon(true);
        this.worker.start();
        this.registerShutDownHook();
}

上面的程式碼展示了分發器的建構函式和啟動方法,建構函式建立了一個傳送訊息的執行緒池 traceExecutor ,啟動 start 後會啟動一個 worker執行緒

class AsyncRunnable implements Runnable {
    private boolean stopped;
    @Override
    public void run() {
        while (!stopped) {
            synchronized (traceContextQueue) {
                long endTime = System.currentTimeMillis() + pollingTimeMil;
                while (System.currentTimeMillis() < endTime) {
                    try {
                        TraceContext traceContext = traceContextQueue.poll(
                                endTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS
                        );
                        if (traceContext != null && !traceContext.getTraceBeans().isEmpty()) {
                            // get the topic which the trace message will send to
                            String traceTopicName = this.getTraceTopicName(traceContext.getRegionId());

                            // get the traceDataSegment which will save this trace message, create if null
                            TraceDataSegment traceDataSegment = taskQueueByTopic.get(traceTopicName);
                            if (traceDataSegment == null) {
                                traceDataSegment = new TraceDataSegment(traceTopicName, traceContext.getRegionId());
                                taskQueueByTopic.put(traceTopicName, traceDataSegment);
                            }

                            // encode traceContext and save it into traceDataSegment
                            // NOTE if data size in traceDataSegment more than maxMsgSize,
                            //  a AsyncDataSendTask will be created and submitted
                            TraceTransferBean traceTransferBean = TraceDataEncoder.encoderFromContextBean(traceContext);
                            traceDataSegment.addTraceTransferBean(traceTransferBean);
                        }
                    } catch (InterruptedException ignore) {
                        log.debug("traceContextQueue#poll exception");
                    }
                }
                // NOTE send the data in traceDataSegment which the first TraceTransferBean
                //  is longer than waitTimeThreshold
                sendDataByTimeThreshold();
                if (AsyncTraceDispatcher.this.stopped) {
                    this.stopped = true;
                }
            }
        }
  }

worker 啟動後,會從軌跡上下文佇列 traceContextQueue 中不斷的取出軌跡上下文,並將上下文轉換成軌跡資料片段 TraceDataSegment

為了提升系統的效能,並不是每一次從佇列中獲取到資料就直接傳送到 MQ ,而是積累到一定程度的臨界點才觸發這個操作,我們可以簡單的理解為批次操作

這裡面有兩個維度 :

  1. 軌跡資料片段的資料大小大於某個資料大小閾值。筆者認為這段 RocketMQ 4.9.4 版本程式碼存疑,因為最新的 5.0 版本做了優化。

    if (currentMsgSize >= traceProducer.getMaxMessageSize()) {
        List<TraceTransferBean> dataToSend = new ArrayList(traceTransferBeanList);
        AsyncDataSendTask asyncDataSendTask = new AsyncDataSendTask(traceTopicName, regionId, dataToSend);
        traceExecutor.submit(asyncDataSendTask);
        this.clear();
    }
    
  2. 當前時間 - 軌跡資料片段的首次儲存時間 是否大於重新整理時間 ,也就是每500毫秒重新整理一次。

    private void sendDataByTimeThreshold() {
        long now = System.currentTimeMillis();
        for (TraceDataSegment taskInfo : taskQueueByTopic.values()) {
            if (now - taskInfo.firstBeanAddTime >= waitTimeThresholdMil) {
                taskInfo.sendAllData();
            }
        }
    }
    

軌跡資料儲存的格式如下:

TraceBean bean = ctx.getTraceBeans().get(0);
//append the content of context and traceBean to transferBean's TransData
case Pub: {
  sb.append(ctx.getTraceType()).append(TraceConstants.CONTENT_SPLITOR)
    .append(ctx.getTimeStamp()).append(TraceConstants.CONTENT_SPLITOR)
    .append(ctx.getRegionId()).append(TraceConstants.CONTENT_SPLITOR)
    .append(ctx.getGroupName()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getTopic()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getMsgId()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getTags()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getKeys()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getStoreHost()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getBodyLength()).append(TraceConstants.CONTENT_SPLITOR)
    .append(ctx.getCostTime()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getMsgType().ordinal()).append(TraceConstants.CONTENT_SPLITOR)
    .append(bean.getOffsetMsgId()).append(TraceConstants.CONTENT_SPLITOR)
    .append(ctx.isSuccess()).append(TraceConstants.FIELD_SPLITOR);
}
break;

下圖展示了事務軌跡訊息資料,每個資料欄位是按照 CONTENT_SPLITOR 分隔。

注意:

分隔符 CONTENT_SPLITOR = (char) 1 它在記憶體中的值是:00000001 , 但是 char i = '1' 它在記憶體中的值是 49 ,即 00110001。


參考資料:

阿里雲檔案:

https://help.aliyun.com/zh/apsaramq-for-rocketmq/cloud-message-queue-rocketmq-4-x-series/user-guide/query-a-message-trace

石臻臻:

https://mp.weixin.qq.com/s/saYD3mG9F1z-oAU6STxewQ