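Performance problem

While recently running the community release of Flink 1.15 with the json_value function, I found its performance to be very poor. Thread dumps taken with jstack frequently caught the task threads in the following stack.

The threads there are waiting on a lock, so let's look at jsonpath's LRUCache: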

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.apache.flink.table.shaded.com.jayway.jsonpath.spi.cache;

import java.util.Deque;
import java.util.LinkedList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.flink.table.shaded.com.jayway.jsonpath.JsonPath;

public class LRUCache implements Cache {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, JsonPath> map = new ConcurrentHashMap();
    private final Deque<String> queue = new LinkedList();
    private final int limit;

    public LRUCache(int limit) {
        this.limit = limit;
    }

    public void put(String key, JsonPath value) {
        JsonPath oldValue = (JsonPath)this.map.put(key, value);
        if (oldValue != null) {
            this.removeThenAddKey(key);
        } else {
            this.addKey(key);
        }

        if (this.map.size() > this.limit) {
            this.map.remove(this.removeLast());
        }

    }

    public JsonPath get(String key) {
        JsonPath jsonPath = (JsonPath)this.map.get(key);
        if (jsonPath != null) {
            this.removeThenAddKey(key);
        }

        return jsonPath;
    }

    private void addKey(String key) {
        this.lock.lock();

        try {
            this.queue.addFirst(key);
        } finally {
            this.lock.unlock();
        }

    }

    private String removeLast() {
        this.lock.lock();

        String var2;
        try {
            String removedKey = (String)this.queue.removeLast();
            var2 = removedKey;
        } finally {
            this.lock.unlock();
        }

        return var2;
    }

    private void removeThenAddKey(String key) {
        this.lock.lock();

        try {
            this.queue.removeFirstOccurrence(key);
            this.queue.addFirst(key);
        } finally {
            this.lock.unlock();
        }

    }

    private void removeFirstOccurrence(String key) {
        this.lock.lock();

        try {
            this.queue.removeFirstOccurrence(key);
        } finally {
            this.lock.unlock();
        }

    }

    ...
}

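As the code shows, whenever get finds a value it moves the corresponding key to the head of the deque, which is how the LRU behaviour is implemented. The catch is that every cache hit and every put has to take the same lock, and removeThenAddKey also scans the LinkedList linearly (removeFirstOccurrence) while holding it. Under concurrency the throughput is therefore low, and the CPU is used inefficiently.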
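To make the cost of a cache hit concrete, here is a minimal stand-alone sketch. It is not the jsonpath source: `LockCountingLru` and `LockCountingLruDemo` are hypothetical names, and the class only mirrors the shape of the LRUCache above while counting how often the global lock is taken.

```java
import java.util.Deque;
import java.util.LinkedList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in mirroring the structure of the LRUCache above; not the library class.
class LockCountingLru<V> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, V> map = new ConcurrentHashMap<>();
    private final Deque<String> queue = new LinkedList<>();
    private final AtomicLong lockAcquisitions = new AtomicLong();
    private final int limit;

    LockCountingLru(int limit) {
        this.limit = limit;
    }

    void put(String key, V value) {
        V old = map.put(key, value);
        touch(key, old != null);
        if (map.size() > limit) {
            map.remove(evictOldest());
        }
    }

    V get(String key) {
        V value = map.get(key);   // the map lookup itself is lock-free
        if (value != null) {
            touch(key, true);     // but every hit then takes the global lock
        }
        return value;
    }

    long lockAcquisitions() {
        return lockAcquisitions.get();
    }

    // Moves the key to the head of the deque, as removeThenAddKey/addKey do above.
    private void touch(String key, boolean alreadyQueued) {
        lock.lock();
        lockAcquisitions.incrementAndGet();
        try {
            if (alreadyQueued) {
                queue.removeFirstOccurrence(key); // linear scan while holding the lock
            }
            queue.addFirst(key);
        } finally {
            lock.unlock();
        }
    }

    private String evictOldest() {
        lock.lock();
        lockAcquisitions.incrementAndGet();
        try {
            return queue.removeLast();
        } finally {
            lock.unlock();
        }
    }
}

class LockCountingLruDemo {
    public static void main(String[] args) {
        LockCountingLru<Integer> cache = new LockCountingLru<>(400);
        cache.put("$.a", 1);
        long before = cache.lockAcquisitions();
        for (int i = 0; i < 1000; i++) {
            cache.get("$.a");
        }
        // prints "lock acquisitions for 1000 hits: 1000"
        System.out.println("lock acquisitions for 1000 hits: " + (cache.lockAcquisitions() - before));
    }
}
```

Each of the 1000 reads goes through the lock once, so with several task threads reading the same hot paths the reads are effectively serialized.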
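The same problem has been reported in the jsonpath community and by downstream projects:
https://github.com/json-path/JsonPath/issues/740
https://github.com/apache/pinot/pull/7409
Conveniently, jsonpath exposes an SPI for plugging in a custom Cache implementation. A new cache implementation can be registered like this: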

static {
    CacheProvider.setCache(new JsonPathCache());
}

Pinot's change replaces the default LRUCache with a Guava cache. How much does that actually help? To find out, I measured the improvement with JMH, the Java benchmarking framework.
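The JsonPathCache class registered in the static block is not shown here. A minimal Guava-backed version might look like the sketch below. This is an assumption of mine, not Pinot's code: the 400-entry limit is simply the cache size used in the benchmark, and inside Flink the interfaces would come from the shaded package org.apache.flink.table.shaded.com.jayway.jsonpath rather than com.jayway.jsonpath.

```java
import com.google.common.cache.CacheBuilder;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.spi.cache.Cache;

// Hypothetical Guava-backed replacement for the default LRUCache (a sketch, not Pinot's code).
final class JsonPathCache implements Cache {
    // Assumed limit, taken from the cache size used in the benchmark.
    private static final long MAX_ENTRIES = 400;

    // Fully qualified because Guava's Cache clashes with the jsonpath Cache interface.
    private final com.google.common.cache.Cache<String, JsonPath> cache =
            CacheBuilder.newBuilder().maximumSize(MAX_ENTRIES).build();

    @Override
    public JsonPath get(String key) {
        // Returns null on a miss; the caller then compiles the path and calls put.
        return cache.getIfPresent(key);
    }

    @Override
    public void put(String key, JsonPath value) {
        cache.put(key, value);
    }
}
```

As far as I can tell, CacheProvider.setCache has to run before jsonpath touches its cache for the first time, which is why the registration sits in a static initializer.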

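Performance test

For convenience, I added two benchmark classes directly to the flink-benchmark project:
GuavaCache
LRUCache
One thing to watch out for: the cache is shared across the whole process, so the benchmark state has to be declared with @State(Scope.Benchmark). That way the cache we build is a single instance shared by all benchmark threads, rather than one instance per thread (Scope.Thread).

The tests run with 4 threads, and both caches have a capacity of 400 entries.
To keep other programs on a local machine from skewing the results, it is best to build the jar and run it on a server: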
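For reference, a benchmark along these lines could be sketched as follows. This is not the actual flink-benchmark class: the key set, the String values (used instead of compiled JsonPath objects to keep the example small) and the random key selection are my assumptions; only the shared state scope, the 4 threads, the 400-entry size and the ops/ms throughput mode come from the description and the results.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)                 // one cache instance shared by all benchmark threads
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)  // report ops/ms
@Threads(4)
public class GuavaCacheBenchmark {
    private static final int CACHE_SIZE = 400;

    private Cache<String, String> cache;
    private String[] keys;

    @Setup
    public void setUp() {
        cache = CacheBuilder.newBuilder().maximumSize(CACHE_SIZE).build();
        keys = new String[CACHE_SIZE];
        for (int i = 0; i < CACHE_SIZE; i++) {
            keys[i] = "$.field" + i;    // hypothetical JSON paths
            cache.put(keys[i], keys[i]);
        }
    }

    @Benchmark
    public String get() {
        // Returning the value keeps the JIT from eliminating the lookup.
        return cache.getIfPresent(keys[ThreadLocalRandom.current().nextInt(CACHE_SIZE)]);
    }

    @Benchmark
    public void put() {
        String key = keys[ThreadLocalRandom.current().nextInt(CACHE_SIZE)];
        cache.put(key, key);
    }
}
```

An LRUCache counterpart would have the same shape, with the field replaced by the jsonpath LRUCache and JsonPath values.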

java -jar target/benchmarks.jar -rf csv org.apache.flink.benchmark.GuavaCacheBenchmark

This produced the following results:

Benchmark                 Mode  Cnt     Score     Error   Units
GuavaCacheBenchmark.get  thrpt   30  4480.563 ± 203.311  ops/ms
GuavaCacheBenchmark.put  thrpt   30  1774.769 ± 119.198  ops/ms
LRUCacheBenchmark.get    thrpt   30   441.239 ±   2.812  ops/ms
LRUCacheBenchmark.put    thrpt   30   350.549 ±  12.285  ops/ms

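With Guava's cache, get throughput is roughly 10x higher (4480.563 vs. 441.239 ops/ms) and put throughput roughly 5x higher (1774.769 vs. 350.549 ops/ms).
The gain comes mainly from how the cache is implemented. I also had a short exchange with the author of caffeine on GitHub about the recommended implementations.
A follow-up article will look specifically at the optimizations inside caffeine cache.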

References

https://github.com/ben-manes/caffeine/wiki/Benchmarks caffeine benchmarks
https://github.com/ben-manes/caffeine/blob/master/caffeine/src/jmh/java/com/github/benmanes/caffeine/cache/GetPutBenchmark.java caffeine get/put benchmark source
https://www.jianshu.com/p/ad34c4c8a2a3 common JMH parameters
http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/ JMH samples
