Elasticsearch: the date type is a little odd

2023-02-15 09:00:33

1. Introduction to the date type

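Elasticsearch carries its data as JSON, and JSON has no date data type. Elasticsearch can nevertheless handle dates carried in JSON in any of the following three forms:

  • a string in a recognized date format;
  • a long integer holding milliseconds-since-the-epoch;
  • a long integer holding seconds-since-the-epoch (the field's format must include epoch_second, because the default format reads numbers as epoch_millis).

At index time, Elasticsearch converts the incoming value to UTC and stores it as a long integer of milliseconds-since-the-epoch. At query time, Elasticsearch internally converts a query on a date field into a range query on that long value.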
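
As a rough illustration of that normalization, the sketch below uses only java.time to turn each of the three input forms into milliseconds-since-the-epoch in UTC. The class and method names are made up for this example, and this is not how Elasticsearch implements the conversion internally.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not Elasticsearch code: shows the three input forms
// ending up as the same kind of value, epoch milliseconds in UTC.
public class EpochMillisSketch {

    // Date-only string such as "2015-01-01": taken as midnight UTC.
    static long fromDateString(String date) {
        return LocalDate.parse(date).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // ISO-8601 instant such as "2015-01-01T12:10:30Z".
    static long fromInstantString(String instant) {
        return Instant.parse(instant).toEpochMilli();
    }

    // Seconds-since-the-epoch: scale to milliseconds.
    static long fromEpochSeconds(long seconds) {
        return Math.multiplyExact(seconds, 1000L);
    }

    public static void main(String[] args) {
        System.out.println(fromDateString("2015-01-01"));
        System.out.println(fromInstantString("2015-01-01T12:10:30Z"));
        System.out.println(fromEpochSeconds(1420070400L));
    }
}
```

The first and third calls produce the same number, which is why a date-only string and the matching epoch value refer to the same instant once indexed.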

2. Preparing the test data

Create the mapping and set the type of create_date to date

PUT my_date_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "create_date": {
          "type": "date" 
        }
      }
    }
  }
}

Index the following three documents

PUT my_date_index/_doc/1
{ "create_date": "2015-01-01" } 

PUT my_date_index/_doc/2
{ "create_date": "2015-01-01T12:10:30Z" } 

PUT my_date_index/_doc/3
{ "create_date": 1420070400001 }

3. The oddity of date queries

We would like the following query to hit the record dated 2015-01-01

POST my_date_index/_search
{
  "query": {
    "term": {
      "create_date": "2015-01-01"
    }
  }
}

The result shows that all three documents were hit

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01T12:10:30Z"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "create_date" : 1420070400001
        }
      }
    ]
  }
}

Running the same query with profiling enabled shows that Elasticsearch really does rewrite it internally as the range query create_date:[1420070400000 TO 1420156799999]

POST my_date_index/_search
{
  "profile": true,
  "query": {
    "term": {
      "create_date": "2015-01-01"
    }
  }
}


  {
    "id" : "[eD2KQtMGSla7jzJQBQVAfQ][my_date_index][0]",
    "searches" : [
      {
        "query" : [
          {
            "type" : "IndexOrDocValuesQuery",
            "description" : "create_date:[1420070400000 TO 1420156799999]",
            "time_in_nanos" : 2101,
            "breakdown" : {
              "score" : 0,
              "build_scorer_count" : 0,
              "match_count" : 0,
              "create_weight" : 2100,
              "next_doc" : 0,
              "match" : 0,
              "create_weight_count" : 1,
              "next_doc_count" : 0,
              "score_count" : 0,
              "build_scorer" : 0,
              "advance" : 0,
              "advance_count" : 0
            }
          }
        ],
        "rewrite_time" : 2200,
        "collector" : [
          {
            "name" : "CancellableCollector",
            "reason" : "search_cancelled",
            "time_in_nanos" : 700,
            "children" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 200
              }
            ]
          }
        ]
      }
    ],
    "aggregations" : [ ]
  }
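
To make the match concrete, the following sketch checks the epoch-millisecond value of each of the three test documents against the bounds reported in the profile output. It uses plain java.time; the class and method names are hypothetical and exist only for this illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical check, not Elasticsearch code: tests whether a stored value
// falls inside the rewritten range create_date:[1420070400000 TO 1420156799999].
public class RangeMatchSketch {

    static final long LOWER = 1420070400000L; // 2015-01-01T00:00:00.000Z
    static final long UPPER = 1420156799999L; // 2015-01-01T23:59:59.999Z

    // Both bounds are inclusive, as in the profile description.
    static boolean matches(long epochMillis) {
        return epochMillis >= LOWER && epochMillis <= UPPER;
    }

    public static void main(String[] args) {
        long doc1 = LocalDate.parse("2015-01-01")
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long doc2 = Instant.parse("2015-01-01T12:10:30Z").toEpochMilli();
        long doc3 = 1420070400001L;

        System.out.println("doc 1 matches: " + matches(doc1));
        System.out.println("doc 2 matches: " + matches(doc2));
        System.out.println("doc 3 matches: " + matches(doc3));
    }
}
```

All three values lie within the same UTC day, so the range covers every one of them.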

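Next, let's look at how a term query on the date type is handled

We can see that termQuery calls rangeQuery directly and passes the supplied date argument as both bounds of the range.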

DateFieldType

@Override
public Query termQuery(Object value, @Nullable QueryShardContext context) {
    Query query = rangeQuery(value, value, true, true, ShapeRelation.INTERSECTS, null, null, context);
    if (boost() != 1f) {
        query = new BoostQuery(query, boost());
    }
    return query;
}

rangeQuery calls parseToMilliseconds to compute the two bounds of the query

DateFieldType

@Override
public Query rangeQuery(Object lowerTerm, Object upperTerm, boolean includeLower, boolean includeUpper, ShapeRelation relation,
                        @Nullable DateTimeZone timeZone, @Nullable DateMathParser forcedDateParser, QueryShardContext context) {
    failIfNotIndexed();
    if (relation == ShapeRelation.DISJOINT) {
        throw new IllegalArgumentException("Field [" + name() + "] of type [" + typeName() +
                "] does not support DISJOINT ranges");
    }
    DateMathParser parser = forcedDateParser == null
            ? dateMathParser
            : forcedDateParser;
    long l, u;
    if (lowerTerm == null) {
        l = Long.MIN_VALUE;
    } else {
        l = parseToMilliseconds(lowerTerm, !includeLower, timeZone, parser, context);
        if (includeLower == false) {
            ++l;
        }
    }
    if (upperTerm == null) {
        u = Long.MAX_VALUE;
    } else {
        u = parseToMilliseconds(upperTerm, includeUpper, timeZone, parser, context);
        if (includeUpper == false) {
            --u;
        }
    }
    Query query = LongPoint.newRangeQuery(name(), l, u);
    if (hasDocValues()) {
        Query dvQuery = SortedNumericDocValuesField.newSlowRangeQuery(name(), l, u);
        query = new IndexOrDocValuesQuery(query, dvQuery);
    }
    return query;
}

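The code below shows that the fields parsed from the lower bound overwrite the corresponding fields of new MutableDateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC), and the fields parsed from the upper bound overwrite the corresponding fields of new MutableDateTime(1970, 1, 1, 23, 59, 59, 999, DateTimeZone.UTC). Fields that the input does not supply keep these defaults, so entering 2015-01-01 in the query is equivalent to searching for all records within that day.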

JodaDateMathParser

private long parseDateTime(String value, DateTimeZone timeZone, boolean roundUpIfNoTime) {
    DateTimeFormatter parser = dateTimeFormatter.parser;
    if (timeZone != null) {
        parser = parser.withZone(timeZone);
    }
    try {
        MutableDateTime date;
        // We use 01/01/1970 as a base date so that things keep working with date
        // fields that are filled with times without dates
        if (roundUpIfNoTime) {
            date = new MutableDateTime(1970, 1, 1, 23, 59, 59, 999, DateTimeZone.UTC);
        } else {
            date = new MutableDateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC);
        }
        final int end = parser.parseInto(date, value, 0);
        if (end < 0) {
            int position = ~end;
            throw new IllegalArgumentException("Parse failure at index [" + position + "] of [" + value + "]");
        } else if (end != value.length()) {
            throw new IllegalArgumentException("Unrecognized chars at the end of [" + value + "]: [" + value.substring(end) + "]");
        }
        return date.getMillis();
    } catch (IllegalArgumentException e) {
        throw new ElasticsearchParseException("failed to parse date field [{}] with format [{}]", e, value,
            dateTimeFormatter.pattern());
    }
}
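
The effect of the two base dates can be approximated with java.time as follows. This is a simplified sketch rather than the Joda-based code above: it accepts only date-only input such as 2015-01-01, and the class and method names are invented for the example.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical approximation of the round-up behaviour for date-only input.
// The missing time part is filled with 00:00:00.000 for a lower bound and
// with 23:59:59.999 for an upper bound, always in UTC.
public class RoundUpSketch {

    static long parseBound(String dateOnly, boolean roundUpIfNoTime) {
        LocalTime fill = roundUpIfNoTime
                ? LocalTime.of(23, 59, 59, 999_000_000) // last millisecond of the day
                : LocalTime.MIDNIGHT;                   // first millisecond of the day
        return LocalDate.parse(dateOnly)
                .atTime(fill)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // A term query passes the same value as both bounds of the range.
        long lower = parseBound("2015-01-01", false);
        long upper = parseBound("2015-01-01", true);
        System.out.println("create_date:[" + lower + " TO " + upper + "]");
    }
}
```

With the same input for both bounds, the result spans the whole day, which matches the range shown in the profile output.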

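The dates we work with are usually precise to the second, so making the query input precise to the second is generally enough to hit a single record. If the query still hits several records, raise the precision of the data to milliseconds and include the milliseconds in the query input as well.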

POST my_date_index/_search
{
  "query": {
    "term": {
      "create_date": "2015-01-01T12:10:30Z"
    }
  }
}

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01T12:10:30Z"
        }
      }
    ]
  }
}

4. Customizing the parse format of date strings

By default, a date in Elasticsearch is either a long integer representing epoch_millis or a string that conforms to the strict_date_optional_time format.

public static final DateFormatter DEFAULT_DATE_TIME_FORMATTER = DateFormatter.forPattern("strict_date_optional_time||epoch_millis");

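strict_date_optional_time
strict requires the year, month, and day parts of the string to have exactly 4, 2, and 2 digits, left-padded with zeros where necessary, for example 2023-01-23;
date_optional_time means the string may omit the time part but must contain the date part.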

The complete grammar supported by strict_date_optional_time is as follows

 date-opt-time     = date-element ['T' [time-element] [offset]]
 date-element      = std-date-element | ord-date-element | week-date-element
 std-date-element  = yyyy ['-' MM ['-' dd]]
 ord-date-element  = yyyy ['-' DDD]
 week-date-element = xxxx '-W' ww ['-' e]
 time-element      = HH [minute-element] | [fraction]
 minute-element    = ':' mm [second-element] | [fraction]
 second-element    = ':' ss [fraction]
 fraction          = ('.' | ',') digit+

If we search with 2015/01/01, Elasticsearch cannot parse the value and returns an error

POST my_date_index/_search
{
  "profile": true,
  "query": {
    "term": {
      "create_date": "2015/01/01"
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_date_index",
        "node": "eD2KQtMGSla7jzJQBQVAfQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"term\" : {\n    \"create_date\" : {\n      \"value\" : \"2015/01/01\",\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "9MTRkZcMTnK8GgK9vKwUuA",
          "index": "my_date_index",
          "caused_by": {
            "type": "parse_exception",
            "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]",
            "caused_by": {
              "type": "illegal_argument_exception",
              "reason": "Unrecognized chars at the end of [2015/01/01]: [/01/01]"
            }
          }
        }
      }
    ],
    "caused_by": {
      "type": "parse_exception",
      "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unrecognized chars at the end of [2015/01/01]: [/01/01]"
      }
    }
  },
  "status": 400
}

We can specify the format either in the mapping or at search time

POST my_date_index/_search
{
    "query": {
        "range" : {
            "create_date" : {
                "gte": "2015/01/01",
                "lte": "2015/01/01",
                "format": "yyyy/MM/dd"
            }
        }
    }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01T12:10:30Z"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "create_date" : 1420070400001
        }
      }
    ]
  }
}
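
The same idea can be tried outside Elasticsearch. The sketch below is only an analogy built on java.time, not the parser Elasticsearch uses: ISO_LOCAL_DATE stands in for the default strict format, and the pattern yyyy/MM/dd stands in for the format parameter of the range query. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical illustration: a value is accepted only when the formatter
// in use describes its layout.
public class FormatSketch {

    static boolean parses(String value, DateTimeFormatter formatter) {
        try {
            LocalDate.parse(value, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DateTimeFormatter custom = DateTimeFormatter.ofPattern("yyyy/MM/dd");

        // Rejected: the ISO formatter expects hyphens between the fields.
        System.out.println(parses("2015/01/01", DateTimeFormatter.ISO_LOCAL_DATE));
        // Accepted: the custom pattern describes the slashes.
        System.out.println(parses("2015/01/01", custom));
    }
}
```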