Large-Scale Product Data Search for E-commerce Based on Elasticsearch

2023-06-15 12:00:31

Introduction

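Products in modern e-commerce are described along increasingly diverse dimensions, which makes a powerful search engine essential. In this post we describe some of the problems we ran into while building an e-commerce search engine and share what we learned.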

Process

Data preparation

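1. First, we create a MySQL view over the data we need to search. Having everything in one view makes it easier to analyze the query dimensions and the query scenarios we need to support. The SELECT statement behind the view is attached below; if you are unfamiliar with creating views in MySQL, consult the MySQL documentation on CREATE VIEW. Once the view exists, it serves as a single table that feeds our data pipeline. Note that because the `attr_id` conditions sit in the WHERE clause, the three left joins on `sd_goods_attr` effectively behave as inner joins: goods lacking any of attributes 1, 2, or 3 do not appear in the view.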


select
  `g`.`goods_id` AS `goods_id`,
  `g`.`publisher_sn` AS `publisher_sn`,
  `g`.`add_time` AS `add_time`,
  `g`.`last_update` AS `last_update`,
  `g`.`goods_name` AS `goods_name`,
  `g`.`fineness` AS `fineness`,
  `g`.`look` AS `look`,
  `g`.`cat_path` AS `cat_path`,
  `g`.`goods_number` AS `goods_number`,
  `g`.`shop_price` AS `shop_price`,
  `g`.`goods_weight` AS `weight`,
  `g`.`keywords` AS `keywords`,
  `g`.`goods_desc` AS `goods_desc`,
  `g`.`isbn` AS `isbn`,
  `a`.`attr_value` AS `author`,
  `b`.`attr_value` AS `publisher`,
  `c`.`attr_value` AS `yiname`,
  `m`.`age` AS `age`,
  `m`.`press_intro` AS `press_intro`,
  `m`.`author_info` AS `author_info`,
  `m`.`media_intro` AS `media_intro`,
  `m`.`catalog` AS `catalog`,
  `m`.`prologue` AS `prologue`,
  `m`.`selling_point_1` AS `selling_point_1`,
  `m`.`selling_point_2` AS `selling_point_2`,
  `m`.`selling_point_3` AS `selling_point_3`,
  `m`.`detail_intro_1` AS `detail_intro_1`,
  `m`.`detail_intro_2` AS `detail_intro_2`,
  `m`.`detail_intro_3` AS `detail_intro_3`,
  `m`.`wtao_intro` AS `wtao_intro`,
  `m`.`video_intro` AS `video_intro`,
  `co`.`positive` AS `positive`,
  `co`.`negative` AS `negative`,
  `s`.`name` AS `series_name`,
  `s`.`name_cn` AS `series_name_cn`,
  `v`.`title` AS `v_title`,
  `v`.`article` AS `v_article`,
  `k`.`bunch_no` AS `bunch_no`
from `sd_goods` `g`
  left join `sd_goods_attr` `c` on `g`.`goods_id` = `c`.`goods_id`
  left join `sd_goods_attr` `a` on `g`.`goods_id` = `a`.`goods_id`
  left join `sd_goods_attr` `b` on `g`.`goods_id` = `b`.`goods_id`
  left join `sd_goods_more` `m` on `g`.`goods_id` = `m`.`goods_id`
  left join `sd_cover_text` `co` on `g`.`isbn` = `co`.`isbn`
  left join `sd_series_name` `s` on `g`.`isbn` = `s`.`isbn`
  left join `nosql`.`video_words_result` `v` on `g`.`isbn` = `v`.`isbn`
  left join `sd_bunch` `k` on `g`.`isbn` = `k`.`isbn`
where `c`.`attr_id` = 1
  and `a`.`attr_id` = 2
  and `b`.`attr_id` = 3

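2. Create the search index. Pay close attention to the field types you choose at this stage, since they determine how accurately and conveniently each field can be queried later. The `ik_smart` analyzer used below is provided by the IK Analysis plugin (elasticsearch-analysis-ik), which must be installed on every node before the index is created.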

PUT /products
{
 "settings": {
   "number_of_shards": 5,
   "number_of_replicas": 1
 },
 "mappings": {
     "properties": {
       "goods_id":{
           "type": "keyword"
       },
       "publisher_sn":{
           "type": "keyword"
       },
       "goods_name": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "keywords": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "weight":{
           "type":"float"
       },
       "goods_desc": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "author": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "publisher": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "yiname": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "fineness":{
           "type": "text"
       },
       "look":{
           "type": "text"
       },
       "isbn":{
           "type": "keyword"
       },
       "age":{
           "type": "text"
       },
       "press_intro": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "author_info": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "media_intro": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "positive": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "negative": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "series_name": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "series_name_cn": {
           "type": "text",
           "analyzer": "ik_smart"
       },
       "v_title":{
           "type": "text",
           "analyzer": "ik_smart"
       },
       "v_article":{
           "type": "text",
           "analyzer": "ik_smart"
       }
     }
 }
}
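If you prefer to keep the index definition in code rather than hand-editing a long JSON body, the repeated `text` + `ik_smart` entries can be generated. The sketch below is a hypothetical helper (`build_index_body` is not part of any Elasticsearch SDK); it only builds the request body shown above, with the shard and replica counts as defaults, and refuses to map the same field twice.

```python
import json
from typing import Dict, Iterable


def build_index_body(
    ik_fields: Iterable[str],
    plain_fields: Dict[str, str],
    shards: int = 5,
    replicas: int = 1,
) -> dict:
    """Build the body for `PUT /products` from two field lists.

    ik_fields:    Chinese full-text fields, each mapped as text + ik_smart
    plain_fields: field name -> Elasticsearch type (e.g. "text", "keyword")
    """
    ik_fields = list(ik_fields)
    overlap = set(ik_fields) & set(plain_fields)
    if overlap:
        # A field must have exactly one mapping; fail early instead of silently overwriting it.
        raise ValueError(f"fields mapped twice: {sorted(overlap)}")
    properties = {name: {"type": "text", "analyzer": "ik_smart"} for name in ik_fields}
    properties.update({name: {"type": ftype} for name, ftype in plain_fields.items()})
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {"properties": properties},
    }


if __name__ == "__main__":
    # Abbreviated field lists for illustration; extend them to cover the whole view.
    body = build_index_body(
        ik_fields=["goods_name", "keywords", "author", "publisher"],
        plain_fields={"fineness": "text", "look": "text", "age": "text"},
    )
    # The printed JSON can be pasted after `PUT /products` in a console such as Kibana Dev Tools.
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

Keeping the two lists separate makes the type decision for each field explicit, which is exactly the choice this step asks you to think about.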

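3. Loading data into the index. How data is loaded depends largely on each team's circumstances. We mainly use Canal, which reads the MySQL binlog, to handle both loading the existing data and syncing newly added data. Canal is written in Java, so Java experience makes it much easier to resolve problems that come up during synchronization.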
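Our actual Canal setup is Java, so the Python sketch below only illustrates the core translation step, under stated assumptions. It assumes Canal is configured to publish flat JSON messages (for example to Kafka with flat messages enabled) and reads only their `type`, `table`, `isDdl`, and `data` fields. Because each document is built from a multi-table view, it sends partial `update` actions with `doc_as_upsert`, so a change to `sd_goods` columns does not wipe out fields that came from joined tables. `COLUMN_RENAMES`, `keep_columns`, and the restriction to `sd_goods` are illustrative assumptions; changes to the other joined tables would need similar handling.

```python
import json
from typing import Any, Dict, List, Optional, Set

# Hypothetical: columns the view renames (sd_goods.goods_weight is exposed as weight).
COLUMN_RENAMES = {"goods_weight": "weight"}


def canal_message_to_bulk_lines(
    message: Dict[str, Any],
    index: str = "products",
    keep_columns: Optional[Set[str]] = None,
) -> List[str]:
    """Translate one Canal flat message into Elasticsearch _bulk NDJSON lines.

    Only row changes on sd_goods are handled; DDL and other tables are skipped.
    keep_columns, if given, limits which (renamed) columns are sent to the index.
    """
    if message.get("isDdl") or message.get("table") != "sd_goods":
        return []
    lines: List[str] = []
    for row in message.get("data") or []:
        meta = {"_index": index, "_id": str(row["goods_id"])}
        if message.get("type") == "DELETE":
            lines.append(json.dumps({"delete": meta}))
        elif message.get("type") in ("INSERT", "UPDATE"):
            doc = {COLUMN_RENAMES.get(col, col): val for col, val in row.items()}
            if keep_columns is not None:
                doc = {col: val for col, val in doc.items() if col in keep_columns}
            # Partial update with upsert: fields that came from joined tables stay untouched.
            lines.append(json.dumps({"update": meta}))
            lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}, ensure_ascii=False))
    return lines
```

To send the result, join the lines with `"\n"`, append a final newline, and POST the body to `/_bulk` with `Content-Type: application/x-ndjson`.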

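4. If you are not using an off-the-shelf sync tool, you can write your own hooks for your scenario to push data into the index. If anything is unclear, feel free to contact us.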
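For a hand-written hook, the core job is turning rows (for instance rows read from the view above) into Elasticsearch `_bulk` requests. Below is a minimal standard-library sketch, assuming a cluster at a hypothetical `http://localhost:9200` without authentication and using `goods_id` as the document `_id`; how the hook is triggered (scheduled job, application callback) is left open.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator, List


def bulk_bodies(
    rows: Iterable[Dict[str, Any]], index: str = "products", batch_size: int = 500
) -> Iterator[bytes]:
    """Yield NDJSON _bulk bodies containing at most batch_size documents each."""
    lines: List[str] = []
    count = 0
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["goods_id"])}}))
        # default=str turns values such as datetime or Decimal from the DB driver into strings.
        lines.append(json.dumps(row, ensure_ascii=False, default=str))
        count += 1
        if count == batch_size:
            # The bulk API requires the body to end with a newline.
            yield ("\n".join(lines) + "\n").encode("utf-8")
            lines, count = [], 0
    if lines:
        yield ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(body: bytes, es_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """POST one bulk body and return the parsed response."""
    req = urllib.request.Request(
        f"{es_url}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The bulk API returns HTTP 200 even when individual items fail, so a real hook should check the top-level `errors` flag in each response and retry or log the failed items.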

5. If you are not familiar with deploying Elasticsearch, you can refer to our Docker Compose quick-deployment setup.

Using the data

Applications query the data through an SDK. If you are not familiar with the Query DSL, see our earlier article How to build a OR condition in Elasticsearch Query DSL to learn more.
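As an illustration of an OR condition against this index, the sketch below builds a `bool` query whose `should` clauses match either the full-text fields or the ISBN. With only `should` clauses and no `must` or `filter`, at least one clause must match by default; setting `minimum_should_match: 1` simply makes the OR explicit. Using `match` on `isbn` works whether that field is mapped as `text` or `keyword`. The field list, boosts, paging, and endpoint are hypothetical defaults, and the body can equally be passed to whichever Elasticsearch SDK you use instead of the plain HTTP call shown.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Hypothetical default fields and boosts; tune them to your own relevance needs.
DEFAULT_FIELDS = ["goods_name^3", "author^2", "publisher", "keywords", "series_name"]


def build_search_body(
    keyword: str, fields: Optional[List[str]] = None, page: int = 1, size: int = 20
) -> Dict[str, Any]:
    """OR semantics: a product matches if the text fields OR the ISBN match."""
    return {
        "from": (page - 1) * size,
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": keyword, "fields": fields or DEFAULT_FIELDS}},
                    {"match": {"isbn": keyword}},
                ],
                "minimum_should_match": 1,
            }
        },
    }


def search(body: Dict[str, Any], es_url: str = "http://localhost:9200", index: str = "products") -> Dict[str, Any]:
    """POST the query to /<index>/_search and return the parsed response."""
    req = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `search(build_search_body("三体", page=2))` would request the second page of 20 results; the hits are under `response["hits"]["hits"]`.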