String Compression (Part 2): LZ4

2022-07-08 18:01:01

  This article comes from cnblogs. Author: T-BARBARIANS. When reposting, please credit the original link: https://www.cnblogs.com/t-bar/p/16451185.html Thank you!

 

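  The previous post covered ZSTD, the excellent compressor from Facebook: its compression and decompression functions, its compression and decompression performance, and how to use multithreaded compression.

  In this post we look at LZ4 from the same angles and see how it performs.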

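1. LZ4 Compression and Decompression

  This post looks at two LZ4 compression functions. The prototype of the default compression function:

  int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);

  The prototype of the fast compression function:

  int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);

  The acceleration argument of the fast compression function takes values in the range [1, LZ4_ACCELERATION_MAX], where LZ4_ACCELERATION_MAX is 65537. What does it mean? Put simply, the larger the acceleration value, the faster the compression, but the lower the compression ratio. I will back this up with experimental data later.

  Also, with acceleration = 1, LZ4_compress_fast behaves the same as LZ4_compress_default: LZ4_compress_default is simply the simplified form that uses acceleration = 1 by default.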
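  To make the range concrete, here is a minimal sketch of how an out-of-range acceleration could be normalized before it reaches the compressor. As I read the comments in lz4.h, the library replaces out-of-range values instead of rejecting them, but that may differ between versions. The helper name clamp_acceleration and the two constants are my own, chosen to mirror the range stated above; they are not part of the LZ4 API.

```cpp
#include <cassert>

// Assumed values mirroring the range described above; the real macros live in lz4.c.
constexpr int kAccelerationDefault = 1;
constexpr int kAccelerationMax = 65537;

// Hypothetical helper: map any int onto the valid range [1, 65537].
// Values below 1 fall back to the default; values above the maximum are capped.
int clamp_acceleration(int acceleration) {
    int result = acceleration;
    if (result < 1) {
        result = kAccelerationDefault;
    } else if (result > kAccelerationMax) {
        result = kAccelerationMax;
    }
    assert(result >= 1 && result <= kAccelerationMax);
    return result;
}
```

  With this helper, clamp_acceleration(0) yields 1 and clamp_acceleration(100000) yields 65537, so a caller can never leave the documented range.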

 

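  LZ4 likewise has two decompression functions. The prototype of the safe decompression function:

  int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);

  The prototype of the fast decompression function:

  int LZ4_decompress_fast (const char* src, char* dst, int originalSize);

  The fast decompression function is not recommended. LZ4_decompress_fast has no parameter for the length of the compressed data, so it cannot bound its reads of the input buffer and is considered unsafe. LZ4 recommends LZ4_decompress_safe instead.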
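  One practical consequence of the safe prototype: the caller has to supply dstCapacity, and a raw LZ4 block does not record the original length, so that length has to travel alongside the compressed bytes. Below is a small sketch of one way to do this, assuming a made-up layout of a 4-byte little-endian size followed by the block. The names Bytes, frame_with_size and read_original_size are hypothetical and are not part of LZ4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical layout: [4-byte little-endian original size][compressed block bytes].
constexpr std::size_t kHeaderSize = 4;

// Prepend the original (uncompressed) size to an already compressed block.
Bytes frame_with_size(std::uint32_t original_size, const Bytes& compressed) {
    Bytes framed;
    framed.reserve(kHeaderSize + compressed.size());
    for (std::size_t i = 0; i < kHeaderSize; ++i) {
        framed.push_back(static_cast<unsigned char>((original_size >> (8 * i)) & 0xFF));
    }
    framed.insert(framed.end(), compressed.begin(), compressed.end());
    assert(framed.size() == kHeaderSize + compressed.size());
    return framed;
}

// Read the size back. Returns false when the buffer is too short to hold a header.
bool read_original_size(const Bytes& framed, std::uint32_t& original_size) {
    if (framed.size() < kHeaderSize) {
        return false;
    }
    original_size = 0;
    for (std::size_t i = 0; i < kHeaderSize; ++i) {
        original_size |= static_cast<std::uint32_t>(framed[i]) << (8 * i);
    }
    return true;
}
```

  On the receiving side, the recovered size would be used to allocate the destination buffer and passed as dstCapacity, with framed.data() + 4 as src and framed.size() - 4 as compressedSize. If the data comes from an untrusted source, the size should be checked against a sensible upper limit before allocating.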

  As before, let's start with an example of LZ4 compression and decompression.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    It's only mud.";

    // The LZ4 API takes and returns plain int sizes, so every size here is an int.
    // (With size_t, a negative error return would silently become a huge positive value.)
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);

    // Worst-case compressed size for this input; a buffer this large cannot be too small.
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    // compress (acceleration = 1 behaves the same as LZ4_compress_default)
    //int com_size = LZ4_compress_default(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size);
    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        cout << "compress failed, return value:" << com_size << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress into a buffer of exactly the original size
    char *decom_ptr = (char *)malloc(peppa_pig_text_size);
    if(NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    if(decom_size < 0) {
        cout << "decompress failed, return value:" << decom_size << endl;
        free(com_ptr);
        free(decom_ptr);
        return -1;
    }
    cout << "decompress text size:" << decom_size << endl;

    // compare the decompressed bytes with the original text
    // (memcmp rather than strncmp, because decom_ptr is not NUL-terminated)
    if(decom_size != peppa_pig_text_size || memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size) != 0) {
        cout << "decompress text is not equal peppa pig text" << endl;
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}

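Output:

  The output shows that the Peppa Pig text is 1848 bytes long before compression and 1125 bytes after (ZSTD in the previous post produced 759), for a compression ratio of about 1.6. The decompressed length equals the original length. On the same text, that ratio is lower than ZSTD's 2.4, so in terms of compressed size LZ4 does worse than ZSTD.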
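  The ratios quoted here are simply original size divided by compressed size. As a quick check of the arithmetic, here is a small sketch; the function names compression_ratio and space_saving are mine, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Compression ratio as used in this post: original size / compressed size.
double compression_ratio(std::size_t original_size, std::size_t compressed_size) {
    assert(compressed_size > 0);
    return static_cast<double>(original_size) / static_cast<double>(compressed_size);
}

// Fraction of the original size that was saved (0.0 means nothing was saved).
double space_saving(std::size_t original_size, std::size_t compressed_size) {
    assert(original_size > 0);
    return 1.0 - static_cast<double>(compressed_size) / static_cast<double>(original_size);
}
```

  With the sizes reported above, compression_ratio(1848, 1125) is about 1.64 for LZ4 and compression_ratio(1848, 759) is about 2.43 for ZSTD. Expressed as space saved, that is roughly 39% versus 59%.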

  Figure 1 below plots the compressed length against acceleration. As acceleration increases, the compressed output gets longer and longer.

Figure 1
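  However far the compressed length grows, it stays within the worst-case size that LZ4_compressBound reports, which is why the example above sizes its output buffer with that function. The sketch below reproduces the arithmetic of the LZ4_COMPRESSBOUND macro as I understand it from lz4.h. The function name lz4_worst_case_bound and the constant name are mine; in real code, call LZ4_compressBound itself.

```cpp
#include <cassert>

// Largest input size LZ4 accepts, as defined in lz4.h (0x7E000000 bytes).
constexpr int kLz4MaxInputSize = 0x7E000000;

// Worst-case compressed size for an input of input_size bytes.
// Returns 0 when the input size is negative or too large, mirroring the macro.
int lz4_worst_case_bound(int input_size) {
    if (input_size < 0 || input_size > kLz4MaxInputSize) {
        return 0;
    }
    const int bound = input_size + input_size / 255 + 16;
    // The bound is always larger than the input: incompressible data can grow slightly.
    assert(bound > input_size);
    return bound;
}
```

  For the 1848-byte text used in this post the bound works out to 1848 + 7 + 16 = 1871 bytes.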

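  Figure 2 plots the compression ratio against acceleration. As acceleration increases, the compression ratio gets lower and lower.

Figure 2

  Why is that? It is the same trade-off mentioned in the previous post, the fish (speed) versus the bear's paw (compression ratio): you cannot have both. Gaining compression speed costs the algorithm some of its compression ratio.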

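2. Exploring LZ4 Compression Performance

  Next, let's explore LZ4's compression performance, and how it changes across acceleration levels.

  The test method: use LZ4_compress_fast to compress the same text over and over for 10 seconds. Each run uses a different acceleration level, which gives the average number of compressions per second at each level. The benchmark code is as follows: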

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    It's only mud.";

    int cnt = 0;

    // The LZ4 API takes and returns plain int sizes; with size_t the
    // "com_size <= 0" failure check below could never see a negative value.
    int com_size;
    int com_space_size;
    int peppa_pig_text_size;

    timeval st, et;
    char *com_ptr = NULL;

    peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    com_space_size = LZ4_compressBound(peppa_pig_text_size);

    int test_times = 6;
    int acceleration = 1;

    // compress performance test: one 10-second run per acceleration level (1 to 6)
    while(test_times >= 1) {

        // reset the counter so each level is measured on its own;
        // otherwise the counts from earlier levels would accumulate
        cnt = 0;

        gettimeofday(&st, NULL);
        while(1) {

            // the buffer is allocated and freed inside the timed loop,
            // so the malloc/free cost is part of the measurement
            com_ptr = (char *)malloc(com_space_size);
            if(NULL == com_ptr) {
                cout << "compress malloc failed" << endl;
                return -1;
            }

            com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, acceleration);
            if(com_size <= 0) {
                cout << "compress failed, return value:" << com_size << endl;
                free(com_ptr);
                return -1;
            }

            free(com_ptr);

            cnt++;
            gettimeofday(&et, NULL);
            if(et.tv_sec - st.tv_sec >= 10) {
                break;
            }
        }

        cout << "acceleration:" << acceleration << ", compress per second:" << cnt/10 << " times" << endl;

        ++acceleration;
        --test_times;
    }

    return 0;
}

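Output:

  The results boil down to two points. First, with acceleration at its default value of 1, which is what LZ4_compress_default uses, the rate is over 200,000 compressions per second. Second, the number of compressions per second rises as acceleration increases, but the price is a lower compression ratio.

  The relationship between acceleration and compression rate is shown in the figure below:

Figure 3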
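  A note on the measurement itself: the benchmark compares only the tv_sec fields, so the timed window can be anywhere between 9 and 10 seconds, while the count is always divided by 10. If finer timing matters, something like the sketch below could be used. It relies on std::chrono::steady_clock and divides by the time that actually elapsed. The function name ops_per_second is my own, and this is not the harness that produced the numbers in this post.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Run op() repeatedly for at least `duration` and return operations per second,
// computed from the elapsed time actually measured on a monotonic clock.
template <typename Fn>
double ops_per_second(Fn&& op, std::chrono::milliseconds duration) {
    assert(duration.count() > 0);
    using clock = std::chrono::steady_clock;

    std::uint64_t count = 0;
    const auto start = clock::now();
    auto now = start;
    do {
        op();
        ++count;
        now = clock::now();
    } while (now - start < duration);

    const std::chrono::duration<double> elapsed = now - start;
    return static_cast<double>(count) / elapsed.count();
}
```

  To reproduce the test above, op would be a lambda wrapping the malloc, LZ4_compress_fast and free calls, and duration would be std::chrono::seconds(10). Reading the clock on every iteration adds a little overhead of its own, just as the gettimeofday call does in the original loop.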

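3. Exploring LZ4 Decompression Performance

  Next, let's look at LZ4's decompression performance.

  The test method: first compress the text with LZ4_compress_fast at acceleration = 1, then use the safe decompression function LZ4_decompress_safe to decompress the same data over and over for 10 seconds, which gives the average number of decompressions per second. The benchmark code is as follows: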

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    It's only mud.";

    int cnt = 0;

    // The LZ4 API takes and returns plain int sizes; with size_t the
    // failure checks below could never see a negative return value.
    int com_size;
    int com_space_size;
    int peppa_pig_text_size;

    timeval st, et;
    char *com_ptr = NULL;

    // compress once, outside the timed section
    peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    com_space_size = LZ4_compressBound(peppa_pig_text_size);

    com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        cout << "compress failed, return value:" << com_size << endl;
        free(com_ptr);
        return -1;
    }

    // decompress
    int decom_size;
    char* decom_ptr = NULL;

    // decompress performance test: decompress the same block repeatedly for 10 seconds
    gettimeofday(&st, NULL);
    while(1) {

        // the buffer is allocated and freed inside the timed loop,
        // so the malloc/free cost is part of the measurement
        decom_ptr = (char *)malloc(peppa_pig_text_size);
        if(NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            free(com_ptr);
            return -1;
        }

        decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
        if(decom_size < 0) {
            cout << "decompress failed, return value:" << decom_size << endl;
            free(com_ptr);
            free(decom_ptr);
            return -1;
        }

        free(decom_ptr);

        cnt++;
        gettimeofday(&et, NULL);
        if(et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    free(com_ptr);
    cout << "decompress per second:" << cnt/10 << " times" << endl;

    return 0;
}

Output:

  The result shows that LZ4 manages roughly 540,000 decompressions per second, which is a very respectable rate.
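  Operations per second are easier to compare with other benchmarks once they are converted to bytes per second. The sketch below does that conversion, treating the rounded figures from this post as inputs: about 540,000 decompressions per second, about 200,000 compressions per second, and 1848 bytes of text per operation. The function name throughput_mb_per_s is mine, and MB here means 10^6 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Convert an operation rate into megabytes (10^6 bytes) of uncompressed data per second.
double throughput_mb_per_s(double ops_per_second, std::size_t bytes_per_op) {
    assert(ops_per_second >= 0.0);
    return ops_per_second * static_cast<double>(bytes_per_op) / 1e6;
}
```

  With those inputs, throughput_mb_per_s(540000, 1848) comes to about 998 MB/s of decompressed output, and throughput_mb_per_s(200000, 1848) to about 370 MB/s of compressed input. Both loops also pay for a malloc and a free on every iteration, so these figures include that overhead.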

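4. LZ4 versus ZSTD

  Using the same input text, I ran the compression, decompression, compression performance, and decompression performance tests with both ZSTD and LZ4. The data is collected in Table 1.

Table 1

  Setting aside any judgment about which algorithm is better, the experimental results show that ZSTD leans toward compression ratio, while LZ4 (acceleration = 1) leans toward compression speed.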

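5. Summary

  No algorithm can easily deliver both very fast compression and a very high compression ratio. You have to trade one for the other, or find a suitable balance point.

  If the speed is acceptable, choosing ZSTD for its higher compression ratio will save more storage space (multithreaded compression through a thread pool can improve its speed further). If the compression ratio is not a priority and you want faster compression, LZ4 is also a good choice.

  Finally, having read this far, do you think that strings of any length can be compressed well by algorithms such as ZSTD and LZ4? Find out in the next installment! Writing takes effort, so if you are a fellow tech enthusiast, please log in and give this post a like!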

 
