KMP

KMP是單模式匹配演演算法，即在一個長度為 $n$ 的文字串S中查詢一個長度 $m$ 的模式串P。它的複雜度是 $O (n + m)$ ，差不多是此類演演算法能達到的最優複雜度。
它是如何做到的？簡單說，它通過分析P的特徵對P進行預處理，從而在與S匹配的時候能夠跳過一些字串，達到快速匹配的目的。

原理

以 $S [] = " a b c a b c a b c d ", P [] = 「 a b c d 」$ 為例， $i$ 指向 $S [i]$ ， $j$ 指向 $P [j]$ 。
在這裡插入圖片描述

圖（c）說明KMP演演算法，指向 $S$ 的 $i$ 指標不會回溯，而是一直往後走到底；
同圖（b）的樸素演演算法相比，大大減少了匹配次數。

那麼KMP是如何做到 $i$ 不回溯，只回溯 $j$ 呢？ $j$ 應該怎麼回溯？這就是KMP的核心—— $n e x t []$ 陣列，當匹配失敗後，用 $n e x t []$ 陣列指出 $j$ 回溯的位置。

next
$n e x t []$ 陣列是對串P預處理得到的， $n e x t$ 陣列的值是除當前字元外的字串的字首與字尾相同的最大長度。

字首：以 $j$ 為起點的一段字串，終點不限(別越界)
字尾：以 $j$ 為終點的一段字串，起點不限(別越界)

比如求串「abab」的next陣列：初值賦-1；
第3個字元a之前的字串ab中有長度為0的相同字首字尾，所以第3個字元a對應的next值為0；
第4個字元b之前的字串aba中有長度為1的相同字首字尾a，所以第4個字元b對應的next值為1。

a	b	a	b
-1	0	0	1

如果還是很難消化next的原理，就先知道它的作用即可：指出 $j$ 匹配失敗後回溯的位置。
知道原理後，對於程式碼程式設計實現求 $n e x t []$ 陣列其實還是不好理解，雖然程式碼不長，自慚形穢，免得誤人子弟，給出一篇大牛的詳解參考，小夥伴們可以刨根究底：從頭到尾徹底理解KMP

模板

void getnext(char* p, int lp) {
	nex[0] = nex[1] = 0;
	for (int i = 1; i < lp; i++) {
		int j = nex[i];
		while (j && p[i] != p[j])j = nex[j];
		nex[i + 1] = p[i] == p[j] ? j + 1 : 0;
	}
}
int kmp(char* s, char* p) {	//統計s中有多少個p
	int ans = 0;
	int ls = strlen(s), lp = strlen(p);
	getnext(p, lp);
	for (int i = 0, j = 0; i < ls; i++) {
		while (j && s[i] != p[j])j = nex[j];//失配則回溯j
		if (s[i] == p[j])j++;//匹配則繼續
		if (j >= lp)ans++;//統計
	}
	return ans;
}

例題

HDU-1686Oulipo

Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter ‘e’. He was a member of the Oulipo group. A quote from the book:
Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given 「word」 as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T’s is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {‘A’, ‘B’, ‘C’, …, ‘Z’} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0

分析：先來個模板題

#include<bits/stdc++.h>
using namespace std;
const int maxn = 10004;
int nex[maxn];
char s[maxn * 100], p[maxn];
void getnext() {
	int lp = strlen(p);
	nex[0] = nex[1] = 0;
	for (int i = 1; i < lp; i++) {
		int j = nex[i];
		while (j && p[i] != p[j])j = nex[j];
		nex[i + 1] = p[i] == p[j] ? j + 1 : 0;
	}
}
int kmp() {
	int ans = 0;
	int ls = strlen(s), lp = strlen(p);
	getnext();
	for (int i = 0, j = 0; i < ls; i++) {
		while (j && s[i] != p[j])j = nex[j];
		if (s[i] == p[j])j++;
		if (j >= lp)ans++;
	}
	return ans;
}
int main() {
	int t;
	scanf("%d", &t);
	while (t--) {
		scanf("%s%s", p, s);
		printf("%d\n", kmp());
	}
	return 0;
}

HDU-2087剪花布條

Problem Description
一塊花布條，裡面有些圖案，另有一塊直接可用的小飾條，裡面也有一些圖案。對於給定的花布條和小飾條，計算一下能從花布條中儘可能剪出幾塊小飾條來呢？
Input
輸入中含有一些資料，分別是成對出現的花布條和小飾條，其布條都是用可見ASCII字元表示的，可見的ASCII字元有多少個，布條的花紋也有多少種花樣。花紋條和小飾條不會超過1000個字元長。如果遇見#字元，則不再進行工作。
Output
輸出能從花紋布中剪出的最多小飾條個數，如果一塊都沒有，那就老老實實輸出0，每個結果之間應換行。
Sample Input
abcde a3
aaaaaa aa
#
Sample Output
0
3

分析：找到能分開的子串數量，套用KMP，統計的時候判斷與上一個是否重合即可。

#include<bits/stdc++.h>
using namespace std;
const int maxn = 1003;
int nex[maxn];
char p[maxn], s[maxn];
void getnext() {
	nex[0] = nex[1] = 0;
	int lp = strlen(p);
	for (int i = 1; i < lp; i++) {
		int j = nex[i];
		while (j && p[i] != p[j])j = nex[j];
		nex[i + 1] = p[i] == p[j] ? j + 1 : 0;
	}
}
int kmp() { 
	int ls = strlen(s), lp = strlen(p);
	getnext();
	int last = -1; //指向上一個匹配的末尾
	int ans = 0;
	for (int i = 0, j = 0; i < ls; i++) {
		while (j && s[i] != p[j])j = nex[j];
		if (s[i] == p[j])j++;
		if (j >= lp) { //若完全匹配
			if (i - last >= lp) { //且與上一個不重合
				ans++;
				last = i;
			}
		}
	}
	return ans;
}
int main() {
	while (~scanf("%s", s)) {
		if (s[0] == '#')break;
		scanf("%s", p);
		printf("%d\n", kmp());
	}
	return 0;
}

POJ-2752Seek the Name, Seek the Fame

Description
The little cat is so famous, that many couples tramp over hill and dale to Byteland, and asked the little cat to give names to their newly-born babies. They seek the name, and at the same time seek the fame. In order to escape from such boring job, the innovative little cat works out an easy but fantastic algorithm:
Step1. Connect the father’s name and the mother’s name, to a new string S.
Step2. Find a proper prefix-suffix string of S (which is not only the prefix, but also the suffix of S).
Example: Father=‘ala’, Mother=‘la’, we have S = ‘ala’+‘la’ = ‘alala’. Potential prefix-suffix strings of S are {‘a’, ‘ala’, ‘alala’}. Given the string S, could you help the little cat to write a program to calculate the length of possible prefix-suffix strings of S? (He might thank you by giving your baby a name:)
Input
The input contains a number of test cases. Each test case occupies a single line that contains the string S described above.
Restrictions: Only lowercase letters may appear in the input. 1 <= Length of S <= 400000.
Output
For each test case, output a single line with integer numbers in increasing order, denoting the possible length of the new baby’s name.
Sample Input
ababcababababcabab
aaaaa
Sample Output
2 4 9 18
1 2 3 4 5

分析：問字首和字尾相同的長度可能是多少？就是考察對 $n e x t []$ 的理解，設 $l e n$ 為字串長度，首先原字串肯定是自己字首和字尾，len是答案，然後 $n e x [l e n]$ 就是代表這個字串的最大字首字尾相同長度，然後將這個長度為 $n e x [l e n]$ 的子串繼續分析字首字尾最大相同長度，遞迴求解即可。
眼尖選手可以直接打表套樣例找規律。

#include<cstdio>
#include<cstring>
#include<vector>
using namespace std;
const int maxn = 400005;
int nex[maxn], ls;
char s[maxn];
void getnext() {
	nex[0] = nex[1] = 0;
	for (int i = 1; i < ls; i++) {
		int j = nex[i];
		while (j && s[i] != s[j])
			j = nex[j];
		nex[i + 1] = s[i] == s[j] ? j + 1 : 0;
	}
}
int main() {
	while (~scanf("%s", s)) {
		ls = strlen(s);
		getnext();
		int idx = ls ;
		vector<int>ans;
		while (nex[idx]) {
			ans.push_back(nex[idx] );
			idx = nex[idx];
		}
		int len = ans.size();
		for (int i = len - 1; i >= 0; i--)
			printf("%d ", ans[i]);
		printf("%d\n", ls);
	}
	return 0;
}

POJ-2406Power Strings

Description
Given two strings a and b we define ab to be their concatenation. For example, if a = 「abc」 and b = 「def」 then ab = 「abcdef」. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = 「」 (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
1
4
3
Hint
This problem has huge input, use scanf instead of cin to avoid time limit exceed.

分析：問字串是由多少個子串迴圈組成 ?還是對 $n e x t []$ 的考察， $n e x t []$ 是字首和字尾最大相同長度，那麼 $n - n e x t [n]$ 就是可能的迴圈節長度，再判斷下是否為0及是否整除，否則迴圈節長度為1。舉圖說明，也可以打表觀察。
~~居然不給n範圍，我測出來是1e6~~
在這裡插入圖片描述

#include<cstdio>
#include<cstring>
using namespace std;
const int maxn = 1000006;
int nex[maxn], ls;
char s[maxn];
void getnext() {
	nex[0] = nex[1] = 0;
	for (int i = 1; i < ls; i++) {
		int j = nex[i];
		while (j && s[i] != s[j])
			j = nex[j];
		nex[i + 1] = s[i] == s[j] ? j + 1 : 0;
	}
}
int main() {
	while (~scanf("%s", s)) {
		if (s[0] == '.')break;
		ls = strlen(s);
		getnext();
		int len = ls - nex[ls];
		if (len && ls % len == 0)
			printf("%d\n", ls / len);
		else puts("1");
	}
	return 0;
}

後記

小結一下，KMP是單模式匹配演演算法，複雜度是 $O (n + m)$ ，~~多模匹配演演算法用AC自動機~~ 。其中需要重點理解的是 $n e x t []$ ，很多衍生題都是從 $n e x [] t$ 切入的，如果沒思路可以試試打表套樣例找規律，多觀察 $n e x t []$ 。

原創不易，請勿轉載（~~本不富裕的存取量雪上加霜~~ ）
博主首頁：https://blog.csdn.net/qq_45034708
如果文章對你有幫助，記得關注點贊收藏❤

字串-KMP

文章目錄

KMP

原理

模板

例題

HDU-1686Oulipo

HDU-2087剪花布條

POJ-2752Seek the Name, Seek the Fame

POJ-2406Power Strings

後記