String Pattern Matching Algorithm. Part-1
Knuth–Morris–Pratt(KMP) algorithm, Z-algorithm, Rabin–Karp algorithm, Tries
Given a pattern p = “bacb” and a string s = “bacba”. You just have to find is this pattern exist in the string or not? If it exist then how many time this pattern exist and from which positions we get those pattern? Those are some sample questions we can answer using KMP algorithm. Here, I take the string small for not making the article big. Let’s walk with my thinking process and have fun.
NOTE:- I am also a learner like you guys and write for making small notes, so I can revisit those when I need them. I may make mistakes so if you guys find anything please let me know so that I can increase my knowledge and fix it. Please, read the last part if you read algorithm. Let’s start.
A prefix is a group of letters placed before a specific letter. Here it is last letter. For example, ABBC. Here proper or perfect prefix of C is A,AB,ABB. The longest prefix is ABB.
A suffix is a group of letters placed after a specific letter. Here it is first letter. For example, ABBC. Here proper or perfect suffix of A is B,BB,BBC. The longest suffix is BBC.
Algorithm flow in a simple sentence:
- We traverse each element s[i] and ask it’s previous element array[i-1] that what is the maximum length of your match prefix which is also a suffix.
2. Then we start matching the string from that position j = array[i-1]. So, we match s[j] == s[i]. If it match then we increase the the length j++ and assign it array[i] = j. If it do not match then then we go to the previous element array[j-1] and ask 1st question again and again.
this is the simple sentence explanation for this algorithm. Now face a question.
Q1:- from 2 if it is not match then why we go to previous element?
We know the previous element which is s[j-1] already match s[i-1]. So, if I know the maximum length of prefix of array[j-1] which is also suffix then we can say that after that element we can match our i-th element with it. Because the previous elements already match.
Let’s explain with example:
let’s assume that the for loop iterate the i-th position which is a. Now, this “a” will ask its previous element which is ‘b’ that how many characters you already match which is both prefix and suffix at the same time. The answer of “b” will be 4 as pi-array[i-1] == 4. There are “bacb” which present as a suffix and prefix of the same time. So the element “a” knows that 4 elements from the first already match. So it starts to compare from the 5-th element which is “#”. As a == # is false so it asks the previous element of “#” the same question. How many characters you(b) match? “b” will answer 1. So, it will start compering from the 2nd element which is “a” in that case. Is i-th element(a) == 2nd element(a)? Yes, it is. So the i-th position value of the pi-array will be 2. The same things happen again and again.
Notice that the pi-array value of the 8-th position is 4. What is it mean? It means we have 4 elements from the index (8–4+1) to index 8-th, which is also a suffix and prefix at the same time. Now, what is our first 4 element in the string? It is our pattern. So we can say that we get a pattern match which end is at position 8-th. So we can find the pattern from 3 elements earlier as the element at position 8-th is also included in our pattern and the pattern length is 4.
let’s see the code:
Time Complexity of KMP:
Time complexity of KMP is O(n) both in worst case and best case also. In best case you do not even need n. You just need n-m+1.
I was thinking to write all algorithms all together but it looks like so long. So, I will try to explain the remaining algorithms in letter parts. If anyone reads it and it helps you a little bit, give a comment. It encourages me to write more and share each other knowledge.
Resources: I read several articles, watch some videos to clear my concept. It’s a random list. You can go through it in a google search. I respect all of you guys who shared their knowledge which helps me to learn. Happy coding.