The algorithm is designed to improve upon the performance of traditional pattern-matching algorithms, such as the naïve approach or the brute-force method. The key idea behind the KMP algorithm is to avoid unnecessary character comparisons by utilizing information about the pattern itself.
Here’s a high-level overview of how the KMP algorithm works:
Preprocessing (Building the “Partial Match” Table):
The algorithm analyzes the pattern and constructs a “partial match” table, also known as the “failure function” or “pi function.”
This table helps determine the number of characters to shift the pattern when a mismatch occurs during the search.
Searching:
The algorithm starts matching characters of the pattern against the text from left to right.
If a character in the pattern matches the corresponding character in the text, both pointers are advanced.
If a mismatch occurs:
The algorithm uses the partial match table to determine the maximum number of characters it can safely skip in the pattern.
It shifts the pattern by that amount and resumes matching from the new position, avoiding redundant comparisons.
By utilizing the partial match table, the KMP algorithm achieves a linear time complexity for pattern matching, specifically O(n + m), where n is the length of the text and m is the length of the pattern. This improvement makes it more efficient than the brute-force method, which has a time complexity of O(n * m).
The KMP algorithm is widely used in various applications that involve pattern matching, such as text editors, search engines, bioinformatics, and data mining, where efficient string searching is required.
It’s worth noting that while the KMP algorithm provides an efficient solution for pattern matching, it may not always be the optimal choice depending on the specific requirements and characteristics of the problem at hand. Other algorithms, such as Boyer-Moore or Rabin-Karp, may be more suitable in certain scenarios.
// pi[i] stores the length of the longest proper prefix which is also a suffix in pattern[0...i] std::vector<int> pi(pattern.length(), 0);
// Empty string's longest suffix is empty itself, the lenght is zero int j = 0;
// Building the pi array for the pattern for (int i = 1; i < pattern.size(); i++) { while (j > 0 && pattern[i] != pattern[j]) { // If characters don't match, use pi array to skip comparisons // L = pi[j - 1], then pattern[0...(L-1)] = pattern[(i-L+1)...i], so the next search // position is L j = pi[j - 1]; }
// If characters match, update the pi array if (pattern[i] == pattern[j]) { // j is the index, so (index + 1) means the length pi[i] = ++j; } }
j = 0;
for (int i = 0; i < text.length(); i++) { while (j > 0 && text[i] != pattern[j]) { j = pi[j - 1]; }
if (text[i] == pattern[j]) { if (j == pattern.length() - 1) { return i - j; } else { j++; } } }
return-1; } };
6 Question-32[★★★★]
Longest Valid Parentheses
Given a string containing just the characters '(' and ')', find the length of the longest valid (well-formed) parentheses substring.
For "(()", the longest valid parentheses substring is "()", which has length = 2.