Mark As Completed Discussion

Good evening! Here's our prompt for today.

How would you write a function to detect a substring in a string?

Description

If the substring can be found in the string, return the index at which it starts. Otherwise, return -1.

JAVASCRIPT
1function detectSubstring(str, subStr) {
2  return -1;
3}

Important-- do not use the native String class's built-in substring or substr method. This exercise is to understand the underlying implementation of that method.

Constraints

  • Length of both the given strings <=100000
  • The strings would never be null
  • The strings will only consist of lowercase letters
  • Expected time complexity : O(n)
  • Expected space complexity : O(1)

Try to solve this here or in Interactive Mode.

How do I practice this challenge?

PYTHON
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

Here's a video of us explaining the solution.

To change the speed of the video or see it in full screen, click the icons to the right of the progress bar.

We'll now take you through what you need to know.

How do I use this guide?

We should begin by reasoning out what we need to do in plain English before diving into the logic.

Let's take a simple example:

If we wanted to know if car was in carnival, that's easy. The first few letters are the same.

SNIPPET
1c a r
2c a r n i v a l
3^ ^ ^

So we'd write a for-loop and check that each index is equal until the shorter word is completed.

What if we wanted to know if graph was in geography? We would still write a for-loop that compares the characters at each index, but we need to account for if there is a match later on.

Here, we can use the two-pointer technique we learned about prior! Why don't we use another pointer for the purpose of seeing how far we can "extend our match" when we find an initial one?

PYTHON
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

So given string geography and substring graph, we'd step it through like this:

  1. g matches g, so j = 1
  2. e does not match r, so j = 0
  3. o does not match g, so j = 0
  4. g does match g, so j = 1
  5. r does match r, so j = 2
  6. a does match a, so j = 3 and so on...

At the same time, we also need to keep track of the actual index that the match starts. The j index will help us get the matching, but we need a idxOfStart variable for just the start.

Putting it all together:

PYTHON
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

There are still two slight issues with this code. Can you see what they are?

Let's take the examples of a string ggraph and substring graph. On the first iteration, g will match g, which is what we want. However, on the next iteration, we will try to match the second g in ggraph with r in graph.

What should happen is if we don't find a match, we ought to reset i a bit as well-- this is to accommodate duplicate characters. We can reset it specifically to the length of the last processed substring (or j).

The other thing is regarding the index of the start of the match-- we can simply get it from i - (subStr.length - 1). Note that you need to subtract 1 from the substring's length in order to get to the correct index.

Complexity of Final Solution

Let n be the length of the string. In the worst-case scenario, we iterate through all n characters of the string and do not find a match. Many of us will then declare that the time complexity is linear O(n). But think carefully, for each partial matching of substrings, we are traversing through the substring, and then again resetting to the initial pointer i.

To get a more clear insight into this, let's consider an example:

TEXT
1str = "aaaaaa"
2subStr = "aab"

The final solution code of the method explained above is given.

JAVASCRIPT
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

Build your intuition. Click the correct answer from the options.

How many comparisons will happen for the given code in the given test case?

TEXT
1str = "aaaaaa"
2subStr = "aab"

Click the option that best answers the question.

  • 2
  • 12
  • 5
  • 6

As you can see, there are total 15 comparisons for the given test case. This is not close to the linear 6 comparison time complexity. In fact, this is a complexity of close to 6*3 comparisons. The reason for this is for each starting letter of the main string, we are comparing the whole substring and then resetting back to the initial character. See the image below:

Complexity Calculation

So the time complexity will be O(mn) where m and n are the lengths of the string and substring. For memory, using the two-pointer method, we use a constant O(1) space for our solution.

There is another established algorithm named Rabin-Karp algorithm which uses hashing and two-pointer technique to solve this problem in O(n) time complexity. Feel free to try to implement that algorithm and tell us in the discussion about it.

One Pager Cheat Sheet

  • Write a function detectSubstring to detect a substring within a given string, avoiding String class's built-in substring or substr method, with O(n) time complexity and O(1) space complexity, and handle strings of length ≤ 100000.
  • We should reason out what we need to do before diving into the logic of solving a simple example like whether a car is in carnival.
  • We can use the two-pointer technique to compare two strings and check if one is a substring of the other by writing a for-loop to compare the characters at each index and discover how far we can extend our match when we find an initial one.
  • We can use an idxOfStart variable and a j index to determine if a string contains a substring by checking character-by-character whether they match or not.
  • The code needs two slight adjustments to handle duplicate characters and accurately obtain the index of the start of matches, and its complexity is more accurately expressed as O(n*m), where m is the length of the substring.
  • The number of comparisons made for the given code in the given test case is 12, which is n * m (6 * 3).
  • With the two-pointer method, the time complexity for solving the given test case problem is O(mn) and a constant O(1) space is used, although the Rabin-Karp algorithm may provide a more optimal O(n) time complexity solution.

This is our final solution.

To visualize the solution and step through the below code, click Visualize the Solution on the right-side menu or the VISUALIZE button in Interactive Mode.

PYTHON
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

Alright, well done! Try another walk-through.

If you had any problems with this tutorial, check out the main forum thread here.