AlgoDaily - Longest Substring With No Duplicate Characters

Good evening! Here's our prompt for today.

Given a string of characters, can you find the longest substring of the string that has no repeating characters?

An example is shown below:

JAVASCRIPT

1lengthOfLongestSubstring("abracadabar");
2// 4 because it is 'brac'

Constraints

Length of the string <= 100000
The string will only contain lowercase letters
Expected time complexity : O(n)
Expected space complexity : O(1)

In this challenge, we'll discuss a simple but efficient method for finding the longest substring with no repeating characters.

Try to solve this here or in Interactive Mode.

How do I practice this challenge?

xxxxxxxxxx
 
var assert = require('assert');
​
var lengthOfLongestSubstring = function (s) {
  // fill this in
  return maximumLen;
};
​
try {
  assert.equal(lengthOfLongestSubstring('algodaily'), 7);
​
  console.log(
    'PASSED: ' +
      "`lengthOfLongestSubstring('algodaily')` should be equal to `7`"
  );
} catch (err) {
  console.log(err);
}
​
try {
  assert.equal(lengthOfLongestSubstring('thisisatest'), 5);
​
  console.log(
    'PASSED: ' +
      "`lengthOfLongestSubstring('thisisatest')` should be equal to `5`"
  );
} catch (err) {
  console.log(err);
}
​

Here's how we would solve this problem...

How do I use this guide?

Brute Forcing It

This is one of those problems which is deceptively challenging. When you're given the prompt, it sounds like an easy iteration problem-- but we'll need to have it perform at a decent complexity.

Before even considering optimization, though, let's consider the most straightforward solution possible. So we want to know the longest substring with no duplicate characters. How would we do this manually, given the word algodaily? To achieve this, why not just look at every single substring?

Here's how the steps would go:

First, we consider a. Longest max is 1.
Then, we look at al. Longest max is now 2.
We look at alg. It's now 3.
o - 4
d - 5
a - wait, we've seen this before. Max is still 5, and the next few steps are also invalid and won't change the max.
We have substring algodai - i - substring has duplicate, so invalld
algodail - l - substring has duplicate, so invalld
algodaily - y - substring has duplicate, so invalld
Now we can start from l, and repeat from step 1... (l, lg, etc.)

We've provided an example of how this brute force solution would look when implemented.

xxxxxxxxxx
}
 
function hasDuplicateChars(s) {
  const hash = {};
  for (let ch of s) {
    if (hash[ch]) {
      return true;
    } else {
       hash[ch] = 1; 
    }
  }
  return false;
}
​
var lengthOfLongestSubstring = function (s) {
  let max = 0;
  
  for (let start = 0; start < s.length; start++) {
    for (let end = 0; end < s.length; end++) {  
      const substr = s.slice(start, end + 1);
      if (!hasDuplicateChars(substr)) {
        max = Math.max(max, substr.length);
      }
    }
  }
  
  return max;
};
​
try {
  assert.equal(lengthOfLongestSubstring('algodaily'), 7);

However, at O(n^2), this is way too inefficient. Let's determine a better solution. At a quick glance, we know we'll need to iterate and have some form of on-going storage for tracking purposes.

Solution to Longest Substring Without Duplicate Characters

Our optimized method solves the problem efficiently with respect to time, yet requires very little extra space. Remember the manual steps our brute force approach above? Oftentimes, efficient solutions just cut out steps from slower ones.

Consider the step by step we already did. I highlighted/bolded some steps we probably don't need:

First, we consider a. Longest max is 1.
Then, we look at al. Longest max is now 2.
We look at alg. It's now 3.
o - 4
d - 5
a - wait, we've seen this before. Max is still 5, and the next few steps are also invalid and won't change the max. 1. We have substring algodai - i - substring has duplicate, so invalid 1. algodail - l - substring has duplicate, so invalid 1. algodaily - y - substring has duplicate, so invalid
Now we can start from l, and repeat from step 1... (l, lg, etc.)

Notice something? The last 3 steps before we start from l could've been cut out! We knew that once we encountered the second a, all substrings that we extended to would be invalid. So why don't we jump straight to starting from l as soon as we see the duplicate?

That's the idea behind the optimal solution.

More Setup

To find the longest substring with no repeating characters, we'll read each character of the input string from left to right.

At the same time, we'll keep track of the current substring and the maximum substring length found so far. To keep track of the current substring, we'll record its starting index in a variable startInd and its last index in a variable i.

This solution uses an array, which has the same size as the size of the alphabet. The array stores the index of the character of the alphabet last found in the input string (in other words, it's tracking the last seen character element). In this tutorial, we call this array lastPos.

JAVASCRIPT

1const lastPos = [];

As we read the characters, we'll also maintain the lastPos array for storing the index of each character's last occurrence in the string. The lastPos array would be initialized to -1.

Suppose at index position i, we have read character x. Before that, we would've encountered this character at lastPos[x]. lastPos helps us remember where we last saw a duplicate character. That way, we can use lastPos[x] + 1 to "remove" the first duplicate from the current substring, and test against a valid substring.

Cases

There are two possible cases that are occurring, which are shown in the figure and explained below. For now, we'll assume that lastPos can be indexed by x. Later we'll discuss how to implement the indexing for this array.

These two cases help us make sense of when to start or stop consideration of the current substring.

Case 1: x has already occurred in the current substring. In this situation startInd <= lastPos[x].
To handle this case, start a new substring after lastPos[x]-- to do this, we can do startInd = lastPos[x]+1. Update the length of the current substring as (i-startInd+1).
Case 2: x has either not occurred in the string or it has occurred in some previous substring. In this case, startInd > lastPos[x] and we can read the next character of the input string and keep updating length and maximum length found so far.

Algorithm for Longest Substring Without Duplicates

The big picture pseudo-code algorithm for finding the longest substring without duplicates is provided here. We'll make the same assumption about the lastPos array as mentioned previously (that it can be indexed by x and lastPos[x] holds the last found occurrence of x in the input string).

The figure below shows the entire process for an example.

xxxxxxxxxx
 
Method: longestSubstring
Input: Input string str
​
1. Initialize all elements of lastPos to -1
2. startInd = 0
3. len = 0
4. max = 0
5. i = 0
6. For each character x in str at position i
    {
        a. if startInd <= lastPos[x]
            i.   startInd = lastPos[x]+1
            ii.  len = i-startInd + 1
            iii. lastPos[x] = i
        b. if startInd > lastPos[x] 
            i.   lastPos[x] = i
            ii.  len++
            iii. if (len>max) max = len
        
    }
7. return max
}

Time Complexity

If n is the length of the input string, then the time complexity for the algorithm described here is O(n). This is evident from the described algorithm, which traverses the entire input string from left to right only once.

The space complexity of the algorithm is, however, O(A), where A is the size of the alphabet. Most of the time, the size of the alphabet is constant (26 for lower-case letters, 25 for all letters, and 256 for ASCII) and quite low compared to memory of a modern computer. So space complexity can be thought of O(1).

Concluding Remarks

In this challenge, we discussed a linear time algorithm for finding the shortest substring of a string such that the substring has no duplicate characters

In trade-off of very little memory, the algorithm is time-efficient. But if the alphabet is very large, a more efficient data structure, that consumes less space can be used. One possibility is a hash table that would only store the characters that occurred last in the string and not the entire alphabet.

One Pager Cheat Sheet

We'll use an efficient method which has an O(n) time complexity and O(1) space complexity to find the longest substring without repeating characters in a string of length 100000 or less which only contains lowercase letters.
The problem of finding the longest substring with no repeated characters can be solved by manually checking every substring with a brute force approach.
Our optimized solution efficiently solves the problem with respect to time, while requiring very little extra space, by dynamically tracking the last seen character element and jumping straight to the next valid substring when a duplicate is spotted.
This algorithm finds the longest substring without duplicates by iterating through each character in the string and using lastPos to store the last occurrence of the character.
The time complexity of the algorithm discussed is O(n) and the space complexity is O(1) or, depending on the size of the alphabet, O(A).

This is our final solution.

To visualize the solution and step through the below code, click Visualize the Solution on the right-side menu or the VISUALIZE button in Interactive Mode.

xxxxxxxxxx
}
 
var assert = require('assert');
​
var lengthOfLongestSubstring = function (s) {
  var startNum = 0,
    maximumLen = 0;
  var map = new Map();
​
  for (var i = 0; i < s.length; i++) {
    var ch = s[i];
​
    if (map.get(ch) >= startNum) startNum = map.get(ch) + 1;
    map.set(ch, i);
​
    if (i - startNum + 1 > maximumLen) maximumLen = i - startNum + 1;
  }
​
  return maximumLen;
};
​
try {
  assert.equal(lengthOfLongestSubstring('algodaily'), 7);
​
  console.log(
    'PASSED: ' +
      "`lengthOfLongestSubstring('algodaily')` should be equal to `7`"
  );
} catch (err) {
  console.log(err);
}
​
try {
  assert.equal(lengthOfLongestSubstring('thisisatest'), 5);

Great job getting through this. Let's move on.

If you had any problems with this tutorial, check out the main forum thread here.