Using Bit Manipulation to store the occurrence of characters with constant extra space

Using Bit Manipulation to store the occurrence of characters with constant extra space

Leetcode 2405: Optimal Partition of String using Bit Manipulation with constant extra space.

ยท

7 min read

Leetcode 2405: Optimal Partition of String

Problem Statement:

Note: Input string can have only lowercase characters.

Understanding the problem:

A substring is a contiguous sequence of characters within the string. Thus, the longest substring is the parent string itself and the smallest substring is the empty string. For a string "abaca", some substrings can be "", "a", "aba","ba","abaca".

  • Total number of non-empty substrings for a string of length n is calculated by n+(n-1)+(n-2)+...+1 = n*(n+1)/2 .

So for the string "abaca", the count of non-empty substrings is equal to 5+4+3+2+1 = 5*(5+1)/2 = 15 .

Here, we have to minimize the count of non-overlapping substrings for the given input strings, such that for every substring all characters are unique to that substring.

Points to remember:

--> The non-overlapping substrings mean that no two substrings should share the same character identified by its position in the input string rather than by its value.

--> To minimize the count of required substrings, we have to maximize the length of the substring so that every substring should try to cover most of the unique and contiguous characters within it.

Steps to solve:

Now, let us display the input string s as its characters at their index values.

0123456
abacaba
  • Here, we can start from the index 0 and initialize the substring counter by 1 ( because at least 1st character will form a substring of a unique character).

  • Now we will iterate through the characters one by one and consider each character to be a part of the current substring until a repeated character is encountered.

  • Just when we encounter a repeated character, we increment the counter by 1 and new substring will start from this position.

  • Thus, following the same procedure till the last character of the input string, the counter will store the minimized count of the required substrings.

  • The answer for the above example will be 4. See the below table...

IndexCharacterRepetitionSubstring counterCurrent Substring
0aNo1a
1bNo1ab
2aYes2a
3cNo2ac
4aYes3a
5bNo3ab
6aYes4a

Requirements:

  1. An extra space to store the characters for every substring to check the repeated occurrence.

  2. A counter variable to store the number of substrings.

Average Approaches:

--> To count the substrings we have to iterate through every character of the input string at least once. So the best Time Complexity will be O(n), where n=length of the input string.

--> To store the characters for every substring, we need a Data Structure. Since the longest substring possible is the input string itself, the maximum extra space we need would be of a size equal to the length of the input string. So the worst Space Complexity will be O(n).

--> We can use an Array or a HashSet to store the occurrences of the unique characters.

  • Array Solution: In the case of an Array, it will be of size=26, where 0th index represents the character a and 25th index represents the character z. So for all substrings, the size of the array is independent of the length of the input string and remains constant i.e., 26. Therefore, Space Complexity = O(26), which is a constant but still uses some extra space and also has some redundant space.
    It is faster than the HashSet approach.

  • HashSet Solution: In the case of HashSet, the maximum number of characters that it will store at some point in time would be equal to the count of lowercase characters i.e., 26. Therefore, Worst Space Complexity = O(26) but can be less than that depending upon the count of unique characters in the input string. It is better than the Array approach for spacing because it would not have redundant space.

  • Best Solution: We can optimize the extra space by using a Binary Storage Structure implemented using Bit Manipulation as explained below...

Best approach: Bit Manipulation

So now the main goal is to optimize the extra space. Concluding from the Array approach, we have to store only 26 characters at any point in time.

An Integer has 64 bits in its binary number representation. For example:

  • 1 = 00000000...0001

  • 3 = 00000000...0011

  • 14 = 00000000...1110

Bit value = SET (1) or UNSET (0)
The bit position having SET value (i.e., 1) contributes to the value of the total number.
--> Least Significant Bit (LSB) = Right Most Bit
--> Most Significant Bit (MSB) = Left Most Bit

We can use the most right 26 bits out of the 64 bits of an Integer to store the occurrence of the characters in a similar way as we stored these in the Array.
* Index 0 of the Array --> represents a --> 1st bit from right
* Index 1 of the Array --> represents b --> 2nd bit from right
*
*
* Index 25 of the Array --> represents z --> 26th bit from right

To mark the occurrence of a character, we have to SET the bit at a position corresponding to the character in the Storage Variable.
Hence, while iteration through the input string, if the bit corresponding to the current character is already 1, that means the character has already occurred earlier.

For example: looking at 1001, we can conclude that characters a and d have occurred but b and c have not occurred yet.

Thus, we have to use extra space for only a single Integer type variable which is less than that in the Array approach. Therefore Space Complexity = O(1).

How to check if the kth bit of a number N is SET or not?
--> if N & (1<<(k-1)) == 0: kth bit is not set i.e., it is 0
--> else: kth bit is set i.e., it is 1

How to SET or UNSET the kth bit of a number N?
--> SET: N | (1<<(k-1)) ; Using OR, Left Shift operators
--> UNSET: N & (~(1<<(k-1)) ) ; Using AND, NOT, Left Shift operation

Algorithm Flow:

  • Initialize the storage variable of Integer type with 0 and a counter variable to count the substrings with 1.

  • Loop through the input string, starting at 0th Index and ending at the last index.

  • For each character check if the corresponding bit in the storage variable is 0 or 1.

  • If 0 --> character has not occurred yet --> SET the bit and move to the next iteration

  • If 1 --> character has already occurred --> Increment the counter by 1, re-initialize the storage variable with 0 and then SET the corresponding bit.

  • After the loop ends, the counter variable returns the desired minimum count of non-overlapping substrings with unique characters.

Code Implementation:

// Bit Manipulation Approach 
// Time: O(n), Space: O(1)
// Best Approach for both time and space
public int partitionString(String s) {
        int counter=1;
        int store=0;
        for(int i=0;i<s.length();i++){
            int val=s.charAt(i)-'a';
            if((store & (1<<val)) != 0){
                counter++;
                store=0;
            }
            store = (store | (1<<val));
        }
        return counter;
    }
//Average Approaches
// Array Approach
// Time: O(n) , Space: O(26) 
// Has redundant space but better for time
public int partitionString(String s) {
        int count = 0 ;
        for(int i = 0 ; i < s.length();i++){
            boolean[] add = new boolean[26];
            while(i<s.length()){
                if(add[s.charAt(i)-'a'])
                    break;
                add[s.charAt(i)-'a'] =true;
                 i++;
                }
                i--;
               count++;
        }
        return count;
    }
//******************************************************
// HashSet Approach
// Time: O(n) , Space: O(26) 
// Better for space but not good for time 
public int partitionString(String s) {
        int subs = 1;
        Set<Character> charSet = new HashSet<>();
        for(char ch: s.toCharArray()){
            if(!charSet.contains(ch)){
                charSet.add(ch);
                continue;
            }
            subs++;
            charSet.clear();
            charSet.add(ch);
        }
        return subs;
    }

Conclusion:

  • Bit Manipulation Approach is the best approach to solve the given problem with Time Complexity as O(n) and Space Complexity as O(1).

  • A binary representation of an integer can be used to store the occurrence of characters precisely.

Footnote Message:

The content of this blog is based on my learnings from the Data Structures and Algorithm practice. I have written it up to the best of my knowledge and conscience.
In case of any wrong or incomplete information, I request the visitors to put some light on it and help me to become better than my previous self.
Thank you in advance! ๐Ÿ™

ย