Python script to get all the distinct substrings from a string


Recently I came across an interview question:

Print out all the substrings of a given string.?

Now, the catch in the question is for strings like “ababc”. If you are blindly adding the substrings to a storage you have, you will end up with copies of “ab”, which is not what we want. Now, what is the best method to avoid duplicates..??
Ans: hash tables.. 😉

Thus, using the power of Python (i.e. the sets available in python) I have written the following script which takes the string as an input and returns the set of the substrings.

def substr(string):
    j=1
    a=set()
    while True:
        for i in range(len(string)-j+1):
            a.add(string[i:i+j])
        if j==len(string):
            break
        j+=1
        #string=string[1:]
    return a

Also, to print the set sequentially, we can easily convert the returned set to an ordered list. For example, here I have sorted the list based on the length of the substrings.

def prnt(seta):
    l=list(seta)
    l.sort(key=lambda(x):len(x))
    print l
    ## return l # if you want to return this object too

The normal execution on the commandline looks like this:

>>> x
 set(['a', 'ayus', 'h', 'yus', 'ayush', 'us', 'ush', 'ay', 's', 'sh', 'u', 'yush', 'y', 'yu', 'ayu'])
>>> prnt(x)
 ['a', 'h', 's', 'u', 'y', 'us', 'ay', 'sh', 'yu', 'yus', 'ush', 'ayu', 'ayus', 'yush', 'ayush']

Complexities:
time O(n**2) ## worst but this method provides us with the substrings also
memory O(2**n) ## worst when all characters in the string are different

Advertisements
Explore posts in the same categories: Python

Tags: , , ,

You can comment below, or link to this permanent URL from your own site.

3 Comments on “Python script to get all the distinct substrings from a string”

  1. wehnsdaefflae Says:

    you know you can also do this with set compehension. for example like that:

    test_set = {‘a’, ‘ayus’, ‘h’, ‘yus’, ‘ayush’, ‘us’, ‘ush’, ‘ay’, ‘s’, ‘sh’, ‘u’, ‘yush’, ‘y’, ‘yu’, ‘ayu’}
    new_set = {x[i:j] for x in test_set for i in range(len(x)) for j in range(len(x)+1) if len(x[i:j]) > 0}
    print new_set

    # returns set([‘a’, ‘ayus’, ‘h’, ‘yus’, ‘ayush’, ‘us’, ‘ush’, ‘y’, ‘s’, ‘sh’, ‘u’, ‘yush’, ‘ay’, ‘yu’, ‘ayu’])


  2. A much cleaner and probably more efficient solution must I suggest:

    “`def get_all_substrings(text):
    length = len(text)
    return {text[i:j+1] for i in range(length) for j in range(i, length)}“`


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: