Archive for the ‘Python’ category

Threading in Python

April 15, 2012

Recently, in an effort to learn something new in Python, I decided to write a small introduction to threading in Python.

The default Python installation comes with the following modules related to threading: thread and threading.

From Python Docs:

The thread module has been renamed to _thread in Python 3.0. The 2to3 tool will automatically adapt imports when converting your sources to 3.0; however, you should consider using the high-level threading module instead.

Thus, when developing scripts that use threads, prefer the threading module over thread.

To save some time, I suggest reading about the basic concepts of threads on Wikipedia itself.

Now comes the first program.

#!/usr/bin/env python

import time
import thread

def myfunction(string, sleeptime, max_count, *args):

    counter = 0
    ## To manage I/O
    while counter < max_count:
        print "{}. {}".format(counter, string)
        counter += 1
        # sleep for a specified amount of time
        time.sleep(sleeptime)

if __name__=="__main__":

    print "thread Started : {}".format(thread.start_new_thread(myfunction,("Thread No:1", 2, 10)))
##    thread.exit_thread
    ## this can be omitted
    # keep the main thread alive: on most systems the other
    # threads are killed when the main thread exits
    while 1:
        time.sleep(1)


In the above script, a new thread is started that runs the function myfunction. The arguments to the function are passed to start_new_thread() as a tuple (do remember to make a tuple from the arguments you want to pass). start_new_thread() returns the identifier of the thread it started (which is printed here).
A common thing I noticed in threaded programs is the use of time.sleep(); it helps keep the input and output on the terminal in order. In actual backend scripts, the sleep function would probably not prove useful (I may be wrong!)
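Since the Python docs recommend the threading module, here is a minimal sketch of the same program using threading.Thread, written in Python 3 syntax. The extra output parameter and the run_demo() helper are hypothetical additions so the printed lines can be checked afterwards; join() replaces the endless while loop for keeping the main thread alive until the worker is done.

```python
import threading
import time

def myfunction(string, sleeptime, max_count, output):
    # Same loop as the thread-module version; each line is also
    # recorded in `output` (a hypothetical extra parameter)
    for counter in range(max_count):
        line = "{}. {}".format(counter, string)
        print(line)
        output.append(line)
        time.sleep(sleeptime)

def run_demo(sleeptime=2, max_count=10):
    output = []
    # Arguments are still passed as a tuple, as with start_new_thread()
    worker = threading.Thread(target=myfunction,
                              args=("Thread No:1", sleeptime, max_count, output))
    worker.start()
    print("thread started: {}".format(worker.ident))
    # join() waits for the worker to finish instead of looping forever
    worker.join()
    return output

if __name__ == "__main__":
    run_demo()
```

Calling run_demo(0, 3) runs three iterations without sleeping, which is handy for a quick check.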

The last example was just an introduction. To step it up a level, let’s calculate the Fibonacci series from a thread.


#Fibonacci threader

import time, thread, threading

def fib(n):
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

The above script uses both the thread and threading modules. The thread module is used to create threads, and the threading module is used to get information on the threads currently running in the process.

Here the function fib(n) is actually a generator: calling it returns an iterator (a generator iterator). Thus we are able to iterate over the Fibonacci numbers using this generator. After the required number of Fibonacci numbers have been generated, thread.exit_thread() is called, which exits the running thread silently.

After creating the thread, the script prints information about the running threads (execute it to see).
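Here is a Python 3 sketch of the idea described above, using only the threading module. It assumes fib(n) yields the Fibonacci numbers smaller than n; fib_worker() and run() are hypothetical helper names. With threading, a thread simply ends when its target function returns, so no exit call is needed, and threading.enumerate() lists the threads that are still alive.

```python
import threading

def fib(n):
    """Generator: yields the Fibonacci numbers smaller than n (assumed behavior)."""
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

def fib_worker(n, results):
    # Iterate over the generator iterator returned by fib(n).
    # Returning from this function ends the thread.
    for value in fib(n):
        results.append(value)

def run(n):
    results = []
    worker = threading.Thread(target=fib_worker, args=(n, results), name="fib-thread")
    worker.start()
    # threading.enumerate() returns the Thread objects currently alive
    # (the worker may already have finished by the time this runs)
    for t in threading.enumerate():
        print(t.name)
    worker.join()
    return results

if __name__ == "__main__":
    print(run(50))
```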

Finally, I will show you the code for the (quite famous) Consumer-Producer problem, which includes the code for using locks.


#!/usr/bin/env python

import time
import thread

## Implementing consumer-producer problem using threads and locks

product = []

def producer(lock, produce_time, lim, *args):

    pr_val = 0

    while True:

        print "Producing.."
        time.sleep(produce_time)
        print "Produced {}".format(pr_val)

        lock.acquire()
        print "P: Lock ACK"
        product.append(pr_val)
        print "Added product {}".format(pr_val)
        lock.release()
        print "P: Release ACK"

        pr_val += 1

        if pr_val > lim:
            break

def consumer(lock, consume_time, waiting_time, lim, *args):

    con_val = 0
    got_product = False

    while True:

        lock.acquire()
        print "C: Lock ACK"
        try:
            # take the oldest product first (FIFO), so none is skipped
            con_val = product.pop(0)
            print "Retrieved value {}".format(con_val)
            got_product = True
        except IndexError:
            print "No produce!"
            got_product = False
        lock.release()
        print "C: Release ACK"

        if got_product:
            print "Consuming.. {}".format(con_val)
            time.sleep(consume_time)
        else:
            print "Waiting for produce"
            time.sleep(waiting_time)

        if con_val == lim:
            break

if __name__=="__main__":

    # the lock guarding the produce-line (the product list)
    lock = thread.allocate_lock()
    max_produce = 3
    thread.start_new_thread(producer,(lock, 1, max_produce))
    thread.start_new_thread(consumer,(lock, 2, 1, max_produce))

    # Required for commandline output
    while 1:
        time.sleep(1)


The above code creates a lock that the consumer and producer use to get exclusive access to the produce-line. Then we define the number of units to be produced. When the threads are started, the consumer and producer are also passed the consume-time and produce-time as arguments (these delays are implemented in the program with the time.sleep() function).

The consumer thread starts by acquiring the lock on the produce-line and then taking a product from it. If there is no produce yet, it prints an error message; otherwise, it picks up the produce. Either way, it then releases the lock, consumes the produce (if it got one) and goes back to the start of the loop.

The producer thread starts by producing an item. Then it acquires the lock on the produce-line and adds the produce to it. Then it releases the lock and starts producing again.
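For comparison, here is a minimal Python 3 sketch of the same produce-line using threading.Lock as a context manager: the with statement acquires the lock and releases it when the block ends, even if an exception occurs. It takes items in FIFO order, and the short timings and the run() helper are made-up values and a hypothetical name, chosen so the sketch finishes quickly and returns what the consumer got.

```python
import threading
import time

def producer(lock, line, produce_time, lim):
    for pr_val in range(lim + 1):
        time.sleep(produce_time)            # time taken to produce one item
        with lock:                          # acquire the produce-line
            line.append(pr_val)             # released on leaving the block

def consumer(lock, line, consume_time, waiting_time, lim, consumed):
    while True:
        with lock:
            item = line.pop(0) if line else None   # oldest item first
        if item is None:
            time.sleep(waiting_time)        # no produce yet: wait and retry
            continue
        consumed.append(item)
        time.sleep(consume_time)            # time taken to consume the item
        if item == lim:                     # the last item has been consumed
            break

def run(lim=3, produce_time=0.01, consume_time=0.02, waiting_time=0.005):
    lock = threading.Lock()
    line, consumed = [], []
    threads = [
        threading.Thread(target=producer, args=(lock, line, produce_time, lim)),
        threading.Thread(target=consumer,
                         args=(lock, line, consume_time, waiting_time, lim, consumed)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed

if __name__ == "__main__":
    print(run())
```

For real programs, queue.Queue in the standard library already does this locking internally and is usually the simpler choice.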

What I have not covered: threads can also be created and defined using classes (by subclassing threading.Thread). I could not cover that in this post; good resources on it are available from IBM and devshed.
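Just to give a flavor of the class-based style, here is a minimal Python 3 sketch: subclass threading.Thread and override run(), which is the code that executes in the new thread once start() is called. CounterThread and its lines attribute are hypothetical names for illustration.

```python
import threading

class CounterThread(threading.Thread):
    """Hypothetical example of a thread defined as a class."""

    def __init__(self, label, max_count):
        super().__init__(name=label)      # always initialise the Thread base class
        self.max_count = max_count
        self.lines = []

    def run(self):
        # run() executes in the new thread after start() is called
        for counter in range(self.max_count):
            self.lines.append("{}. {}".format(counter, self.name))

if __name__ == "__main__":
    worker = CounterThread("Thread No:1", 3)
    worker.start()
    worker.join()
    print(worker.lines)
```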

GitHub repository for it:

Also, Python (CPython, actually) is known to be not very good at threading because of the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. Google for more information. 😛


Python script to bring all files from subfolders to main folder

January 3, 2012

A recurring chore for me is gathering all the photos I transfer from my phone into one folder. The phone transfer creates a subfolder for each date it has a picture for.

That was until I discovered the shutil module. (No, I had known about it for around 4 months, but I was too lazy to actually write the code 😛 )

The code I have written is meant to be cross-platform. If you find any discrepancies, please do tell.

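import shutil
import os

# copy all the files in the subfolders to main folder

# The current working directory
dest_dir = os.getcwd()
# The generator that walks over the folder tree
walker = os.walk(dest_dir)

# the first walk would be the same main directory
# which if processed, is
# redundant
# and raises shutil.Error
# as the file already exists
# so we skip it, keeping its list of subfolders for the cleanup below
rem_dirs = next(walker)[1]

for data in walker:
    for files in data[2]:
        try:
            shutil.move(os.path.join(data[0], files), dest_dir)
        except shutil.Error:
            # still to be on the safe side:
            # a file with this name already exists in dest_dir, leave it
            pass

# clearing the directories
# from which we have just removed the files
for dirs in rem_dirs:
    path = os.path.join(dest_dir, dirs)
    # delete a folder only if no files were left behind in it,
    # so files skipped above (name clashes) are not lost
    if not any(files for _, _, files in os.walk(path)):
        shutil.rmtree(path)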


Since the code is fully commented, I will skip the explanation here.
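For Python 3 users, here is a sketch of the same idea wrapped in a function (flatten() is a hypothetical name) that takes the main folder as an argument instead of using os.getcwd(). Instead of relying on shutil.Error, it checks for a name clash before moving, leaves clashing files where they are, and deletes only the subfolders that end up with no files.

```python
import os
import shutil

def flatten(dest_dir):
    """Move every file found in the subfolders of dest_dir into dest_dir.

    Returns the paths of files that were skipped because a file with the
    same name already exists in dest_dir.
    """
    skipped = []
    # The first walk result is dest_dir itself; we only need its subfolders
    top_dirs = next(os.walk(dest_dir))[1]
    for sub in top_dirs:
        for root, _, files in os.walk(os.path.join(dest_dir, sub)):
            for name in files:
                src = os.path.join(root, name)
                target = os.path.join(dest_dir, name)
                if os.path.exists(target):
                    skipped.append(src)        # name clash: leave the file in place
                else:
                    shutil.move(src, target)
    # Remove only the subfolders that no longer contain any files
    for sub in top_dirs:
        path = os.path.join(dest_dir, sub)
        if not any(files for _, _, files in os.walk(path)):
            shutil.rmtree(path)
    return skipped

if __name__ == "__main__":
    print(flatten(os.getcwd()))
```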

Please comment, share or like the post. (Good for my writing spirit 🙂 )

Some tricks of Python

December 31, 2011

Zen of Python

First of all, see the Zen of Python 😉

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

List Comprehensions

These are one of the smartest things in Python. They build up a list for you simply and in ONE line.

>>> x = []
>>> for i in range(20):
    x.append(10 * i) ## append elements to list x

>>> print x ## use the list
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190]

Now this can be done (pretty easily) by:

>>> print [10 * i for i in range(20)]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190]

A better example would be

>>> [i**2 for i in xrange(10) if i%2 == 0]
[0, 4, 16, 36, 64]
## this gives a list containing squares of all even numbers in range of [0, 10)
## means including 0 and not including 10

>>> [ord(i) for i in raw_input()]
[49, 119, 101, 100, 51, 52, 50, 107]

Now, a more memory-efficient solution to the above exists with generators:

>>> for j in (i**2 for i in xrange(10) if i%2 == 0):
    print j


Generator Expression

A generator expression does not calculate all the values at once (as a list comprehension does). It calculates each value as and when it is required.
A generator expression looks very similar to a list comprehension (just change the brackets: [] -> ()).
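To see the "as and when required" part in action, here is a small Python 3 sketch; the square() helper and the computed list are hypothetical, added only to record when each value is actually computed.

```python
computed = []

def square(i):
    """Square i, recording that the value was actually computed."""
    computed.append(i)
    return i ** 2

# Generator expression: nothing is computed when it is created
gen = (square(i) for i in range(10) if i % 2 == 0)
print(computed)          # []
print(next(gen))         # 0  (the first value is computed on demand)
print(computed)          # [0]

# List comprehension: every value is computed immediately
computed.clear()
squares = [square(i) for i in range(10) if i % 2 == 0]
print(computed)          # [0, 2, 4, 6, 8]
```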

In Python 2.7, dict and set comprehensions are also provided:

>>> {i: chr(i) for i in range(48, 58)}  ## a dict formed
{48: '0', 49: '1', 50: '2', 51: '3', 52: '4', 53: '5', 54: '6', 55: '7', 56: '8', 57: '9'}
>>> {chr(i) for i in range(48, 58)}     ## a set formed
set(['1', '0', '3', '2', '5', '4', '7', '6', '9', '8'])

P.S. The above code snippets have been pasted from the IDLE Python shell. If similar statements are used in scripts, the results are not printed.

zip function:

>>> print zip.__doc__
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
Return a list of tuples, where each tuple contains the i-th element
 from each of the argument sequences. The returned list is truncated
 in length to the length of the shortest argument sequence.

The above function can be used to transpose a matrix!

>>> a = [[1, 2], [4, 3], [5, 6]]
>>> zip(*a)
[(1, 4, 5), (2, 3, 6)]
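Note that in Python 3, zip() returns a lazy iterator rather than a list, so the transpose needs a list() around it. A short sketch, which also transposes the result back:

```python
a = [[1, 2], [4, 3], [5, 6]]

transposed = list(zip(*a))          # same as zip([1, 2], [4, 3], [5, 6])
print(transposed)                   # [(1, 4, 5), (2, 3, 6)]

# Transposing twice (and turning the tuples back into lists) restores the matrix
restored = [list(row) for row in zip(*transposed)]
print(restored)                     # [[1, 2], [4, 3], [5, 6]]
```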

The *a means unpack the list/tuple named a, i.e. pass its elements as separate arguments; this operator is valid on both tuples and lists (in fact, on any iterable).
In Python 2 it can be used only in function call arguments (and in function definitions).
This * is also called the Splat Operator 😀.
Dictionaries respond differently and have an extra operator (**):

>>> a = {'a': 1, 'b': 2, 'c': 3}
>>> def splat(*args):
    print args

>>> splat(a)
({'a': 1, 'c': 3, 'b': 2},)
>>> splat(*a)
('a', 'c', 'b')
>>> splat(**a)
Traceback (most recent call last):
  File "<pyshell#75>", line 1, in <module>
TypeError: splat() got an unexpected keyword argument 'a'

>>> def splat(a, b, c):
    print a, b, c

>>> splat(a)
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
TypeError: splat() takes exactly 3 arguments (1 given)

>>> splat(*a)
a c b
>>> splat(**a)
1 2 3
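The same operators also work on the receiving side of a function definition: *args collects extra positional arguments into a tuple, and **kwargs collects extra keyword arguments into a dict. A small Python 3 sketch (dicts keep insertion order from Python 3.7 on, so the key order below is predictable):

```python
def splat(*args, **kwargs):
    """Return what was collected positionally and by keyword."""
    return args, kwargs

a = {'a': 1, 'b': 2, 'c': 3}

print(splat(a))      # (({'a': 1, 'b': 2, 'c': 3},), {})   the dict is one argument
print(splat(*a))     # (('a', 'b', 'c'), {})               * iterates over the keys
print(splat(**a))    # ((), {'a': 1, 'b': 2, 'c': 3})      ** passes a=1, b=2, c=3
```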

OK, this is something I had been getting wrong for a long time.
When you have a list named, say, my_list and you need the index while looping over it, you usually do:

>>> my_list = [1, 4, 5, 2, 3, 6]
>>> for i in range(len(my_list)):
    print i, my_list[i]

0 1
1 4
2 5
3 2
4 3
5 6

Now, this is what they call “unpythonic” (though I like it 😛; even if it's dirty, at least it works 😉).


But there is a better way to do it:

>>> for index, val in enumerate(my_list):
    print index, val

0 1
1 4
2 5
3 2
4 3
5 6

(Ok, now I also agree, it was “unpythonic” 😀 )


Now, remember one thing: every sliceable sequence (lists, tuples and strings, for example) can be reversed simply by

>>> a
[1, 4, 5, 2, 3, 6]
>>> a[::-1]
[6, 3, 2, 5, 4, 1]
>>> b = "this will be reversed"
>>> b[::-1]
'desrever eb lliw siht'

Though reversed(sequence) is often better, as it returns an iterator instead of building a reversed copy 😛

>>> import timeit
>>> a = timeit.Timer('a[::-1]', 'a = [2, 4, 1, 6]')
>>> a.timeit()
>>> a = timeit.Timer('reversed(a)', 'a = [2, 4, 1, 6]')
>>> a.timeit()
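One caveat worth knowing: reversed() returns a lazy iterator rather than a new list, so timing reversed(a) as above mostly measures creating that iterator, not reversing the list. A short Python 3 sketch of the difference:

```python
a = [1, 4, 5, 2, 3, 6]

rev = reversed(a)              # lazy iterator; no new list is built yet
print(type(rev).__name__)      # list_reverseiterator
print(list(rev))               # [6, 3, 2, 5, 4, 1]  the work happens here
print(a[::-1])                 # [6, 3, 2, 5, 4, 1]  slicing builds a new list at once

# reversed() also works on strings, yielding the characters one at a time
print("".join(reversed("this will be reversed")))   # desrever eb lliw siht
```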

itertools module

And this one is for testing purposes (my favorite module).
These are the functions I use most to generate tests. You can find the other functions in the module's documentation.

>>> import itertools as it
>>> print list(it.combinations('asd', 2))         ## all the combinations
[('a', 's'), ('a', 'd'), ('s', 'd')]
>>> print list(it.combinations_with_replacement('asd', 2)) ## combinations + repeated values
[('a', 'a'), ('a', 's'), ('a', 'd'), ('s', 's'), ('s', 'd'), ('d', 'd')]
>>> print list(it.permutations('asd', 2))         ## permutations
[('a', 's'), ('a', 'd'), ('s', 'a'), ('s', 'd'), ('d', 'a'), ('d', 's')]
>>> print list(it.product('asd', 'def'))          ## product of sequences
[('a', 'd'), ('a', 'e'), ('a', 'f'), ('s', 'd'), ('s', 'e'), ('s', 'f'), ('d', 'd'), ('d', 'e'), ('d', 'f')]

P.S. The above functions actually return iterators (objects with a next() method), but to show their use, I have converted all the values to lists.
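As an example of generating tests this way, here is a Python 3 sketch that checks a hypothetical my_max() function against every ordering of a few values produced by itertools.permutations, and shows next() pulling values one at a time from one of these iterators:

```python
import itertools as it

def my_max(values):
    """Hypothetical function under test: returns the largest value."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

# Every ordering of 4 distinct values gives 4! = 24 test cases
cases = list(it.permutations([3, 1, 4, 2]))
for case in cases:
    assert my_max(list(case)) == 4, case

# These functions return iterators: next() produces one value at a time
pairs = it.combinations('asd', 2)
print(next(pairs))      # ('a', 's')
print(next(pairs))      # ('a', 'd')
```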


This was one of my best finds on the internet:

>>> import antigravity

Just give it a try 😉


And finally, something to laugh at 😀

>>> from __future__ import braces
SyntaxError: not a chance (<pyshell#106>, line 1)


Do remember to comment, like or share the post. 🙂

The new dropbox API

October 22, 2011

While going through my feeds, I found out that Dropbox had released new APIs. Luckily, they have better support for Python this time.

So I have written a script (an introduction, you could say) to these APIs in Python.

I have tried to document the code as much as possible.

Do remember to install oauth, setuptools and simplejson, as these are not included in the default Python 2.7 installation.

## author: Ayush Goel
## Python 2.7 used
## get the Dropbox new API from
## mail:

## do remember to install oauth, setuptools, simplejson
## not included by default in Python 2.7 installation

# Include the Dropbox SDK libraries
from dropbox import client, rest, session

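# Get your app key and secret from the Dropbox developer website
# (placeholders: replace them with the values for your own app)
APP_KEY = 'INSERT_APP_KEY_HERE'
APP_SECRET = 'INSERT_SECRET_HERE'

# ACCESS_TYPE should be 'dropbox' or 'app_folder' as configured for your app
ACCESS_TYPE = 'app_folder'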
sess = session.DropboxSession(APP_KEY, APP_SECRET, ACCESS_TYPE)

request_token = sess.obtain_request_token()

while True:
  url = sess.build_authorize_url(request_token)
  print "url:", url
  import webbrowser
  webbrowser.open(url)
  print "You have been redirected to the authorization page."
  print "Please do the authorization within 5 minutes else the URL would expire."
  print "Press ENTER here once you are done. To create the url again, enter any character"
  #print "Please visit this website and press the 'Allow' button, then hit 'Enter' here."
  s = raw_input()
  if s=='':
    break

# This will fail if the user didn't visit the above URL and hit 'Allow'
access_token = sess.obtain_access_token(request_token)

client = client.DropboxClient(sess)
print "linked account:", client.account_info()

## The client object is what is required for the whole
## app buildup by anyone

## anyways, lets have a look at some things of interest of
## all those tokens we just saw

## the url we produced above
print  url

# ''
print sess.API_HOST
# ''
print sess.API_VERSION
# 1
print  sess.WEB_HOST
# ''
print  sess.is_linked()
# True
print  sess.locale
# nothing None
print  sess.root
# 'sandbox'
print  sess.signature_method.get_name()

## Very important to NOTICE: every token we generated
## has two unique identifiers (key, secret)

print  request_token.key
# 'n8yyjthdgff92hvasafd1g5'
print  request_token.secret
# 'qu1dfozfgafeg1hmwrijwum'
print  request_token.verifier
# None
print  access_token.key
# 'd2rdsfjjfgzgd8hwc3j9kiiof'
print  access_token.secret
# 'wfsdfjwglhho0odek2jqby44'

print  access_token.callback_confirmed
# None
print  access_token.get_callback_url()
# None

## file_create_folder() creates a folder in the app folder given to you.
## The folder name is passed as an argument
## a dict is returned giving details of the folder
## errors are raised otherwise (see the documentation)

print client.file_create_folder("folder1")
##        {
##        u'size': u'0 bytes',
##        u'rev': u'104721f34',
##        u'thumb_exists': False,
##        u'bytes': 0,
##        u'modified': u'Fri, 21 Oct 2011 19:48:25 +0000',
##        u'path': u'/folder1',
##        u'is_dir': True,             # memory management
##        u'icon': u'folder',
##        u'root': u'app_folder',      # authorization
##        u'revision': 1
##        }

## client.put_file("C:\Users\Ayush\Desktop\dropify.txt","folder1/")
##Traceback (most recent call last):
##  File "<pyshell#31>", line 1, in <module>
##    client.put_file("C:\Users\Ayush\Desktop\dropify.txt","folder1/")
##  File "C:\Users\Ayush\Applications\python_pkg\dropbox-python-sdk-1.2\dropbox-1.2\dropbox\", line 147, in put_file
##    return RESTClient.PUT(url, file_obj, headers)
##  File "C:\Users\Ayush\Applications\python_pkg\dropbox-python-sdk-1.2\dropbox-1.2\dropbox\", line 142, in PUT
##    return cls.request("PUT", url, body=body, headers=headers, raw_response=raw_response)
##  File "C:\Users\Ayush\Applications\python_pkg\dropbox-python-sdk-1.2\dropbox-1.2\dropbox\", line 109, in request
##    raise ErrorResponse(r)
##ErrorResponse: [400] {u'path': u"Path 'C:\\Users\\Ayush\\Desktop\\dropify.txt' can't contain \\"}

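## correct way of uploading a file:
## open the local file and give put_file() a path inside Dropbox
f = open(r"C:\Users\Ayush\Desktop\dropify.txt", "rb")
print client.put_file("folder1/dropify.txt",f)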

##        {
##        u'size': u'34 bytes',
##        u'rev': u'204721f34',
##        u'thumb_exists': False,
##        u'bytes': 34,
##        u'modified': u'Fri, 21 Oct 2011 19:57:39 +0000',
##        u'path': u'/folder1 (1)',
##        u'is_dir': False,
##        u'icon': u'page_white',
##        u'root': u'app_folder',
##        u'mime_type': u'application/octet-stream',
##        u'revision': 2
##        }

##read a file from dropbox
## we are actually reading the same file we just
## uploaded. It's a text document
## get_file() returns an httplib.HTTPResponse object
a = client.get_file("folder1/dropify.txt")

print a.fileno()
# 588

## get headers of the file we are reading
## a list of tuples is returned
## difference between tuples and lists will be covered later

print a.getheaders()
##        [
##        ('content-length', '34'),
##        ('accept-ranges', 'bytes'),
##        ('server', 'dbws'),
##        ('connection', 'keep-alive'),
##        ('etag', '2n'),
##        ('pragma', 'public'),
##        ('cache-control', 'max-age=0'),
##        ('date', 'Fri, 21 Oct 2011 19:59:28 GMT'),
##        ('content-type', 'text/plain; charset=ascii')
##        ]

## the contents
## please don't do this on your "big files"
## may slow down or clog your app as memory requirements would go very high
print a.read()

## some additional data about the file
print a.reason
# 'OK'
print a.status
# 200
print  a.strict
# 0
print  a.version
# 11
print  a.chunk_left
print  a.chunked
# 0
print  a.begin()
# None
# ''

## another list of headers
s = a.getheaders()

# I am actually printing the headers beautifully 🙂
for i in s:
    print "%15s%s%20s"%(i[0]," : ", i[1])

## content-length :                   34
##  accept-ranges :                bytes
##         server :                 dbws
##     connection :           keep-alive
##           etag :                   2n
##         pragma :               public
##  cache-control :            max-age=0
##           date : Fri, 21 Oct 2011 19:59:28 GMT
##   content-type : text/plain; charset=ascii

print  client.metadata('/')
##        {
##        u'hash': u'00d3e63a8e91467dddaf18d04b206e57',
##        u'thumb_exists': False,
##        u'bytes': 0,
##        u'path': u'/',
##        u'is_dir': True,
##        u'icon': u'folder',
##        u'root': u'app_folder',
##        u'contents': [
##                {
##                        u'size': u'0 bytes',
##                        u'rev': u'104721f34',
##                        u'thumb_exists': False,
##                        u'bytes': 0,
##                        u'modified': u'Fri, 21 Oct 2011 19:48:25 +0000',
##                        u'path': u'/folder1',
##                        u'is_dir': True,
##                        u'icon': u'folder',
##                        u'root': u'dropbox',
##                        u'revision': 1
##                },
##                {
##                        u'size': u'34 bytes',
##                        u'rev': u'204721f34',
##                        u'thumb_exists': False,
##                        u'bytes': 34,
##                        u'modified': u'Fri, 21 Oct 2011 19:57:39 +0000',
##                        u'path': u'/folder1',
##                        u'is_dir': False,
##                        u'icon': u'page_white',
##                        u'root': u'dropbox',
##                        u'mime_type': u'application/octet-stream',
##                        u'revision': 2
##                }
##        ],
##        u'size': u'0 bytes'
##        }

And yes, I have tested this on my machine, so I am sure it works.

So, put your coding caps on and go get your own keys from dropbox.

And don’t worry about the .key and .secret values; they have been scrambled and tampered with 😉

Retrieving files from URLs

October 20, 2011

I wrote this script way back. I have documented it a little so that it’s easy to understand what it’s trying to do.

## python 3.x compliant
## author: Ayush Goel

import os
import urllib.request as ur

file_url = input('Enter the file URL you want to be downloaded: ')
file_name = input('Enter the path where you want the file to be saved(/enter): ')

if file_name == '':
    ## if no location provided, we get ourselves a default one
    ## change the location of download as suited for you
    ## (hypothetical default: the last part of the URL, saved in the current directory)
    file_name = os.path.join(os.getcwd(), file_url.rstrip('/').split('/')[-1] or 'download')

## try to retrieve the file using the URL
try:
    ur.urlretrieve(file_url, file_name)

except ValueError:
    ## urls like : ""
    ## headers like http:// https:// are missing ("unknown url type")
    print("The URL seems to be incorrect.. please provide the complete url, including the protocol name (http,https..)")

except IOError:
    ## URLError is a subclass of IOError, so network errors land here too
    ## the url given ain't to a file.
    ## It might be a forwarding URL, we would need the actual file url
    print("We are facing issues with the url you provided")

I have handled some of the error cases. If you find any other issues, comment here or PM me.
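Wrapping the same logic in a function makes it easier to reuse and to test. Here is a Python 3 sketch (download() is a hypothetical name) that returns True on success and False otherwise. It relies on urlretrieve() raising ValueError for a URL without a protocol, and on urllib.error.URLError being a subclass of OSError (which IOError is an alias of in Python 3).

```python
import urllib.request as ur

def download(file_url, file_name):
    """Save the resource at file_url to file_name; return True on success."""
    try:
        ur.urlretrieve(file_url, file_name)
    except ValueError:
        # no protocol such as http:// or https:// -> "unknown url type"
        print("Please provide the complete URL, including the protocol name (http, https..)")
        return False
    except OSError:
        # URLError (network errors, missing files, ...) is a subclass of OSError
        print("We are facing issues with the URL you provided")
        return False
    return True

if __name__ == "__main__":
    url = input('Enter the file URL you want to be downloaded: ')
    path = input('Enter the path where you want the file to be saved: ')
    download(url, path)
```

Since urlretrieve() also understands file:// URLs, the function can be tried offline on a local file.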

Python script to get all the distinct substrings from a string

August 11, 2011

Recently I came across an interview question:

Print out all the substrings of a given string.

Now, the catch in the question is strings like “ababc”. If you blindly add every substring to your storage, you will end up with duplicate copies of “ab”, which is not what we want. So what is the best way to avoid duplicates?
Ans: hash tables 😉

Thus, using the power of Python (i.e. its built-in sets), I have written the following script, which takes the string as input and returns the set of its substrings.

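def substr(string):
    a = set()          # the set silently drops duplicate substrings
    j = 1              # current substring length
    while True:
        for i in range(len(string)-j+1):
            a.add(string[i:i+j])
        if j >= len(string):
            break
        j += 1
    return a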

Also, to print the set sequentially, we can easily convert the returned set to an ordered list. For example, here I have sorted the list based on the length of the substrings.

def prnt(seta):
    # sort the substrings by length
    l = sorted(seta, key=len)
    print l
    ## return l # if you want to return this object too

The normal execution on the commandline looks like this:

>>> x = substr('ayush')
>>> x
set(['a', 'ayus', 'h', 'yus', 'ayush', 'us', 'ush', 'ay', 's', 'sh', 'u', 'yush', 'y', 'yu', 'ayu'])
>>> prnt(x)
['a', 'h', 's', 'u', 'y', 'us', 'ay', 'sh', 'yu', 'yus', 'ush', 'ayu', 'ayus', 'yush', 'ayush']

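time: O(n**2) substrings are generated, and slicing and hashing each one costs up to O(n), so O(n**3) overall ## but this method provides us with the substrings also
memory: at most n(n+1)/2 distinct substrings (the worst case, when all characters in the string are different), i.e. O(n**2) strings holding O(n**3) characters in total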
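The same brute-force idea also fits in a single set comprehension (Python 2.7+ or 3): every start index i is paired with every end index j > i, and the set discards duplicates. A sketch (substrings() is a hypothetical name), with a quick check on the “ababc” example:

```python
def substrings(s):
    """Return the set of all distinct non-empty substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

subs = substrings("ababc")
print(len(subs))    # 12: "ab" (and "b", "a") are counted only once
# sort by length, then alphabetically, for a stable printout
print(sorted(subs, key=lambda x: (len(x), x)))
# ['a', 'b', 'c', 'ab', 'ba', 'bc', 'aba', 'abc', 'bab', 'abab', 'babc', 'ababc']
```

For a string whose characters are all different, such as "ayush", the count reaches the maximum n(n+1)/2 = 15.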

Windows error when installing Django

June 29, 2011

At last, I have started learning Django and just installed it. The first way to check that it installed correctly is to type this at the Python shell:
import django

If this statement runs without reporting any errors, then be happy, your installation is complete.

The next step is to create a project. On Windows 7 (which I am using), I ran this command on the command line: django-admin.py startproject testproj ## should create a directory named testproj

I got the most frustrating error:

C:\Users\Ayush\Documents\Django>django-admin.py startproject testproject
'django-admin.py' is not recognized as an internal or external command,
operable program or batch file.

I was confused. I had just imported django in my shell and it worked fine! Then I started on the quest for this “django-admin.py”. I searched the site-packages directory of Python (I didn’t look deep the first time) and didn’t find anything.

Then I searched the net a bit, and someone advised looking for it in the Scripts folder of Python. Since I was installing a very new version (1.3), I checked the Scripts folder and found it there. 🙂

So, if you get a similar error, just change to the Scripts folder (C:\Python27\Scripts\testproj) and execute the command there: django-admin.py startproject testproject

This is not where we wanted to create our test project directory, but something is better than nothing 🙂

An addition: Some people might not be able to start projects with the aforementioned command. Instead try this:

>python.exe django-admin.py startproject testproject

My friend on WinXP solved his problem with this command.
