Python VI: Best Practices and Testing

Today’s Goals:

Review good coding practices
Learn about assert and try/except clauses
Write a script to annotate a transcriptome

Good coding practices:

Modular coding, i.e. make each function do one thing and do it well. It’s easier to compose and test minimal, single-purpose functions.
Test a function/line of code before you write another one!
Comment a lot and add doc strings once you have a functioning function. Future you will be so happy.
Use informative names for your variables. Function names should be verbs while instances should be nouns or noun phrases. In general, avoid short abbreviations. Single letters (i, n, …) are typically reserved for counting variables.
Once you’re a seasoned python-er, keep your style up to standard by reading ‘best practices’ (eg link)

Ways to test your code:

use a reduced file
use print statements
assertions
try and except clauses
keep a log file

Once your code works:

Clean it up! Delete or comment out old print strings
Add doc strings
Test on more files.

What about unit testing?

Unless you’re developing your own software, use other types of tests. We won’t cover this today.

Examples

If you’re working with large files, use a reduced file to test your code.

# save the first 1000 lines to a new file
head -1000 largeDataFile.csv > test.csv

If you plan to automate your analyses, test each function with only one file at first.

function_name('file1.txt')

# rather than:
for file in directory_list:
	function_name(file)

As you’re writing functions, use print to check that your output is what you expect.

import os
import sys

def make_filelist(directory, file_ending):
	"""This function makes a list of files with a certain ending in a certain directory.
	"""
	
	files=os.listdir(directory)
	files_wanted=[]
	for file in files:
		if file.endswith(file_ending):
			files_wanted.append(file)
	print files
	print files_wanted
	return files_wanted

def main():
	directory = sys.argv[1]
	file_ending = sys.argv[2]
	make_filelist(directory, file_ending)

main()

If you’re working with integers or floats or are getting a TypeError from python, you can use the type() function to check that the variable is in the format you expect. If you’re having trouble indexing certain values, copy a subset into the python console and check your syntax.

Python’s assert clause allows you to test a comparison and exits the script if not true. You can have it print an error message upon exiting.

assert a == b, "Error: comparison %s == %s is false" %(value1,value2)

Example:

def sum_num(num_list):
	totalSum=0
	for item in num_list:
		assert type(item)==int, "Uh oh, item '%s' not the right type! Exiting now." %item
		totalSum+=item
	print totalSum
	
num_list=[3,52,6,'b',2,463,'a']		
sum_num(num_list)

Another example:

import os
import sys

def make_filelist(directory, file_ending):
	"""This function makes a list of files with a certain ending in a certain directory.
	"""
	
	files=os.listdir(directory)
	files_wanted=[]
	for file in files:
		if file.endswith(file_ending):
			files_wanted.append(file)
	return files_wanted

def main():
	# you can use assert statements to provide guidance for user input
	assert len(sys.argv) == 3, "To run this script you must provide a directory and file ending"
	directory = sys.argv[1]
	file_ending = sys.argv[2]
	make_filelist(directory, file_ending)
	
main()

Python’s try-except clauses allow you to trigger error messages for specific types of errors without killing the program.

def sum_num(num_list):
	totalSum=0
	for item in num_list:
		try:
			totalSum+=item
		except TypeError:
			print "Warning, item '%s' not the right type! Continuing to next item" %item
	print totalSum
		
num_list=[3,52,6,'b',2,463,'a']	
sum_num(num_list)

Print stdout to a log file when using your computer to check for errors. Everything you ‘print’ will be appended to the logfile.

python script.py > logfile

However, you will need to print directly to a logfile if you’re running the program on a cluster (in this case you will submit your file to be run on the cluster rather than running python in your shell). Either open the logfile as a global variable at the top of the file or open and close it every time you write to it.

#### global variable
import time
LOGFILE=open('jobname.log','a') 

def function_name(arguments):

	start=time.clock()
	code doing things
	and other things
	print >>LOGFILE,"%s was parsed in %fs." %(input_file,(time.clock() - start))

function_name(arguments)

#### open and close each time
import time

def function_name(arguments):

	start=time.clock()
	code doing things
	and other things
	with open('jobname.log','a') as f:
		f.write("%s was parsed in %fs." %(input_file,(time.clock() - start)))

function_name(arguments)

Now we will go through what modular coding and testing should look like.

See the files for python5. The annotate_transcriptome.py script will take a blast output file, a transcriptome, and output a subset of the transcriptome with annotations for each sequence.

[top]