Python VI: Best Practices and Testing
Today’s Goals:
- Review good coding practices
- Learn about assert and try/except clauses
- Write a script to annotate a transcriptome
Good coding practices:
- Modular coding, i.e. make each function do one thing and do it well. It’s easier to compose and test minimal, single-purpose functions.
- Test a function/line of code before you write another one!
- Comment a lot and add doc strings once you have a functioning function. Future you will be so happy.
- Use informative names for your variables. Function names should be verbs while instances should be nouns or noun phrases. In general, avoid short abbreviations. Single letters (i, n, …) are typically reserved for counting variables.
- Once you’re a seasoned python-er, keep your style up to standard by reading ‘best practices’ (eg link)
Ways to test your code:
- use a reduced file
- use print statements
- assertions
- try and except clauses
- keep a log file
Once your code works:
- Clean it up! Delete or comment out old print strings
- Add doc strings
- Test on more files.
What about unit testing?
Unless you’re developing your own software, use other types of tests. We won’t cover this today.
Examples
- If you’re working with large files, use a reduced file to test your code.
If you plan to automate your analyses, test each function with only one file at first.
- As you’re writing functions, use print to check that your output is what you expect.
If you’re working with integers or floats or are getting a TypeError from python, you can use the type() function to check that the variable is in the format you expect. If you’re having trouble indexing certain values, copy a subset into the python console and check your syntax.
- Python’s assert clause allows you to test a comparison and exits the script if not true. You can have it print an error message upon exiting.
Example:
Another example:
- Python’s try-except clauses allow you to trigger error messages for specific types of errors without killing the program.
- Print stdout to a log file when using your computer to check for errors. Everything you ‘print’ will be appended to the logfile.
However, you will need to print directly to a logfile if you’re running the program on a cluster (in this case you will submit your file to be run on the cluster rather than running python in your shell). Either open the logfile as a global variable at the top of the file or open and close it every time you write to it.
- Now we will go through what modular coding and testing should look like.
See the files for python5. The annotate_transcriptome.py
script will take a blast output file, a transcriptome, and output a subset of the transcriptome with annotations for each sequence.