Py4Bio Chapter 5: while, elif, and write

This is Chapter 5 of Py4Bio.

So far, we have learned about strings, lists, and control flow. Let’s combine them and build upon them!

In this chapter, we’ll introduce some related topics: the while loop and break, elif, tuples, string formatting, and writing files.

We’ve learned how to find the first location of a string in another string with find. What about finding all matches?

The find method for strings accepts an optional starting position, which is exactly what we need. Its documentation reads:

S.find(sub [,start [,end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.

Return -1 on failure.

Experiment with find:

>>> seq = "aaaaTaaaTaaT"
>>> seq.find("T") 
4
>>> seq.find("T", 4)
4
>>> seq.find("T", 5)
8
>>> seq.find("T", 9)
11
>>> seq.find("T", 12)
-1

The only loop we’ve done so far is “for”.

But here we aren’t looking at every element in a list.

We need some way to jump forward to the next match and stop when done.

The solution is the while statement.
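A first attempt might look like the sketch below. It is an illustration of the idea rather than the only way to write it, and the positions list is added here just to collect the results.

```python
seq = "aaaaTaaaTaaT"

positions = []
pos = seq.find("T")               # search once before the loop
while pos != -1:                  # keep going until find reports failure
    positions.append(pos)
    print("T at index", pos)
    pos = seq.find("T", pos + 1)  # the same search again, one step further on
```

The loop body runs only while the condition is true, so it stops by itself once find returns -1.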

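Written the obvious way, a while loop for this calls find in two places: once before the loop to get the first match, and once inside it to get the next one. There’s duplication…

Duplication is bad. (Unless you’re a gene?)

The more copies there are, the more likely it is that some will end up different from the others.

To get rid of it, let’s start with the assumption that nothing has been found yet (pos = -1), so that the first search begins at pos+1, which is index 0, and do every search inside the loop.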

The break statement says “exit this loop immediately” instead of waiting for the normal exit.

>>> pos = -1
>>> while True:
...   pos = seq.find("T", pos+1)
...   if pos == -1:
...     break
...   print("T at index", pos)
... 
T at index 4
T at index 8
T at index 11
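The same loop can be wrapped in a function so it works for any string and substring. This is a minimal sketch; the name find_all is made up for this example and is not a built-in string method.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub in text."""
    positions = []
    pos = -1                            # nothing found yet
    while True:
        pos = text.find(sub, pos + 1)   # resume just past the last match
        if pos == -1:
            break                       # no more matches: leave the loop
        positions.append(pos)
    return positions

print(find_all("aaaaTaaaTaaT", "T"))    # [4, 8, 11]
print(find_all("AAAA", "AA"))           # [0, 1, 2]
```

Because each search resumes at pos + 1, overlapping matches are reported too, as the second call shows.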

break also works in the for loop.

Find the first 10 sequences in a file which have a poly-A tail:

sequences = []
for line in open(filename):  # filename: a file with one sequence per line
    seq = line.rstrip()
    if seq.endswith("AAAAAAAA"):
        sequences.append(seq)
    if len(sequences) >= 10:  # stop as soon as we have 10 matches
        break
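To try the early exit without a real file, here is a self-contained sketch that reads from an in-memory stand-in. The sequences are made up, and the limit is lowered to 2 (the wanted variable is an assumption of this example) so the effect of break is easy to see.

```python
import io

# Made-up records standing in for a sequence file, one sequence per line.
records = ["GATTAC" + "A" * 9, "CCGGTT", "TTGC" + "A" * 10, "ACGT" + "A" * 11]
fake_file = io.StringIO("\n".join(records) + "\n")

wanted = 2          # small limit so the early exit is visible
tail = "A" * 8      # the same 8-A tail used in the example above
sequences = []
lines_read = 0
for line in fake_file:
    lines_read += 1
    seq = line.rstrip()
    if seq.endswith(tail):
        sequences.append(seq)
        if len(sequences) >= wanted:
            break   # the fourth record is never read

print(sequences)
print("lines read:", lines_read)
```

Only three of the four lines are read, because the loop ends as soon as the second poly-A sequence is found.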

Shifting gears.

elif

Sometimes a decision has more branches than a plain if/else can express.

“If the weather is hot then go to the beach. If it is rainy, go to the movies. If it is cold, read a book. Otherwise play Fifa.”

if is_hot(weather):
    go_to_beach()
elif is_rainy(weather):
    go_to_movies()
elif is_cold(weather):
    read_book() 
else:
    play_fifa()
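The weather functions above are placeholders, so here is a runnable sketch of the same if/elif/else shape applied to a sequence. The function name classify_gc and the 0.6 and 0.4 thresholds are arbitrary choices for illustration, and the sequence is assumed to be non-empty.

```python
def classify_gc(seq):
    """Label a non-empty DNA sequence by its GC fraction."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if gc >= 0.6:
        return "GC-rich"
    elif gc >= 0.4:           # only tested when the first condition was false
        return "balanced"
    elif gc > 0:
        return "AT-rich"
    else:                     # none of the conditions above matched
        return "no G or C"

for s in ["GGCC", "ATGC", "AAAG", "AAAA"]:
    print(s, classify_gc(s))
```

Conditions are tested from top to bottom and only the first one that is true runs, so the order of the branches matters.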

tuples

Python has another fundamental data type – a tuple.

A tuple is like a list except it’s immutable (can’t be changed).

>>> data = ("Cape Town", 2004, []) 
>>> print(data)
('Cape Town', 2004, [])
>>> data[0]
'Cape Town'
>>> data[0] = "Johannesburg"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> data[1:]
(2004, [])

Why tuples?

We already have a list type.  

What does a tuple add?

This is one of those deep computer science answers.

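Because tuples are immutable, their hash value never changes, so they can be used as dictionary keys. (This holds only if every element is itself hashable; the tuple above contains a list, so it could not be a key.)

Tuples are also used as lightweight anonymous records and may contain heterogeneous elements. By convention, lists are homogeneous (e.g., all strings, all numbers, or all sequences).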
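A small sketch of a tuple as a dictionary key. The variants dictionary and its contents are invented for illustration.

```python
# Keys are (chromosome, position) tuples; the data is made up.
variants = {
    ("chr1", 12345): "A>G",
    ("chr2", 6789): "C>T",
}
print(variants[("chr1", 12345)])

# A list with the same contents is mutable, so Python refuses it as a key.
list_key_rejected = False
try:
    variants[["chr1", 12345]] = "G>T"
except TypeError as err:
    list_key_rejected = True
    print("cannot use a list as a key:", err)
```

Note that the key here mixes a string and an integer, which is the kind of heterogeneous grouping tuples are meant for.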

String Formatting

So far all the output examples used the print function. print puts spaces between fields and sticks a newline at the end. Often you’ll need to be more precise.

Python gives the “%” operator a special meaning when there is a string on the left-hand side: “string interpolation”.

The left side of a string interpolation is always a string.

The right side of the string interpolation may be a dictionary, a tuple, or anything else. Let’s start with the last.

The string interpolation looks for a “%” followed by a single character (except that “%%” means to use a single “%”). The letter immediately following the “%” says how to interpret the object: %s for a string, %d for an integer, %f for a float, and a few others.

Most of the time you’ll just use %s.

>>> "This is a string: %s" % "Yes, it is"
'This is a string: Yes, it is'
>>> "This is an integer: %d" % 10
'This is an integer: 10'
>>> "This is an integer: %4d" % 10
'This is an integer:   10'
>>> "This is an integer: %04d" % 10
'This is an integer: 0010'
>>> "This is a float: %f" % 9.8
'This is a float: 9.800000'
>>> "This is a float: %.2f" % 9.8
'This is a float: 9.80'

string % tuple

To convert multiple values, use a tuple on the right.

(A tuple, because the values can be of different types.)

Objects are extracted left to right: the first % gets the first element in the tuple, the second % gets the second, and so on.

>>> "Name: %s, age: %d, language: %s" % ("Andrew", 33, "Python")
'Name: Andrew, age: 33, language: Python'
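A dictionary on the right-hand side works too. In that case each % is followed by a key in parentheses, so values are looked up by name instead of by position. A brief sketch (the record dictionary is made up for this example):

```python
record = {"name": "Andrew", "age": 33, "language": "Python"}

# %(key)s looks up record["key"]; the trailing letter still gives the type.
line = "Name: %(name)s, age: %(age)d, language: %(language)s" % record
print(line)   # Name: Andrew, age: 33, language: Python
```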

Personally, I don’t like % formatting.

I love f-strings.

F-strings, or formatted string literals (available since Python 3.6), embed expressions directly inside a string. They are prefixed with f and use curly braces {} to enclose the expressions to be evaluated.

>>> name = "Alice"
>>> age = 30
>>> print(f"My name is {name} and I am {age} years old.")
My name is Alice and I am 30 years old.
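The width and precision codes from % formatting carry over to f-strings: they go after a colon inside the braces. This sketch repeats the earlier % examples in f-string form; the variable names are chosen just for this example.

```python
count = 10
value = 9.8
seq = "ATCG"

padded = f"This is an integer: {count:4d}"     # width 4, padded with spaces
zeroed = f"This is an integer: {count:04d}"    # width 4, padded with zeros
rounded = f"This is a float: {value:.2f}"      # two digits after the point
length = f"{seq} has {len(seq)} bases"         # any expression can go in {}

print(padded)
print(zeroed)
print(rounded)
print(length)
```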

Writing files

Opening a file for writing is very similar to opening one for reading.

>>> outfile = open("sequences_small.seq", "w")

It’s standard to open a file for reading or writing using with, which closes the file for you when the block ends.

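>>> seq_list = ["ATCG", "ATTC", "TTAT"]
>>> with open("output", "w") as file_handle:
...    # writelines() adds no newlines itself, so append one to each sequence
...    file_handle.writelines(seq + "\n" for seq in seq_list)
...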
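To check what actually ends up on disk, it helps to write a file and read it straight back. The sketch below does that in a temporary directory so nothing is left behind; the file name output.seq is arbitrary. Unlike print, write does not add a newline, so one is added by hand.

```python
import os
import tempfile

seq_list = ["ATCG", "ATTC", "TTAT"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.seq")

    # Write one sequence per line.
    with open(path, "w") as out:
        for seq in seq_list:
            out.write(seq + "\n")

    # Read the file back and strip the newlines again.
    with open(path) as infile:
        read_back = [line.rstrip() for line in infile]

print(read_back)   # ['ATCG', 'ATTC', 'TTAT']
```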

Exercise 1: The hydrophobic residues are [FILAPVM].

Write a program which asks for a protein sequence and prints “Hydrophobic signal” if (and only if) it has at least 5 hydrophobic residues in a row. Otherwise print “No hydrophobic signal”.

Examples:

Protein sequence? AA
No hydrophobic signal

Protein sequence? AAAAAAAAAA
Hydrophobic signal

Protein sequence? AAFILAPILA
Hydrophobic signal

Protein sequence? ANDREWDALKE
No hydrophobic signal