This is Chapter 5 of Py4Bio.
So far, we have learned about strings, lists, and control flow. Let’s combine them and build upon them!
In this chapter, we’ll introduce some related topics.
We’ve learned how to find the first location of a string in another string with find. What about finding all matches?
Strings have a useful find method. If you check its documentation:
S.find(sub [,start [,end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
Experiment with find:
>>> seq = "aaaaTaaaTaaT"
>>> seq.find("T")
4
>>> seq.find("T", 4)
4
>>> seq.find("T", 5)
8
>>> seq.find("T", 9)
11
>>> seq.find("T", 12)
-1
The only loop we’ve done so far is “for”.
But we aren’t looking at every element in the list.
We need some way to jump forward and stop when done.
The solution is the while statement.

There’s duplication…
Duplication is bad. (Unless you’re a gene?)
The more copies there are, the more likely it is that some will differ from the others.
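The example this remark refers to is not shown here. As an assumed illustration only (not the original example), this is one way the duplication can creep in: without break, the call to find has to be written twice, once before the loop and once inside it.

```python
seq = "aaaaTaaaTaaT"
found = []

# The search is written twice: once before the loop, once inside it.
pos = seq.find("T")
while pos != -1:
    found.append(pos)
    pos = seq.find("T", pos + 1)

print(found)  # [4, 8, 11]
```

If the two calls ever drift apart (say, one is changed to look for "A" and the other is not), the loop quietly gives wrong answers.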

Let’s start with pos set to -1, meaning nothing has been found yet, so the first search begins at pos+1, which is index 0.
The break statement says “exit this loop immediately” instead of waiting for the normal exit.
>>> pos = -1
>>> while True:
...     pos = seq.find("T", pos+1)
...     if pos == -1:
...         break
...     print("T at index", pos)
...
T at index 4
T at index 8
T at index 11
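If you need the positions later instead of just printing them, the same loop can be wrapped in a function. This is a minimal sketch; the name find_all is made up for this example and is not a built-in string method.

```python
def find_all(seq, sub):
    """Return a list of every index in seq where sub starts (overlaps included)."""
    positions = []
    pos = -1
    while True:
        pos = seq.find(sub, pos + 1)
        if pos == -1:
            break
        positions.append(pos)
    return positions


seq = "aaaaTaaaTaaT"
t_positions = find_all(seq, "T")
print(t_positions)  # [4, 8, 11]
```

Because the next search starts at pos + 1 rather than after the end of the match, overlapping matches are reported too: find_all("AAAA", "AA") gives [0, 1, 2].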
A break also works in the for loop.
Find the first 10 sequences in a file which have a poly-A tail:
sequences = []
for line in open(filename):
    seq = line.rstrip()
    if seq.endswith("AAAAAAAA"):
        sequences.append(seq)
        if len(sequences) >= 10:
            break
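To try this without a real file, here is a small runnable sketch. It assumes an in-memory stand-in (io.StringIO) in place of open(filename), made-up sequences, and a limit of 2 instead of 10 so the early exit is easy to see.

```python
import io

# Stand-in for open(filename): a small in-memory "file" with made-up sequences.
fake_file = io.StringIO(
    "ATGCAAAAAAAA\n"
    "ATGC\n"
    "GGGAAAAAAAAA\n"
    "CCCAAAAAAAAA\n"
    "TTTAAAAAAAAA\n"
)

limit = 2  # the text uses 10; a small limit keeps the demo short
sequences = []
for line in fake_file:
    seq = line.rstrip()
    if seq.endswith("AAAAAAAA"):
        sequences.append(seq)
        if len(sequences) >= limit:
            break  # stop reading as soon as we have enough

print(sequences)  # ['ATGCAAAAAAAA', 'GGGAAAAAAAAA']
```

The last two lines also have a poly-A tail, but the loop never reaches them.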
Shifting gears.
elif
Sometimes the if statement is more complex than if/else:
“If the weather is hot then go to the beach. If it is rainy, go to the movies. If it is cold, read a book. Otherwise play FIFA.”
if is_hot(weather):
    go_to_beach()
elif is_rainy(weather):
    go_to_movies()
elif is_cold(weather):
    read_book()
else:
    play_fifa()
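The weather functions above are not defined anywhere, so that snippet won’t run as it stands. Here is a runnable sketch of the same if/elif/else shape on a biological example; classify_base is a hypothetical helper invented for this illustration.

```python
def classify_base(base):
    """Return a label for a single DNA letter (hypothetical helper)."""
    base = base.upper()
    if base in ("A", "G"):
        return "purine"
    elif base in ("C", "T"):
        return "pyrimidine"
    elif base == "N":
        return "unknown"
    else:
        return "not a DNA base"


for letter in "aTNx":
    print(letter, "->", classify_base(letter))
```

Python tests the conditions from top to bottom and runs only the first branch that matches; the else branch runs when none of them do.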
tuples
Python has another fundamental data type – a tuple.
A tuple is like a list except it’s immutable (can’t be changed).
>>> data = ("Cape Town", 2004, [])
>>> print(data)
('Cape Town', 2004, [])
>>> data[0]
'Cape Town'
>>> data[0] = "Johannesburg"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> data[1:]
(2004, [])
Why tuples?
We already have a list type.
What does a tuple add?
This is one of those deep computer science answers.
Tuples can be used as dictionary keys: because they are immutable, their hash value doesn’t change. (This only works if every element is itself hashable, so the data tuple above, which contains a list, could not be a key.)
Tuples are used as anonymous classes and may contain heterogeneous elements. Lists should be homogeneous (e.g., all strings or all numbers or all sequences or…).
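Here is a small sketch of the dictionary-key point. The gene names and counts are made up for illustration.

```python
# Codon counts keyed by (gene name, codon) pairs; names and numbers are made up.
codon_counts = {}
codon_counts[("geneA", "ATG")] = 1
codon_counts[("geneA", "TTT")] = 3
codon_counts[("geneB", "ATG")] = 2

print(codon_counts[("geneA", "TTT")])  # 3

# A list cannot be a key because it is mutable and therefore unhashable.
try:
    codon_counts[["geneA", "ATG"]] = 1
except TypeError as err:
    list_key_error = str(err)
    print("TypeError:", list_key_error)
```

Each key is a (string, string) pair, so one dictionary can hold counts for several genes without nesting dictionaries inside each other.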
String Formatting
So far all the output examples used the print function. print puts spaces between fields and sticks a newline at the end. Often you’ll need to be more precise.
Python gives the “%” operator a different meaning when there is a string on the left-hand side: “string interpolation”.
The left side of a string interpolation is always a string.
The right side of the string interpolation may be a dictionary, a tuple, or anything else. Let’s start with the last.
The string interpolation looks for a “%” followed by a single character (except that “%%” means to use a single “%”). That letter immediately following says how to interpret the object: %s for string, %d for integer, %f for float, and a few others.
Most of the time you’ll just use %s.
>>> "This is a string: %s" % "Yes, it is"
'This is a string: Yes, it is'
>>> "This is an integer: %d" % 10
'This is an integer: 10'
>>> "This is an integer: %4d" % 10
'This is an integer:   10'
>>> "This is an integer: %04d" % 10
'This is an integer: 0010'
>>> "This is a float: %f" % 9.8
'This is a float: 9.800000'
>>> "This is a float: %.2f" % 9.8
'This is a float: 9.80'
string % tuple
To convert multiple values, use a tuple on the right.
(Tuple because it can be heterogeneous)
Objects are extracted left to right. First % gets the first element in the tuple, second % gets the second, etc.
>>> "Name: %s, age: %d, language: %s" % ("Andrew", 33, "Python")
'Name: Andrew, age: 33, language: Python'
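The dictionary case mentioned earlier works by name instead of by position: each “%” is followed by a key in parentheses. A short sketch, reusing the values from the tuple example (the variable names are made up):

```python
record = {"name": "Andrew", "age": 33, "language": "Python"}
line = "Name: %(name)s, age: %(age)d, language: %(language)s" % record
print(line)  # Name: Andrew, age: 33, language: Python

# A single value that is itself a tuple must be wrapped in a one-element tuple,
# otherwise % would try to spread its items over several format codes.
pair = (1, 2)
wrapped = "Pair: %s" % (pair,)
print(wrapped)  # Pair: (1, 2)
```

With a dictionary the order of the fields in the string no longer has to match the order of the data.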
Personally, I don’t like % formatting.
I love f-strings.
F-strings, or formatted string literals, are a way to embed expressions inside string literals for formatting. They are prefixed with f and use curly braces {} to enclose the expressions to be evaluated.
>>> name = "Alice"
>>> age = 30
>>> print(f"My name is {name} and I am {age} years old.")
My name is Alice and I am 30 years old.
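The width and precision codes from the % examples carry over to f-strings: they go after a colon inside the braces. A short sketch using the same values as before (the variable names are made up):

```python
count = 10
gravity = 9.8
seq = "ATCG"

padded = f"This is an integer: {count:4d}"
zero_padded = f"This is an integer: {count:04d}"
two_places = f"This is a float: {gravity:.2f}"
with_expression = f"{seq} has {len(seq)} bases"

print(padded)           # This is an integer:   10
print(zero_padded)      # This is an integer: 0010
print(two_places)       # This is a float: 9.80
print(with_expression)  # ATCG has 4 bases
```

The braces can hold any expression, such as the call to len above, not only a variable name.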
Writing files
Opening a file for writing is very similar to opening one for reading.
>>> outfile = open("sequences_small.seq", "w")

It’s standard to open a file for reading or writing using with, which manages the file and closes it when done.
>>> seq_list = ["ATCG", "ATTC", "TTAT"]
>>> with open("output", "w") as file_handle:
...     file_handle.writelines(seq_list)
...
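One thing to watch: writelines writes the strings exactly as given and does not add line endings, so the three sequences above end up joined as ATCGATTCTTAT. If you want one sequence per line, add the "\n" yourself. This sketch assumes a temporary directory and a made-up file name so that it leaves nothing behind.

```python
import os
import tempfile

seq_list = ["ATCG", "ATTC", "TTAT"]

# Write into a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.seq")  # hypothetical file name

    with open(path, "w") as file_handle:
        for seq in seq_list:
            file_handle.write(seq + "\n")  # add the line ending ourselves

    # Read the file back to check that each sequence is on its own line.
    with open(path) as file_handle:
        lines_read = [line.rstrip() for line in file_handle]

print(lines_read)  # ['ATCG', 'ATTC', 'TTAT']
```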
Exercise 1: The hydrophobic residues are [FILAPVM].
Write a program which asks for a protein sequence and prints “Hydrophobic signal” if (and only if) it has at least 5 hydrophobic residues in a row. Otherwise print “No hydrophobic signal”.
Examples:
Protein sequence? AA
No hydrophobic signal
Protein sequence? AAAAAAAAAA
Hydrophobic signal
Protein sequence? AAFILAPILA
Hydrophobic signal
Protein sequence? ANDREWDALKE
No hydrophobic signal