This is Chapter 7 of Py4Bio.
You’ve used several functions already.
>>> len("ATGGTCA")
7
>>> abs(-6)
6
>>> float("3.1415")
3.1415
What are functions?
A function is a code block with a name.
>>> def hello():
...     print("Hello, how are you?")
...
A function definition starts with the keyword ‘def’.
Then comes the name. This function is named ‘hello’.
Then the list of parameters.
The parameters are always listed in parentheses. This hello() function has no parameters, so the parameter list is empty.
Then a colon. A function definition starts a new code block,
so the definition line must end with a colon (the “:”),
just like ‘if’ and ‘for’ statements.
Then the code block.
These are the statements that run when the function is called. They can be any Python statements (print(), assignments, if, for, open(), …).
How do you call a function? Start with the name of the function.
In this case the name is “hello”. The arguments are always listed in parentheses.
This function has no parameters, so the parentheses are empty.
And the function runs:
>>> hello()
Hello, how are you?
Arguments and Parameters
Arguments and parameters are two sides of the same idea. Most of the time you don’t want a function to do exactly the same thing over and over; you want it to run the same algorithm on different data. For example, we can update the previous hello() function to print something like this:
Hello <insert name here>
That is, say “Hello” followed by the person’s name. In math, we say “the function is parameterized by the person’s name”.
>>> def hello(name):
...     print("Hello", name)
...
The function now takes one parameter. When the function is called, this parameter is accessible through the variable named name.
The function call now needs one argument. Here I’ll use the string “Andrew”.
The call assigns the string “Andrew” to the variable name, then runs the statements in the code block.
>>> hello("Andrew")
Hello Andrew
Multiple parameters
Here’s a function which takes two parameters and subtracts the second from the first.
>>> def subtract(x, y):
...     print(x - y)
...
>>> subtract(8, 5)
3
Returning values
Rarely do functions only print.
More often the function does something, and the result is used by something else.
For example, len computes the length of a string or list, then returns that value to the caller.
The subtract() function above doesn’t return anything.
By default, a function returns the special value None.
>>> x = subtract(8, 5)
3
>>> print(x)
None
The return statement
The return statement tells Python to exit the function and return a given object.
>>> def subtract(x, y):
...     return x - y
...
>>> x = subtract(8, 5)
>>> print(x)
3
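Because this version of subtract() hands its result back instead of printing it, the caller can store the value or feed it into further computation. A small sketch; the variable names are made up for illustration:

```python
def subtract(x, y):
    # Return the difference instead of printing it
    return x - y

# Returned values can be stored, combined, or passed on
difference = subtract(8, 5)
total = subtract(10, 4) + subtract(3, 1)   # 6 + 2
reversed_order = subtract(5, 8)            # argument order matters: 5 - 8

print(difference, total, reversed_order)   # 3 8 -3
```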
You can return anything (list, string, number, dictionary, even a function).
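As a sketch of that flexibility, the hypothetical functions below return a list, a dictionary, and another function. None of these names come from the chapter; they exist only for illustration.

```python
def first_and_last(seq):
    # Return a list holding the first and last characters
    return [seq[0], seq[-1]]

def length_summary(seq):
    # Return a dictionary describing the sequence
    return {"sequence": seq, "length": len(seq)}

def make_motif_finder(motif):
    # Return a new function that checks for one fixed motif
    def has_motif(seq):
        return motif in seq
    return has_motif

ends = first_and_last("ATGGTCA")        # ['A', 'A']
summary = length_summary("ATGGTCA")     # {'sequence': 'ATGGTCA', 'length': 7}
has_tata = make_motif_finder("TATA")    # has_tata is itself a function
print(ends, summary, has_tata("GCTATAAA"))
```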
Why use functions?
You want to break your computational problem into individual parts and solve them separately. Functions let you codify each of these smaller parts.
Let’s count some letters!
seq = "ATGCATGATGCATGAAAGGTCG"
counts = {}
for base in seq:
    if base not in counts:
        counts[base] = 1
    else:
        counts[base] = counts[base] + 1
for base in counts:
    print(base, "=", counts[base])
Can you identify which part of the code here should be converted into a function?
I’m going to make a function which counts bases.
What’s the best part to turn into a function?
In this example the sequence can change. That makes seq a good choice as a parameter.
seq = "ATGCATGATGCATGAAAGGTCG"
This is the part of your program which does something:
counts = {}
for base in seq:
    if base not in counts:
        counts[base] = 1
    else:
        counts[base] = counts[base] + 1
Identify the output: The output will use the data computed by your function…
for base in counts:
    print(base, "=", counts[base])
… which helps you identify the return value
Name the function
First, come up with a good name for your function.
It should be descriptive so that when you or someone else sees the name then they have an idea of what it does.
Good names: count_bases, count_letters, countbases
Bad names: do_count, count_bases_in_sequence, CoUnTbAsEs, QPXT
Start with the ‘def’ line:
def count_bases(seq):
The function is named ‘count_bases’. It takes one parameter, which will be accessed using the variable named ‘seq’. Remember, the def line ends with a colon.
Add the code block:
def count_bases(seq):
    counts = {}
    for base in seq:
        if base not in counts:
            counts[base] = 1
        else:
            counts[base] = counts[base] + 1
Return the results:
def count_bases(seq):
    counts = {}
    for base in seq:
        if base not in counts:
            counts[base] = 1
        else:
            counts[base] = counts[base] + 1
    return counts
Use the function:
input_seq = "ATGCATGATGCATGAAAGGTCG"
results = count_bases(input_seq)
for base in results:
    print(base, "=", results[base])
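Notice that the caller’s variable names don’t need to match the names used inside the function: the sequence is passed in as input_seq but used as seq, and the dictionary built as counts is received as results.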
def count_bases(seq):
    counts = {}
    for base in seq:
        if base not in counts:
            counts[base] = 1
        else:
            counts[base] = counts[base] + 1
    return counts
input_seq = "ATGCATGATGCATGAAAGGTCG"
results = count_bases(input_seq)
for base in results:
    print(base, "=", results[base])
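The caller prints results[base] rather than counts[base] because a variable created inside a function exists only while that function runs. The sketch below runs the whole program and then shows that the name counts was never created outside count_bases().

```python
def count_bases(seq):
    counts = {}                      # local variable: exists only inside the function
    for base in seq:
        if base not in counts:
            counts[base] = 1
        else:
            counts[base] = counts[base] + 1
    return counts                    # hand the dictionary back to the caller

input_seq = "ATGCATGATGCATGAAAGGTCG"
results = count_bases(input_seq)     # the caller chooses its own name for the result

for base in results:
    print(base, "=", results[base])

# Outside the function, the name 'counts' does not exist
try:
    print(counts)
except NameError:
    print("counts is only defined inside count_bases()")
```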
Interactively in the Python interpreter:
>>> def count_bases(seq):
...     counts = {}
...     for base in seq:
...         if base not in counts:
...             counts[base] = 1
...         else:
...             counts[base] = counts[base] + 1
...     return counts
...
>>> count_bases("ATATC")
{'A': 2, 'T': 2, 'C': 1}
>>> count_bases("ATATCQGAC")
{'A': 3, 'T': 2, 'C': 2, 'Q': 1, 'G': 1}
>>> count_bases("")
{}
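The if/else inside count_bases() can also be written with the dictionary method get(), which returns a default value (here 0) when the key is missing. A sketch of that alternative, which should give the same dictionaries as the loop above:

```python
def count_bases(seq):
    counts = {}
    for base in seq:
        # get() returns 0 the first time a base is seen
        counts[base] = counts.get(base, 0) + 1
    return counts

print(count_bases("ATATC"))       # {'A': 2, 'T': 2, 'C': 1}
print(count_bases("ATATCQGAC"))   # {'A': 3, 'T': 2, 'C': 2, 'Q': 1, 'G': 1}
print(count_bases(""))            # {}
```

The standard library's collections.Counter does the same kind of counting; the hand-written loop is useful here because it shows what the function actually does.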
Functions can call functions
>>> def gc_content(seq):
...     counts = count_bases(seq)
...     return (counts.get("G", 0) + counts.get("C", 0)) / len(seq)
...
>>> gc_content("CGAATT")
0.3333333333333333
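Two inputs need care in a GC-content function: indexing counts["G"] directly raises KeyError when a sequence has no G, and dividing by len(seq) raises ZeroDivisionError for an empty string. A hedged sketch that guards against both, assuming that 0.0 is an acceptable answer for an empty sequence (that choice is ours, not the chapter's):

```python
def count_bases(seq):
    counts = {}
    for base in seq:
        if base not in counts:
            counts[base] = 1
        else:
            counts[base] = counts[base] + 1
    return counts

def gc_content(seq):
    # Assumption: report 0.0 for an empty sequence instead of dividing by zero
    if len(seq) == 0:
        return 0.0
    counts = count_bases(seq)
    # get() supplies 0 for bases that never appear, avoiding KeyError
    return (counts.get("G", 0) + counts.get("C", 0)) / len(seq)

print(gc_content("CGAATT"))   # 0.3333333333333333
print(gc_content("AATT"))     # 0.0
print(gc_content(""))         # 0.0
```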
Functions can be used (almost) anywhere.
In an ‘if’ statement:
>>> def polyA_tail(seq):
...     if seq.endswith("AAAAAA"):
...         return True
...     else:
...         return False
...
>>> if polyA_tail("ATGCTGTCGATGAAAAAAA"):
...     print("Has a poly-A tail")
...
Has a poly-A tail
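Because seq.endswith() already returns True or False, the if/else in polyA_tail() can be collapsed into a single return. A sketch of that shorter form, which should behave the same way:

```python
def polyA_tail(seq):
    # endswith() already produces a boolean, so return it directly
    return seq.endswith("AAAAAA")

if polyA_tail("ATGCTGTCGATGAAAAAAA"):
    print("Has a poly-A tail")

print(polyA_tail("ATGCTGTCGATG"))   # False
```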
In a ‘for’ statement:
>>> def split_into_codons(seq):
...     codons = []
...     for i in range(0, len(seq) - len(seq) % 3, 3):
...         codons.append(seq[i:i+3])
...     return codons
...
>>> for codon in split_into_codons("ATGCATGCATGCATGCATGC"):
...     print("Codon", codon)
...
Codon ATG
Codon CAT
Codon GCA
Codon TGC
Codon ATG
Codon CAT
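The stop value len(seq) - len(seq) % 3 rounds the length down to a multiple of 3, so one or two leftover bases at the end are silently dropped. For the 20-base example, 20 % 3 is 2, so the loop stops at 18 and produces six codons. A sketch that makes the leftover explicit (the names usable and leftover are illustrative):

```python
def split_into_codons(seq):
    codons = []
    # Round the length down to the nearest multiple of 3
    usable = len(seq) - len(seq) % 3
    for i in range(0, usable, 3):
        codons.append(seq[i:i+3])
    return codons

seq = "ATGCATGCATGCATGCATGC"
codons = split_into_codons(seq)
leftover = seq[len(codons) * 3:]   # bases that did not fill a whole codon

print(len(seq), len(codons), leftover)   # 20 6 GC
```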
Exercise 1: Make a function to add two numbers. Use the following as a template for your program:
def add(a, b):
    # ... your function body goes here
Test it:
print("2+3 =", add(2, 3))
print("5+9 =", add(5, 9))
Exercise 2: Modify your program from Exercise 1 to add three numbers. Use the following as a template for your new program:
def add3 # you must finish this line
    # then fill in the body
print("2+3+4 =", add3(2, 3, 4))
print("5+9+10 =", add3(5, 9, 10))
Exercise 3: Write a program that asks the user to enter two sequences, one at a time (e.g. seq1 and seq2). Then write a function to compare their GC content, and print which sequence has the higher GC content.