Biologist's bioinformatics notes

Earlier posts in this blog outlined random sequence generation in R and Python/Biopython. This note outlines random sequence generation for proteins. The objective is to simulate protein (amino acid) sequences after taking the following inputs from the user:
1) Length of the desired sequence
2) Desired number of sequences

The script takes the above inputs and stores the sequences in FASTA format, in the folder where the code is executed, in a file whose name carries a date and time stamp.
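
As a small illustration of the file naming and the FASTA record layout, here is a minimal sketch. The helper names `timestamped_name` and `fasta_record` are hypothetical (they are not part of the script), and the date and the short sequence are arbitrary examples.

```python
from datetime import datetime


def timestamped_name(now, prefix="aasequence_", ext=".fa"):
    """Build a file name from a prefix, a date and time stamp, and an extension."""
    return prefix + now.strftime("%Y%m%d_%H%M%S") + ext


def fasta_record(seq_id, sequence):
    """Return one FASTA record: a '>' header line followed by the sequence line."""
    return ">" + seq_id + "\n" + sequence + "\n"


# Arbitrary example date: 8 June 2016, 14:30:05
print(timestamped_name(datetime(2016, 6, 8, 14, 30, 5)))  # aasequence_20160608_143005.fa
print(fasta_record("seq 1", "MKV"), end="")
```

Each record is a header line starting with ">" followed by the sequence on the next line.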

============================================
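import random
from datetime import datetime

# Bio.Alphabet was removed in Biopython 1.78; the same protein letters
# are available from Bio.Data.IUPACData.
from Bio.Data import IUPACData

n = int(input("type the length of the sequence: "))
count = int(input("type the number of sequences: "))

# One-letter codes of the 20 standard amino acids
alphabet = IUPACData.protein_letters.upper()

# Quick checks while exploring the alphabet:
# print(alphabet)
# print(list(alphabet))
# print(random.choice(alphabet))
# print(random.sample(alphabet, 2))
# print(''.join(random.sample(alphabet, 4)))

# The file name carries a date and time stamp
filename = "aasequence_" + datetime.now().strftime("%Y%m%d_%H%M%S") + ".fa"
with open(filename, "a") as pfile:
    for j in range(count):
        # random.choice samples with replacement, so residues can repeat
        my_seq = ''.join(random.choice(alphabet) for _ in range(n))
        seq_id = "seq " + str(j + 1)
        # FASTA record: header line starting with ">", then the sequence
        pfile.write(">" + seq_id + "\n" + my_seq + "\n")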
============================================
Please note that Python is particular about indentation, so make sure the indentation is preserved when copying the code.
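
One way to check a generated file is to read it back and confirm the number of sequences, their lengths, and their letters. The sketch below is only an illustration under stated assumptions: `read_fasta` is a hypothetical minimal reader written for this note (not a Biopython function), the 20-letter string is assumed to match the alphabet used by the script, and the two short records are made-up examples written to a temporary file.

```python
import os
import tempfile

# Assumed alphabet: one-letter codes of the 20 standard amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(path):
    """Minimal FASTA reader: return a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Write two made-up records to a temporary file and read them back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fa")
    with open(path, "w") as handle:
        handle.write(">seq 1\nACDE\n>seq 2\nFGHI\n")
    records = read_fasta(path)

print(len(records))                               # 2
print(all(len(seq) == 4 for _, seq in records))   # True
print(all(set(seq) <= set(AMINO_ACIDS) for _, seq in records))  # True
```

For a real run, pass the name of the generated .fa file to `read_fasta` and compare the results with the length and number of sequences that were typed in.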

Jun 8, 2016 - Generate random amino acid sequences using python/biopython