In an earlier note, we created a random DNA sequence and a random protein sequence in R. Today, we will create multiple random DNA and protein sequences and store them in a single FASTA file, one file for nucleotides and one for proteins. The code uses the Biostrings package. Please note that this code may be preliminary/dirty/clumsy, but it works.

Logic:

1) Write a function to
     a) Take input from the user for the length of the nucleotide/amino acid polymer (DNA/peptide respectively)
     b) Create a DNA/AA polymer of the desired length by random sampling
     c) Repeat step b for the desired number of sequences
     d) Store them as a DNAStringSet (AAStringSet for peptides)
     e) Name them
     f) Print the output on the screen
2) User can store the output in an R object.
3) User can save the output to hard disk as a single FASTA file with multiple sequences.

Creating random DNA sequences

1) Create a function that takes the length of each sequence (e.g. a 10-mer) and the number of sequences of that length (e.g. 10 sequences, each a 10-mer).

code:

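library(Biostrings)  # provides DNA_BASES, DNAStringSet and writeXStringSet

# x = length of each sequence, y = number of sequences (y <= 26, because names use letters)
DNAseq=function(x,y){
  # sample inside replicate() so that each of the y sequences is drawn independently
  DNAsample=replicate(y,paste(sample(DNA_BASES,x,replace = TRUE),collapse = ""))
  DNAsamples=DNAStringSet(DNAsample)
  names(DNAsamples)=paste("DNA",letters[1:y],"fasta", sep=".")
  print(DNAsamples)
}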
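
A detail worth checking in a function like this: replicate() re-evaluates the expression it is given on every repetition, so the call to sample() has to sit inside it for the sequences to differ. Passing a string that has already been built just copies that one string. The sketch below uses base R only; the bases vector is a stand-in (an assumption) for DNA_BASES so that it runs without Biostrings, and the seed and lengths are arbitrary.

```r
# Stand-in for Biostrings' DNA_BASES so the sketch runs in base R (assumption)
bases <- c("A", "C", "G", "T")

set.seed(1)  # arbitrary seed, only to make the run repeatable

# Built once, then copied: every element is the same string
one_seq <- paste(sample(bases, 30, replace = TRUE), collapse = "")
copied <- replicate(5, one_seq)

# Sampled inside replicate(): the expression is re-run for each element
independent <- replicate(5, paste(sample(bases, 30, replace = TRUE), collapse = ""))

length(unique(copied))       # always 1
length(unique(independent))  # 5 unless two draws happen to coincide
```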

2) Create 10 random sequences of 10 bases each
code: DNAseq(10,10) (the first 10 sets the length of each sequence, i.e. a 10-mer; the second 10 sets the number of sequences)

Please note that these sequences may vary on your computer as these are randomly generated sequences.
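
If the same sequences are needed again later (for example, to share a result), the random number generator can be seeded before sampling. A minimal sketch in base R; the seed value 123 and the bases stand-in for DNA_BASES are arbitrary choices for illustration.

```r
bases <- c("A", "C", "G", "T")  # stand-in for DNA_BASES (assumption)

set.seed(123)  # any fixed integer works
first_run <- paste(sample(bases, 10, replace = TRUE), collapse = "")

set.seed(123)  # same seed again
second_run <- paste(sample(bases, 10, replace = TRUE), collapse = "")

identical(first_run, second_run)  # TRUE: same seed, same sequence
```

Calling set.seed() once before DNAseq(10,10) should make that call repeatable in the same way, since the function relies on sample().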

3) User can store the output in an object by running the following command:

code: ds.seq=DNAseq(10,10)

4) User can view the stored sequences by typing the object name

code: ds.seq

5) Write to the disk as a single file containing 10 randomly generated nucleotide sequences of 10 bases each.
code: writeXStringSet(ds.seq, "ds.seq.fa", format="fasta")
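
A quick way to confirm what was written is to read the file back as plain text and count the header lines, since every FASTA record starts with ">". The sketch below is base R only and writes a small two-record example to a temporary file so that it runs on its own; the record names and sequences are made up for illustration. For the real file, point fasta_path at "ds.seq.fa".

```r
# Made-up two-record FASTA written to a temporary file (illustration only)
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">DNA.a.fasta", "ACGTACGTAC",
             ">DNA.b.fasta", "TTGACCATGA"), fasta_path)

fasta_lines <- readLines(fasta_path)

# Header lines start with ">"; everything else is sequence
is_header <- grepl("^>", fasta_lines)
n_records <- sum(is_header)
record_names <- sub("^>", "", fasta_lines[is_header])

n_records      # 2
record_names   # "DNA.a.fasta" "DNA.b.fasta"
```

Within Biostrings, readDNAStringSet("ds.seq.fa") reads the file back as a DNAStringSet.
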
The file ds.seq.fa is saved in the current working directory on the local disk.

Code for creating 10 random nucleotide sequences with 10 bases each:

library(Biostrings)
# function to take input for the length of each DNA sequence (x) and the number of sequences to be created (y)
DNAseq=function(x,y){
  # sample inside replicate() so that each of the y sequences is drawn independently
  DNAsample=replicate(y,paste(sample(DNA_BASES,x,replace = TRUE),collapse = ""))
  DNAsamples=DNAStringSet(DNAsample)
  names(DNAsamples)=paste("DNA",letters[1:y],"fasta", sep=".")
  print(DNAsamples)
}
# rest of the code
ds.seq=DNAseq(10,10)
ds.seq
writeXStringSet(ds.seq, "ds.seq.fa", format="fasta")

Creating random AA sequences

1) Create a function that takes the length of each peptide (number of AA in each peptide) and the number of peptides to be created

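# needs library(Biostrings) for AA_STANDARD and AAStringSet
# x = length of each peptide, y = number of peptides (y <= 26, because names use letters)
AAseq=function(x,y){
  # sample inside replicate() so that each of the y peptides is drawn independently
  AAsample=replicate(y,paste(sample(AA_STANDARD,x,replace = TRUE),collapse = ""))
  AAsamples=AAStringSet(AAsample)
  names(AAsamples)=paste("AA",letters[1:y],"fasta", sep=".")
  print(AAsamples)
}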

2) Create 10 random AA sequences with 10 AA each
code: AAseq(10,10)

3) Store the output sequences as an object, as.seq
code: as.seq=AAseq(10,10)

4) View the sequences by typing as.seq. The sequences are named starting with "AA.", followed by a letter, and ending with the "fasta" extension.
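
Because the names are built from letters, which holds only the 26 lowercase letters, this naming scheme runs out after 26 sequences. If more are needed, numbering is one alternative. A small base R sketch; the numeric labels are a suggestion for illustration, not part of the original function.

```r
y <- 30  # more sequences than there are letters

# letters[1:y] is padded with NA past position 26
letter_labels <- letters[1:y]
sum(is.na(letter_labels))  # 4

# Numeric labels work for any y
number_names <- paste("AA", seq_len(y), "fasta", sep = ".")
head(number_names, 3)         # "AA.1.fasta" "AA.2.fasta" "AA.3.fasta"
length(unique(number_names))  # 30
```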

5) Write to the disk as a single file containing 10 randomly generated peptide sequences of 10 AA each.
code: writeXStringSet(as.seq, "as.seq.fa", format="fasta")

Code for creating 10 random peptide sequences with 10 AA each:

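# random aa sequences
library(Biostrings)
# function to take input for the length of each peptide (x) and the number of peptides to be created (y)
AAseq=function(x,y){
  # sample inside replicate() so that each of the y peptides is drawn independently
  AAsample=replicate(y,paste(sample(AA_STANDARD,x,replace = TRUE),collapse = ""))
  AAsamples=AAStringSet(AAsample)
  names(AAsamples)=paste("AA",letters[1:y],"fasta", sep=".")
  print(AAsamples)
}
# rest of the code
as.seq=AAseq(10,10)
as.seq
writeXStringSet(as.seq, "as.seq.fa", format="fasta")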