Biologist's bioinformatics notes

Bioinformatics data analysts often run into small practical problems when working with fasta files. One common task is replacing the headers of multiple fasta files (each containing a single sequence) with new headers listed in a separate text file.

Examples are given below:

Folder name: test
Folder contents: test1.fasta, test2.fasta, test3.fasta and so on
Each fasta file content:

$ cat test1.fasta
>gene=test1 
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG

$ cat test2.fasta
>gene = test2
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT

A second file lists the replacement headers, one per line:
$ cat headers.txt
transcript1
transcript2

Expected output is:

$ cat test1.fasta
>transcript1
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG

$ cat test2.fasta
>transcript2
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT

The file names must stay the same (test1.fasta, test2.fasta, etc.). Only the header of each file changes, and the new header is taken from the list in the other text file (headers.txt in this example).
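
A quick way to compare headers before and after the replacement is to print the first line of each file. The sketch below assumes one record per file, so the first line is the header; show_headers is a hypothetical helper name, not something from the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file<TAB>first line" for every file given.
# Assumes each fasta file holds a single record, so line 1 is the header.
show_headers() {
    local fasta
    for fasta in "$@"; do
        printf '%s\t%s\n' "$fasta" "$(head -n 1 "$fasta")"
    done
}

# Example call:
# show_headers *.fasta
```

Running it once on the input files and once on the rewritten files gives two short listings that can be compared by eye or with diff.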

Code is:
$ mkdir test

$ for i in $(seq 1 $(ls *.fasta |wc -l)); do sed -n "$i"p headers.txt| 
   sed 's/^/>/'> test/$(ls *.fasta| sed -n "$i"p); cat $(ls *.fasta| sed -n "$i"p)|
   sed '1d' >>test/$(ls *.fasta| sed -n "$i"p); done
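
The same replacement can also be sketched without calling ls repeatedly, by letting the shell expand *.fasta and reading headers.txt line by line. This is only an alternative sketch under the same assumptions (one sequence per file, headers in the same order as the files); replace_headers and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the original one-liner.
# Usage: replace_headers HEADERS_FILE OUT_DIR FASTA...
# OUT_DIR must differ from the directory holding the input files,
# otherwise each input would be truncated before it is read.
replace_headers() {
    local headers_file=$1 out_dir=$2
    shift 2
    mkdir -p "$out_dir" || return 1
    local fasta header
    for fasta in "$@"; do
        # Take the next header from file descriptor 3 (the headers file).
        # A last line without a trailing newline is still accepted.
        if ! IFS= read -r header <&3 && [ -z "$header" ]; then
            echo "no header left for $fasta" >&2
            return 1
        fi
        {
            printf '>%s\n' "$header"   # new header line
            tail -n +2 "$fasta"        # everything after the old header
        } > "$out_dir/$(basename "$fasta")"
    done 3< "$headers_file"
}

# Example call, run from the directory holding the fasta files:
# replace_headers headers.txt test *.fasta
```

Quoting the file names keeps the loop working for names that contain spaces, and the glob expands in the same sorted order that ls uses by default.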

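The sed one-liner writes each rewritten file into the test directory under its original name and leaves the input files untouched. It relies on a few assumptions:

The order of the fasta files as listed by ls matches the order of the headers in headers.txt (ls sorts names as text, so test10.fasta comes before test2.fasta)
Each fasta file contains a single sequence, because only the first line of each file is replaced
The user is running the bash shell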
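
Before running the replacement, it may help to confirm that there is one header per fasta file and one record per file. The check below is a minimal sketch; check_inputs is a hypothetical name, and it counts only non-empty lines in the headers file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check.
# Usage: check_inputs HEADERS_FILE FASTA...
# Returns 0 when the counts line up, 1 otherwise.
check_inputs() {
    local headers_file=$1
    shift
    local n_headers n_records fasta status=0
    # Count non-empty lines; grep exits 1 when the count is 0, hence "|| true".
    n_headers=$(grep -c . "$headers_file" || true)
    if [ "$n_headers" -ne "$#" ]; then
        echo "mismatch: $n_headers headers for $# fasta files" >&2
        status=1
    fi
    for fasta in "$@"; do
        # Each file should contain exactly one line starting with ">".
        n_records=$(grep -c '^>' "$fasta" || true)
        if [ "$n_records" -ne 1 ]; then
            echo "$fasta: expected 1 record, found $n_records" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call:
# check_inputs headers.txt *.fasta && echo "inputs look consistent"
```

The check does not verify that the two orderings actually correspond; that still has to be confirmed by inspecting the file list and headers.txt side by side.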

Sep 13, 2017 - Replace fasta headers from a file