Most bioinformatics data analysts run into a few recurring problems when working with fasta files. One of them is changing the headers of many fasta files at once (each file holding a single sequence) when the new headers are listed in a separate text file.

An example is given below:

Folder name: test
Folder contents: test1.fasta, test2.fasta, test3.fasta, and so on
Contents of each fasta file:

$ cat test1.fasta
>gene=test1
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG

$ cat test2.fasta
>gene=test2
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT


A separate text file lists the new headers, one per line, in the same order as the fasta files:
$ cat headers.txt
transcript1
transcript2

Expected output:
$ cat test1.fasta
>transcript1
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG

$ cat test2.fasta
>transcript2
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT

The file names (test1.fasta, test2.fasta, etc.) must stay the same; only the header of each file changes, taking the corresponding line from a separate list of headers (headers.txt in this example).
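For one file, the replacement amounts to writing the new header as line 1 and copying every line after the old header unchanged. The sketch below isolates that single step as a bash function; the name set_header and its argument order are hypothetical, chosen only for illustration.

```shell
# set_header: hypothetical helper, not part of the original post.
# Usage: set_header IN.fasta NEW_HEADER OUT.fasta
# Writes ">NEW_HEADER" as the first line of OUT.fasta, then appends every
# line of IN.fasta after its original header (line 1).
set_header() {
    local in_file=$1 new_header=$2 out_file=$3
    printf '>%s\n' "$new_header" > "$out_file"
    tail -n +2 "$in_file" >> "$out_file"
}
```

For example, set_header test1.fasta transcript1 test/test1.fasta would produce the first expected output shown above, assuming the test/ subdirectory already exists. Printing the header with printf rather than substituting it with sed avoids having to escape characters such as / or & that may appear in a new header.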

Code (run inside the folder that holds the fasta files):
$ mkdir -p test
$ i=1; for f in *.fasta; do
    # line i of headers.txt, prefixed with ">", becomes the new header
    printf '>%s\n' "$(sed -n "${i}p" headers.txt)" > "test/$f"
    # append the sequence lines (everything after the old header)
    tail -n +2 "$f" >> "test/$f"
    i=$((i + 1))
  done
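This code makes the following assumptions:
  • The fasta files, in the order the shell lists them (alphabetical), are in the same order as the headers in headers.txt
  • headers.txt holds exactly one header per line, with no blank lines
  • The user is running the bash shell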
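The ordering assumption can fail silently: in plain alphabetical order test10.fasta is listed before test2.fasta, and a missing or extra line in headers.txt shifts every header after it. The sketch below is one possible guard against both problems, not the original author's code. It assumes bash 4 or later (for mapfile) and GNU sort (for -V, natural "version" order, which places test2.fasta before test10.fasta); the function name rename_fasta_headers is hypothetical.

```shell
# rename_fasta_headers: hypothetical name; a sketch, not the original code.
# Assumes bash 4+ (mapfile) and GNU sort (-V for natural number order).
# Usage (run inside the folder holding the fasta files):
#   rename_fasta_headers HEADERS_FILE OUT_DIR
rename_fasta_headers() {
    local headers_file=$1 out_dir=$2
    local -a files headers
    local i

    # Fasta files in natural order: test1, test2, ..., test10
    # (assumes file names contain no newlines).
    mapfile -t files < <(printf '%s\n' *.fasta | sort -V)

    # One header per line; strip trailing whitespace (including a Windows \r)
    # and drop blank lines so they cannot shift the pairing.
    mapfile -t headers < <(sed -e 's/[[:space:]]*$//' -e '/^$/d' "$headers_file")

    # Refuse to run if files and headers cannot be paired one to one.
    if (( ${#files[@]} != ${#headers[@]} )); then
        echo "error: ${#files[@]} fasta files but ${#headers[@]} headers" >&2
        return 1
    fi

    mkdir -p "$out_dir"
    for i in "${!files[@]}"; do
        # New header first, then every line after the old header.
        printf '>%s\n' "${headers[i]}" > "$out_dir/${files[i]}"
        tail -n +2 "${files[i]}" >> "$out_dir/${files[i]}"
    done
}
```

Run from the folder holding the fasta files, rename_fasta_headers headers.txt test writes the renamed copies into test/, as the loop above does, but stops with an error instead of writing mismatched headers when the counts differ.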