Most bioinformatics data analysts run into recurring issues when dealing with fasta files. One such issue is changing the headers of multiple fasta files (each containing a single sequence) when the new headers are listed in a separate text file.
An example is given below:
Folder name: test
Folder contents: test1.fasta, test2.fasta, test3.fasta and so on
Contents of each fasta file:
$ cat test1.fasta
>gene=test1
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG
$ cat test2.fasta
>gene = test2
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT
A second file holds the replacement headers, one per line:
$ cat headers.txt
transcript1
transcript2
Expected output is:
$ cat test1.fasta
>transcript1
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG
$ cat test2.fasta
>transcript2
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT
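The file names (test1.fasta, test2.fasta, etc.) must stay the same; only the header of each file should be replaced with the corresponding entry from the list in the other text file (headers.txt in this example).
The code below is run from the directory containing the fasta files. It writes the re-headered copies into a new test subdirectory and leaves the originals untouched: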
$ mkdir test
$ for i in $(seq 1 $(ls *.fasta |wc -l)); do sed -n "$i"p headers.txt|
sed 's/^/>/'> test/$(ls *.fasta| sed -n "$i"p); cat $(ls *.fasta| sed -n "$i"p)|
sed '1d' >>test/$(ls *.fasta| sed -n "$i"p); done
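After running the loop, it is worth confirming that each output file received the header intended for it. The following is a minimal sketch of such a check; the function name show_first_lines is hypothetical and not part of the original code. It simply prints each file name next to that file's first line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of the original one-liner):
# print "<file name><TAB><first line>" for every file given as an argument.
show_first_lines() {
    local f
    for f in "$@"; do
        # head -n 1 prints only the first line, i.e. the fasta header
        printf '%s\t%s\n' "$f" "$(head -n 1 "$f")"
    done
}
```

With the function defined in the current shell, the rewritten files can be inspected with:
$ show_first_lines test/*.fasta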
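This code makes certain assumptions:
- The order of the fasta files in the directory (as listed by ls, which sorts names alphabetically, so test10.fasta comes before test2.fasta) matches the order of the headers in headers.txt
- The commands are run in a bash shell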
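Because a mismatch in order or count silently assigns the wrong header to a file, a more defensive variant may be useful. The sketch below is one possible alternative, not the original method: it assumes bash, one header per line in the headers file, and single-sequence fasta files whose first line is the header. The function name rename_fasta_headers and its argument order are hypothetical. It refuses to run when the number of headers differs from the number of files, and it quotes file names so that names containing spaces are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch, not the original one-liner).
# Usage: rename_fasta_headers HEADERS_FILE OUT_DIR FASTA_FILE...
# OUT_DIR must differ from the directory holding the input files,
# otherwise an input would be truncated while it is being read.
rename_fasta_headers() {
    local headers_file=$1 out_dir=$2
    shift 2

    # Count header lines (grep -c '' also counts a last line with no newline)
    local n_headers
    n_headers=$(grep -c '' "$headers_file")
    if [ "$n_headers" -ne "$#" ]; then
        echo "error: $n_headers headers but $# fasta files" >&2
        return 1
    fi

    mkdir -p "$out_dir"

    local fasta header
    # Headers are read from file descriptor 3, one per fasta file, in order
    for fasta in "$@"; do
        # read returns non-zero on a final line without a trailing newline
        IFS= read -r header <&3 || true
        {
            printf '>%s\n' "$header"   # new header line
            sed '1d' "$fasta"          # original file minus its old header
        } > "$out_dir/$(basename "$fasta")"
    done 3< "$headers_file"
}
```

With the function defined in the current shell, a call equivalent to the loop above would be (the output directory name renamed is an arbitrary choice):
$ rename_fasta_headers headers.txt renamed *.fasta
The files are still paired with headers in the order the shell expands *.fasta, so the ordering assumption above continues to apply; only the count check and the quoting are added.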