Sometimes we need to replace only part of the headers in a FASTA file. SeqKit can replace either sequences or headers in a FASTA file. This note deals with partial editing of headers, i.e. replacing one part of each header and keeping the rest.
Example input:
==================================================
>123456789.1
AGCT
>123456789.2
AGCT
>222221122.1
AGCT
==================================================
The user wants to replace everything before the dot (.) with a new value, and keep the dot and everything after it unchanged. The key-value pairs (stored in ids.txt, which seqkit expects to be tab-delimited) are shown below:
==================================================
123456789   abcde
222221122   ghijk
==================================================
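Because seqkit reads the key-value file as tab-delimited text, the two columns need a real tab between them, not spaces. A minimal sketch for creating both example files, assuming the file names ids.txt and test.fa used in this note:

```shell
# Write the key-value file; \t produces the literal tab between key and value.
printf '123456789\tabcde\n222221122\tghijk\n' > ids.txt

# Write the example FASTA input.
printf '>123456789.1\nAGCT\n>123456789.2\nAGCT\n>222221122.1\nAGCT\n' > test.fa

# Check that every line of ids.txt has exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' ids.txt \
    && echo "ids.txt looks tab-delimited"
```

If the check prints nothing, at least one line in ids.txt is probably separated by spaces rather than a tab.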
Expected output:
==================================================
>abcde.1
AGCT
>abcde.2
AGCT
>ghijk.1
AGCT
===================================================
Code:
===================================================
$ seqkit replace --quiet -p '([0-9]+)(\.[0-9])' -r '{kv}${2}'  -k ids.txt test.fa
>abcde.1
AGCT
>abcde.2
AGCT
>ghijk.1
AGCT
=========================================================
Explanation:
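  1. seqkit replace, by default, works on headers (sequence names); the -s option switches it to sequences
  2. --quiet suppresses seqkit's extra log messages on screen, such as the note about the key-value pairs that were loaded
  3. -p is the option for the search pattern (a regular expression)
  4. ([0-9]+)(\.[0-9]) - Two capture groups (each within parentheses). The first group ([0-9]+) matches one or more digits (the first part of the header, before the dot) and the second group (\.[0-9]) matches one literal dot followed by a single digit between 0 and 9
  5. -k names the key-value file (ids.txt). By default, the first capture group is used as the key, and {kv} in the replacement (-r) is substituted with the value for that key. The --key-capt-idx option can select a different capture group as the key
  6. ${2} in the replacement re-inserts the second capture group (the dot and the digit) right after the replaced value. The replacement is written in single quotes so that the shell does not try to expand ${2} itself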
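To sanity-check the result without seqkit, the same substitution can be sketched in awk. This is not how seqkit works internally, only an illustrative equivalent. It assumes a tab-delimited ids.txt, and it differs from the seqkit command in two ways: it keeps everything after the first dot (not just one digit), and it leaves headers with unknown keys unchanged. The output file name renamed.fa is made up for this example.

```shell
# Recreate the example files so the snippet runs on its own.
printf '123456789\tabcde\n222221122\tghijk\n' > ids.txt
printf '>123456789.1\nAGCT\n>123456789.2\nAGCT\n>222221122.1\nAGCT\n' > test.fa

# First file (ids.txt): load key -> value into a lookup table.
# Second file (test.fa): on header lines, split at the first dot and
# swap the part before it for the mapped value; everything else passes through.
awk -F'\t' '
    NR == FNR { map[$1] = $2; next }
    /^>/ {
        id   = substr($0, 2)                      # header without ">"
        dot  = index(id, ".")                     # position of first dot, 0 if none
        key  = dot ? substr(id, 1, dot - 1) : id  # part before the dot
        rest = dot ? substr(id, dot) : ""         # dot and everything after it
        if (key in map) { print ">" map[key] rest; next }
    }
    { print }
' ids.txt test.fa > renamed.fa

cat renamed.fa
```

For the example input, the headers in renamed.fa should match the expected output shown earlier in this note, so the two results can be compared with diff.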