The generic format for storing reads after sequencing is FASTQ (.fastq, often gzip-compressed as .fastq.gz). Several tools can parse FASTQ files, for example seqtk and seqkit; the commands below use seqkit. FASTQ files carry a lot of useful information, such as the sequencing instrument, flow cell, lane, index sequence, filter flag, the read sequence and its per-base quality scores. It is therefore worth having a look at the file before starting any analysis. Download the example rnaseq files from here. Descriptions of these files are provided on the same page; please go through it to understand the source of the samples.
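Most of these fields live in the read header. As a sketch, assuming the Illumina CASAVA 1.8+ header layout @instrument:run:flowcell:lane:tile:x:y read:is_filtered:control:index, the hypothetical shell function parse_illumina_header below splits one header into named fields. The header passed to it is made up, and the example files may use a different layout (older pipelines, for instance, append /1 or /2 to the name instead of a second field group), in which case the parsing would need adjusting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one Illumina (CASAVA 1.8+ style) FASTQ header into named fields.
# Accepts the header with or without the leading '@'.
parse_illumina_header() {
  printf '%s\n' "$1" | awk '{
    sub(/^@/, "", $1)              # drop the leading "@" if present
    split($1, id, ":")             # instrument:run:flowcell:lane:tile:x:y
    m = split($2, c, ":")          # read:is_filtered:control:index (may be absent)
    printf "instrument=%s\nrun=%s\nflowcell=%s\nlane=%s\ntile=%s\nx=%s\ny=%s\n", id[1], id[2], id[3], id[4], id[5], id[6], id[7]
    if (m == 4) printf "read=%s\nfiltered=%s\ncontrol=%s\nindex=%s\n", c[1], c[2], c[3], c[4]
  }'
}

# Made-up header for illustration only (not taken from the tutorial files).
parse_illumina_header "@K00193:38:FLOWCELL1:4:1101:10003:44458 1:N:0:ACGTACGT"
```

On a real file, the first header could be fed in with something like parse_illumina_header "$(seqkit seq hcc1395_normal_rep1_r1.fastq.gz -n | head -1)".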
Print the first 4 lines (the first read record) of a fastq.gz file:
- $ seqkit seq hcc1395_normal_rep1_r1.fastq.gz | head -4
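Print summary statistics (number of reads, total bases, minimum/average/maximum read length):
- $ seqkit stats hcc1395_normal_rep1_r1.fastq.gz
Print only the read names (header lines):
- $ seqkit seq hcc1395_normal_rep1_r1.fastq.gz -n
Count the reads (one name is printed per read):
- $ seqkit seq hcc1395_normal_rep1_r1.fastq.gz -n | wc -l
Count the reads in every .gz file in the current directory (paste - - puts each file name and its count on one line):
- $ for i in *.gz; do echo "$i"; seqkit seq "$i" -n | wc -l; done | paste - -
Count the reads per file by matching header lines that start with the instrument ID K00193 (a bare "^@" would also match quality lines that happen to start with @):
- $ for i in *.gz; do echo "$i"; zgrep "^@K00193" "$i" | wc -l; done | paste - -
Count base frequency in the first read only:
- $ seqkit seq hcc1395_normal_rep1_r1.fastq.gz -s | head -1 | fold -w1 | sort | uniq -c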
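The read counts above rely on seqkit or on matching the @K00193 prefix. A minimal sketch using only gzip, grep and awk, run on a made-up three-read file, shows why anchoring on the instrument ID matters: quality strings may also start with @, so counting every "^@" line can over-count. It also computes a length summary similar in spirit to seqkit stats (this is not seqkit's implementation). It assumes standard 4-line records without wrapped sequence lines; the file and read names are hypothetical.

```bash
#!/usr/bin/env bash
# Build a tiny, made-up gzipped FASTQ (3 reads) to experiment on.
# The quality line of read 1 happens to start with '@' (Q31 in Phred+33).
tmpdir="$(mktemp -d)"
fq="$tmpdir/toy.fastq.gz"
printf '%s\n' \
  '@K00193:1:FC:1:1101:1:1 1:N:0:ACGT' 'ACGTACGTAC' '+' '@@@@@@@@@@' \
  '@K00193:1:FC:1:1101:2:2 1:N:0:ACGT' 'GGCCAA'     '+' 'FFFFFF' \
  '@K00193:1:FC:1:1101:3:3 1:N:0:ACGT' 'ACGTNACG'   '+' 'FF#FFFFF' \
  | gzip > "$fq"

# Read count = line count / 4 (assumes 4-line records with unwrapped sequences).
reads=$(( $(gzip -cd "$fq" | wc -l) / 4 ))
echo "reads: $reads"

# Counting every line that starts with '@' over-counts: quality lines can start with '@' too.
naive=$(gzip -cd "$fq" | grep -c '^@')
echo "lines starting with @: $naive"

# Anchoring on the instrument ID avoids this, as long as no quality string begins with that exact prefix.
by_id=$(gzip -cd "$fq" | grep -c '^@K00193')
echo "headers starting with @K00193: $by_id"

# Length summary (sequence lines are line 2 of each 4-line record).
stats=$(gzip -cd "$fq" | awk 'NR % 4 == 2 {
    n++; len = length($0); sum += len
    if (n == 1 || len < min) min = len
    if (len > max) max = len
  }
  END {
    if (n == 0) { print "num_seqs=0"; exit }
    printf "num_seqs=%d sum_len=%d min_len=%d avg_len=%.1f max_len=%d\n", n, sum, min, sum / n, max
  }')
echo "$stats"

rm -rf "$tmpdir"
```

gzip -cd is used instead of zcat because zcat on some systems (e.g. macOS) does not read .gz files by default.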
Count frequency of bases in a fastq file:
- $ seqkit seq hcc1395_normal_rep1_r1.fastq.gz -s | fold -w1 | sort | uniq -c
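The fold -w1 | sort | uniq -c pipeline is simple but sorts every single base, which gets slow on full-size files. A minimal sketch of a single-pass alternative with awk, run on a made-up three-read file, is shown below. The file and read names are hypothetical, and so is the choice to exclude N from the denominator of the GC fraction.

```bash
#!/usr/bin/env bash
# Made-up 3-read FASTQ, gzipped, for illustration only.
tmpdir="$(mktemp -d)"
fq="$tmpdir/toy.fastq.gz"
printf '%s\n' \
  '@r1' 'ACGTACGTAC' '+' 'FFFFFFFFFF' \
  '@r2' 'GGCCAA'     '+' 'FFFFFF' \
  '@r3' 'ACGTNACG'   '+' 'FF#FFFFF' \
  | gzip > "$fq"

# Tally every character on sequence lines (line 2 of each 4-line record) in one pass,
# instead of writing one base per line and sorting them all.
base_counts=$(gzip -cd "$fq" | awk '
  NR % 4 == 2 { for (i = 1; i <= length($0); i++) count[substr($0, i, 1)]++ }
  END { for (b in count) print b, count[b] }' | LC_ALL=C sort)
echo "$base_counts"

# GC fraction over unambiguous bases only (N is excluded from the denominator).
gc=$(gzip -cd "$fq" | awk '
  NR % 4 == 2 {
    s = toupper($0); gc += gsub(/[GC]/, "", s)
    s = toupper($0); acgt += gsub(/[ACGT]/, "", s)
  }
  END { if (acgt > 0) printf "%.4f\n", gc / acgt }')
echo "GC fraction: $gc"

rm -rf "$tmpdir"
```

To run this on the tutorial data, point fq at a real file such as hcc1395_normal_rep1_r1.fastq.gz and drop the toy-file and cleanup lines.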