In variant calling, variants are stored in Variant Call Format (VCF). Let us call the file sample.vcf for the rest of this discussion. A sample VCF may contain one or more samples and may be annotated against a reference VCF (reference data such as the dbSNP, COSMIC, HGMD, or PharmGKB VCFs). The following are a few one-liners that make the job easier. Most of the commands are taken from the bcftools manual.
- Annotate the sample VCF with a reference VCF, copying the ID and the INFO fields from the reference VCF into the matching records
$ bcftools annotate -c ID,INFO -a reference.vcf.gz sample.vcf.gz > output.vcf
Note: 1) The reference and sample files must be bgzipped (compressed with bgzip) and indexed with tabix.
2) bcftools must be installed.
3) The output is not compressed. Its ID column holds the matching reference IDs and its INFO column holds the corresponding INFO fields from the reference VCF.
- Do the same as above, but with gzipped (not bgzipped) output
- Count the variant entries in a VCF and list them by type (SNV, MNV, etc.)
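The per-type count can be sketched with standard tools as below. This is only an illustration under stated assumptions: it reads an uncompressed VCF, the file name demo.vcf and the function name count_by_type are made up for the example, and the classification rule is a simplification based on REF and ALT lengths. bcftools stats also reports the number of SNPs, MNPs and indels in its summary (SN) lines.

```shell
# Hypothetical demo input: a tiny uncompressed VCF (file name is an assumption)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\trs1\tA\tG\t.\t.\t.\n'
  printf '1\t200\t.\tAT\tGC\t.\t.\t.\n'
  printf '2\t300\t.\tA\tAT\t.\t.\t.\n'
  printf '2\t400\trs2\tCTT\tC\t.\t.\t.\n'
  printf 'X\t500\t.\tG\tT,GA\t.\t.\t.\n'
} > demo.vcf

# count_by_type FILE: count ALT alleles by type (hypothetical helper).
# Simplified rule: REF and ALT both length 1 -> SNV, equal length > 1 -> MNV,
# unequal length -> INDEL; symbolic, breakend, "*" and "." alleles -> OTHER.
# Multi-allelic records contribute one count per ALT allele.
count_by_type() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        alt = alts[i]
        first = substr(alt, 1, 1)
        if (first == "<" || first == "*" || first == "." ||
            index(alt, "[") || index(alt, "]"))
          type = "OTHER"
        else if (length($4) == 1 && length(alt) == 1)
          type = "SNV"
        else if (length($4) == length(alt))
          type = "MNV"
        else
          type = "INDEL"
        count[type]++
      }
    }
    END { for (t in count) print t "\t" count[t] }
  ' "$1" | sort
}

count_by_type demo.vcf
```

For a bgzipped file, the same function could be fed from `bcftools view sample.vcf.gz` written to a temporary uncompressed file first.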
- Extract clinically significant variants from an annotated VCF (after annotating with dbSNP/ClinVar, including the CLINSIG information)
- Same as above, but for validated variants only
- Print unannotated variants (i.e. variants whose ID is .)
- Count the variants with no ID
- Print annotated variants (i.e. variants with IDs)
$ bcftools view -k annotated.vcf.gz (or)
$ bcftools view -i 'ID!="."' annotated.vcf.gz
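For the unannotated case (ID equal to .), a minimal sketch with standard tools is shown below. The file name annotated_demo.vcf and the function names are assumptions made for the example, and the sketch reads an uncompressed VCF. With bcftools itself, the counterpart of `-k` (known sites) is `-n` (novel sites, ID column is .), and `-H` suppresses the header so that the remaining lines can be counted.

```shell
# Hypothetical demo input: three records without an ID, two with an ID
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\trs1\tA\tG\t.\t.\t.\n'
  printf '1\t200\t.\tAT\tGC\t.\t.\t.\n'
  printf '2\t300\t.\tA\tAT\t.\t.\t.\n'
  printf '2\t400\trs2\tCTT\tC\t.\t.\t.\n'
  printf 'X\t500\t.\tG\tT\t.\t.\t.\n'
} > annotated_demo.vcf

# Print header lines plus every record whose ID column (column 3) is "."
print_unannotated() {
  awk -F'\t' '/^#/ || $3 == "."' "$1"
}

# Count the records whose ID column is "." (prints 0 when there are none)
count_unannotated() {
  awk -F'\t' '!/^#/ && $3 == "." { n++ } END { print n + 0 }' "$1"
}

print_unannotated annotated_demo.vcf
count_unannotated annotated_demo.vcf

# bcftools equivalents on a bgzipped file (not run here):
#   bcftools view -n annotated.vcf.gz               # print unannotated variants
#   bcftools view -H -n annotated.vcf.gz | wc -l    # count them
```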
- Remove all IDs in ID column of the VCF
$ bcftools annotate -x ID sample.vcf.gz
- Add the prefix "chr" to the chromosome names in the #CHROM column of the VCF
$ bcftools annotate --rename-chrs <text.file> sample.vcf.gz
Note: 1) The text file should have two columns: the first column contains the old name and the second column contains the new name.
2) Each chromosome goes on its own row.
3) The same can be achieved with sed on an uncompressed VCF. Note that this only changes the data lines; the ##contig header lines are left as they are:
$ sed 's/^[1-9XYM]/chr&/' sample.vcf > output.vcf
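The two-column mapping file described in the note can be generated rather than typed by hand. The sketch below assumes the input uses the names 1 to 22, X, Y and MT, and writes to a file called chr_map.txt (both are assumptions for the example). Whether MT should become chrMT or chrM depends on the reference you are matching, so adjust that row as needed.

```shell
# Write "old_name<TAB>new_name", one chromosome per row
# (chr_map.txt is a hypothetical file name)
for c in $(seq 1 22) X Y MT; do
  printf '%s\tchr%s\n' "$c" "$c"
done > chr_map.txt

head -n 3 chr_map.txt

# Then pass it to the command shown above (not run here):
#   bcftools annotate --rename-chrs chr_map.txt sample.vcf.gz
```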
- Randomly sample VCF records. There may be times when you need a random subset of the variants in your VCF file. For this, you need to specify how many you want as a fraction (e.g. 1% = 0.01 of your sample variants)
Note: This would print 10 random variants in VCF format if your input file has 1000 variants.
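One way to sketch fraction-based sampling with standard tools is shown below. It is an illustration, not a specific tool's behaviour: the function name random_sample, the file name big_demo.vcf and the optional seed argument are assumptions for the example. Each record is kept independently with the given probability, so with this sketch the number of records returned is only approximately the requested fraction.

```shell
# Hypothetical demo input: a header plus 1000 records
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "1\t%d\t.\tA\tG\t.\t.\t.\n", i }'
} > big_demo.vcf

# random_sample FILE RATE [SEED]
#   FILE: uncompressed VCF
#   RATE: fraction of records to keep, between 0 and 1 (0.01 = 1%)
#   SEED: optional; give one to make the selection reproducible
random_sample() {
  awk -v rate="$2" -v seed="${3:-}" '
    BEGIN { if (seed == "") srand(); else srand(seed) }
    /^#/ { print; next }     # always keep header lines so the output stays a VCF
    rand() < rate            # keep each record with probability RATE
  ' "$1"
}

# Roughly 10 of the 1000 records, plus the two header lines
random_sample big_demo.vcf 0.01 > sampled_demo.vcf
```

If an exact number of records is required instead of a fraction, the records would need to be shuffled and truncated, which this sketch does not do.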