Let us say we have a text file and we need to annotate it with dbSNP vcf to get rsIDs. We will use an example text file as follows and down load the text file from here:

Now we have starting coordinate in 2nd column, end coordinate in 3rd column. Chromosome names in 1st column do not match the chromosome name convention followed by dbSNP. Instead of chromosome 1, dbSNP uses "1" as chromosome name. We will remove it while we run the script. In addition, bedtools expects the file to be in bed format. Bed format is zero based where as vcf is 1 based. One must be careful when matching such files together. Please note that reference file used in the note is an example file. You need to download dbSNP vcf files for appropriate genome builds.

To get only rsIDs, one needs some code writing and run it. Instead what we would do here is:

  1. Convert text to bed
  2. Use bedtools to intersect reference vcf (dbSNP vcf) 
  3. Get all the records from dbSNP and input file. Then we can filter in whatever way we want.

Above steps can be done in a single line (download the reference file from here):

================================================================

$ sed "s/^chr//g" test.txt |  awk -v OFS="\t" '{$2=$2-1;print $1,$2,$3,$4}' |  bedtools  intersect -wb -wa -a stdin -b ref.sorted.vcf.gz

================================================================

output is as follows:

==================================================================
1    207679306    207679307    pan1    1    207679307    rs4844600    A    C,G    .    .    RS=4844600;RSPOS=207679307;dbSNPBuildID=111;SSR=0;SAO=0;VP=0x050328000b05110536000100;GENEINFO=CR1:1378;WGT=1;VC=SNV;PM;PMC;S3D;SLO;NSM;REF;SYN;ASP;G5;HD;GNO;KGPhase1;KGPhase3;CAF=0.1496,.,0.8504;COMMON=1;TOPMED=0.16070177115188583,0.00227765035677879,0.83702057849133537
1    207684191    207684192    pan1    1    207684192    rs12037841    T    G    .    .    RS=12037841;RSPOS=207684192;dbSNPBuildID=120;SSR=0;SAO=0;VP=0x050100080005110136000100;GENEINFO=CR1:1378;WGT=1;VC=SNV;SLO;INT;ASP;G5;GNO;KGPhase1;KGPhase3;CAF=0.06649,0.9335;COMMON=1;TOPMED=0.12155963302752293,0.87844036697247706
1    207685785    207685786    pan1    1    207685786    rs4266886    T    C    .    .    RS=4266886;RSPOS=207685786;dbSNPBuildID=111;SSR=0;SAO=0;VP=0x050100080005110536000100;GENEINFO=CR1:1378;WGT=1;VC=SNV;SLO;INT;ASP;G5;HD;GNO;KGPhase1;KGPhase3;CAF=0.1426,0.8574;COMMON=1;TOPMED=0.16106810652395514,0.83893189347604485
1    207685964    207685965    pan1    1    207685965    rs4562624    A    C,T    .    .    RS=4562624;RSPOS=207685965;dbSNPBuildID=111;SSR=0;SAO=0;VP=0x050100080005110136000100;GENEINFO=CR1:1378;WGT=1;VC=SNV;SLO;INT;ASP;G5;GNO;KGPhase1;KGPhase3;CAF=0.06669,0.9333,.;COMMON=1;TOPMED=0.12107384046890927,0.87892615953109072,.
1    207692048    207692049    pan1    1    207692048    rs386638846    CATC    CGTT    .    .    RS=386638846;RSPOS=207692049;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000080005000002000800;GENEINFO=CR1:1378;WGT=1;VC=MNV;INT;ASP
1    207692048    207692049    pan1    1    207692049    rs6656401    A    G,T    .    .    RS=6656401;RSPOS=207692049;dbSNPBuildID=116;SSR=0;SAO=0;VP=0x05016808000511053e030100;GENEINFO=CR1:1378;WGT=1;VC=SNV;PM;PMC;SLO;INT;ASP;G5;HD;GNO;KGPhase1;KGPhase3;MTP;OM;CAF=0.06669,0.9333,.;COMMON=1;TOPMED=0.09828937308868501,0.87834480122324159,0.02336582568807339

==================================================================

Output will have input columns (in bed format) and all the matching output from dbSNP VCF. Now we can extract information we want using cut, awk or any other column based extraction tool.