There are times that one needs to get coordinates, variant effect and clinical significance of rsIDs from dbSNP. This is useful because post annotation of sample variants, user gets matching rsIDs. User wants to know the effect of variation and it's clinical significance. However, please remember that tools that match sample variants with dbSNP variants also provide clinical significance and other relevant information as well.
Let us start with rs IDs 123 and 150. We would like to know their coordinates from dbSNP, their location and clinical significance. We will be using biomaRt package in R to do this. Following is the code:
============================================
rsid=c("rs123","rs150")
library(biomaRt)
ensembl_snp=useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)
============================================================
1) first line is list of the rsIDs which we will be using for query
2) Second line loads package biomaRt in R
3) Third line stores the meta information. We are instructing the program to use ENSEMBL_MART_SNP and dataset "hsapiens_snp" in the mart. Each mart can have multiple data set. For biomaRt to work, we need to choose a mart and a dataset within that mart and we are storing that information in an object called "ensembl_snp"
4) In fourth line, we are extracting, SNP source (i.e dbSNP), chromosome, start and end of the variants, consequence i.e location of the variant, and it's clinical significance. What we want to extract are called attributes. The query variations are stored in an object called "rsid" and they are passed to values. The nature of values is passed as filters i.e what the values are. 'snp_filter' is the internal reference to rs IDs (rs 123, rs 153 etc). Following example would make it clear: values = c("John", "Jane", "Bob") and filters="name". John, Jane and Bob are names. Like wise, we are telling the program rsIDs are snp_filter (internal name)
output is:
==============================
>getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)
refsnp_source refsnp_id chr_name chrom_start chrom_end consequence_type_tv
1 dbSNP rs123 7 24926827 24926827 intron_variant
2 dbSNP rs150 7 24971033 24971033 intron_variant
clinical_significance
1 NA
2 NA
==============================
Let us start with rs IDs 123 and 150. We would like to know their coordinates from dbSNP, their location and clinical significance. We will be using biomaRt package in R to do this. Following is the code:
============================================
rsid=c("rs123","rs150")
library(biomaRt)
ensembl_snp=useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)
============================================================
1) first line is list of the rsIDs which we will be using for query
2) Second line loads package biomaRt in R
3) Third line stores the meta information. We are instructing the program to use ENSEMBL_MART_SNP and dataset "hsapiens_snp" in the mart. Each mart can have multiple data set. For biomaRt to work, we need to choose a mart and a dataset within that mart and we are storing that information in an object called "ensembl_snp"
4) In fourth line, we are extracting, SNP source (i.e dbSNP), chromosome, start and end of the variants, consequence i.e location of the variant, and it's clinical significance. What we want to extract are called attributes. The query variations are stored in an object called "rsid" and they are passed to values. The nature of values is passed as filters i.e what the values are. 'snp_filter' is the internal reference to rs IDs (rs 123, rs 153 etc). Following example would make it clear: values = c("John", "Jane", "Bob") and filters="name". John, Jane and Bob are names. Like wise, we are telling the program rsIDs are snp_filter (internal name)
output is:
==============================
>getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)
refsnp_source refsnp_id chr_name chrom_start chrom_end consequence_type_tv
1 dbSNP rs123 7 24926827 24926827 intron_variant
2 dbSNP rs150 7 24971033 24971033 intron_variant
clinical_significance
1 NA
2 NA
==============================