In Ubuntu, chromosome sizes can be fetched in multiple ways:
Method 1:
In the GATK pipeline, the user has to provide chromosome sizes. The "fetchChromSizes" utility from UCSC helps with this. It can be downloaded from here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
To fetch chromosome sizes for a genome build, run the following command:
$ sh fetchChromSizes <UCSC genome build> > <output_file>
Example for hg38: $ sh fetchChromSizes hg38 > hg38.sizes
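Once the output file exists, it can be worth a quick sanity check. The following is a minimal sketch, assuming the file holds one tab-separated "name, size" pair per line; check_sizes is a hypothetical helper name, and the three sample lines stand in for a real hg38.sizes file.

```shell
#!/bin/sh
# check_sizes (hypothetical helper): confirm that every line has exactly two
# tab-separated fields, a sequence name and a numeric size, then print a summary.
check_sizes() {
    awk -F '\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ { printf "bad line %d: %s\n", NR, $0; bad = 1 }
        { total += $2 }
        END {
            if (bad) exit 1
            printf "%d sequences, %.0f bases\n", NR, total
        }
    ' "$1"
}

# Small sample in the assumed format; replace it with the real output file.
sample=$(mktemp)
printf 'chr1\t248956422\nchr2\t242193529\nchrM\t16569\n' > "$sample"
check_sizes "$sample"
rm -f "$sample"
```

To check a real download, call it as: check_sizes hg38.sizes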
The same information can be found by visiting the URL:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
For any other build, replace hg38 in db=hg38 with the appropriate genome build (for example, dm3 gives the Drosophila chromosome sizes, and the URL would be: http://genome.ucsc.edu/cgi-bin/hgTracks?db=dm3&chromInfoPage=).
Method 2:
UCSC also provides chromosome sizes through direct URLs.
For example, for hg19 the link would be http://genome.ucsc.edu/goldenpath/helpc and for hg38 the link would be http://genome.ucsc.edu/goldenpath/help/hg38.chrom.sizes
To get the file directly in Linux, run the following command:
wget -np -nd -r http://genome.ucsc.edu/goldenpath/help/hg38.chrom.sizes
-np (no parent) tells wget not to ascend to the parent directory, -nd (no directories) tells it not to create a directory hierarchy, so the file is saved to the current directory, and -r instructs it to fetch recursively. For a single file, -r is not strictly needed.
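The same download can be repeated for several builds. This is a rough sketch, assuming that other builds follow the same URL pattern as the hg38 link above; sizes_url, fetch_sizes and the script name get_sizes.sh are hypothetical names used only for illustration.

```shell
#!/bin/sh
# sizes_url (hypothetical helper): build the direct URL for one genome build.
# Assumption: every build follows the pattern of the hg38 link quoted above.
sizes_url() {
    printf 'http://genome.ucsc.edu/goldenpath/help/%s.chrom.sizes\n' "$1"
}

# fetch_sizes (hypothetical helper): download one build into the current directory.
# -nd keeps wget from creating directories; -r is left out because only a
# single file is requested.
fetch_sizes() {
    wget -nd "$(sizes_url "$1")"
}

# Download every build named on the command line; nothing happens without arguments.
for build in "$@"; do
    fetch_sizes "$build" || echo "could not download $build" >&2
done
```

Saved as get_sizes.sh, it would be run as: sh get_sizes.sh hg19 hg38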
Interestingly, the number of entries has kept increasing with each build. The following picture compares hg18.chrom.sizes, hg19.chrom.sizes and hg38.chrom.sizes.
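The comparison can also be reproduced on the command line. A minimal sketch, assuming the three files have already been downloaded to the current directory and hold one entry per line; count_entries is a hypothetical helper name.

```shell
#!/bin/sh
# count_entries (hypothetical helper): print each file name with its number of
# lines, which equals the number of entries when there is one entry per line.
count_entries() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            printf '%s\t%s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
        else
            printf '%s\tmissing\n' "$f" >&2
        fi
    done
}

count_entries hg18.chrom.sizes hg19.chrom.sizes hg38.chrom.sizes
```

Files that are not present are reported as missing on standard error, so the script can be run before all three builds are downloaded.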