Interface to BioMart databases (e.g. Ensembl, COSMIC, Wormbase and Gramene)

Bioconductor version: Release (3.0)

In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (http://www.biomart.org). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. Examples of BioMart databases are Ensembl, COSMIC, Uniprot, HGNC, Gramene, Wormbase and dbSNP mapped to Ensembl. These major databases give biomaRt users direct access to a diverse set of data and enable a wide range of powerful online queries, from gene annotation to database mining.

Author: Steffen Durinck, Wolfgang Huber
Maintainer: Steffen Durinck
Citation: from within R, enter citation("biomaRt")

Installation

To install this package, start R and enter:

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")

Documentation

To view documentation for the version of this package installed in your system, start R and enter:

browseVignettes("biomaRt")

The biomaRt users guide (PDF, R Script)
Reference Manual (PDF)

Details

biocViews: Annotation, Software
Version: 2.22.0
In Bioconductor since: BioC 1.6 (R-2.1) or earlier
License: Artistic-2.0
Depends: methods
Imports: utils, XML, RCurl, AnnotationDbi
Suggests: annotate
Depends On Me: ChIPpeakAnno, customProDB, dagLogo, domainsignatures, DrugVsDisease, Fletcher2013b, genefu, GenomeGraphs, MineICA, PSICQUIC, Roleswitch, Sushi, VegaMC
Imports Me: affycoretools, ArrayExpressHTS, cobindR, customProDB, DEXSeq, DOQTL, easyRNASeq, GenomicFeatures, GOexpress, Gviz, HTSanalyzeR, IdMappingRetrieval, KEGGprofile, MEDIPS, metaseqR, methyAnalysis, oposSOM, phenoTest, R453Plus1Toolbox, RNAither, SeqGSEA
Suggests Me: BiocCaseStudies, ccTutorial, DEGreport, GeneAnswers, Genominator, h5vc, isobar, leeBamViews, massiR, MineICA, MiRaGE, oneChannelGUI, paxtoolsr, Pbase, piano, Rcade, RforProteomics, RIPSeeker, RnaSeqTutorial, rTANDEM, rTRM, ShortRead, SIM, systemPipeR, trackViewer

Package Archives

Package Source: biomaRt_2.22.0.tar.gz
Windows Binary: biomaRt_2.22.0.zip (32- & 64-bit)
Mac OS X 10.6 (Snow Leopard): biomaRt_2.22.0.tgz
Mac OS X 10.9 (Mavericks): biomaRt_2.22.0.tgz

http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html

Using biomaRt to obtain online annotation information

25 July 2013 | Programmer
Tags: Tutorial · Bioinformatics

A complete bioinformatics analysis usually includes an annotation step. In Bioconductor, the most convenient way to annotate is to use an annotation package. Besides the annotation resources that are distributed as packages, online annotation data can be retrieved with tools such as biomaRt. Online resources tend to be more up to date and richer than packaged ones. So what is biomaRt, and how should we understand and use it?

To understand biomaRt properly, it helps to look first at the system it is modelled on, BioMart (http://www.biomart.org). BioMart is free software that provides data services for biological research, in particular a ready-made solution for bulk data download. It has many successful deployments: the Ensembl database (http://www.ensembl.org/), maintained by the European Bioinformatics Institute (EBI), uses BioMart for its bulk download service, as do COSMIC, Uniprot, HGNC, Gramene, Wormbase and dbSNP.

Clicking the BioMart link in the navigation menu of the Ensembl home page opens the BioMart query page. A video tutorial is available through the Youku link at the bottom of that page. The page uses BioMart's default style, and its layout has three parts: the main menu, the navigation bar on the left, and the information and form area on the right. Working from top to bottom in the left-hand bar, select the data source (dataset), the filters, and the fields to return (attributes). Then click the Results button in the main menu to view the results. Every item selected under Attributes appears as a column in the result table. On this page we can choose an output format and click the GO button to download the data.

With that background we can turn to the biomaRt package itself. Our task is to use biomaRt to map between gene symbols, Entrez IDs and Ensembl IDs. Here is the code:

> source("http://bioconductor.org/biocLite.R")  # load the Bioconductor installer
> biocLite("biomaRt")  # install the biomaRt package with Bioconductor's biocLite
> library("biomaRt")   # load the biomaRt package
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> entrez <- c("673", "7157", "837")
> getBM(attributes = c("entrezgene", "hgnc_symbol", "ensembl_gene_id", "affy_hg_u133_plus_2"),
+       filters = "entrezgene",
+       values = entrez,
+       mart = mart)
  entrezgene hgnc_symbol ensembl_gene_id affy_hg_u133_plus_2
1        673        BRAF ENSG00000157764         206044_s_at
2        673        BRAF ENSG00000157764           236402_at
3        673        BRAF ENSG00000157764           243829_at
4       7157        TP53 ENSG00000141510
5       7157        TP53 ENSG00000141510         211300_s_at
6       7157        TP53 ENSG00000141510           201746_at
7        837       CASP4 ENSG00000196954         209310_s_at
8        837       CASP4 ENSG00000196954
9        837       CASP4 ENSG00000196954           213596_at

As this shows, using biomaRt takes only two steps: (1) specify the mart database, and (2) call getBM to retrieve the annotation. Two questions remain. First, how do we find out which servers exist and which databases each server offers? Second, how do we find the correct values for the attributes and filters arguments of getBM?

For the first question, use the biomaRt functions listMarts and listDatasets.

> marts <- listMarts(); head(marts)  # list the data sources currently available
              biomart                                version
1             ensembl           ENSEMBL GENES 72 (SANGER UK)
2                 snp       ENSEMBL VARIATION 72 (SANGER UK)
3 functional_genomics      ENSEMBL REGULATION 72 (SANGER UK)
4                vega                    VEGA 52 (SANGER UK)
5       fungi_mart_18              ENSEMBL FUNGI 18 (EBI UK)
6 fungi_variations_18    ENSEMBL FUNGI VARIATION 18 (EBI UK)
> ensembl <- useMart("ensembl")  # use the ensembl data source
> datasets <- listDatasets(ensembl); datasets[1:10,]  # list the datasets available in ensembl
                          dataset                                description     version
1          oanatinus_gene_ensembl     Ornithorhynchus anatinus genes (OANA5)       OANA5
2           tguttata_gene_ensembl    Taeniopygia guttata genes (taeGut3.2.4) taeGut3.2.4
3         cporcellus_gene_ensembl            Cavia porcellus genes (cavPor3)     cavPor3
4         gaculeatus_gene_ensembl     Gasterosteus aculeatus genes (BROADS1)     BROADS1
5          lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3)     loxAfr3
6  itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2)     spetri2
7         mlucifugus_gene_ensembl           Myotis lucifugus genes (myoLuc2)     myoLuc2
8           hsapiens_gene_ensembl            Homo sapiens genes (GRCh37.p11)  GRCh37.p11
9         choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1)     choHof1
10          csavignyi_gene_ensembl             Ciona savignyi genes (CSAV2.0)     CSAV2.0

For the second question, use the biomaRt functions listFilters and listAttributes.

> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> filters <- listFilters(mart); filters[grepl("entrez", filters[,1]),]
               name                       description
38  with_entrezgene             with EntrezGene ID(s)
122      entrezgene EntrezGene ID(s) [e.g. 100287163]
> attributes <- listAttributes(mart); attributes[grepl("^ensembl|hgnc", attributes[,1]), ]
                      name          description
1          ensembl_gene_id      Ensembl Gene ID
2    ensembl_transcript_id Ensembl Transcript ID
3       ensembl_peptide_id   Ensembl Protein ID
4          ensembl_exon_id      Ensembl Exon ID
51                 hgnc_id           HGNC ID(s)
52             hgnc_symbol          HGNC symbol
53    hgnc_transcript_name HGNC transcript name
134        ensembl_gene_id      Ensembl Gene ID
135  ensembl_transcript_id Ensembl Transcript ID
136     ensembl_peptide_id   Ensembl Protein ID
162        ensembl_exon_id      Ensembl Exon ID
165        ensembl_gene_id      Ensembl Gene ID
166  ensembl_transcript_id Ensembl Transcript ID
167     ensembl_peptide_id   Ensembl Protein ID
175        ensembl_gene_id      Ensembl Gene ID
176  ensembl_transcript_id Ensembl Transcript ID
177     ensembl_peptide_id   Ensembl Protein ID
1616       ensembl_gene_id      Ensembl Gene ID
1617 ensembl_transcript_id Ensembl Transcript ID
1618    ensembl_peptide_id   Ensembl Protein ID
1691       ensembl_gene_id      Ensembl Gene ID
1706 ensembl_transcript_id Ensembl Transcript ID
1707    ensembl_peptide_id   Ensembl Protein ID
1715       ensembl_exon_id      Ensembl Exon ID

The last question is when biomaRt is the right tool. Why would it come to mind during annotation? Because any conversion between IDs, symbols and names is a candidate for biomaRt. More importantly, the marts also carry a great deal of information on SNPs, alternative splicing, exons, introns, 5' UTRs, 3' UTRs and so on. In principle, any data that can be loaded into a database and accessed with SQL can be served through BioMart and retrieved with biomaRt, so it is worth thinking broadly about what to use it for.

http://pgfe.umassmed.edu/ou/archives/3281
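
The two-step workflow described above (choose a mart, then call getBM) can be sketched as a small reusable function. This is only an illustration under stated assumptions, not the author's code: match_rows and annotate_entrez are hypothetical helper names introduced here, and the attribute name "entrezgene_id" assumes a recent Ensembl release, where the "entrezgene" name used in the 2013 transcript appears to have been renamed. On current Bioconductor releases the package is installed with BiocManager::install("biomaRt") rather than biocLite. The live query needs network access to an Ensembl mart, so the sketch guards it and checks the attribute name with listAttributes before querying.

```r
# Sketch only: needs the biomaRt package and network access to an Ensembl mart.
# The attribute name "entrezgene_id" is an assumption about the current
# Ensembl release; confirm it with listAttributes() before relying on it.

# Keep the rows of a listFilters()/listAttributes() table whose `name`
# column matches a regular expression (hypothetical helper, not part of biomaRt).
match_rows <- function(tbl, pattern) {
  tbl[grepl(pattern, tbl$name), , drop = FALSE]
}

# Step 1: choose the mart and dataset. Step 2: query it with getBM().
# (hypothetical wrapper written for this example)
annotate_entrez <- function(entrez_ids, id_attr = "entrezgene_id") {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")

  # Fail early with a readable message if the mart does not offer the attribute.
  available <- biomaRt::listAttributes(mart)
  if (nrow(match_rows(available, paste0("^", id_attr, "$"))) == 0) {
    stop("attribute not found: ", id_attr)
  }

  # Beyond ID conversion, the same call can return genomic coordinates.
  biomaRt::getBM(
    attributes = c(id_attr, "hgnc_symbol", "ensembl_gene_id",
                   "chromosome_name", "start_position", "end_position"),
    filters = id_attr,
    values = entrez_ids,
    mart = mart
  )
}

# Run the live query only in an interactive session with biomaRt installed.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  print(annotate_entrez(c("673", "7157", "837")))
}
```

Checking the attribute table first is a deliberate choice: attribute and filter names change between Ensembl releases, so a lookup with listAttributes (or listFilters) gives a clearer error than a failed getBM call.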