The Comparative Toxicogenomics Database: update 2013

Table 1.

Increase in CTD content from 2008 to 2012

	July 2012	December 2010	December 2008
Curated data types
Articles	94 513	23 918	10 854
Chemicals	11 755	6217	4323
Genes	27 950	18 446	15 140
Diseases	5987	3703	3445
Relationships
Direct chemical–gene interactions	599 182	283 976	147 285
Direct gene–disease relationships	23 395	12 505	7456
Direct chemical–disease relationships	176 627	9264	4181
Inferred gene–disease relationships	10 132 094	1 170 317	472 423
Inferred chemical–disease relationships	913 622	284 205	117 974
Enriched chemical–GO relationships	2 221 348	1 166 669	n/a
Enriched chemical–pathway relationships	211 782	213 261	n/a
Inferred disease–pathway relationships	46 912	24 258	n/a
Gene–GO annotations^a	807 848	855 215	685 781
Gene–pathway annotations^a	63 393	55 912	45 795
Inferred disease–GO relationships	465 797	229 810	n/a
Total relationships	15 662 000	4 305 392	1 480 895

^aImported from external databases.

n/a, not available.

Link-outs and adoption of CTD content by other databases

CTD continues to expand its connectivity with external databases. We now include links on CTD Chemical pages to ChEBI (15), a dictionary of molecular entities focused on small chemical compounds; to PubChem (16), a repository of chemical compounds and their associated biological activities; and to TOXLINE (17), a bibliographic database of toxicology articles. CTD Gene pages now link to WikiGenes, an author-driven wiki of biological information (18), and NCBI Gene (19) provides links back to CTD Gene pages. In total, CTD links out to 25 external databases from our Chemical, Gene, Disease, Organism, GO, Pathway and Reference pages (Table 2). Because CTD is a federally funded public database, its content is often linked to, repackaged or integrated into other database products; we are currently aware of 37 external databases that either use CTD content at their site or link back to CTD (Table 3). This connectivity broadens data access for users of both CTD and the linked resources, and this interoperability will make it easier to integrate additional information with CTD content in the future. In compliance with the bioDBcore initiative (20), the core attributes describing CTD are provided in Supplementary Table S1. CTD data files are freely available from individual pages or from our ‘Downloads’ tab (http://ctdbase.org/downloads/) in multiple formats (CSV, TSV, XML, Excel and OBO).
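Tab-separated downloads such as these can be read with the Python standard library alone. The sketch below is a minimal reader under stated assumptions: it assumes that metadata lines begin with `#` and that the caller supplies the column names, and the column subset (`ChemicalName`, `GeneSymbol`, `InteractionActions`) and sample rows are hypothetical placeholders rather than CTD content. Check the header of the actual file before relying on any column layout.

```python
import csv
import io
from typing import Iterable, Iterator


def read_ctd_tsv(lines: Iterable[str], fieldnames: list[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per data row of a CTD-style TSV stream.

    Assumption: lines starting with '#' are comments/metadata and are skipped;
    column names are passed in explicitly instead of being guessed from the file.
    """
    data_lines = (line for line in lines if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(data_lines, fieldnames=fieldnames, delimiter="\t")
    yield from reader


# Hypothetical two-row excerpt with placeholder identifiers (not real CTD data).
SAMPLE = (
    "# hypothetical metadata line\n"
    "ChemicalA\tGENE1\tincreases^expression\n"
    "ChemicalA\tGENE2\tdecreases^expression\n"
)
COLUMNS = ["ChemicalName", "GeneSymbol", "InteractionActions"]  # hypothetical subset

rows = list(read_ctd_tsv(io.StringIO(SAMPLE), COLUMNS))
for row in rows:
    print(row["ChemicalName"], row["GeneSymbol"], row["InteractionActions"])
```

For a gzip-compressed download, the same function can be given `gzip.open(path, "rt")` as its line source.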

Table 2.

CTD’s links to external databases

CTD page	Links to	Linking URL
Chemical	CCRIS	http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CCRIS
	ChEBI	http://www.ebi.ac.uk/chebi/
	ChemIDplus	http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
	DrugBank	http://www.drugbank.ca/
	GENE-TOX	http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?GENETOX
	Household products DB	http://hpd.nlm.nih.gov/
	Hazardous substance DB	http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?HSDB
	MeSH	http://www.nlm.nih.gov/mesh/
	PubChem	http://pubchem.ncbi.nlm.nih.gov/
	TOXLINE	http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?TOXLINE
Gene	NCBI Gene	http://www.ncbi.nlm.nih.gov/gene
	UniProt	http://www.uniprot.org/
	PharmGKB	http://www.pharmgkb.org/search/
	WikiGenes	http://www.wikigenes.org/
Disease	MeSH	http://www.nlm.nih.gov/mesh/
	OMIM	http://www.omim.org/
Organism	NCBI taxonomy	http://www.ncbi.nlm.nih.gov/taxonomy
Gene Ontology	AmiGO	http://amigo.geneontology.org/
	MGI	http://www.informatics.jax.org/searches/GO_form.shtml
	QuickGO	http://www.ebi.ac.uk/QuickGO/
	RGD	http://rgd.mcw.edu/rgdweb/ontology/search.html
	WormBase	http://www.wormbase.org/search/gene/
Pathway	KEGG	http://www.genome.jp/kegg/pathway.html
	Reactome	http://www.reactome.org/ReactomeGWT/entrypoint.html
Reference	PubMed	http://www.ncbi.nlm.nih.gov/pubmed/
	DOI	http://www.doi.org/

Table 3.

Databases using CTD content or providing links to CTD

Database	Description	Database URL
AutismKB	Autism knowledgebase	http://autismkb.cbi.pku.edu.cn/index.php
BIAdb	Benzylisoquinoline alkaloids database	http://crdd.osdd.net/raghava/biadb/
BioGraph	Biomedical knowledge discovery server	http://biograph.be/about/welcome
BioXM	BioXM™ Knowledge Management Environment	http://www.biomax.com/products/bioxm.php
BPAGenomics	Bisphenol A genomics data portal	http://www.eh3.uc.edu/GenomicsPortals/tiles.jsp?portal=BPAGenomics
CancerResource	Cancer-related database	http://bioinf-data.charite.de/cancerresource/index.php?site=home
Chem2Bio2RDF	Semantic system for chemical biology	http://cheminfov.informatics.indiana.edu:8080/
ChemIDplus	Chemical dictionary and structure database	http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
ChemProt	Annotated and predicted chemical–protein interactions	http://www.cbs.dtu.dk/services/ChemProt/
ChemSpider	Chemical structures and property predictions	http://www.chemspider.com/
DDSS	Drug Discovery and Diagnostic Support System	http://www.ps.noda.tus.ac.jp/ddss/
GAD	Genetic Association Database	http://geneticassociationdb.nih.gov/
Galaxy	Web-based platform for biomedical data analysis	https://main.g2.bx.psu.edu/
GeneSetDB	Meta-database integrating human disease and pharmacology	http://www.genesetdb.auckland.ac.nz/haeremai.html
GeneWeaver	Integrates functional genomics experiments	http://geneweaver.org/
GPSy	Gene Prioritization SYstem that prioritizes genes for functional analyses	http://gpsy.genouest.org/
Harvester Portal	Aggregate portal of scientific sites	http://harvester.kit.edu/harvester/
HOMER	Human Organ-specific Molecular Electronic Repository	http://discern.uits.iu.edu:8340/Homer/index.html
MIRIAM	Pharmacogenomics data collections	http://www.ebi.ac.uk/miriam/main/tags/MIR:00600039
NCBI Gene	Gene LinkOuts	http://www.ncbi.nlm.nih.gov/gene
PharmDB	Pharmacological network database	http://pharmdb.org/
PharmGKB	PharmacoGenomics KnowledgeBase	http://www.pharmgkb.org/
PhenoHM	Human-Mouse comparative phenome-genome server	http://phenome.cchmc.org/phenoBrowser/Phenome
PPDB	Pathogenic Pathway Database for Periodontitis	http://bio-omix.tmd.ac.jp/disease/perio/
PubChem	Database of chemical molecules	http://pubchem.ncbi.nlm.nih.gov/
Reactome	Pathway database	http://www.reactome.org/ReactomeGWT/entrypoint.html
RefGene	Index of genes and antibodies	http://refgene.com/
RGD	Rat Genome Database disease and pathway portals	http://rgd.mcw.edu/rgdweb/ontology/search.html
STITCH	Search Tool for InTeractions of CHemicals	http://stitch.embl.de/
T3DB	Toxin, Toxin-Target Database	http://www.t3db.org/
ToppGene	Portal of gene information	http://toppgene.cchmc.org/
TOXLINE	Toxicology literature online	http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?TOXLINE
TOXNET	Toxicology data network	http://toxnet.nlm.nih.gov/
UCSC	UCSC genome browser	http://genome.ucsc.edu/
UniProt	Universal Protein Resource	http://www.uniprot.org/
WENDI	Web Engine for Non-obvious Drug Information	https://cheminfov.informatics.indiana.edu:8443/WENDI_PUBLIC/WENDI.jsp
WhichGenes	Gene-set building portal	http://www.whichgenes.org/

Enhanced data features

We enhanced CTD by adding four new computational and network scoring features:

DiseaseComps. Every curated disease now includes a ‘DiseaseComps’ data tab. This metric statistically identifies diseases with shared toxicogenomic profiles, allowing users to find other disorders similar to their disease of interest (21). Users can refine the analysis by the type of connection (shared chemicals or shared genes) and by the type of association (marker/mechanism or therapeutic). For example, in CTD, autistic disorder is directly connected to 84 chemicals and 242 genes. DiseaseComps identifies other diseases with similar chemical or gene interaction profiles, distinguishes etiological (marker/mechanism) from therapeutic relationships, and ranks the diseases by similarity index to produce a list of comparable disorders. Disorders identified through genes with a marker/mechanism relationship to autistic disorder include intellectual disability and schizophrenia, as well as less intuitive diseases such as prostate and lung cancer, hypotension, hypertension and obesity (Figure 1). These results raise testable hypotheses: whether similar pathways are involved in these disorders and, if so, whether current therapeutics for those diseases might also benefit autism.
Filtering data sets. The ability to filter data by interaction type (as shown above for DiseaseComps) has also been added to other analytical tools in CTD. GeneComps and ChemComps (22) can now be filtered by the type of interaction (activity, expression or binding) and by its direction (increase versus decrease), giving users more directly comparable results. For example, chemicals that increase the expression of the gene HMOX1 also increase the expression of a group of genes that includes NQO1, GCLC and NOS2, whereas chemicals that decrease HMOX1 expression decrease the expression of a very different gene set (Figure 2). Our analytical tool VennViewer (which compares associated data sets for up to three chemicals, diseases or genes) can now also be filtered by the direction and type of chemical–gene interaction.
Inferred gene–disease network scores. Inferred gene–disease relationships form the bulk of CTD content (Table 1). These inferences are powerful hypothesis-generating data sets: if gene A has a curated interaction with chemical B, and chemical B is independently and directly associated with disease C, then CTD integration generates an inference between gene A and disease C (inferred via chemical B). As with our chemical–disease network scores (4, B. L. King et al., submitted for publication), we now use statistics based on local network topology to evaluate these inferred gene–disease relationships. These scores let users sort and rank the predicted gene–disease relationships to help prioritize hypotheses for testing.
Enriched pathway relationships. KEGG and Reactome provide widely used annotations that assign gene products to molecular pathways (13,14). Such pathway annotations are typically used to retrieve and organize extensive biological knowledge about gene lists. In a novel application, we have used these gene-based pathway annotations to explore the actions of non-genetic molecules (i.e. chemicals) by associating pathways with chemicals through the chemicals' curated interacting genes. These data are provided on the ‘Pathway’ data tabs for chemicals. They are calculated in the same way as our previously described enriched GO annotations for chemicals (4) and are intended to help users generate testable hypotheses about molecular pathways perturbed by chemical exposures.
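The DiseaseComps ranking and the interaction-type filter described above can be illustrated with a small Jaccard-based sketch. Figure 1 states only that CTD's Similarity Index is derived from the Jaccard similarity coefficient, so the plain Jaccard score used here is an approximation, not CTD's exact formula. The function names and the toy diseases and genes are hypothetical placeholders; the same filter-then-compare pattern would apply to GeneComps or ChemComps filtered by interaction type and direction.

```python
from typing import Iterable, Mapping


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_profiles(associations: Iterable[tuple[str, str, str]], keep_type: str) -> dict[str, set[str]]:
    """Group (disease, gene, association_type) triples into per-disease gene sets.

    Only associations of `keep_type` (e.g. 'marker/mechanism' or 'therapeutic')
    are kept, mimicking the filter choice offered to users.
    """
    profiles: dict[str, set[str]] = {}
    for disease, gene, assoc_type in associations:
        if assoc_type == keep_type:
            profiles.setdefault(disease, set()).add(gene)
    return profiles


def rank_comparable(query: str, profiles: Mapping[str, set[str]], top_n: int = 10) -> list[tuple[str, float, int]]:
    """Rank other diseases by Jaccard similarity of their gene sets to the query's.

    Returns (disease, similarity, shared_gene_count) tuples, most similar first;
    diseases sharing no genes with the query are omitted.
    """
    q = profiles[query]
    scored = [
        (name, jaccard(q, genes), len(q & genes))
        for name, genes in profiles.items()
        if name != query and q & genes
    ]
    scored.sort(key=lambda t: (-t[1], t[0]))  # highest similarity first, ties broken by name
    return scored[:top_n]


# Hypothetical toy data: placeholder diseases and genes, not CTD content.
ASSOCIATIONS = [
    ("DiseaseA", "G1", "marker/mechanism"),
    ("DiseaseA", "G2", "marker/mechanism"),
    ("DiseaseA", "G3", "therapeutic"),  # dropped by the marker/mechanism filter
    ("DiseaseB", "G1", "marker/mechanism"),
    ("DiseaseB", "G2", "marker/mechanism"),
    ("DiseaseC", "G2", "marker/mechanism"),
    ("DiseaseC", "G4", "marker/mechanism"),
    ("DiseaseC", "G5", "marker/mechanism"),
]

profiles = build_profiles(ASSOCIATIONS, "marker/mechanism")
ranking = rank_comparable("DiseaseA", profiles)
print(ranking)
```

Here DiseaseB shares both of DiseaseA's marker/mechanism genes and ranks first, while DiseaseC shares one gene out of four in the union.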

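The gene–disease inference rule described under ‘Inferred gene–disease network scores’ (gene A interacts with chemical B, chemical B is directly associated with disease C, so gene A is inferred to relate to disease C) amounts to a join on the shared chemical. The sketch below is a minimal illustration with hypothetical placeholder identifiers. It records the supporting chemicals for each inference, but counting them is only a naive stand-in and is not CTD's topology-based network score.

```python
from collections import defaultdict
from typing import Iterable


def infer_gene_disease(
    chem_gene: Iterable[tuple[str, str]],
    chem_disease: Iterable[tuple[str, str]],
) -> dict[tuple[str, str], set[str]]:
    """Infer (gene, disease) pairs via shared chemicals.

    chem_gene: curated (chemical, gene) interactions.
    chem_disease: direct (chemical, disease) associations.
    Returns {(gene, disease): set of chemicals supporting the inference}.
    """
    diseases_by_chem: dict[str, set[str]] = defaultdict(set)
    for chem, disease in chem_disease:
        diseases_by_chem[chem].add(disease)

    inferred: dict[tuple[str, str], set[str]] = defaultdict(set)
    for chem, gene in chem_gene:
        # .get avoids inserting empty entries for chemicals with no disease links
        for disease in diseases_by_chem.get(chem, ()):
            inferred[(gene, disease)].add(chem)
    return dict(inferred)


# Hypothetical placeholder data mirroring the "gene A, chemical B, disease C" example.
CHEM_GENE = [("ChemB", "GeneA"), ("ChemB", "GeneX"), ("ChemD", "GeneA")]
CHEM_DISEASE = [("ChemB", "DiseaseC"), ("ChemD", "DiseaseC")]

inferred = infer_gene_disease(CHEM_GENE, CHEM_DISEASE)
for (gene, disease), chems in sorted(inferred.items()):
    print(gene, disease, sorted(chems))
```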
Figure 1.

DiseaseComps finds similar disorders. CTD’s Disease page for autistic disorder contains a ‘DiseaseComps’ data tab that lets users see similar disorders based on shared chemicals or genes, via either marker/mechanism or therapeutic relationships. Users can expand any of the representations of comparable diseases, as shown here for ‘via gene marker/mechanism associations’. In addition to intuitive disorders such as intellectual disability and schizophrenia (the top two comparable diseases identified), autism also shares many genes with less obvious diseases (red boxes), such as prostatic neoplasms (30 genes), lung neoplasms (17 genes), hypotension (12 genes), hypertension (11 genes) and obesity (10 genes). Clicking the hyperlinked gene count in the right-hand column opens another window listing the common interacting genes. The Similarity Index is derived from the Jaccard similarity coefficient (22).

Figure 2.

Filtering GeneComps by type of interaction. CTD users can now filter ChemComps and GeneComps by the direction and type of interaction, as shown here for the gene HMOX1. The panel on the left displays genes comparable to HMOX1 when filtering for chemicals that increase the expression of the genes (red lariat). The panel on the right, in contrast, shows a different set of genes comparable to HMOX1 when filtering for chemicals that decrease the expression of the genes (green lariat). Users can also filter for activity, binding or all (unfiltered) interaction types.

New tools

To help navigate the 15.6 million toxicogenomic relationships in CTD, we created a package of analytical and visualization tools, accessible under the ‘Analyze’ menu. We have previously described the Batch Query, VennViewer and MyGeneVenn tools (4,5). To this suite, we added the following:

MyVenn. This tool lets users generate Venn diagrams for expanded CTD data sets, including GO terms and pathway annotations, as well as for any user-defined terms (http://ctdbase.org/tools/myVenn.go), enabling quick comparative analysis. The tool converts input items to lower case, so comparisons are case-insensitive, and removes any duplicate items within a data set.
Gene Set Enricher. This tool finds enriched GO or pathway annotations associated with any gene set. Users can access the tool directly (http://ctdbase.org/tools/enricher.go) with their own list of genes (using either NCBI gene symbols or accession identifiers), choose the type of enrichment analysis and set a threshold on either corrected or raw P-values. The tool is also integrated into every chemical–disease view in CTD (i.e. the ‘Diseases’ data tab on a CTD Chemical page or the ‘Chemicals’ data tab on a CTD Disease page). For example, CTD indicates that the organophosphorus nerve agent Soman interacts with 14 genes known to play a role in seizures, forming the inference network ‘Soman–14 genes–seizures’ (Figure 3). Because the Gene Set Enricher options are embedded in the web display, users simply click the ‘GO’ button under the ‘Enrichment Analysis’ column to identify GO terms enriched for those 14 genes. The output ranks 84 enriched GO terms, including the biological processes synaptic transmission (GO:0007268) and cognition (GO:0050890). From this results page, users can refine the analysis by choosing corrected versus raw P-values, changing the P-value threshold and filtering the results by the three branches of GO (Figure 3). Similarly, clicking the ‘Pathway’ button under the ‘Enrichment Analysis’ column identifies pathways enriched for those genes, helping users learn more about the molecular mechanisms that may underlie a chemical–disease connection. For example, the most highly enriched pathway for the Soman–seizures relationship (data not shown) is neuroactive ligand–receptor interaction (KEGG:04080).
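The MyVenn normalization described above (lower-casing inputs and removing duplicates before comparing) can be sketched in a few lines. This is a minimal illustration, not CTD's code: the function names are hypothetical, and the whitespace stripping and blank-line removal are added assumptions beyond what the text states. Each Venn region is keyed by the names of the data sets that contain its items.

```python
def normalize(items: list[str]) -> set[str]:
    """Lower-case items and drop duplicates so comparisons are case-insensitive.

    Assumption (not stated for MyVenn): surrounding whitespace is stripped and
    blank entries are ignored.
    """
    return {item.strip().lower() for item in items if item.strip()}


def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Partition all items into exclusive Venn regions.

    Each key is the frozenset of data-set names containing the items in that region.
    """
    regions: dict[frozenset[str], set[str]] = {}
    for item in set().union(*sets.values()):
        key = frozenset(name for name, members in sets.items() if item in members)
        regions.setdefault(key, set()).add(item)
    return regions


# Hypothetical user-defined term lists.
list_a = normalize(["Apoptosis", "apoptosis ", "Cell cycle"])
list_b = normalize(["APOPTOSIS", "DNA repair", ""])
regions = venn_regions({"A": list_a, "B": list_b})

for names, members in regions.items():
    print(sorted(names), sorted(members))
```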

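The Gene Set Enricher reports raw and corrected P-values, but the statistic it uses is not specified here. A common choice for this kind of analysis is a one-sided hypergeometric test with a Bonferroni correction, which the sketch below assumes. The function names, the toy background of 20 genes, the annotation sets and the 0.01 default threshold are all hypothetical; the number of tests is taken to be the count of terms annotated with at least one query gene, which is also an assumption.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def enrich(query, annotations, background, alpha=0.01, corrected=True):
    """Return (term, overlap, p_raw, p_bonferroni) for terms passing `alpha`, best first.

    annotations: {term: set of genes annotated to that term}.
    Assumption: Bonferroni correction over terms sharing at least one query gene.
    """
    hits = set(query) & background
    N, n = len(background), len(hits)
    tested = []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(hits & annotated)
        if k:
            tested.append((term, k, hypergeom_sf(k, N, len(annotated), n)))
    m = len(tested)
    results = []
    for term, k, p in tested:
        p_corr = min(1.0, p * m)
        if (p_corr if corrected else p) <= alpha:
            results.append((term, k, p, p_corr))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes, two annotation terms, 4 query genes.
BACKGROUND = {f"G{i}" for i in range(20)}
ANNOTATIONS = {
    "T1": {"G0", "G1", "G2", "G3", "G4"},
    "T2": {"G3"} | {f"G{i}" for i in range(10, 19)},
}
QUERY = ["G0", "G1", "G2", "G3"]

results = enrich(QUERY, ANNOTATIONS, BACKGROUND)
print(results)
```

With these toy sets, all four query genes fall in T1 (raw P = 5/4845), so T1 passes the corrected threshold, whereas T2 overlaps by only one gene and does not.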
Figure 3.

Enrichment analysis of genes in chemical inference networks. CTD’s Chemical page for the nerve agent Soman has the ‘Diseases’ data tab highlighted, listing the diseases to which Soman can be linked (either directly or by an inferred network of genes). By clicking the ‘GO’ button under the ‘Enrichment Analysis’ column for the first listed disease (Seizures), the tool automatically sends the 14 genes listed in the ‘Inference Network’ column (red dashed box) to the Gene Set Enricher tool (red arrow). The results (red inset box) include 84 enriched GO terms associated with these 14 genes. The list can be further revised by selecting corrected versus raw P-values, changing the P-value threshold itself and filtering the results for any of the three GO branches. Similar analysis can be performed for Pathway annotations by clicking the ‘Pathway’ button under the ‘Enrichment Analysis’ column.

New visualization strategies

A growing challenge for databases is developing ways to visualize large data sets to enhance knowledge management for the user (23–25). To that end, CTD has begun visualizing its content using three different approaches.

All curated chemical–gene interactions are now color-coded on web pages to indicate the directionality of the interaction. Statements that describe an ‘increase’ in an interaction are colored red, ‘decrease’ interactions are displayed in green and for instances where the direction is not specified by the authors, the interaction is colored brown (Figure 4a). The red/green color choice parallels the directionality described in early microarray assays.
The ‘ChemComps’ data tab on a CTD Chemical page now provides the option to visualize the networks of common interacting genes for the top 10 ranked comparable chemicals using a Cytoscape Web display to enhance the visualization and interconnectivity of the molecules that form the share toxicogenomic profile (Figure 4b). The Cytoscape map is customizable by the user, allowing for different layout styles and toggling node and edge labels. Right-clicking on any node provides additional options. The map may be exported in several image formats (PNG, PDF, SVG) and data formats (XGMML, GRAPHML, SIF). This visualization works particularly well with smaller networks and requires both JavaScript and Flash on a user’s computer. For larger networks, a XGMML file is provided for users to use via a desktop version of the open-source application Cytoscape (26).
To curate disease information, CTD biocurators annotate using MEDIC (9), a merged disease vocabulary of Medical Subject Headings (MeSH) disease terms (27) and the Online Mendelian Inheritance in Man (28). MEDIC contains over 9700 primary terms and 59 000 synonyms, making it a practical disease vocabulary that is both deep and broad (9). To summarize this vocabulary, we created a ‘MEDIC-Slim’ list. MEDIC-Slim is a high-level set of terms, derived from the MeSH tree structure for Diseases [C] and Mental Disorders [F03] branches, that organizes all 9700 MEDIC diseases into 36 generic categories, allowing similar types of diseases to be grouped and analyzed for meta-analysis, better visualization and improved knowledge management. The mapping of diseases to their MEDIC-Slim levels was accomplished by collapsing terms upward in the hierarchy until resolving at a top-level MEDIC-Slim term. Because MEDIC is a broad hierarchy, individual diseases often map to more than one MEDIC-Slim level; for example, ‘Diabetes Mellitus, Type 1’ resolves to three generic categories: metabolic disease, endocrine system disease and immune disease, providing a quick classification of the disorder. These mappings to MEDIC-Slim are now displayed on CTD and are available in our downloadable MEDIC files (http://ctdbase.org/downloads/#alldiseases). MEDIC-Slim reduces the complexity of interpreting inferred disease relationships. Currently, CTD contains data for almost 6000 unique diseases, including >200 000 direct disease relationships and 11 million inferred relationships (Table 1). Viewing this extensive data set via the 36 MEDIC-Slim categories provides a perspective of the entire disease landscape in CTD (Figure 5). The top disease categories for both direct and inferred relationships currently include cancer, nervous system, cardiovascular and digestive system diseases. 
To help users manage these disease data, any ‘Diseases’ data tab in CTD can now be filtered by MEDIC-Slim category. For example, the chemical bisphenol A is associated with 1965 unique diseases. A user interested in how that compound may affect heart defects can apply the ‘cardiovascular disease’ filter to retrieve only the 188 diseases in that category (Figure 6).
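Conceptually, the filter keeps only the relationships whose disease carries the selected MEDIC-Slim category, and a pick-list can show how many diseases fall under each category. The Python sketch below assumes each disease row already carries its MEDIC-Slim categories, for example as taken from the downloadable MEDIC files. The rows and their category assignments are illustrative only and are not CTD's bisphenol A data. The helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DiseaseRow:
    disease: str
    slim: FrozenSet[str]  # MEDIC-Slim categories for this disease


def filter_by_slim(rows: List[DiseaseRow], category: str) -> List[DiseaseRow]:
    """Keep rows whose disease maps to `category` (one of possibly several)."""
    return [row for row in rows if category in row.slim]


def slim_counts(rows: List[DiseaseRow]) -> Counter:
    """Count diseases per category; a multi-category disease counts once in each."""
    return Counter(cat for row in rows for cat in row.slim)


# Illustrative rows, not real CTD annotations.
rows = [
    DiseaseRow("Hypertension", frozenset({"Cardiovascular disease"})),
    DiseaseRow("Heart Defects, Congenital",
               frozenset({"Cardiovascular disease", "Congenital abnormality"})),
    DiseaseRow("Liver Neoplasms", frozenset({"Cancer", "Digestive system disease"})),
]
cardio = filter_by_slim(rows, "Cardiovascular disease")
print([row.disease for row in cardio])  # ['Hypertension', 'Heart Defects, Congenital']
print(slim_counts(rows)["Cardiovascular disease"])  # 2
```

Because a disease can belong to several categories, the per-category counts can sum to more than the number of diseases.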

Figure 4.

New visualization at CTD. (a) Manually curated interactions are now color-coded on web pages so that users can rapidly distinguish statements that describe an ‘increased’ interaction (red font), a ‘decreased’ interaction (green font) or an interaction whose directionality is not specified (brown font). (b) The ‘ChemComps’ data tab on a CTD Chemical page provides the option to visualize networks of common interacting genes for the top 10 ranked comparable chemicals using a web version of Cytoscape. The chemicals that form the ChemComps are depicted as blue triangles and the connecting genes as green nodes. The map is customizable by the user (data not shown). For larger networks, XGMML files can be downloaded and opened in the desktop version of Cytoscape (inset).

Figure 5.

CTD disease landscape. CTD currently contains over 11 million disease relationships (both direct and inferred) for 5987 unique diseases. MEDIC-Slim condenses this information into 36 generic disease categories (y-axis) to show the overall landscape of disease information at CTD for both direct relationships (blue bars) and inferred relationships (yellow bars), as a percentage of the total number of relationships.

Figure 6.

MEDIC-Slim adds functionality, reduces the complexity of disease information and eases data management. CTD biocurators use the MEDIC disease vocabulary to curate disease relationships. These MEDIC diseases are now mapped to 36 generic MEDIC-Slim disease categories, which reduce complexity and let users easily retrieve and manage the information. Under its ‘Diseases’ data tab, the chemical bisphenol A is associated with 1965 diseases (red box). This data set can be filtered by any of the 36 MEDIC-Slim categories from a pick-list, such as ‘Cardiovascular disease’ (red circle), to retrieve only the 188 cardiovascular diseases associated with bisphenol A (red arrow).

Other CTD features

In addition to the features described above, we increased the utility of GO and pathway annotations at CTD. These annotations are assigned directly to genes by external sources. By integrating them with CTD data, we can create novel connections to the diseases in which the same genes are involved. We expanded CTD’s GO and Pathway pages to include a ‘Diseases’ data tab that lists these associations. For example, as of July 2012, the ‘TGF-beta signaling pathway’ (CTD Pathway page: http://ctdbase.org/detail.go?type=pathway&acc=KEGG%3a04350) is directly associated with 375 genes via KEGG, which CTD in turn links to 316 diseases, including lung neoplasms, craniofacial abnormalities and sepsis. Similar integrated relationships are available on CTD’s GO pages, allowing users to explore diseases from both GO and pathway perspectives.
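The pathway-to-disease connection is an inference through shared genes: a disease is linked to a pathway whenever at least one gene annotated to the pathway also has a disease relationship in CTD. This minimal Python sketch assumes both inputs are already loaded as plain dictionaries and records which genes support each inferred link. The gene and disease identifiers are placeholders. CTD's exact inference rules, such as which gene–disease relationships qualify, are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def infer_pathway_diseases(pathway_genes: Iterable[str],
                           gene_diseases: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return {disease: genes linking it to the pathway} via shared genes."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for gene in pathway_genes:
        for disease in gene_diseases.get(gene, set()):
            links[disease].add(gene)
    return dict(links)


# Placeholder identifiers, not CTD data.
pathway = {"G1", "G2", "G3"}
gene_to_disease = {"G1": {"D1"}, "G2": {"D1", "D2"}, "G4": {"D3"}}
inferred = infer_pathway_diseases(pathway, gene_to_disease)
print(len(inferred), sorted(inferred["D1"]))  # 2 ['G1', 'G2']
```

Keeping the supporting genes, rather than only the list of diseases, makes it possible to show why each inferred disease appears on a pathway's ‘Diseases’ tab.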

On CTD Gene pages, the listed synonyms are now hyperlinked to keyword query searches to help users find related genes. For example, CTD’s Gene page for TP53 (http://ctdbase.org/detail.go?type=gene&acc=7157) lists the synonym ‘p53 tumor suppressor’. Clicking this synonym finds other genes that use that phrase, including the mouse-specific version of the gene (called TRP53) and several p53-binding proteins (e.g. TP53BP1 and TP53BP2). This simple feature can alert users to other genes that may be relevant to their gene of interest and is particularly helpful because CTD aggregates genes across species.
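Clicking a synonym runs a keyword query with that phrase. CTD's actual search semantics are not described here, so this Python sketch assumes a simple word-based match over each gene's names and synonyms. A `match_all` switch selects between requiring every word of the phrase and accepting any one of them. The gene records and function names are hypothetical placeholders.

```python
import re
from typing import Dict, List, Optional, Set


def words(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens, so 'p53' and 'P53' match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(phrase: str, gene_text: Dict[str, List[str]],
                   exclude: Optional[str] = None, match_all: bool = True) -> List[str]:
    """Return genes whose names/synonyms contain all (or any) words of `phrase`."""
    wanted = words(phrase)
    hits = []
    for gene, texts in gene_text.items():
        if gene == exclude:
            continue  # skip the gene whose synonym was clicked
        have = words(" ".join(texts))
        matched = wanted <= have if match_all else bool(wanted & have)
        if matched:
            hits.append(gene)
    return sorted(hits)


# Hypothetical gene records: symbol -> names and synonyms.
GENES = {
    "GENE_A": ["p53 tumor suppressor", "tumor protein p53"],
    "GENE_B": ["tumor suppressor p53 ortholog"],
    "GENE_C": ["p53-binding protein"],
    "GENE_D": ["unrelated kinase"],
}
print(keyword_search("p53 tumor suppressor", GENES, exclude="GENE_A"))
print(keyword_search("p53 tumor suppressor", GENES, exclude="GENE_A", match_all=False))
```

The stricter all-words mode returns only close matches. The any-word mode also surfaces looser relatives such as binding proteins.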

Finally, the Batch Query tool (http://ctdbase.org/tools/batchQuery.go) has been expanded to support literature retrieval: it now accepts PubMed identifiers (PMIDs) or digital object identifiers (DOIs) as input. This feature allows users to retrieve all curated data for batches of articles.
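Because the Batch Query tool now accepts two types of literature identifier, a user preparing a large batch may want to sort and sanity-check the identifiers first. The following Python sketch is a hypothetical local pre-check and is not part of CTD. It treats a bare run of digits as a PMID and anything matching the general DOI shape (‘10.’, a numeric registrant code, a slash and a suffix) as a DOI, optionally stripping a ‘doi:’ prefix. Both patterns are deliberately loose approximations.

```python
import re
from typing import Dict, Iterable, List

PMID_RE = re.compile(r"^\d+$")             # PubMed IDs are plain integers
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")  # loose DOI shape: 10.<registrant>/<suffix>


def classify_identifiers(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split raw input lines into PMIDs, DOIs and unrecognized entries."""
    result: Dict[str, List[str]] = {"pmid": [], "doi": [], "rejected": []}
    for line in lines:
        item = line.strip()
        if not item:
            continue  # ignore blank lines
        if item.lower().startswith("doi:"):
            item = item[4:].strip()
        if PMID_RE.match(item):
            result["pmid"].append(item)
        elif DOI_RE.match(item):
            result["doi"].append(item)
        else:
            result["rejected"].append(item)
    return result


batch = ["12345", "doi:10.1016/j.taap.2011.01.015", "not-an-id", ""]
print(classify_identifiers(batch))
# {'pmid': ['12345'], 'doi': ['10.1016/j.taap.2011.01.015'], 'rejected': ['not-an-id']}
```

Rejected entries can be reviewed by hand before the batch is submitted.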

SUMMARY AND FUTURE DIRECTIONS

CTD provides detailed information about manually curated chemical–gene interactions, chemical–disease relationships and gene–disease relationships. By integrating these core data with other data sets, CTD helps turn knowledge into discoveries, identifying novel connections between chemicals, genes, diseases, pathways and GO annotations that might not otherwise be apparent from other biological resources.

Here, we have highlighted recent major improvements to CTD, including expanded data content, greater connectivity with other databases, new analytical tools and novel visualization strategies that help users view and organize information. These features make CTD a unique scientific resource for promoting understanding of the effects of environmental chemicals on human health and for generating testable hypotheses about the mechanisms underlying the etiology of environmental diseases.

In the future, we hope to expand the depth and breadth of the manually curated core data. We plan to do so especially by curating recent toxicology journals triaged through a new journal-centric approach to help improve data currency at CTD (A. P. Davis et al., submitted for publication), and by expanding into new knowledge spaces, including exposure science (29) and phenotypes. We also plan to increase the visualization and analysis capacity of CTD. For example, heat maps are practical visual devices that help users rapidly interpret large data sets (30), and we are currently experimenting with different visualization prototypes to present MEDIC-Slim summaries of disease relationships.

FUNDING

National Institute of Environmental Health Sciences (NIEHS) grants ‘Comparative Toxicogenomics Database’ [R01-ES014065]; ‘Generation of a centralized and integrated resource for exposure data’ [R01-ES019604]. Funding for open access charge: NIEHS [R01-ES014065 and R01-ES019604].

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Dr Heather Keating for contributions to the curation of the Pfizer-selected toxicology corpus and Roy McMorran for CTD system/database administration. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

REFERENCES

1. Mortensen HM, Euling SY. Integrating mechanistic and polymorphism data to characterize human genetic susceptibility for environmental chemical risk assessment in the 21st century. Toxicol. Appl. Pharmacol., 2011, February 1 (doi:10.1016/j.taap.2011.01.015; epub ahead of print).

2. Mahadevan B, Snyder RD, Waters MD, Benz RD, Kemper RA, Tice RR, Richard AM. Genetic toxicology in the 21st century: reflections and future directions. Environ. Mol. Mutagen., 2011, vol. 52 (pg. 339-354).

3. Mattingly CJ, Rosenstein MC, Davis AP, Colby GT, Forrest JN, Boyer JL. The Comparative Toxicogenomics Database: a cross-species resource for building chemical-gene interaction networks. Toxicol. Sci., 2006, vol. 92 (pg. 587-595).

4. Davis AP, King BL, Mockus S, Murphy CG, Saraceni-Richards C, Rosenstein M, Wiegers T, Mattingly CJ. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Res., 2011, vol. 39 (pg. D1067-D1072).

5. Davis AP, Murphy CG, Saraceni-Richards CA, Rosenstein MC, Wiegers TC, Mattingly CJ. Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical-gene-disease networks. Nucleic Acids Res., 2009, vol. 37 (pg. D786-D792).

6. Davis AP, Murphy CG, Rosenstein MC, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database facilitates identification and understanding of chemical-gene-disease associations: arsenic as a case study. BMC Med. Genomics, 2008, vol. 1, pg. 48.

7. Davis AP, Wiegers TC, Rosenstein MC, Murphy CG, Mattingly CJ. The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database. Database, 2011, vol. 2011, pg. bar034.

8. Swanson DR, Smalheiser NR. An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif. Intell., 1997, vol. 91 (pg. 183-203).

9. Davis AP, Wiegers TC, Rosenstein MC, Mattingly CJ. MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database. Database, 2012, vol. 2012, pg. bar065.

10. Wiegers TC, Davis AP, Cohen KB, Hirschman L, Mattingly CJ. Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD). BMC Bioinformatics, 2009, vol. 10, pg. 326.

11. Hirschman L, Burns GA, Krallinger M, Arighi C, Cohen KB, Valencia A, Wu CH, Chatr-Aryamontri A, Dowell KG, Huala E, et al. Text mining for the biocuration workflow. Database, 2012, vol. 2012, pg. bas020.

12. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene Ontology: tool for the unification of biology. Nat. Genet., 2000, vol. 25 (pg. 25-29).

13. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 2012, vol. 40 (pg. D109-D114).

14. Croft D, O’Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res., 2011, vol. 39 (pg. D691-D697).

15. de Matos P, Alcantara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C. Chemical entities of biological interest: an update. Nucleic Acids Res., 2010, vol. 38 (pg. D249-D254).

16. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA, et al. PubChem’s bioassay database. Nucleic Acids Res., 2012, vol. 40 (pg. D400-D412).

17. Schultheisz RJ. TOXLINE: evolution of an online interactive bibliographic database. J. Am. Soc. Inf. Sci., 1981, vol. 32 (pg. 421-429).

18. Hoffmann R. A wiki for the life sciences where authorship matters. Nat. Genet., 2008, vol. 40 (pg. 1047-1051).

19. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 2012, vol. 40 (pg. D13-D25).

20. Gaudet P, Bairoch A, Field D, Sansone SA, Taylor C, Attwood TK, Bateman A, Blake JA, Bult CJ, Cherry JM, et al. Towards BioDBcore: a community-defined information specification for biological databases. Nucleic Acids Res., 2011, vol. 39 (pg. D7-D10).

21. Davis AP, Rosenstein MC, Wiegers TC, Mattingly CJ. DiseaseComps: a metric that discovers similar diseases based upon common toxicogenomics profiles at CTD. Bioinformation, 2011, vol. 7 (pg. 154-156).

22. Davis AP, Murphy CG, Saraceni-Richards CA, Rosenstein MC, Wiegers TC, Hampton TH, Mattingly CJ. GeneComps and ChemComps: a new CTD metric to identify genes and chemicals with shared toxicogenomic profiles. Bioinformation, 2009, vol. 4 (pg. 173-174).

23. Kennedy J, Roerdink J. Highlights of the 1st IEEE Symposium on biological data visualization. BMC Bioinformatics, 2012, vol. 13, Suppl. 8, pg. S1.