New experimental methods allow researchers in molecular and systems biology to rapidly generate ever larger amounts of data. These data are often made publicly available on the Internet, and although they are extremely useful, we are not yet exploiting their full potential. One important reason is that we still lack good ways to connect or integrate information from different resources.
One kind of resource is the more than 1000 data sources that are freely available on the Web. As most data sources are developed and maintained independently, they are highly heterogeneous. Their information is also updated frequently. Other kinds of resources, not yet as well known or as commonly used, are ontologies and standards. Ontologies aim to define a common terminology for a domain of interest. Standards provide a way to exchange data between data sources and tools, even when the internal representations of the data in those sources and tools differ.
In this chapter we argue that ontological knowledge and standards should be used for the integration of data. We describe properties of the different types of data sources, ontological knowledge and standards that are available on the Web, and discuss how this knowledge can be used to support integrated access to multiple biological data sources. Further, we present an integration approach that combines the identified ontological knowledge and standards with traditional information integration techniques. Current integration approaches cover only parts of the suggested approach. We also discuss in more detail the components of the model on which much recent work has been done: ontology-based data source integration, ontology alignment, and integration using standards.
Although many of the discussions in this chapter are general, our examples are drawn mainly from work done within the REWERSE working group on Adding Semantics to the Bioinformatics Web.