While many path indexes for XML documents have been proposed, none of them is perfectly suited to indexing large-scale collections of interlinked XML documents. Existing approaches lack support for links, require excessive time to build or space to store the index, or cannot answer connection queries efficiently. This paper presents a
framework for connection indexing that supports large, heterogeneous document collections with links, using existing path indexes as building blocks. We introduce several example configurations of the framework that suit many important application scenarios. Experiments demonstrate the feasibility of our approach.