Abstract
The management of electronic document collections is fundamentally different from the management of paper documents. The ephemeral nature of some electronic documents means that the document address (i.e., the reference details of the document) can become incorrect some time after coming into use, so that references such as index entries and hypertext links fail to address the document they describe. A classic case of invalidated references occurs on the World Wide Web: links that point to a named resource fail when the domain name, file name, or any other aspect of the addressed resource is changed, resulting in the well-known Error 404. Other errors also arise from changes to document collections.
This paper surveys the strategies used both in World Wide Web software and in other hypertext systems for managing the integrity of references and hence the integrity of links. Some strategies are preventative, not permitting errors to occur; others are corrective, discovering reference errors and sometimes attempting to correct them; the last strategy is adaptive, because references are calculated on a just-in-time basis, according to the current state of the document collection.
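The three kinds of strategy can be illustrated with a minimal Python sketch. It is not taken from the paper or from any surveyed system: the in-memory `docs` and `links` dictionaries, the function names, and the substring query used to compute links are all assumptions chosen only to make the distinction concrete.

```python
# Toy model (hypothetical): `docs` maps an address to document text,
# `links` maps a link name to the address stored when the link was made.

def rename(docs, old, new):
    """Unmanaged change: the document moves and stored links are not updated."""
    docs[new] = docs.pop(old)


def rename_preventative(docs, links, old, new):
    """Preventative: moving a document also rewrites every link that points at it."""
    docs[new] = docs.pop(old)
    for name, addr in links.items():
        if addr == old:
            links[name] = new  # same key, so safe while iterating


def find_broken(docs, links):
    """Corrective (detection half): list links whose stored address no longer exists."""
    return sorted(name for name, addr in links.items() if addr not in docs)


def resolve_adaptive(docs, query):
    """Adaptive: compute targets at follow time from the current collection."""
    return sorted(addr for addr, text in docs.items() if query in text)


if __name__ == "__main__":
    docs = {"/a.html": "link integrity survey", "/b.html": "cache design"}
    links = {"survey": "/a.html"}
    rename(docs, "/a.html", "/moved/a.html")
    print(find_broken(docs, links))                  # stored link is now dangling
    print(resolve_adaptive(docs, "link integrity"))  # computed link still resolves
```

In this sketch the stored link breaks after an unmanaged rename, while the computed link holds no address and so cannot go stale. The trade-off is that a computed link is only as good as the query that defines it.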