Abstract
In a digital library system, documents are available in digital form, so they are more easily copied and their copyrights are more easily violated. This is a serious problem because it discourages owners of valuable information from sharing it with authorized users. There are two main philosophies for addressing this problem: prevention and detection. The former makes unauthorized use of documents difficult or impossible, while the latter makes it easier to discover such activity. In this paper we propose a system for registering documents and then detecting copies, either complete or partial. We describe algorithms for such detection and the metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security). We also describe a working prototype, called COPS, discuss implementation issues, and present experimental results that suggest the proper settings for copy detection parameters.
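To make the register-then-detect idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the COPS implementation: the abstract does not specify how documents are chunked or compared, so sentence-level chunks, SHA-1 hashing, the `Registry` class, and the `threshold` parameter are all hypothetical choices made for this example.

```python
import hashlib
import re
from collections import defaultdict


def sentence_hashes(text):
    """Split text into sentences and return the set of their hashes.

    Sentence-level chunking is an assumption of this sketch.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hashes = set()
    for sentence in sentences:
        # Normalize case and punctuation so trivial edits do not hide a copy.
        normalized = " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))
        if normalized:
            hashes.add(hashlib.sha1(normalized.encode("utf-8")).hexdigest())
    return hashes


class Registry:
    """Hypothetical registry mapping chunk hashes to registered documents."""

    def __init__(self):
        # chunk hash -> ids of registered documents containing that chunk
        self.index = defaultdict(set)

    def register(self, doc_id, text):
        for h in sentence_hashes(text):
            self.index[h].add(doc_id)

    def check(self, text, threshold=0.5):
        """Return {doc_id: overlap} for registered documents whose share of
        the checked document's chunks meets the (assumed) threshold."""
        hashes = sentence_hashes(text)
        if not hashes:
            return {}
        counts = defaultdict(int)
        for h in hashes:
            for doc_id in self.index.get(h, ()):
                counts[doc_id] += 1
        return {
            doc_id: count / len(hashes)
            for doc_id, count in counts.items()
            if count / len(hashes) >= threshold
        }
```

Because the overlap is measured as a fraction of the checked document's chunks, a document that reuses only some registered sentences can still be flagged as a partial copy. The threshold is the kind of copy detection parameter whose setting the experiments are meant to guide.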
Copy detection mechanisms for digital documents
SIGMOD '95: Proceedings of the 1995 ACM SIGMOD international conference on Management of data