ABSTRACT
Browsing and searching for documents in large online enterprise document repositories are common activities. While Internet search produces satisfying results for most user queries, enterprise search has been less successful because of differences in document types and user requirements between the two settings. To help users find the information they need in their online enterprise repository, we created DocuBrowse, a faceted document browsing and search system. Search results are presented within the user-created document hierarchy, showing only the directories and documents that match the selected facets and contain the text query terms. In addition to file properties such as date and file size, automatically detected document types, or genres, serve as one of the search facets. Highlighting draws the user's attention to the most promising directories and documents, while thumbnail images and automatically identified keyphrases help users select appropriate documents. DocuBrowse uses document similarities, browsing histories, and recommender system techniques to suggest additional promising documents for the current facet and content filters.
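The abstract describes presenting search results inside the user's own directory hierarchy, keeping only the directories and documents that match the selected facets and contain the query terms. A minimal Python sketch of that pruning step follows, assuming a simple in-memory tree. The `Document` and `Directory` classes, the `genre` field, modeling facets as predicates, naive whitespace tokenization, and using a per-directory match count as a highlighting score are all hypothetical illustrations, not DocuBrowse's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Document:
    name: str
    genre: str     # hypothetical: automatically detected genre label
    size_kb: int   # a file property usable as a facet
    text: str      # extracted text used for query-term matching


@dataclass
class Directory:
    name: str
    documents: List[Document] = field(default_factory=list)
    subdirs: List[Directory] = field(default_factory=list)


# Assumption: a facet is modeled as a predicate over a document.
Facet = Callable[[Document], bool]


def matches(doc: Document, facets: List[Facet], terms: List[str]) -> bool:
    """True if the document passes every facet and contains every query term."""
    if not all(facet(doc) for facet in facets):
        return False
    words = set(doc.text.lower().split())  # naive whitespace tokenization
    return all(term.lower() in words for term in terms)


def filter_tree(directory: Directory, facets: List[Facet],
                terms: List[str]) -> Optional[Directory]:
    """Return a pruned copy of the hierarchy, or None if nothing below matches."""
    kept_docs = [d for d in directory.documents if matches(d, facets, terms)]
    kept_subdirs = []
    for child in directory.subdirs:
        pruned = filter_tree(child, facets, terms)
        if pruned is not None:
            kept_subdirs.append(pruned)
    if not kept_docs and not kept_subdirs:
        return None  # hide directories that contain no matching content
    return Directory(directory.name, kept_docs, kept_subdirs)


def count_matches(directory: Directory) -> int:
    """Matching documents in a pruned subtree (hypothetical highlighting score)."""
    return len(directory.documents) + sum(count_matches(s) for s in directory.subdirs)


# Usage with a tiny hypothetical repository.
root = Directory(
    "projects",
    documents=[Document("notes.txt", "memo", 4, "budget meeting notes")],
    subdirs=[
        Directory("reports", documents=[
            Document("q1.pdf", "report", 120, "quarterly budget report"),
            Document("slides.ppt", "slides", 300, "team offsite agenda"),
        ]),
        Directory("misc", documents=[Document("photo.jpg", "photo", 900, "")]),
    ],
)
result = filter_tree(root, [lambda d: d.genre == "report"], ["budget"])
print([s.name for s in result.subdirs])  # ['reports']
print(count_matches(result))             # 1
```

Returning a pruned copy rather than mutating the original tree keeps the full hierarchy available when the user changes facets; the `misc` directory disappears because nothing beneath it matches, which mirrors the abstract's "showing only directories and documents matching selected facets."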
- DocuBrowse: faceted searching, browsing, and recommendations in an enterprise context