Web mining research: a survey

Published: 01 June 2000

References

  1. S. Abiteboul. Querying semi-structured data. In F. N. Afrati and P. Kolaitis, editors, Database Theory - ICDT '97, 6th International Conference, Delphi, Greece, January 8-10, 1997, Proceedings, volume 1186 of Lecture Notes in Computer Science, pages 1-18. Springer, 1997.
  2. S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel query language for semistructured data. Int. J. on Digital Libraries, 1(1):68-88, 1997.
  3. H. Ahonen, O. Heinonen, M. Klemettinen, and A. Verkamo. Applying data mining techniques for descriptive phrase extraction in digital document collections. In Advances in Digital Libraries (ADL'98), Santa Barbara, California, USA, April 1998.
  4. H. Ahonen, O. Heinonen, M. Klemettinen, and A. Verkamo. Finding co-occurring text phrases by combining sequence and frequent set discovery. In R. Feldman, editor, Proceedings of 16th International Joint Conference on Artificial Intelligence IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications, pages 1-9, 1999.
  5. J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, 1998.
  6. J. Allan, R. Papka, and V. Lavrenko. On-line new event detection and tracking. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 24-28, 1998, pages 37-45, Melbourne, Australia, 1998.
  7. D. E. Appelt and D. Israel. Introduction to information extraction technology. In Proceedings of 16th International Joint Conference on Artificial Intelligence IJCAI-99, Tutorial, 1999.
  8. G. O. Arocena and A. O. Mendelzon. WebOQL: Restructuring documents, databases, and webs. Theory and Practice of Object Systems, 5(3):127-141, 1999.
  9. P. Atzeni and G. Mecca. Cut & paste. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 12-14, 1997, Tucson, Arizona, pages 144-153. ACM Press, 1997.
  10. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Company, 1999.
  11. M. Balabanović and Y. Shoham. Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3):66-70, 1997.
  12. A. Büchner, M. Baumgarten, S. Anand, M. Mulvenna, and J. Hughes. Navigation pattern discovery from internet data. In Proceedings of the WEBKDD'99 Workshop on Web Usage Analysis and User Profiling, August 15, 1999, San Diego, CA, USA, 1999.
  13. K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 24-28, 1998, pages 104-111, Melbourne, Australia, 1998.
  14. D. Billsus and M. Pazzani. A hybrid user model for news story classification. In Proceedings of the Seventh International Conference on User Modeling (UM '99), Banff, Canada, 1999.
  15. J. Borges and M. Levene. Mining association rules in hypertext databases. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), August 27-31, 1998, New York City, New York, USA, 1998.
  16. J. Borges and M. Levene. Data mining of user navigation patterns. In Proceedings of the WEBKDD'99 Workshop on Web Usage Analysis and User Profiling, August 15, 1999, San Diego, CA, USA, pages 31-36, 1999.
  17. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Seventh International World Wide Web Conference, Brisbane, Australia, 1998.
  18. P. Buneman. Semistructured data. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 12-14, 1997, Tucson, Arizona, pages 117-121. ACM Press, 1997.
  19. P. Buneman, S. B. Davidson, G. G. Hillebrand, and D. Suciu. A query language and optimization techniques for unstructured data. In H. V. Jagadish and I. S. Mumick, editors, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996, pages 505-516. ACM Press, 1996.
  20. J. Carbonell, M. Craven, S. Fienberg, T. Mitchell, and Y. Yang. Report on the CONALD workshop on learning from text and the web. In CONALD Workshop on Learning from Text and the Web, June 1998.
  21. J. Carbonell, Y. Yang, and W. Cohen. Special issue of machine learning on information retrieval introduction. Machine Learning, 39:99-101, 2000.
  22. C. Cardie. Empirical methods in information extraction. AI Magazine, 18(4):65-79, 1997.
  23. S. Chakrabarti. Data mining for hypertext: A tutorial survey. ACM SIGKDD Explorations, 1(2):1-11, 2000.
  24. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Mining the link structure of the world wide web. IEEE Computer, 32(8):60-67, 1999.
  25. S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In L. M. Haas and A. Tiwary, editors, SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA, pages 307-318. ACM Press, 1998.
  26. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In Proceedings of the 10th Meeting of the Information Processing Society of Japan, pages 7-18, 1994.
  27. W. W. Cohen. Learning to classify English text with ILP methods. In Advances in Inductive Logic Programming (Ed. L. De Raedt), IOS Press, 1995.
  28. W. W. Cohen. Some practical observations on integration of web information. In ACM SIGMOD Workshop on The Web and Databases (WebDB'99), pages 55-60, Philadelphia, Pennsylvania, USA, 1999.
  29. W. W. Cohen. What can we learn from the web? In Proceedings of the Sixteenth International Conference on Machine Learning (ICML'99), pages 515-521, 1999.
  30. R. Cooley, B. Mobasher, and J. Srivastava. Web mining: Information and pattern discovery on the world wide web. In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), 1997.
  31. R. Cooley, B. Mobasher, and J. Srivastava. Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems, 1(1), 1999.
  32. R. W. Cooley. Web Usage Mining: Discovery and Application of Interesting Patterns from Web data. PhD thesis, Dept. of Computer Science, University of Minnesota, May 2000.
  33. J. Cowie and W. Lehnert. Information extraction. Communications of the ACM, 39(1):80-91, 1996.
  34. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge from the world wide web. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI98), pages 509-516, 1998.
  35. F. Crimmins, A. Smeaton, T. Dkaki, and J. Mothe. Tétrafusion: Information discovery on the internet. IEEE Intelligent Systems, 14(4):55-62, 1999.
  36. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
  37. L. Dehaspe and L. de Raedt. Mining association rules in multiple relations. In Proceedings of the 7th International Workshop on Inductive Logic Programming, volume 1297 of Lecture Notes in Computer Science, pages 125-132, Prague, Czech Republic, 1997. Springer.
  38. J. A. Delgado. Agent-Based Information Filtering and Recommender System On the Internet. PhD thesis, Dept. of Intelligence Computer Science, Nagoya Institute of Technology, March 2000.
  39. S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the 1998 ACM 7th International Conference on Information and Knowledge Management, pages 148-155, Washington, United States, 1998.
  40. J. S. T. Eliassi-Rad. Intelligent agents for web-based tasks: An advice-taking approach. In Working Notes of the AAAI/ICML-98 Workshop on Learning for Text Categorization, Madison, WI, pages 588-589, 1999.
  41. O. Etzioni. The world wide web: Quagmire or gold mine. Communications of the ACM, 39(11):65-68, 1996.
  42. U. Fayyad, S. Djorgovski, and N. Weir. Automating the analysis and cataloging of sky surveys. In Advances in Knowledge Discovery and Data Mining, pages 471-493. AAAI Press, 1996.
  43. U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, pages 1-34. AAAI Press, 1996.
  44. U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. Knowledge discovery and data mining: toward a unifying framework. In Proceedings of the Second Int. Conference on Knowledge Discovery and Data Mining, pages 82-88, 1996.
  45. R. Feldman and I. Dagan. Knowledge discovery in textual databases (KDT). In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), pages 112-117, Montreal, Canada, 1995.
  46. R. Feldman, M. Fresko, Y. Kinar, Y. Lindell, O. Liphstat, M. Rajman, Y. Schler, and O. Zamir. Text mining at the term level. In Principles of Data Mining and Knowledge Discovery, Second European Symposium, PKDD '98, volume 1510 of Lecture Notes in Computer Science, pages 56-64. Springer, 1998.
  47. D. Fensel, C. Knoblock, N. Kushmerick, and M.-C. Rousset. Workshop on intelligent information integration (III'99). AI Magazine, 21(1):91-94, 2000.
  48. M. F. Fernandez, D. Florescu, A. Y. Levy, and D. Suciu. A query language for a web-site management system. SIGMOD Record, 26(3):4-11, 1997.
  49. R. E. Filman and S. Pant. Searching the internet - guest editors' introduction. IEEE Internet Computing, 2(4):21-23, 1998.
  50. D. Florescu, A. Y. Levy, and A. O. Mendelzon. Database techniques for the world-wide web: A survey. SIGMOD Record, 27(3):59-74, 1998.
  51. E. Frank, G. W. Paynter, I. H. Witten, C. Gutwin, and C. G. Nevill-Manning. Domain-specific keyphrase extraction. In Proceedings of 16th International Joint Conference on Artificial Intelligence IJCAI-99, pages 668-673, 1999.
  52. D. Freitag. Information extraction from HTML: Application of a general learning approach. In Proceedings of the Fifteenth Conference on Artificial Intelligence AAAI-98, pages 517-523, 1998.
  53. D. Freitag and A. McCallum. Information extraction with HMMs and shrinkage. In Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
  54. J. Fürnkranz. Exploiting structural information for text classification on the WWW. In Advances in Intelligent Data Analysis, Third International Symposium, IDA-99, pages 487-498, 1999.
  55. M. N. Garofalakis, R. Rastogi, S. Seshadri, and K. Shim. Data mining and the web: Past, present and future. In Workshop on Web Information and Data Management, pages 43-47, 1999.
  56. R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, editors, VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece, pages 436-445. Morgan Kaufmann, 1997.
  57. R. Goldman and J. Widom. Approximate DataGuides. In Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, 1999.
  58. S. Green, L. Hurst, B. Nangle, P. Cunningham, F. Somers, and R. Evans. Software agents: A review. Technical Report TCD-CS-1997-06, Trinity College, University of Dublin, 1997.
  59. S. Grumbach and G. Mecca. In search of the lost schema. In Database Theory - ICDT '99, 7th International Conference, pages 314-331, 1999.
  60. J. Hammer, H. Garcia-Molina, J. Cho, A. Crespo, and R. Aranha. Extracting semistructured information from the web. In Proceedings of the Workshop on Management of Semistructured Data, pages 18-25, 1997.
  61. A. Hauptmann. Integrating and using large databases of text, image, video and audio. IEEE Intelligent Systems, 14(5):34-35, 1999.
  62. M. A. Hearst. Untangling text data mining. In Proceedings of ACL'99: the 37th Annual Meeting of the Association for Computational Linguistics, 1999.
  63. T. Hofmann. The cluster-abstraction model: Unsupervised learning of topic hierarchies from text data. In Proceedings of 16th International Joint Conference on Artificial Intelligence IJCAI-99, pages 682-687, 1999.
  64. S. J. Hong and S. M. Weiss. Advances in predictive model generation for data mining. Technical Report RC-21570, IBM Research Report, 1999.
  65. T. Honkela, S. Kaski, K. Lagus, and T. Kohonen. WEBSOM - self-organizing maps of document collections. In Proc. of Workshop on Self-Organizing Maps 1997 (WSOM'97), pages 310-315, 1997.
  66. A. Houston, H. Chen, S. M. Hubbard, B. R. Schatz, T. D. Ng, R. R. Sewell, and K. M. Tolle. Medical data mining on the internet: Research on a cancer information system. Artificial Intelligence Review, 13:437-446, 1999.
  67. C.-N. Hsu and M.-T. Dung. Generating finite-state transducers for semi-structured data extraction from the web. Information Systems, 23(8):521-538, 1998.
  68. T. Joachims, D. Freitag, and T. Mitchell. WebWatcher: A tour guide for the world wide web. In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-97, pages 770-777, 1997.
  69. M. Junker, M. Sintek, and M. Rinck. Learning for text categorization and information extraction with ILP. In Proceedings of the Workshop on Learning Language in Logic, Bled, Slovenia, 1999.
  70. H. L. K. Wang. Discovering association of structure from semistructured objects. To appear in IEEE Transactions on Knowledge and Data Engineering, 1999.
  71. H. Kargupta, I. Hamzaoglu, and B. Stafford. Distributed data mining using an agent based architecture. In Proceedings of Knowledge Discovery And Data Mining, pages 211-214. AAAI Press, 1997.
  72. H. Kautz, B. Selman, and M. Shah. The hidden web. AI Magazine, 18(2):27-36, 1997.
  73. S. Khoshafian and A. B. Baker. Multimedia and Imaging Databases. Morgan Kaufmann Publishers, 1996.
  74. J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. of ACM-SIAM Symposium on Discrete Algorithms, pages 668-677, 1998.
  75. Y. Kodratoff. About knowledge discovery in texts: A definition and an example. In Proc. of Advanced Course on Artificial Intelligence 1999 (ACAI-99) on Machine Learning Applications (Invited talk), 1999.
  76. S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cybercommunities. In Proceedings of the Eighth World Wide Web Conference (WWW8), 1999.
  77. N. Kushmerick. Gleaning the web. IEEE Intelligent Systems, 14(2):20-22, 1999.
  78. N. Kushmerick, D. Weld, and R. Doorenbos. Wrapper induction for information extraction. In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-97, pages 729-737, 1997.
  79. L. Lakshmanan, F. Sadri, and I. Subramanian. A declarative language for querying and restructuring the web. In Proceedings of 6th International Workshop on Research Issues in Data Engineering, RIDE '96, pages 12-21, 1996.
  80. P. Langley. User modeling in adaptive interfaces. In Proceedings of the Seventh International Conference on User Modeling, pages 357-370, 1999.
  81. S. Lawrence and C. L. Giles. Accessibility of information on the web. Nature, 400:107-109, 1999.
  82. A. O. Mendelzon, G. A. Mihaila, and T. Milo. Querying the world wide web. In Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems, pages 80-91, 1996.
  83. B. Lent, R. Agrawal, and R. Srikant. Discovering trends in text databases. In Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining (KDD 1997), pages 227-230, 1997.
  84. A. Y. Levy and D. S. Weld. Intelligent internet systems. Artificial Intelligence, 118(1-2), 2000.
  85. S. K. Madria, S. S. Bhowmick, W. K. Ng, and E.-P. Lim. Research issues in web data mining. In Proceedings of Data Warehousing and Knowledge Discovery, First International Conference, DaWaK '99, pages 303-312, 1999.
  86. P. Maes. Agents that reduce work and information overload. Communications of the ACM, 37(7):30-40, 1994.
  87. B. Masand and M. Spiliopoulou. WEBKDD-99: Workshop on web usage analysis and user profiling. SIGKDD Explorations, 1(2), 2000.
  88. A. McCallum, K. Nigam, J. Rennie, and K. Seymore. A machine learning approach to building domain-specific search engines. In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-99, pages 662-667, 1999.
  89. T. Mitchell. Machine Learning. McGraw Hill, 1997.
  90. T. M. Mitchell. Machine learning and data mining. Communications of the ACM, 42(11):30-36, 1999.
  91. D. Mladenic. Text-learning and related intelligent agents. IEEE Intelligent Systems, 14(4):44-54, 1999.
  92. D. Mladenic and M. Grobelnik. Feature selection for unbalanced class distribution and naïve Bayes. In Proceedings of the 16th International Conference on Machine Learning ICML-99, pages 258-267, 1999.
  93. I. Muslea. Extraction patterns for information extraction tasks: A survey. In AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
  94. I. Muslea, S. Minton, and C. Knoblock. Wrapper induction for semistructured, web-based information sources. In Proceedings of the Conference on Automatic Learning and Discovery CONALD-98, 1998.
  95. U. Y. Nahm and R. J. Mooney. A mutually beneficial integration of data mining and information extraction. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 2000.
  96. S. Nestorov, S. Abiteboul, and R. Motwani. Inferring structure in semistructured data. SIGMOD Record, 26(4), 1997.
  97. S. Nestorov, S. Abiteboul, and R. Motwani. Extracting schema from semistructured data. In L. M. Haas and A. Tiwary, editors, SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA, pages 295-306. ACM Press, 1998.
  98. K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61-67, 1999.
  99. G. Paliouras, C. Papatheodorou, V. Karkaletsis, P. Tzitziras, and C. D. Spyropoulos. Large-scale mining of usage data on web sites. In AAAI 2000 Spring Symposium on Adaptive User Interfaces, 2000.
  100. M. T. Pazienza, editor. Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, volume 1299 of Lecture Notes in Computer Science. International Summer School, SCIE-97, Frascati (Rome), Springer, 1997.
  101. M. T. Pazienza, editor. Information Extraction, Frascati (Rome), 1999. International Summer School, SCIE-99, Frascati (Rome).
  102. G. Piatetsky-Shapiro, R. Brachman, T. Khabaza, W. Kloesgen, and E. Simoudis. An overview of issues in developing industrial data mining and knowledge discovery applications. In Proceedings of the Second Int. Conference on Knowledge Discovery and Data Mining, pages 89-95, 1996.
  103. M. Rajman and R. Besançon. Text mining - knowledge extraction from unstructured textual data. In Proc. of 6th Conference of International Federation of Classification Societies (IFCS-98), Roma (Italy), pages 473-480, 1998.
  104. A. Rauber and D. Merkl. Automatic labeling of self-organizing maps: Making a treasure-map reveal its secrets. In Proc. of the Pacific Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'99), Beijing, China, 1999.
  105. J. Rennie and A. McCallum. Using reinforcement learning to spider the web efficiently. In Proceedings of the 16th International Conference on Machine Learning ICML-99, 1999.
  106. E. Riloff. Little words can make a big difference for text classification. In E. A. Fox, P. Ingwersen, and R. Fidel, editors, SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, Washington, USA, July 9-13, 1995 (Special Issue of the SIGIR Forum), pages 130-136. ACM Press, 1995.
  107. G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw Hill, 1983.
  108. S. Scott and S. Matwin. Feature engineering for text classification. In Proceedings of the 16th International Conference on Machine Learning ICML-99, 1999.
  109. L. Singh, B. Chen, R. Haight, P. Scheuermann, and K. Aoki. A robust system architecture for mining semistructured data. In Proceeding of The Second Int. Conference on Knowledge Discovery and Data Mining, 1998, pages 329-333, 1998.
  110. P. Smyth, U. M. Fayyad, M. C. Burl, and P. Perona. Modeling subjective uncertainty in image annotation. Advances in Knowledge Discovery and Data Mining, pages 517-539, 1996.
  111. S. Soderland. Learning information extraction rules for semi-structured and free text. Machine Learning, 34(1-3):233-272, 1996.
  112. M. Spiliopoulou. Data mining for the web. In Principles of Data Mining and Knowledge Discovery, Second European Symposium, PKDD '99, pages 588-589, 1999.
  113. J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1(2), 2000.
  114. V. S. Subrahmanian. Principles of Multimedia Database Systems. Morgan Kaufmann Publishers, 1998.
  115. A.-H. Tan. Text mining: The state of the art and the challenges. In Proc. of the Pacific Asia Conf. on Knowledge Discovery and Data Mining PAKDD'99 Workshop on Knowledge Discovery from Advanced Databases, pages 65-70, 1999.
  116. H. Toivonen. On knowledge discovery in graph-structured data. In Workshop on Knowledge Discovery from Advanced Databases (KDAD'99), pages 26-31, 1999.
  117. R. Uthurusamy. From data mining to knowledge discovery: Current challenges and future directions. In Advances in Knowledge Discovery and Data Mining, pages 561-569, 1996.
  118. S. Vaithyanathan. Introduction: Data mining on the internet. Artificial Intelligence Review, 13(5/6):343-344, 1999.
  119. C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979.
  120. K. Wang and H. Liu. Schema discovery for semistructured data. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD'97), pages 271-274, 1997.
  121. S. M. Weiss, C. Apté, F. Damerau, D. E. Johnson, F. J. Oles, T. Goetz, and T. Hampp. Maximizing text-mining performance. IEEE Intelligent Systems, 14(4):63-69, 1999.
  122. E. Wiener, J. Pedersen, and A. Weigend. A neural network approach to topic spotting. In Proceedings of the 4th Symposium on Document Analysis and Information Retrieval (SDAIR 95), pages 317-332, 1995.
  123. Y. Wilks. Information extraction as a core language technology. In M.-T. Pazienza, editor, Information Extraction, volume 1299 of Lecture Notes in Computer Science, pages 1-9. Springer, 1997.
  124. I. H. Witten, Z. Bray, M. Mahoui, and W. J. Teahan. Text mining: A new frontier for lossless compression. In Data Compression Conference 1999, pages 198-207, 1999.
  125. Y. Yang, J. Carbonell, R. Brown, T. Pierce, B. T. Archibald, and X. Liu. Learning approaches for detecting and tracking news events. IEEE Intelligent Systems, 14(4):32-43, 1999.
  126. Y. Yang and J. Pedersen. Guest editors' introduction: Intelligent information retrieval. IEEE Intelligent Systems, 14(4):30-31, 1999.
  127. O. Zaïane and J. Han. WebML: Querying the world-wide web for resources and knowledge. In Proc. ACM CIKM'98 Workshop on Web Information and Data Management (WIDM'98), pages 9-12, 1998.
  128. O. R. Zaïane, J. Han, Z.-N. Li, S. H. Chee, and J. Chiang. MultiMediaMiner: a system prototype for multimedia data mining. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, pages 581-583, 1998.
