A New Method of Forward Index with Obtaining ID Automatically

Abstract:

Classification and clustering of similar articles have become increasingly important at a time when we are drowning in information but starved for knowledge. In this paper, we propose a novel forward index built from term IDs that are obtained automatically. The method partitions files according to the dictionary information of the forward index and maintains a mapping between terms and files, so that an efficient index can be established. Later, when information about a term needs to be processed, only the documents that correspond to that term have to be handled.
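
The abstract gives no implementation details, so the following Python sketch is only an illustration of the general idea and not the authors' method. It assumes that a term ID is simply the next unused integer, assigned the first time a term is seen, and that tokens are obtained by whitespace splitting. The class name `ForwardIndex` and all of its attributes and methods are hypothetical.

```python
from collections import defaultdict


class ForwardIndex:
    """Toy forward index with automatically assigned term IDs (illustrative only)."""

    def __init__(self):
        self.term_ids = {}                  # dictionary: term -> termID
        self.forward = {}                   # forward index: doc_id -> list of termIDs
        self.term_docs = defaultdict(set)   # mapping: termID -> set of doc_ids

    def _term_id(self, term):
        # Assumption: a new term receives the next unused integer as its ID.
        return self.term_ids.setdefault(term, len(self.term_ids))

    def add_document(self, doc_id, text):
        # Assumption: lowercase whitespace tokenization stands in for real parsing.
        ids = [self._term_id(token) for token in text.lower().split()]
        self.forward[doc_id] = ids
        for tid in ids:
            self.term_docs[tid].add(doc_id)

    def documents_for(self, term):
        # Return only the documents that contain the term, so later
        # processing of that term can skip every other document.
        tid = self.term_ids.get(term.lower())
        if tid is None:
            return set()
        return set(self.term_docs[tid])


index = ForwardIndex()
index.add_document("d1", "text index text")
index.add_document("d2", "index retrieval")
print(index.forward["d1"])             # term IDs of document d1
print(index.documents_for("index"))    # documents that contain "index"
```

In this sketch the term-to-documents mapping is what lets later processing of a term touch only the corresponding documents; how the paper itself partitions files by dictionary information is not specified in the abstract.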

Info:

Periodical:

Advanced Materials Research (Volumes 468-471)

Pages:

596-600

Online since:

February 2012
