
2015 | Original Paper | Book Chapter

A Semantic Text Retrieval for Indonesian Using Tolerance Rough Sets Models

Written by: Gloria Virginia, Hung Son Nguyen

Published in: Transactions on Rough Sets XIX

Publisher: Springer Berlin Heidelberg


Abstract

Previous research on the Tolerance Rough Sets Model (TRSM) has followed the rational approach of the AI perspective. This article presents studies that take the contrary path, i.e. a cognitive approach, with the objective of building a modular framework for a semantic text retrieval system based on TRSM, specifically for Indonesian. In addition to the proposed framework, this article proposes three methods based on TRSM: the automatic tolerance value generator, thesaurus optimization, and lexicon-based document representation. All methods were developed using our own corpus, namely ICL-corpus, and evaluated on an available Indonesian corpus called Kompas-corpus. The aim of a semantic information retrieval system is to retrieve information, and not merely terms with similar meaning. This article is a small step toward that objective.

Footnotes
1
Key statistical highlights: ITU data release June 2012. URL: http://www.itu.int. Accessed on 25 October 2012.
 
2
Utterances may include sound, marks, gesture, grunts, and groans (anything that can signal an intention).
 
3
The reason is that, in the context of speech acts, we are not concerned with whether the speaker's belief is true or not; rather, we are concerned with the speaker's intention, i.e. what he/she wants to represent by his/her utterance. Thus, it might be the case that a speaker represents his/her false belief as a true belief to the audience, e.g. a speaker utters ‘it is raining’ while in fact ‘it is a sunny day’.
 
4
In other words, ‘the mind to fit the world’. This is because a belief, like a statement, can be true or false; if the statement is false, then it is the fault of the statement, not the world. The world-to-mind direction of fit applies to psychological modes such as desire or promise; if the promise is broken, it is the fault of the promiser.
 
5
BPS-Statistics Indonesia. URL: http://www.bps.go.id/. Accessed on 25 October 2012.
 
6
July 2012 estimate from The World Factbook. URL: https://www.cia.gov. Accessed on 25 October 2012.
 
7
Portal Nasional Indonesia (National Portal of Indonesia). URL: http://www.indonesia.go.id. Accessed on 25 October 2012.
 
8
Key statistical highlights: International Telecommunication Union (ITU) data release June 2012. URL: http://www.itu.int. Accessed on 25 October 2012.
 
9
URL: http://www.internetworldstats.com. Accessed on 25 October 2012.
 
10
The graph was taken from the International Telecommunication Union (ITU). URL: http://www.itu.int/ITU-D/ict/statistics/explorer/index.html. Accessed on 25 October 2012.
 
11
Appendix A provides an explanation about the TF*IDF weighting scheme.
 
12
Cognitive modeling is an approach employed in cognitive science (CS). Cognitive science is an interdisciplinary study of mental representations and computations and of the physical systems that support those processes [18, p. xv].
 
13
An explanation of all corpora used in this article is available in Appendix C.
 
14
TREC is a forum for the IR community that provides the infrastructure necessary to evaluate an IR system on a broad range of problems. URL: http://trec.nist.gov/.
 
15
Appendix B explains the cosine similarity measure as a document ranking algorithm.
 
16
Consistent with VSM, GVSM interprets index term vectors as linearly independent; however, they are not orthogonal.
 
17
ICL-corpus consists of 1,000 documents taken from an Indonesian choral mailing list, while WORDS-corpus consists of 1,000 documents created from ICL-corpus in an annotation process conducted by human experts. Further explanation of these corpora is available in Appendix C.1.
 
18
We collaborated with three choral experts during the annotation process. Their backgrounds are described in Appendix C.3.
 
19
We used the CS stemmer and Vega’s stopword list in all the studies presented in this article.
 
20
Please see Appendix C.1 for an explanation of the annotation process.
 
21
Please see Appendix C.1.
 
22
If the tolerance classes are smaller, then the upper sets will be smaller, and vice versa.
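
The relationship stated in this footnote can be illustrated with a minimal sketch. It assumes the formulation of TRSM commonly used in the document-clustering literature, which is not spelled out in this excerpt: the tolerance class of a term contains the term itself plus every term that co-occurs with it in at least θ documents, and the upper approximation of a document collects every term whose tolerance class overlaps the document. The function names and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooccurrence = Counter()
    for doc in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooccurrence[(a, b)] += 1

    terms = set().union(*map(set, docs))
    classes = {term: {term} for term in terms}
    for (a, b), count in cooccurrence.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class overlaps the document."""
    doc_terms = set(doc)
    return {term for term, cls in classes.items() if cls & doc_terms}


# Hypothetical toy corpus; each document is a list of index terms.
docs = [
    ["paduan", "suara", "konser"],
    ["paduan", "suara", "partitur"],
    ["konser", "partitur"],
    ["paduan", "suara"],
]

for theta in (1, 2):
    classes = tolerance_classes(docs, theta)
    upper = upper_approximation(docs[2], classes)
    print(theta, sorted(upper))
```

With this toy corpus, raising θ from 1 to 2 shrinks every tolerance class except the pair that co-occurs three times, and the upper approximation of the third document shrinks from four terms to two.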
 
23
These values are for the process with the stemming task.
 
24
Most of the foreign terms were English.
 
25
It comes from the English term workshop and the Indonesian suffix -nya.
 
26
An inverted index was used for document representations in all experiments in this article.
 
27
It is an open-source project implemented in Java and licensed under the liberal Apache Software License [40]. We used Lucene 3.1.0 in our study. URL for download: http://lucene.apache.org/core/downloads.html.
 
28
JAMA has been developed by the MathWorks and NIST. It provides user-level classes for constructing and manipulating real, dense matrices. We used JAMA 1.0.2 in this study. URL: http://math.nist.gov/javanumerics/jama/.
 
29
We used trec_eval.9.0, which is publicly available at http://trec.nist.gov/trec_eval/.
 
30
WORDS-corpus is generated from ICL-corpus; hence they belong to a single domain.
 
31
Base method means that we employed only the TF*IDF weighting scheme, without the TRSM implementation.
 
32
Please see Appendix C.2.
 
33
An explanation of cosine similarity as a document ranking algorithm is available in Appendix B.
 
34
In fact, we found the same results for ICL_1000 and ICL_1000 + WORDS_1000 in all the calculations we made, such as R-Precision, Precision@10, Precision@20, and Precision@30.
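
For readers unfamiliar with the measures named in this footnote, the following is a minimal sketch of Precision@k and R-Precision over a single ranked list. It is an illustration only, not the trec_eval implementation used in the study; the function names and the document identifiers are hypothetical. Precision@k is divided by k even when fewer than k documents are retrieved, which is the convention assumed here.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


# Hypothetical ranked result list and relevance judgments for one topic.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

print(precision_at_k(ranked, relevant, 2))  # 0.5
print(precision_at_k(ranked, relevant, 5))  # 0.4
print(r_precision(ranked, relevant))
```

Two runs can share identical values on all of these measures while still ranking documents differently below the cutoffs, since each measure only inspects the top of the list.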
 
35
It is an Indonesian lexicon of 29,337 Indonesian root words, created by the University of Indonesia and described in a 1996 study by Nazief and Adriani [43]. The lexicon has been used in other studies [10, 38].
 
36
KBBI is a dictionary of 27,828 root words, copyrighted by Pusat Bahasa (in English: Language Center) of the Indonesian Ministry of Education.
 
37
The index terms of the thesaurus are single terms; hence we chose the term partitur as the representative of the karya musik concept.
 
38
Figure 35 serves as a basis for the choice of \(\theta \) values in which the TRSM-representation, LEX-representation, TRSM-representation, and TFIDF-representation outperform the other representations at \(\theta \) = 2, \(\theta \) = 8, \(\theta \) = 41, and \(\theta \) = 88 in respective order. However, particularly at \(\theta \) = 88, the TFIDF-representation only performs better than the LEX-representation.
 
39
The base model means that we employed the TF*IDF weighting scheme without either the TRSM implementation or the mapping process.
 
40
Kompas-corpus is a TREC-like Indonesian testbed which is composed of 3,000 newswire articles and is accompanied by 20 topics. Please see Appendix C.4 for more explanation.
 
41
Big data is a term to describe the enormity of data, both structured and unstructured, in volume, velocity, and variety [45].
 
42
Please see Appendix C.4 for more explanation about Kompas-corpus.
 
44
DBpedia is a community project which was started and is administered by research groups from Universität Leipzig, Freie Universität Berlin, and OpenLink Software. The project is an effort to extract information from Wikipedia, make this information available on the Web under an open license, and interlink the DBpedia dataset with other open datasets on the Web. The Indonesian short abstracts of DBpedia were downloaded from http://downloads.dbpedia.org/3.7/id/.
 
References
1.
Büttcher, S., Clarke, C.L.A., Cormack, G.V.: Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, Cambridge (2010)
2.
Weiss, S.M., Indurkhya, N., Zhang, T., Damerau, F.J.: Text Mining - Predictive Methods for Analyzing Unstructured Information. Springer, New York (2005)
3.
Eifring, H., Theil, R.: Linguistics for Students of Asian and African Languages (2005)
5.
Searle, J.R.: Intentionality: An Essay in the Philosophy of Mind. Cambridge University Press, Cambridge (1983)
6.
Grice, H.P.: Studies in the Way of Words. Harvard University Press, Cambridge (1989)
7.
Haugh, M., Jaszczolt, K.M.: Speaker intentions and intentionality. In: Allan, K., Jaszczolt, K.M. (eds.) The Cambridge Handbook of Pragmatics, pp. 87–112. Cambridge University Press, Cambridge (2012)
8.
Akand, M.: Grice and Searle on meaning. Copula - J. Philos. Dept XXVIII, 51–58 (2011)
9.
Adriani, M., Manurung, R.: A survey of Bahasa Indonesia NLP research conducted at the University of Indonesia. In: Proceedings of the 2nd International MALINDO Workshop (2008)
10.
Asian, J.: Effective techniques for Indonesian text retrieval. Ph.D. thesis, School of Computer Science and Information Technology, RMIT University (March 2007)
11.
Asian, J., Williams, H.E., Tahaghoghi, S.M.M.: A testbed for Indonesian text retrieval. In: Bruza, P., Moffat, A., Turpin, A. (eds.) ADCS, pp. 55–58. University of Melbourne, Department of Computer Science (2004)
12.
Sneddon, J.: The Indonesian Language: Its History and Role in Modern Society. UNSW Press, Sydney (2003)
13.
Kawasaki, S., Nguyen, N.B., Ho, T.-B.: Hierarchical document clustering based on tolerance rough set model. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 458–463. Springer, Heidelberg (2000)
14.
Ho, T.B., Nguyen, N.B.: Nonhierarchical document clustering based on a tolerance rough set model. Int. J. Intell. Syst. 17(2), 199–212 (2002)
15.
Nguyen, H.S., Ho, T.B.: Rough document clustering and the internet. In: Handbook of Granular Computing, pp. 987–1003. Wiley, Hoboken (2008)
16.
Wu, Y., Ding, Y., Wang, X., Xu, J.: On-line hot topic recommendation using tolerance rough set based topic clustering. J. Comput. 5, 549–556 (2010)
17.
Gaoxiang, Y., Heping, H., Zhengding, L., Ruixuan, L.: A novel web query automatic expansion based on rough set. Wuhan Univ. J. Nat. Sci. 11(5), 1167–1171 (2006)
18.
Bly, B.M., Rumelhart, D.E. (eds.): Cognitive Science: Handbook of Perception and Cognition, 2nd edn. Academic Press, Millbrae (1999)
19.
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pearson Education Inc., Upper Saddle River (2010)
20.
Voorhees, E.M., Harman, D.: Overview of the ninth text retrieval conference (TREC-9). In: Proceedings of the Ninth Text Retrieval Conference (TREC-9), National Institute of Standards and Technology (NIST), pp. 1–14 (2000)
21.
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York (1999)
22.
Chomsky, N.: Language and Mind, 3rd edn. Cambridge University Press, New York (2006)
23.
Furnas, G.W., Deerwester, S., Dumais, S.T., Landauer, T.K., Harshman, R.A., Streeter, L.A., Lochbaum, K.E.: Information retrieval using a singular value decomposition model of latent semantic structure. In: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 1988, New York, NY, USA, pp. 465–480. ACM (1988)
24.
Grossman, D.A., Frieder, O.: Information Retrieval: Algorithms and Heuristics, 2nd edn. Springer, Netherlands (2004)
25.
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence. IJCAI 2007, San Francisco, CA, USA, pp. 1606–1611. Morgan Kaufmann Publishers Inc (2007)
26.
Gottron, T., Anderka, M., Stein, B.: Insights into explicit semantic analysis. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management. CIKM 2011, New York, NY, USA, pp. 1961–1964. ACM (2011)
27.
Wong, S.K.M., Ziarko, W., Wong, P.C.N.: Generalized vector spaces model in information retrieval. In: Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 1985, New York, NY, USA, pp. 18–25. ACM (1985)
28.
Nguyen, S.H., Świeboda, W., Jaśkiewicz, G.: Extended document representation for search result clustering. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds.) Intelligent Tools for Building a Scient. Info. Plat. SCI, vol. 390, pp. 77–95. Springer, Heidelberg (2012)
29.
Nguyen, S.H., Jaśkiewicz, G., Świeboda, W., Nguyen, H.S.: Enhancing search result clustering with semantic indexing. In: Proceedings of the Third Symposium on Information and Communication Technology. SoICT 2012, New York, NY, USA, pp. 71–80. ACM (2012)
30.
Szczuka, M., Janusz, A., Herba, K.: Semantic clustering of scientific articles with use of DBpedia knowledge base. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds.) Intelligent Tools for Building a Scient. Info. Plat. SCI, vol. 390, pp. 61–76. Springer, Heidelberg (2012)
32.
Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough Sets: A Tutorial, pp. 3–98. Springer, Singapore (1998)
33.
Pawlak, Z.: Some issues on rough sets. In: Peters, J.F., Skowron, A., Grzymała-Busse, J.W., Kostek, B., Swiniarski, R.W., Szczuka, M.S. (eds.) Transactions on Rough Sets I. LNCS, vol. 3100, pp. 1–58. Springer, Heidelberg (2004)
34.
Skowron, A., Stepaniuk, J.: Tolerance approximation spaces. Fundam. Inf. 27, 245–253 (1996)
35.
Lassila, O., McGuinness, D.: The role of frame-based representation on the semantic web. Technical report, Knowledge System Laboratory, Stanford University (2001)
36.
Virginia, G., Nguyen, H.S.: Lexicon-based document representation. Fundamenta Informaticae 124, 27–45 (2013, to appear)
37.
Vega, V.B.: Information retrieval for the Indonesian language. Master’s thesis, National University of Singapore, Unpublished (2001)
38.
Adriani, M., Asian, J., Nazief, B., Tahaghoghi, S.M.M., Williams, H.E.: Stemming Indonesian: a confix-stripping approach. ACM Trans. Asian Lang. Inf. Process. 6, 1–33 (2007)
39.
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
40.
McCandless, M., Hatcher, E., Gospodnetić, O.: Lucene in Action. Manning Publications Co., Greenwich (2010)
41.
Virginia, G., Nguyen, H.S.: An algorithm for tolerance value generator in tolerance rough sets model. In: Na, M.G., Toro, C., Posada, J., Howlett, R.J., Jain, L.C. (eds.) Advances in Knowledge-Based and Intelligent Information and Engineering Systems. KES 2012, Netherlands, pp. 595–604. IOS Press (2012)
42.
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
43.
Adriani, M., Nazief, B.: Confix-Stripping: Approach to Stemming Algorithm for Bahasa Indonesia. Internal Publication, Depok (1996)
44.
Obadi, G., Dráždilová, P., Hlaváček, L., Martinovič, J., Snášel, V.: A tolerance rough set based overlapping clustering for the DBLP data. In: Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology - Workshops. WI-IAT 2010, vol. 3, pp. 57–60. IEEE (2010)
46.
Ingwersen, P.: Information Retrieval Interaction, 1st edn. Taylor Graham, London (1992)
47.
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24(5), 513–523 (1988)
Metadata
Title
A Semantic Text Retrieval for Indonesian Using Tolerance Rough Sets Models
Written by
Gloria Virginia
Hung Son Nguyen
Copyright Year
2015
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-47815-8_9