ABSTRACT
This paper shows the efficiency of prefix and non-prefix codes for searching over compressed handheld databases. Byte Pair Encoding (BPE), Tagged Suboptimal Code (TSC), and Huffman encoding are the compression techniques used in the evaluation. By compressing handheld databases and searching over compressed text without needing to expand the databases, more data will be stored and more applications can be used. Experimental results show that about 33% more space has been achieved in the compressed handhelds' databases when using Searching over Compressed Text using BPE (SCTB) or Searching over Compressed Text using TSC (SCTT) solutions. Moreover, both solutions are 6.6 times faster than decompressing the databases followed by a linear search in all different sizes of databases. Efficiency performance shows that SCTB is the recommended solution for databases consisting of large-sized records and rarely updated, and SCTT is the recommended method for frequently updated databases or consisting of small-sized records. TSC and BPE compression schemes could also be used to accelerate wireless connectivity, web clipping, or databases transfer between handheld devices and computers, since these databases are usually small in size.
- A. Amir and G. Benson. Efficient Two-dimensional Compressed Matching. In Proc. Second IEEE Data Compression Conference, pages 279--288, March 1992.]]Google ScholarCross Ref
- Christopher Bey, Elly Freeman, and Jean Ostrem. Palm OS® Programmer's Companion, Palm Inc, 2000]]Google Scholar
- Ricardo Baeza-Yates, Gaston H. Gonnet. A new approach to text searching. Communications of the ACM. Volume 35 Issue 10, Pages: 74--82 October 1992]] Google ScholarDigital Library
- Robert S. Boyer, J. Strother Moore. A fast string searching algorithm. Communications of the ACM Volume 20 Issue 10, pages 62--72, October 1977]] Google ScholarDigital Library
- M. Farach and M. Thorup. String-Matching in Lempel-Ziv Compressed Strings. In 27th ACM STOC, p 703--713, 1995.]] Google ScholarDigital Library
- Eroc Giguere. Palm database programming: the complete developer's guide. New York, Wiley, 1999.]] Google ScholarDigital Library
- P. Gage. A new algorithm for data compression. The C Users Journal, 12(2), 1994]] Google ScholarDigital Library
- T. Kida, M. Takeda, A, Shinohara, and S. Arikawa. Shift-And approach to pattern matching in LZW compressed text. 10th Ann. Symp. on Combinatorial Pattern Matching, p. 1--13. Spring-Verlag, 1999.]] Google ScholarDigital Library
- Nivio Ziviani, Edleno de Moura, Gonzalo Navarro and Ricardo Baeza-Yates. Compression: A Key for Next-Generation Text Retrieval Systems. IEEE Computer, v. 33 issue 11, pages 37--44, Nov 2000.]] Google ScholarDigital Library
- A. Bellaachia, I. AL Rassan. String Matching over Compressed Text on Handheld Devices. Proceeding of the International Conference on Embedded Systems and Applications ESA 03, Pages: 80--86, June 2003.]]Google Scholar
- A. Bellaachia, I. AL Rassan. Fast Searching Over Compressed Text Using a New Coding Technique: Tagged Sub-optimal Code (TSC). DCC 2004: IEEE Data Compression Conference, Snowbird, Utah, March 23-25, 2004.]] Google ScholarDigital Library
- A. Bellaachia, I. AL Rassan, Speeding Up String Matching Over Compressed Text On Handheld Devices using Tagged Sub-optimal Code (TSC), RTAS 2004: IEEE Real-Time and Embedded Technology and Applications Symposium, Toronto, Canada, May 25 --- May 28, 2004.]] Google ScholarDigital Library
Index Terms
- Efficiency of prefix and non-prefix codes in string matching over compressed databases on handheld devices
Recommendations
String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC)
This paper presents Tagged Sub-optimal code (TSC), a new coding technique to speed up string matching over compressed databases on personal digital assistants (PDA). TSC is a variable-length sub-optimal code that supports minimal prefix property. It ...
Speeding up String Matching over Compressed Text on Handheld Devices using Tagged Sub-optimal Code (TSC)
RTAS '04: Proceedings of the 10th IEEE Real-Time and Embedded Technology and Applications SymposiumTagged Sub-optimal code (TSC) is a new codingtechnique presented in this paper to speed up stringmatching over compressed databases on PDAs. TSCis a variable-length sub-optimal code that supportsminimal prefix property. It always determines itscodeword ...
Comments