The search engines of mainstream literature digital libraries such as the ACM Digital Library, Google Scholar, and PubMed are built on file-based systems and offer users only basic Boolean keyword search. As a result, new and more powerful querying capabilities are difficult to implement on top of such systems and are generally not provided. In contrast, the query languages of database systems traditionally have high expressive power. This paper evaluates the scalability of deploying relational databases as backend systems for literature digital libraries, so that the query languages and query processing capabilities of database engines become available to them.
To evaluate our approach, we built a prototype digital library on top of a relational database management system (RDBMS), together with an advanced query interface that allows users to specify dynamic text and path queries in an intuitive, hierarchical manner. We evaluate the scalability of two search query processing approaches, namely ad-hoc queries and pre-compiled queries (stored procedures). We demonstrate that, with reasonably priced hardware, it is possible to build an RDBMS-based digital library search engine that scales to millions of queries per day.
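The difference between the two query processing approaches can be illustrated with a minimal sketch. It is not the prototype's implementation: the `paper` table, its sample rows, and the function names are hypothetical, and because SQLite has no stored procedures, a fixed parameterized statement stands in for a pre-compiled query.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
INSERT INTO paper (title, year) VALUES
  ('Scalable query processing for digital libraries', 2008),
  ('Keyword search in relational databases', 2005),
  ('Path queries over citation graphs', 2008);
""")


def search_ad_hoc(keyword, year):
    """Ad-hoc style: the SQL text is assembled for every request,
    so each distinct string has to be parsed and planned again."""
    escaped = keyword.replace("'", "''")  # minimal quoting for the sketch
    sql = (
        "SELECT title FROM paper "
        f"WHERE title LIKE '%{escaped}%' AND year = {int(year)} "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql)]


# Stand-in for a pre-compiled query: the SQL text is fixed and only the
# parameters change, so a compiled statement can be reused across calls.
PRECOMPILED_SQL = (
    "SELECT title FROM paper "
    "WHERE title LIKE '%' || ? || '%' AND year = ? "
    "ORDER BY id"
)


def search_precompiled(keyword, year):
    return [row[0] for row in conn.execute(PRECOMPILED_SQL, (keyword, year))]
```

Both functions return the same rows for the same inputs; what differs is how much parsing and planning work the engine repeats per request, which is the cost a scalability comparison of the two approaches would measure.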