Skip to main content
Log in

Extensive simulations for longest common subsequences

Finite size scaling, a cavity solution, and configuration space properties

  • Published:
The European Physical Journal B - Condensed Matter and Complex Systems Aims and scope Submit manuscript

Abstract:

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Using extensive Monte-Carlo simulations for this problem, we find a finite size scaling law of the form for the average LCS length of two random strings of size N over S letters. We provide precise estimates of for .We consider also a related Bernoulli Matching model where the different entries of an array are occupied with a match independently with probability 1/S. On the basis of a cavity-like analysis we find that the length of a longest sequence of matches in that case behaves as where r=M/N and . This formula agrees very well with our numerical computations. It provides a very good approximation for the Random String model, the approximation getting more accurate as S increases. The question of the “universality class” of the LCS problem is also considered. Our results for the Bernoulli Matching model show very good agreement with the scaling predictions of [#!HwaLassig96_PRL!#] for Needleman-Wunsch sequence alignment. We find however that the variance of the LCS length has a scaling different from Var in the Random String model, suggesting that long-ranged correlations among the matches are relevant in this model. We finally study the “ground state” properties of this problem. We find that the number of solutions typically grows exponentially with N. In other words, this system does not satisfy “Nernst's principle”. This is also reflected at the level of the overlap between two LCSs chosen at random, which is found to be self averaging and to approach a definite value q S <1 as .

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Author information

Authors and Affiliations

Authors

Additional information

Received: 23 April 1998 / Revised: 30 July 1998 / Accepted: 14 August 1998

Rights and permissions

Reprints and permissions

About this article

Cite this article

Boutet de Monvel, J. Extensive simulations for longest common subsequences . Eur. Phys. J. B 7, 293–308 (1999). https://doi.org/10.1007/s100510050616

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/s100510050616

Navigation