Abstract
Two algorithms for the efficient identification of segment neighborhoods are presented. A segment neighborhood is a set of contiguous residues that share common features. The algorithms efficiently estimate both the parameters of the model that describes these features and the residues that define the boundaries of each segment neighborhood. They accept nearly any model of a segment neighborhood and can be applied with a broad class of best-fit functions, including least squares and maximum likelihood. The algorithms successively identify the most important features of the sequence. Applying one of these methods to the haemagglutinin protein of influenza virus reveals a break in a strong heptad repeat structure, which suggests a possible mechanism for conformational change.
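The abstract does not give the recursion itself, so the following is only an illustrative sketch of how boundaries and segment parameters can be estimated jointly. It assumes a dynamic-programming formulation, a piecewise-constant model (one mean per segment), and a least-squares fit function; the names `segment_cost` and `best_segmentation` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the segment mean for values[i:j]."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def best_segmentation(values: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Split values into k contiguous segments with minimal total squared error.

    Returns the total cost and the start index of each segment.
    Assumed model: each segment is described by a single mean.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums let any segment cost be evaluated in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # cost[m][j]: best cost of covering values[:j] with exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # i is the start of the last segment
                c = cost[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Trace the stored boundaries back from the end of the sequence.
    starts: List[int] = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return cost[k][n], starts


if __name__ == "__main__":
    total, starts = best_segmentation([1, 1, 1, 5, 5, 5, 9, 9], 3)
    print(total, starts)
```

In this sketch the table row `cost[m]` holds the best fit for every segment count `m` up to `k`, which is one way to read the abstract's remark about identifying features successively. Swapping `segment_cost` for another fit function, such as a negative log-likelihood, leaves the recursion unchanged.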
Cite this article
Auger, I.E., Lawrence, C.E. Algorithms for the optimal identification of segment neighborhoods. Bulletin of Mathematical Biology 51, 39–54 (1989). https://doi.org/10.1007/BF02458835
DOI: https://doi.org/10.1007/BF02458835