Abstract
This article presents MANULEX, a Web-accessible database that provides grade-level word frequency lists of nonlemmatized and lemmatized words (48,886 and 23,812 entries, respectively), computed from the 1.9 million words taken from 54 French elementary school readers. Word frequencies are provided for four levels: first grade (G1), second grade (G2), third to fifth grades (G3-5), and all grades (G1-5). The frequencies were computed following the methods described by Carroll, Davies, and Richman (1971) and Zeno, Ivenz, Millard, and Duvvuri (1995), with four statistics at each level: F, overall word frequency; D, index of dispersion across the selected readers; U, estimated frequency per million words; and SFI, standard frequency index. The database also provides the number of letters in each word and its syntactic category. MANULEX is intended to be a useful tool for studying language development through the selection of stimuli based on precise frequency norms. Researchers in artificial intelligence can also use it as a natural language resource for simulating written language acquisition in children. Finally, it may serve an educational purpose by providing basic vocabulary lists.
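The four statistics can be illustrated with a minimal Python sketch. It assumes the formulas of Carroll, Davies, and Richman (1971) as they are commonly restated (e.g., by Zeno et al., 1995), which the abstract names but does not spell out: D = [log Σf_i − (Σ f_i log f_i)/Σf_i] / log n over the n readers at a level; U = (1,000,000/N)[F·D + (1 − D)·f_min], where N is the number of tokens in those readers and f_min = (1/N) Σ f_i s_i, with f_i the word's count and s_i the token count of reader i; and SFI = 10(log10 U + 4). The function names and toy readers are hypothetical and are not part of MANULEX.

```python
import math
from typing import Sequence


def dispersion(freqs: Sequence[float]) -> float:
    """Carroll's D over n readers: 0 if the word occurs in one reader only,
    1 if its occurrences are spread evenly across all readers."""
    n = len(freqs)
    total = sum(freqs)
    if n < 2 or total <= 0:
        raise ValueError("need at least two readers and a positive total count")
    # Readers where the word is absent contribute nothing (0 * log 0 taken as 0).
    weighted_log = sum(f * math.log(f) for f in freqs if f > 0) / total
    d = (math.log(total) - weighted_log) / math.log(n)
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return min(1.0, max(0.0, d))


def frequency_per_million(freqs: Sequence[float], reader_sizes: Sequence[int]) -> float:
    """Carroll's U = (1,000,000 / N) * [F * D + (1 - D) * f_min]."""
    if len(freqs) != len(reader_sizes):
        raise ValueError("one count and one size per reader are required")
    n_tokens = sum(reader_sizes)  # N: tokens in the selected readers
    f_overall = sum(freqs)        # F: overall word frequency
    d = dispersion(freqs)
    # f_min = (1/N) * sum of f_i * s_i, with s_i the token count of reader i
    f_min = sum(f * s for f, s in zip(freqs, reader_sizes)) / n_tokens
    return (1_000_000 / n_tokens) * (f_overall * d + (1 - d) * f_min)


def standard_frequency_index(u: float) -> float:
    """SFI = 10 * (log10(U) + 4), so U = 1 per million maps to SFI = 40."""
    return 10 * (math.log10(u) + 4)


if __name__ == "__main__":
    # Hypothetical toy level: four readers of 1,000 tokens each.
    sizes = [1000, 1000, 1000, 1000]
    for counts in ([5, 5, 5, 5], [20, 0, 0, 0]):
        u = frequency_per_million(counts, sizes)
        print(counts, round(dispersion(counts), 3), round(u, 1),
              round(standard_frequency_index(u), 2))
```

With evenly spread counts, D is 1 and U reduces to the raw rate per million (5,000 here). When the same 20 tokens sit in a single reader, D drops to 0 and U falls to f_min scaled per million (1,250), which is how the dispersion weighting discounts words concentrated in a few readers.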
References
Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
Arabia-Guidet, C., Chevrie-Muller, C., & Louis, M. (2000). Fréquence d’occurrence des mots dans les livres d’enfants de 3 à 5 ans. Revue Européenne de Psychologie Appliquée, 50, 3–16.
Aristizabal, M. (1938). Détermination expérimentale du vocabulaire écrit pour servir à l’enregistrement de l’orthographe à l’école primaire. Louvain: Université de Louvain.
Baayen, R. H., Dijkstra, A. F. J., & Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory & Language, 37, 94–117.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (CD-ROM). Philadelphia: University of Pennsylvania, Linguistic Data Consortium.
Breland, H. M. (1996). Word frequency and word difficulty: A comparison of counts in four corpora. Psychological Science, 7, 96–99.
Burgess, C., & Livesay, B. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis. Behavior Research Methods, Instruments, & Computers, 30, 272–277.
Carroll, J. B., Davies, P., & Richman, B. (Eds.) (1971). The American Heritage word-frequency book. Boston: Houghton Mifflin.
Catach, N., Jejcic, F., & the HESO group. (1984). Les listes orthographiques de base du français (LOB): Les mots les plus fréquents et leurs formes fléchies les plus fréquentes. Paris: Nathan.
Cattell, J. M. (1886). The time taken up by cerebral operations. Mind, 11, 220–242, 377–392, 524–538.
Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology, 33A, 497–505. [Available: http://www.psych.rl.ac.uk/MRC_Psych_Db.html]
Content, A., Mousty, P., & Radeau, M. (1990). Brulex: Une base de données lexicales informatisée pour le français écrit et parlé. L’Année Psychologique, 90, 551–566. [Available: ftp://ftp.ulb.ac.be/pub/packages/psyling/Brulex/]
De Cara, B., & Goswami, U. (2002). Similarity relations among spoken words: The special status of rimes in English. Behavior Research Methods, Instruments, & Computers, 34, 416–423.
Dolby, J. L., Resnikoff, H. L., & MacMurray, F. L. (1963). A tape dictionary for linguistic experiments. In Proceedings of the American Federation of Information Processing Societies: Fall Joint Computer Conference (Vol. 24, pp. 419–423). Baltimore: Spartan Books.
Dottrens, R., & Massarenti, D. (no date). Vocabulaire fondamental du français. Neuchâtel: Delachaux & Niestlé.
Dubois, F., & Buyse, R. (1952). Échelle Dubois-Buyse. Bulletin de la Société Alfred Binet, No. 405. (Originally published 1940)
Dufour, S., Peereman, R., Pallier, C., & Radeau, M. (2002). VOCOLEX: Une base de données lexicales sur les similarités phonologiques entre les mots français. L’Année Psychologique, 102, 725–746.
Francis, W., & Kučera, H. (1982). Frequency analysis of English usage. Boston: Houghton Mifflin.
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation, 12, 395–427.
Gougenheim, G., Michéa, R., Rivenc, P., & Sauvageot, A. (1964). L’élaboration du français fondamental (1° degré). Paris: Didier.
Henmon, V. C. A. (1924). A French word book based on a count of 400,000 running words. Madison: University of Wisconsin, Bureau of Educational Research.
Imbs, P. (1971). Dictionnaire des fréquences: Vocabulaire littéraire des XIXe et XXe siècles. I: Table alphabétique. II: Table des fréquences décroissantes. Nancy: CNRS, Didier.
Jacobs, A. M., & Grainger, J. (1994). Models of visual word recognition: Sampling the state of the art. Journal of Experimental Psychology: Human Perception & Performance, 20, 1311–1334.
Jones, D. (1963). Everyman’s English pronouncing dictionary. London: Dent.
Juilland, A., Brodin, D., & Davidovitch, C. (1970). Frequency dictionary of French words. The Hague: Mouton.
Käding, J. W. (1897). Häufigkeitswörterbuch der deutschen Sprache. Steglitz: privately published.
Kiss, G. R., Armstrong, C., Milroy, R., & Piper, J. (1973). An associative thesaurus of English and its computer analysis. In A. J. Aitken, R. Bailey, & N. Hamilton-Smith (Eds.), The computer and literary studies. Edinburgh: Edinburgh University Press.
Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Lambert, E., & Chesnet, D. (2001). NOVLEX: Une base de données lexicales pour les élèves de primaire. L’Année Psychologique, 101, 277–288. [Available: http://www2.mshs.univ-poitiers.fr/novlex/]
Leech, G., Rayson, P., & Wilson, A. (2001). Word frequencies in written and spoken English based on the British National Corpus. London: Longman.
Lété, B. (2003). Building the mental lexicon by exposure to print: A corpus-based analysis of French reading books. In P. Bonin (Ed.), Mental lexicon: Some words to talk about words (pp. 187–214). Hauppauge, NY: Nova Science.
Lexique 2 (2003). Retrieved from http://www.lexique.org/.
Lovelace, E. A. (1988). On using norms for low-frequency words. Bulletin of the Psychonomic Society, 26, 410–412.
Monsell, S. (1991). The nature and locus of word frequency effects in reading. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 148–197). Hillsdale, NJ: Erlbaum.
Nagy, W. E., & Anderson, R. C. (1984). How many words are there in printed school English? Reading Research Quarterly, 19, 304–330.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
New, B., Pallier, C., Ferrand, L., & Matos, R. (2001). Une base de données lexicales du français contemporain sur Internet: Lexique. L’Année Psychologique, 101, 447–462. [Available: http://www.lexique.org/main/]
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery and meaningfulness values for 925 words. Journal of Experimental Psychology, 76(3, Pt. 2).
Peereman, R., & Content, A. (1999). LEXOP: A lexical database providing orthography-phonology statistics for French monosyllabic words. Behavior Research Methods, Instruments, & Computers, 31, 376–379. [Available: ftp://ftp.ulb.ac.be/pub/packages/psyling/Lexop/]
Peereman, R., & Dufour, S. (2003). Un correctif aux codifications phonétiques de la base de données Lexique. L’Année Psychologique, 103, 103–108. [Available: http://leadserv.u-bourgogne.fr/bases/lexiquecorr/]
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.
Préfontaine, R. R., & Préfontaine, G. C. (1968). Échelle du vocabulaire oral des enfants de 5 à 8 ans au Canada français. Montréal: Beauchemin.
Prescott, M. D. A. (1929). Vocabulaire des enfants et des manuels de lecture. Archives de Psychologie, 83–84, 225–274.
Robert, P. (1986). Dictionnaire du français primordial. Paris: Dictionnaire le Robert.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.
Smolensky, P. (1996). On the comprehension/production dilemma in child language. Linguistic Inquiry, 27, 720–731.
Sprenger-Charolles, L., Siegel, L. S., Béchennec, D., & Serniclaes, W. (2003). Development of phonological and orthographic processing in reading aloud, in silent reading and in spelling: A four year longitudinal study. Journal of Experimental Child Psychology, 84, 194–217.
Taft, M. (1979). Recognition of affixed words and the word frequency effect. Memory & Cognition, 7, 263–272.
Taft, M. (1991). Reading and the mental lexicon. Hillsdale, NJ: Erlbaum.
Ters, F., Mayer, G., & Reichenbach, D. (1969). L’échelle Dubois-Buyse d’orthographe usuelle française. Neuchâtel: Messeiller.
Thorndike, E. L. (1921). Teacher’s word book. New York: Columbia Teachers College.
Thorndike, E. L. (1932). A teacher’s word book of 20,000 words. New York: Columbia Teachers College.
Thorndike, E. L., & Lorge, I. (1944). The teacher’s word book of 30,000 words. New York: Columbia Teachers College.
Toglia, M. P., & Battig, W. R. (1978). Handbook of semantic word norms. Hillsdale, NJ: Erlbaum.
Vander Beke, G. E. (1935). French word book. New York: Macmillan.
Verlinde, S., & Selva, T. (2001). Corpus-based versus intuition-based lexicography: Defining a word list for a French learner’s dictionary. In P. Rayson, A. Wilson, T. McEnery, A. Hardie, & S. Khoja (Eds.), Proceedings of the Corpus Linguistics 2001 Conference (pp. 594–598). Lancaster: Lancaster University, University Centre for Computer Corpus Research on Language.
Zeno, S. M., Ivenz, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator’s word frequency guide. Brewster, NY: Touchstone Applied Science Associates.
Zevin, J. D., & Seidenberg, M. S. (2002). Age of acquisition effects in word reading and other tasks. Journal of Memory & Language, 47, 1–29.
Additional information
Support for this research was provided by two national grants, Cognitive Sciences (COG-192) and Schools and Cognitive Sciences (2001, AL16b), and by subsidies from the National Institute of Pedagogical Research (INRP, 1997, 1998, 1999).
About this article
Cite this article
Lété, B., Sprenger-Charolles, L. & Colé, P. MANULEX: A grade-level lexical database from French elementary school readers. Behavior Research Methods, Instruments, & Computers 36, 156–166 (2004). https://doi.org/10.3758/BF03195560
DOI: https://doi.org/10.3758/BF03195560