
There is little dispute that formulaic language forms an important part of the lexicon, but to date there has been no principled way to prioritize the inclusion of formulaic items in pedagogic materials, such as ESL/EFL textbooks or tests of vocabulary knowledge. While wordlists have been used for decades, they have only provided information about individual word forms (e.g. the General Service List (West, 1953) and the Academic Word List (Coxhead, 2000)). My former PhD student Ron Martinez (now at San Francisco State University) and I have tried to address this deficiency by presenting the PHRASal Expressions List (PHRASE List). The list consists of the 505 most frequent non-transparent multiword expressions in English, intended especially for receptive use. The rationale and development of the list, as well as its compatibility with British National Corpus single-word frequency lists, are discussed in our academic paper in Applied Linguistics (Martinez and Schmitt, 2012). The actual PHRASE List is provided below in a Word document. In addition, we provide a User's Guide to help you use the list knowledgeably and appropriately. We hope that you find the PHRASE List useful in providing a basis for the systematic integration of multiword lexical items into teaching materials, vocabulary tests, and learning syllabuses.

Vocabulary Tests

Vocabulary Levels Test

  • Vocabulary Levels Tests: Versions 1 & 2 (MS Word, 104 Kb)

    These two versions of the Vocabulary Levels Test were developed by myself, Diane Schmitt, and Caroline Clapham. Their development and initial validation are reported in the journal Language Testing as Schmitt, Schmitt, and Clapham (2001), which is available for download from my publications list. The tests give a frequency profile of a learner's vocabulary, and their interpretation requires an understanding of how vocabulary size relates to the ability to do things in English (e.g. participate in informal spoken discourse; read unsimplified texts). I suggest the Schmitt and Schmitt (2012) article in Language Teaching as an introduction to these ideas (also available for download). You can read more details about how to use and interpret the tests, and about their strengths and limitations, in my 2010 'Researching Vocabulary' book, John Read's 2000 'Assessing Vocabulary' book, and Nation and Gu's 2007 'Focus on Vocabulary' book. The tests are © Norbert Schmitt, but are freely available for research and pedagogical purposes, as long as they are non-commercial.

Test of English Derivatives

  • Test of English Derivatives (MS Word, 25 Kb)

    An important element of word knowledge is the ability to produce the various members of a word family (derivatives) in the appropriate contexts (e.g. 'stimulation' in a noun context, but 'stimulative' in an adjective context). I studied this derivative knowledge with my colleague Cheryl Zimmerman and found that even relatively advanced learners (students studying in presessional courses preparing to enter English-medium universities) typically did not know the main derivatives of our Academic Word List target words. The measurement instrument we used was the Test of English Derivatives (TED). The items in the TED consist of four sentences with blanks for the participants to write in the appropriate derivative form of the target item. See Schmitt and Zimmerman (2002) (available for download from the Publications list) for details about the test, how it was developed and interpreted, and the answer key we used.

Various Paul Meara Tests

  • Meara’s _lognostics Measurement Instruments

    There are a number of measurement instruments on Paul Meara’s _lognostics website. For details, see the documentation on the lognostics site, and discussion of the website in my 2010 'Researching Vocabulary' book. Highlights of the tests on the website include:

    Two vocabulary size tests in the Lex family:

    · X_Lex: A 5K vocabulary size test
    · Y_Lex: A 5-10K vocabulary size test

    Three tests which give a measure of productive vocabulary:

    · P_Lex: A program for evaluating the vocabulary used in short texts
    · D_Tools: A program that calculates the mean segmental TTR statistic vocd for short texts
    · V_Size: A program that estimates the productive vocabulary underlying short texts

    Two depth of knowledge tests:

    · V_Quint: An alternative assessment of vocabulary depth
    · Lex_30: Online word association test

    There is also a suite of language aptitude tests:

    · Llama language aptitude tests

Useful Websites for Vocabulary


  • Compleat Lexical Tutor (Lextutor)

    Created and continuously updated and improved by Tom Cobb in Montreal, Lextutor is the most essential tool in the vocabulary researcher’s toolbox. It has a number of really useful functions, some of which are described below. I use it all the time. Fabulous.

    · Frequency analysis
    Cut and paste a text into the web window (alternatively download larger texts) and Lextutor tells you which frequency band the words in the text belong to, up to the 20,000th level (which will typically be all or nearly all of the words). The results are given in three ways. First, a frequency summary is given, showing what percentage of the text lies in each frequency band (see Figure 5.3 for an example of this, although it does not do justice to the colorized web output). Second, the text is given, with each word color-coded for frequency. Finally, lists of the words in each frequency band are given, according to token, type, and word family. This tool is excellent for getting an overview of the frequency profile of a text, and for highlighting low frequency vocabulary that may be a problem for lower-proficiency learners in a study.
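
The idea behind the frequency summary described above can be illustrated with a minimal sketch. This is not Lextutor's code: the tiny `BANDS` lists and the function names are hypothetical, and the sketch matches plain word forms, whereas a real profiler works from lists of thousands of word families.

```python
import re
from collections import Counter

# Hypothetical toy band lists (assumption); real lists hold thousands of word families.
BANDS = {
    "1K": {"the", "cat", "sat", "on", "a", "and", "was"},
    "2K": {"mat", "quiet"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_profile(text, bands=BANDS):
    """Return {band: percentage of tokens}; unmatched tokens count as 'off-list'."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for band, words in bands.items():
            if token in words:
                counts[band] += 1
                break
        else:
            # The token was not found in any band list.
            counts["off-list"] += 1
    total = len(tokens)
    labels = list(bands) + ["off-list"]
    return {label: (100 * counts[label] / total if total else 0.0) for label in labels}


profile = frequency_profile("The quiet cat sat on the mat and was ubiquitous")
for band, percent in profile.items():
    print(f"{band}: {percent:.1f}%")
```

Words that end up in the 'off-list' row are the low frequency items most likely to cause difficulty for lower-proficiency learners.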

    · Range analysis
    The Range programs tell you about the distribution of words or other lexical units across a set of two or more texts. The texts can be comparable corpora or subdivisions of a corpus, or a set of texts supplied by a user. Lextutor can use its internal corpora to make comparisons between speech and writing in English (using BNC Sampler data), between speech and writing in French (150,000 words of each), and between the Press, Academic, and Fiction components of the Brown Corpus. You can also upload up to 25 of your own texts and see how many of them each word appears in, and in which specific texts it appears.
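
The kind of range count described above (how many texts each word appears in, and which ones) can be sketched as follows. The function name `word_range` and the sample texts are hypothetical, and the sketch counts plain word forms; it is an illustration of the idea, not the Range program itself.

```python
import re


def word_range(texts):
    """Map each word to the sorted names of the texts it appears in."""
    ranges = {}
    for name, text in texts.items():
        # A set ensures each text is counted once per word, however often the word occurs.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            ranges.setdefault(word, set()).add(name)
    return {word: sorted(names) for word, names in ranges.items()}


# Hypothetical sample texts (assumption).
sample = {
    "text_a": "The cat sat",
    "text_b": "The dog sat",
    "text_c": "A dog barked",
}
for word, names in sorted(word_range(sample).items()):
    print(word, len(names), names)
```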

    · Vocabulary Tests
    There are several vocabulary tests available, including the Vocabulary Levels Test, Vocabulary Size Test, the Word Associates Test, a test of the first 1,000 words, and a checklist test.

    · Other tools
    Other corpus-based research tools include a concordancer, frequency word lists, an N-Gram extractor, a frequency level-based cloze passage generator, and a traditional nth-word cloze builder. There are also tools for helping to build your own corpora.
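
As an illustration of what a traditional nth-word cloze involves, here is a minimal sketch that blanks out every nth word of a passage. The function name `nth_word_cloze` and the default of n=5 are assumptions made for the example; it is not Lextutor's implementation.

```python
def nth_word_cloze(text, n=5):
    """Blank out every nth word; return the gapped text and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        # Any punctuation attached to the word is removed along with it.
        answers.append(words[i])
        words[i] = "_____"
    return " ".join(words), answers


gapped, key = nth_word_cloze("one two three four five six seven eight nine ten", n=5)
print(gapped)
print(key)
```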

    · Reaction time experiment builder
    Lextutor has ventured into the psycholinguistic paradigm with a basic reaction-time experiment builder. You type in the words to be recognized, and the nonword distractors, and the program will build a word-recognition experiment where participants type 1 for ‘real word’, 3 for ‘nonword’, and 2 to move to the next stimulus. It then gives reaction time summaries for each of the real words.

    · Pedagogical tools
    It is important to note that Lextutor is as useful for pedagogic purposes as research ones, with features such as concordance line builders, spelling information and activities, and cloze builders. Teachers would be well advised to become familiar with these and other Lextutor features.

Websites for Showing Semantic Associations

  • WordNet

    WordNet is a freely-downloadable program which provides a range of information about queried words. It first acts like a dictionary, giving the various meaning senses, with definitions and examples. It then shows the various derived forms. It also gives thesaurus-like information, providing lists of synonyms, antonyms, hypernyms (X is one way to …), and troponyms (particular ways to …), as well as information as to how commonly the word is used. It is a quick and easy resource for obtaining semantically-based information about vocabulary of interest.

  • Visuwords

    WordNet is perhaps most accessible with a graphical interface, so that all of the associative links are more obvious. This free internet site is a nice example. You type in a word, and it produces a 3-D network of the connections, color-coded for word class (nouns = blue, verbs = green) and connection type (is a part of = turquoise, opposes = red). Rolling the cursor over any of the nodes brings up definitions and examples. A commercial graphical interface, Visual Thesaurus, has more features (http://www.visualthesaurus.com). It allows you to rotate the 3-D networks in any direction, and when you click on any of the nodes, that node automatically starts its own new network. This makes browsing through the semantic space around a topic area very easy.

Corpus of Contemporary American English (COCA)

  • COCA Corpus

    The COCA developed by Mark Davies is a very exciting corpus resource for a number of reasons. First, it is large, currently over 450 million words. Second, it is balanced, with the texts being equally divided among five genres/registers: spoken, fiction, popular magazines, newspapers, and academic journals. Third, unlike most corpora, the COCA will not be static. The plan is to update it at least twice each year, maintaining the balance proportions of the registers already in place. Fourth, the website contains a very powerful search interface, which allows a variety of interrogations. Perhaps best of all, it is free, and only requires the user to register to access it.

Personal Websites of Vocabulary Specialists

Paul Nation

  • Paul Nation

    The leading specialist in second language vocabulary pedagogy has a personal website well worth visiting. To start with, his personal publications list is a mini vocabulary bibliography in itself, and many are downloadable. He also offers his large vocabulary bibliography, sorted alphabetically and by topic. The RANGE program, with either GSL/AWL lists or with BNC lists, is available for download. The website includes the GSL and AWL word lists, but in addition has a very interesting set of lists of survival vocabulary for 19 languages which includes common expressions like greetings and closings, numbers, units of money and weight and size, directions, and conversation gambits (e.g. please speak slowly). There is a list of graded readers divided by difficulty (i.e. vocabulary size) level.

    One of the highlights of the website is the multitude of vocabulary tests available:

    · one receptive version of the revised Vocabulary Levels Test (VLT)
    · two productive versions of the VLT
    · bilingual 1,000 and 2,000 receptive versions of the VLT (Chinese, Indonesian, Japanese, Russian, Samoan, Tagalog, Thai, Tongan, Vietnamese)
    · a basic True/False VLT version focusing on the first 1,000 word level. It is aimed at beginners, using very simple vocabulary and pictures to define the target words.
    · a monolingual English version of the Vocabulary Size Test (VST)
    · a bilingual Mandarin version of the VST

    Finally, for any researchers or students needing inspiration about vocabulary research topics, Nation offers a multitude grouped according to 11 categories, mirroring the organization of his book Learning Vocabulary in Another Language (2001).

Paul Meara

  • Paul Meara

    Meara’s _lognostics website includes a variety of material focusing on vocabulary acquisition, and features the VARGA (Vocabulary Acquisition Research Group Archive), which contains annotated bibliographies of most of the research on vocabulary acquisition since 1970. You can download the bibliography by individual year, or search the website database through keyword and range of years. This is the best vocabulary bibliography available, especially given that most publications have abstracts and the fact that Meara was the pioneer in collecting vocabulary research beginning with his CILT publication Vocabulary in a Second Language, in 1983. There is also a selection of downloadable papers from Meara and his colleagues.

    Equally notable is an interesting range of innovative vocabulary tests, language aptitude tests, and assessment tools which Meara and his colleagues have developed, all downloadable in ZIP files: X_Lex, Y_Lex, P_Lex, D_Tools, V_Size, V_Quint, and Llama. There is also an online association test (Lex_30). The website also promises some future programs, including WA, a program for handling word association data.

    Other information on the site includes entries on the VocabularyWiki page on the Kent-Rosanoff association list, Spanish word frequency lists, and the MacArthur Communicative Development Inventory (an assessment scale for monolingual children’s lexical growth). Finally, there are links to the websites of a number of other prominent vocabulary researchers.

Batia Laufer

  • Batia Laufer

    Batia Laufer's university website contains an impressive personal publications bibliography, and also includes the CATSS test (Computer Adaptive Test of Size and Strength) available online.

Norbert Schmitt

Vocabulary Research Methodology

I put most of what I know about vocabulary research methodology into my 2010 'Researching Vocabulary' book. It covers a wide range of research issues, with a focus on various kinds of vocabulary measurement.

Nation and Webb give detailed advice on how to research a number of vocabulary issues, including:

- examining vocabulary teaching techniques

- word cards

- guessing word meaning from context

- measuring lexical richness

Their book is very practical and easy to read.