Vocabulary Resources

PHaVE List

As researchers and practitioners are becoming more aware of the importance of multi-word items in English, there is little doubt that phrasal verbs deserve teaching attention in the classroom. However, there are thousands of phrasal verbs in English, and so the question for practitioners is which phrasal verbs to focus attention upon. Phrasal verb dictionaries typically try to be comprehensive, and this results in a very large number of phrasal verbs being listed, which does not help practitioners in selecting the most important ones to teach or test.

There are phrasal verb lists available (Gardner and Davies, 2007; Liu, 2011), but these have a serious pedagogical shortcoming in that they do not account for polysemy. Research indicates that phrasal verbs are highly polysemous, having on average 5.6 meaning senses, although many of these are infrequent and peripheral. Thus practitioners also need guidance about which meaning senses are the most useful to address in instruction or tests. In response to this need, my student Melodie Garnier developed the PHrasal VErb Pedagogical List (PHaVE List). It lists the 150 most frequent phrasal verbs, and provides information on their key meaning senses, which cover 75%+ of the occurrences in the Corpus of Contemporary American English.

The PHaVE List gives the percentage of occurrence for each of these key meaning senses, along with definitions and example sentences written to be accessible for second language learners, in the style of the General Service List (West, 1953). A users' manual is also provided, indicating how to use the list appropriately, as well as alphabetical and frequency indexes.

PHaVE List

PHaVE List journal article

PHaVE List User's Manual

PHaVE List Alphabetical Index

PHaVE List Frequency Ranking

PHRASE List

There is little dispute that formulaic language forms an important part of the lexicon, but to date there has been no principled way to prioritize the inclusion of formulaic items in pedagogic materials, such as ESL/EFL textbooks or tests of vocabulary knowledge. While wordlists have been used for decades, they have only provided information about individual word forms (e.g. the General Service List (West, 1953) and the Academic Word List (Coxhead, 2000)). My former PhD student Ron Martinez (now at Universidade Federal do Paraná (UFPR) - Brazil) and I have tried to address this deficiency by presenting the PHRASal Expressions List (PHRASE List).

The list consists of the 505 most frequent non-transparent multiword expressions in English, intended especially for receptive use. The rationale and development of the list, as well as its compatibility with British National Corpus single-word frequency lists, are discussed in our academic paper in Applied Linguistics (Martinez and Schmitt, 2012). The actual PHRASE List is provided below in a Word document. In addition, we provide a User's Guide to help you use the list knowledgeably and appropriately. We hope that you find the PHRASE List useful in providing a basis for the systematic integration of multiword lexical items into teaching materials, vocabulary tests, and learning syllabuses.

PHRASE List

Phrase List Users Guide

Martinez & Schmitt (2012)

Vocabulary Tests

Vocabulary Levels Test

These two versions of the Vocabulary Levels Test were developed by myself, Diane Schmitt, and Caroline Clapham. Their development and initial validation are reported in the journal Language Testing as Schmitt, Schmitt, and Clapham (2001), which is available for download from my publications list. The tests give a frequency profile of a learner's vocabulary, and their interpretation requires an understanding of how vocabulary size relates to the ability to do things in English (e.g. participate in informal spoken discourse; read unsimplified texts). I suggest the Schmitt and Schmitt (2012) article in Language Teaching as an introduction to these ideas (also available for download on the Articles page). You can read more details about how to use and interpret the tests, and about their strengths and limitations, in my 2010 Researching Vocabulary book, John Read's 2000 Assessing Vocabulary book, and Nation and Gu's 2007 Focus on Vocabulary book. The tests are freely available for research and pedagogical purposes. The tests are getting a bit dated now, and there are newer versions of the Vocabulary Levels Test format available. Nevertheless, the ones presented here can still work if you want a rough idea of the vocabulary knowledge of L2 learners.

Vocabulary Levels Tests Versions 1 and 2

Test of English Derivatives

An important element of word knowledge is the ability to produce the various members of a word family (derivatives) in the appropriate contexts (e.g. 'stimulation' in a noun context, but 'stimulative' in an adjective context). I studied this derivative knowledge with my colleague Cheryl Zimmerman and found that even relatively advanced learners (students studying in presessional courses preparing to enter English-medium universities) typically did not know the main derivatives of our Academic Word List target words. The measurement instrument we used was the Test of English Derivatives (TED). The items in the TED consist of four sentences with blanks for the participants to write in the appropriate derivative form of the target item. See Schmitt and Zimmerman (2002) (available for download on the Articles page) for details about the test, how it was developed and interpreted, and the answer key we used.

Test of English Derivatives

Various Paul Meara Tests

There are a number of measurement instruments on Paul Meara’s _lognostics website. For details, see the documentation on the lognostics site, and discussion of the website in my 2010 Researching Vocabulary book. Highlights of the tests on the website include:

The latest incarnation of the checklist vocabulary size test:
- V_YesNo

Three tests which give measures of productive vocabulary:
- P Lex: a program for evaluating the vocabulary used in short texts
- Lex 30: a test of productive vocabulary
- D Tools: a program that calculates the mean segmental TTR statistic vocd for short texts

A suite of language aptitude tests:
- Llama language aptitude tests

See below for more details on the lognostics webpage.

Meara’s _lognostics Measurement Instruments

Useful Websites for Vocabulary

Lextutor

Compleat Lexical Tutor (Lextutor)

Created and continuously updated and improved by Tom Cobb in Montreal, Lextutor is the most essential tool in the vocabulary researcher’s toolbox. It has a number of really useful functions, some of which are described below. I use it all the time. Fabulous.

Frequency analysis
Cut and paste a text into the web window (alternatively download larger texts) and Lextutor tells you which frequency band the words in the text belong to, up to the 20,000th level (which will typically be all or nearly all of the words). The results are given in three ways. First, a frequency summary is given, showing what percentage of the text lies in each frequency band (see an example below of this paragraph). Second, the text is given, with each word color-coded for frequency. Finally, lists of the words in each frequency band are given, according to token, type, and word family. This tool is excellent for getting an overview of the frequency profile of a text, and for highlighting low-frequency vocabulary that may be a problem for lower-proficiency learners in a study.
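For readers who want to see the logic of a frequency summary, here is a minimal sketch in Python. It is not Lextutor's code: the band names and the tiny word sets in BANDS are hypothetical stand-ins for full frequency lists, and the tokenizer is deliberately simple. It only shows how the percentage of tokens in each band might be computed.

```python
import re
from collections import Counter

# Hypothetical toy band lists (assumption); a real profiler would load
# full frequency lists for each 1,000-word band.
BANDS = {
    "K1": {"the", "a", "of", "and", "text", "word", "is", "in"},
    "K2": {"frequency", "band"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_profile(text, bands=BANDS):
    """Return the percentage of tokens in each band, plus 'off-list'."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            # The token was not found in any band list.
            counts["off-list"] += 1
    return {name: 100 * n / len(tokens) for name, n in counts.items()}

profile = frequency_profile("The frequency of the word zyx")
```

With the toy lists above, four of the six tokens fall in K1, one in K2, and one is off-list.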

Range analysis
The Range programs tell you about the distribution of words or other lexical units across a set of two or more texts. The texts can be comparable corpora or subdivisions of a corpus, or a set of texts supplied by a user. Lextutor can use its internal corpora to make comparisons between speech and writing in English (using BNC Sampler data), between speech and writing in French (150,000 words of each), and between the Press, Academic, and Fiction components of the Brown Corpus. You can also upload up to 25 of your own texts and see how many of them each word appears in, and in which specific texts it appears.
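The idea behind a range count can be sketched in a few lines of Python. This is an illustration only, not the Range program itself: the function name word_range and the two sample texts are made up for the example, and words are counted as raw forms rather than word families.

```python
import re
from collections import defaultdict

def word_range(texts):
    """Map each word to the sorted list of text names it occurs in."""
    occurs_in = defaultdict(set)
    for name, text in texts.items():
        # Use a set so each word is recorded once per text.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            occurs_in[word].add(name)
    return {word: sorted(names) for word, names in occurs_in.items()}

# Hypothetical sample texts (assumption).
texts = {
    "speech": "well I think the test was fine",
    "writing": "the test results were analysed",
}
ranges = word_range(texts)
```

The length of each list gives the number of texts a word appears in, and the list itself names the specific texts.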

Vocabulary Tests
There are several vocabulary tests available, including the Vocabulary Levels Test, Vocabulary Size Test, the Word Associates Test, a test of the first 1,000 words, and a checklist test.

Other tools
Other corpus-based research tools include a concordancer, frequency word lists, an N-Gram extractor, a frequency level-based cloze passage generator and a traditional nth-word cloze builder. There are also tools for helping to build your own corpora.
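As an illustration of the simplest of these tools, a traditional nth-word cloze can be sketched as follows. This is my own minimal version, not Lextutor's implementation: the function name nth_word_cloze and the blank string are assumptions, and punctuation attached to a deleted word is deleted along with it.

```python
def nth_word_cloze(text, n=5, blank="_____"):
    """Replace every nth word with a blank; return the cloze text and answer key."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    answers = []
    # Indices n-1, 2n-1, ... correspond to the nth, 2nth, ... words.
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

cloze, key = nth_word_cloze("one two three four five six", n=3)
```

Here every third word is removed, and the answer key lists the deleted words in order.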

Reaction time experiment builder
Lextutor has ventured into the psycholinguistic paradigm with a basic reaction-time experiment builder. You type in the words to be recognized, and the nonword distractors, and the program will build a word-recognition experiment where participants type 1 for ‘real word’, 3 for ‘nonword’, and 2 to move to the next stimulus. It then gives reaction time summaries for each of the real words.
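A reaction time summary of this kind could be computed along the following lines. This is a sketch under stated assumptions, not Lextutor's procedure: the trial format (word, is_real, response, rt_ms) is hypothetical, and I assume that only correct responses to real words (key 1) are averaged.

```python
from statistics import mean

def rt_summary(trials):
    """Mean reaction time (ms) per real word, using correct responses only."""
    by_word = {}
    for word, is_real, response, rt_ms in trials:
        # Assumption: a response of "1" means the participant judged 'real word'.
        if is_real and response == "1":
            by_word.setdefault(word, []).append(rt_ms)
    return {word: mean(rts) for word, rts in by_word.items()}

# Hypothetical trial data (assumption).
trials = [
    ("house", True, "1", 520),
    ("house", True, "1", 480),
    ("blick", False, "3", 700),   # nonword distractor, not summarized
    ("table", True, "3", 900),    # incorrect response, excluded
]
summary = rt_summary(trials)
```

Whether to exclude incorrect responses or trim outliers is a design decision for the researcher; the sketch takes the simplest option.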

Pedagogical tools
It is important to note that Lextutor is as useful for pedagogic purposes as research ones, with features such as concordance line builders, spelling information and activities, and cloze builders. Teachers would be well advised to become familiar with these and other Lextutor features.

Laurence Anthony's Free Software

This website has a range of language analysis software available. Perhaps the best known is AntConc, considered by many to be the best concordancer available for free. But there is much more than this. In early 2020, I counted over 20 software applications. You are bound to find something useful.

Websites for Showing Semantic Associations

WordNet

WordNet is a freely-downloadable program which provides a range of information about queried words. It first acts like a dictionary, giving the various meaning senses, with definitions and examples. It then shows the various derived forms. It also gives thesaurus-like information, providing lists of synonyms, antonyms, hypernyms (X is one way to …), and troponyms (particular ways to …), as well as information as to how commonly the word is used. It is a quick and easy resource for obtaining semantically-based information about vocabulary of interest.

Visuwords

WordNet is perhaps most accessible with a graphical interface, so that all of the associative links are more obvious. This free internet site is a nice example. You type in a word, and it produces a 3-D network of the connections, color-coded for word class (nouns = blue, verbs = green) and connection type (is a part of = turquoise, opposes = red). Rolling the cursor over any of the nodes brings up definitions and examples. A commercial graphical interface, Visual Thesaurus, has more features (http://www.visualthesaurus.com). It allows you to rotate the 3-D networks in any direction, and when you click on any of the nodes, that node automatically starts its own new network. This makes browsing through the semantic space around a topic area very easy.

Corpus of Contemporary American English (COCA)

The COCA developed by Mark Davies is a very exciting corpus resource for a number of reasons. First, it is large, currently over one billion words. Second, it is balanced, with the texts being divided among five genres/registers in the original COCA: spoken, fiction, popular magazines, newspapers, and academic journals. But now there are also three new categories of less formal language from TV and movie subtitles, blogs, and web pages. Third, as opposed to most corpora, the COCA is not static. It has been updated every year since its inception in 1990, maintaining the balanced proportions of the registers already in place. Fourth, the website contains a very powerful search interface, which allows a variety of interrogations. Perhaps best of all, it is free or inexpensive. A free registration allows basic access, and a small, reasonable fee unlocks full functionality.

Personal Websites of Vocabulary Specialists

Paul Nation

The leading specialist in second language vocabulary pedagogy has a personal website well worth visiting. To start with, his personal publications list is a mini vocabulary bibliography in itself, and many are downloadable. He also offers his large vocabulary bibliography, sorted alphabetically and by topic. The RANGE program, with either GSL/AWL lists or with BNC lists, is available for download. The website includes the GSL and AWL word lists, but in addition has a very interesting set of lists of survival vocabulary for 19 languages which includes common expressions like greetings and closings, numbers, units of money and weight and size, directions, and conversation gambits (e.g. please speak slowly).

One of the highlights of the website is the multitude of vocabulary tests available:

various versions of the Vocabulary Levels Test (VLT)

various versions of the Vocabulary Size Test (VST), both monolingual and bilingual

There is a section on graded readers divided by difficulty (i.e. vocabulary size) level. A special feature is the higher-level mid-frequency graded readers, seldom seen elsewhere.

Finally, for any researchers or students needing inspiration about vocabulary research topics, Nation offers a multitude grouped according to 11 categories, mirroring the organization of his book Learning Vocabulary in Another Language (2013).

Paul Meara

Meara’s _lognostics website includes a variety of material focusing on vocabulary acquisition, and features the VARGA (Vocabulary Acquisition Research Group Archive), which contains annotated bibliographies of most of the research on vocabulary acquisition since 1970. You can download the bibliography by individual year, or search the website database through keyword and range of years. This is the best vocabulary bibliography available, especially given that most publications have abstracts and the fact that Meara was the pioneer in collecting vocabulary research beginning with his CILT publication Vocabulary in a Second Language, in 1983. There is also a selection of downloadable papers from Meara and his colleagues.

Equally notable is the interesting range of innovative vocabulary and aptitude tests mentioned above. There are also links to the websites of a number of other prominent vocabulary researchers. Perhaps the most interesting are his Lognostics Maps, which show how various vocabulary researchers are linked together in citations (see below for an example).

Batia Laufer

Batia Laufer's university website contains an impressive personal publications bibliography, and also includes the CATSS test (Computer Adaptive Test of Size and Strength) available online.

Vocabulary Research Methodology

I put most of what I know about vocabulary research methodology into this book. It covers a wide range of research issues, with a focus on various kinds of vocabulary measurement.

Nation and Webb give detailed advice on how to research a number of vocabulary issues, including:

- examining vocabulary teaching techniques

- word cards

- guessing word meaning from context

- measuring lexical richness

It is very practical and easy to read.

Quizzes about Vocabulary Specialists

How well do you know the scholars working in the field of vocabulary studies? Challenge your knowledge and take the following quizzes on 20 leading scholars and their specialties. Quiz #1 is a recent one I made as a party game for my Vocabulary Research Group. Quiz #2 is an older one (circa early 2000s) which I made as a classroom exercise for my MA Vocabulary class.

Norbert Schmitt

Emeritus Professor of Applied Linguistics

University of Nottingham