Created and continuously updated and improved by Tom Cobb in Montreal, Lextutor is the most essential tool in the vocabulary researcher’s toolbox. It has a number of really useful functions, some of which are described below. I use it all the time. Fabulous.
Cut and paste a text into the web window (alternatively upload larger texts) and Lextutor tells you which frequency band the words in the text belong to, up to the 20,000th level (which will typically be all or nearly all of the words). The results are given in three ways. First, a frequency summary is given, showing what percentage of the text lies in each frequency band (see an example below of this paragraph). Second, the text is given, with each word color-coded for frequency. Finally, lists of the words in each frequency band are given, according to token, type, and word family. This tool is excellent for getting an overview of the frequency profile of a text, and for highlighting low frequency vocabulary that may be a problem for lower-proficiency learners in a study.
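To make the idea of a frequency summary concrete, here is a minimal Python sketch of how such a profile could be computed. It is not Lextutor's own code. The band labels and the tiny word sets in TOY_BANDS are invented for illustration, and the sketch matches plain word forms, whereas real profilers work from full word family lists.

```python
import re
from collections import Counter

# Hypothetical toy bands; real frequency lists hold thousands of word families.
TOY_BANDS = {
    "K1": {"the", "cat", "saw", "and"},
    "K2": {"zebra"},
}

def frequency_profile(text, band_lists):
    """Return the percentage of tokens in `text` that fall in each band."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for band, words in band_lists.items():
            if token in words:
                counts[band] += 1
                break
        else:
            # No band contained the token, so it counts as off-list.
            counts["off-list"] += 1
    total = len(tokens)
    if total == 0:
        return {}
    return {band: 100 * n / total for band, n in counts.items()}

if __name__ == "__main__":
    print(frequency_profile("The cat saw the zebra and the quagga", TOY_BANDS))
```

Words that fall outside every band are grouped as off-list, which is how low frequency vocabulary shows up in the summary.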
The Range programs tell you about the distribution of words or other lexical units across a set of two or more texts. The texts can be comparable corpora or subdivisions of a corpus, or a set of texts supplied by a user. Lextutor can use its internal corpora to make comparisons between speech and writing in English (using BNC Sampler data), between speech and writing in French (150,000 words of each), and between the Press, Academic, and Fiction components of the Brown Corpus. You can also upload up to 25 of your own texts and see how many of them each word appears in, and in which specific texts it appears.
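The core of a range analysis on your own texts can be sketched in a few lines of Python. This is only an illustration of the idea, not the Range program itself. The function name, the file names, and the simple tokenizer are my own assumptions.

```python
import re
from collections import defaultdict

def word_range(texts):
    """Map each word to the sorted list of text names it occurs in.

    `texts` maps a text name to its content. A word's range is the
    number of names in its list.
    """
    occurrences = defaultdict(set)
    for name, text in texts.items():
        # Use a set so that repeats within one text count only once.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            occurrences[word].add(name)
    return {word: sorted(names) for word, names in occurrences.items()}

if __name__ == "__main__":
    sample = {
        "a.txt": "The cat sat",
        "b.txt": "The dog sat",
        "c.txt": "The bird flew",
    }
    for word, names in sorted(word_range(sample).items()):
        print(word, len(names), names)
```

A word with a high range appears in many of the texts, which usually makes it a better teaching target than a word that is frequent in only one text.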
There are several vocabulary tests available, including the Vocabulary Levels Test, Vocabulary Size Test, the Word Associates Test, a test of the first 1,000 words, and a checklist test.
A range of other corpus-based research tools includes a concordancer, frequency word lists, an N-Gram extractor, a frequency level-based cloze passage generator and a traditional nth-word cloze builder. There are also tools for helping to build your own corpora.
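For readers unfamiliar with the traditional nth-word cloze format, the following Python sketch shows the basic procedure: every nth word is replaced with a blank and kept as an answer key. It is a simplified illustration rather than Lextutor's implementation. The function name is hypothetical, and splitting on whitespace means that punctuation attached to a word is blanked along with it.

```python
def nth_word_cloze(text, n):
    """Blank out every nth word; return the cloze passage and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = "_____"
    return " ".join(words), answers

if __name__ == "__main__":
    passage, key = nth_word_cloze("one two three four five six seven", 3)
    print(passage)
    print(key)
```

A frequency level-based cloze would instead choose the deleted words by their frequency band rather than by position.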
Reaction time experiment builder
Lextutor has ventured into the psycholinguistic paradigm with a basic reaction-time experiment builder. You type in the words to be recognized, and the nonword distractors, and the program will build a word-recognition experiment where participants type 1 for ‘real word’, 3 for ‘nonword’, and 2 to move to the next stimulus. It then gives reaction time summaries for each of the real words.
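If you record responses of this kind yourself, a per-word summary can be produced along the following lines. This Python sketch assumes a hypothetical trial format of (stimulus, key pressed, reaction time in milliseconds), since Lextutor's own output format is not described here. It averages only correct responses to real words, which is a common convention but my own choice for this example.

```python
from collections import defaultdict
from statistics import mean

REAL_KEY = "1"     # participant judged the stimulus a real word
NONWORD_KEY = "3"  # participant judged the stimulus a nonword

def summarize_rts(trials, real_words):
    """Return the mean reaction time (ms) of correct responses per real word."""
    correct_rts = defaultdict(list)
    for stimulus, key, rt_ms in trials:
        # Keep only real words that were correctly accepted as words.
        if stimulus in real_words and key == REAL_KEY:
            correct_rts[stimulus].append(rt_ms)
    return {word: mean(rts) for word, rts in correct_rts.items()}

if __name__ == "__main__":
    trials = [
        ("table", REAL_KEY, 520),
        ("table", REAL_KEY, 480),
        ("blick", NONWORD_KEY, 700),
        ("chair", NONWORD_KEY, 650),  # an error: real word rejected
        ("chair", REAL_KEY, 600),
    ]
    print(summarize_rts(trials, {"table", "chair"}))
```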
It is important to note that Lextutor is as useful for pedagogic purposes as research ones, with features such as concordance line builders, spelling information and activities, and cloze builders. Teachers would be well advised to become familiar with these and other Lextutor features.
As researchers and practitioners are becoming more aware of the importance of multi-word items in English, there is little doubt that phrasal verbs deserve teaching attention in the classroom. However, there are thousands of phrasal verbs in English, and so the question for practitioners is which phrasal verbs to focus attention upon. Phrasal verb dictionaries typically try to be comprehensive, and this results in a very large number of phrasal verbs being listed, which does not help practitioners in selecting the most important ones to teach or test.
There are phrasal verb lists available (Gardner and Davies, 2007; Liu, 2011), but these have a serious pedagogical shortcoming in that they do not account for polysemy. Research indicates that phrasal verbs are highly polysemous, having on average 5.6 meaning senses, although many of these are infrequent and peripheral. Thus practitioners also need guidance about which meaning senses are the most useful to address in instruction or tests. In response to this need, my student Melodie Garnier developed the PHrasal VErb Pedagogical List (PHaVE List). It lists the 150 most frequent phrasal verbs, and provides information on their key meaning senses, which cover 75%+ of the occurrences in the Corpus of Contemporary American English.
The PHaVE List gives the percentage of occurrence for each of these key meaning senses, along with definitions and example sentences written to be accessible for second language learners, in the style of the General Service List (West, 1953). A users' manual is also provided, indicating how to use the list appropriately, as well as alphabetical and frequency indexes.
There is little dispute that formulaic language forms an important part of the lexicon, but to date there has been no principled way to prioritize the inclusion of formulaic items in pedagogic materials, such as ESL/EFL textbooks or tests of vocabulary knowledge. While wordlists have been used for decades, they have only provided information about individual word forms (e.g. the General Service List (West, 1953) and the Academic Word List (Coxhead, 2000)). My former PhD student Ron Martinez (now at Universidade Federal do Paraná (UFPR) - Brazil) and I have tried to address this deficiency by presenting the PHRASal Expressions List (PHRASE List).
The list consists of the 505 most frequent non-transparent multiword expressions in English, intended especially for receptive use. The rationale and development of the list, as well as its compatibility with British National Corpus single-word frequency lists, are discussed in our academic paper in Applied Linguistics (Martinez and Schmitt, 2012). The actual PHRASE List is provided below in a Word document. In addition, we provide a User's Guide to help you use the list knowledgeably and appropriately. We hope that you find the PHRASE List useful in providing a basis for the systematic integration of multiword lexical items into teaching materials, vocabulary tests, and learning syllabuses.
Vocabulary Levels Test
These two versions of the Vocabulary Levels Test were developed by myself, Diane Schmitt, and Caroline Clapham. Their development and initial validation are reported in the journal Language Testing as Schmitt, Schmitt, and Clapham (2001), which is available for download from my publications list. The tests give a frequency profile of a learner's vocabulary, and their interpretation requires an understanding of how vocabulary size relates to the ability to do things in English (e.g. participate in informal spoken discourse; read unsimplified texts). I suggest the Schmitt and Schmitt (2012) article in Language Teaching as an introduction to these ideas (also available for download on the Articles page). You can read more details about how to use and interpret the tests, and about their strengths and limitations, in my 2010 Researching Vocabulary book, John Read's 2000 Assessing Vocabulary book, and Nation and Gu's 2007 Focus on Vocabulary book. The tests are freely available for research and pedagogical purposes. They are getting a bit dated now, and there are newer versions of the Vocabulary Levels Test format available. Nevertheless, the ones presented here can still work if you want a rough idea of the vocabulary knowledge of L2 learners.
Test of English Derivatives
An important element of word knowledge is the ability to produce the various members of a word family (derivatives) in the appropriate contexts (e.g. 'stimulation' in a noun context, but 'stimulative' in an adjective context). I studied this derivative knowledge with my colleague Cheryl Zimmerman and found that even relatively advanced learners (students studying in presessional courses preparing to enter English-medium universities) typically did not know the main derivatives of our Academic Word List target words. The measurement instrument we used was the Test of English Derivatives (TED). The items in the TED consist of four sentences with blanks for the participants to write in the appropriate derivative form of the target item. See Schmitt and Zimmerman (2002) (available for download on the Articles page) for details about the test, how it was developed and interpreted, and the answer key we used.
Various Paul Meara Tests
There are a number of measurement instruments on Paul Meara’s _lognostics website. For details, see the documentation on the lognostics site, and discussion of the website in my 2010 Researching Vocabulary book. Highlights of the tests on the website include:
Two vocabulary size tests in the Lex family:
X_Lex: A 5K vocabulary size test
Y_Lex: A 5-10K vocabulary size test
Three tests which give a measure of productive vocabulary:
P_Lex: A program for evaluating the vocabulary used in short texts
D_Tools: A program that calculates the mean segmental TTR statistic vocd for short texts
V_Size: A program that estimates the productive vocabulary underlying short texts
Two depth of knowledge tests:
V_Quint: An alternative assessment of vocabulary depth
Lex_30: Online word association test
There is also a suite of language aptitude tests:
Llama: Language aptitude tests
Useful Websites for Vocabulary
Websites for Showing Semantic Associations
WordNet is a freely-downloadable program which provides a range of information about queried words. It first acts like a dictionary, giving the various meaning senses, with definitions and examples. It then shows the various derived forms. It also gives thesaurus-like information, providing lists of synonyms, antonyms, hypernyms (X is one way to …), and troponyms (particular ways to …), as well as information as to how commonly the word is used. It is a quick and easy resource for obtaining semantically-based information about vocabulary of interest.
WordNet is perhaps most accessible with a graphical interface, so that all of the associative links are more obvious. This free internet site is a nice example. You type in a word, and it produces a 3-D network of the connections, color-coded for word class (nouns = blue, verbs = green) and connection type (is a part of = turquoise, opposes = red). Rolling the cursor over any of the nodes brings up definitions and examples. A commercial graphical interface, Visual Thesaurus, has more features (http://www.visualthesaurus.com). It allows you to rotate the 3-D networks in any direction, and when you click on any of the nodes, that node automatically starts its own new network. This makes browsing through the semantic space around a topic area very easy.
This website has a range of language analysis software available. Perhaps the best known is AntConc, considered by many to be the best concordancer available for free. But there is much more than this. In late 2017, I counted over 20 software applications. You are bound to find something useful.
The COCA developed by Mark Davies is a very exciting corpus resource for a number of reasons. First, it is large, currently over 520 million words. Second, it is balanced, with the texts being equally divided among five genres/registers: spoken, fiction, popular magazines, newspapers, and academic journals. Third, as opposed to most corpora, the COCA is not static. The plan is to update it at least twice each year, maintaining the balance proportions of the registers already in place. Fourth, the website contains a very powerful search interface, which allows a variety of interrogations. Perhaps best of all, it is free, and only requires the user to register to access it.
Personal Websites of Vocabulary Specialists
The leading specialist in second language vocabulary pedagogy has a personal website well worth visiting. To start with, his personal publications list is a mini vocabulary bibliography in itself, and many are downloadable. He also offers his large vocabulary bibliography, sorted alphabetically and by topic. The RANGE program, with either GSL/AWL lists or with BNC lists, is available for download. The website includes the GSL and AWL word lists, but in addition has a very interesting set of lists of survival vocabulary for 19 languages which includes common expressions like greetings and closings, numbers, units of money and weight and size, directions, and conversation gambits (e.g. please speak slowly). There is a list of graded readers divided into difficulty (i.e. vocabulary size) level.
One of the highlights of the website is the multitude of vocabulary tests available:
one receptive version of the revised Vocabulary Levels Test (VLT)
two productive versions of the VLT
bilingual 1,000 and 2,000 receptive versions of the VLT
(Chinese, Indonesian, Japanese, Russian, Samoan, Tagalog, Thai, Tongan, Vietnamese)
a basic True/False VLT version focusing on the first 1,000 word level. It is aimed at beginners, using very simple vocabulary and pictures to define the target words.
a monolingual English version of the Vocabulary Size Test (VST)
a bilingual Mandarin version of the VST
Finally, for any researchers or students needing inspiration about vocabulary research topics, Nation offers a multitude grouped according to 11 categories, mirroring the organization of his book Learning Vocabulary in Another Language (2013).
Meara’s _lognostics website includes a variety of material focusing on vocabulary acquisition, and features the VARGA (Vocabulary Acquisition Research Group Archive), which contains annotated bibliographies of most of the research on vocabulary acquisition since 1970. You can download the bibliography by individual year, or search the website database through keyword and range of years. This is the best vocabulary bibliography available, especially given that most publications have abstracts and the fact that Meara was the pioneer in collecting vocabulary research beginning with his CILT publication Vocabulary in a Second Language, in 1983. There is also a selection of downloadable papers from Meara and his colleagues.
Equally notable is an interesting range of innovative vocabulary tests, a language aptitude test, and assessment tools which Meara and his colleagues have developed, all downloadable in ZIP files: X_Lex, Y_Lex, P_Lex, D_Tools, V_Size, V_Quint, and Llama. (See section above on this page for more details.) There is also an online association test (Lex_30). The website also promises some future programs, including WA, a program for handling word association data.
Other information on the site includes entries on the VocabularyWiki page on the Kent-Rosanoff association list, Spanish word frequency lists, and the MacArthur Communicative Development Inventory (an assessment scale for monolingual children’s lexical growth). There are also links to the websites of a number of other prominent vocabulary researchers. Perhaps the most interesting are his Lognostics Maps, which show how various vocabulary researchers are linked together in citations (see below for an example).
Batia Laufer's university website contains an impressive personal publications bibliography, and also includes the CATSS test (Computer Adaptive Test of Size and Strength) available online.
Vocabulary Research Methodology
I put most of what I know about vocabulary research methodology into this
book. It covers a wide range of research issues, with a focus on various
kinds of vocabulary measurement.
Nation and Webb give detailed advice on how to research a number of
vocabulary issues, including:
- examining vocabulary teaching techniques
- word cards
- guessing word meaning from context
- measuring lexical richness
It is very practical and easy to read.