SEAlang Lab Thai Vocabulary Lists
This page provides a window into our work on Thai vocabulary lists. Clicking any link on the left displays that list's contents (the first set of AWL links includes additional work in progress). The search feature finds every list that contains a given English or Thai word. Note that:
 --  search queries may be in English or Thai, but must match whole words.
 --  * is the wildcard: *x* matches fox, taxi, etc.
 --  WebRank (discussed below) can be applied to any search, of words or of files.
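The whole-word and wildcard rules above can be sketched in Python as follows. This is a minimal illustration, not the SEAlang implementation: it assumes that a query is compared against one word at a time and that `*` stands for any run of characters (including none). The function names `wildcard_to_regex` and `matches` are hypothetical.

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Compile a query where '*' means 'any run of characters'.

    Hypothetical helper: every literal piece of the query is escaped,
    and the pieces are rejoined with '.*' wherever a '*' appeared.
    """
    pattern = ".*".join(re.escape(part) for part in query.split("*"))
    return re.compile(pattern)


def matches(query: str, word: str) -> bool:
    """True if the query covers the *whole* word (no partial hits)."""
    # fullmatch enforces the whole-word rule: "fox" does not match "foxes".
    return wildcard_to_regex(query).fullmatch(word) is not None
```

With these rules, `*x*` matches both "fox" and "taxi", `inter*` matches "internal" but not "winter", and the same logic applies unchanged to Thai strings such as a prefix query `การ*`.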
    The WebRank measure is an experimental metric we use to estimate word difficulty. It is derived from how frequently the term is found on the Web, and is expressed as a number on a scale from 0 (most common) to 15 or so (least common).
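The exact formula behind WebRank is not given here. Purely to illustrate how a Web frequency could be turned into a 0-to-15-style difficulty scale, the sketch below assumes a base-2 logarithmic scale measured from the most frequent term, so each step up corresponds to roughly half as many Web hits. The function `web_rank` and its parameters are hypothetical.

```python
import math
from typing import Optional


def web_rank(hit_count: int, top_count: int) -> Optional[int]:
    """Hypothetical WebRank: number of 'halvings' below the most common term.

    ASSUMPTION: this log2 mapping is an illustration only, not SEAlang's
    published method. 0 = as common as the top term; larger = rarer.
    """
    if hit_count <= 0:
        return None  # no Web hits at all: treat the term as unranked
    ratio = top_count / hit_count
    # Clamp at 0 so a term more frequent than the reference still ranks 0.
    return max(0, round(math.log2(ratio)))
```

Under this assumption, a term with 1/32,768 of the top term's hits would land at 15, the rare end of the scale described above.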
    This work is still underway. The lists reflect just one aspect of a larger effort to devise innovative approaches to using the lexicon in language acquisition, for example extracting items with particular orthographic or phonetic features, or at a particular difficulty level, from across the entire set of lists. Feel free to use these lists in any way you like, to send comments or contributions, or to request new features.
Examples
 --  A search target of *, with WebRank set to 7 or harder, followed by a click on the "search" button, returns all difficult words from the entire set.
 --  With no search target and WebRank set to 1 or easier, clicking the "main list" link (under Thai AWL) returns every entry from the Thai AWL main list that is ranked 0 or 1, or is unranked.
 --  A target of "inter*", followed by a click on the "search" button, returns all English words that begin with "inter". The same works for any Thai sequence.
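The WebRank settings in the first two examples can be sketched as a simple filter over list entries. In this hypothetical model, each entry carries an optional rank (`None` meaning unranked), and whether unranked entries are kept is an explicit flag, since the examples only show them being returned for the "1 or easier" setting. The `Entry` type and `filter_by_webrank` function are illustrative assumptions, not the site's code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Entry:
    thai: str
    english: str
    webrank: Optional[int]  # None = unranked (hypothetical representation)


def filter_by_webrank(
    entries: Iterable[Entry],
    min_rank: Optional[int] = None,   # e.g. 7 for "7 or harder"
    max_rank: Optional[int] = None,   # e.g. 1 for "1 or easier"
    include_unranked: bool = False,
) -> List[Entry]:
    """Keep entries whose WebRank falls within [min_rank, max_rank]."""
    result = []
    for entry in entries:
        if entry.webrank is None:
            # Unranked entries have no value to compare, so the flag decides.
            if include_unranked:
                result.append(entry)
            continue
        if min_rank is not None and entry.webrank < min_rank:
            continue
        if max_rank is not None and entry.webrank > max_rank:
            continue
        result.append(entry)
    return result
```

Here `filter_by_webrank(entries, min_rank=7)` corresponds to the first example, and `filter_by_webrank(entries, max_rank=1, include_unranked=True)` to the second.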
Sources
Our sources for these lists are:
AUA Conversation: We extracted vocabulary lists from the three AUA volumes.
BYKI lists: Before You Know It (www.byki.com) is a widely used tool produced by Transparent Language. It creates, accepts, and redistributes lists in its b4u format. We have extracted headwords and glosses from b4u lists drawn from a variety of sources.
SEAlang lists: This large set of wordlists covers about 3,300 different words, split into some 250 sets of 10 to 15 items each. Words are grouped in various ways:
  - semantically, e.g. colors, days, and other related words;
  - thematically, e.g. processes, such as visiting a dentist;
  - contextually, e.g. words that are associated with a location;
  - by difficulty, e.g. words of particular frequency levels.
Our goal was to create lists that had some sort of consistency in both content and difficulty.  This turned out to be surprisingly hard, which may explain why we were unable to find extensive lists of this sort for any language. 
Academic Wordlist (AWL): These lists are based on Averil Coxhead's work; see language.massey.ac.nz/staff/awl. The basic contention of academic wordlists is that many words of relatively low general frequency are common in university-level texts.
    This kind of vocabulary has never been methodically presented to Thai language learners (or to students of less commonly taught languages in general). Indeed, we were unable to find any material of this sort, the equivalent of SAT verbal test preparation, even for native Thai speakers. We believe it is extremely important, though, as we begin to grapple with producing speakers who are competent at a high level.
    Is an English AWL a good starting point? Yes, we think it is, not least because much of the Thai academic vocabulary was prompted by the need to translate English and other foreign-language texts. Perhaps because of this, translating the AWL turns out to be unexpectedly difficult, particularly in deciding when to use English loanwords. And, of course, any Thai AWL must ultimately be extended by local innovations.