Mon-Khmer Comparative Dictionary

About the SEAlang Mon-Khmer Etymological Dictionary

The Mon-Khmer Languages Project's primary resources are this Etymological Dictionary, built to support work in comparative and historical linguistics, and a companion Languages Database devoted to preservation and sharing of language and lexical resources. Please read the MKED Tutorial and Cookbook if you're a first-time user.

Organization Data are obtained from a range of sources, including:

etymological dictionaries that include proposed proto-language reconstructions,

comparative dictionaries that group citations into etymological or semantic sets, but do not propose reconstructions,

"linguist's lexicons" that provide careful phonemic renderings and brief glosses, and

ordinary dictionaries that may or may not include phonemic rendering.

In 2007-2009 we entered the broad set of reconstructions and citations provided by Shorto (2006), along with representative datasets for all 12 Mon-Khmer family branches. In 2009-2011 we are continuing to add language data (working down to the sub-branch level), as well as proto-language reconstructions. Future cycles will add more languages to these branches (eventually, nearly 150), and assign citations to etymological groups, using Shorto (2006) as an initial (but extensible) guide.

Current status The project has been funded by the National Endowment for the Humanities for 2007-2011.

Capabilities The Etymological Dictionary provides four basic functions:

searching data, based on phonemic, orthographic, or semantic queries,

organizing results into comparative or historical sets,

restricting searches, based on language and/or source,

naming datasets and individual items for citation and reuse.

Developing reliable mechanisms for on-line collaboration is a central project goal. New datasets of citations, reconstructions, relations, and comments are welcome, and are readily added to the database. However, all datasets are individually identified: every user can easily decide which sets to include or exclude from searches.
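As an illustration only, the idea of individually identified datasets that a user can include or exclude might be sketched as follows. This is not the project's actual code: the `Entry` fields, the `search` function, and the dataset names are hypothetical, and the sample entries are placeholders rather than real language data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    dataset: str   # hypothetical dataset identifier
    language: str  # placeholder language name
    form: str      # phonemic form
    gloss: str     # brief gloss


def search(entries, form, include=None, exclude=frozenset()):
    """Return entries whose form matches, limited to the chosen datasets.

    include=None means "all datasets"; exclude always takes effect.
    """
    return [
        e for e in entries
        if e.form == form
        and (include is None or e.dataset in include)
        and e.dataset not in exclude
    ]


# Placeholder entries, invented purely for the example.
entries = [
    Entry("set-a", "lang-1", "braap", "gloss-1"),
    Entry("set-b", "lang-2", "braap", "gloss-2"),
    Entry("set-b", "lang-3", "bhraap", "gloss-3"),
]

hits = search(entries, "braap", exclude={"set-b"})
print([e.dataset for e in hits])
```

Because every entry carries its dataset identifier, including or excluding a contributed dataset is a filter applied at query time and requires no change to the stored data.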

Data entry and indexing As noted above, data sources are inconsistently organized. We make every effort to:

expose data for searching, e.g. by expanding bracketed reconstructions. For example, a head that is originally listed as *b[h]raap may be searched as braap or bhraap.

extend queries in a manner that meets the user's intention. For example, unvoiced consonant variants are automatically included in searches, as are breathy, creaky, diphthongized, or long vowel variations. This behavior can be overridden.

preserve non-explicit information from original sources. For example, dialect identifiers, glosses, and derivational relations may be inferred, or phonemic equivalents may be added.
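The bracket expansion and query extension described above can be sketched in Python. This is a minimal illustration, not the dictionary's implementation: the function names are invented for the example, and the three-pair voicing table is an assumption standing in for whatever correspondences the real system uses.

```python
import re
from itertools import product

# Assumed voiced-to-unvoiced pairs, for illustration only.
VOICED_TO_UNVOICED = {"b": "p", "d": "t", "g": "k"}


def expand_brackets(head: str) -> list[str]:
    """Return every searchable form of a head with bracketed optional segments."""
    form = head.lstrip("*")  # drop the reconstruction asterisk
    # With one capture group, re.split alternates: literal, optional, literal, ...
    parts = re.split(r"\[([^\]]*)\]", form)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])      # literal text is always kept
        else:
            choices.append(["", part])  # bracketed text may be absent or present
    return ["".join(combo) for combo in product(*choices)]


def with_unvoiced_variants(query: str) -> list[str]:
    """Return the query plus forms with voiced consonants swapped for unvoiced ones."""
    choices = [
        [ch, VOICED_TO_UNVOICED[ch]] if ch in VOICED_TO_UNVOICED else [ch]
        for ch in query
    ]
    return ["".join(combo) for combo in product(*choices)]


print(expand_brackets("*b[h]raap"))     # ['braap', 'bhraap']
print(with_unvoiced_variants("braap"))  # ['braap', 'praap']
```

Expanding at indexing time and widening at query time are kept as separate steps here, so the widening step can be switched off when the user overrides it.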

Experimental features The Mon-Khmer Etymological Dictionary is an experimental laboratory as well as a working resource. Our concerns include:

community development Discussing dataset content and analysis is critical to the linguistics community. We are seeking to discover and define the middle ground between the overly restrictive methods of the past (passing manuscripts from hand to hand), and the unconstrained Wiki-style publication seen today.

query specification Historical language change and inconsistent data quality can make formulating useful phonemic queries extremely difficult. Our work on IPA query builders, and on both phonemic and notational approximation, is intended to help account for language drift and variations in research practice.

database design The underlying design of the Mon-Khmer database is extraordinarily simple: it contains only citations, reconstructions, comments, and relations. We wish to see if this experimental approach will continue to allow us to manipulate and extend the database, while preserving the rich web of relations that characterizes comparative language data.
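A design built from only citations, reconstructions, comments, and relations might look roughly like the sketch below. The class names, field names, and the "reflex-of" label are all assumptions made for illustration and are not taken from the actual database; the citation form is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Record:
    id: str
    kind: str  # "citation", "reconstruction", or "comment"
    text: str


@dataclass
class Relation:
    source: str  # id of the record the relation starts from
    target: str  # id of the record it points to
    label: str   # hypothetical relation type, e.g. "reflex-of"


class Database:
    KINDS = {"citation", "reconstruction", "comment"}

    def __init__(self):
        self.records = {}    # record id -> Record
        self.relations = []  # flat list of Relation objects

    def add(self, record):
        if record.kind not in self.KINDS:
            raise ValueError(f"unknown record kind: {record.kind}")
        self.records[record.id] = record

    def relate(self, source, target, label):
        if source not in self.records or target not in self.records:
            raise KeyError("both ends of a relation must already exist")
        self.relations.append(Relation(source, target, label))

    def related(self, record_id, label=None):
        """Return records that point at record_id, optionally filtered by label."""
        return [
            self.records[r.source]
            for r in self.relations
            if r.target == record_id and (label is None or r.label == label)
        ]


db = Database()
db.add(Record("r1", "reconstruction", "*b[h]raap"))
db.add(Record("c1", "citation", "placeholder-form"))
db.add(Record("n1", "comment", "placeholder comment"))
db.relate("c1", "r1", "reflex-of")
db.relate("n1", "r1", "comment-on")
print([rec.id for rec in db.related("r1")])
```

In this sketch the relations carry the structure, so new kinds of grouping can be added as new relation labels without changing the record types.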
