Mon-Khmer Languages Database

About the Mon-Khmer Languages Database

The Mon-Khmer Languages Project's primary resources are this Languages Database, devoted to preservation and sharing of language and lexical resources, and a companion Etymological Dictionary built to support work in comparative and historical linguistics.

Sources of Language Data
The Mon-Khmer Languages Project is collecting data on some 150 languages. Our primary sources are:

published dictionaries: As a rule, smaller dictionaries designed for linguistic reference and research are digitized word-for-word, while larger language reference works may be condensed.

unpublished notes and lexicons: Much valuable work on languages that are highly endangered, or that are spoken in areas now relatively inaccessible for political reasons, can be found only in archival storage.

contributed data: Many researchers have data that was insufficiently analyzed for their original research purposes, but is nevertheless of value for comparative linguistics.

Attribution / naming / glossing
Every data item is uniquely labeled with its author (three-letter abbreviation), year of publication, an original identifying label or number if provided, and possibly a SEAlang "C" (for "citation") or "R" (for "reconstruction") number.

Some etymological and comparative dictionaries do not gloss every citation that follows a reconstruction, or vice versa. In these cases an interpolated gloss is enclosed entirely in parentheses. "Dummy" entries (marked as clusters) are sometimes added to help group etymologically related terms, for example when the underlying dictionary is organized along semantic rather than comparative lines. Typical IPA and gloss values for these entries are interpolated from the cluster members.
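The parenthesization rule for interpolated glosses can be checked mechanically. The following is a minimal sketch, assuming that "completely parenthesized" means one pair of parentheses encloses the entire gloss string; the function name `is_interpolated_gloss` is hypothetical and is not part of the database.

```python
def is_interpolated_gloss(gloss: str) -> bool:
    """Return True if one pair of parentheses wraps the whole gloss.

    Assumption: an interpolated gloss looks like "(to eat)", while a gloss
    such as "to eat (rice)" or "(a) or (b)" is an ordinary one.
    """
    text = gloss.strip()
    if len(text) < 2 or not (text.startswith("(") and text.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # If the opening parenthesis closes before the last character,
            # the parentheses do not enclose the whole gloss.
            if depth == 0 and i != len(text) - 1:
                return False
    return depth == 0
```

A consumer of the data could use such a check to separate glosses given in the source from glosses supplied by the database editors.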

Some of the more obscure works cited on the left have been extracted from larger comparative dictionaries. These items have compound identifiers that show both the original work and the borrower's usage. For example, The1980:C:Sid2005~25-3 indicates that Sidwell 2005's entry 25-3 quotes or paraphrases Theraphan 1980.
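A compound identifier of this kind can be split into its parts with a regular expression. The sketch below is inferred only from the single example The1980:C:Sid2005~25-3, so the pattern is an assumption: a three-letter author abbreviation plus a four-digit year, a "C" or "R" marker, the quoting work in the same author-year form, and the quoting work's entry label after "~". The names `CompoundId` and `parse_compound_id` are hypothetical, and real identifiers may vary in ways this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: <Aut><year>:<C|R>:<Aut><year>~<label>
COMPOUND_ID = re.compile(
    r"^(?P<src_author>[A-Za-z]{3})(?P<src_year>\d{4})"
    r":(?P<kind>[CR])"
    r":(?P<via_author>[A-Za-z]{3})(?P<via_year>\d{4})"
    r"~(?P<label>.+)$"
)


class CompoundId(NamedTuple):
    source_author: str   # abbreviation of the original work's author
    source_year: int     # year of the original work
    kind: str            # "C" (citation) or "R" (reconstruction)
    quoting_author: str  # abbreviation of the work that quotes it
    quoting_year: int    # year of the quoting work
    label: str           # entry label in the quoting work


def parse_compound_id(text: str) -> Optional[CompoundId]:
    """Parse a compound identifier; return None if it does not fit the assumed shape."""
    m = COMPOUND_ID.match(text.strip())
    if m is None:
        return None
    return CompoundId(
        m["src_author"], int(m["src_year"]), m["kind"],
        m["via_author"], int(m["via_year"]), m["label"],
    )
```

Under these assumptions, `parse_compound_id("The1980:C:Sid2005~25-3")` yields the source work (The, 1980), the marker C, the quoting work (Sid, 2005), and the label 25-3. Returning None for anything else keeps simple identifiers from being misread as compound ones.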

Returned data
Data can be viewed in this frame, saved to a file, or requested and interpreted (via XMLHttpRequest and eval). Four formats are provided:

HTML: This provides a ready-to-view HTML page, as it appears in this frame.

Tab-separated text: Data is readily loaded into a spreadsheet and/or reformatted. The first line contains the column labels.

XML extract: Data is completely tagged, but no formal DTD is provided.

JSON: Data is serialized in JavaScript Object Notation for ready interpolation into Web pages.

Formal descriptions will be provided.
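Until formal descriptions are available, the tab-separated format is probably the easiest to process outside a spreadsheet. The following is a minimal sketch that relies only on what is stated above: fields are separated by tabs and the top line holds the labels. The function name `read_tsv` and the column labels and row in the sample are made up for illustration; the actual labels come from the downloaded file.

```python
import csv
import io


def read_tsv(text: str) -> list[dict[str, str]]:
    """Turn tab-separated text into a list of rows keyed by the header labels."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in the data as literal text
    )
    return list(reader)


# Made-up sample; real exports will have their own labels and rows.
sample = "id\tlanguage\tgloss\nThe1980:C:Sid2005~25-3\tExampleLanguage\t(fish)\n"
rows = read_tsv(sample)
```

For a saved file, reading it with `open(path, encoding="utf-8", newline="")` and passing the contents to `read_tsv` should preserve the IPA characters, since all text is provided in UTF-8.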

Font requirements
All text is provided in Unicode UTF-8 encoding. For proper rendering, IPA input and display work best with Arial Unicode MS, Charis SIL, or Doulos SIL.