About the Mon-Khmer Languages Database
The Mon-Khmer Languages Project's primary resources are this Languages Database,
devoted to preservation and sharing of language and lexical resources, and a
companion Etymological Dictionary
built to support work in comparative and historical linguistics.
Sources of Language Data
The Mon-Khmer Languages Project is collecting data on some 150 languages. Our primary sources are:
published dictionaries: As a rule, smaller dictionaries designed
for linguistic reference and research are digitized word-for-word, while
larger language reference works may be condensed.
unpublished notes and lexicons: Much valuable work on languages
that are highly endangered, or are spoken in areas that are now relatively
inaccessible for political reasons, can only be found in archival storage.
contributed data: Many researchers have data that was insufficiently
analyzed for their original research purposes, but is nevertheless of value for
comparative linguistics.
Attribution / naming / glossing
Every data item is uniquely labeled with its author (three-letter abbreviation), year of publication, an original identifying label or number if provided, and possibly a SEAlang "C" (for "citation") or "R" (for "reconstruction") number.
Some etymological and comparative dictionaries do not gloss
every citation that follows a reconstruction, or vice versa.
In these cases an interpolated gloss is supplied and enclosed entirely in parentheses.
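A consumer of the data may want to tell interpolated glosses apart from glosses taken from the source. The following is a minimal sketch, assuming that "completely parenthesized" means a single pair of parentheses wraps the entire gloss string. The function name `is_interpolated_gloss` and the sample strings in the usage lines are hypothetical and are not taken from the database.

```python
def is_interpolated_gloss(gloss: str) -> bool:
    """Return True if one pair of parentheses encloses the whole gloss.

    Assumption: an interpolated gloss looks like "(some gloss)", whereas a
    gloss such as "(some) gloss (text)" is only partly parenthesized.
    """
    text = gloss.strip()
    if len(text) < 2 or not (text.startswith("(") and text.endswith(")")):
        return False
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            # If the opening parenthesis closes before the final character,
            # the outer pair does not span the whole string.
            if depth == 0 and index != len(text) - 1:
                return False
    return depth == 0


if __name__ == "__main__":
    # Placeholder strings for illustration only.
    for sample in ["(placeholder gloss)", "placeholder gloss", "(a) gloss (b)"]:
        print(sample, is_interpolated_gloss(sample))
```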
"Dummy" entries (marked as clusters) are sometimes added to help
group etymologically related terms; e.g. when the underlying dictionary has
been developed along semantic rather than comparative lines.
Typical IPA and gloss values are interpolated from the cluster members.
Some of the more obscure works cited on the left have been extracted
from larger comparative dictionaries.
These items have compound identifiers that show both the
original work and the borrower's usage; e.g. The1980:C:Sid2005~25-3
indicates that entry 25-3 of Sidwell 2005 quotes or paraphrases
Theraphan 1980.
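To show how a compound identifier breaks down, here is a minimal sketch of a parser. The pattern is inferred from the single example The1980:C:Sid2005~25-3 and is an assumption, not a published grammar: a three-letter author abbreviation and four-digit year, a "C" or "R" marker, then the borrowing work and its entry label after a tilde. Identifiers of any other shape are simply rejected. The names `CompoundId` and `parse_compound_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CompoundId(NamedTuple):
    source_author: str    # three-letter abbreviation of the original author
    source_year: int      # year of the original work
    kind: str             # "C" (citation) or "R" (reconstruction)
    borrower_author: str  # three-letter abbreviation of the quoting author
    borrower_year: int    # year of the quoting work
    borrower_entry: str   # entry label in the quoting work, e.g. "25-3"


# Assumed shape: AAAyyyy:K:BBByyyy~entry
_COMPOUND_ID = re.compile(
    r"^(?P<src_author>[A-Za-z]{3})(?P<src_year>\d{4})"
    r":(?P<kind>[CR])"
    r":(?P<bor_author>[A-Za-z]{3})(?P<bor_year>\d{4})"
    r"~(?P<entry>.+)$"
)


def parse_compound_id(identifier: str) -> Optional[CompoundId]:
    """Split a compound identifier into its parts, or return None."""
    match = _COMPOUND_ID.match(identifier.strip())
    if match is None:
        return None
    return CompoundId(
        source_author=match["src_author"],
        source_year=int(match["src_year"]),
        kind=match["kind"],
        borrower_author=match["bor_author"],
        borrower_year=int(match["bor_year"]),
        borrower_entry=match["entry"],
    )


if __name__ == "__main__":
    print(parse_compound_id("The1980:C:Sid2005~25-3"))
```

Returning `None` for anything that does not match keeps the sketch from guessing at identifier shapes the example does not cover.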
Returned data
Data can be viewed in this frame, saved to a file, or requested and interpreted (via XMLHttpRequest and eval). Four formats are provided:
HTML: A ready-to-view HTML page, as it appears in this frame.
Tab-separated text: Data is readily loaded into a spreadsheet and/or reformatted.
The top line contains labels.
XML extract: Data is completely tagged, but no formal DTD is provided.
JSON: Data is serialized in JavaScript Object Notation for ready
interpolation into Web pages.
Formal descriptions will be provided.
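Outside the browser, the tab-separated and JSON formats can be read with standard library tools. The sketch below assumes only what is stated above: the first line of the tab-separated text holds the column labels, and the JSON output is ordinary JSON. The column names and values in the sample strings are placeholders, since the formal descriptions of the formats are not given here. `json.loads` stands in for the eval step mentioned above.

```python
import csv
import io
import json
from typing import Any, Dict, List


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text whose first line contains the labels."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as literal data
    )
    return [dict(row) for row in reader]


def read_json(text: str) -> Any:
    """Parse JSON text with a real parser instead of evaluating it."""
    return json.loads(text)


if __name__ == "__main__":
    # Placeholder labels and values; the real column set is an unknown here.
    sample_tsv = "language\tform\tgloss\nLangA\tform1\tgloss1\n"
    sample_json = '[{"language": "LangA", "form": "form1", "gloss": "gloss1"}]'
    print(read_tsv(sample_tsv))
    print(read_json(sample_json))
```

Keying each row by its label means the code keeps working if the column order changes, as long as the labels stay the same.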
Font requirements
All text is provided in Unicode UTF-8 encoding. For proper rendering, IPA input and display work best with Arial Unicode MS, Charis SIL, or Doulos SIL.