About the SEAlang Mon-Khmer Etymological Dictionary
The Mon-Khmer Languages Project's primary resources are this Etymological Dictionary,
built to support work in comparative and historical linguistics, and a companion
Languages Database devoted
to the preservation and sharing of language and lexical resources.
Please read the
MKED Tutorial and Cookbook
if you're a first-time user.
Organization
Data are obtained from a range of sources, including:
etymological dictionaries that include proposed proto-language reconstructions,
comparative dictionaries that group citations into etymological or semantic sets, but do not propose reconstructions,
"linguist's lexicons" that provide careful phonemic renderings and brief glosses, and
ordinary dictionaries that may or may not include phonemic rendering.
In 2007-2009 we entered
the broad set of reconstructions and citations provided
by Shorto (2006), along with representative datasets for
all 12 Mon-Khmer family branches.
In 2009-2011, we are continuing to add language
data (working down to the sub-branch level), as well
as proto-language reconstructions.
Future cycles will add more languages to these branches
(eventually nearly 150 in all) and assign citations to
etymological groups,
using Shorto (2006) as an initial (but extensible) guide.
Current status
The project has been funded by the National Endowment for the Humanities from 2007 to 2011.
Capabilities
The Etymological Dictionary provides four basic functions:
searching data, based on phonemic, orthographic, or semantic queries,
organizing results into comparative or historical sets,
restricting searches, based on language and/or source,
naming datasets and individual items for citation and reuse.
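To make the first two functions concrete, here is a minimal sketch, assuming a hypothetical `Citation` record with a `set_id` field that names its etymological set. It runs a semantic (gloss) search and then organizes the hits into comparative sets. The field names and toy entries are placeholders, not the dictionary's real schema or data.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    language: str  # language name (placeholder values below)
    form: str      # phonemic form
    gloss: str     # English gloss
    set_id: str    # hypothetical: id of the etymological set this citation belongs to


def search_by_gloss(citations, word):
    """Semantic search: keep citations whose gloss contains `word` as a whole word."""
    word = word.lower()
    return [c for c in citations if word in re.findall(r"\w+", c.gloss.lower())]


def group_into_sets(citations):
    """Organize search hits into comparative sets keyed by set_id."""
    sets = defaultdict(list)
    for c in citations:
        sets[c.set_id].append(c)
    return dict(sets)


# Toy data: languages, forms, and set ids are placeholders, not real entries.
data = [
    Citation("LangA", "form1", "river", "S1"),
    Citation("LangB", "form2", "river, stream", "S1"),
    Citation("LangC", "form3", "hill", "S2"),
]

hits = search_by_gloss(data, "river")
sets = group_into_sets(hits)
print({k: [c.language for c in v] for k, v in sets.items()})
# prints {'S1': ['LangA', 'LangB']}
```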
Developing reliable mechanisms for on-line
collaboration is a central project goal.
New datasets of citations, reconstructions, relations, and
comments are welcome, and are readily added to the database.
However, all datasets are individually identified: every
user can easily decide which sets to include or exclude from searches.
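Because every dataset is individually identified, including or excluding a contributor's material reduces to a filter. The sketch below assumes each item carries a hypothetical `dataset` id; `restrict` and its parameters are illustrative names, and the same filter also handles restriction by language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    dataset: str   # hypothetical id of the dataset that contributed the item
    language: str
    form: str


def restrict(items, languages=None, include=None, exclude=frozenset()):
    """Filter items by language and by contributing dataset.

    languages: language names to keep (None keeps every language)
    include:   dataset ids to keep (None keeps every dataset)
    exclude:   dataset ids to drop, checked after `include`
    """
    kept = []
    for item in items:
        if languages is not None and item.language not in languages:
            continue
        if include is not None and item.dataset not in include:
            continue
        if item.dataset in exclude:
            continue
        kept.append(item)
    return kept


# Placeholder items, not real MKED data.
items = [
    Item("dsA", "LangA", "form1"),
    Item("dsB", "LangA", "form2"),
    Item("dsB", "LangB", "form3"),
]
print([i.form for i in restrict(items, exclude={"dsB"})])  # prints ['form1']
```

Applying `exclude` after `include` means an explicit exclusion always wins, which seems the safer default when a user wants to drop a dataset entirely.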
Data entry and indexing
As noted above, data sources are inconsistently organized. We make every effort to:
expose data for searching, for instance by expanding bracketed
reconstructions: a head originally listed
as *b[h]raap may be searched as either
braap or bhraap.
extend queries to match the user's intention.
For example, voiceless consonant variants are automatically included
in searches, as are breathy, creaky, diphthongized, and long vowel variants.
This behavior can be overridden.
preserve information that is only implicit in the original sources.
For example, dialect identifiers, glosses, and derivational relations
may be inferred, and phonemic equivalents may be added.
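The first two indexing steps can be sketched as follows. `expand_brackets` treats each bracketed segment as optional, which reproduces the *b[h]raap example; `fold` is a deliberately crude stand-in for variant-tolerant matching. The folding rules (which marks to strip, which letters count as vowels) are assumptions for illustration, not the dictionary's actual behavior.

```python
import itertools
import re
import unicodedata


def expand_brackets(head):
    """Expand optional bracketed segments, e.g. '*b[h]raap' -> ['braap', 'bhraap'].

    Each [...] segment may be present or absent, so n brackets give 2**n forms.
    The leading '*' that marks a reconstruction is dropped.
    """
    parts = re.split(r"\[([^\]]*)\]", head.lstrip("*"))
    # With a capture group, re.split alternates literal text and bracket contents.
    choices = [[p] if i % 2 == 0 else ["", p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


# Hypothetical folding rules, an assumption rather than MKED's actual behavior.
LENGTH_MARKS = {"\u02d0", "\u02d1"}  # ː and ˑ
VOWELS = "aeiouəɛɔɨ"


def fold(form):
    """Reduce a form to a variant-insensitive key.

    Strips combining diacritics (e.g. breathy U+0324, creaky U+0330, voiceless
    ring U+0325, tone marks) and length marks, then collapses doubled vowels
    written for length. Diphthongs are not handled in this sketch.
    """
    decomposed = unicodedata.normalize("NFD", form)
    bare = "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch) and ch not in LENGTH_MARKS)
    return re.sub(rf"([{VOWELS}])\1+", r"\1", bare)


def variant_match(query, form):
    """True if query and form share the same folded key."""
    return fold(query) == fold(form)


print(expand_brackets("*b[h]raap"))                 # prints ['braap', 'bhraap']
print(variant_match("braap", "bra\u0324\u02d0p"))  # prints True
```

Folding discards tone marks along with phonation and length, so it over-matches on purpose; a real system would let the user switch individual rules off, in line with the note that this behavior can be overridden.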
Experimental features
The Mon-Khmer Etymological Dictionary is an experimental laboratory as well as a working resource. Our concerns include:
community development
Discussing dataset content and analysis is critical
to the linguistics community.
We are seeking to discover and define the middle ground
between the overly restrictive methods of the past (passing
manuscripts from hand to hand), and the unconstrained Wiki-style
publication seen today.
query specification
Historical language change and inconsistent data quality
can make formulating useful phonemic queries extremely difficult.
Our work on IPA query builders and on phonemic and notational
approximation is intended to help account for
language drift and for variations in research practice.
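Notational approximation can be illustrated by normalizing alternative transcription conventions before comparing forms. This is a minimal sketch: the `NOTATION` table is a hypothetical set of ASCII conventions, not one drawn from any particular source, and the function names are illustrative.

```python
import re

# Hypothetical ASCII-to-IPA conventions; real sources differ and would need
# their own tables.
NOTATION = {
    "ng": "ŋ",
    "ny": "ɲ",
    "?": "ʔ",
    "@": "ə",
    "E": "ɛ",
    "O": "ɔ",
}

# Try longer keys first so multi-letter conventions win over single letters.
_PATTERN = re.compile("|".join(
    re.escape(k) for k in sorted(NOTATION, key=len, reverse=True)))


def to_canonical(form):
    """Rewrite a form from an ASCII convention into IPA-style symbols."""
    return _PATTERN.sub(lambda m: NOTATION[m.group(0)], form)


def notational_match(query, form):
    """True if query and form agree once both are in canonical notation."""
    return to_canonical(query) == to_canonical(form)


print(to_canonical("?aang"))  # prints ʔaaŋ
```

A plain substitution table is naive: it would also rewrite an n + g sequence that spans a syllable boundary, so each source would realistically need its own conversion rules.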
database design
The underlying design of the Mon-Khmer database is extraordinarily
simple: it contains only citations, reconstructions, comments, and
relations.
We wish to see if this experimental approach will continue to allow us
to manipulate and extend the database, while preserving the
rich web of relations that characterize comparative language data.
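The four-part design can be modeled in a few lines. This sketch assumes a single `Record` type for citations, reconstructions, and comments, plus a `Relation` type linking any two records; the class names, relation labels, and records are hypothetical.

```python
from dataclasses import dataclass

KINDS = {"citation", "reconstruction", "comment"}


@dataclass(frozen=True)
class Record:
    id: str
    kind: str     # one of KINDS
    text: str     # cited form, reconstructed head, or comment body
    dataset: str  # hypothetical id of the contributing dataset


@dataclass(frozen=True)
class Relation:
    source: str   # id of the record the relation starts from
    target: str   # id of the record it points to
    label: str    # hypothetical relation name, e.g. "reflex-of"


class Database:
    """Everything is a record or a relation between two records."""

    def __init__(self):
        self.records = {}
        self.relations = []

    def add(self, record):
        if record.kind not in KINDS:
            raise ValueError(f"unknown record kind: {record.kind}")
        self.records[record.id] = record

    def relate(self, source, target, label):
        if source not in self.records or target not in self.records:
            raise KeyError("both records must be added before relating them")
        self.relations.append(Relation(source, target, label))

    def sources_of(self, target, label=None):
        """Records pointing at `target`, optionally only via `label`."""
        return [self.records[r.source] for r in self.relations
                if r.target == target and (label is None or r.label == label)]


# Placeholder records, not real MKED data.
db = Database()
db.add(Record("r1", "reconstruction", "*HEAD", "dsA"))
db.add(Record("c1", "citation", "form1", "dsB"))
db.add(Record("m1", "comment", "placeholder note", "dsC"))
db.relate("c1", "r1", "reflex-of")
db.relate("m1", "r1", "comments-on")
print([r.id for r in db.sources_of("r1", "reflex-of")])  # prints ['c1']
```

Because relations are stored separately from the records they connect, new kinds of links can be added without changing the record structure, which is the kind of extensibility the experiment is meant to test.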