Khmer

 

[This document is for discussion, and should not yet be used as a reference.  It closely follows pages 96-98 of the ALA-LC Romanization Tables, with annotations and typographic corrections by Doug Cooper, January 2006.  An X entry represents a character from the original table that has not yet been entered here.]

 

Consonants

Full Form    Subscript    Romanization        Full Form    Subscript    Romanization
ក            ្ក            k                   ទ            ្ទ            d
ខ            ្ខ            kh                  ធ            ្ធ            dh
គ            ្គ            g                   ន            ្ន            n
ឃ            ្ឃ            gh                  ប            ្ប            p
ង            ្ង            ṅ                   ផ            ្ផ            ph
ច            ្ច            c                   ព            ្ព            b
ឆ            ្ឆ            ch                  ភ            ្ភ            bh
ជ            ្ជ            j                   ម            ្ម            m
ឈ            ្ឈ            jh                  យ            ្យ            y
ញ            ្ញ  or X      ñ                   រ            ្រ            r
ដ            ្ត            ṭ                   ល            ្ល            l
ឋ            ្ឋ            ṭh                  វ            ្វ            v
ឌ            ្ឌ            ḍ                   ស            ្ស            s
ឍ            ្ឍ            ḍh                  ហ            ្ហ            h
ណ            ្ណ            ṇ                   ឡ            -            ḷ
ត            ្ត            t                   អ            ្អ            q
ថ            ្ថ            th
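In Unicode text, the subscript forms in the table are encoded as U+17D2 (KHMER SIGN COENG) followed by the ordinary consonant, so romanizing a consonant cluster reduces to a lookup that skips the COENG marker. The following is a minimal sketch of that idea, not part of the ALA-LC table: the names COENG, CONSONANTS and romanize_cluster are hypothetical, and only a subset of the consonant rows is included.

```python
COENG = "\u17D2"  # KHMER SIGN COENG: the next consonant is written as a subscript

# Subset of the consonant rows (full form -> romanization); illustrative only.
CONSONANTS = {
    "\u1780": "k",       # ក
    "\u1781": "kh",      # ខ
    "\u1782": "g",       # គ
    "\u1785": "c",       # ច
    "\u1789": "\u00F1",  # ញ -> ñ
    "\u178F": "t",       # ត
    "\u1791": "d",       # ទ
    "\u1793": "n",       # ន
    "\u1794": "p",       # ប
    "\u1798": "m",       # ម
    "\u179A": "r",       # រ
    "\u179C": "v",       # វ
}


def romanize_cluster(cluster: str) -> str:
    """Romanize a full-form consonant plus its subscripts, with no vowel between them."""
    parts = []
    for ch in cluster:
        if ch == COENG:
            continue  # the marker itself has no romanization
        if ch not in CONSONANTS:
            raise ValueError(f"not covered by this sketch: {ch!r}")
        parts.append(CONSONANTS[ch])
    return "".join(parts)


print(romanize_cluster("\u1780\u17D2\u179A"))  # ក្រ -> kr
```

Vowels, diacritics and the special cases described in the notes are deliberately left out; a full romanizer would need all of them.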

[Independent] Vowels

 

Independent

Romanization

i

ī

u

or

ū

e

ai

or

o

or X

au

 (not ṛ ṝ ḷ ḹ)

r̥̄

l̥̄

 

[Dependent] Vowels

 

Dependent

Romanization

-

-a

- -

-a-

-

-á-

-

-ă-

ា់

-â-

-i

-ẏ

-u

-ua

-oe

-ẏa

-ia

-e

-ae

-ai

-o

-au

-aṃ

-aḥ

 

 

Diacritical Marks

 

Vernacular

Alternative

Romanization

˝ (hard sign)

ʹ (soft sign (prime))

 

r-

 

- ̊ (circle above)  [See note 7]

 

 (alif)

 

(ayn)

X

 

-˙  (dot above)  [See note 7]

 

 

Notes

 

1.  In the consonant portion of this romanization table, the special character – shows the position of a Khmer script character below which a subscript character is written.  A subscript character is always romanized after a full form character, without an intervening vowel, as in ក្រខ្វាក់ (krakhvák).

2.  When ញ (ñ) occurs with a subscript character, the lower element is omitted, as in ញ្ច (ñj).  When ញ occurs as its own subscript, it takes the full form, as in កញ្ញា (kaññā).  Otherwise, the subscript has the form of the lower element alone, as in ខ្ញ (khñ).

3.  The consonant ប (p), followed by the vowel ា (ā), takes the special form បា.

4.  In the vowel columns, - shows the position of the consonant relative to the vowel.  This applies to both the Khmer vernacular and to the romanization columns.  It should be noted that – in the Khmer vernacular column can also represent a final consonant with no vowel following, in which case it is simply romanized as - , as in ទ័ព (dăb).

5.  The consonants ំ (ṃ) and ះ (ḥ) are always preceded by a vowel, but, being finals, never themselves bear a vowel.  Vowels other than a may precede them, as in ដុំ (ṭuṃ), សេះ (seḥ).

6.  The diacritics ៉ and ៊ are romanized by ˝ and ʹ respectively, immediately following the consonant they modify.  They have the alternative form ុ when they co-occur with one of the superscript vowels , , , and .  When ុ co-occurs with one of the superscript vowels and with one of the consonants , , , , , , or , it is romanized as ˝, as in ប៉ី (p˝ī).  When ុ co-occurs with one of the superscript vowels and with one of the consonants , , or , it is romanized as ʹ, as in ស៊ី (sʹī).  Otherwise, ុ represents the vowel u, as in មុន (mun).

7.  The diacritics -˚, -ʼ, -ʻ, and -˙ in the romanization column are placed after the last letter of the word in which they occur, as in ក្សត្រីយ៍ (ksatriy˚); ច៎ាះ (cāḥʼ); ដ៏ (ṭaʻ); អាត្មន (qātman˙).

[Ed:  Note that this calls for ring above (U+02DA) and dot above (U+02D9), not combining ring above (U+030A) and combining dot above (U+0307) as specified by the MARC code for Character Modifiers, below.  Ring above and dot above should be added to the Special Characters table.]

8.  Conventional signs are:  ៗ, romanized by repeating the preceding word or phrase;  ។ល។, romanized as .l. ; ។ប។, romanized as .p. ; X (underscore?), romanized by means of a hyphen ( - ); ៖, romanized by means of a colon ( : ); and and , romanized by means of a period ( . ).  The signs and are omitted in romanization.

9.  The numerals are ០ (0), ១ (1), ២ (2), ៣ (3), ៤ (4), ៥ (5), ៦ (6), ៧ (7), ៨ (8), and ៩ (9).

10.  Khmer words are not written separately, and spacing occurs only after longer phrases.  When romanizing, the shortest written form which can stand alone as a word is treated as such.  This applies also to Pali and Sanskrit loan-words.  Other loan-words are divided as in the original language.
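
Notes 7 and 9 are mechanical enough to sketch in code. The fragment below is an illustration only: the function and table names are hypothetical, and the pairing of each Khmer sign with its modifier is inferred from the examples in note 7 rather than taken from a complete Diacritical Marks table, so the dot-above case is omitted.

```python
# Khmer sign -> spacing modifier written after the last letter of the word (note 7).
# Pairings inferred from the examples in note 7; treat them as assumptions.
TRAILING_MARKS = {
    "\u17CD": "\u02DA",  # KHMER SIGN TOANDAKHIAT -> ring above
    "\u17CE": "\u02BC",  # KHMER SIGN KAKABAT -> modifier letter apostrophe (alif)
    "\u17CF": "\u02BB",  # KHMER SIGN AHSDA -> modifier letter turned comma (ayn)
}

# Khmer digits U+17E0..U+17E9 -> ASCII digits (note 9).
KHMER_DIGITS = str.maketrans({0x17E0 + i: str(i) for i in range(10)})


def append_trailing_marks(khmer_word: str, romanized_letters: str) -> str:
    """Append the modifier for each sign found in khmer_word after the last letter."""
    marks = [TRAILING_MARKS[ch] for ch in khmer_word if ch in TRAILING_MARKS]
    return romanized_letters + "".join(marks)


def romanize_digits(text: str) -> str:
    """Replace Khmer digits with 0-9 and leave everything else untouched."""
    return text.translate(KHMER_DIGITS)


# ដ៏: the letters romanize as "ṭa", and the sign is written last.
print(append_trailing_marks("\u178A\u17CF", "\u1E6Da"))
print(romanize_digits("\u17E2\u17E0\u17E0\u17E6"))
```

The caller is assumed to have romanized the letters of the word already; this helper only decides what is appended at the end.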

Special Characters and Character Modifiers used in Romanization
(See the MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media / CHARACTER SETS: Part 3 / Code Tables / January 2000, updated September 2004)

http://www.loc.gov/marc/specifications/specchartables.html
http://lcweb2.loc.gov/cocoon/codetables/45.html  (Extended Latin)

http://www.atla.com/tsig/LatinCharacters/Latin%20Characters%20in%20Unicode%20and%20MARC-8.pdf  (full set of examples)
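
The distinction drawn in the editor's remark under note 7, between the spacing characters U+02DA and U+02D9 and the combining characters U+030A and U+0307, can be checked with Python's standard unicodedata module. This is a small sketch for illustration; the helper name describe is hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and whether the character combines."""
    kind = "combining" if unicodedata.combining(ch) else "spacing"
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({kind})"


for ch in ("\u02DA", "\u030A", "\u02D9", "\u0307"):
    print(describe(ch))

# A combining ring attaches to the preceding letter and composes under NFC,
# while the spacing ring stays a separate character after the word.
combined = unicodedata.normalize("NFC", "y\u030A")
spaced = unicodedata.normalize("NFC", "y\u02DA")
print(len(combined), len(spaced))
```

In practice this means a romanization such as ksatriy˚ stored with the spacing ring keeps the mark as its own character after the final letter, whereas the combining ring would fuse with that letter.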