SEAlang Library -- Corpus Help

Corpus Searches

The SEAlang Library provides both bitext (two aligned languages) and monotext corpora. In all searches, each context is segmented into words on the fly, and is returned only if both its left- and right-hand sides appear to be genuine words (use a raw contexts search to override this).
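The filtering step described above can be sketched as follows. This is a minimal illustration, not SEAlang's actual segmenter, which this page does not describe. It assumes a greedy longest-match segmenter over a hypothetical word list, and treats each space-separated phrase as a unit. A match is kept only if the text on both sides of it segments completely into dictionary words, which rejects hits buried inside longer words. The function names and the toy dictionary are hypothetical, and the example uses unspaced English so the behavior is easy to follow.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: a context is (left-hand words, target, right-hand words).
Context = Tuple[List[str], str, List[str]]


def segment(text: str, dictionary: Set[str], max_len: int = 20) -> Optional[List[str]]:
    """Greedy longest-match segmentation (an assumed technique, for illustration).

    Returns the list of words, or None if some stretch of text cannot be
    matched against the dictionary. An empty string segments to [].
    """
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word fits.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            return None  # no dictionary word starts at position i
    return words


def find_contexts(phrases: List[str], target: str, dictionary: Set[str]) -> List[Context]:
    """Return contexts whose left- and right-hand sides both segment into words."""
    results: List[Context] = []
    for phrase in phrases:
        start = phrase.find(target)
        while start != -1:
            left = segment(phrase[:start], dictionary)
            right = segment(phrase[start + len(target):], dictionary)
            if left is not None and right is not None:
                results.append((left, target, right))
            start = phrase.find(target, start + 1)
    return results


if __name__ == "__main__":
    toy_dictionary = {"the", "cat", "sat", "on", "mat", "a", "con"}
    toy_phrases = ["thecatsat", "concatenate", "acatonthemat"]
    # "concatenate" is dropped: its right-hand side "enate" is not a word.
    for context in find_contexts(toy_phrases, "cat", toy_dictionary):
        print(context)
```

Greedy longest matching is only one possible approach; it can fail where a smarter segmenter would backtrack, and anything outside the dictionary is filtered out entirely, which is why a raw contexts search matters for names and other unlisted words.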

Search types and controls:

monolingual contexts: Performs a word-in-context search (similar to concordancing) and shows example contexts for each of the left- and right-hand collocates, providing practical usage examples. Clicking any of these starts a new collocate search.

monolingual collocates: Provides a bird's-eye view of the word's immediate neighbors, showing both left- and right-hand collocates. Clicking any of these starts a new context search.

raw contexts: Searches the monolingual corpus and shows the target in context without attempting any analysis or error correction. This is necessary for finding names, or words whose collocates might not be in the dictionary.

test contexts: The monolingual corpus is very large, but most typical behavior can be found in a much smaller subsample. This control sets the sample size: we search the corpus for all examples, then select this many at random for further investigation.

examples per collocate: Sets how many examples are shown for each collocate. The relation between a word and each collocate can usually be made clear with just a few examples. Because the test contexts are sampled at random, a new search will usually provide new examples for each collocate.
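The interplay of the test contexts and examples per collocate controls can be sketched as below. This is an assumption about how such sampling might work, not SEAlang's code: all matching contexts are collected, a random subset of size test_contexts is drawn, and at most examples_per_collocate contexts are kept for each left-hand (or right-hand) collocate, with _ standing for a space or end-of-line. The function name, parameters, and data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (left-hand words, target, right-hand words);
# an empty side means the target touches a space or end-of-line.
Context = Tuple[List[str], str, List[str]]


def sample_examples(
    contexts: List[Context],
    test_contexts: int,
    examples_per_collocate: int,
    side: str = "left",
    rng: Optional[random.Random] = None,
) -> Dict[str, List[Context]]:
    """Sample contexts at random, then keep up to N examples per collocate."""
    rng = rng or random.Random()
    # Draw the random subsample of all matches (never more than exist).
    sample = rng.sample(contexts, min(test_contexts, len(contexts)))
    grouped: Dict[str, List[Context]] = defaultdict(list)
    for left, target, right in sample:
        # The collocate is the word adjacent to the target; "_" marks a boundary.
        if side == "left":
            collocate = left[-1] if left else "_"
        else:
            collocate = right[0] if right else "_"
        if len(grouped[collocate]) < examples_per_collocate:
            grouped[collocate].append((left, target, right))
    return dict(grouped)


if __name__ == "__main__":
    hits = [(["the"], "cat", ["sat"])] * 50 + [(["a"], "cat", [])] * 30
    examples = sample_examples(hits, test_contexts=40, examples_per_collocate=5,
                               rng=random.Random(0))
    for collocate, items in examples.items():
        print(collocate, len(items))
```

Creating a fresh random generator for each search mirrors the note above that a new search usually shows new examples; passing a seeded generator instead makes a sample reproducible.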

A context search returns words in typical contexts. Below, using Thai examples, we first see five examples of the most common left-hand collocate, which accounted for 24.9% of left-hand matches. Next come five examples of the most common right-hand collocate, here _, which represents a space or end-of-line; thus the search word ended a phrase or sentence 20% of the time. Note that the left- and right-hand contexts are mixed.
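Percentages such as 24.9% and 20% are shares of the sampled matches. A minimal sketch of that arithmetic follows, assuming each context is stored as (left-hand words, target, right-hand words) and that an empty side means the target sits at a space or end-of-line, counted as _. The function name and sample data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Context = Tuple[List[str], str, List[str]]
Shares = List[Tuple[str, float]]


def collocate_shares(contexts: List[Context]) -> Tuple[Shares, Shares]:
    """Return left- and right-hand collocates with their percentage of all matches."""
    total = len(contexts)
    # The word touching the target on each side; "_" marks a space or end-of-line.
    left = Counter(ctx[0][-1] if ctx[0] else "_" for ctx in contexts)
    right = Counter(ctx[2][0] if ctx[2] else "_" for ctx in contexts)

    def as_percent(counts: Counter) -> Shares:
        # Most common first, rounded to one decimal place as on the results page.
        return [(word, round(100 * c / total, 1)) for word, c in counts.most_common()]

    return as_percent(left), as_percent(right)


if __name__ == "__main__":
    sample = [
        (["the"], "cat", ["sat"]),
        (["a"], "cat", []),
        (["the"], "cat", []),
        (["the"], "cat", ["on"]),
    ]
    left, right = collocate_shares(sample)
    print(left)   # [('the', 75.0), ('a', 25.0)]
    print(right)  # [('_', 50.0), ('sat', 25.0), ('on', 25.0)]
```

In this toy sample the target ends a phrase in two of four matches, so _ heads the right-hand list at 50%.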

A collocate search concentrates on the search word's immediate neighbors. For convenience, a combined collocates/context search embeds the context information in the page; it can be viewed by clicking the yellow boxes. Words that appear in blue are also dictionary entries.

If you look closely, you'll see that the figures in the two examples don't match exactly (in the second example, the most common left- and right-hand collocates occur 23.7% and 18.2% of the time). This is because, as noted above, each search draws a different random sample.
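The effect of drawing a different random sample each time can be reproduced with a toy population. The numbers here are invented for illustration: a hypothetical set of 1,000 right-hand collocates in which _ makes up exactly 20%, and a hypothetical helper name. Each sample of 500 usually gives a share near, but not always exactly at, 20%, while sampling everything recovers the exact figure.

```python
import random
from collections import Counter
from typing import List, Tuple


def top_share(collocates: List[str], test_contexts: int, rng: random.Random) -> Tuple[str, float]:
    """Most common collocate in a random sample, with its percentage of the sample."""
    sample = rng.sample(collocates, test_contexts)
    word, count = Counter(sample).most_common(1)[0]
    return word, round(100 * count / test_contexts, 1)


if __name__ == "__main__":
    # Invented population: "_" is 200 of 1,000 hits (20%); eight other words share the rest.
    population = ["_"] * 200 + [f"word{i}" for i in range(8) for _ in range(100)]
    for seed in (1, 2, 3):
        # Different seeds stand in for separate searches.
        print(top_share(population, 500, random.Random(seed)))
    # Sampling everything recovers the exact population figure.
    print(top_share(population, len(population), random.Random(0)))  # ('_', 20.0)
```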