Az Eszterházy Károly Tanárképző Főiskola Tudományos Közleményei. 1996. Vol. 1. Eger Journal of English Studies. (Acta Academiae Paedagogicae Agriensis : Nova series ; Tom. 24)
Ramesh Krishnamurthy: Change and continuity at COBUILD (1986-1996)
2.5 Wordforms

The 20-million-word corpus gave us information about c. 250,000 wordforms. However, many of these do not constitute valid candidates for dictionary entries. Most proper names need to be excluded. Regular inflected forms (such as plural forms of nouns, comparative forms of adjectives and adverbs, and inflected forms of verbs) that show no semantic, syntactic, or stylistic deviation from the base form will be subsumed under the entry for the base form. On the other hand, multi-word items (such as phrasal verbs, noun compounds, and idiomatic phrases) serve to extend the final inventory.

The 211-million-word corpus represented a tenfold increase in overall corpus size, but yielded only a twofold increase in the number of wordforms (c. 500,000). The proportions of proper names and regular inflected forms remained roughly the same.

2.6 Frequency

The increase in corpus size should not be considered solely in terms of the number of wordforms. The frequency of occurrence of each wordform is also extremely significant for lexicography. However, the increased frequencies are of little benefit to the lexicographer in the analysis of the very common words, because these words vary little in corpus frequency rank or in linguistic usage; their counts merely reflect the tenfold increase in overall corpus size:

20m corpus (1987)                    211m corpus (1995)
Word   Number of occurrences         Word   Number of occurrences
the                1,081,654         the               11,611,078
of                   535,391         of                 5,359,185
and                  511,333         to                 5,180,130
to                   479,191         and                4,941,561
a                    419,798         a                  4,537,660
in                   334,183         in                 3,796,752
that                 215,322         that               2,226,871
it                   198,578         it                 1,954,556
i                    197,055         is                 1,940,162
was                  194,286         for                1,794,630
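The claim that the counts for the commonest words merely scale with corpus size can be checked with a little arithmetic. The following Python sketch is illustrative only and is not COBUILD's own procedure. It normalises each raw count from the table to occurrences per million words, assuming that the nominal corpus sizes (20 and 211 million words) are exact totals, and sets this beside the growth in wordforms reported in 2.5. The names `CORPUS_SIZE`, `WORDFORMS`, `COUNTS`, `per_million`, `shared_words` and `compare` are hypothetical.

```python
# Compare the 20m (1987) and 211m (1995) corpora using the figures quoted above.
# Assumption (not from the source): the nominal sizes of 20 and 211 million words
# are treated as exact token totals, so every per-million rate is approximate.

CORPUS_SIZE = {"1987": 20_000_000, "1995": 211_000_000}
WORDFORMS = {"1987": 250_000, "1995": 500_000}  # the "c." figures from 2.5

# Raw occurrence counts of the ten most frequent words, copied from the table.
COUNTS = {
    "1987": {"the": 1_081_654, "of": 535_391, "and": 511_333, "to": 479_191,
             "a": 419_798, "in": 334_183, "that": 215_322, "it": 198_578,
             "i": 197_055, "was": 194_286},
    "1995": {"the": 11_611_078, "of": 5_359_185, "to": 5_180_130,
             "and": 4_941_561, "a": 4_537_660, "in": 3_796_752,
             "that": 2_226_871, "it": 1_954_556, "is": 1_940_162,
             "for": 1_794_630},
}


def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def shared_words() -> list[str]:
    """Words ranked in both top-ten lists, kept in their 1987 rank order."""
    return [word for word in COUNTS["1987"] if word in COUNTS["1995"]]


def compare() -> dict[str, tuple[float, float, float]]:
    """Map each shared word to (1987 rate, 1995 rate, raw-count growth factor)."""
    result = {}
    for word in shared_words():
        old, new = COUNTS["1987"][word], COUNTS["1995"][word]
        result[word] = (
            per_million(old, CORPUS_SIZE["1987"]),
            per_million(new, CORPUS_SIZE["1995"]),
            new / old,
        )
    return result


if __name__ == "__main__":
    corpus_growth = CORPUS_SIZE["1995"] / CORPUS_SIZE["1987"]
    wordform_growth = WORDFORMS["1995"] / WORDFORMS["1987"]
    print(f"corpus size grew {corpus_growth:.2f}x; wordforms grew {wordform_growth:.2f}x")
    for word, (rate_1987, rate_1995, growth) in compare().items():
        print(f"{word:>4}: {rate_1987:9.1f} -> {rate_1995:9.1f} per million (raw x{growth:.2f})")
```

Run as a script, it reports a corpus growth of 10.55x against a wordform growth of 2.00x. For the eight words ranked in both lists (the, of, and, to, a, in, that, it), the raw counts grow by factors between about 9.7 and 11.4, while the per-million rates stay within roughly 10% of their 1987 values; for example, the rate for "the" rises only from about 54,083 to about 55,029 occurrences per million. Per-million normalisation is used here simply because it is a common way to compare frequencies across corpora of different sizes.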