The Frequency Dictionaries series aims at producing dictionaries with comparable frequency data for a large number of different languages. For many of the languages featured in this collection, this series is the first comprehensive compilation to use a large-scale empirical base.
The dictionaries are available in both print and electronic versions. Each dictionary lists the 1,000 most frequent word forms in order of frequency and the 10,000 most frequent word forms in alphabetical order. Each also includes an introductory description of the data and the methodological approach used. In addition, language-specific statistical information is provided on letters, word structure and structural changes.
The enclosed CD-ROM contains a more comprehensive version of the dictionary as an e-book. It includes data on the relative frequency of up to 1,000,000 word forms. For less-resourced languages the lists are shorter because the corpora are smaller. The Georgian word list in this volume contains 1,000,000 word forms. This list of words (with frequency classes) is also available as a plain text file on the CD-ROM, ordered both alphabetically and by frequency. Using this file, word lists for various applications can be generated easily. The word forms in the printed part of the dictionary have been checked carefully by hand to identify incorrect forms. By contrast, the more comprehensive list on the CD-ROM has been checked only by means of automatic plausibility criteria.
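As a rough illustration of how such a word list might be derived from the plain text file, the following Python sketch filters entries by frequency class. The file layout is an assumption, not something specified here: one entry per line, with the word form and its frequency class separated by a tab, and lower class numbers taken to indicate more frequent words. The function names and the sample entries (including their class values) are hypothetical and serve only as a demonstration.

```python
import io


def load_word_list(lines):
    """Parse lines of the ASSUMED form 'word<TAB>frequency_class'."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        word, _, cls = line.partition("\t")
        if not cls.strip().isdigit():
            continue  # skip lines that do not match the assumed layout
        entries.append((word, int(cls)))
    return entries


def select_by_class(entries, max_class):
    """Keep word forms whose frequency class is at most max_class."""
    return [word for word, cls in entries if cls <= max_class]


# Invented sample data standing in for the file on the CD-ROM.
sample = io.StringIO("და\t0\nარის\t3\nწიგნი\t9\n")
entries = load_word_list(sample)
common_words = select_by_class(entries, max_class=3)
```

With the real file, the `io.StringIO` object would be replaced by an opened file handle, and the parsing would need to be adapted to the actual layout of the file.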
All dictionaries in the series were compiled from the comprehensive, electronically available sources of the Leipzig Corpora Collection. The corpora on which the individual frequency dictionaries are based include newspaper texts, Wikipedia articles and other randomly collected texts available on the Internet. They can be accessed online at http://corpora.uni-leipzig.de/. This series of dictionaries provides the opportunity to explore comparative linguistic topics as well as monolingual issues such as studies on word formation and frequency-based examinations of lexical areas for use in dictionaries or language teaching. The statistical results presented here can offer initial suggestions for several areas of research. The title of each frequency dictionary includes the name of the language in English and in the original language, together with its three-letter ISO 639-3 code.