Title Details: | |
Text corpora and applications |
|
Authors: |
Tantos, Alexandros |
Reviewer: |
Gkotsoulia, Paraskevi |
Description: | |
Abstract: |
Text corpora (TC) are one of the main linguistic resources for automated natural language processing. This chapter introduces basic concepts for a) the creation and b) the utilization of TC. First, after highlighting the qualitative difference between annotated and unannotated TC, it analyzes the basic criteria for selecting and classifying TC for targeted and more effective linguistic or non-linguistic processing of text data. Creating a corpus is a difficult task that requires observing several basic criteria for text selection, so that the collected language sample is representative of the linguistic variety it aims to represent. Next, the types of corpora are presented, along with examples of how they can be used. In addition, the reader becomes familiar with XML, the dominant markup language for the majority of annotated corpora today. The last part of the chapter presents the basic principles of probability theory that are necessary for a number of applications in computational linguistics. In this context, hypothesis formulation and testing serve as an example of the analysis of categorical variables in linguistic data: the process of hypothesis testing on corpora is described step by step on the basis of a concrete example. Hypothesis testing is an essential everyday tool for computational linguists and others who process linguistic data.
|
Technical Editors: |
Minos, Panagiotis |
Type: |
Chapter |
Creation Date: | 2015 |
Item Details: | |
License: |
http://creativecommons.org/licenses/by-nc-sa/3.0/gr |
Handle: | http://hdl.handle.net/11419/2210 |
Bibliographic Reference: | Tantos, A. (2015). Text corpora and applications [Chapter]. In Tantos, A., Markantonatou, S., Anastassiadis Symeonidis, A., & Kyriakopoulou, P. 2015. Computational Linguistics [Undergraduate textbook]. Kallipos, Open Academic Editions. https://hdl.handle.net/11419/2210 |
Language: |
Greek |
Is Part of: |
Computational Linguistics |
Publication Origin: |
Kallipos, Open Academic Editions |