Qualitative and Quantitative Analysis of Annotators' Agreement in the Development of Cast3LB

M. Civit*, A. Ageno†, B. Navarro‡, N. Bufí*, M.A. Martí*

* CLiC, Centre de Llenguatge i Computació, Adolf Florensa s/n (Torre Florensa), 08028 Barcelona, {civit, nuria}
† TALP Research Centre (UPC), Jordi Girona nº 3, 08034
‡ Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Campus de San Vicente del Raspeig, Apartado 99, 03080

1 Introduction

The main objective of this paper is to present a qualitative and quantitative analysis of disagreement among annotators in the development of the syntactic annotation of the Cast3LB corpus. This corpus is currently under development within the 3LB project¹ and comprises 100,000 Spanish words. At the syntactic level, more than 75% of the corpus has already been annotated.

According to [12], there are three main issues in the development of treebanks: a) systems for deriving structure automatically from unannotated language samples (parsers); b) specification of annotation schemes (targets for parser output); c) metrics for quantifying parsing accuracy.

From a general point of view, these metrics² measure the accuracy of an analysis with respect to a pre-established gold standard. They have been used to compare different analysis systems against the same reference corpus. The objective of these metrics is to provide information about the similarity of the data, but they do not provide information about where in the analysis the disagreement is located, nor about its nature.

¹ Project supported by the Spanish Government, Ministerio de Ciencia y Tecnología, PROFIT program (FIT-150500-2002-244). This work has also been partially funded by the X-TRACT-II project (BFF2002-04226-C03-03).
² The first definitions of these metrics appear at the PARSEVAL workshop.

The linguistic annotation of corpora is a complex task.
On the one hand, some linguistic expressions appearing in corpora pose problems that do not appear in grammars (or do not appear explicitly enough). On the other hand, a given linguistic structure may admit several different syntactic analyses, all of them correct. Finally, each person has a particular view of the language: each one interprets sentences in a subjective and specific way.

Since annotation is a team effort, it is important to know and assess the degree of consistency among the analyses given by each annotator. Annotation consistency is necessary to increase the quality of the corpus and its usefulness for training automatic systems or for linguistic research.

Another important issue, to which very little attention has been paid to date, is the evaluation of accuracy among annotators. The question is: how precisely can human beings analyze language structure? [12].

There is a limit to the human ability to analyze one's own language, so there is a limit to the accuracy of human annotation. In-depth studies of annotators' consistency are rare³. However, we can point out the work carried out in the NEGRA project [5]. G. Sampson and A. Babarczy are currently working on an experiment to compare the output of two human analysts applying the same parsing scheme independently to the same language samples [12]. They use the SUSANNE scheme to annotate a set of 20,000 words from the BNC. The quality of a treebank depends on the degree of agreement among annotators, whether or not there are errors. As errors seem to be unavoidable, the main goal is to reduce annotators' disagreement as much as possible. We must also take into account that, even if there is a limit to human performance, this performance is the upper bound for automatic language analysis.
Indeed, the way in which humans solve these problems is the reference criterion for the automatic analysis of languages.

³ There are some analyses of accuracy in semantic and morphological annotation: [13] and [1].

In this work, we present a study of annotators' agreement in building the Cast3LB treebank.

The first objective is to study the quantitative agreement among annotators at the constituent annotation level (see section 4). From the quantitative agreement we obtain measures of the annotators' agreement, and hence of the consistency of the syntactic annotation. Results over 90% are good enough to consider the resulting annotated corpus a good resource for syntactic analysis. The second objective is to study the qualitative agreement among annotators: we want to analyze and classify the specific cases of disagreement. At this point, we follow the proposal of Sampson and Babarczy [12].

Section 2 presents the methodology followed in this study. Section 3 presents the main characteristics of the Cast3LB project. Sections 4 and 5 present the quantitative and qualitative analyses. Finally, some conclusions are given in section 6.

2 Methodology

The purpose of this work is to make some contributions to the definition of a methodology for building treebanks.
The main steps we have followed are:

1. Definition of the main principles of the syntactic annotation.
2. Annotation of a subset of 100 sentences according to these principles.
3. Enlargement of the guidelines in order to increase their coverage.
4. Annotation of 220 sentences and refinement of the guidelines.
5. Checking of these 320 sentences (steps 2 and 4) against the annotation guidelines, followed by discussion and redefinition of the guidelines.
6. Annotation of 650 new sentences following the new guidelines.
7. Annotation of the last 30 sentences in order to perform the qualitative evaluation.

In every step, an automatic evaluation of the quantitative agreement has been carried out.

3 The Cast3LB Project

The Cast3LB project is the Spanish part of the 3LB project⁴. The objective of 3LB is to build three linguistically annotated corpora: one for Catalan (Cat3LB), one for Basque (Eus3LB) and one for Spanish (Cast3LB)⁵.

At the syntactic annotation level, we have followed two steps: the first is to bracket and tag the main constituents of each sentence; the second is to assign a function tag to the main constituents of each sentence.

⁴ URL:
⁵ See [11] for more details about the composition of the corpus, the annotation levels, etc.
⁶ These principles are the same as those of the Catalan corpus Cat3LB.

The main points of the annotation scheme are⁶:

- Only explicit elements of the sentences are annotated. However, since we annotate anaphoric and coreferential relations, we have decided to introduce a special node for elliptical subjects. Regarding verbal ellipsis, we mark this linguistic phenomenon with the symbol (*) in the sentence tag.
- We do not alter the word order. Spanish is a free constituent order language, and this order carries functional and communicative content. If we changed this order, we would lose this information and alter the original data.
- We follow a constituency-based annotation.
- Finally, our aim is to develop a neutral annotation scheme, in the sense that we do not follow any specific linguistic theory. Our objective is to develop a linguistically annotated corpus useful for as many people as possible; if we followed a specific linguistic theory, the result of the project would be tied to that theory and might be of little use for some studies or purposes. Our idea is to give an annotation that is unmarked with respect to any theory.

The general annotation scheme of Cast3LB is described in [8] and [9].

The corpus has previously been morphologically annotated and disambiguated, on the one hand, and automatically parsed with a chunker [7], on the other. The work of the human annotators focuses on the bracketing and labelling of each constituent.

In order to facilitate the annotators' task, we have adapted and developed some annotation tools: we are using an adaptation of the AGTK toolkit [10] for the syntactic annotation and will use 3LB-SAT [3] for the semantic one.

4 Quantitative Analysis

In this section we present the quantitative analysis of discrepancies among annotators. First, we describe the measures used; then we present the results; finally, we discuss them.

4.1 Description

Since no specific measures exist for the quantitative comparison of annotators' agreement, we have decided to use some of the measures used for the evaluation of grammars and/or analysis methods. The need for accurate evaluation when developing wide-coverage analysers is widely acknowledged. It is beyond the scope of this paper to describe the existing evaluation systems in detail (two excellent reviews of the different methods defined since 1991 can be found in [6] and [2]). In our case, we have decided to use what might be considered the first objective measures, namely the ones defined in the Parseval workshops [4], originally intended to evaluate wide-coverage syntactic analysers for the English language.
Though not the only option, their use is quite standardised for the evaluation of grammars and/or analysis methods: the results obtained are compared for similarity with the reference parse trees (the ones previously considered correct, also known as the gold standard). These similarity measures are based on the comparison of the constituents of both parse trees, on the grounds of their span (their initial and final position in the sentence) as well as of their label, using recall (an attempt to measure coverage) and precision (a standard measure of accuracy).

The specific metrics used are the following:

1. Labelled Precision Rate: number of constituents in the evaluated parse tree that coincide completely (both label and span) with one constituent in the reference parse, divided by the total number of constituents in the evaluated parse tree.
2. Bracketed Precision Rate: number of constituents in the evaluated parse tree whose span coincides with that of any constituent in the reference parse, divided by the total number of constituents in the evaluated parse tree.
3. Labelled Recall Rate: number of constituents in the evaluated parse tree that coincide completely (both label and span) with one constituent in the reference parse, divided by the total number of constituents in the reference parse tree.
4. Bracketed Recall Rate: number of constituents in the evaluated parse tree that span the same as some constituent in the reference parse, divided by the total number of constituents in the reference parse tree.
5. Consistent Brackets Recall Rate: number of constituents in the evaluated parse tree not crossing with any constituent in the reference parse tree, divided by the total number of constituents in the reference parse tree.
   A constituent with boundaries $(i, j)$ is considered to cross another constituent with boundaries $(i', j')$ iff $i < i' \leq j < j'$, that is, if the boundaries overlap but neither constituent is completely included in the other.

In other words, recall indicates the proportion of correct constituents that are hypothesized, whereas precision is the proportion of hypothesized constituents that are correct. The two bracketed measures are less strict, since they only take into account the words of the sentence spanned by the constituents, ignoring the nonterminal label assigned to them. As to the Consistent Brackets Recall Rate, this measure is even less strict, since it considers only the proportion of constituents of
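
The five rates listed above can be illustrated with a minimal Python sketch. It is not the evaluation software used in the project. It assumes that each constituent is represented as a hypothetical (label, first word index, last word index) tuple, that both trees are non-empty, and that the crossing condition is tested in both directions. The names `crosses` and `agreement_rates` and the toy trees are invented for illustration.

```python
from typing import Dict, List, Tuple

# Assumed representation: (label, index of first word, index of last word).
Constituent = Tuple[str, int, int]


def crosses(a: Constituent, b: Constituent) -> bool:
    """Crossing condition as stated in the text: i < i' <= j < j'."""
    _, i, j = a
    _, i2, j2 = b
    return i < i2 <= j < j2


def agreement_rates(evaluated: List[Constituent],
                    reference: List[Constituent]) -> Dict[str, float]:
    """Compute the five rates for one evaluated/reference pair.

    Assumes both lists are non-empty (no guard against division by zero).
    """
    ref_full = set(reference)
    ref_spans = {(start, end) for _, start, end in reference}

    # Same label and same span as some reference constituent.
    labelled = sum(1 for c in evaluated if c in ref_full)
    # Same span as some reference constituent, label ignored.
    bracketed = sum(1 for _, s, e in evaluated if (s, e) in ref_spans)
    # Crossing with no reference constituent; checking both argument
    # orders is an assumption of this sketch.
    consistent = sum(
        1 for c in evaluated
        if not any(crosses(c, r) or crosses(r, c) for r in reference)
    )

    n_eval, n_ref = len(evaluated), len(reference)
    return {
        "labelled_precision": labelled / n_eval,
        "bracketed_precision": bracketed / n_eval,
        "labelled_recall": labelled / n_ref,
        "bracketed_recall": bracketed / n_ref,
        "consistent_brackets_recall": consistent / n_ref,
    }


# Toy five-word sentence (word indices 0..4); both trees are invented.
reference_tree = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)]
relabelled_tree = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("PP", 3, 4)]
crossing_tree = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4)]

print(agreement_rates(relabelled_tree, reference_tree))
print(agreement_rates(crossing_tree, reference_tree))
```

In the first toy comparison only one label differs, so the labelled rates fall to 0.75 while the bracketed and consistent-brackets rates stay at 1.0. In the second, the NP spanning words 0 to 2 crosses the reference VP spanning words 2 to 4, so all five rates are 0.75.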