Automatic Detection of Contradictions in Texts

Inaugural dissertation submitted for the degree of Doctor of Philosophy (Dr. phil.) at Department 05 – Language, Literature, Culture of Justus-Liebig-Universität Gießen

Submitted by Natali Karlova-Bourbonus, M.A., from Frankfurt am Main, 2018

Chair: Prof. Dr. Thomas Möbius
First examiner: Prof. Dr. Henning Lobin
Second examiner: Prof. Dr. Helmut Feilke

For my little prince Philipp

Acknowledgments

Five years have now passed since I began my doctoral thesis on contradictions in news texts, and at this point I would like to thank everybody who helped me finalize it.

First of all, I would like to express my deepest gratitude to my Doktorvater, Prof. Dr. Henning Lobin, who gave me the priceless opportunity to gain new experience in teaching and research and to learn how exciting natural language processing can be. Dear Henning, thank you for supporting me the whole, often thorny, way to the completed monograph, from the first half-baked ideas to the last dot on the paper.

I would also like to express my sincere gratitude to my second supervisor, Prof. Dr. Helmut Feilke. Participation in his GCSC-Kolloquium in 2014 led to a long-desired breakthrough in my research. I am grateful to him for sharing precious ideas with me, which laid the groundwork for my research.

While reading the present monograph, you will come across the statement several times that finding contradictions is a challenging task for a human. I would like to thank all the students at the University of Giessen who did not let this challenge discourage them and who, in 2015 and 2016, took part in the surveys conducted within the framework of the present study. Their patience, curiosity, and ability to explore latent things in texts are admirable.

Last but not least, I would like to thank the most important people in my life – my family.
My parents, who always let me go my own way, who never judged me for my decisions, and who patiently and selflessly supported me in seemingly inextricable moments. My husband Nico, who inspired and motivated me to make possible what in the beginning seemed to be impossible. And my brother Denis, for the exciting conversations on the processing of natural languages.

Table of Contents

List of Abbreviations ..... V
List of Figures ..... VIII
List of Tables ..... IX
1 Introduction ..... 11
1.1 Statement of Problem and Motivation ..... 11
1.2 Subject of the Study ..... 16
1.3 Research Questions and Objectives ..... 17
1.4 Structure of the Thesis ..... 18
1.5 How to Read This Thesis ..... 20
2 State of the Art ..... 21
2.1 Methods and Systems ..... 21
2.2 Corpora of Contradictions ..... 32
2.2.1 FraCas Inference Data Suite ..... 32
2.2.2 RTE Datasets and Their Modifications ..... 34
2.2.3 Stanford Corpus of Real-Life Contradictions ..... 37
2.2.4 SNLI Corpus ..... 38
2.3 Summary ..... 39
3 Contradiction in Logic and Language ..... 41
3.1 Contradiction as Concept of Logic ..... 41
3.1.1 Contradiction in Traditional (Aristotelian) Logic ..... 41
3.1.2 Contradiction and Contrariety ..... 43
3.1.3 Contradiction and Quantification ..... 43
3.1.4 Related Terms ..... 45
3.2 Negation in Natural Languages ..... 46
3.2.1 Typology of Negation ..... 46
3.2.2 Negation and Word Order ..... 52
3.2.3 Scope of Negation ..... 52
3.2.4 Double and Multiple Negation ..... 55
3.3 Problematic Issues about Contradiction ..... 59
3.3.1 Contradiction and Presupposition ..... 59
3.3.2 Contradiction and Modality ..... 61
3.3.3 Contradiction and Vagueness ..... 64
3.3.4 The Concept of Fake Contradiction ..... 66
3.4 Classification of Contradictions ..... 67
3.4.1 Classification of Svintsov ..... 67
3.4.2 Classification of Mučnik ..... 69
3.4.3 Typology according to Contradictory Element ..... 71
3.4.3.1 In Educational Psychology ..... 71
3.4.3.2 In Computational Linguistics ..... 73
3.5 Causes and Functions of Contradictions ..... 76
3.5.1 Causes ..... 76
3.5.2 Functions ..... 79
3.6 Summary ..... 81
4 The Characteristics of News Texts ..... 83
4.1 Introductory Notions ..... 83
4.1.1 News as Text Genre ..... 83
4.1.2 Online Newspaper vs. Printed Newspaper ..... 86
4.1.3 Values of Selection and Production of News Articles ..... 88
4.2 Structure and Elements of News Articles ..... 91
4.3 Language of News Article ..... 95
4.3.1 Reported Speech ..... 95
4.3.1.1 News as an “embedded talk” ..... 95
4.3.1.2 Types of Reported Speech ..... 96
4.3.1.3 Reporting Expressions ..... 98
4.3.2 News Actors Labeling ..... 99
4.3.3 Event Categories ..... 99
4.3.4 Time and Place Mentions ..... 100
4.3.5 The Use of Numbers and Figures ..... 101
4.3.6 Other Characteristics ..... 102
4.4 Summary ..... 102
5 Typology Construction: Types of Contradictions in News Texts ..... 104
5.1 Compilation of News Text Contradiction Corpus ..... 104
5.1.1 Data Collection ..... 104
5.1.1.1 Collection of News Texts ..... 104
5.1.1.2 Survey 1: Finding the Contradictions ..... 106
5.1.1.3 Results and Evaluation ..... 107
5.1.2 Data Validation and Filtering ..... 108
5.1.2.1 Survey 2: Contradiction or Not ..... 108
5.1.2.2 Results and Evaluation ..... 109
5.2 Typology of Contradictions ..... 110
5.2.1 Dimension: Contradiction Cues ..... 110
5.2.2 Dimension: Relatedness of the Parts ..... 116
5.3 Giessen Annotated Corpus of Contradictions in News Texts ..... 127
5.3.1 Survey 3: Typology Validation, Results, and Evaluation ..... 127
5.3.2 Corpus Annotation ..... 128
5.4 Summary ..... 131
6 Conceptual Design of a CD System and Supporting Tools ..... 133
6.1 Conceptual Design ..... 133
6.2 Supporting Tools and Methods ..... 134
6.2.1 Processing at Lexical, Morphological and Syntax Levels ..... 134
6.2.1.1 Tokenization and Sentence Splitting ..... 134
6.2.1.2 Stop Word Detection and Removing ..... 136
6.2.1.3 Part-of-speech Tagging ..... 137
6.2.1.4 Stemming and Lemmatization ..... 138
6.2.1.5 Parsing and Chunking ..... 140
6.2.2 Processing at Semantic, Pragmatic and Discourse Levels ..... 141
6.2.2.1 Semantic Role Labeling ..... 141
6.2.2.2 Recognizing Textual Entailment ..... 145
6.2.2.3 Anaphora Resolution ..... 148
6.2.3 Further Processing Tasks ..... 155
6.2.3.1 Negation and Modality Processing ..... 155
6.2.3.2 Sentiment Analysis ..... 157
6.2.3.3 Named-Entity Recognition ..... 159
6.2.3.4 Temporal Processing ..... 163
6.2.3.5 Measuring Semantic Textual Similarity ..... 165
6.2.4 Approaches to Meaning Representation ..... 166
6.2.5 Computational Sources of Knowledge ..... 169
6.2.5.1 Lexical Resources ..... 169
6.2.5.2 Ontologies ..... 173
6.3 Summary ..... 176
7 Physical Design of a CD System and Implementation ..... 178
7.1 System Architecture and Potential Contradiction ..... 178
7.2 System Implementation ..... 181
7.2.1 Module Preprocessing ..... 181
7.2.2 Module Finding Parts of Contradiction ..... 183
7.2.3 Module Finding Contradictions ..... 187
7.3 Results and Evaluation ..... 190
8 Conclusions ..... 194
Bibliography ..... 198
Appendix A.
Survey 1: An Example of a Questionnaire ..... 229
Appendix B. Survey 2 and A Test for Contradiction ..... 257

List of Abbreviations

ACE  Automatic Content Extraction Program
ACL  Association for Computational Linguistics
BART  Baltimore Anaphora Resolution Toolkit, currently Beautiful Anaphora Resolution Toolkit
CBOW  Continuous Bag of Words
CCG  Componential Counting Grid
CD  Contradiction Detection
CFG  Context-free Grammar
COLING  International Conference on Computational Linguistics
CoNLL  The SIGNLL Conference on Computational Natural Language Learning
CoU  Context of Utterance
DIRT  Discovery of Inference Rules from Text
DNA  Duplex Negatio Affirmat
DNN  Duplex Negatio Negat
DOLCE  Descriptive Ontology for Linguistic and Cognitive Engineering
DRT  Discourse Representation Theory
DSM  Distributional Semantic Model
EDITS  Edit Distance Textual Entailment Suite
EMD  Earth Mover’s Distance
EOP  EXCITEMENT Open Platform
ESA  Explicit Semantic Analysis
EXCITEMENT  EXploring Customer Interactions through Textual EntailMENT
FOL  First-Order Logic
FraCas  A Framework for Computational Semantics
GLSA  Generalized Latent Semantic Analysis
HAL  Hyperspace Analogue to Language
HLBL  Hierarchical Log-bilinear Model
HOTCoref  Higher Order Tree Coreference
kNN  k-Nearest-Neighbor
LDA  Latent Dirichlet Allocation
LDC  Linguistic Data Consortium
LEM  Law of Excluded Middle
LNC  Law of Non-Contradiction
LSA  Latent Semantic Analysis
LSI  Latent Semantic Indexing
mSDA  Marginalized Stacked Denoising Autoencoders
MUC  Message Understanding Conference
NC  Negative Concord
NER  Named Entity Recognition
NLP  Natural Language Processing
NLTK  Natural Language Toolkit
NLU  Natural Language Understanding
NomBank  Noun Annotation Bank
NPI  Negative Polarity Item
Okapi BM25  Okapi Best Matching 25
OWL  Web Ontology Language
PropBank  Propositional Bank
RDF  Resource Description Framework
RST  Rhetorical Structure Theory
RTE  Recognizing Textual Entailment
S, V and O  Subject, Verb and Object
SDRT  Segmented Discourse Representation Theory
SIGLEX  Special Interest Group on the Lexicon of the Association for Computational Linguistics
SIGNLL  Special Interest Group on Natural Language Learning of the Association for Computational Linguistics
SNLI  Stanford Natural Language Inference
STS  Semantic Textual Similarity
SUMO  Suggested Upper Merged Ontology
SVM  Support Vector Machines
T and H  Text and Hypothesis
TF-IDF  Term Frequency – Inverse Document Frequency
TnT  Trigrams’n‘Tags
TREC  Text Retrieval Conference
VENSES  Venice Semantic Evaluation System
WMD  Word Mover’s Distance
XML  Extensible Markup Language
YAGO  Yet Another Great Ontology

List of Figures

Figure 1: Square of Opposition. ..... 44
Figure 2: Multiple negation in English and other languages. ..... 55
Figure 3: Classification of contradictions proposed in Mučnik (1985). ..... 69
Figure 4: XML annotation layout of the corpus. ..... 129
Figure 5: Distribution of contradictions in the corpus according to contradiction cues. ..... 130
Figure 6: Distribution of types of contradictions in the Giessen Annotated Corpus of Contradictions in News Texts. ..... 131
Figure 7: Meaning of John pushed the cart as a conceptual dependency graph. ..... 167
Figure 8: The multiple senses of the noun dog as represented in WordNet. ..... 170
Figure 9: Excerpt of the DBpedia. Note: From DBpedia and the live extraction of structured data from Wikipedia (Morsey et al. 2012: 5). .....
175
Figure 10: The knowledge of the concept eat sandwich as represented in ConceptNet. ..... 176
Figure 11: The architecture of the Contradictio system. ..... 179
Figure 12: An information graph for the noun Obama from the sentence Obama speaks to the media in Illinois. ..... 183
Figure 13: The idea underlying the word mover’s distance model. Note: From From Word Embeddings to Document Distances (Kusner et al. 2015: 957). ..... 185

List of Tables

Table 1: CD systems submitted for the RTE-3 challenge – Extended Task (2007). ..... 24
Table 2: CD systems submitted for the RTE-4 challenge (2008). ..... 25
Table 3: CD systems submitted for the RTE-5 challenge (2009). ..... 26
Table 4: Standalone CD systems. ..... 27
Table 5: Comparison of approaches to graph alignment applied in de Marneffe et al. (2008). ..... 31
Table 6: The distribution of premises in the FraCas corpus. ..... 33
Table 7: The distribution of answers in the FraCas corpus. ..... 33
Table 8: The distribution of the problems per group in the FraCas corpus. ..... 34
Table 9: The statistics on RTE datasets partially adapted from Bentivogli et al. (2009). ..... 35
Table 10: Number of contradictions in the RTE-1, RTE-2, and RTE-3 datasets. ..... 36
Table 11: Distribution of contradictions occurring in the RTE-3 development dataset according to the contradiction type. ..... 37
Table 12: Distribution of contradictions occurring in the Stanford Corpus of Real-Life Contradictions according to the contradiction type. ..... 38
Table 13: Distribution of contradictions, entailments, neutral, and unlabeled cases in the SNLI corpus. ..... 38
Table 14: Contradiction types distinguished in de Marneffe et al. (2008: 1041) with examples. ..... 75
Table 15: Definitions of Aristotle’s fallacies as provided in Parry/Hacker (1991: 423-457). ..... 78
Table 16: Types of reported speech. Note: From News discourse (Bednarek/Caple 2012: 92). ..... 97
Table 17: The distribution of news articles according to the topic of the news story, their source, and date of publishing. ..... 105
Table 18: Distribution of English language competencies. ..... 107
Table 19: Distribution of agreement and disagreement on contradictions among the raters. ..... 109
Table 20: The scores of inter-rater agreements on contradictions before and after discussion, computed with Fleiss’s Kappa and Krippendorff’s Alpha. ..... 109
Table 21: Knowledge-based inferences from “How Leisure Came”. Note: From Constructing inferences during narrative text comprehension (Graesser et al. 1994: 375). ..... 120
Table 22: Alleged presupposition triggers as listed in Potts (2015), Beaver and Geurts (2014), and Levinson (1983). ..... 122
Table 23: The scores of inter-rater agreements on the type of relatedness, contradiction cue, and contradiction type, before and after discussion, computed with Fleiss’s Kappa and Krippendorff’s Alpha. ..... 128
Table 24: Universal thematic roles. Note: From Understanding semantics (Löbner 2013: 123). ..... 141
Table 25: Simplified VerbNet entry for the hit-18.1 class. ..... 144
Table 26: Overview of anaphora resolution methods (*Types of cohesion tie are not treated in detail). Note: From Anaphora resolution and text retrieval (Schmolz 2015: 236). ..... 152
Table 27: Available corpora with annotated named entities. ..... 162
Table 28: A DSM for the concept dog. Note: From DSM Tutorial (Stefan Evert et al. 2009-2016). ..... 169
Table 29: WordNet 2.1, 3.0 and 3.1 database statistics. ..... 171
Table 30: Confusion matrix for the system’s performance (in columns) compared to the Gold standard (in rows). ..... 191
Table 31: The overall performance of the system evaluated using precision and recall. ..... 191
Table 32: Confusion matrix for contradiction types found by the system and contained in the dataset (Rows: Gold standard, Columns: Contradictio system). .....
192
Table 33: Confusion matrix for the system performance in finding parts of contradiction (Gold standard in rows, system performance in columns). ..... 193

1 Introduction

1.1 Statement of Problem and Motivation

Please read carefully the following text passage from Daniel Defoe’s adventure novel Robinson Crusoe1 (Defoe 2001: 58-59):

“When I came down from my apartment in the tree I looked about me again, and the first thing I found was the boat, which lay as the wind and the sea had tossed her up upon the land, about two miles on my right hand. I walked as far as I could upon the shore to have got to her; but found a neck or inlet of water between me and the boat, which was about half a mile broad […]. I resolved, if possible, to get to the ship; so I pulled off my clothes, for the weather was hot to extremity, and took the water. But when I came to the ship, my difficulty was still greater to know how to get on board; for as she lay aground, and high out of the water, there was nothing within my reach to lay hold of. I swam round her twice, and the second time I spied a small piece of rope, which I wondered I did not see at first, hang down by the fore-chains so low as that with great difficulty I got hold of it, and by the help of that rope got up into the forecastle of the ship. Here I found that the ship was bulged, and had a great deal of water in her hold, but that she lay so on the side of a bank of hard sand, or rather earth, that her stern lay lifted up upon the bank, and her head low, almost to the water. By this means all her quarter was free, and all that was in that part was dry; for you may be sure my first work was to search and to see what was spoiled and what was free.
And first I found that all the ship's provisions were dry and untouched by the water; and being very well disposed to eat, I went to the bread-room and filled my pockets with biscuit, and eat it as I went about other things, for I had no time to lose.”

Could you recognize the contradiction in the passage? The described scene is the topic of many discussions concerning the work of Daniel Defoe (e.g. Baines 2007) and, more generally, of discussions dealing with logical mistakes occurring in texts of classic literature. Robinson Crusoe (referred to by the pronoun I throughout the text passage) intended to get to the wrecked ship. As there was a neck of water in between, he had to swim. Before swimming, and as the weather was hot, Robinson took off his clothes. After reaching the ship and being on it for some time, Crusoe found that some of the provisions had remained dry. He therefore went to the bread-room and filled his pockets with biscuits (I filled my pockets with biscuit). But how could he do this? Filling one's pockets with anything presupposes wearing some clothes with pockets at the time of filling. But as we were told at the beginning of the text passage, Robinson Crusoe had taken off his clothes before swimming. From reading this, the reader inferred that Robinson Crusoe did not have pockets during his stay aboard the ship. There was also no information or further clues that Crusoe had found any clothes and put them on.

1 The full title of the novel is The Life and Strange Surprising Adventures of Robinson Crusoe, Of York, Mariner: Who lived Eight and Twenty Years, all alone in an un-inhabited Island on the Coast of America, near the Mouth of the Great River of Oroonoque; Having been cast on Shore by Shipwreck, wherein all the Men perished but himself. With an Account how he was at last as strangely deliver'd by Pyrates. The novel was first published in 1719.
We are obviously dealing with a contradiction here – two statements express propositions that cannot be true at the same time in the same respect.

It is beyond debate that the recognition of contradictions presents a challenging task for the reader (Markman 1979; Garner 1980, 1981), and especially for poor readers (Garner 1980, 1981; Winograd/Johnston 1982). How well an individual performs in detecting contradictions depends on the state of his or her language and world knowledge, analytical ability, and memory, as well as on individual characteristics such as, e.g., age (Kotsonis/Patterson 1980; Chan et al. 1987; Vosniadou et al. 1988; Otero/Campanario 1990). The type of contradiction (Markman 1979; Markman/Gorin 1981; Harris et al. 1981; Flavell et al. 1981; Paris/Myers 1981; Garner 1981; Baker 1985) and prior notification of the presence of contradictions in the text (Winograd/Johnston 1982; Glenberg et al. 1982; August et al. 1984; Baker/Zimlin 1989) can be crucial for the success of this task as well.

Only a few attempts have been made to reveal and describe the processes involved in the recognition of contradictions by a human. The most prominent theories have been developed by psychologists within the framework of reading comprehension research and are described in Otero and Kintsch (1992), Singer (1996), Johnson-Laird et al. (2004), and van den Broek et al. (2005). The proposed theories differ with respect to the model of reading comprehension on which they are based.

The focus of the present study is contradictions occurring in and between online news texts. There are a number of definitions of contradiction, which, according to Grim (2004), can be grouped into four classes: (1) those which define contradiction in terms of truth and falsity (Prior 1967: 458; Bonevac 1987: 25; Wolfram 1989: 163; Sainsbury 1991: 369), such as (D1) below; (2) in terms of content or form (Reichenbach 1947: 36; Mendelson 1964: 18; Haack 1978: 244; Kalish et al. 1980: 18; Forbes 1994: 102), such as (D2); (3) in terms of assertion and denial (Strawson 1952, 2011: 16-19; Quine 1959: 9; Brody 1967: 61; Kahane 1995: 308), such as (D3); and (4) as a state of affairs (Routley/Routley 1985: 204), such as (D4). Grim (2004) refers to these four groups as semantic, syntactic, pragmatic, and ontological, respectively.

D1 Two propositions are contradictories if and only if it is logically impossible for both to be true and logically impossible for both to be false. (Sainsbury 1991: 369)
D2 Wff* of the form ‘A & ¬A’; statement of the form ‘A and not A’ (Haack 1978: 244)
D3 A contradiction both makes a claim and denies that very claim. (Kahane 1995: 308)
D4 A contradictory situation is one where both B and ¬B (it is not the case that B) hold for some B. (Routley/Routley 1985: 204)

Though these definitions can be used by humans for recognizing contradictions, they are – with the exception of the third group of definitions, and even then only under certain limitations – difficult to apply in practice for the purpose of the study, which is the development of a system for the automatic detection of contradictions in news texts. For instance, no machine is at present capable of determining the truth value of an arbitrary sentence.

It is obvious that most of the above definitions build, to some degree, on one of the three versions of Aristotle’s Law of Non-Contradiction (Section 3.1.1). Thus, the third group of definitions, for example, seems to reflect the ontological version of the law (not to be confused with Grim’s ontological definition), which states that “it is impossible that the same thing can at the same time both belong and not belong to the same object and in the same respect, and all other specifications that might be made, let them be added to meet local objections” (Metaphysics IV 3 1005b19–23).
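For propositional formulas, at least, the semantic definition (D1) and the syntactic definition (D2) are mechanically checkable: a statement of the form A and not A is false under every truth-value assignment, and A and its negation can be neither jointly true nor jointly false. The following Python sketch illustrates this by brute-force truth-table enumeration; it is purely illustrative and is not part of the system developed in this thesis.

```python
from itertools import product

def contradictories(p, q, n_vars):
    """Check (D1) for two propositional functions p and q over n_vars
    Boolean variables: they are contradictories iff no assignment makes
    both true and no assignment makes both false."""
    rows = list(product([False, True], repeat=n_vars))
    never_both_true = not any(p(*r) and q(*r) for r in rows)
    never_both_false = not any(not p(*r) and not q(*r) for r in rows)
    return never_both_true and never_both_false

# (D2): 'A and not A' is false in every row of its truth table.
assert all((a and not a) is False for a in (False, True))

# A and its negation satisfy (D1): they are contradictories.
assert contradictories(lambda a: a, lambda a: not a, 1)

# Two statements that can both be false fail (D1): e.g. 'A and B'
# vs. 'not A and not B' are both false when A is true and B is false.
assert not contradictories(lambda a, b: a and b,
                           lambda a, b: not a and not b, 2)
```

The last check also shows why (D1) has two clauses: ruling out joint truth alone is not enough, since statements that merely exclude one another may still both be false.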
In our opinion, this formulation is more applicable to the development of a system for the automatic detection of contradictions and will therefore be given preference for the purposes of the present study.

It should be noted that, besides contradiction, contrariety will also be considered in the present study. Though both terms will be referred to here as contradiction (compare German: kontradiktorischer Widerspruch vs. konträrer Widerspruch), they have to be clearly distinguished, as they are not synonymous. The difference between contradiction and contrariety will be presented in Section 3.1.2.

According to the survey on news consumption across twelve countries conducted in 20152 by the Reuters Institute for the Study of Journalism, Oxford University, covering four channels of news access – television, online (including social media), radio, and printed newspapers – the first two appeared to be the most popular ways of accessing news on a weekly basis. Television was the number-one source in, i.a., Germany (82%), France (80%), and the UK (75%), and online access in, i.a., Urban Brazil (91%), Finland (90%), Spain (86%), and Denmark (85%). However, taking into consideration that this survey was conducted online and thus may underrepresent users who do not use online services, it can be concluded that TV news is still ahead in the countries that participated in the survey, with the clear exception of the United States and possibly Denmark, Finland, and Australia. Moreover, comparing news consumption among people of different ages, it can be observed that young people prefer online news and often completely abandon television news. This trend is especially pronounced in the United States, France, and Denmark.
2 The online report on the survey can be found by following this link: http://www.digitalnewsreport.org/survey/2015/sources-of-news-2015/

To study online news consumption in particular, the Reuters Institute conducted a survey across 36 countries (i.a., the USA, Mexico, Australia, and EU countries) on five continents.³ According to the survey, around half of the participants (54%) across all countries, with a predominance in Southern Europe and Latin America, prefer social media as a source of news over other sources. In Spain, Germany, and France, however, a reverse or slowing trend can be observed. Further, the report shows that 23% use messaging apps (e.g., WhatsApp, Viber, WeChat, FB Messenger, Line, KakaoTalk) for accessing news on a weekly basis. Additionally, it was found that access to news via smartphones had increased in comparison to computers and tablets, amounting to 56%, a share that had doubled since 2013.

In the Internet era, it is not only readers’ preferences regarding news sources (especially those of young readers) that have changed. The journalistic practice of news production, i.e., information collection and reporting, has been influenced by the possibilities provided by the Internet as well. A number of studies have been conducted which reveal the changes the Internet has brought to the process of news production, including Reddick and King (2001), Miller (1998), Singer (2003), and Fenton (2012), among others. Fenton (2012) summarizes the research findings, i.a., under the umbrella of criteria such as (data transfer) speed and (web) space. For journalistic practice, the great amount of space provided on the web means the production of more news. Fenton (2012: 559) frames this as “space equals more news”. Space provides the possibility of archiving and updating the news, achieving “more depth of information coverage” (ibid.). Space also allows news to be stored in different multimedia formats, not only as text.
Space and speed enable a geographical reach such that journalists no longer need to leave their newsroom to write about events happening anywhere in the world. The speed enabled by the Internet, in turn, means an increasing value of immediacy for the practice of news production (Fenton 2012). However, while the immediate release and updating of news texts is doubtlessly an advantage for the news reader, it is unfortunately often only possible at the cost of information quality (Gunter 2003; Fenton 2012; Silvia 2001). Taking advantage of the Internet’s speed, news organizations often publish their news on the web “before the usual checks for journalistic integrity have taken place” (Fenton 2012: 561). This in turn means that news texts often include typographical, factual, and logical errors, violating accuracy as one of the fundamental values of news text production, misinforming the reader, and negatively affecting the credibility of the newspaper (Bell 1991; Maier 2005; Bednarek/Caple 2012).

3 The Digital News Report 2017 on the survey, published online, can be found by following this link: http://www.digitalnewsreport.org/survey/2017/resources-2017/

Factual errors, according to Silverman (2007), represent the most frequent kind of error occurring in news texts. In contrast to typographical and logical errors, which can be recognized within the text itself, incorrect facts can be revealed only by applying world knowledge or by referring to the original or other related information sources. Typographical errors, in turn, are not critical and can nowadays be easily recognized by means of autocorrection. By contrast, logical errors, which result from a violation of logical laws, e.g., the Law of Non-Contradiction (LNC) and the Law of Excluded Middle (LEM), are the most challenging kind of errors to recognize.
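The claim that typographical errors are easy to catch can be made concrete: autocorrection typically rests on string edit distance, i.e., comparing a misspelled token against a lexicon. The following is only an illustrative sketch with a hypothetical three-word lexicon, not a description of any particular autocorrection tool:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical mini-lexicon; a real autocorrector would use a full dictionary.
LEXICON = ["government", "minister", "parliament"]

def suggest(word: str) -> str:
    """Return the lexicon entry closest to the (possibly misspelled) word."""
    return min(LEXICON, key=lambda w: levenshtein(word, w))

print(levenshtein("kitten", "sitting"))  # 3
print(suggest("goverment"))              # government
```

Factual and logical errors offer no such surface-level shortcut, which is precisely why they are so much harder to detect automatically.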
In practice, both factual and logical errors in most cases remain unnoticed by the reader of the news and are taken as reliable or trustworthy (Svintsov 1979; Bell 1991). Overlooking such errors can be a consequence of missing world knowledge or of a lack of attention while reading. In any case, the reader of the news is misinformed without being aware of it. If detected by a reader, however, typographical, but especially factual and logical, errors have a negative impact on the newspaper’s credibility and trustworthiness, since they are perceived as lies or disinformation (Svintsov 1979; Bell 1991; Silverman 2007; Bednarek/Caple 2012). Therefore, in the process of news production, the task of news editing is essential and cannot be ignored. Editing has become even more urgent today because, in comparison to the past, the modern reader has many more possibilities of verifying the information provided, as a large amount of related information appears online simultaneously (Silverman 2007).

One should also consider that incorrect facts (factual errors) and logically wrong conclusions (logical errors) in news texts are often used intentionally to serve the purpose of manipulation or propaganda. Violating the news value of objectivity (Section 4.1.3), the facts are adjusted to influence the reader’s opinion, pushing it in a particular direction to the advantage of a country’s, an institution’s, or an individual’s interests. In this context, the current phenomenon of fake news, reportedly occurring in social media, should be mentioned in particular.

Today, in many fields of human life, computers successfully play a supporting role, taking over natural language tasks such as, e.g., searching among a huge amount of data and delivering the needed information in the shortest amount of time, as well as typographical error correction, opinion mining, etc.
The main aim of the present study is to propose an approach for the automatic detection of contradictions (henceforth referred to as CD) in news texts. This approach can be of practical relevance, first, for the task of news editing, when proofing a text for consistency (i.e., agreement with previously stated facts, with no contradictions contained). Second, it can be applied to identify the facts and aspects on which different sources of information disagree and, in this way, serve the purpose of information verification. Third, an automatic CD task can be used to obtain a summarized view of contradictory opinions and facts on particular events from a large number of news texts, so that a reader can independently form an opinion based on a full picture. Finally, the approach can be integrated into other natural language systems and applications, such as, e.g., question-answering systems and text summarization, which, among others, use news texts as their data source.

From the theoretical perspective, the significance of the study consists, first, in summarizing and elaborating the existing theoretical knowledge on natural language contradiction. Second, the study provides new empirically gained insights into the realization mechanisms of natural language contradictions occurring in and between news texts, in this way contributing to a better understanding of the nature of contradictions and filling the knowledge gaps.

1.2 Subject of the Study

Natural language contradictions are of a complex nature. As will be shown in Chapter 5, the realization of contradictions is not limited to examples such as Socrates is a man and Socrates is not a man (under the condition that Socrates refers to the same object in the real world), which is discussed by Aristotle (Section 3.1.1).
Empirical evidence (see Chapter 5 for more details) shows that only a few contradictions occurring in real life are of that explicit (prototypical) kind (see, e.g., Svintsov 1979; de Marneffe et al. 2008). Rather, contradictions make use of a variety of natural language devices such as, e.g., paraphrasing, synonyms and antonyms, passive and active voice, diverse expressions of negation, and figurative linguistic means such as idioms, irony, and metaphors. Additionally, the most sophisticated kind of contradictions, the so-called implicit contradictions, can be found only by applying world knowledge and after conducting a sequence of logical operations, such as, e.g., in (1.1).

(1.1) The first prize was given to the experienced grandmaster L. Stein who, in total, collected ten points (7 wins and 3 draws). (Svintsov 1979: 195)

Those familiar with the rules of chess know that a player gets one point for winning and zero points for losing a game; in the case of a draw, each player gets half a point. Building on this and conducting some simple arithmetic, we can infer that with 7 wins and 3 draws (the second part of the sentence) a player can only collect 8.5 points, not 10. Hence, we observe that there is a contradiction between the first and the second parts of the sentence.

Implicit contradictions will only partially be the subject of the present study, which aims primarily at identifying the realization mechanisms and cues (Chapter 5) as well as at finding the parts of contradictions by applying state-of-the-art algorithms for natural language processing, without conducting deep meaning processing. Further in focus are the explicit and implicit contradictions that can be detected by means of explicit linguistic, structural, and lexical cues, and by conducting some additional processing operations (e.g., computing a sum in order to detect contradictions arising from numerical divergences).
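The additional processing operation just mentioned, computing a sum to expose a numerical divergence, can be sketched for example (1.1). The encoding of the chess scoring rule and the function names below are illustrative only, not part of the system developed later in this work:

```python
def chess_score(wins: int, draws: int) -> float:
    """Standard chess scoring: 1 point per win, 0.5 per draw, 0 per loss."""
    return wins * 1.0 + draws * 0.5

def contradicts_claim(claimed_points: float, wins: int, draws: int) -> bool:
    """True if the claimed total diverges from the total implied by the results."""
    return chess_score(wins, draws) != claimed_points

# The sentence in (1.1): a claimed total of ten points with 7 wins and 3 draws.
print(chess_score(7, 3))            # 8.5
print(contradicts_claim(10, 7, 3))  # True, i.e., a numerical contradiction
```

The hard part for a machine is, of course, not the arithmetic but knowing that the scoring rule applies, which is exactly the world knowledge that makes implicit contradictions difficult.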
One should note that additional complexity in finding contradictions can arise when the parts of a contradiction occur on different levels of realization. Thus, a contradiction can be observed on the word and phrase level, such as in a married bachelor (for variations of contradictions on the lexical level, see Ganeev 2004); on the sentence level, between parts of a sentence or between two or more sentences; or on the text level, between portions of a text or between whole texts, such as a contradiction between the Bible and the Quran, for example. Only contradictions arising at the level of single sentences occurring in one or more texts, as well as between parts of a sentence, will be considered for the purpose of this study. Though the focus of interest will be on single sentences, the study will make use of text particularities such as coreference resolution, without establishing the referents in the real world.

Finally, another aspect to be considered is that the parts of a contradiction do not necessarily appear at the same time. They can be separated by many years or even centuries, with or without time expressions, making their recognition by humans and their detection by machines challenging. According to Aristotle’s ontological version of the LNC (Section 3.1.1), however, the same time reference is required in order for two statements to be judged a contradiction. Taking this into account, we set the borders of the study by limiting the analyzed textual data thematically (only nine world events) and temporally (three days after the reported event had happened) (Section 5.1). No sophisticated time processing will thus be conducted.

1.3 Research Questions and Objectives

As previously mentioned, the main aim of the present study is to propose a system for the automatic detection of naturally occurring contradictions in and between news texts published in English.
With regard to the aim of the study, we formulate the following three blocks of related research questions:

RQ1 What conditions must two sentences necessarily satisfy in order to be judged a contradiction? Are there any natural language exceptions?

RQ2 What are the cues of contradictions occurring in news texts written in English? Do all contradictions occur explicitly in news texts?

RQ3 What phenomena of natural languages should a CD system be able to cope with? Considering this, what can the architecture of a system for the automatic detection of contradictions occurring in and between news texts look like? What is the most efficient way of computationally realizing the system’s components? What are the current limitations? How can CD profit from the properties of a text?

The research objectives, serving as milestones toward the main aim of the study, are as follows:

O1a Review the state of the art of CD systems, identify their weaknesses and strengths, and determine the aspects or components that are to be improved;

O1b Review the existing datasets of contradictions and decide whether they can be applied as the basis for the development and evaluation of the CD system. If required, collect and prepare our own data;

O2a Based on the existing theory, formulate a set of conditions and rules that underlie the realization of natural language contradictions;

O2b Describe natural language phenomena which can be problematic for the CD task;

O3a Outline the characteristics and particularities of texts, and in particular of online news texts, that have to be considered by a CD system and can potentially contribute to the efficiency of the CD task;

O3b Identify the linguistic cues of naturally occurring news contradictions and offer a typology of contradictions based on these cues;

O4 Develop an architecture of a prototype CD system and implement the system.
Decide which methods and approaches can be used for implementing the system’s components and evaluate them on real cases.

1.4 Structure of the Thesis

The overall structure of the study consists of ten chapters, including the Introduction, Conclusions, References, and Appendix. After introducing the reader to the subject, the main aim, and the goals of the study (Chapter 1, Introduction), Chapter 2 (State of the Art) begins with a presentation of the main stages of the development of the CD task. It then continues with an overview of the existing CD systems, summarizes their weaknesses and strengths, and defines the research gaps to be addressed in the study (Section 2.1). Finally, the chapter provides a description of the available datasets of contradictions, which are an essential precondition for the development and evaluation of CD systems (Section 2.2).

The next two chapters (together with Chapter 6) lay out the theoretical dimensions of the research, addressing the concepts of contradiction in logic and language (Chapter 3) and the characteristics of news texts, with a focus on online news texts (Chapter 4).

In more detail, Chapter 3 (Contradiction in Logic and Language), which consists of five sections, is concerned with the traditional approaches to contradiction in logic and language. Section 3.1 first presents the traditional view on contradiction as developed by Aristotle and then terminologically distinguishes contradiction from related concepts such as contrariety, tautology, and paradox. The focus of Section 3.2 is the realization, expression, and interpretation of negation in natural languages, with particular interest in English. The subject of Section 3.3 is the scientific debate on the status of contradiction in the light of phenomena such as presupposition, modality, vagueness, and ambiguity.
Further, Section 3.4 provides an overview of existing classifications of textual contradictions, including typologies from educational psychology and computational linguistics. Finally, Section 3.5 concludes the chapter with a summary of the causes and functions of natural language contradictions, claiming that contradictions are not always “bad”.

Chapter 4 (The Characteristics of News Texts) introduces the concept of news texts, including the differences between printed and online newspapers, hard and soft news, and the values of news production (Section 4.1); a description of the structure of a news article and its main elements (Section 4.2); as well as a discussion of the particularities of news language (Section 4.3).

Chapters 5, 6, and 7 focus on the conceptual and physical design as well as the implementation of the CD system, with Chapters 5 and 7 constituting the empirical part of the present work.

Chapter 5 (Typology Construction: Types of Contradictions in News Texts) describes the computationally oriented methodology and reports the results of a corpus-based typology construction of the contradictions occurring in single or multiple news texts.

Chapter 6 (Conceptual Design of a CD System and Supporting Tools) in turn addresses a possible conceptual design of a CD system and provides theoretical background on computational approaches to meaning processing at the lexical, morphological, syntactic, semantic, pragmatic, and discourse levels, essential for the support of a CD system (Sections 6.1–6.2.3). Approaches to meaning representation are the topic of Section 6.2.4. The chapter then concludes with a presentation of existing computational sources of lexical and world knowledge (Section 6.2.5).

Chapter 7 (Physical Design of a CD System and Implementation) then proposes an approach for the CD task, integrating the knowledge gained, and describes the main steps and experiments conducted in implementing the system’s components.
Finally, Chapter 8 (Conclusions) summarizes the findings and outlines the limitations of the system developed. With respect to these limitations, areas and tasks for further research are defined.

1.5 How to Read This Thesis

I would like to conclude the introductory chapter with some useful remarks on how to read this thesis, addressing the use of examples, terminology, and data. All examples in the thesis are provided with an ID that follows a particular system. Each ID consists of two numbers separated by a point: the first indicates the chapter in which the example occurs; the second indicates the order of the example within that chapter. Examples of contradictions taken from the compiled corpus are additionally provided with an ID that indicates where the example can be found in the corpus. The digital version of the corpus is provided on the USB flash drive submitted along with the present work. The digital versions of all supplementary materials attached to the present study can be found on the USB flash drive as well.

2 State of the Art

The present chapter serves the purpose of introducing the reader to the state of the art of the automatic detection of textual contradictions. First, it provides an overview and description of existing CD systems and methods (Section 2.1). To give the reader a well-ordered picture of the state of the art, Section 2.1 begins by sketching the main stages in the development of interest in automatically detecting textual contradictions before discussing the methods and systems. Only a selected number of methods and systems will be presented in detail here; the criteria for their selection were the underlying methodology, performance evaluation scores, and experts’ opinions. The section then concludes with an outline of the weak and strong aspects of the systems, indicating the research gaps, and sets the objectives for the study.
Further, Section 2.2 describes the available datasets of contradictions (the so-called corpora), which are an essential basis for the development and evaluation of the systems. Additionally, the need to collect our own data, despite the existing datasets, is explained in this section.

2.1 Methods and Systems

Interest in automatic CD within the framework of natural language processing (henceforth NLP), and specifically as a task of natural language understanding (NLU), has its origin in the mid-1990s and is associated with the FraCas project (Cooper et al. 1996). Since then, a number of systems have been proposed, ranging from simple and robust shallow approaches relying on lexical overlaps and word frequencies to precise but challenging deep approaches conducting advanced semantic interpretation. The best state-of-the-art systems currently achieve approx. 60% accuracy in identifying contradictions, which mainly arise from negation and antonyms.

The initial attempts at automatic CD were theoretical and relied on the methodological apparatus of first-order logic (FOL) (Cooper et al. 1996; Condoravdi et al. 2003). Crouch et al. (2003), in particular, emphasized the potential of sophisticated FOL approaches such as those described in Hirst (1991) and Hobbs (1985). However, no practical implementations of logic- or quasi-logic-driven systems were proposed until the middle of the 2000s. The first logical and quasi-logical systems include the system described in Tatu and Moldovan (2007), the BLUE system developed by Clark and Harrison (2009), and the hybrid NatLog system by MacCartney and Manning (2009).

The first implemented CD system that went beyond FOL was proposed in Harabagiu et al. (2006). The developers relied only on the capability of machine-learning algorithms for textual entailment⁴ recognition (Section 6.2.2.2) and considered explicit contradiction cues such as negation and antonyms.
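The shallow end of the spectrum mentioned above can be illustrated with a bare lexical-overlap score such as the Jaccard coefficient over token sets. This toy sketch is not one of the cited systems; it merely shows why such measures are cheap and robust, and also why they miss, e.g., negation entirely:

```python
def jaccard(text: str, hypothesis: str) -> float:
    """Jaccard coefficient over lower-cased token sets."""
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    return len(t & h) / len(t | h) if t | h else 0.0

# High overlap despite an inserted negation: the score alone cannot
# distinguish entailment from contradiction here.
print(jaccard("Socrates is a man", "Socrates is not a man"))  # 0.8
```

The high score for a textbook contradiction makes plain why purely overlap-based approaches handle entailment better than CD, a point taken up again below.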
A number of systems for CD in English were developed during the Recognizing Textual Entailment (RTE) challenges in the years 2007–2009 (the RTE-3 Extended Task, RTE-4, and RTE-5 challenges).⁵ The main requirement for the systems was the classification of the provided sentence pairs into the three categories of entailment, contradiction, and unknown, the so-called three-way task (Giampiccolo et al. 2007; Voorhees 2008). The RTE systems are presented in Table 1 (RTE-3 Extended Task), Table 2 (RTE-4), and Table 3 (RTE-5). One should note that the systems submitted to the later RTE challenges by the same authors are, in most cases, improvements on their earlier RTE submissions.

In addition to the RTE systems, a number of standalone systems for different languages have been developed to date. These include, among others, the systems described in Harabagiu et al. (2006), de Marneffe et al. (2008), Ritter et al. (2008), Kim/Zhai (2009), Ennals et al. (2010), Tsytsarau et al. (2010, 2011), Tsytsarau/Palpanas (2011), Pham et al. (2013), Dînşoreanu/Potolea (2013), and Lendvai/Reichel (2016) for English; Wartena et al. (2006) for Dutch; Kawahara et al. (2010), Hashimoto et al. (2012), Andrade et al. (2013), Kloetzer et al. (2013), and Takabatake et al. (2015) for Japanese; and Shih et al. (2012) for Chinese. The standalone CD systems for English are summarized in Table 4.

Both the RTE and the standalone CD systems have been developed for different application purposes, including, e.g., the improvement of textual entailment recognition (the RTE systems), the improvement of text summarization and question-answering systems (e.g., Harabagiu et al. 2006), and the detection and summarization of conflicting opinions in social media and other Web 2.0 platforms (e.g., Kim/Zhai 2009; Ennals et al. 2010; Tsytsarau et al. 2010, 2011; Tsytsarau/Palpanas 2011; Dînşoreanu/Potolea 2013; Lendvai/Reichel 2016).
The systems follow different, often combined, rationales and methodologies, apply a variety of NLP tools, and, with the exception of the RTE systems, are evaluated on different datasets, which makes their comparison and generalization challenging. The execution of the same steps for different purposes makes generalizing over the systems difficult as well. Nevertheless, an attempt at comparing the systems is presented in Tables 1–4.

4 The term textual entailment is related to logical entailment but is used in computational linguistics in a looser, more relaxed sense. The organizers of the RTE challenges provide the following definition of textual entailment: “We say that T entails H if, typically, a human reading T would infer that H is most probably true” (Dagan/Glickman 2004: 4). The parts of the logical entailment relation, premise and conclusion, are referred to in the RTE framework as text (T) and hypothesis (H), respectively.

5 The RTE-1, RTE-2, RTE-3 (Main, but not the Extended Task), RTE-6, and RTE-7 challenges focused on the recognition of entailments only.

The comparison of the systems reveals that CD by means of supervised classification is the preferred method, despite the need for a large amount of data for classifier training and model testing. Based on a set of pre-defined features and examples manually classified (annotated) as contradictions and non-contradictions, a classification algorithm searches for patterns in the pre-classified data (training data) and builds a model which, after a test stage, can then be applied to predict contradictions in a new corpus. For the classification task, a variety of algorithms have been applied, including, among others, maximum entropy in, e.g., de Marneffe et al. (2008), SVMs (Vapnik 1995) in, e.g., Malakasiotis and Androutsopoulos (2007), decision trees in, e.g., Hickl et al. (2007), nearest (shrunken) centroids (Tibshirani et al.
2003), and random forests (Breiman 2001) in Lendvai/Reichel (2016). The maximum-entropy algorithm has proved to be the most efficient so far. For the application of the classifiers, the WEKA machine-learning tool⁶, described, e.g., in Smith/Frank (2016), was preferred.

Concerning the pre-defined features, some classification-based systems relied on the degree (or score) of similarity between the text and hypothesis sentences (for the definition of text and hypothesis, see Footnote 4) in terms of tokens, lemmas, parts of speech, and sentence length (e.g., Malakasiotis/Androutsopoulos 2007; Lendvai/Reichel 2016), computed by multiple similarity measures, without considering any other information. For this task, a number of similarity measures have been applied, including, among others, the Levenshtein distance, the Jaro-Winkler distance, the Manhattan distance, the Euclidean distance, the cosine similarity, the n-gram distance, the matching coefficient, the Dice coefficient, and the Jaccard coefficient. In general, the results show that although classification based on similarity scores works well for recognizing entailments and neutral cases, CD represents a more complex task (Lendvai/Reichel 2016).

Another group of classification-based systems relied in turn on features that are characteristic of contradiction, including negations, antonyms, numerical mismatches, and mismatches in grammatical functions and thematic roles (Harabagiu et al. 2006; de Marneffe et al. 2008). In contrast to the simple computation of similarity, the detection of a contradictory relation requires additional steps, such as a unified comparable representation of text and hypothesis meaning and their alignment.

6 https://www.cs.waikato.ac.nz/ml/weka/index.html

[Table 1: CD systems submitted for the RTE-3 challenge – Extended Task (2007). Accuracy (%): Malakasiotis/Androutsopoulos 49.4; Clark et al. 45.1; Tatu/Moldovan 71.3; Hickl et al. 73.1; Bobrow et al. 43.6; MacCartney/Manning 59.1; Iftene/Balahur-Dobrescu 56.9; Wang/Neumann 45.5. The table further compares the systems on features such as preprocessing, parsing, SRL, anaphora resolution, lexical resources, paraphrasing, world knowledge, meaning representation, alignment, machine learning, string similarity, topic identification, contradiction clues, sentiment analysis, and datasets.]

[Table 2: CD systems submitted for the RTE-4 challenge (2008). Accuracy (%): Galanis/Malakasiotis 67.6; Clark/Harrison 54.7; Glinos 41.6; Wang/Neumann 61.4; Agichtein et al. 54.7; Montalvo-Huhn/Taylor 46.6; Varma et al. 30.9; Krestel et al. 43.2; Siblini/Kosseim 61.6; Li et al. 58.8; Castillo/Alonso i Alemany 54.6; Padó et al. 55.3; Iftene 68.5; Mohammad et al. 55.6. The systems are compared on the same features as in Table 1.]

[Table 3: CD systems submitted for the RTE-5 challenge (2009). Accuracy (%): Malakasiotis 57.5; Clark/Harrison 54.7; Han Ren et al. 52.2; Wang et al. 63.7; Ferrández et al. 60; Breck 57; Castillo 52.2; Varma et al. 46.9; Krestel et al. 48.7; Iftene/Moruz 68.3. The systems are compared on the same features as in Table 1.]

[Table 4: Standalone CD systems, with reported accuracy/precision/recall where available: Harabagiu et al. (2006): accuracy 64%; de Marneffe et al. (2008): precision 22.95%, recall 19.44%; Ritter et al. (2008): precision 62%, recall 19%; Kim/Zhai (2009): n.a.; Ennals et al. (2010): n.a.; Tsytsarau/Palpanas (2011): n.a.; Tsytsarau et al. (2010, 2011): n.a.; Pham et al. (2013): precision 14%, recall 19.44%; Lendvai/Reichel (2016): iPosts precision 40%, recall 34%, threads precision 42%, recall 35%; Wartena et al. (2006) for Dutch: n.a. The systems are compared on the same features as in Table 1.]

The preferred means of meaning representation were dependency trees converted to typed dependency graphs, e.g., in de Marneffe et al.
(2008); functional dependency triples, either alone (Wang/Neumann 2008) or combined with a frame representation based on semantic role frames (Pham et al. 2013); the functional dependency tuple (Ritter et al. 2008); and the bag-of-words representation (Tsytsarau et al. 2010, 2011; Tsytsarau/Palpanas 2011), to name only a few. For the representation of sentences as a functional dependency of a verb predicate and two arguments, the REVERB tool (Fader et al. 2011) was applied in Pham et al. (2013), and the TextRunner Open Information Extraction system (Banko et al. 2007; Banko/Etzioni 2008) in Ritter et al. (2008). For alignment, besides a greedy algorithm, a maximum-entropy-based classifier (Hickl et al. 2006) was preferred, e.g., in Harabagiu et al. (2006).

In addition to the classification- and rule-based systems, a third group of systems adopts a slightly loosened logical form for meaning representation and incorporates logical inference rules (Tatu/Moldovan 2007; Clark/Harrison 2009; MacCartney/Manning 2007), detects contradictions based on opposite sentiments and statistical computation (Tsytsarau et al. 2010, 2011; Tsytsarau/Palpanas 2011; Dînşoreanu/Potolea 2013), or uses patterns over ontology terms (Wartena et al. 2006).

Common to all systems is the use of lexical resources (Section 6.2.5.1) such as WordNet (Fellbaum 1998), VerbNet (Kipper et al. 2000), and DIRT (Lin/Pantel 2001) for identifying meaning relations (i.a., oppositions and synonyms) for the purposes of sentence alignment, improving the building of a classification model, and detecting contradictions. For knowledge-based contradictions, Wikipedia was the preferred resource.

A number of studies emphasize the importance of finding related text and hypothesis sentences which describe the same event in order to achieve better performance on the CD task (de Marneffe et al. 2008; Kim/Zhai 2009; Pham et al. 2013; Lendvai/Reichel 2016).
The authors proceed on the assumption that two events cannot be contradictory when they are not related. Related sentences were found in the proposed systems by means of, e.g., a Jaccard similarity function in combination with WordNet by Kim/Zhai (2009), as well as a latent Dirichlet allocation (LDA) topic modelling algorithm (Blei et al. 2003) at sentence level (Denecke/Brosowski 2010) applied in Tsytsarau et al. (2011). The general natural language processing tasks integrated into the systems include data normalization (i.a., of temporal expressions, abbreviations, etc.), parsing for the purpose of identifying grammatical functions and constructing meaning representations, part-of-speech tagging, anaphora resolution within a sentence or between two neighboring sentences, semantic role labeling for identifying thematic roles, polarity computation, and others. For parsing, the Charniak parser (Charniak 2000), the chart parser SAPIR (Harrison/Maxwell 1986), the Collins parser (Collins 2003), the Stanford dependency parser (Klein/Manning 2003; de Marneffe et al. 2006), and MiniPar (Lin 1994) have been applied. The LingPipe tool (e.g., described in Baldwin/Dayanidhi 2014) was a preferred toolkit for named entity recognition (NER), and TnT (Brants 2000) for part-of-speech tagging. Anaphora resolution, in turn, has been performed, e.g., by means of a tool which combines the Hobbs algorithm (Hobbs 1978) and the resolution of anaphora procedure (Lappin/Leass 1994). Semantic role labeling was conducted by means of, e.g., the SENNA package (Collobert et al. 2011). For the normalization of time expressions, e.g., the TARSQI toolkit (Verhagen et al. 2008) has been applied. Only a few systems (Harabagiu et al. 2006; de Marneffe et al. 2008) make use of information on modality and quantification, which is essential for the task of CD. Among the most prominent, most cited, and most interesting CD approaches for English are those developed and described in Harabagiu et al.
(2006) and de Marneffe et al. (2008), as well as their improvements and extensions proposed in Padó et al. (2008) and Ritter et al. (2008), and the sentiment-based CD presented in Tsytsarau et al. (2010, 2011) and Tsytsarau/Palpanas (2011). As mentioned earlier, Harabagiu et al. (2006) were the first to provide empirical results for the task of CD. The authors point out that the task can increase the quality of other NLP tasks such as question answering and multi-document summarization. In the case of discovering contradictory information from multiple sources, a system has to decide which information is preferred for the output. For this, the inconsistent information can be checked either by the additional intervention of a user or by consulting additional knowledge resources. The system proposed in Harabagiu et al. (2006) detects contradictions by following two views. According to the first view, contradictions can be recognized by removing the negations of propositions (argument-predicate structures) and then testing the propositions for textual entailment. Harabagiu et al. (2006) used their own textual entailment system for this task. According to the second view, contradictions can be detected by training a classifier on positive examples of contradictions, relying on linguistic information such as negations (n't, not; the verbs to deny, to fail; the prepositions without, except, etc.), antonyms, as well as explicit cues of contrast relations (e.g., but, although, however). For the classification task, the maximum entropy machine learning algorithm was applied. To train and evaluate the classifier for detecting the contradictions arising from negations and antonyms, a modified RTE-2 dataset (for more information, see Section 2.2.2) has been used.
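The second view's reliance on surface cues can be illustrated with a minimal sketch. The cue lists below are only a small illustrative subset of the features named above, and the function is our own simplification, not the authors' implementation:

```python
# Illustrative subsets of the cue inventories described in the text.
NEGATION_CUES = {"not", "no", "never", "without", "except",
                 "deny", "denies", "denied", "fail", "fails", "failed"}
CONTRAST_CUES = {"but", "although", "however", "though", "yet"}

def cue_features(sentence):
    """Extract binary surface-cue features of the kind fed to a
    maximum entropy classifier for contradiction detection."""
    tokens = [t.strip(".,;!?").lower() for t in sentence.split()]
    has_negation = any(t in NEGATION_CUES or t.endswith("n't") for t in tokens)
    has_contrast = any(t in CONTRAST_CUES for t in tokens)
    return {"has_negation": has_negation, "has_contrast": has_contrast}
```

In a real system such features would be combined with antonym lookups and many lexical and syntactic signals before training the classifier.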
For training and evaluating the classifier for recognizing contrast relations, datasets of a total of 10,000 sentence pairs (9,000 training pairs and 1,000 evaluation pairs) were collected from online news articles. The results of training and subsequently testing the system showed that the system performs better in CD when following the second view. The proposed approach achieved a 62% overall accuracy in identifying contradictions arising from negation and antonyms.

A similar but more extended system was proposed in de Marneffe et al. (2008). Analogous to Harabagiu et al. (2006), the system makes use of predicate-argument meaning representation, recognition of textual entailment, and supervised machine learning techniques but, in contrast to the system of Harabagiu et al. (2006), does not rely only on information about negation and antonyms. Moreover, the authors compiled the first corpus of naturally occurring contradictions, representing a more realistic data basis for system development (Section 2.2.3). Based on their corpus, de Marneffe et al. (2008) constructed a typology of contradiction cues, including negation, antonymy, numerical mismatches, structural, factivity, and modality information, as well as world knowledge (see Section 3.4.3.2 for more information on these types). The authors point out that the contradictions arising from the first three features are relatively easy to model and detect, as no deep comprehension is required. Detecting the contradictions marked by the latter aspects, in turn, requires more precise meaning modeling. The system proposed in de Marneffe et al. (2008) is based on the Stanford RTE system (MacCartney et al. 2006) and was extended by an additional step of event coreference recognition. The authors claim that sentences about different events cannot be contradictory.
However, as a result of missing context, sentences such as (2.1) were assumed to be contradictory without further analysis of whether woman refers to the same person.

(2.1) Passions surrounding Germany's final match turned violent when a woman stabbed her partner because she didn't want to watch the game. A woman passionately wanted to watch the game.

In general, the CD process of the Stanford system consists of four steps. First, the input text and hypothesis sentences are syntactically and semantically analyzed by means of the Stanford dependency parser (Klein/Manning 2003; de Marneffe et al. 2006) and then converted to typed dependency graphs. In the second step, the graphs are aligned with each other, if possible, based on similarity and syntactic information combined by means of the margin infused relaxed algorithm (Crammer/Singer 2001). Padó et al. (2008) offered an improvement on this step by applying the edit distance-based alignment system MANLI (MacCartney et al. 2008) and a stochastic aligner. In the third step, sentences that are not related and do not describe the same event are filtered out by the system. Two different approaches have been proposed for this task. The authors claim that, on the one hand, the root of the hypothesis graph aligned with the text graph can indicate co-referent events. This is, however, only efficient when the hypothesis sentences are shorter than the text sentences. On the other hand, the authors propose modeling sentence topicality as a technique for co-referent event detection. The two approaches were tested on the RTE-3 development dataset. The results are presented in Table 5, indicating, first, that the two approaches in general have to be improved and, second, the need for other techniques for filtering out non-co-referent events.
Finally, in the fourth step, the contradictory features are extracted, and logistic regression is applied to classify the hypothesis and text sentences as contradictory or not.

Approach                                  Precision   Recall
No filter                                 55.10       32.93
Root alignment                            61.36       32.93
Root alignment + topicality modelling     61.90       31.71

Table 5: Comparison of approaches to graph alignment applied in de Marneffe et al. (2008).

To test the system, the modified RTE-1_test and RTE-2_test (contradictions arising from negations) datasets and the original RTE-3_test dataset were used. The authors report a 42.22% precision and a 26.21% recall for detecting contradictions in the RTE-1_test dataset, a 22.95% precision and a 19.44% recall for the RTE-3_test dataset, and a 62.97% precision, a 62.50% recall, and a 62.74% accuracy for the modified RTE-2_test dataset of negation. Further, the comparison of the results for each contradiction type separately shows that the system is efficient in detecting contradictions arising from negation, antonyms, and numeric mismatches and needs improvement in detecting lexical and world knowledge contradictions.

Ritter et al. (2008) proposed an extension of the Stanford system, addressing in their study the problem of world knowledge contradictions, such as in (2.2):

(2.2) a. Mozart was born in Salzburg.
b. Mozart was born in Vienna.

Here, a contradiction arises as the result of the incompatibility between Salzburg and Vienna with respect to the co-referent subject Mozart, driven by the relation expression was_born_in. This kind of relation, which can be formally represented as R(x,y), the authors call functional. The functional relation in (2.2a) can thus be represented as was_born_in(Mozart, Salzburg). The relation R between x (subject) and y (object) is a functional relation if and only if x is not ambiguous and does not refer to different entities in the real world, and the function R maps x to the unique value y.
For the detection of contradictions marked by functional relations, Ritter et al. (2008) proposed a three-stage, domain-independent system which they called AuContraire. In the first stage, the system analyzes sentences and represents them as one or more tuples of the form R(x,y). For this task, the TextRunner component of the Open Information Extraction system (Banko et al. 2007; Banko/Etzioni 2008) was applied. In the second stage, the system identifies pairs of sentences which, with high probability, express functional relations and groups them into sets R(x, •) with the same subject. For this, the authors propose the application of a modified expectation-maximization algorithm (Dempster et al. 1977). Finally, in the third stage, the system filters out cases such as (2.3) by reasoning about synonymy, meronymy, and the type of x and y (person, date, location, etc.) and identifying non-co-referent arguments. For identifying meronyms, the developers used diverse lexical resources such as the Tipster Gazetteer and WordNet (Fellbaum 1998). Synonyms, in turn, were recognized by computing edit distance and string similarity (Cohen et al. 2003), by applying the RESOLVER system for synonym identification (Yates/Etzioni 2007), and by WordNet. To identify the type of x and y, NER was performed in combination with lists of personal and geographical names.

(2.3) Alan Turing was born in London. Alan Turing was born in England.

To evaluate the system, Ritter et al. (2008) first used TextRunner to automatically collect 1,000 relations from 117 million web pages. They labeled each relation as functional or non-functional. They achieved a 62% precision and a 12% recall, and, on the balanced data (contradictions and non-contradictions in a proportion of 1:1), a 51% precision and a 92% recall.
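The core of the grouping stage can be sketched as follows. This toy version, our own illustration rather than AuContraire itself, assumes already extracted R(x, y) tuples, treats every relation as functional, and omits the EM estimation and the synonym/meronym filtering that would rule out pairs such as (2.3):

```python
from collections import defaultdict

def find_functional_conflicts(tuples):
    """Group R(x, y) tuples by (R, x); for a functional relation R,
    two distinct y values for the same x signal a candidate contradiction."""
    groups = defaultdict(set)
    for relation, x, y in tuples:
        groups[(relation, x)].add(y)
    return {key: ys for key, ys in groups.items() if len(ys) > 1}

tuples = [
    ("was_born_in", "Mozart", "Salzburg"),
    ("was_born_in", "Mozart", "Vienna"),
    ("was_born_in", "Haydn", "Rohrau"),
]
conflicts = find_functional_conflicts(tuples)
```

Here the Mozart tuples from (2.2) are flagged as a candidate contradiction, while the single Haydn tuple is not; a London/England pair as in (2.3) would wrongly be flagged too, which is exactly what the third stage exists to prevent.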
2.2 Corpora of Contradictions

2.2.1 FraCas Inference Data Suite

The FraCas (A Framework for Computational Semantics) inference test suite is considered to be the first corpus for English which includes contradictions together with examples of entailments. The dataset was developed within the scope of a joint project of the Universität des Saarlandes (Germany), the Universität Stuttgart (Germany), and the University of Edinburgh (United Kingdom) in the middle of the 1990s (Cooper et al. 1996). The purpose of the project was to provide data for the development, evaluation, and improvement of NLP applications focusing on inference processing. Cooper et al. (1996) define the central capability of such applications to be the ability of inference processing. The FraCas corpus consists of 346 units, so-called problems, each including 1-5 statements (premises), one yes/no question, and a yes/no answer, where yes indicates an entailment, no a contradiction, and don't know stands for neutral cases. Some yes and no answers additionally include comments and explanations, as in the example of (2.4). The number of premises in the problems amounts to 536 in total. The distribution of the premises and answers in the corpus is presented in Table 6 and Table 7, respectively.

(2.4) Premise: Dumbo is a large animal.
Question: Is Dumbo a small animal?
Answer: [No] Large(N) => ¬Small(N)

Number of Premises   Number of Problems   Number of Problems (%)
1                    192                  55.5
2                    122                  35.3
3                    29                   8.4
4                    2                    0.6
5                    1                    0.3

Table 6: The distribution of premises in the FraCas corpus.

Answer          Number of Answers   Number of Answers (%)
Yes             180                 52
Don't know      94                  27
No              31                  9
Other/complex   41                  12

Table 7: The distribution of answers in the FraCas corpus.

In general, the FraCas problems are divided into nine groups, according to the categories involved in semantic inference construction: quantifiers, plurals, anaphora, ellipsis, adjectives, comparatives, temporal reference, verbs, and attitudes.
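A FraCas problem unit such as (2.4) can be mirrored by a simple container; the class and field names below are hypothetical illustrations of the unit structure, not part of the corpus distribution:

```python
from dataclasses import dataclass, field

@dataclass
class FracasProblem:
    """Hypothetical container mirroring one FraCas problem unit."""
    premises: list        # 1-5 premise statements
    question: str         # one yes/no question
    answer: str           # "yes" (entailment), "no" (contradiction), "don't know" (neutral)
    note: str = ""        # optional comment or explanation

# The opposites example (2.4) from the adjectives category:
dumbo = FracasProblem(
    premises=["Dumbo is a large animal."],
    question="Is Dumbo a small animal?",
    answer="no",
    note="Large(N) => ¬Small(N)",
)
```

Representing the answer as one of three labels makes the correspondence to the later three-way RTE annotation (entailment, contradiction, unknown) explicit.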
The problems in each group, in turn, are further divided into subgroups representing single aspects of each category. The problem unit in (2.4) is an example of the category adjectives, subcategory opposites. The distribution of the problems across the groups is presented in Table 8.

Group of Problems   Number of Problems   Number of Problems (%)
Quantifiers         80                   23
Plurals             33                   10
Anaphora            28                   8
Ellipsis            55                   16
Adjectives          23                   7
Comparatives        31                   9
Temporal            75                   22
Verbs               8                    2
Attitudes           13                   4

Table 8: The distribution of the problems per group in the FraCas corpus.

In 2009, MacCartney improved the FraCas corpus for the purpose of his study and annotated it with XML. Besides making some corrections and adding relevant notes, MacCartney (2009) rephrased the questions into declarative sentences, facilitating their automatic processing. The original version of the FraCas corpus as a PS file and its improved XML version are freely available for download at the webpage of Stanford University.7

2.2.2 RTE Datasets and Their Modifications

A number of datasets including contradictions have been developed within the RTE challenge during the period of 2006 to 2011. The RTE datasets were created with the aim of providing a comparable basis for the evaluation of the systems participating in the RTE challenges. All datasets are divided into development and test datasets and include mainly manually constructed pairs of sentences representing entailments and non-entailments (contradictions and neutral cases). The statistics on the RTE datasets, partially adapted from Bentivogli et al. (2009), are presented in Table 9. All RTE datasets are freely available on the web, directly or upon request.8 Since RTE-6 (Bentivogli et al. 2010) and RTE-7 (Bentivogli et al. 2011) include no annotations of contradictions, and no extensions of these datasets as regards contradictions exist, they will not be taken into further consideration. The RTE-1 (Dagan et al.
2006), RTE-2 (Bar-Haim et al. 2006), and RTE-3 Main Task (Giampiccolo et al. 2007) challenges were interested only in the task of automatically classifying the data into entailments and non-entailments. For this reason, the corresponding datasets are annotated exclusively with the categories entailment (label yes) and non-entailment (label no), without further specification of non-entailments into contradictions and neutral cases. In terms of the RTE challenge, this is called a two-way task. The three-way task annotation of the RTE-1 and RTE-2 datasets, distinguishing entailments, contradictions, and neutral cases, was later performed by Harabagiu et al. (2006) and de Marneffe et al. (2008).

7 https://nlp.stanford.edu/~wcmac/downloads/
8 https://tac.nist.gov//

Challenge          Dataset   Size (No. of pairs)   Hypothesis length (No. of words)   Text length (No. of words)   Contradictions (%)
RTE-1              Dev       567     10.08   24.78   -
RTE-1              Test      800     10.8    26.04   -
RTE-2              Dev       800     9.65    27.15   -
RTE-2              Test      800     8.39    28.37   -
RTE-3 (Extended)   Dev       800     8.46    34.98   10
RTE-3 (Extended)   Test      800     7.87    30.06   9
RTE-4              Test      1,000   7.7     40.15   15
RTE-5              Dev       600     7.79    99.49   15
RTE-5              Test      600     7.92    99.41   15

Table 9: The statistics on the RTE datasets, partially adapted from Bentivogli et al. (2009).

Harabagiu et al. (2006) modified the RTE-2 dataset for the purpose of training and testing their system for the detection of contradictions marked by explicit negations (e.g., not), antonymy, and contrast discourse relation cues (e.g., but, although). To our current knowledge, the modified corpus is not available, neither for free nor for purchase. In modifying the RTE-2 dataset, Harabagiu et al. (2006) followed three different approaches. First, 800 instances of positive entailments from the RTE-2 dataset were manually negated by human annotators, as shown in (2.5). As a result, a balanced corpus of 800 contradictions (Dataset 1) was created.
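The manual negation step can be caricatured with a toy rule that inserts not after the first auxiliary verb. The annotators worked by hand and handled far more constructions than this, so the function below is only an illustrative sketch with a hypothetical, incomplete auxiliary list:

```python
# Small illustrative subset of English auxiliaries.
AUXILIARIES = {"is", "are", "was", "were", "has", "have", "had",
               "can", "will", "would"}

def negate(sentence):
    """Insert 'not' after the first auxiliary verb, if any; return None
    when no simple negation site is found."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in AUXILIARIES:
            return " ".join(tokens[: i + 1] + ["not"] + tokens[i + 1 :])
    return None
```

Applied to a positive entailment hypothesis such as "A hunger strike was attempted.", the rule yields "A hunger strike was not attempted.", i.e., a contradiction of the original text of the kind shown in (2.5).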
To avoid overtraining the model, the annotators were also asked to negate 800 examples of negative entailments (= non-entailments) from the RTE-2 dataset. The produced instances (Dataset 2) were then checked in order to remove contradictions.

(2.5) a. Former dissident John Bok, who has been on a hunger strike since Monday, says he wants to increase pressure on Stanislav Gross to resign as prime minister.
b. A hunger strike was not attempted.

Second, the human annotators were asked to paraphrase the negated sentences created in each pair of Dataset 1, as in the example of (2.6). As the paraphrasing was not possible in all cases, a corpus of 638 out of 800 instances could be created.

(2.6) a. Former dissident John Bok, who has been on a hunger strike since Monday, says he wants to increase pressure on Stanislav Gross to resign as prime minister.
b. A hunger strike was called off.

Finally, the third dataset was created by combining 800 examples of non-contradictions with 400 randomly chosen contradictions from the first and second datasets.

Two years later, de Marneffe et al. (2008) proposed modifications and extensions of the RTE-1, RTE-2, and RTE-3 datasets.9 First, following the methodology of Harabagiu et al. (2006), they modified the RTE-2 dataset by randomly choosing 102 pairs of sentences (51 entailments and 51 non-entailments) from the RTE-2 test dataset and changing them by adding explicit negation. Afterward, they labeled the sentence pairs with yes for contradiction and no for non-contradiction. The datasets can be downloaded from the website of the Stanford NLP Group.10 Second, de Marneffe et al.
(2008) extended the annotation of the sentence pairs of the RTE-1, RTE-2, and RTE-3 (Main Task) datasets from two-way task labels (yes for an entailment relation between the sentences in the pair and no for non-entailment) to three-way task labels (yes for an entailment relation between the sentences in the pair, no for contradiction, and unknown for a non-entailment relation excluding contradiction). For this, each instance of non-entailment in the RTE-1, RTE-2, and RTE-3 datasets was checked as to whether it is a contradiction or not. The decision was made by following the guidelines prepared by the Stanford project team.11 The pairs were labeled manually, either by one or two annotators. Moreover, the contradictions in the RTE-1, RTE-2, and RTE-3 datasets were assigned a contradiction type (e.g., negation, antonymy, world knowledge, etc.). More details on the characteristics of each contradiction type are provided in Section 3.4.3.2 of the present work. The distribution of contradictions in the RTE-1, RTE-2, and RTE-3 test and development datasets is presented in Table 10. According to the statistics, contradictions constitute in total only 10% of the instances in all three RTE datasets. The distribution of contradictions according to their types, on the example of the RTE-3 development dataset, is presented in Table 11.

Challenge   Dataset           Original file name   Number of contradictions   Total number of instances
RTE-1       development (1)   RTE1_dev1   48    287
RTE-1       development (2)   RTE1_dev2   55    280
RTE-1       test              RTE1_test   149   800
RTE-2       development       RTE2_dev    11    800
RTE-3       development       RTE3_dev    80    800
RTE-3       test              RTE3_test   72    800

Table 10: Number of contradictions in the RTE-1, RTE-2, and RTE-3 datasets.

9 De Marneffe et al. (2008) explain the need to again modify the datasets by the fact that the corpora could not be made available by Harabagiu et al.
10 https://nlp.stanford.edu/projects/contradiction/
11 https://nlp.stanford.edu/projects/contradiction/contradiction_guidelines.pdf

Type of contradiction   Distribution (%)
Antonym           15.0
Negation          8.8
Numeric           8.8
Factive/Modal     5.0
Structure         16.3
Lexical           18.8
World Knowledge   27.5

Table 11: Distribution of contradictions occurring in the RTE-3 development dataset according to the contradiction type.

Since 2008, the three-way task labeled RTE-4 (Giampiccolo et al. 2008) and RTE-5 (Bentivogli et al. 2009) datasets, specifying non-entailments as contradiction or unknown, have been created. Sentence pairs in the datasets are labeled with yes for positive entailment, no for contradiction, and unknown for neutral cases. The methodology of dataset compilation and annotation is the same as for RTE-2 and is described in more detail in Dagan et al. (2009). The distribution of contradictions in the RTE-4 and RTE-5 datasets (test and development) is presented in Table 9. The main particularity of the RTE-5 dataset compared to the other RTE datasets is the larger size of the texts, providing a more realistic data basis for the development and evaluation of CD and RTE systems.

2.2.3 Stanford Corpus of Real-Life Contradictions

Besides modifying and extending the RTE datasets, de Marneffe et al. (2008) additionally compiled a corpus of natural, or "real-life", contradictions. The authors argue that the manually created contradictions from the RTE 1-3 datasets do not necessarily cover the diversity of contradictions naturally occurring in language and, therefore, provide an insufficient data basis for the development of efficient and effective systems for CD. Additionally, they claim that real contradictions can be more challenging for automatic recognition than manually created ones. To compile a corpus of naturally occurring contradictions, de Marneffe et al. (2008) collected 131 pairs of contradictory sentences from the web.
The instances included 19 contradictions from news articles (predominantly from Google News), 51 from Wikipedia, 10 from the LexisNexis database, and 51 from the LDC project data. The sentence pairs were then manually annotated by two annotators with contradiction types. Divergences in the annotators' judgments were clarified by discussion, with agreement achieved where possible. Unfortunately, no information on inter-annotator agreement on the contradiction types has been provided by the researchers. The distribution of contradictions according to their type is presented in Table 12.

Type of contradiction   Distribution (%)
Antonym           9.0
Negation          17.6
Numeric           29.0
Factive/Modal     6.9
Structure         3.1
Lexical           21.4
World Knowledge   13.0

Table 12: Distribution of contradictions occurring in the Stanford Corpus of Real-Life Contradictions according to the contradiction type.

2.2.4 SNLI Corpus

Another corpus developed by the Stanford group, not only for the study of contradiction and textual entailment but also for the development of other NLP applications, is the SNLI 1.0 (Stanford Natural Language Inference) balanced corpus. Currently, the SNLI is considered the largest state-of-the-art corpus for the task of RTE (also natural language inference). The corpus is divided into development, test, and training datasets and consists of a total of 570,152 sentence pairs, including examples of entailment, contradiction, and neutral cases. Their distribution in each dataset is presented in Table 13. The total number of instances in the corpus amounts to 37,026.

Dataset       Size (No. of pairs)   No. of contradictions   No. of entailments   No. of neutral cases   No. of unlabelled cases
Development   10,000    3,278     3,329     3,235     158
Test          10,000    3,237     3,368     3,219     176
Training      550,152   183,187   183,416   182,764   785

Table 13: Distribution of contradictions, entailments, neutral, and unlabeled cases in the SNLI corpus.
The sentence pairs for the corpus were created manually in "a grounded naturalistic context" (Bowman et al. 2015: 1) by about 2,500 participants of the crowdsourcing Internet marketplace Amazon Mechanical Turk. For this purpose, the Stanford team developed the following methodology. Each MTurk worker was presented with a photo caption that served as a premise and was given the task to write three kinds of hypotheses for this caption: an entailment (definitely a true description of the photo caption), a contradiction (definitely a false description of the photo), and a neutral sentence (possibly a true description of the photo caption). The photo captions were provided by the Flickr corpus, which consists of 160,000 unattributed captions for 30,000 scenes (Young et al. 2014). Thus, for example, for the caption Two dogs are running through a field, the entailment could be as shown in (2.7a), the neutral sentence as in (2.7b), and the contradiction as in (2.7c). The examples are taken from Bowman et al. (2015: 3).

(2.7) a. There are animals outdoors.
b. Some puppies are running to catch a stick.
c. The pets are sitting on a couch. (Under the assumption that both refer to the same point in time)

In total, 570,152 sentence pairs have been collected. These are presented as original sentences, as syntactically parsed, and as S-ROOT parsed. The premise sentences are predominantly longer than the hypothesis sentences: the mean length of the premise sentences is 14.1 tokens, and the mean length of the hypotheses is 8.3 tokens. Moreover, premise and hypothesis are, in most cases, syntactically different from each other. Further, the data in the corpus is not cleaned and includes a few mistakes. The SNLI is released under a Creative Commons Attribution-ShareAlike 4.0 International License and can be downloaded freely.12 It is available in the JSON format and as text files with tab-separated values.
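The JSON release can be read line by line (one record per line). To the best of our knowledge, the field names sentence1, sentence2, and gold_label below correspond to the actual SNLI distribution, with "-" marking pairs whose annotators did not agree on a gold label; the reader function itself is only an illustrative sketch:

```python
import json

def read_snli(lines):
    """Parse SNLI JSON-lines records, skipping pairs without a gold label
    (gold_label == "-")."""
    for line in lines:
        record = json.loads(line)
        if record["gold_label"] == "-":
            continue
        yield record["sentence1"], record["sentence2"], record["gold_label"]

# A single record in the style of the corpus, using example (2.7c):
sample = ['{"gold_label": "contradiction", '
          '"sentence1": "Two dogs are running through a field.", '
          '"sentence2": "The pets are sitting on a couch."}']
pairs = list(read_snli(sample))
```

Skipping the unlabelled pairs reproduces the usual experimental setup, in which only the three-way labelled instances from Table 13 are used.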
2.3 Summary

To sum up, the present methods and systems for the CD task show good but still insufficient performance: the mean accuracy score achieved by the current systems amounts to 60%. The relatively low performance of the systems can be explained by the complexity of natural language contradictions, as well as by the diversity of ways and mechanisms of their realization, which make the task of automatic CD challenging. The specific reasons for the low performance of the systems can be the following. First, most of the methods were initially developed and tested on the basis of artificially synthesized pairs of contradictory sentences and are, therefore, probably not able to cover the whole diversity of naturally occurring contradictions. Second, the systems developed focus mainly on the detection of explicitly expressed contradictions, relying on linguistic features such as negation and antonyms. Only a few methods address the detection of implicit contradictions, which requires more sophisticated processing than the detection of explicitly expressed contradictions. Third, the pairs of contradictory sentences were analyzed out of the context in which they occur, in this way losing information helpful for CD, such as, e.g., the coreference between entities and events. Thus, there still remains a need for an efficient method for automatic CD, indicating, foremost, gaps in the efficient methods for finding related sentences that may potentially form a contradictory or contrary relation.

12 nlp.stanford.edu/projects/snli/

Though different approaches have been applied to the collection of contradictions, including manual construction and free collection from the web, the manual construction of contradictions has been preferred so far. In our opinion, however, the manually constructed examples cannot claim to cover the diversity of naturally occurring contradictions.
Additionally, due to the limitations of manual data creation, contradiction pairs are presented in isolation from their text and context, thereby losing valuable information such as, e.g., the co-references (without knowledge about the referents in the real world) that could contribute to a better performance of the systems. Finally, with the exception of corpora that include some single examples, there is no special corpus for contradictions in news texts. Therefore, against this background, there arises the need to collect our own data, namely contradictions that occur in news texts, for the purpose of the study. Our methodology for the collection of contradictions naturally occurring in news texts, along with the texts they appear in, will be provided in Chapter 5.

Contradiction in Logic and Language