Automatic Detection of Contradictions in Texts

Inaugural dissertation submitted for the degree of Doctor of Philosophy (Dr. phil.) at Department 05 – Language, Literature, Culture of Justus-Liebig-Universität Gießen

Submitted by Natali Karlova-Bourbonus, M.A., from Frankfurt am Main, 2018

Chair: Prof. Dr. Thomas Möbius
First examiner: Prof. Dr. Henning Lobin
Second examiner: Prof. Dr. Helmut Feilke

For my little prince Philipp

Acknowledgments

Five years have now passed since I began my doctoral thesis on contradictions in news texts, and at this point I would like to thank everybody who helped me finalize it.

First of all, I would like to express my deepest gratitude to my Doktorvater, Prof. Dr. Henning Lobin, who gave me the priceless opportunity to gain new experience in teaching and research and to learn how exciting natural language processing can be. Dear Henning, thank you for supporting me the whole, often thorny, way to the completed monograph, from the first half-baked ideas to the last dot on the paper.

I would also like to express my sincere gratitude to my second supervisor, Prof. Dr. Helmut Feilke. Participation in his GCSC-Kolloquium in 2014 led to a long-desired breakthrough in my research. I am grateful to him for sharing precious ideas with me, which laid the groundwork for my research.

While reading the present monograph, you will come across the statement several times that finding contradictions is a challenging task for a human. I would like to thank all the students at the University of Giessen who did not let this challenge discourage them and who, in 2015 and 2016, took part in the surveys conducted within the framework of the present study. Their patience, curiosity, and ability to explore latent things in texts are admirable.

Last but not least, I would like to thank the most important people in my life – my family.
My parents, who always let me go my own way, who never judged me for my decisions, and who patiently and selflessly supported me in seemingly inextricable moments. My husband Nico, who inspired and motivated me to make possible what in the beginning seemed to be impossible. And my brother Denis, for the exciting conversations on the processing of natural languages.

Table of Contents

List of Abbreviations ..... V
List of Figures ..... VIII
List of Tables ..... IX
1 Introduction ..... 11
1.1 Statement of Problem and Motivation ..... 11
1.2 Subject of the Study ..... 16
1.3 Research Questions and Objectives ..... 17
1.4 Structure of the Thesis ..... 18
1.5 How to Read This Thesis ..... 20
2 State of the Art ..... 21
2.1 Methods and Systems ..... 21
2.2 Corpora of Contradictions ..... 32
2.2.1 FraCas Inference Data Suite ..... 32
2.2.2 RTE Datasets and Their Modifications ..... 34
2.2.3 Stanford Corpus of Real-Life Contradictions ..... 37
2.2.4 SNLI Corpus ..... 38
2.3 Summary ..... 39
3 Contradiction in Logic and Language ..... 41
3.1 Contradiction as Concept of Logic ..... 41
3.1.1 Contradiction in Traditional (Aristotelian) Logic ..... 41
3.1.2 Contradiction and Contrariety ..... 43
3.1.3 Contradiction and Quantification ..... 43
3.1.4 Related Terms ..... 45
3.2 Negation in Natural Languages ..... 46
3.2.1 Typology of Negation ..... 46
3.2.2 Negation and Word Order ..... 52
3.2.3 Scope of Negation ..... 52
3.2.4 Double and Multiple Negation ..... 55
3.3 Problematic Issues about Contradiction ..... 59
3.3.1 Contradiction and Presupposition ..... 59
3.3.2 Contradiction and Modality ..... 61
3.3.3 Contradiction and Vagueness ..... 64
3.3.4 The Concept of Fake Contradiction ..... 66
3.4 Classification of Contradictions ..... 67
3.4.1 Classification of Svintsov ..... 67
3.4.2 Classification of Mučnik ..... 69
3.4.3 Typology according to Contradictory Element ..... 71
3.4.3.1 In Educational Psychology ..... 71
3.4.3.2 In Computational Linguistics ..... 73
3.5 Causes and Functions of Contradictions ..... 76
3.5.1 Causes ..... 76
3.5.2 Functions ..... 79
3.6 Summary ..... 81
4 The Characteristics of News Texts ..... 83
4.1 Introductory Notions ..... 83
4.1.1 News as Text Genre ..... 83
4.1.2 Online Newspaper vs. Printed Newspaper ..... 86
4.1.3 Values of Selection and Production of News Articles ..... 88
4.2 Structure and Elements of News Articles ..... 91
4.3 Language of News Article ..... 95
4.3.1 Reported Speech ..... 95
4.3.1.1 News as an “embedded talk” ..... 95
4.3.1.2 Types of Reported Speech ..... 96
4.3.1.3 Reporting Expressions ..... 98
4.3.2 News Actors Labeling ..... 99
4.3.3 Event Categories ..... 99
4.3.4 Time and Place Mentions ..... 100
4.3.5 The Use of Numbers and Figures ..... 101
4.3.6 Other Characteristics ..... 102
4.4 Summary ..... 102
5 Typology Construction: Types of Contradictions in News Texts ..... 104
5.1 Compilation of News Text Contradiction Corpus ..... 104
5.1.1 Data Collection ..... 104
5.1.1.1 Collection of News Texts ..... 104
5.1.1.2 Survey 1: Finding the Contradictions ..... 106
5.1.1.3 Results and Evaluation ..... 107
5.1.2 Data Validation and Filtering ..... 108
5.1.2.1 Survey 2: Contradiction or Not ..... 108
5.1.2.2 Results and Evaluation ..... 109
5.2 Typology of Contradictions ..... 110
5.2.1 Dimension: Contradiction Cues ..... 110
5.2.2 Dimension: Relatedness of the Parts ..... 116
5.3 Giessen Annotated Corpus of Contradictions in News Texts ..... 127
5.3.1 Survey 3: Typology Validation, Results, and Evaluation ..... 127
5.3.2 Corpus Annotation ..... 128
5.4 Summary ..... 131
6 Conceptual Design of a CD System and Supporting Tools ..... 133
6.1 Conceptual Design ..... 133
6.2 Supporting Tools and Methods ..... 134
6.2.1 Processing at Lexical, Morphological and Syntax Levels ..... 134
6.2.1.1 Tokenization and Sentence Splitting ..... 134
6.2.1.2 Stop Word Detection and Removing ..... 136
6.2.1.3 Part-of-speech Tagging ..... 137
6.2.1.4 Stemming and Lemmatization ..... 138
6.2.1.5 Parsing and Chunking ..... 140
6.2.2 Processing at Semantic, Pragmatic and Discourse Levels ..... 141
6.2.2.1 Semantic Role Labeling ..... 141
6.2.2.2 Recognizing Textual Entailment ..... 145
6.2.2.3 Anaphora Resolution ..... 148
6.2.3 Further Processing Tasks ..... 155
6.2.3.1 Negation and Modality Processing ..... 155
6.2.3.2 Sentiment Analysis ..... 157
6.2.3.3 Named-Entity Recognition ..... 159
6.2.3.4 Temporal Processing ..... 163
6.2.3.5 Measuring Semantic Textual Similarity ..... 165
6.2.4 Approaches to Meaning Representation ..... 166
6.2.5 Computational Sources of Knowledge ..... 169
6.2.5.1 Lexical Resources ..... 169
6.2.5.2 Ontologies ..... 173
6.3 Summary ..... 176
7 Physical Design of a CD System and Implementation ..... 178
7.1 System Architecture and Potential Contradiction ..... 178
7.2 System Implementation ..... 181
7.2.1 Module Preprocessing ..... 181
7.2.2 Module Finding Parts of Contradiction ..... 183
7.2.3 Module Finding Contradictions ..... 187
7.3 Results and Evaluation ..... 190
8 Conclusions ..... 194
Bibliography ..... 198
Appendix A.
Survey 1: An Example of a Questionnaire ..... 229
Appendix B. Survey 2 and A Test for Contradiction ..... 257

List of Abbreviations

ACE  Automatic Content Extraction Program
ACL  Association for Computational Linguistics
BART  Baltimore Anaphora Resolution Toolkit, currently Beautiful Anaphora Resolution Toolkit
CBOW  Continuous Bag of Words
CCG  Componential Counting Grid
CD  Contradiction Detection
CFG  Context-free Grammar
COLING  International Conference on Computational Linguistics
CoNLL  The SIGNLL Conference on Computational Natural Language Learning
CoU  Context of Utterance
DIRT  Discovery of Inference Rules from Text
DNA  Duplex Negatio Affirmat
DNN  Duplex Negatio Negat
DOLCE  Descriptive Ontology for Linguistic and Cognitive Engineering
DRT  Discourse Representation Theory
DSM  Distributional Semantic Model
EDITS  Edit Distance Textual Entailment Suite
EMD  Earth Mover’s Distance
EOP  EXCITEMENT Open Platform
ESA  Explicit Semantic Analysis
EXCITEMENT  EXploring Customer Interactions through Textual EntailMENT
FOL  First-Order Logic
FraCas  A Framework for Computational Semantics
GLSA  Generalized Latent Semantic Analysis
HAL  Hyperspace Analogue to Language
HLBL  Hierarchical Log-bilinear Model
HOTCoref  Higher Order Tree Coreference
kNN  k-Nearest-Neighbor
LDA  Latent Dirichlet Allocation
LDC  Linguistic Data Consortium
LEM  Law of Excluded Middle
LNC  Law of Non-Contradiction
LSA  Latent Semantic Analysis
LSI  Latent Semantic Indexing
mSDA  Marginalized Stacked Denoising Autoencoders
MUC  Message Understanding Conference
NC  Negative Concord
NER  Named Entity Recognition
NLP  Natural Language Processing
NLTK  Natural Language Toolkit
NLU  Natural Language Understanding
NomBank  Noun Annotation Bank
NPI  Negative Polarity Item
Okapi BM25  Okapi Best Matching 25
OWL  Web Ontology Language
PropBank  Propositional Bank
RDF  Resource Description Framework
RST  Rhetorical Structure Theory
RTE  Recognizing Textual Entailment
S, V and O  Subject, Verb and Object
SDRT  Segmented Discourse Representation Theory
SIGLEX  Special Interest Group on the Lexicon of the Association for Computational Linguistics
SIGNLL  Special Interest Group on Natural Language Learning of the Association for Computational Linguistics
SNLI  Stanford Natural Language Inference
STS  Semantic Textual Similarity
SUMO  Suggested Upper Merged Ontology
SVM  Support Vector Machines
T and H  Text and Hypothesis
TF-IDF  Term Frequency – Inverse Document Frequency
TnT  Trigrams’n‘Tags
TREC  Text Retrieval Conference
VENSES  Venice Semantic Evaluation System
WMD  Word Mover’s Distance
XML  Extensible Markup Language
YAGO  Yet Another Great Ontology

List of Figures

Figure 1: Square of Opposition. ..... 44
Figure 2: Multiple negation in English and other languages. ..... 55
Figure 3: Classification of contradictions proposed in Mučnik (1985). ..... 69
Figure 4: XML annotation layout of the corpus. ..... 129
Figure 5: Distribution of contradictions in the corpus according to contradiction cues. ..... 130
Figure 6: Distribution of types of contradictions in the Giessen Annotated Corpus of Contradictions in News Texts. ..... 131
Figure 7: Meaning of John pushed the cart as a conceptual dependency graph. ..... 167
Figure 8: The multiple senses of the noun dog as represented in WordNet. ..... 170
Figure 9: Excerpt of the DBpedia. Note: From DBpedia and the live extraction of structured data from Wikipedia (Morsey et al. 2012: 5). .....
175
Figure 10: The knowledge of the concept eat sandwich as represented in ConceptNet. ..... 176
Figure 11: The architecture of the Contradictio system. ..... 179
Figure 12: An information graph for the noun Obama from the sentence Obama speaks to the media in Illinois. ..... 183
Figure 13: The idea underlying the word mover’s distance model. Note: From From Word Embeddings to Document Distances (Kusner et al. 2015: 957). ..... 185

List of Tables

Table 1: CD systems submitted for the RTE-3 challenge – Extended Task (2007). ..... 24
Table 2: CD systems submitted for the RTE-4 challenge (2008). ..... 25
Table 3: CD systems submitted for the RTE-5 challenge (2009). ..... 26
Table 4: Standalone CD systems. ..... 27
Table 5: Comparison of approaches to graph alignment applied in de Marneffe et al. (2008). ..... 31
Table 6: The distribution of premises in the FraCas corpus. ..... 33
Table 7: The distribution of answers in the FraCas corpus. ..... 33
Table 8: The distribution of the problems per group in the FraCas corpus. ..... 34
Table 9: The statistics on RTE datasets partially adapted from Bentivogli et al. (2009). ..... 35
Table 10: Number of contradictions in the RTE-1, RTE-2, and RTE-3 datasets. ..... 36
Table 11: Distribution of contradictions occurring in the RTE-3 development dataset according to the contradiction type. ..... 37
Table 12: Distribution of contradictions occurring in the Stanford Corpus of Real-Life Contradictions according to the contradiction type. ..... 38
Table 13: Distribution of contradictions, entailments, neutral, and unlabeled cases in the SNLI corpus. ..... 38
Table 14: Contradiction types distinguished in de Marneffe et al. (2008: 1041) with examples. ..... 75
Table 15: Definitions of Aristotle’s fallacies as provided in Parry/Hacker (1991: 423-457). ..... 78
Table 16: Types of reported speech. Note: From News discourse (Bednarek/Caple 2012: 92). ..... 97
Table 17: The distribution of news articles according to the topic of the news story, their source, and date of publishing. ..... 105
Table 18: Distribution of English language competencies. ..... 107
Table 19: Distribution of agreement and disagreement on contradictions among the raters. ..... 109
Table 20: The scores of inter-rater agreements on contradictions before and after discussion, computed with Fleiss’s Kappa and Krippendorff’s Alpha. ..... 109
Table 21: Knowledge-based inferences from “How Leisure Came”. Note: From Constructing inferences during narrative text comprehension (Graesser et al. 1994: 375). ..... 120
Table 22: Alleged presupposition triggers as listed in Potts (2015), Beaver and Geurts (2014), and Levinson (1983). ..... 122
Table 23: The scores of inter-rater agreements on the type of relatedness, contradiction cue, and contradiction type, before and after discussion, computed with Fleiss’s Kappa and Krippendorff’s Alpha. ..... 128
Table 24: Universal thematic roles. Note: From Understanding semantics (Löbner 2013: 123). ..... 141
Table 25: Simplified VerbNet entry for the hit-18.1 class. ..... 144
Table 26: Overview of anaphora resolution methods (*Types of cohesion tie are not treated in detail). Note: From Anaphora resolution and text retrieval (Schmolz 2015: 236). ..... 152
Table 27: Available corpora with annotated named entities. ..... 162
Table 28: A DSM for the concept dog. Note: From DSM Tutorial (Stefan Evert et al. 2009-2016). ..... 169
Table 29: WordNet 2.1, 3.0 and 3.1 database statistics. ..... 171
Table 30: Confusion matrix for the system’s performance (in columns) compared to the Gold standard (in rows). ..... 191
Table 31: The overall performance of the system evaluated using precision and recall. ..... 191
Table 32: Confusion matrix for contradiction types found by the system and contained in the dataset (Rows: Gold standard, Columns: Contradictio system). .....
192
Table 33: Confusion matrix for the system performance in finding parts of contradiction (Gold standard in rows, system performance in columns). ..... 193

1 Introduction

1.1 Statement of Problem and Motivation

Please read carefully the following text passage from Daniel Defoe’s adventure novel Robinson Crusoe1 (Defoe 2001: 58-59):

“When I came down from my apartment in the tree I looked about me again, and the first thing I found was the boat, which lay as the wind and the sea had tossed her up upon the land, about two miles on my right hand. I walked as far as I could upon the shore to have got to her; but found a neck or inlet of water between me and the boat, which was about half a mile broad […]. I resolved, if possible, to get to the ship; so I pulled off my clothes, for the weather was hot to extremity, and took the water. But when I came to the ship, my difficulty was still greater to know how to get on board; for as she lay aground, and high out of the water, there was nothing within my reach to lay hold of. I swam round her twice, and the second time I spied a small piece of rope, which I wondered I did not see at first, hang down by the fore-chains so low as that with great difficulty I got hold of it, and by the help of that rope got up into the forecastle of the ship. Here I found that the ship was bulged, and had a great deal of water in her hold, but that she lay so on the side of a bank of hard sand, or rather earth, that her stern lay lifted up upon the bank, and her head low, almost to the water. By this means all her quarter was free, and all that was in that part was dry; for you may be sure my first work was to search and to see what was spoiled and what was free.
And first I found that all the ship's provisions were dry and untouched by the water; and being very well disposed to eat, I went to the bread-room and filled my pockets with biscuit, and eat it as I went about other things, for I had no time to lose.”

Could you recognize the contradiction in the passage? The described scene is the topic of many discussions concerning the work of Daniel Defoe (e.g. Baines 2007) and, more generally, of discussions dealing with logical mistakes occurring in texts of classic literature. Robinson Crusoe (referred to by the pronoun I throughout the text passage) intended to get to the wrecked ship. As there was a neck of water in between, he had to swim. Before swimming, and as the weather was hot, Robinson took off his clothes. After reaching the ship and being on it for some time, Crusoe found that some of the provisions had remained dry. He therefore went to the bread-room and filled his pockets with biscuits (I filled my pockets with biscuit). But how could he do this? Filling one's pockets with anything presupposes wearing some clothes with pockets at the time of filling. But as we were told at the beginning of the text passage, Robinson Crusoe had taken off his clothes before swimming. From reading this, the reader inferred that Robinson Crusoe did not have pockets during his stay aboard the ship. There was also no information or further clues that Crusoe had found any clothes and put them on.

1 The full title of the novel is The Life and Strange Surprising Adventures of Robinson Crusoe, Of York, Mariner: Who lived Eight and Twenty Years, all alone in an un-inhabited Island on the Coast of America, near the Mouth of the Great River of Oroonoque; Having been cast on Shore by Shipwreck, wherein all the Men perished but himself. With an Account how he was at last as strangely deliver'd by Pyrates. The novel was first published in 1719.
We are obviously dealing with a contradiction here – two statements express propositions that cannot be true at the same time in the same respect.

It is beyond debate that the recognition of contradictions presents a challenging task for the reader (Markman 1979; Garner 1980, 1981), and especially for poor readers (Garner 1980, 1981; Winograd/Johnston 1982). How well an individual performs in detecting contradictions depends on the state of his or her language and world knowledge, analytical ability, and memory, as well as on individual characteristics such as, e.g., age (Kotsonis/Patterson 1980; Chan et al. 1987; Vosniadou et al. 1988; Otero/Campanario 1990). The type of contradiction (Markman 1979; Markman/Gorin 1981; Harris et al. 1981; Flavell et al. 1981; Paris/Myers 1981; Garner 1981; Baker 1985) and prior notification of the presence of contradictions in the text (Winograd/Johnston 1982; Glenberg et al. 1982; August et al. 1984; Baker/Zimlin 1989) can be crucial for the success of this task as well.

Only a few attempts have been made to reveal and describe the processes involved in the recognition of contradictions by a human. The most prominent theories have been developed by psychologists within the framework of reading comprehension research and are described in Otero and Kintsch (1992), Singer (1996), Johnson-Laird et al. (2004), and van den Broek et al. (2005). The proposed theories differ with respect to the model of reading comprehension on which they are based.

The focus of the present study is contradictions occurring in and between online news texts. There are a number of definitions of contradiction, which, according to Grim (2004), can be grouped into four classes: (1) those which define contradiction in terms of truth and falsity (Prior 1967: 458; Bonevac 1987: 25; Wolfram 1989: 163; Sainsbury 1991: 369), such as (D1) below; (2) in terms of content or form (Reichenbach 1947: 36; Mendelson 1964: 18; Haack 1978: 244; Kalish et al. 1980: 18; Forbes 1994: 102), such as (D2); (3) in terms of assertion and denial (Strawson 1952, 2011: 16-19; Quine 1959: 9; Brody 1967: 61; Kahane 1995: 308), such as (D3); and (4) as a state of affairs (Routley/Routley 1985: 204), such as (D4). Grim (2004) refers to these four groups as semantic, syntactic, pragmatic, and ontological, respectively.

D1 Two propositions are contradictories if and only if it is logically impossible for both to be true and logically impossible for both to be false. (Sainsbury 1991: 369)
D2 Wff* of the form ‘A & ¬A’; statement of the form ‘A and not A’ (Haack 1978: 244)
D3 A contradiction both makes a claim and denies that very claim. (Kahane 1995: 308)
D4 A contradictory situation is one where both B and ¬B (it is not the case that B) hold for some B. (Routley/Routley 1985: 204)

Though these definitions can be used by humans for recognizing contradictions, they are – with the exception of the third group of definitions, and even then only under certain limitations – difficult to apply in practice for the purpose of the study, which is the development of a system for the automatic detection of contradictions in news texts. For instance, no machine is at present capable of determining the truth value of an arbitrary sentence.

It is obvious that most of the above definitions build, to some degree, on one of the three versions of Aristotle’s Law of Non-Contradiction (Section 3.1.1). Thus, the third group of definitions, for example, seems to reflect the ontological version of the law (not to be confused with Grim’s ontological definition), which states that “it is impossible that the same thing can at the same time both belong and not belong to the same object and in the same respect, and all other specifications that might be made, let them be added to meet local objections” (Metaphysics IV 3 1005b19–23).
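For propositional formulas, at least, the semantic definition (D1) and the syntactic definition (D2) are mechanically checkable: a statement of the form A and not A is false under every truth-value assignment, and A and its negation can be neither jointly true nor jointly false. The following Python sketch illustrates this by brute-force truth-table enumeration; it is purely illustrative and is not part of the system developed in this thesis.

```python
from itertools import product

def contradictories(p, q, n_vars):
    """Check (D1) for two propositional functions p and q over n_vars
    Boolean variables: they are contradictories iff no assignment makes
    both true and no assignment makes both false."""
    rows = list(product([False, True], repeat=n_vars))
    never_both_true = not any(p(*r) and q(*r) for r in rows)
    never_both_false = not any(not p(*r) and not q(*r) for r in rows)
    return never_both_true and never_both_false

# (D2): 'A and not A' is false in every row of its truth table.
assert all((a and not a) is False for a in (False, True))

# A and its negation satisfy (D1): they are contradictories.
assert contradictories(lambda a: a, lambda a: not a, 1)

# Two statements that can both be false fail (D1): e.g. 'A and B'
# vs. 'not A and not B' are both false when A is true and B is false.
assert not contradictories(lambda a, b: a and b,
                           lambda a, b: not a and not b, 2)
```

The last check also shows why (D1) has two clauses: ruling out joint truth alone is not enough, since statements that merely exclude one another may still both be false.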
In our opinion, this formulation is more applicable to the development of a system for the automatic detection of contradictions and will therefore be given preference for the purposes of the present study.

It should be noted that, besides contradiction, contrariety will also be considered in the present study. Though both terms will be referred to here as contradiction (compare German: kontradiktorischer Widerspruch vs. konträrer Widerspruch), they have to be clearly distinguished, as they are not synonymous. The difference between contradiction and contrariety will be presented in Section 3.1.2.

According to the survey on news consumption across twelve countries conducted in 20152 by the Reuters Institute for the Study of Journalism, Oxford University, covering four channels of news access – television, online (including social media), radio, and printed newspapers – the first two appeared to be the most popular ways of accessing news on a weekly basis. Television was the number-one source in, i.a., Germany (82%), France (80%), and the UK (75%), and online access in, i.a., Urban Brazil (91%), Finland (90%), Spain (86%), and Denmark (85%). However, taking into consideration that this survey was conducted online and thus may underrepresent users who do not use online services, it can be concluded that TV news is still ahead in the countries that participated in the survey, with the clear exception of the United States and possibly Denmark, Finland, and Australia. Moreover, comparing news consumption among people of different ages, it can be observed that young people prefer online news and often completely abandon television news. This trend is especially pronounced in the United States, France, and Denmark.
2 The online report on the survey can be found by following this link: http://www.digitalnewsreport.org/survey/2015/sources-of-news-2015/

To study online news consumption in particular, the Reuters Institute conducted a survey across 36 countries (i.a., the USA, Mexico, Australia, and EU countries) on five continents.³ According to the survey, around half of the participants (54%) across all countries, with a predominance in Southern Europe and Latin America, prefer social media as a source of news over other sources. In Spain, Germany, and France, however, a reverse or slowing trend can be observed. Further, the report shows that 23% use messaging apps (e.g., WhatsApp, Viber, WeChat, FB Messenger, Line, KakaoTalk) for accessing news on a weekly basis. Additionally, it was found that access to news via smartphones had increased in comparison to computers and tablets, amounting to 56%, a share that had doubled since 2013.

In the Internet era, it is not only readers’ preferences regarding news sources (especially those of young readers) that have changed. The journalistic practice of news production, i.e., information collection and reporting, has been influenced by the possibilities provided by the Internet as well. A number of studies have been conducted which reveal the changes the Internet has brought to the process of news production, including Reddick and King (2001), Miller (1998), Singer (2003), and Fenton (2012), among others. Fenton (2012) summarizes the research findings, i.a., under the umbrella of criteria such as (data transfer) speed and (web) space. For journalistic practice, the great amount of space provided on the web means the production of more news. Fenton (2012: 559) frames this as “space equals more news”. Space provides the possibility of archiving and updating the news, achieving “more depth of information coverage” (ibid.). Space also allows news to be stored in different multimedia formats, not only as text.
Space and speed enable a geographical reach such that journalists no longer need to leave their newsroom to write about events happening anywhere in the world. The speed enabled by the Internet, in turn, means an increasing value of immediacy for the practice of news production (Fenton 2012). However, while the immediate release and updating of news texts is doubtlessly an advantage for the news reader, it is unfortunately often only possible at the cost of information quality (Gunter 2003; Fenton 2012; Silvia 2001). Taking advantage of the Internet’s speed, news organizations often publish their news on the web “before the usual checks for journalistic integrity have taken place” (Fenton 2012: 561). This in turn means that news texts often include typographical, factual, and logical errors, violating accuracy as one of the fundamental values of news text production, misinforming the reader, and negatively affecting the credibility of the newspaper (Bell 1991; Maier 2005; Bednarek/Caple 2012).

3 The Digital News Report 2017 on the survey, published online, can be found by following this link: http://www.digitalnewsreport.org/survey/2017/resources-2017/

Factual errors, according to Silverman (2007), represent the most frequent kind of error occurring in news texts. In contrast to typographical and logical errors, which can be recognized within the text itself, incorrect facts can be revealed only by applying world knowledge or by referring to the original or other related information sources. Typographical errors, in turn, are not critical and can nowadays be easily recognized by means of autocorrection. By contrast, logical errors, which result from a violation of logical laws, e.g., the Law of Non-Contradiction (LNC) and the Law of Excluded Middle (LEM), are the most challenging kind of errors to recognize.
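The claim that typographical errors are easy to catch can be made concrete: autocorrection typically rests on string edit distance, i.e., comparing a misspelled token against a lexicon. The following is only an illustrative sketch with a hypothetical three-word lexicon, not a description of any particular autocorrection tool:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical mini-lexicon; a real autocorrector would use a full dictionary.
LEXICON = ["government", "minister", "parliament"]

def suggest(word: str) -> str:
    """Return the lexicon entry closest to the (possibly misspelled) word."""
    return min(LEXICON, key=lambda w: levenshtein(word, w))

print(levenshtein("kitten", "sitting"))  # 3
print(suggest("goverment"))              # government
```

Factual and logical errors offer no such surface-level shortcut, which is precisely why they are so much harder to detect automatically.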
In practice, both factual and logical errors in most cases remain unnoticed by the reader of the news and are taken as reliable or trustworthy (Svintsov 1979; Bell 1991). Overlooking such errors can be a consequence of missing world knowledge or of a lack of attention while reading. In any case, the reader of the news is misinformed without being aware of it. If detected by a reader, however, typographical, but especially factual and logical, errors have a negative impact on the newspaper’s credibility and trustworthiness, since they are perceived as lies or disinformation (Svintsov 1979; Bell 1991; Silverman 2007; Bednarek/Caple 2012). Therefore, in the process of news production, the task of news editing is essential and cannot be ignored. Editing has become even more urgent today because, in comparison to the past, the modern reader has many more possibilities of verifying the information provided, as a large amount of related information appears online simultaneously (Silverman 2007).

One should also consider that incorrect facts (factual errors) and logically wrong conclusions (logical errors) in news texts are often used intentionally to serve the purpose of manipulation or propaganda. Violating the news value of objectivity (Section 4.1.3), the facts are adjusted to influence the reader’s opinion, pushing it in a particular direction to the advantage of a country’s, an institution’s, or an individual’s interests. In this context, the current phenomenon of fake news, reportedly occurring in social media, should be mentioned in particular.

Today, in many fields of human life, computers successfully play a supporting role, taking over natural language tasks such as, e.g., searching among a huge amount of data and delivering the needed information in the shortest amount of time, as well as typographical error correction, opinion mining, etc.
The main aim of the present study is to propose an approach for the automatic detection of contradictions (henceforth referred to as CD) in news texts. This approach can be of practical relevance, first, for the task of news editing, when proofing a text for consistency (i.e., agreement with previously stated facts, with no contradictions contained). Second, it can be applied to identify the facts and aspects on which different sources of information disagree and, in this way, serve the purpose of information verification. Third, an automatic CD task can be used to obtain a summarized view of contradictory opinions and facts on particular events from a large number of news texts, so that a reader can independently form an opinion based on a full picture. Finally, the approach can be integrated into other natural language systems and applications, such as, e.g., question-answering systems and text summarization, which, among others, use news texts as their data source.

From the theoretical perspective, the significance of the study consists, first, in summarizing and elaborating the existing theoretical knowledge on natural language contradiction. Second, the study provides new empirically gained insights into the realization mechanisms of natural language contradictions occurring in and between news texts, in this way contributing to a better understanding of the nature of contradictions and filling the knowledge gaps.

1.2 Subject of the Study

Natural language contradictions are of a complex nature. As will be shown in Chapter 5, the realization of contradictions is not limited to examples such as Socrates is a man and Socrates is not a man (under the condition that Socrates refers to the same object in the real world), which is discussed by Aristotle (Section 3.1.1).
Empirical evidence (see Chapter 5 for more details) shows that only a few contradictions occurring in real life are of that explicit (prototypical) kind (see, e.g., Svintsov 1979; de Marneffe et al. 2008). Rather, contradictions make use of a variety of natural language devices such as, e.g., paraphrasing, synonyms and antonyms, passive and active voice, diverse expressions of negation, and figurative linguistic means such as idioms, irony, and metaphors. Additionally, the most sophisticated kind of contradictions, the so-called implicit contradictions, can be found only by applying world knowledge and after conducting a sequence of logical operations, such as, e.g., in (1.1).

(1.1) The first prize was given to the experienced grandmaster L. Stein who, in total, collected ten points (7 wins and 3 draws). (Svintsov 1979: 195)

Those familiar with the rules of chess know that a player gets one point for winning and zero points for losing a game; in the case of a draw, each player gets half a point. Building on this and conducting some simple arithmetic, we can infer that with 7 wins and 3 draws (the second part of the sentence) a player can only collect 8.5 points, not 10. Hence, we observe that there is a contradiction between the first and the second parts of the sentence.

Implicit contradictions will only partially be the subject of the present study, which aims primarily at identifying the realization mechanisms and cues (Chapter 5) as well as at finding the parts of contradictions by applying state-of-the-art algorithms for natural language processing, without conducting deep meaning processing. Further in focus are the explicit and implicit contradictions that can be detected by means of explicit linguistic, structural, and lexical cues, and by conducting some additional processing operations (e.g., computing a sum in order to detect contradictions arising from numerical divergences).
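The additional processing operation just mentioned, computing a sum to expose a numerical divergence, can be sketched for example (1.1). The encoding of the chess scoring rule and the function names below are illustrative only, not part of the system developed later in this work:

```python
def chess_score(wins: int, draws: int) -> float:
    """Standard chess scoring: 1 point per win, 0.5 per draw, 0 per loss."""
    return wins * 1.0 + draws * 0.5

def contradicts_claim(claimed_points: float, wins: int, draws: int) -> bool:
    """True if the claimed total diverges from the total implied by the results."""
    return chess_score(wins, draws) != claimed_points

# The sentence in (1.1): a claimed total of ten points with 7 wins and 3 draws.
print(chess_score(7, 3))            # 8.5
print(contradicts_claim(10, 7, 3))  # True, i.e., a numerical contradiction
```

The hard part for a machine is, of course, not the arithmetic but knowing that the scoring rule applies, which is exactly the world knowledge that makes implicit contradictions difficult.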
One should note that additional complexity in finding contradictions can arise when the parts of a contradiction occur on different levels of realization. Thus, a contradiction can be observed on the word and phrase level, such as in a married bachelor (for variations of contradictions on the lexical level, see Ganeev 2004); on the sentence level, between parts of a sentence or between two or more sentences; or on the text level, between portions of a text or between whole texts, such as a contradiction between the Bible and the Quran, for example. Only contradictions arising at the level of single sentences occurring in one or more texts, as well as between parts of a sentence, will be considered for the purpose of this study. Though the focus of interest will be on single sentences, the study will make use of text particularities such as coreference resolution, without establishing the referents in the real world.

Finally, another aspect to be considered is that the parts of a contradiction do not necessarily appear at the same time. They can be separated by many years or even centuries, with or without time expressions, making their recognition by humans and their detection by machines challenging. According to Aristotle’s ontological version of the LNC (Section 3.1.1), however, the same time reference is required in order for two statements to be judged a contradiction. Taking this into account, we set the borders of the study by limiting the analyzed textual data thematically (only nine world events) and temporally (three days after the reported event had happened) (Section 5.1). No sophisticated time processing will thus be conducted.

1.3 Research Questions and Objectives

As previously mentioned, the main aim of the present study is to propose a system for the automatic detection of naturally occurring contradictions in and between news texts published in English.
With regard to the aim of the study, we formulate the following three blocks of related research questions:

RQ1 What conditions must two sentences necessarily satisfy in order to be judged a contradiction? Are there any natural language exceptions?

RQ2 What are the cues of contradictions occurring in news texts written in English? Do all contradictions occur explicitly in news texts?

RQ3 What phenomena of natural languages should a CD system be able to cope with? Considering this, what can the architecture of a system for the automatic detection of contradictions occurring in and between news texts look like? What is the most efficient way of computationally realizing the system’s components? What are the current limitations? How can CD profit from the properties of a text?

The research objectives, serving as milestones toward the main aim of the study, are as follows:

O1a Review the state of the art of CD systems, identify their weaknesses and strengths, and determine the aspects or components that are to be improved;

O1b Review the existing datasets of contradictions and decide whether they can be applied as the basis for the development and evaluation of the CD system. If required, collect and prepare our own data;

O2a Based on the existing theory, formulate a set of conditions and rules that underlie the realization of natural language contradictions;

O2b Describe natural language phenomena which can be problematic for the CD task;

O3a Outline the characteristics and particularities of texts, and in particular of online news texts, that have to be considered by a CD system and can potentially contribute to the efficiency of the CD task;

O3b Identify the linguistic cues of naturally occurring news contradictions and offer a typology of contradictions based on these cues;

O4 Develop an architecture of a prototype CD system and implement the system.
Decide which methods and approaches can be used for implementing the system’s components and evaluate them on real cases.

1.4 Structure of the Thesis

The overall structure of the study consists of ten chapters, including the Introduction, Conclusions, References, and Appendix. After introducing the reader to the subject, the main aim, and the goals of the study (Chapter 1, Introduction), Chapter 2 (State of the Art) begins with a presentation of the main stages of the development of the CD task. It then continues with an overview of the existing CD systems, summarizes their weaknesses and strengths, and defines the research gaps to be addressed in the study (Section 2.1). Finally, the chapter provides a description of the available datasets of contradictions, which are an essential precondition for the development and evaluation of CD systems (Section 2.2).

The next two chapters (together with Chapter 6) lay out the theoretical dimensions of the research, addressing the concepts of contradiction in logic and language (Chapter 3) and the characteristics of news texts, with a focus on online news texts (Chapter 4).

In more detail, Chapter 3 (Contradiction in Logic and Language), which consists of five sections, is concerned with the traditional approaches to contradiction in logic and language. Section 3.1 first presents the traditional view on contradiction as developed by Aristotle and then terminologically distinguishes contradiction from related concepts such as contrariety, tautology, and paradox. The focus of Section 3.2 is the realization, expression, and interpretation of negation in natural languages, with particular interest in English. The subject of Section 3.3 is the scientific debate on the status of contradiction in the light of phenomena such as presupposition, modality, vagueness, and ambiguity.
Further, Section 3.4 provides an overview of existing classifications of textual contradictions, including typologies from educational psychology and computational linguistics. Finally, Section 3.5 concludes the chapter with a summary of the causes and functions of natural language contradictions, claiming that contradictions are not always “bad”.

Chapter 4 (The Characteristics of News Texts) introduces the concept of news texts, including the differences between printed and online newspapers, hard and soft news, and the values of news production (Section 4.1); a description of the structure of a news article and its main elements (Section 4.2); as well as a discussion of the particularities of news language (Section 4.3).

Chapters 5, 6, and 7 focus on the conceptual and physical design as well as the implementation of the CD system, with Chapters 5 and 7 constituting the empirical part of the present work.

Chapter 5 (Typology Construction: Types of Contradictions in News Texts) describes the computationally oriented methodology and reports the results of a corpus-based typology construction of the contradictions occurring in single or multiple news texts.

Chapter 6 (Conceptual Design of a CD System and Supporting Tools) in turn addresses a possible conceptual design of a CD system and provides theoretical background on computational approaches to meaning processing at the lexical, morphological, syntactic, semantic, pragmatic, and discourse levels, essential for the support of a CD system (Sections 6.1–6.2.3). Approaches to meaning representation are the topic of Section 6.2.4. The chapter then concludes with a presentation of existing computational sources of lexical and world knowledge (Section 6.2.5).

Chapter 7 (Physical Design of a CD System and Implementation) then proposes an approach for the CD task, integrating the knowledge gained, and describes the main steps and experiments conducted in implementing the system’s components.
Finally, Chapter 8 (Conclusions) summarizes the findings and outlines the limitations of the system developed. With respect to these limitations, areas and tasks for further research are defined.

1.5 How to Read This Thesis

I would like to conclude the introductory chapter with some useful remarks on how to read this thesis, addressing the use of examples, terminology, and data. All examples in the thesis are provided with an ID that follows a particular system. Each ID consists of two numbers separated by a point: the first indicates the chapter in which the example occurs; the second indicates the order of the example within that chapter. Examples of contradictions taken from the compiled corpus are additionally provided with an ID that indicates where the example can be found in the corpus. The digital version of the corpus is provided on the USB flash drive submitted along with the present work. The digital versions of all supplementary materials attached to the present study can be found on the USB flash drive as well.

2 State of the Art

The present chapter serves the purpose of introducing the reader to the state of the art of the automatic detection of textual contradictions. First, it provides an overview and description of existing CD systems and methods (Section 2.1). To give the reader a well-ordered picture of the state of the art, Section 2.1 begins by sketching the main stages in the development of interest in automatically detecting textual contradictions before discussing the methods and systems. Only a selected number of methods and systems will be presented in detail here; the criteria for their selection were the underlying methodology, performance evaluation scores, and experts’ opinions. The section then concludes with an outline of the weak and strong aspects of the systems, indicating the research gaps, and sets the objectives for the study.
Further, Section 2.2 describes the available datasets of contradictions (the so-called corpora), which are an essential basis for the development and evaluation of the systems. Additionally, the need to collect our own data, despite the existing datasets, is explained in this section.

2.1 Methods and Systems

Interest in automatic CD within the framework of natural language processing (henceforth NLP), and specifically as a task of natural language understanding (NLU), has its origin in the mid-1990s and is associated with the FraCas project (Cooper et al. 1996). Since then, a number of systems have been proposed, ranging from simple and robust shallow approaches relying on lexical overlaps and word frequencies to precise but challenging deep approaches conducting advanced semantic interpretation. The best state-of-the-art systems currently achieve approx. 60% accuracy in identifying contradictions, which mainly arise from negation and antonyms.

The initial attempts at automatic CD were theoretical and relied on the methodological apparatus of first-order logic (FOL) (Cooper et al. 1996; Condoravdi et al. 2003). Crouch et al. (2003), in particular, emphasized the potential of sophisticated FOL approaches such as those described in Hirst (1991) and Hobbs (1985). However, no practical implementations of logic- or quasi-logic-driven systems were proposed until the middle of the 2000s. The first logical and quasi-logical systems include the system described in Tatu and Moldovan (2007), the BLUE system developed by Clark and Harrison (2009), and the hybrid NatLog system by MacCartney and Manning (2009).

The first implemented CD system that went beyond FOL was proposed in Harabagiu et al. (2006). The developers relied only on the capability of machine-learning algorithms for textual entailment⁴ recognition (Section 6.2.2.2) and considered explicit contradiction cues such as negation and antonyms.
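The shallow end of the spectrum mentioned above can be illustrated with a bare lexical-overlap score such as the Jaccard coefficient over token sets. This toy sketch is not one of the cited systems; it merely shows why such measures are cheap and robust, and also why they miss, e.g., negation entirely:

```python
def jaccard(text: str, hypothesis: str) -> float:
    """Jaccard coefficient over lower-cased token sets."""
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    return len(t & h) / len(t | h) if t | h else 0.0

# High overlap despite an inserted negation: the score alone cannot
# distinguish entailment from contradiction here.
print(jaccard("Socrates is a man", "Socrates is not a man"))  # 0.8
```

The high score for a textbook contradiction makes plain why purely overlap-based approaches handle entailment better than CD, a point taken up again below.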
A number of systems for CD in English were developed during the Recognizing Textual Entailment (RTE) challenges in the years 2007–2009 (the RTE-3 Extended Task, RTE-4, and RTE-5 challenges).⁵ The main requirement for the systems was the classification of the provided sentence pairs into the three categories of entailment, contradiction, and unknown, the so-called three-way task (Giampiccolo et al. 2007; Voorhees 2008). The RTE systems are presented in Table 1 (RTE-3 Extended Task), Table 2 (RTE-4), and Table 3 (RTE-5). One should note that the systems submitted to the later RTE challenges by the same authors are, in most cases, improvements on their earlier RTE submissions.

In addition to the RTE systems, a number of standalone systems for different languages have been developed to date. These include, among others, the systems described in Harabagiu et al. (2006), de Marneffe et al. (2008), Ritter et al. (2008), Kim/Zhai (2009), Ennals et al. (2010), Tsytsarau et al. (2010, 2011), Tsytsarau/Palpanas (2011), Pham et al. (2013), Dînşoreanu/Potolea (2013), and Lendvai/Reichel (2016) for English; Wartena et al. (2006) for Dutch; Kawahara et al. (2010), Hashimoto et al. (2012), Andrade et al. (2013), Kloetzer et al. (2013), and Takabatake et al. (2015) for Japanese; and Shih et al. (2012) for Chinese. The standalone CD systems for English are summarized in Table 4.

Both the RTE and the standalone CD systems have been developed for different application purposes, including, e.g., the improvement of textual entailment recognition (the RTE systems), the improvement of text summarization and question-answering systems (e.g., Harabagiu et al. 2006), and the detection and summarization of conflicting opinions in social media and other Web 2.0 platforms (e.g., Kim/Zhai 2009; Ennals et al. 2010; Tsytsarau et al. 2010, 2011; Tsytsarau/Palpanas 2011; Dînşoreanu/Potolea 2013; Lendvai/Reichel 2016).
The systems follow different, often combined, rationales and methodologies, apply a variety of NLP tools, and, with the exception of the RTE systems, are evaluated on different datasets, which makes their comparison and generalization challenging. The execution of the same steps for different purposes makes generalizing over the systems difficult as well. Nevertheless, an attempt at comparing the systems is presented in Tables 1–4.

4 The term textual entailment is related to logical entailment but is used in computational linguistics in a looser, more relaxed sense. The organizers of the RTE challenges provide the following definition of textual entailment: “We say that T entails H if, typically, a human reading T would infer that H is most probably true” (Dagan/Glickman 2004: 4). The parts of the logical entailment relation, premise and conclusion, are referred to in the RTE framework as text (T) and hypothesis (H), respectively.

5 The RTE-1, RTE-2, RTE-3 (Main, but not the Extended Task), RTE-6, and RTE-7 challenges focused on the recognition of entailments only.

The comparison of the systems reveals that CD by means of supervised classification is the preferred method, despite the need for a large amount of data for classifier training and model testing. Based on a set of pre-defined features and examples manually classified (annotated) as contradictions and non-contradictions, a classification algorithm searches for patterns in the pre-classified data (training data) and builds a model which, after a test stage, can then be applied to predict contradictions in a new corpus. For the classification task, a variety of algorithms have been applied, including, among others, maximum entropy in, e.g., de Marneffe et al. (2008), SVMs (Vapnik 1995) in, e.g., Malakasiotis and Androutsopoulos (2007), decision trees in, e.g., Hickl et al. (2007), nearest (shrunken) centroids (Tibshirani et al.
2003), and random forests (Breiman 2001) in Lendvai/Reichel (2016). The maximum-entropy algorithm has proved to be the most efficient so far. For the application of the classifiers, the WEKA machine-learning tool⁶, described, e.g., in Smith/Frank (2016), was preferred.

Concerning the pre-defined features, some classification-based systems relied on the degree (or score) of similarity between the text and hypothesis sentences (for the definition of text and hypothesis, see Footnote 4) in terms of tokens, lemmas, parts of speech, and sentence length (e.g., Malakasiotis/Androutsopoulos 2007; Lendvai/Reichel 2016), computed by multiple similarity measures, without considering any other information. For this task, a number of similarity measures have been applied, including, among others, the Levenshtein distance, the Jaro-Winkler distance, the Manhattan distance, the Euclidean distance, the cosine similarity, the n-gram distance, the matching coefficient, the Dice coefficient, and the Jaccard coefficient. In general, the results show that although classification based on similarity scores works well for recognizing entailments and neutral cases, CD represents a more complex task (Lendvai/Reichel 2016).

Another group of classification-based systems relied in turn on features that are characteristic of contradiction, including negations, antonyms, numerical mismatches, and mismatches in grammatical functions and thematic roles (Harabagiu et al. 2006; de Marneffe et al. 2008). In contrast to the simple computation of similarity, the detection of a contradictory relation requires additional steps, such as a unified comparable representation of text and hypothesis meaning and their alignment.

6 https://www.cs.waikato.ac.nz/ml/weka/index.html

[Table 1: CD systems submitted for the RTE-3 challenge – Extended Task (2007). Accuracy (%): Malakasiotis/Androutsopoulos 49.4; Clark et al. 45.1; Tatu/Moldovan 71.3; Hickl et al. 73.1; Bobrow et al. 43.6; MacCartney/Manning 59.1; Iftene/Balahur-Dobrescu 56.9; Wang/Neumann 45.5. The table further compares the systems on features such as preprocessing, parsing, SRL, anaphora resolution, lexical resources, paraphrasing, world knowledge, meaning representation, alignment, machine learning, string similarity, topic identification, contradiction clues, sentiment analysis, and datasets.]

[Table 2: CD systems submitted for the RTE-4 challenge (2008). Accuracy (%): Galanis/Malakasiotis 67.6; Clark/Harrison 54.7; Glinos 41.6; Wang/Neumann 61.4; Agichtein et al. 54.7; Montalvo-Huhn/Taylor 46.6; Varma et al. 30.9; Krestel et al. 43.2; Siblini/Kosseim 61.6; Li et al. 58.8; Castillo/Alonso i Alemany 54.6; Padó et al. 55.3; Iftene 68.5; Mohammad et al. 55.6. The systems are compared on the same features as in Table 1.]

[Table 3: CD systems submitted for the RTE-5 challenge (2009). Accuracy (%): Malakasiotis 57.5; Clark/Harrison 54.7; Han Ren et al. 52.2; Wang et al. 63.7; Ferrández et al. 60; Breck 57; Castillo 52.2; Varma et al. 46.9; Krestel et al. 48.7; Iftene/Moruz 68.3. The systems are compared on the same features as in Table 1.]

[Table 4: Standalone CD systems, with reported accuracy/precision/recall where available: Harabagiu et al. (2006): accuracy 64%; de Marneffe et al. (2008): precision 22.95%, recall 19.44%; Ritter et al. (2008): precision 62%, recall 19%; Kim/Zhai (2009): n.a.; Ennals et al. (2010): n.a.; Tsytsarau/Palpanas (2011): n.a.; Tsytsarau et al. (2010, 2011): n.a.; Pham et al. (2013): precision 14%, recall 19.44%; Lendvai/Reichel (2016): iPosts precision 40%, recall 34%, threads precision 42%, recall 35%; Wartena et al. (2006) for Dutch: n.a. The systems are compared on the same features as in Table 1.]

The preferred means of meaning representation were dependency trees converted to typed dependency graphs, e.g., in de Marneffe et al.
(2008); functional dependency triples, either alone (Wang/Neumann 2008) or combined with a frame representation based on semantic role frames (Pham et al. 2013); the functional dependency tuple (Ritter et al. 2008); and the bag-of-words representation (Tsytsarau et al. 2010, 2011; Tsytsarau/Palpanas 2011), to name only a few. For the representation of sentences as a functional dependency of a verb predicate and two arguments, the REVERB tool (Fader et al. 2011) was applied in Pham et al. (2013), and the TextRunner Open Information Extraction system (Banko et al. 2007; Banko/Etzioni 2008) in Ritter et al. (2008). For alignment, besides a greedy algorithm, a maximum-entropy-based classifier (Hickl et al. 2006) was preferred, e.g., in Harabagiu et al. (2006).

In addition to the classification- and rule-based systems, a third group of systems adopts a slightly loosened logical form for meaning representation and incorporates logical inference rules (Tatu/Moldovan 2007; Clark/Harrison 2009; MacCartney/Manning 2007), detects contradictions based on opposite sentiments and statistical computation (Tsytsarau et al. 2010, 2011; Tsytsarau/Palpanas 2011; Dînşoreanu/Potolea 2013), or uses patterns over ontology terms (Wartena et al. 2006).

Common to all systems is the use of lexical resources (Section 6.2.5.1) such as WordNet (Fellbaum 1998), VerbNet (Kipper et al. 2000), and DIRT (Lin/Pantel 2001) for identifying meaning relations (i.a., oppositions and synonyms) for the purposes of sentence alignment, improving the building of a classification model, and detecting contradictions. For knowledge-based contradictions, Wikipedia was the preferred resource.

A number of studies emphasize the importance of finding related text and hypothesis sentences which describe the same event in order to achieve better performance on the CD task (de Marneffe et al. 2008; Kim/Zhai 2009; Pham et al. 2013; Lendvai/Reichel 2016).
The authors proceed on the assumption that two events cannot be contradictory when they are not related. Related sentences were found in the proposed systems by means of, e.g., a Jaccard similarity function in combination with WordNet by Kim/Zhai (2009), as well as a latent Dirichlet allocation (LDA) topic modelling algorithm (Blei et al. 2003) at sentence level (Denecke/Brosowski 2010) applied in Tsytsarau et al. (2011). The general natural language processing tasks integrated into the systems include data normalization (i.a., of temporal expressions, abbreviations, etc.), parsing for the purpose of identifying grammatical functions and constructing meaning representations, part-of-speech tagging, anaphora resolution within a sentence or between two neighboring sentences, semantic role labeling for identifying thematic roles, polarity computation, and others. For parsing, the Charniak parser (Charniak 2000), the chart parser SAPIR (Harrison/Maxwell 1986), the Collins parser (Collins 2003), the Stanford dependency parser (Klein/Manning 2003; de Marneffe et al. 2006), and MiniPar (Lin 1994) have been applied. The LingPipe tool (e.g., described in Baldwin/Dayanidhi 2014) was a preferred toolkit for named entity recognition (NER), and TnT (Brants 2000) for part-of-speech tagging. Anaphora resolution, in turn, has been performed, e.g., by means of a tool which combines the Hobbs algorithm (Hobbs 1978) and the resolution of anaphora procedure (Lappin/Leass 1994). Semantic role labeling was conducted by means of, e.g., the SENNA package (Collobert et al. 2011). For the normalization of time expressions, e.g., the TARSQI toolkit (Verhagen et al. 2008) has been applied. Only a few systems (Harabagiu et al. 2006; de Marneffe et al. 2008) make use of information on modality and quantification, which is essential for the task of CD. Among the most prominent, most cited, and most interesting CD approaches for English are those developed and described in Harabagiu et al.
(2006) and de Marneffe et al. (2008), as well as their improvements and extensions proposed in Padó et al. (2008) and Ritter et al. (2008), and the sentiment-based CD presented in Tsytsarau et al. (2010, 2011) and Tsytsarau/Palpanas (2011). As mentioned earlier, Harabagiu et al. (2006) were the first to provide empirical results for the task of CD. The authors point out that the task can increase the quality of other NLP tasks such as question answering and multi-document summarization. In the case of discovering contradictory information from multiple sources, a system has to decide which information is preferred for the output. For this, the inconsistent information can be checked either by the additional intervention of a user or by consulting additional knowledge resources. The system proposed in Harabagiu et al. (2006) detects contradictions by following two views. According to the first view, contradictions can be recognized by removing the negations of propositions (argument-predicate structures) and then testing the propositions for textual entailment. Harabagiu et al. (2006) used their own textual entailment system for this task. According to the second view, contradictions can be detected by training a classifier on positive examples of contradictions, relying on linguistic information such as negations (n't, not; the verbs to deny, to fail; the prepositions without, except, etc.), antonyms, as well as explicit cues of contrast relations (e.g., but, although, however). For the classification task, the maximum entropy machine learning algorithm was applied. To train and evaluate the classifier for detecting the contradictions arising from negations and antonyms, a modified RTE-2 dataset (for more information, see Section 2.2.2) has been used.
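The second view's reliance on surface cues can be illustrated with a minimal sketch. The cue lists below are only a small illustrative subset of the features named above, and the function is our own simplification, not the authors' implementation:

```python
# Illustrative subsets of the cue inventories described in the text.
NEGATION_CUES = {"not", "no", "never", "without", "except",
                 "deny", "denies", "denied", "fail", "fails", "failed"}
CONTRAST_CUES = {"but", "although", "however", "though", "yet"}

def cue_features(sentence):
    """Extract binary surface-cue features of the kind fed to a
    maximum entropy classifier for contradiction detection."""
    tokens = [t.strip(".,;!?").lower() for t in sentence.split()]
    has_negation = any(t in NEGATION_CUES or t.endswith("n't") for t in tokens)
    has_contrast = any(t in CONTRAST_CUES for t in tokens)
    return {"has_negation": has_negation, "has_contrast": has_contrast}
```

In a real system such features would be combined with antonym lookups and many lexical and syntactic signals before training the classifier.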
For training and evaluating the classifier for recognizing contrast relations, datasets of a total of 10,000 sentence pairs (9,000 training pairs and 1,000 evaluation pairs) were collected from online news articles. The results of training and subsequently testing the system showed that the system performs better in CD when following the second view. The proposed approach achieved a 62% overall accuracy in identifying contradictions arising from negation and antonyms.

A similar but more extended system was proposed in de Marneffe et al. (2008). Analogous to Harabagiu et al. (2006), the system makes use of predicate-argument meaning representation, recognition of textual entailment, and supervised machine learning techniques but, in contrast to the system of Harabagiu et al. (2006), does not rely only on information about negation and antonyms. Moreover, the authors compiled the first corpus of naturally occurring contradictions, representing a more realistic data basis for system development (Section 2.2.3). Based on their corpus, de Marneffe et al. (2008) constructed a typology of contradiction cues, including negation, antonymy, numerical mismatches, structural, factivity, and modality information, as well as world knowledge (see Section 3.4.3.2 for more information on these types). The authors point out that the contradictions arising from the first three features are relatively easy to model and detect, as no deep comprehension is required. Detecting the contradictions marked by the latter aspects, in turn, requires more precise meaning modeling. The system proposed in de Marneffe et al. (2008) is based on the Stanford RTE system (MacCartney et al. 2006) and was extended by an additional step of event coreference recognition. The authors claim that sentences about different events cannot be contradictory.
However, as a result of missing context, sentences such as (2.1) were assumed to be contradictory without further analysis of whether woman refers to the same person.

(2.1) Passions surrounding Germany's final match turned violent when a woman stabbed her partner because she didn't want to watch the game. A woman passionately wanted to watch the game.

In general, the CD process of the Stanford system consists of four steps. First, the input text and hypothesis sentences are syntactically and semantically analyzed by means of the Stanford dependency parser (Klein/Manning 2003; de Marneffe et al. 2006) and then converted to typed dependency graphs. In the second step, the graphs are aligned with each other, if possible, based on similarity and syntactic information combined by means of the margin infused relaxed algorithm (Crammer/Singer 2001). Padó et al. (2008) offered an improvement on this step by applying the edit distance-based alignment system MANLI (MacCartney et al. 2008) and a stochastic aligner. In the third step, sentences that are not related and do not describe the same event are filtered out by the system. Two different approaches have been proposed for this task. The authors claim that, on the one hand, the root of the hypothesis graph aligned with the text graph can indicate co-referent events. This is, however, only efficient when the hypothesis sentences are shorter than the text sentences. On the other hand, the authors propose modeling sentence topicality as a technique for co-referent event detection. The two approaches were tested on the RTE-3 development dataset. The results are presented in Table 5, indicating, first, that the two approaches in general have to be improved and, second, the need for other techniques for filtering out non-co-referent events.
Finally, in the fourth step, the contradictory features are extracted, and logistic regression is applied to classify the hypothesis and text sentences as contradictory or not.

Approach                                  Precision   Recall
No filter                                 55.10       32.93
Root alignment                            61.36       32.93
Root alignment + topicality modelling     61.90       31.71

Table 5: Comparison of approaches to graph alignment applied in de Marneffe et al. (2008).

To test the system, the modified RTE-1_test and RTE-2_test (contradictions arising from negations) datasets and the original RTE-3_test dataset were used. The authors report a 42.22% precision and a 26.21% recall for detecting contradictions in the RTE-1_test dataset, a 22.95% precision and a 19.44% recall for the RTE-3_test dataset, and a 62.97% precision, a 62.50% recall, and a 62.74% accuracy for the modified RTE-2_test dataset of negation. Further, the comparison of the results for each contradiction type separately shows that the system is efficient in detecting contradictions arising from negation, antonyms, and numeric mismatches and needs improvement in detecting lexical and world knowledge contradictions.

Ritter et al. (2008) proposed an extension of the Stanford system, addressing in their study the problem of world knowledge contradictions, such as in (2.2):

(2.2) a. Mozart was born in Salzburg.
b. Mozart was born in Vienna.

Here, a contradiction arises as the result of the incompatibility between Salzburg and Vienna with respect to the co-referent subject Mozart, driven by the relation expression was_born_in. This kind of relation, which can be formally represented as R(x,y), the authors call functional. The functional relation in (2.2a) can thus be represented as was_born_in(Mozart, Salzburg). The relation R between x (subject) and y (object) is a functional relation if and only if x is not ambiguous and does not refer to different entities in the real world, and the function R maps x to the unique value y.
For the detection of contradictions marked by functional relations, Ritter et al. (2008) proposed a three-stage, domain-independent system which they called AuContraire. In the first stage, the system analyzes sentences and represents them as one or more tuples of the form R(x,y). For this task, the TextRunner component of the Open Information Extraction system (Banko et al. 2007; Banko/Etzioni 2008) was applied. In the second stage, the system identifies pairs of sentences which, with high probability, express functional relations and groups them into sets R(x, •) with the same subject. For this, the authors propose the application of a modified expectation-maximization algorithm (Dempster et al. 1977). Finally, in the third stage, the system filters out cases such as (2.3) by reasoning about synonymy, meronymy, and the type of x and y (person, date, location, etc.) and identifying non-co-referent arguments. For identifying meronyms, the developers used diverse lexical resources such as the Tipster Gazetteer and WordNet (Fellbaum 1998). Synonyms, in turn, were recognized by computing edit distance and string similarity (Cohen et al. 2003), by applying the RESOLVER system for synonym identification (Yates/Etzioni 2007), and by WordNet. To identify the type of x and y, NER was performed in combination with lists of personal and geographical names.

(2.3) Alan Turing was born in London. Alan Turing was born in England.

To evaluate the system, Ritter et al. (2008) first used TextRunner to automatically collect 1,000 relations from 117 million web pages. They labeled each relation as functional or non-functional. They achieved a 62% precision and a 12% recall, and, on the balanced data (contradictions and non-contradictions in a proportion of 1:1), a 51% precision and a 92% recall.
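The core of the grouping stage can be sketched as follows. This toy version, our own illustration rather than AuContraire itself, assumes already extracted R(x, y) tuples, treats every relation as functional, and omits the EM estimation and the synonym/meronym filtering that would rule out pairs such as (2.3):

```python
from collections import defaultdict

def find_functional_conflicts(tuples):
    """Group R(x, y) tuples by (R, x); for a functional relation R,
    two distinct y values for the same x signal a candidate contradiction."""
    groups = defaultdict(set)
    for relation, x, y in tuples:
        groups[(relation, x)].add(y)
    return {key: ys for key, ys in groups.items() if len(ys) > 1}

tuples = [
    ("was_born_in", "Mozart", "Salzburg"),
    ("was_born_in", "Mozart", "Vienna"),
    ("was_born_in", "Haydn", "Rohrau"),
]
conflicts = find_functional_conflicts(tuples)
```

Here the Mozart tuples from (2.2) are flagged as a candidate contradiction, while the single Haydn tuple is not; a London/England pair as in (2.3) would wrongly be flagged too, which is exactly what the third stage exists to prevent.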
2.2 Corpora of Contradictions

2.2.1 FraCas Inference Data Suite

The FraCas (A Framework for Computational Semantics) inference test suite is considered to be the first corpus for English which includes contradictions together with examples of entailments. The dataset was developed within the scope of a joint project of the Universität des Saarlandes (Germany), the Universität Stuttgart (Germany), and the University of Edinburgh (United Kingdom) in the middle of the 1990s (Cooper et al. 1996). The purpose of the project was to provide data for the development, evaluation, and improvement of NLP applications focusing on inference processing. Cooper et al. (1996) define the central capability of such applications to be the ability of inference processing. The FraCas corpus consists of 346 units, so-called problems, each including 1-5 statements (premises), one yes/no question, and a yes/no answer, where yes indicates an entailment, no a contradiction, and don't know stands for neutral cases. Some yes and no answers additionally include comments and explanations, as in the example of (2.4). The number of premises in the problems amounts to 536 in total. The distribution of the premises and answers in the corpus is presented in Table 6 and Table 7, respectively.

(2.4) Premise: Dumbo is a large animal.
Question: Is Dumbo a small animal?
Answer: [No] Large(N) => ¬Small(N)

Number of Premises   Number of Problems   Number of Problems (%)
1                    192                  55.5
2                    122                  35.3
3                    29                   8.4
4                    2                    0.6
5                    1                    0.3

Table 6: The distribution of premises in the FraCas corpus.

Answer          Number of Answers   Number of Answers (%)
Yes             180                 52
Don't know      94                  27
No              31                  9
Other/complex   41                  12

Table 7: The distribution of answers in the FraCas corpus.

In general, the FraCas problems are divided into nine groups, according to the categories involved in semantic inference construction: quantifiers, plurals, anaphora, ellipsis, adjectives, comparatives, temporal reference, verbs, and attitudes.
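A FraCas problem unit such as (2.4) can be mirrored by a simple container; the class and field names below are hypothetical illustrations of the unit structure, not part of the corpus distribution:

```python
from dataclasses import dataclass, field

@dataclass
class FracasProblem:
    """Hypothetical container mirroring one FraCas problem unit."""
    premises: list        # 1-5 premise statements
    question: str         # one yes/no question
    answer: str           # "yes" (entailment), "no" (contradiction), "don't know" (neutral)
    note: str = ""        # optional comment or explanation

# The opposites example (2.4) from the adjectives category:
dumbo = FracasProblem(
    premises=["Dumbo is a large animal."],
    question="Is Dumbo a small animal?",
    answer="no",
    note="Large(N) => ¬Small(N)",
)
```

Representing the answer as one of three labels makes the correspondence to the later three-way RTE annotation (entailment, contradiction, unknown) explicit.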
The problems in each group, in turn, are further divided into subgroups representing single aspects of each category. The problem unit in (2.4) is an example of the category adjectives, subcategory opposites. The distribution of the problems across the groups is presented in Table 8.

Group of Problems   Number of Problems   Number of Problems (%)
Quantifiers         80                   23
Plurals             33                   10
Anaphora            28                   8
Ellipsis            55                   16
Adjectives          23                   7
Comparatives        31                   9
Temporal            75                   22
Verbs               8                    2
Attitudes           13                   4

Table 8: The distribution of the problems per group in the FraCas corpus.

In 2009, MacCartney improved the FraCas corpus for the purpose of his study and annotated it with XML. Besides making some corrections and adding relevant notes, MacCartney (2009) rephrased the questions into declarative sentences, facilitating their automatic processing. The original version of the FraCas corpus as a PS file and its improved XML version are freely available for download at the webpage of Stanford University.7

2.2.2 RTE Datasets and Their Modifications

A number of datasets including contradictions have been developed within the RTE challenge during the period of 2006 to 2011. The RTE datasets were created with the aim of providing a comparable basis for the evaluation of the systems participating in the RTE challenges. All datasets are divided into development and test datasets and include mainly manually constructed pairs of sentences representing entailments and non-entailments (contradictions and neutral cases). The statistics on the RTE datasets, partially adapted from Bentivogli et al. (2009), are presented in Table 9. All RTE datasets are freely available on the web, directly or upon request.8 Since RTE-6 (Bentivogli et al. 2010) and RTE-7 (Bentivogli et al. 2011) include no annotations of contradictions, and no extensions of these datasets as regards contradictions exist, they will not be taken into further consideration. The RTE-1 (Dagan et al.
2006), RTE-2 (Bar-Haim et al. 2006), and RTE-3 Main Task (Giampiccolo et al. 2007) challenges were interested only in the task of automatically classifying the data into entailments and non-entailments. For this reason, the corresponding datasets are annotated exclusively with the categories entailment (label yes) and non-entailment (label no), without further specification of non-entailments into contradictions and neutral cases. In terms of the RTE challenge, this is called a two-way task. The three-way task annotation of the RTE-1 and RTE-2 datasets, distinguishing entailments, contradictions, and neutral cases, was later performed by Harabagiu et al. (2006) and de Marneffe et al. (2008).

7 https://nlp.stanford.edu/~wcmac/downloads/
8 https://tac.nist.gov//

Challenge          Dataset   Size (No. of pairs)   Hypothesis length (No. of words)   Text length (No. of words)   Contradictions (%)
RTE-1              Dev       567     10.08   24.78   -
RTE-1              Test      800     10.8    26.04   -
RTE-2              Dev       800     9.65    27.15   -
RTE-2              Test      800     8.39    28.37   -
RTE-3 (Extended)   Dev       800     8.46    34.98   10
RTE-3 (Extended)   Test      800     7.87    30.06   9
RTE-4              Test      1,000   7.7     40.15   15
RTE-5              Dev       600     7.79    99.49   15
RTE-5              Test      600     7.92    99.41   15

Table 9: The statistics on the RTE datasets, partially adapted from Bentivogli et al. (2009).

Harabagiu et al. (2006) modified the RTE-2 dataset for the purpose of training and testing their system for the detection of contradictions marked by explicit negations (e.g., not), antonymy, and contrast discourse relation cues (e.g., but, although). To our current knowledge, the modified corpus is not available, neither for free nor for purchase. In modifying the RTE-2 dataset, Harabagiu et al. (2006) followed three different approaches. First, 800 instances of positive entailments from the RTE-2 dataset were manually negated by human annotators, as shown in (2.5). As a result, a balanced corpus of 800 contradictions (Dataset 1) was created.
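The manual negation step can be caricatured with a toy rule that inserts not after the first auxiliary verb. The annotators worked by hand and handled far more constructions than this, so the function below is only an illustrative sketch with a hypothetical, incomplete auxiliary list:

```python
# Small illustrative subset of English auxiliaries.
AUXILIARIES = {"is", "are", "was", "were", "has", "have", "had",
               "can", "will", "would"}

def negate(sentence):
    """Insert 'not' after the first auxiliary verb, if any; return None
    when no simple negation site is found."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in AUXILIARIES:
            return " ".join(tokens[: i + 1] + ["not"] + tokens[i + 1 :])
    return None
```

Applied to a positive entailment hypothesis such as "A hunger strike was attempted.", the rule yields "A hunger strike was not attempted.", i.e., a contradiction of the original text of the kind shown in (2.5).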
To avoid overtraining the model, the annotators were also asked to negate 800 examples of negative entailments (= non-entailments) from the RTE-2 dataset. The produced instances (Dataset 2) were then checked in order to remove contradictions.

(2.5) a. Former dissident John Bok, who has been on a hunger strike since Monday, says he wants to increase pressure on Stanislav Gross to resign as prime minister.
b. A hunger strike was not attempted.

Second, the human annotators were asked to paraphrase the negated sentences created in each pair of Dataset 1, as in the example of (2.6). As the paraphrasing was not possible in all cases, a corpus of 638 out of 800 instances could be created.

(2.6) a. Former dissident John Bok, who has been on a hunger strike since Monday, says he wants to increase pressure on Stanislav Gross to resign as prime minister.
b. A hunger strike was called off.

Finally, the third dataset was created by combining 800 examples of non-contradictions with 400 randomly chosen contradictions from the first and second datasets.

Two years later, de Marneffe et al. (2008) proposed modifications and extensions of the RTE-1, RTE-2, and RTE-3 datasets.9 First, following the methodology of Harabagiu et al. (2006), they modified the RTE-2 dataset by randomly choosing 102 pairs of sentences (51 entailments and 51 non-entailments) from the RTE-2 test dataset and changing them by adding explicit negation. Afterward, they labeled the sentence pairs with yes for contradiction and no for non-contradiction. The datasets can be downloaded from the website of the Stanford NLP Group.10 Second, de Marneffe et al.
(2008) extended the annotation of the sentence pairs of the RTE-1, RTE-2, and RTE-3 (Main Task) datasets from two-way task labels (yes for an entailment relation between the sentences in the pair and no for non-entailment) to three-way task labels (yes for an entailment relation between the sentences in the pair, no for contradiction, and unknown for a non-entailment relation excluding contradiction). For this, each instance of non-entailment in the RTE-1, RTE-2, and RTE-3 datasets was checked as to whether it is a contradiction or not. The decision was made by following the guidelines prepared by the Stanford project team.11 The pairs were labeled manually, either by one or two annotators. Moreover, the contradictions in the RTE-1, RTE-2, and RTE-3 datasets were assigned a contradiction type (e.g., negation, antonymy, world knowledge, etc.). More details on the characteristics of each contradiction type are provided in Section 3.4.3.2 of the present work. The distribution of contradictions in the RTE-1, RTE-2, and RTE-3 test and development datasets is presented in Table 10. According to the statistics, contradictions constitute in total only 10% of the instances in all three RTE datasets. The distribution of contradictions according to their types, on the example of the RTE-3 development dataset, is presented in Table 11.

Challenge   Dataset           Original file name   Number of contradictions   Total number of instances
RTE-1       development (1)   RTE1_dev1   48    287
RTE-1       development (2)   RTE1_dev2   55    280
RTE-1       test              RTE1_test   149   800
RTE-2       development       RTE2_dev    11    800
RTE-3       development       RTE3_dev    80    800
RTE-3       test              RTE3_test   72    800

Table 10: Number of contradictions in the RTE-1, RTE-2, and RTE-3 datasets.

9 De Marneffe et al. (2008) explain the need to again modify the datasets by the fact that the corpora could not be made available by Harabagiu et al.
10 https://nlp.stanford.edu/projects/contradiction/
11 https://nlp.stanford.edu/projects/contradiction/contradiction_guidelines.pdf

Type of contradiction   Distribution (%)
Antonym           15.0
Negation          8.8
Numeric           8.8
Factive/Modal     5.0
Structure         16.3
Lexical           18.8
World Knowledge   27.5

Table 11: Distribution of contradictions occurring in the RTE-3 development dataset according to the contradiction type.

Since 2008, the three-way task labeled RTE-4 (Giampiccolo et al. 2008) and RTE-5 (Bentivogli et al. 2009) datasets, specifying non-entailments as contradiction or unknown, have been created. Sentence pairs in the datasets are labeled with yes for positive entailment, no for contradiction, and unknown for neutral cases. The methodology of dataset compilation and annotation is the same as for RTE-2 and is described in more detail in Dagan et al. (2009). The distribution of contradictions in the RTE-4 and RTE-5 datasets (test and development) is presented in Table 9. The main particularity of the RTE-5 dataset compared to the other RTE datasets is the larger size of the texts, providing a more realistic data basis for the development and evaluation of CD and RTE systems.

2.2.3 Stanford Corpus of Real-Life Contradictions

Besides modifying and extending the RTE datasets, de Marneffe et al. (2008) additionally compiled a corpus of natural, or "real-life", contradictions. The authors argue that the manually created contradictions from the RTE 1-3 datasets do not necessarily cover the diversity of contradictions naturally occurring in language and, therefore, provide an insufficient data basis for the development of efficient and effective systems for CD. Additionally, they claim that real contradictions can be more challenging for automatic recognition than manually created ones. To compile a corpus of naturally occurring contradictions, de Marneffe et al. (2008) collected 131 pairs of contradictory sentences from the web.
The instances included 19 contradictions from news articles (predominantly from Google News), 51 from Wikipedia, 10 from the LexisNexis database, and 51 from the LDC project data. The sentence pairs were then manually annotated by two annotators with contradiction types. Divergences in the annotators' judgments were clarified by discussion, with agreement achieved where possible. Unfortunately, no information on inter-annotator agreement on the contradiction types has been provided by the researchers. The distribution of contradictions according to their type is presented in Table 12.

Type of contradiction   Distribution (%)
Antonym           9.0
Negation          17.6
Numeric           29.0
Factive/Modal     6.9
Structure         3.1
Lexical           21.4
World Knowledge   13.0

Table 12: Distribution of contradictions occurring in the Stanford Corpus of Real-Life Contradictions according to the contradiction type.

2.2.4 SNLI Corpus

Another corpus developed by the Stanford group, not only for the study of contradiction and textual entailment but also for the development of other NLP applications, is the SNLI 1.0 (Stanford Natural Language Inference) balanced corpus. Currently, the SNLI is considered the largest state-of-the-art corpus for the task of RTE (also natural language inference). The corpus is divided into development, test, and training datasets and consists of a total of 570,152 sentence pairs, including examples of entailment, contradiction, and neutral cases. Their distribution in each dataset is presented in Table 13. The total number of instances in the corpus amounts to 37,026.

Dataset       Size (No. of pairs)   No. of contradictions   No. of entailments   No. of neutral cases   No. of unlabelled cases
Development   10,000    3,278     3,329     3,235     158
Test          10,000    3,237     3,368     3,219     176
Training      550,152   183,187   183,416   182,764   785

Table 13: Distribution of contradictions, entailments, neutral, and unlabeled cases in the SNLI corpus.
The sentence pairs for the corpus were created manually in "a grounded naturalistic context" (Bowman et al. 2015: 1) by about 2,500 participants of the crowdsourcing Internet marketplace Amazon Mechanical Turk. For this purpose, the Stanford team developed the following methodology. Each MTurk worker was presented with a photo caption that served as a premise and was given the task to write three kinds of hypotheses for this caption: an entailment (definitely a true description of the photo caption), a contradiction (definitely a false description of the photo), and a neutral sentence (possibly a true description of the photo caption). The photo captions were provided by the Flickr corpus, which consists of 160,000 unattributed captions for 30,000 scenes (Young et al. 2014). Thus, for example, for the caption Two dogs are running through a field, the entailment could be as shown in (2.7a), the neutral sentence as in (2.7b), and the contradiction as in (2.7c). The examples are taken from Bowman et al. (2015: 3).

(2.7) a. There are animals outdoors.
b. Some puppies are running to catch a stick.
c. The pets are sitting on a couch. (Under the assumption that both refer to the same point in time)

In total, 570,152 sentence pairs have been collected. These are presented as original sentences, as syntactically parsed, and as S-ROOT parsed. The premise sentences are predominantly longer than the hypothesis sentences: the mean length of the premise sentences is 14.1 tokens, and the mean length of the hypotheses is 8.3 tokens. Moreover, premise and hypothesis are, in most cases, syntactically different from each other. Further, the data in the corpus is not cleaned and includes a few mistakes. The SNLI is released under a Creative Commons Attribution-ShareAlike 4.0 International License and can be downloaded freely.12 It is available in the JSON format and as text files with tab-separated values.
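The JSON release can be read line by line (one record per line). To the best of our knowledge, the field names sentence1, sentence2, and gold_label below correspond to the actual SNLI distribution, with "-" marking pairs whose annotators did not agree on a gold label; the reader function itself is only an illustrative sketch:

```python
import json

def read_snli(lines):
    """Parse SNLI JSON-lines records, skipping pairs without a gold label
    (gold_label == "-")."""
    for line in lines:
        record = json.loads(line)
        if record["gold_label"] == "-":
            continue
        yield record["sentence1"], record["sentence2"], record["gold_label"]

# A single record in the style of the corpus, using example (2.7c):
sample = ['{"gold_label": "contradiction", '
          '"sentence1": "Two dogs are running through a field.", '
          '"sentence2": "The pets are sitting on a couch."}']
pairs = list(read_snli(sample))
```

Skipping the unlabelled pairs reproduces the usual experimental setup, in which only the three-way labelled instances from Table 13 are used.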
2.3 Summary

To sum up, the present methods and systems for the CD task show good but still insufficient performance: the mean accuracy score achieved by the current systems amounts to 60%. The relatively low performance of the systems can be explained by the complexity of natural language contradictions, as well as by the diversity of ways and mechanisms of their realization, which make the task of automatic CD challenging. The specific reasons for the low performance of the systems can be the following. First, most of the methods were initially developed and tested on the basis of artificially synthesized pairs of contradictory sentences and are, therefore, probably not able to cover the whole diversity of naturally occurring contradictions. Second, the systems developed focus mainly on the detection of explicitly expressed contradictions, relying on linguistic features such as negation and antonyms. Only a few methods address the detection of implicit contradictions, which requires more sophisticated processing than the detection of explicitly expressed contradictions. Third, the pairs of contradictory sentences were analyzed out of the context in which they occur, in this way losing information helpful for CD, such as, e.g., the coreference between entities and events. Thus, there still remains a need for an efficient method for automatic CD, indicating, foremost, gaps in the efficient methods for finding related sentences that may potentially form a contradictory or contrary relation.

12 nlp.stanford.edu/projects/snli/

Though different approaches have been applied to the collection of contradictions, including manual construction and free collection from the web, the manual construction of contradictions has been preferred so far. In our opinion, however, the manually constructed examples cannot claim to cover the diversity of naturally occurring contradictions.
Additionally, due to the limitations of manual data creation, contradiction pairs are presented in isolation from their text and context, thereby losing valuable information such as, e.g., the co-references (without knowledge about the referents in the real world) that could contribute to a better performance of the systems. Finally, with the exception of corpora that include some single examples, there is no special corpus for contradictions in news texts. Therefore, against this background, there arises the need to collect our own data, namely contradictions that occur in news texts, for the purpose of the study. Our methodology for the collection of contradictions naturally occurring in news texts, along with the texts they appear in, will be provided in Chapter 5.

Contradiction in Logic and Language