Query: discourse or RST or centering language
1-20 of 4050 results
Sentence Level Discourse Parsing using Syntactic and Lexical Information
Abstract: We introduce two probabilistic models that can be used to identify elementary discourse units and build sentence-level discourse parse trees. The models use syntactic and lexical features. A discourse parsing algorithm that implements these models derives discourse parse trees with an error reduction of 18.8% over a state-of-the-art decision-based discourse parser. A set of empirical evaluations shows that our discourse parsing model is sophisticated enough to yield discourse trees at near-human levels of accuracy.
Radu Soricut and Daniel Marcu
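The two-model design described in this abstract lends itself to a compact illustration. The sketch below is not Soricut and Marcu's implementation: the feature tuples, the toy probability table, and the greedy search are illustrative assumptions standing in for their probabilistic segmentation and parsing models.

```python
# Minimal sketch of the two-model idea: a segmentation model scores EDU
# boundaries after each word, and a parsing model scores how adjacent spans
# combine. All feature names and probabilities below are made-up placeholders.

def p_boundary(features):
    """Toy segmentation model: probability that an EDU boundary follows a word."""
    table = {("VP", "SBAR"): 0.9, ("NP", "PP"): 0.2}   # assumed values
    return table.get(features, 0.1)

def segment(words, feats):
    """Greedy EDU segmentation: cut wherever the boundary model is confident."""
    edus, current = [], []
    for word, f in zip(words, feats):
        current.append(word)
        if p_boundary(f) > 0.5:
            edus.append(" ".join(current))
            current = []
    if current:
        edus.append(" ".join(current))
    return edus

def parse(edus, p_attach):
    """Greedy bottom-up tree building: repeatedly merge the adjacent pair
    with the highest attachment probability (relation labels omitted)."""
    nodes = list(edus)
    while len(nodes) > 1:
        i = max(range(len(nodes) - 1),
                key=lambda k: p_attach(nodes[k], nodes[k + 1]))
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
    return nodes[0]

words = ["They", "left", "because", "it", "rained"]
feats = [("NP", "VP"), ("VP", "SBAR"), ("IN", "NP"), ("NP", "VP"), ("VP", None)]
edus = segment(words, feats)
tree = parse(edus, lambda a, b: 0.5)  # uniform attachment scores for the demo
print(edus, tree)
```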
H.3.4 Systems
Subalalitha C N, Ranjani Parthasarathi
Text-level Discourse Parsing with Rich Linguistic Features
Abstract: In this paper, we develop an RST-style text-level discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse parsing performance under different discourse conditions.
Vanessa Wei Feng, Graeme Hirst
Suzan Verberne, Lou Boves, Nelleke Oostdijk, Peter-Arno Coppen
Fei Wang, Yunfang Wu, Likun Qiu
Cross-lingual Identification of Ambiguous Discourse Connectives for Resource-Poor Language
Abstract: The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing the ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor languages such as Chinese. A language-independent framework is proposed that utilizes bilingual dictionaries, the Penn Discourse Treebank, and English-Chinese parallel data. We start by translating the English connectives into Chinese using a bilingual dictionary. Then, the ambiguities in terms of the senses a connective may signal are estimated based on the ambiguities of the English connectives and word-alignment information. Finally, the ambiguity between discourse usage and non-discourse usage is resolved using a co-training algorithm. Experimental results show that the proposed method not only builds a high-quality connective lexicon for Chinese but also achieves high performance in recognizing the ambiguities. We also present a Chinese discourse corpus that will soon become the first publicly available discourse corpus for Chinese.
Wei Gao, Binyang Li, Zhongyu Wei, Kam-Fai Wong
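The projection step outlined in this abstract can be illustrated in a few lines. This is a hedged sketch, not the authors' pipeline: the sense inventory, counts, alignment table, and the Chinese connective 由于 are illustrative placeholders, and entropy stands in for whatever ambiguity estimate the paper actually uses.

```python
import math
from collections import defaultdict

# Assumed inputs (illustrative values, not real PDTB statistics): the sense
# distribution of each English connective, and how often each English
# connective is word-aligned to each Chinese connective in parallel data.
en_senses = {
    "since": {"Temporal": 40, "Contingency.Cause": 60},
    "because": {"Contingency.Cause": 100},
}
alignments = {
    ("since", "由于"): 30,
    ("because", "由于"): 10,
}

def projected_senses(zh_connective):
    """Project English sense counts onto a Chinese connective via alignments."""
    dist = defaultdict(float)
    for (en, zh), a_count in alignments.items():
        if zh != zh_connective:
            continue
        total = sum(en_senses[en].values())
        for sense, s_count in en_senses[en].items():
            dist[sense] += a_count * s_count / total
    z = sum(dist.values())
    return {s: c / z for s, c in dist.items()} if z else {}

def ambiguity(dist):
    """Entropy of the projected sense distribution: higher means more ambiguous."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

dist = projected_senses("由于")
print(dist, ambiguity(dist))
```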
Identification of Truth and Deception in Text: Application of Vector Space Model to Rhetorical Structure Theory
Abstract: The paper proposes to use the Rhetorical Structure Theory (RST) analytic framework to identify systematic differences between deceptive and truthful stories in terms of their coherence and structure. A sample of 36 elicited personal stories, self-ranked as completely truthful or completely deceptive, is manually analyzed by assigning RST discourse relations among a story's constituent parts. A Vector Space Model (VSM) assesses each story's position in multi-dimensional RST space with respect to its distance to the truth and deception centers, as measures of the story's levels of deception and truthfulness. Ten human judges evaluate whether each story is deceptive and assign their confidence levels, which produce measures of human-expected deception and truthfulness levels. The paper contributes to deception detection research and to RST in two ways: a) it demonstrates discourse structure analysis in pragmatics as a promising route to automated deception detection and, as such, an effective complement to lexico-semantic analysis, and b) it develops an RST-VSM methodology for interpreting RST analysis in the identification of previously unseen deceptive texts.
Victoria L. Rubin, Tatiana Vashchilko
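A minimal sketch of the RST-VSM idea as described: represent each story as a vector of RST relation counts, compute centers for the truthful and deceptive training stories, and score a new story by its relative cosine similarity to the two centers. The relation inventory and counts are toy values, not the paper's 36-story sample.

```python
import numpy as np

# Toy relation-count vectors for labeled stories (illustrative numbers only).
RELATIONS = ["Elaboration", "Evidence", "Contrast", "Background"]
truthful = np.array([[5, 3, 1, 2], [6, 2, 0, 3]], dtype=float)
deceptive = np.array([[2, 0, 4, 1], [1, 1, 5, 0]], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Centers of the truthful and deceptive regions of the RST vector space.
truth_center = truthful.mean(axis=0)
deceit_center = deceptive.mean(axis=0)

def deception_score(story_vector):
    """Positive when the story lies closer to the deceptive center."""
    return cosine(story_vector, deceit_center) - cosine(story_vector, truth_center)

new_story = np.array([1, 0, 6, 1], dtype=float)   # RST relation counts of a new story
print(deception_score(new_story))                 # > 0: predicted deceptive
```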
Towards Semi-Supervised Classification of Discourse Relations using Feature Correlations
Abstract: Two of the main corpora available for training discourse relation classifiers are the RST Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), which are both based on the Wall Street Journal corpus. Most recent work using discourse relation classifiers has employed fully supervised methods on these corpora. However, certain discourse relations have little labeled data, causing low classification performance for their associated classes. In this paper, we attempt to tackle this problem by employing a semi-supervised method for discourse relation classification. The proposed method is based on the analysis of feature co-occurrences in unlabeled data. This information is then used as a basis to extend the feature vectors during training. The proposed method is evaluated on both RST-DT and PDTB, where it significantly outperforms baseline classifiers. We believe that the proposed method is a first step towards improving classification performance, particularly for discourse relations lacking annotated data.
Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka
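The feature-extension idea is easy to sketch. The toy illustration below is not the authors' method: feature co-occurrence statistics gathered from unlabeled instances are used to add correlated features to a sparse feature set, with an assumed conditional-frequency threshold in place of whatever weighting the paper applies.

```python
from collections import defaultdict

# Unlabeled instances represented as sets of binary features (illustrative names).
unlabeled = [
    {"word:but", "pos:CC", "prev_rel:Contrast"},
    {"word:but", "pos:CC"},
    {"word:because", "pos:IN", "prev_rel:Cause"},
]

# Count how often feature pairs fire together in the unlabeled data.
cooc = defaultdict(int)
freq = defaultdict(int)
for feats in unlabeled:
    for f in feats:
        freq[f] += 1
    for f in feats:
        for g in feats:
            if f != g:
                cooc[(f, g)] += 1

def extend(feats, threshold=0.5):
    """Add features that co-occur with the observed ones above a threshold."""
    extended = set(feats)
    for f in feats:
        for (a, b), c in cooc.items():
            if a == f and c / freq[f] >= threshold:
                extended.add(b)
    return extended

print(extend({"word:but"}))   # pulls in pos:CC (and, at this threshold, prev_rel:Contrast)
```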
Multi-Layer Discourse Annotation of a Dutch Text Corpus
Abstract: We have compiled a corpus of 80 Dutch texts from expository and persuasive genres, which we annotated for rhetorical and genre-specific discourse structure and for lexical cohesion, with the goal of creating a gold standard for further research. The annotations are based on a segmentation of the text into elementary discourse units that takes into account cues from syntax and punctuation. During the labor-intensive discourse-structure annotation (RST analysis), we took great care to thoroughly reconcile the initial analyses. That process, together with the availability of two independent initial analyses for each text, allows us to analyze our disagreements, to assess the confusability of RST relations, and thereby to improve the annotation guidelines and gather evidence for the classification of these relations into larger groups. We are using this resource for corpus-based studies of discourse relations, discourse markers, cohesion, and genre differences, e.g., the question of how discourse structure and lexical cohesion interact for different genres in the overall organization of texts. We are also exploring automatic text segmentation and semi-automatic discourse annotation.
Keywords: discourse structure, coherence relations, lexical cohesion
Gisela Redeker, Gosse Bouma, Markus Egg
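The confusability analysis mentioned above amounts to tabulating how the two independent analyses label the same spans. A small sketch with invented labels rather than the corpus data:

```python
from collections import Counter

# Relation labels assigned to the same spans by two independent annotators
# (illustrative labels only).
annotator_a = ["Elaboration", "Evidence", "Contrast", "Elaboration", "Cause"]
annotator_b = ["Elaboration", "Elaboration", "Contrast", "Background", "Cause"]

# Confusion counts reveal which relation pairs are most confusable.
confusions = Counter(zip(annotator_a, annotator_b))
agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)

print(agreement)                       # raw inter-annotator agreement
for (a, b), n in confusions.items():
    if a != b:
        print(f"{a} vs {b}: {n}")      # candidate relations to merge into larger groups
```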
Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory
Abstract: We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.
Lynn Carlson, Daniel Marcu, Mary Ellen Okurowski
Measuring the Strength of Linguistic Cues for Discourse
Abstract: Discourse relations in the recent literature are typically classified as either explicit (e.g., when a discourse connective like "because" is present) or implicit. This binary treatment of implicitness is advantageous for simplifying the explanation of many phenomena in discourse processing. On the other hand, linguists do not yet agree as to what types of textual particles contribute to revealing the relation between any pair of sentences or clauses in a text. At one extreme, we can claim that every single word in either of the sentences involved can play a role in shaping a discourse relation. In this work, we propose a measure to quantify how good a cue a certain textual element is for a specific discourse relation, i.e., a measure of the strength of discourse markers. We illustrate how this measure becomes important both for modeling the construction of discourse relations and for developing automatic tools to identify them.
(no authors)
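One concrete way to instantiate such a cue-strength measure is an association score between a textual element and a relation, estimated from co-occurrence counts. The sketch below uses pointwise mutual information over toy counts; the actual measure proposed in the paper may differ.

```python
import math
from collections import Counter

# Toy counts of (word, relation) pairs over adjacent sentence pairs
# (illustrative numbers only).
pair_counts = Counter({
    ("but", "Comparison"): 80, ("but", "Expansion"): 20,
    ("result", "Contingency"): 30, ("result", "Expansion"): 10,
    ("the", "Comparison"): 100, ("the", "Expansion"): 300, ("the", "Contingency"): 100,
})

total = sum(pair_counts.values())
word_counts = Counter()
rel_counts = Counter()
for (w, r), n in pair_counts.items():
    word_counts[w] += n
    rel_counts[r] += n

def cue_strength(word, relation):
    """Pointwise mutual information between a textual element and a relation."""
    joint = pair_counts[(word, relation)] / total
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((word_counts[word] / total) * (rel_counts[relation] / total)))

print(cue_strength("but", "Comparison"))   # positive: strong cue
print(cue_strength("the", "Comparison"))   # negative: weak cue
```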
Analyses of the Association between Discourse Relation and Sentiment Polarity with a Chinese Human-Annotated Corpus
Abstract: Discourse relations may entail sentiment information. In this work, we annotate both discourse relations and sentiment information on a moderate-sized Chinese corpus extracted from ClueWeb09. Based on the annotation, we investigate the association between relation type and sentiment polarity in Chinese and interpret the data from various aspects. Finally, we highlight some language phenomena and offer some remarks.
Hen-Hsen Huang, Chi-Hsin Yu, Tai-Wei Chang, Cong-Kai Lin, Hsin-Hsi Chen
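The association analysis described here reduces to tabulating sentiment polarity conditioned on relation type. A toy sketch with invented labels and counts, not the annotated corpus:

```python
from collections import Counter, defaultdict

# Toy annotated instances: (discourse relation, sentiment polarity).
instances = [
    ("Contrast", "negative"), ("Contrast", "negative"), ("Contrast", "positive"),
    ("Cause", "negative"), ("Cause", "neutral"),
    ("Conjunction", "positive"), ("Conjunction", "positive"), ("Conjunction", "neutral"),
]

by_relation = defaultdict(Counter)
for relation, polarity in instances:
    by_relation[relation][polarity] += 1

# The conditional distribution P(polarity | relation) exposes the association.
for relation, counts in by_relation.items():
    total = sum(counts.values())
    dist = {p: round(n / total, 2) for p, n in counts.items()}
    print(relation, dist)
```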
Discourse indicators for content selection in summarization
Abstract: We present analyses aimed at eliciting which specific aspects of discourse provide the strongest indication of text importance. In the context of content selection for single-document summarization of news, we examine the benefits of both the graph structure of text provided by discourse relations and the semantic sense of these relations. We find that structure information is the most robust indicator of importance. Semantic sense only provides constraints on content selection but is not indicative of important content by itself. However, sense features complement structure information and lead to improved performance. Further, both types of discourse information prove complementary to non-discourse features. While our results establish the usefulness of discourse features, we also find that lexical overlap provides a simple and cheap alternative to discourse for computing text structure, with comparable performance for the task of content selection.
Annie Louis, Aravind Joshi, Ani Nenkova
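The structural signal discussed above can be approximated by how deep a sentence sits in the discourse tree, with shallower units more likely to carry important content. The sketch below uses a toy nested-tuple tree and illustrates only this kind of structure feature, not the feature set used in the paper.

```python
# Toy discourse tree over sentences: nested pairs with sentence ids at the
# leaves (the structure is an illustrative assumption).
tree = ((0, 1), (2, (3, 4)))

def depths(node, depth=0, out=None):
    """Depth of each sentence in the discourse tree; shallower units tend to
    be more central and are better candidates for the summary."""
    if out is None:
        out = {}
    if isinstance(node, tuple):
        for child in node:
            depths(child, depth + 1, out)
    else:
        out[node] = depth
    return out

structure_score = {sent: -d for sent, d in depths(tree).items()}
summary = sorted(structure_score, key=structure_score.get, reverse=True)[:2]
print(structure_score, summary)   # picks the two shallowest sentences
```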
Grounded language learning
Abstract: Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted growing interest in recent years. In most work on this topic, however, utterances in a conversation are treated independently and discourse structure information is largely ignored. In the context of language acquisition, this independence assumption discards cues that are important to the learner, e.g., the fact that consecutive utterances are likely to share the same referent (Frank et al., 2013). The current paper describes an approach to the problem of simultaneously modeling grounded language at the sentence and discourse levels. We combine ideas from parsing and grammar induction to produce a parser that can handle long input strings with thousands of tokens, creating parse trees that represent full discourses. By casting grounded language learning as a grammatical inference task, we use our parser to extend the work of Johnson et al. (2012), investigating the importance of discourse continuity in children's language acquisition and its interaction with social cues. Our model boosts performance in a language acquisition task and yields good discourse segmentations compared with human annotators.
(no authors)
Chapter #
Abstract: Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.
Key words: discourse, corpus, annotation, rhetorical structure
Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski
Discourse Structure in Simultaneous Spoken Turkish
Abstract: The current debate regarding the data structure necessary to represent discourse structure, specifically whether tree structure is sufficient or not, has mainly focused on written text. This paper reviews some of the major claims about structure in discourse and proposes an investigation of discourse structure in simultaneous spoken Turkish, focusing on tree violations and exploring ways to explain them away by non-structural means.
Işın Demirşahin
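The tree violations the paper investigates can be detected mechanically: two discourse attachments that interleave cannot both be represented in a single tree. A small sketch over invented index pairs, not the paper's Turkish data:

```python
# Discourse attachments as (start, end) index pairs over discourse units.
links = [(0, 3), (1, 4), (5, 6)]

def crossing_pairs(links):
    """Return pairs of links that interleave (tree-structure violations)."""
    violations = []
    for i, (a, b) in enumerate(links):
        for c, d in links[i + 1:]:
            if a < c < b < d or c < a < d < b:
                violations.append(((a, b), (c, d)))
    return violations

print(crossing_pairs(links))   # [((0, 3), (1, 4))]
```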
Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
Abstract: We propose a novel approach for developing a two-stage document-level discourse parser. Our parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combine these two stages of discourse parsing effectively. A set of empirical evaluations over two different datasets demonstrates that our discourse parser significantly outperforms the state of the art, often by a wide margin.
Shafiq Joty, Giuseppe Carenini, Raymond T. Ng, Yashar Mehdad
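A compressed sketch of the two-stage idea: an optimal, CKY-style search over merge probabilities, where the probability of a merge comes from an intra-sentential model when the span stays within one sentence and from a multi-sentential model otherwise. The constant probabilities below are placeholders for the paper's two Conditional Random Fields, and the unit list is illustrative.

```python
from functools import lru_cache

units = ["e1", "e2", "e3", "e4"]      # elementary discourse units (toy input)
sentence_of = [0, 0, 1, 1]            # sentence index of each unit

def merge_prob(i, j):
    """Assumed model output: intra-sentential score inside one sentence,
    multi-sentential score otherwise (constants stand in for real CRF scores)."""
    return 0.8 if sentence_of[i] == sentence_of[j - 1] else 0.6

@lru_cache(maxsize=None)
def best_tree(i, j):
    """Highest-probability binary tree over units[i:j] (probabilities multiply)."""
    if j - i == 1:
        return 1.0, units[i]
    best = (0.0, None)
    for k in range(i + 1, j):
        pl, left = best_tree(i, k)
        pr, right = best_tree(k, j)
        cand = (merge_prob(i, j) * pl * pr, (left, right))
        best = max(best, cand, key=lambda t: t[0])
    return best

print(best_tree(0, len(units)))   # -> (0.384..., (('e1', 'e2'), ('e3', 'e4')))
```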