Braque: Conditional Random Fields
Items for query: CRF or CRFs or "conditional random field" or "conditional random fields" on Braque
08/07/2014, 02:22 PM Scalable Gaussian Process Structured Prediction for Grid Factor Graph Applications (Sebastien Bratieres, Novi Quadrianto, Sebastian Nowozin, Zoubin Ghahramani) UPTM1XLH 05/21/2014, 02:41 PM Structured prediction is an important and well-studied problem with many applications across machine learning. GPstruct is a recently proposed structured prediction model that offers appealing properties such as being kernelised, non-parametric, and supporting Bayesian inference (Bratières et al., 2013). The model places a Gaussian process prior over energy functions which describe relationships between input variables and structured output variables. However, the memory demand of GPstruct is quadratic in the number of latent variables, and training runtime scales cubically. This prevents GPstruct from being applied to problems involving grid factor graphs, which are prevalent in computer vision and spatial statistics applications. Here we explore a scalable approach to learning GPstruct models based on ensemble learning, with weak learners (predictors) trained on subsets of the latent variables and bootstrap data, which can easily be distributed. We show experiments with 4M latent variables on image segmentation. Our method outperforms widely-used conditional random field models trained with pseudo-likelihood. Moreover, in image segmentation problems it improves over recent state-of-the-art marginal optimisation methods in terms of predictive performance and uncertainty calibration. Finally, it generalises well on all training set sizes. Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).
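The ensemble strategy in the GPstruct entry above, weak learners trained on bootstrap resamples of data subsets and then combined, can be sketched generically as follows. This is a minimal bagging sketch, not the authors' GPstruct code; `train_weak`, `subset_frac`, and the averaging rule are illustrative assumptions.

```python
import random
from statistics import mean

def train_bagged_ensemble(data, train_weak, n_learners=10, subset_frac=0.5, seed=0):
    """Train weak learners on bootstrap resamples of data subsets.

    `train_weak` maps a list of (x, y) pairs to a predictor x -> score;
    both are placeholders for whatever weak model is actually used.
    Each learner sees an independent resample, so training distributes easily.
    """
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        k = max(1, int(subset_frac * len(data)))
        # Bootstrap: sample with replacement, so learners see different data.
        sample = [rng.choice(data) for _ in range(k)]
        learners.append(train_weak(sample))
    return learners

def predict(learners, x):
    # Combine weak predictions by averaging (one simple combination rule).
    return mean(f(x) for f in learners)
```

Averaging is only one way to combine the weak predictors; the paper combines distributions rather than point predictions, which the sketch does not attempt to reproduce.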
Supplementary Material -- Scalable Gaussian Process Structured Prediction for Grid Factor Graph Applications (1 Algorithm complexity) S5DR34OO 05/21/2014, 02:41 PM Grounded Compositional Semantics for Finding and Describing Images with Sentences (Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng) MOVHEWVN 04/19/2014, 03:34 AM Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model, which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to abstract from the details of word order and syntactic expression. DT-RNNs outperform other recursive and recurrent neural networks, kernelized CCA, and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa. They also give more similar representations to sentences that describe the same image. Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint (Ji Liu, Ryohei Fujimaki, Jieping Ye) SKTB2BBQ 04/19/2014, 02:27 AM We consider forward-backward greedy algorithms for solving sparse feature selection problems with general convex smooth functions. A state-of-the-art greedy method, the Forward-Backward greedy algorithm (FoBa-obj), requires solving a large number of optimization problems, so it is not scalable to large problems.
The FoBa-gdt algorithm, which uses gradient information for feature selection at each forward iteration, significantly improves the efficiency of FoBa-obj. In this paper, we systematically analyze the theoretical properties of both algorithms. Our main contributions are: 1) we derive better theoretical bounds than existing analyses of FoBa-obj for general smooth convex functions; 2) we show that FoBa-gdt achieves the same theoretical performance as FoBa-obj under the same condition, the restricted strong convexity condition; our new bounds are consistent with the bounds of a special case (least squares) and fill a previously existing theoretical gap for general convex smooth functions; 3) we show that the restricted strong convexity condition is satisfied if the number of independent samples is more than k log d, where k is the sparsity number and d is the dimension of the variable; 4) we apply FoBa-gdt (with the conditional random field objective) to the sensor selection problem for human indoor activity recognition, and our results show that FoBa-gdt outperforms other methods based on forward greedy selection and L1-regularization. Learning Appearance Manifolds from Video (Ali Rahimi, MIT CS and AI Lab; Ben Recht, MIT Media Lab; Trevor Darrell, MIT CS and AI Lab) 2JEKKPFV 04/18/2014, 09:55 PM The appearance of dynamic scenes is often largely governed by a latent low-dimensional dynamic process. We show how to learn a mapping from video frames to this low-dimensional representation by exploiting the temporal coherence between frames and supervision from a user. This function maps the frames of the video to a low-dimensional sequence that evolves according to Markovian dynamics. This ensures that the recovered low-dimensional sequence represents a physically meaningful process.
We relate our algorithm to manifold learning, semi-supervised learning, and system identification, and demonstrate it on the tasks of tracking 3D rigid objects, deformable bodies, and articulated bodies. We also show how to use the inverse of this mapping to manipulate video. Towards a Unified Architecture for in-RDBMS Analytics (Xixuan Feng, Arun Kumar, Benjamin Recht, Christopher Ré) 514PNHIX 04/18/2014, 09:54 PM The increasing use of statistical data analysis in enterprise applications has created an arms race among database vendors to offer ever more sophisticated in-database analytics. One challenge in this race is that each new statistical technique must be implemented from scratch in the RDBMS, which leads to a lengthy and complex development process. We argue that the root cause of this overhead is the lack of a unified architecture for in-database analytics. Our main contribution in this work is to take a step towards such a unified architecture. A key benefit of our unified architecture is that performance optimizations for analytics techniques can be studied generically, instead of in an ad hoc, per-technique fashion. In particular, our technical contributions are theoretical and empirical studies of two key factors that we found impact performance: the order in which data is stored, and the parallelization of computations on a single-node multicore RDBMS. We demonstrate the feasibility of our architecture by integrating several popular analytics techniques into two commercial and one open-source RDBMS. Our architecture requires changes to only a few dozen lines of code to integrate a new statistical technique. We then compare our approach with the native analytics tools offered by the commercial RDBMSes on various analytics tasks, and validate that our approach achieves competitive or higher performance, while still achieving the same quality.
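The FoBa-gdt idea summarized in the forward-backward greedy entry above, a forward step that adds the feature with the largest gradient magnitude instead of refitting the objective once per candidate, plus a backward step that drops features whose removal barely hurts the fit, can be sketched for the least-squares special case. This is a simplified illustration under stated assumptions, not the authors' code; function and parameter names are invented for the sketch.

```python
import numpy as np

def fit_support(X, y, support, d):
    """Least-squares refit restricted to the selected support."""
    w = np.zeros(d)
    if support:
        w[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return w

def foba_gdt(X, y, k, delta=1e-4, max_iter=50):
    """Forward-backward greedy feature selection for least squares.

    Forward: add the feature whose gradient coordinate is largest in
    magnitude (the cheap gradient test of FoBa-gdt, versus FoBa-obj's
    per-candidate refits). Backward: drop a selected feature if removing
    it increases the loss by less than `delta` (illustrative default).
    """
    n, d = X.shape
    support, w = [], np.zeros(d)
    for _ in range(max_iter):
        if len(support) >= k:
            break
        grad = X.T @ (X @ w - y) / n  # gradient of (1/2n)||Xw - y||^2
        j = max((j for j in range(d) if j not in support),
                key=lambda j: abs(grad[j]))
        support.append(j)
        w = fit_support(X, y, support, d)
        loss = np.mean((X @ w - y) ** 2)
        # Backward pass: try deleting each selected feature.
        for s in list(support):
            trial = [t for t in support if t != s]
            wt = fit_support(X, y, trial, d)
            if np.mean((X @ wt - y) ** 2) - loss < delta:
                support, w = trial, wt
                loss = np.mean((X @ w - y) ** 2)
    return support, w
```

The paper's analysis covers general smooth convex objectives (including the CRF objective it applies to sensor selection); least squares is used here only because the refit step has a closed form.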
YouTubeCat: Learning to Categorize Wild Web Videos (Zheshen Wang, Ming Zhao, Yang Song, Sanjiv Kumar, and Baoxin Li) E1CROBOZ 04/18/2014, 07:14 PM Automatic categorization of videos in a Web-scale unconstrained collection such as YouTube is a challenging task. A key issue is how to build an effective training set in the presence of missing, sparse, or noisy labels. We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages. The data from such disparate sources has different properties and labeling quality, and thus fusing them in a coherent fashion is another practical challenge. We propose a fusion framework in which each data source is first combined with the manually-labeled set independently. Then, using the hierarchical taxonomy of the categories, a Conditional Random Field (CRF) based fusion strategy is designed. Based on the final fused classifier, category labels are predicted for the new videos. Extensive experiments on about 80K videos from the 29 most frequent categories in YouTube show the effectiveness of the proposed method for categorizing large-scale wild Web videos. Multiclass Latent Locally Linear Support Vector Machines (Marco Fornoni, Barbara Caputo, Francesco Orabona) AZAFJXIP 04/18/2014, 01:57 PM Kernelized Support Vector Machines (SVMs) have gained the status of off-the-shelf classifiers, able to deliver state-of-the-art performance on almost any problem. Still, their practical use is constrained by their computational and memory complexity, which grows super-linearly with the number of training samples. In order to retain the low training and testing complexity of linear classifiers and the flexibility of non-linear ones, a growing, promising alternative is represented by methods that learn non-linear classifiers through local combinations of linear ones.
In this paper we propose a new multiclass local classifier, based on a latent SVM formulation. The proposed classifier makes use of a set of linear models that are linearly combined using sample- and class-specific weights. Thanks to the latent formulation, the combination coefficients are modeled as latent variables. We allow soft combinations and we provide a closed-form solution for their estimation, resulting in an efficient prediction rule. This novel formulation allows learning the sample-specific weights and the linear classifiers in a principled way, in a single optimization problem, using a CCCP optimization procedure. Extensive experiments on ten standard UCI machine learning datasets, one large binary dataset, three character and digit recognition databases, and a visual place categorization dataset show the power of the proposed approach. Probabilistic Label Trees for Efficient Large Scale Image Classification (Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, Ce Liu) MTW2TQTT 04/18/2014, 01:23 PM Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper, we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy. Stochastic Dual Coordinate Ascent Methods for Regularized Loss (Shai Shalev-Shwartz, Tong Zhang) 5IOVIARB 04/18/2014, 09:23 AM Stochastic Gradient Descent (SGD) has become popular for solving large-scale supervised machine learning optimization problems such as SVMs, due to its strong theoretical guarantees.
While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it has so far lacked a good convergence analysis. This paper presents a new analysis of Stochastic Dual Coordinate Ascent (SDCA) showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD. This analysis justifies the effectiveness of SDCA for practical applications. Keywords: stochastic dual coordinate ascent, optimization, computational complexity, regularized loss minimization, support vector machines, ridge regression, logistic regression Event Extraction Using Distant Supervision (Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher D. Manning, Daniel Jurafsky) HLC25CTY 04/11/2014, 08:52 PM Distant supervision is a successful paradigm that gathers training data for information extraction systems by automatically aligning vast databases of facts with text. Previous work has demonstrated its usefulness for the extraction of binary relations such as a person's employer or a film's director. Here, we extend the distant supervision approach to template-based event extraction, focusing on the extraction of passenger counts, aircraft types, and other facts concerning airplane crash events. We present a new publicly available dataset and event extraction task in the plane crash domain based on Wikipedia infoboxes and newswire text. Using this dataset, we conduct a preliminary evaluation of four distantly supervised extraction models which assign named entity mentions in text to entries in the event template. Our results indicate that joint inference over sequences of candidate entity mentions is beneficial. Furthermore, we demonstrate that the SEARN algorithm outperforms a linear-chain CRF and strong baselines with local inference.
Keywords: Distant-Supervision, Event-Extraction, Searn Optimal Decisions from Probabilistic Models: the Intersection-over-Union Case (Sebastian Nowozin) HTPGMW1T 04/09/2014, 02:40 PM A probabilistic model allows us to reason about the world and make statistically optimal decisions using Bayesian decision theory. However, in practice the intractability of the decision problem forces us to adopt simplistic loss functions such as the 0/1 loss or Hamming loss, and as a result we make poor decisions through MAP estimates or through low-order marginal statistics. In this work we investigate optimal decision making for more realistic loss functions. Specifically, we consider the popular intersection-over-union (IoU) score used in image segmentation benchmarks and show that it results in a hard combinatorial decision problem. To make this problem tractable we propose a statistical approximation to the objective function, as well as an approximate algorithm based on parametric linear programming. We apply the algorithm to three benchmark datasets and obtain improved intersection-over-union scores compared to maximum-posterior-marginal decisions. Our work points out the difficulties of using realistic loss functions with probabilistic computer vision models. Language Resource Addition: Dictionary or Corpus? (Shinsuke Mori, Graham Neubig) N0UKMMU2 03/28/2014, 08:49 PM In this paper, we investigate the relative effect of two strategies of language resource addition on the word segmentation and part-of-speech tagging problems in Japanese. The first strategy is adding entries to the dictionary, and the second is adding annotated sentences to the training corpus. The experimental results showed that adding annotated sentences to the training corpus is more effective than adding entries to the dictionary.
Annotated sentence addition is especially efficient when we add new words with contexts of three real occurrences as partially annotated sentences. Based on this finding, we performed annotation on invention disclosure texts and observed the word segmentation accuracy. Keywords: Partial annotation, Dictionary, Word segmentation, POS tagging Automatically enriching spoken corpora with syntactic information for linguistic studies (Alexis Nasr, Frederic Bechet, Benoit Favre, Thierry Bazillon, Jose Deulofeu, Andre Valli) U42HLHOR 03/28/2014, 06:47 PM Syntactic parsing of speech transcriptions faces the problem of the presence of disfluencies that break the syntactic structure of the utterances. We propose two solutions to this problem in this paper. The first relies on a disfluency predictor that detects disfluencies and removes them prior to parsing. The second integrates the disfluencies into the syntactic structure of the utterances and trains a disfluency-aware parser. A Lightweight and High Performance Monolingual Word Aligner (Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, Peter Clark) MD4RIACU 03/28/2014, 05:52 PM Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment, with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art results, while being an order of magnitude faster than the previous best performing system.
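The "globally decode the best alignment" step in the word-aligner entry above is, at its core, exact Viterbi decoding of a linear-chain CRF. The following is a generic Viterbi sketch over given unary and transition scores, not the aligner's actual feature set or label space, which the abstract does not specify.

```python
def viterbi_decode(unary, transition):
    """Exact global decoding for a linear-chain CRF.

    unary[t][s]: score of label s at position t.
    transition[p][s]: score of moving from label p to label s.
    Returns the highest-scoring label sequence via dynamic programming.
    """
    n_pos, n_lab = len(unary), len(unary[0])
    score = list(unary[0])  # best score ending in each label at position 0
    back = []               # backpointers for positions 1..n_pos-1
    for t in range(1, n_pos):
        new_score, ptr = [], []
        for s in range(n_lab):
            best_prev = max(range(n_lab),
                            key=lambda p: score[p] + transition[p][s])
            new_score.append(score[best_prev] + transition[best_prev][s]
                             + unary[t][s])
            ptr.append(best_prev)
        score, back = new_score, back + [ptr]
    # Backtrack from the best final label to recover the sequence.
    s = max(range(n_lab), key=lambda k: score[k])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return list(reversed(path))
```

In a trained CRF the unary and transition scores would be dot products of learned weights with the source- and target-side features the paper mentions; here they are supplied directly.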
Adaptivity of Averaged Stochastic Gradient Descent to Local Strong Convexity for Logistic Regression (Leon Bottou) U2N1POFR 03/21/2014, 05:53 PM In this paper, we consider supervised learning problems such as logistic regression and study the stochastic gradient method with averaging, in the usual stochastic approximation setting where observations are used only once. We show that after N iterations, with a constant step-size proportional to 1/(R²√N), where N is the number of observations and R is the maximum norm of the observations, the convergence rate is always of order O(1/√N), and improves to O(R²/(μN)), where μ is the lowest eigenvalue of the Hessian at the global optimum (when this eigenvalue is greater than R²/√N). Since μ does not need to be known in advance, this shows that averaged stochastic gradient is adaptive to unknown local strong convexity of the objective function. Our proof relies on the generalized self-concordance properties of the logistic loss and thus extends to all generalized linear models with uniformly bounded features. Keywords: stochastic approximation, logistic regression, self-concordance RERANKED ALIGNERS FOR INTERACTIVE TRANSCRIPT CORRECTION (Benoit Favre, Mickael Rouvier, Frederic Bechet) 5S2KIL5M 03/07/2014, 03:14 AM Clarification dialogs can help address ASR errors in speech-to-speech translation systems and other interactive applications. We propose to use variants of Levenshtein alignment for merging an errorful utterance with a targeted rephrase of an error segment. ASR errors that might harm the alignment are addressed through phonetic matching, and a word embedding distance is used to account for the use of synonyms outside targeted segments. These features lead to a relative improvement of 30% in word error rate on sentences with ASR errors, compared to not performing the clarification.
Twice as many utterances are completely corrected compared to using basic word alignment. Furthermore, we generate a set of potential merges and train a neural network on crowd-sourced rephrases in order to select the best merge, leading to 24% more instances completely corrected. The system is deployed in the framework of the BOLT project. Index Terms -- Error correction, Dialog systems, ASR error detection, Reranking, Levenshtein alignment 1. INTRODUCTION Automatic Speech Recognition systems often generate imperfect transcripts, mainly due to challenging acoustic conditions, out-of-vocabulary words, or language ambiguities that could only be handled with world knowledge and long-term context analysis. Even though there has been a large body of work on robust ASR [1, 2], advances in open-vocabulary speech recognition [3, 4] and long-range language modeling [5, 6, 7], ASR systems still make errors which can impact downstream applications. Interactive systems offer an opportunity for ASR errors to be detected and corrected through clarification dialogs. In closed-domain dialog systems, methods have been developed for explicit and implicit confirmation of user intent [8, 9], but they cannot be applied to open-domain speech because of the lack of a prior on the message to be understood. Nevertheless, recent efforts in confidence measure estimation [10, 11], OOV detection [12, 13, 14], and error detection [15] ... RETRIEVING THE SYNTACTIC STRUCTURE OF ERRONEOUS ASR TRANSCRIPTIONS FOR OPEN-DOMAIN SPOKEN LANGUAGE UNDERSTANDING (Frederic Bechet, Benoit Favre, Alexis Nasr, Mathieu Morey) RC21J3WV 03/07/2014, 03:14 AM Retrieving the syntactic structure of erroneous ASR transcriptions can be of great interest for open-domain Spoken Language Understanding tasks, in order to correct or at least reduce the impact of ASR errors on final applications.
Most previous work on ASR and syntactic parsing has addressed this problem by using syntactic features during ASR to help reduce Word Error Rate (WER). The improvement obtained is often rather small; however, the structure and the relations between words obtained through parsing can be of great interest for the SLU processes, even without a significant decrease in WER. That is why we adopt another point of view in this paper: considering that ASR transcriptions inevitably contain some errors, we show in this study that it is possible to improve the syntactic analysis of these erroneous transcriptions by performing a joint error detection / syntactic parsing process. The applicative framework used in this study is a speech-to-speech system developed through the DARPA BOLT project. Index Terms -- Automatic Speech Recognition, Spoken Language Understanding, Dependency Parsing, Confidence Measures 1. INTRODUCTION Open-Domain Spoken Language Understanding aims at enriching spoken transcriptions with structural and semantic information without using a domain ontology limited to a specific application (e.g. flight booking). Such information includes finding sentence boundaries, parsing sentence and dialogue structure, spotting semantic concepts and predicate-argument entities, and distinguishing relevant from superfluous pieces of information. The recent generalization of speech technology to a large range of applications such as voice search, speech-to-speech translation, personal assistants, or multimedia interfaces has highlighted the need for domain-independent SLU models that can process speech input and output structured representations of messages. These structured representations can be seen as an interface between the word transcriptions (1-best, n-best, word lattice) out ...
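The Levenshtein-alignment merging step described in the RERANKED ALIGNERS entry above can be sketched as a standard word-level edit-distance alignment with backtrace. This is a generic dynamic-programming sketch; the paper's phonetic matching and word-embedding extensions to the substitution cost are not reproduced here.

```python
def levenshtein_align(src, tgt):
    """Word-level Levenshtein alignment.

    Returns a list of (src_word_or_None, tgt_word_or_None) pairs: matched
    or substituted words are paired, deletions pair a source word with
    None, and insertions pair None with a target word.
    """
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (src[i - 1] != tgt[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Backtrace to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
                dist[i][j] == dist[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            pairs.append((src[i - 1], tgt[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((src[i - 1], None)); i -= 1
        else:
            pairs.append((None, tgt[j - 1])); j -= 1
    return list(reversed(pairs))
```

Given such an alignment between the errorful utterance and the rephrase, merging amounts to choosing, for each pair, which side's word to keep, which is where the paper's reranking comes in.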
Active Boundary Annotation using Random MAP Perturbations (Subhransu Maji, Tamir Hazan, Tommi Jaakkola) L3RKSKBR 02/27/2014, 08:47 PM We address the problem of efficiently annotating labels of objects when they are structured. Often the distribution over labels can be described using a joint potential function over the labels for which sampling is provably hard but efficient maximum a-posteriori (MAP) solvers exist. In this setting we develop novel entropy bounds that are based on the expected amount of perturbation to the potential function that is needed to change MAP decisions. By reasoning about the entropy reduction and cost tradeoff, our algorithm actively selects the next annotation task. As an example of our framework we propose a boundary refinement task which can be used to obtain pixel-accurate image boundaries much faster than traditional tools by focusing on parts of the image for refinement in a multi-scale manner. Heterogeneous Networks and Their Applications: Scientometrics, Name Disambiguation, and Topic Modeling (Ben King, Rahul Jha, Dragomir R. Radev) XHSIHE5O 02/14/2014, 12:46 AM We present heterogeneous networks as a way to unify lexical networks with relational data. We build a unified ACL Anthology network, tying together the citation, author collaboration, and term co-occurrence networks with affiliation and venue relations. This representation proves to be convenient and allows problems such as name disambiguation, topic modeling, and the measurement of scientific impact to be easily solved using only this network and off-the-shelf graph algorithms.
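The random-MAP-perturbation setting in the boundary annotation entry above relies on the perturb-and-MAP trick: add random noise to the potentials and call an efficient MAP solver on each perturbed problem, turning a MAP oracle into an approximate sampler. The sketch below shows that trick in its simplest form, with independent Gumbel noise on unary potentials; it is not the paper's entropy-bound machinery, and `map_solver` stands in for whatever structured MAP routine is available.

```python
import math
import random

def perturb_and_map_samples(potentials, map_solver, n_samples=100, seed=0):
    """Approximate sampling via random MAP perturbations.

    potentials: list of rows, potentials[v][s] = unary score of state s
    at variable v. Each draw adds i.i.d. Gumbel noise to every unary
    potential and returns the MAP decision of the perturbed problem.
    """
    rng = random.Random(seed)

    def gumbel():
        # Standard Gumbel(0, 1) sample via inverse transform.
        return -math.log(-math.log(rng.random()))

    samples = []
    for _ in range(n_samples):
        noisy = [[theta + gumbel() for theta in row] for row in potentials]
        samples.append(map_solver(noisy))
    return samples
```

For a single variable, taking the argmax of log-potentials plus Gumbel noise is an exact sample from the corresponding Gibbs distribution (the Gumbel-max property); with independent perturbations over a joint model the procedure becomes approximate, which is the regime the paper's bounds address.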