Query: "dirichlet process" or "dirichlet processes"

Status: *updated [Success]*

Parallel Markov Chain Monte Carlo for Nonparametric Mixture Models**Abstract:** Nonparametric mixture models based on the Dirichlet process are an elegant alternative to finite models when the number of underlying components is unknown, but inference in such models can be slow. Existing attempts to parallelize inference in such models have relied on introducing approximations, which can lead to inaccuracies in the posterior estimate. In this paper, we describe auxiliary variable representations for the Dirichlet process and the hierarchical Dirichlet process that allow us to perform MCMC using the correct equilibrium distribution, in a distributed manner. We show that our approach allows scalable inference without the deterioration in estimate quality that accompanies existing methods.

Pitfalls in the use of Parallel Inference for the Dirichlet Process**Abstract:** Recent work done byLovell, Adams, and Mansingka [2012]andWilliamson, Dubey, and Xing [2013]has suggested an alternative parametrisation for the Dirichlet process in order to derive non-approximate parallel MCMC inference for it. This approach to parallelisation has been picked-up and implemented in several different fields[Chahuneau et al., 2013, Pan et al., 2013].In this paper we show that the approach suggested is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these conditions (while approximate approaches often satisfy most of them). We present both theoretical and experimental evidence of this, analysing the load balance for the inference showing that it is independent of the size of the dataset and the number of nodes available in the parallel implementation, and end with preliminary suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.

Variance Reduction for Stochastic Variational Inference**Abstract:** (no abstract)

Parallel Markov Chain Monte Carlo for Nonparametric Mixture Models: Supplementary Material**Abstract:** (no abstract)

Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling**Abstract:** We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition , subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichlet processes used previously in dynamic topic modeling. Inference for the model includes adapting a recently developed slice sampler to directly manipulate the underlying Poisson process. Experiments performed on news, blogs, academic and Twitter collections demonstrate the technique gives superior perplexity over a number of previous models.

A simple example of Dirichlet process mixture inconsistency for the number of components**Abstract:** For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters -- that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent -- it does not concentrate at the true number of components. In this note, we give an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a "mixture" with one standard normal component. Further, we show that this example exhibits severe inconsistency: instead of going to 1, the posterior probability that there is one cluster converges (in probability) to 0.

Dependent Normalized Random Measures**Abstract:** In this paper we propose two constructions of dependent normalized random measures , a class of nonparametric priors over dependent probability measures. Our constructions, which we call mixed normalized random measures (MNRM) and thinned normalized random measures (TNRM), involve (respectively) weighting and thinning parts of a shared underlying Poisson process before combining them together. We show that both MNRM and TNRM are marginally normalized random measures, resulting in well understood theoretical properties. We develop marginal and slice samplers for both models, the latter necessary for inference in TNRM. In time-varying topic modeling experiments, both models exhibit superior performance over related dependent models such as the hierarchical Dirichlet process and the spatial normalized Gamma process.

Factorial Multi-Task Learning : A Bayesian Nonparametric Approach**Abstract:** Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are un-related and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. To realize our framework, we introduce a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real datasets show the superiority of our model to other recent multi-task learning methods.

Nonparametric Mixture of Gaussian Processes with Constraints**Abstract:** Motivated by the need to identify new and clinically relevant categories of lung disease, we propose a novel clustering with constraints method using a Dirichlet process mixture of Gaussian processes in a variational Bayesian nonparametric framework. We claim that individuals should be grouped according to biological and/or genetic similarity regardless of their level of disease severity; therefore, we introduce a new way of looking at subtyping/clustering by recasting it in terms of discovering associations of individuals to disease trajectories (i.e., grouping individuals based on their similarity in response to environmental and/or disease causing variables). The nonparametric nature of our algorithm allows for learning the unknown number of meaningful trajectories. Additionally, we acknowledge the usefulness of expert guidance by providing for their input using must-link and cannot-link constraints. These constraints are encoded with Markov random fields. We also provide an efficient variational approach for performing inference on our model.

Nested Chinese Restaurant Franchise Processes: Applications to User Tracking and Document Modeling**Abstract:** Much natural data is hierarchical in nature. Moreover, this hierarchy is often shared between different instances. We introduce the nested Chinese Restaurant Franchise Process to obtain both hierarchical tree-structured representations for objects, akin to (but more general than) the nested Chinese Restaurant Process while sharing their structure akin to the Hierarchical Dirichlet Process. Moreover, by decoupling the structure generating part of the process from the components responsible for the observations, we are able to apply the same statistical approach to a variety of user generated data. In particular, we model the joint distribution of microblogs and locations for Twitter for users. This leads to a 40% reduction in location uncertainty relative to the best previously published results. Moreover, we model documents from the NIPS papers dataset, obtaining excellent perplexity relative to (hierarchical) Pachinko allocation and LDA.

Online Latent Dirichlet Allocation with Infinite Vocabulary**Abstract:** Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary. This is reasonable in batch settings but not reasonable for streaming and online settings. To address this lacuna, we extend LDA by drawing topics from a Dirichlet process whose base distribution is a distribution over all strings rather than from a finite Dirichlet. We develop inference using online variational inference and--to only consider a finite number of words for each topic--propose heuristics to dynamically order, expand, and contract the set of words we consider in our vocabulary. We show our model can successfully incorporate new words and that it performs better than topic models with finite vocabularies in evaluations of topic quality and classification performance.

An Autoregressive Approach to Nonparametric Hierarchical Dependent Modeling**Abstract:** We propose a conditional autoregression framework for a collection of random probability measures. Under this framework, we devise a conditional autoregressive Dirichlet process (DP) that we call one-parameter dependent DP ( DDP). The appealing properties of this specification are that it has two equivalent representations and its inference can be implemented in a conditional Polya urn scheme. Moreover, these two representations bear a resemblance to the Polya urn scheme and the stick-breaking representation in the conventional DP. We apply this DDP to Bayesian multivariate-response regression problems. An efficient Markov chain Monte Carlo algorithm is developed for Bayesian computation and prediction.

Parameter Estimation for LDA-Frames**Abstract:** LDA-frames is an unsupervised approach for identifying semantic frames from semantically unlabeled text corpora, and seems to be a useful competitor for manually created databases of selectional preferences. The most limiting property of the algorithm is such that the number of frames and roles must be predefined. In this paper we present a modification of the LDA-frames algorithm allowing the number of frames and roles to be determined automatically, based on the character and size of training data.

The Discrete Infinite Logistic Normal Distribution for Mixed-Membership Modeling**Abstract:** We present the discrete infinite logistic normal distribution (DILN, "Dylan"), a Bayesian non-parametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational Bayes algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model.

Online Variational Inference for the Hierarchical Dirichlet Process**Abstract:** The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or "topics") in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data--these algorithms are intractable for very large scale applications. We propose an on-line variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HDP, and lets us analyze much larger data sets. We illustrate the approach on two large collections of text, showing improved performance over on-line LDA, the finite counterpart to the HDP topic model.

Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data**Abstract:** We describe a nonparametric topic model for labeled data. The model uses a mixture of random measures (MRM) as a base distribution of the Dirichlet process (DP) of the HDP framework, so we call it the DPMRM. To model labeled data, we define a DP distributed random measure for each label, and the resulting model generates an unbounded number of topics for each label. We apply DP-MRM on single-labeled and multi-labeled corpora of documents and compare the performance on label prediction with MedLDA, LDA-SVM, and Labeled-LDA. We further enhance the model by incorporating ddCRP and modeling multi-labeled images for image segmentation and object labeling, comparing the performance with nCuts and rddCRP.

A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling**Abstract:** Driven by the multi-level structure of human intracranial electroencephalogram (iEEG) recordings of epileptic seizures, we introduce a new variant of a hierarchical Dirichlet Process--the multi-level clustering hierarchical Dirichlet Process (MLC-HDP)--that simultaneously clusters datasets on multiple levels. Our seizure dataset contains brain activity recorded in typically more than a hundred individual channels for each seizure of each patient. The MLC-HDP model clusters over channels-types, seizure-types, and patient-types simultaneously. We describe this model and its implementation in detail. We also present the results of a simulation study comparing the MLC-HDP to a similar model, the Nested Dirichlet Process and finally demonstrate the MLC-HDP's use in modeling seizures across multiple patients. We find the MLC-HDP's clustering to be comparable to independent human physician clusterings. To our knowledge, the MLCHDP model is the first in the epilepsy literature capable of clustering seizures within and between patients.

Coupling Nonparametric Mixtures via Latent Dirichlet Processes**Abstract:** Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling.

Variational Inference for Adaptor Grammars**Abstract:** Adaptor grammars extend probabilistic context-free grammars to define prior distributions over trees with "rich get richer" dynamics. Inference for adaptor grammars seeks to find parse trees for raw text. This paper describes a variational inference algorithm for adaptor grammars, providing an alternative to Markov chain Monte Carlo methods. To derive this method, we develop a stick-breaking representation of adaptor grammars, a representation that enables us to define adaptor grammars with recursion. We report experimental results on a word segmentation task, showing that variational inference performs comparably to MCMC. Further, we show a significant speed-up when parallelizing the algorithm. Finally, we report promising results for a new application for adaptor grammars, dependency grammar induction.