
Under review as a conference paper at ICLR 2016



Zachary C. Lipton ∗
Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093, USA
zlipton@cs.ucsd.edu

Sharad Vikram †
Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093, USA
svikram@cs.ucsd.edu

Julian McAuley ‡
Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093, USA
jmcauley@cs.ucsd.edu

ABSTRACT

We present a character-level recurrent neural network that generates relevant and coherent text given auxiliary information such as a sentiment or topic.¹ Using a simple input replication strategy, we preserve the signal of auxiliary input across wider sequence intervals than can feasibly be trained by back-propagation through time. Our main results center on a large corpus of 1.5 million beer reviews from BeerAdvocate. In generative mode, our network produces reviews on command, tailored to a star rating or item category. The generative model can also run in reverse, performing classification with surprising accuracy. Performance of the reverse model provides a straightforward way to determine what the generative model knows without relying too heavily on subjective analysis. Given a review, the model can accurately determine the corresponding rating and infer the beer’s category (IPA, Stout, etc.). We exploit this capability, tracking perceived sentiment and class membership as each character in a review is processed. Quantitative and qualitative empirical evaluations demonstrate that the model captures meaning and learns nonlinear dynamics in text, such as the effect of negation on sentiment, despite possessing no a priori notion of words. Because the model operates at the character level, it handles misspellings, slang, and large vocabularies without any machinery explicitly dedicated to the purpose.

1 INTRODUCTION

Our work is motivated by an interest in product recommendation. Currently, recommender systems assist users in navigating an unprecedented selection of items, personalizing services to a diverse set of users with distinct individual tastes. Typical approaches surface items that a customer is likely to purchase or rate highly, providing a basic set of primitives for building functioning internet applications. Our goal is to create richer user experiences, not only recommending products but generating descriptive text. For example, engaged users may wish to know what precisely their impression of an item is expected to be, not simply whether the item will warrant a thumbs up or thumbs down. Consumer reviews can address this issue to some extent, but large volumes of reviews are difficult to sift through, especially if a user is interested in some niche aspect. Our fundamental goal is to resolve this issue by building systems that can both generate contextually appropriate descriptions and infer items from



∗ Author website: http://zacklipton.com
† Author website: http://www.sharadvikram.com
‡ Author website: http://cseweb.ucsd.edu/~jmcauley/
¹ Live web demonstration of rating and category-based review generation (http://deepx.ucsd.edu/beermind)

Figure 1: Our generative model runs in reverse, inferring ratings and categories given reviews without any a priori notion of words.

Character-level Recurrent Neural Networks (RNNs) have a remarkable ability to generate coherent text (Sutskever et al., 2011), appearing to hallucinate passages that plausibly resemble a training corpus. In contrast to word-level models, they do not suffer from computational costs that scale with the size of the input or output vocabularies. This property is alluring, as product reviews draw upon an enormous vocabulary. Our work focuses on reviews scraped from Beer Advocate (McAuley and Leskovec, 2013). This corpus contains over 60,000 distinct product names alone, in addition to standard vocabulary, slang, jargon, punctuation, and misspellings.

Character-level LSTMs powerfully demonstrate the ability of RNNs to model sequences on multiple time scales simultaneously, i.e., they learn to form words, to form sentences, to generate paragraphs of appropriate length, etc. To our knowledge, all previous character-level generative models are unsupervised. However, our goal is to generate character-level text in a supervised fashion, conditioning upon auxiliary input such as an item’s rating or category.² Such conditioning of sequential output has been performed successfully with word-level models, for tasks including machine translation (Sutskever et al., 2014), image captioning (Vinyals et al., 2015; Karpathy and Fei-Fei, 2014; Mao et al., 2014), and even video captioning (Venugopalan et al., 2014). However, despite the aforementioned virtues of character-level models, no prior work, to our knowledge, has successfully trained them in such a supervised fashion.

Most supervised approaches to word-level generative text models follow the encoder-decoder approach popularized by Sutskever et al. (2014). Some auxiliary input, which might be a sentence or an image, is encoded by an encoder model as a fixed-length vector. This vector becomes the initial input to a decoder model, which then outputs at each sequence step a probability distribution predicting the next word. During training, weights are updated to give high likelihood to the sequences encountered in the training data. When generating output, words are sampled from each predicted distribution and passed as input at the subsequent sequence step. This approach successfully produces coherent and relevant sentences, but is generally limited to short outputs (typically fewer than 10 words), as the model gradually ‘forgets’ the auxiliary input.

However, to model longer passages of text (such as reviews), and to do so at the character level, we must produce much longer sequences than seem practically trainable with an encoder-decoder approach. To overcome these challenges, we present an alternative modeling strategy. At each sequence step $t$, we concatenate the auxiliary input vector $x_{\text{aux}}$ with the character representation $x_{\text{char}}^{(t)}$, using the resulting vector $x^{(t)}$ to train an otherwise standard generative RNN model. It might seem redundant to replicate $x_{\text{aux}}$ at each sequence step, but by providing it, we eliminate pressure on the model to memorize it. Instead, all computation can focus on modeling the text and its interaction with the auxiliary input.

In this paper, we implement the concatenated input model, demonstrating its efficacy at both review generation and traditional supervised learning tasks. In generative mode, our model produces convincing reviews, tailored to a star rating and category. We present a live web demonstration of this capability (http://deepx.ucsd.edu/beermind). This generative model can also run in reverse, performing classification with surprising accuracy (Figure 1). The purpose of this model is to generate text, but we find that classification accuracy of the reverse model provides an objective way to assess what the model has learned. An empirical evaluation shows that our model can accurately classify previously unseen reviews as positive or negative and determine which of 5 beer categories is being described, despite operating at the character level and not being optimized directly to minimize classification error. Our exploratory analysis also reveals that the model implicitly learns a large vocabulary and can effectively model nonlinear dynamics, like the effect of negation. Plotting the inferred rating as each character is encountered for many sentences (Figure 1) shows that the model infers ratings quickly and anticipates words after reading particularly informative characters.

² We use auxiliary input to differentiate the “context” input from the character representation passed in at each sequence step. By supervised, we mean the output sequence depends upon some auxiliary input.


We focus on data scraped from Beer Advocate as originally collected and described by McAuley and Leskovec (2013). Beer Advocate is a large online review community boasting 1,586,614 reviews of 66,051 distinct items composed by 33,387 users. Each review is accompanied by a number of numerical ratings, corresponding to “appearance”, “aroma”, “palate”, “taste”, and also the user’s “overall” impression. The reviews are also annotated with the item’s category. For our experiments on ratings-based generation and classification, we select 250,000 reviews for training, focusing on the most active users and popular items. For our experiments focusing on generating reviews conditioned on item category, we select a subset of 150,000 reviews, 30,000 each from 5 of the top categories, namely “American IPA”, “Russian Imperial Stout”, “American Porter”, “Fruit/Vegetable Beer”, and “American Adjunct Lager”. From both datasets, we hold out 10% of reviews for testing.



Figure 2: (a) Standard generative RNN; (b) encoder-decoder RNN; (c) concatenated input RNN.


Before introducing our contributions, we review the generative RNN model of Sutskever et al. (2011; 2014) on which we build. A generative RNN is trained to predict the next token in a sequence, i.e. $\hat{y}^{(t)} = x^{(t+1)}$, given all inputs to that point $(x^{(1)}, \ldots, x^{(t)})$. Thus input and output strings are equivalent but for a one token shift (Figure 2a). The output layer is fully connected with softmax activation, ensuring that outputs specify a distribution. Cross entropy is the loss function during training.
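As a rough illustration of the one-token shift and the cross-entropy loss described above, the following sketch builds input/target pairs from a string and scores a single prediction. The helper names (`make_training_pairs`, `softmax`, `cross_entropy`) are hypothetical and chosen for this example; no recurrent network is included.

```python
import math

def make_training_pairs(text):
    """Return (inputs, targets): targets are the inputs shifted by one character."""
    return list(text[:-1]), list(text[1:])

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct next token."""
    return -math.log(probs[target_index])

inputs, targets = make_training_pairs("hops")
probs = softmax([1.0, 2.0, 3.0])       # made-up output scores over 3 tokens
loss = cross_entropy(probs, 2)         # suppose token 2 is the true next token
```

The training loss for a sequence would then be the sum (or mean) of these per-step terms.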

Once trained, the model is run in generative mode by sampling stochastically from the distribution output at each sequence step, given some starting token and state. Passing the sampled output as the subsequent input, we generate another output conditioned on the first prediction, and can continue in this manner to produce arbitrarily long sequences. Sampling can be done directly according to softmax outputs, but it is also common to sharpen the distribution by setting a temperature ≤ 1, analogous to the so-named parameter in a Boltzmann distribution. Applied to text, generative models trained in this fashion produce surprisingly coherent passages that appear to reflect the characteristics of the training corpus. They can also be used to continue passages given some starting tokens.
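The sampling loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: `step_fn` stands in for one forward step of a trained network and is a hypothetical interface, as is the toy distribution used to exercise it. Temperature is applied by raising each probability to the power 1/T and renormalizing, which assumes all probabilities are strictly positive (as softmax outputs are).

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a distribution as p_i ** (1 / T), renormalized; T < 1 sharpens it."""
    logits = [math.log(p) / temperature for p in probs]  # assumes every p > 0
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Draw one token index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(step_fn, start_token, state, length, temperature, rng):
    """Feed each sampled token back in as the next input."""
    token, output = start_token, []
    for _ in range(length):
        probs, state = step_fn(token, state)  # hypothetical model step
        token = sample_index(apply_temperature(probs, temperature), rng)
        output.append(token)
    return output

def toy_step(token, state):
    """Stand-in for a trained network: always returns the same distribution."""
    return [0.7, 0.2, 0.1], state

sharpened = apply_temperature([0.7, 0.2, 0.1], 0.5)
sampled = generate(toy_step, 0, None, 5, 0.5, random.Random(0))
```

With a temperature of 1 the distribution is left unchanged, and as the temperature approaches 0 sampling approaches always picking the most probable token.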


Our goal is to generate text in a supervised fashion, conditioned on an auxiliary input $x_{\text{aux}}$. This has been done at the word-level with encoder-decoder models (Figure 2b), in which the auxiliary input is encoded and passed as the initial state to a decoder, which then must preserve this input signal across many sequence steps (Sutskever et al., 2014; Karpathy and Fei-Fei, 2014). Such models have successfully produced (short) image captions, but seem impractical for generating full reviews at the character level because signal from $x_{\text{aux}}$ must survive for hundreds of sequence steps.

We take inspiration from an analogy to human text generation. Consider that given a topic and told to speak at length, a human might be apt to meander and ramble. But given a subject to stare at, it is far easier to remain focused. The value of re-iterating high-level material is borne out in one study, Surber and Schroeder (2007), which showed that repetitive subject headings in textbooks resulted in faster learning, less rereading and more accurate answers to high-level questions.

Thus we propose a simple architecture in which input $x_{\text{aux}}$ is concatenated with the character representation $x_{\text{char}}^{(t)}$. Given this new input $x^{(t)} = [x_{\text{char}}^{(t)}; x_{\text{aux}}]$ we can train the model precisely as with the standard generative RNN (Figure 2c). At train time, $x_{\text{aux}}$ is a feature of the training set. At predict time, we fix some $x_{\text{aux}}$, concatenating it with each character sampled from $\hat{y}^{(t)}$. One might reasonably note that this replicated input information is redundant. However, since it is fixed over the course of the review, we see no reason to require the model to transmit this signal across hundreds of time steps. By replicating $x_{\text{aux}}$ at each input, we free the model to focus on learning the complex interaction between the auxiliary input and language, rather than memorizing the input.
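The input construction can be sketched in a few lines. This is an illustration under assumptions: the one-hot character encoding, the tiny alphabet, and the one-hot encoding of a category as the auxiliary vector are choices made for the example, not details confirmed by the text.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def concat_inputs(text, alphabet, x_aux):
    """Build [x_char; x_aux] for every character, replicating x_aux at each step."""
    char_to_index = {c: i for i, c in enumerate(alphabet)}
    return [one_hot(char_to_index[c], len(alphabet)) + list(x_aux) for c in text]

alphabet = "abeir "          # toy alphabet, assumed for illustration
x_aux = one_hot(0, 5)        # assumed encoding: first of 5 categories
sequence = concat_inputs("a beer", alphabet, x_aux)
```

Each element of `sequence` has length `len(alphabet) + len(x_aux)`, and the trailing entries are identical at every step, which is the replication the text describes.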


Models with even modestly sized auxiliary input representations are considerably harder to train than a typical unsupervised character model. To overcome this problem, we first train a character model to convergence. Then we transplant these weights into a concatenated input model, initializing the extra weights (between the input layer and the first hidden layer) to zero. Zero initialization is not problematic here because symmetry in the hidden layers is already broken. With the extra weights at zero, the transplanted model initially computes the same outputs as the character model, so training starts from the character model's loss and need only improve on it, saving (days of) repeated training. This scheme bears some resemblance to the pre-training common in the computer vision community (Yosinski et al., 2014). Here, instead of new output weights, we train new input weights.
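The effect of zero-initializing the new input weights can be checked on a toy layer. The sketch below is a simplification: it uses a single tanh layer with no recurrent connections in place of the LSTM layers, and all names and sizes are hypothetical. It shows only that appending zero columns for the auxiliary input leaves the layer's output unchanged at initialization.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def hidden(W, x, b):
    """Toy first hidden layer: tanh(W x + b)."""
    return [math.tanh(z + bi) for z, bi in zip(matvec(W, x), b)]

def transplant(W_char, aux_size):
    """Copy pretrained input weights and append aux_size zero columns."""
    return [row + [0.0] * aux_size for row in W_char]

rng = random.Random(0)
n_hidden, n_char, n_aux = 4, 6, 5   # toy sizes, assumed for illustration
W_char = [[rng.uniform(-1, 1) for _ in range(n_char)] for _ in range(n_hidden)]
b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W_concat = transplant(W_char, n_aux)

x_char = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
x_aux = [0.0, 0.0, 1.0, 0.0, 0.0]
h_before = hidden(W_char, x_char, b)            # character-only model
h_after = hidden(W_concat, x_char + x_aux, b)   # concatenated input model
```

Because the pretrained weights are not all equal, the gradients with respect to the new zero columns differ across hidden units, so the new weights do not stay tied together during training.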


Many common document classification models, like tf-idf logistic regression, maximize the likelihood of the training labels given the text. Given our generative model, we can then produce a predictor by reversing the order of inference, that is, by maximizing the likelihood of the text given a classification. The relationship between these two tasks ($P(x_{\text{aux}} \mid \text{Review})$ and $P(\text{Review} \mid x_{\text{aux}})$) follows from Bayes’ rule. That is, our model predicts the conditional probability $P(\text{Review} \mid x_{\text{aux}})$ of an entire review given some $x_{\text{aux}}$ (such as a star rating). The normalizing term can be disregarded in determining the most probable rating, and when the classes are balanced, as they are in our test cases, the prior also vanishes from the decision rule, leaving $P(x_{\text{aux}} \mid \text{Review}) \propto P(\text{Review} \mid x_{\text{aux}})$.
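The decision rule above amounts to scoring the review under each candidate auxiliary input and taking the best. The following sketch assumes a function `log_likelihood(review, label)` that returns the model's log-probability of the review; here it is replaced by a toy per-class character model with made-up probabilities, since the trained network is not available. All names are hypothetical.

```python
import math

def classify(review, candidates, log_likelihood, log_prior=None):
    """Return the candidate maximizing log P(review | c) + log P(c).

    With log_prior=None the classes are treated as balanced, so the prior
    drops out of the decision rule.
    """
    def score(c):
        prior = 0.0 if log_prior is None else log_prior[c]
        return log_likelihood(review, c) + prior
    return max(candidates, key=score)

# Toy stand-in for the generative model: independent characters per class.
TOY_CHAR_PROBS = {
    "positive": {"g": 0.5, "b": 0.1, "o": 0.2, "d": 0.1, "a": 0.1},
    "negative": {"g": 0.1, "b": 0.5, "o": 0.1, "d": 0.1, "a": 0.2},
}

def toy_log_likelihood(review, label):
    """Sum of per-character log-probabilities under the given class."""
    return sum(math.log(TOY_CHAR_PROBS[label][ch]) for ch in review)

labels = ["positive", "negative"]
pred_good = classify("good", labels, toy_log_likelihood)
pred_bad = classify("bad", labels, toy_log_likelihood)
```

Working in log space avoids underflow, which matters for real reviews, where the likelihood is a product over hundreds of characters.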

