Under review as a conference paper at ICLR 2016

CAPTURING MEANING IN PRODUCT REVIEWS WITH CHARACTER-LEVEL GENERATIVE TEXT MODELS
Zachary C. Lipton ...

-- [ Page 2 ] --

All experiments are executed with a custom recurrent neural network library written in Python, using Theano (Bergstra et al.) for GPU acceleration. Our networks use 2 hidden layers with 1024 nodes per layer. During training, examples are processed in mini-batches and we update weights with RMSprop (Tieleman and Hinton, 2012). To assemble batches, we concatenate all reviews in the training set, delimiting them with (STR) and (EOS) tokens. We split this string into mini-batches of size 256 and again split each mini-batch into segments of sequence length 200. LSTM state is preserved across batches during training. To combat exploding gradients, we clip the elements of each gradient at ±5. We found that training the concatenated-input model converged faster if we first trained an unsupervised character-level generative RNN to convergence and then transplanted its weights to initialize the concatenated-input RNN. We implement two nets in this fashion, one using the star rating scaled to [-1, 1] as x_aux, and a second using a one-hot encoding of 5 beer categories as x_aux.
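To make the batching scheme and update rule concrete, the following Python/NumPy code is a minimal sketch of one plausible reading of the procedure above: the concatenated character stream is cut into 256 parallel rows, each row is sliced into length-200 segments so that LSTM state can be carried from one batch to the next, and gradients are clipped element-wise before an RMSprop step. The function names, the exact slicing, and the learning-rate and decay constants are illustrative assumptions, not the authors' code.

    import numpy as np

    BATCH_SIZE, SEQ_LEN = 256, 200

    def assemble_batches(reviews, char_to_id):
        # Concatenate all reviews into one long id stream, delimited by
        # the (STR)/(EOS) markers described above.
        stream = []
        for review in reviews:
            stream.append(char_to_id['(STR)'])
            stream.extend(char_to_id[c] for c in review)
            stream.append(char_to_id['(EOS)'])
        stream = np.asarray(stream, dtype=np.int32)

        # Cut the stream into BATCH_SIZE parallel rows; segment t+1 of each
        # row continues segment t, so LSTM state carries across batches.
        n_cols = len(stream) // BATCH_SIZE
        rows = stream[:BATCH_SIZE * n_cols].reshape(BATCH_SIZE, n_cols)
        for start in range(0, n_cols - SEQ_LEN, SEQ_LEN):
            x = rows[:, start:start + SEQ_LEN]           # inputs
            y = rows[:, start + 1:start + SEQ_LEN + 1]   # next-char targets
            yield x, y

    def rmsprop_update(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
        # Element-wise gradient clipping at +/-5 (as in the paper), then a
        # standard RMSprop step (Tieleman and Hinton, 2012).
        grad = np.clip(grad, -5.0, 5.0)
        cache = decay * cache + (1.0 - decay) * grad ** 2
        w = w - lr * grad / (np.sqrt(cache) + eps)
        return w, cache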


Figure 3: (top) Probability of each category and (bottom) most likely star rating as each letter is encountered. The RNN learns that Budweiser is a lager and that stouts and porters are heavy. It learns to tilt positive by the ‘c’ in ‘excellent’ and that the ‘f’ in ‘awful’ reveals negative sentiment.

4.1 GENERATING TEXT

Running the concatenated input RNN in generative mode and conditioning upon a 5-star rating, we produce a decidedly positive review:

STRPoured from a 12oz bottle into a pint glass. A: Pours a deep brown color with a thin tan head. The aroma is of coffee, chocolate, and coffee. The taste is of roasted malts, coffee, chocolate, and coffee. The finish is slightly sweet and smooth with a light bitterness and a light bitterness that lingers on the palate. The finish is slightly bitter and dry. Mouthfeel is medium bodied with a good amount of carbonation. The alcohol is well hidden. Drinkability is good. I could drink this all day long. I would love to try this one again and again. EOS

Conditioning on the “Fruit / Vegetable Beer” category, the model generates a commensurately botanical review; interestingly, the user “Mikeygrootia” does not exist in the dataset.

STRThanks to Mikeygrootia for the opportunity to try this one. A: Poured a nice deep copper with a one finger head that disappears quickly. Some lacing. S: A very strong smelling beer. Some corn and grain, some apple and lemon peel. Taste: A very sweet berry flavor with a little bit of a spice to it. I am not sure what to expect from this beer. This stuff is a good summer beer. I could drink this all day long. Not a bad one for me to recommend this beer.EOS

For more examples of generated text, please see Appendix A and Appendix B.
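Generative mode itself admits a short sketch: at every step, the auxiliary vector x_aux (the scaled star rating, or a one-hot category) is concatenated to the one-hot encoding of the previous character, and the next character is sampled from the softmax output. In the Python sketch below, rnn_step stands in for the trained two-layer LSTM; it and the vocabulary helpers char_to_id/id_to_char are hypothetical names, not the authors' API.

    import numpy as np

    def generate(rnn_step, init_state, x_aux, char_to_id, id_to_char,
                 max_len=1000, rng=None):
        rng = rng or np.random.default_rng()
        state = init_state
        char_id = char_to_id['(STR)']
        out = []
        for _ in range(max_len):
            one_hot = np.zeros(len(char_to_id))
            one_hot[char_id] = 1.0
            # Concatenated input: [character one-hot ; auxiliary vector].
            probs, state = rnn_step(np.concatenate([one_hot, x_aux]), state)
            char_id = rng.choice(len(probs), p=probs)
            if id_to_char[char_id] == '(EOS)':
                break
            out.append(id_to_char[char_id])
        return ''.join(out)

    # Conditioning on a 5-star rating scaled to [-1, 1]:
    # review = generate(rnn_step, init_state, np.array([1.0]),
    #                   char_to_id, id_to_char)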

4.2 PREDICTING SENTIMENT AND CATEGORY ONE CHARACTER AT A TIME

In addition to running the model to generate output, we take example sentences from unseen reviews and plot the rating which gives the sentence maximum likelihood as each character is encountered (Figure 3). We can also plot the network’s perception of item category, using each category’s prior and the review’s likelihood to infer posterior probabilities after reading each character. These visualizations demonstrate that by the “d” in “Budweiser”, our model recognizes a “lager”. Similarly, reading the “f” in “awful”, the network seems to comprehend that the beer is “awful” and not “awesome” (Figure 3). See appendices C and D for more examples.

Figure 4: Log likelihood of the review for many settings of the rating: (a) “Mindblowing experience.” (b) “Tastes watered down.” (c) “Not the best, not worst.” This tends to be smooth and monotonic for unambiguous sentences. When the sentiment is less extreme, the peak is centered.

To verify that the argmax over many settings of the rating is reasonable, we plot the log likelihood after the final character is processed, for a range of fine-grained values of the rating (1.0, 1.1, etc.). These plots show that the log likelihood tends to be smooth and monotonic for sentences with unambiguous sentiment, e.g., “Mindblowing experience”, while it is smooth with a peak in the middle when sentiment is ambiguous, e.g., “not the best, not the worst” (Figure 4). We also find that the model captures the nonlinear dynamics of negation and can handle simple spelling mistakes, as seen in Appendices E and D.
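This scoring procedure reduces to a simple loop: hold the review fixed, sum the log probabilities of its characters under each candidate setting of x_aux, take the argmax over ratings, and, for categories, combine the likelihoods with the category priors via Bayes’ rule. A minimal Python sketch follows, with rnn_step and the encoding as hypothetical stand-ins for the trained model.

    import numpy as np

    def log_likelihood(rnn_step, init_state, review_ids, x_aux, vocab_size):
        # Sum log P(next char | prefix, x_aux) over the review.
        state, total = init_state, 0.0
        for prev_id, next_id in zip(review_ids[:-1], review_ids[1:]):
            one_hot = np.zeros(vocab_size)
            one_hot[prev_id] = 1.0
            probs, state = rnn_step(np.concatenate([one_hot, x_aux]), state)
            total += np.log(probs[next_id])
        return total

    def best_rating(rnn_step, init_state, review_ids, vocab_size):
        # Sweep fine-grained ratings 1.0, 1.1, ..., 5.0 (scaled to [-1, 1])
        # and keep the maximizer, as in Figure 4.
        ratings = np.arange(1.0, 5.05, 0.1)
        scores = [log_likelihood(rnn_step, init_state, review_ids,
                                 np.array([(r - 3.0) / 2.0]), vocab_size)
                  for r in ratings]
        return ratings[int(np.argmax(scores))]

    def category_posterior(rnn_step, init_state, review_ids, log_priors,
                           vocab_size):
        # Posterior over categories: prior times likelihood, normalized.
        n_cats = len(log_priors)
        log_post = np.array([log_priors[k]
                             + log_likelihood(rnn_step, init_state,
                                              review_ids, np.eye(n_cats)[k],
                                              vocab_size)
                             for k in range(n_cats)])
        log_post -= log_post.max()   # subtract max for numerical stability
        post = np.exp(log_post)
        return post / post.sum()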

4.3 CLASSIFICATION RESULTS

While our motivation is to produce a character-level generative model, running it in reverse as a classifier proved an effective way to objectively gauge what the model knows. To investigate this capability more thoroughly, we compared it to a word-level tf-idf n-gram multinomial logistic regression (LR) model using the top 10,000 n-grams. Our model achieves a classification accuracy of 89.9% while LR achieves 93.4% (Table 1). Both models make the majority of their mistakes by confusing Russian Imperial Stouts with American Porters, which is not surprising because a stout is a sub-type of porter. If we collapse these two into one category, the RNN achieves 94.7% accuracy while LR achieves 96.5%. While the reverse model does not yet eclipse a state-of-the-art classifier, it was trained at the character level and was not directly optimized to minimize classification error, nor tuned with attention to generalization error. In this light, the results appear to warrant a deeper exploration of this capability. Please see Appendix F for detailed classification results. We also ran the model in reverse to classify reviews as positive (≥ 4.0 stars) or negative (≤ 2.0 stars), achieving an AUC of 0.88 on a balanced test set of 1000 examples.
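For reference, a word-level tf-idf n-gram logistic regression baseline of this kind can be assembled in a few lines with scikit-learn. This is a sketch only: the paper specifies the 10,000-feature cap, while the n-gram range and other settings below are illustrative assumptions.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Keep the 10,000 most frequent word n-grams (unigrams and bigrams,
    # as an assumption) weighted by tf-idf, then fit a multinomial
    # logistic regression over the beer categories.
    baseline = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), max_features=10000),
        LogisticRegression(max_iter=1000),
    )

    # train_texts / train_labels: review strings and category labels.
    # baseline.fit(train_texts, train_labels)
    # print(baseline.score(test_texts, test_labels))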





Table 1: Category classification accuracy of the concatenated-input RNN run in reverse and the tf-idf n-gram logistic regression (LR) baseline, for the original 5 categories and with stouts and porters collapsed into one category.

Model   5 categories   Stout/Porter collapsed
RNN     89.9%          94.7%
LR      93.4%          96.5%

5 RELATED WORK

The prospect of capturing meaning in character-level text has long captivated neural network researchers. In the seminal work “Finding Structure in Time”, Elman (1990) speculated, “one can ask whether the notion ‘word’ (or something which maps on to this concept) could emerge as a consequence of learning the sequential structure of letter sequences that form words and sentences (but in which word boundaries are not marked).” In this work, an ‘Elman RNN’ was trained with 5 input nodes, 5 output nodes, and a single hidden layer of 20 nodes, each of which had a corresponding context unit, to predict the next character in a sequence. At each step, the network received a binary encoding (not one-hot) of a character and tried to predict the next character’s binary encoding. Elman plots the error of the net character by character, showing that it is typically high at the onset of words but decreases as it becomes clear what each word is. While these nets do not possess the size or capabilities of large modern LSTM networks trained on GPUs, this work lays the foundation for much of our research. Subsequently, Sutskever et al. (2011) introduced the model of text generation on which we build. In that paper, the authors generate text resembling Wikipedia articles and New York Times articles. They sanity-check the model by showing that it can perform a debagging task, in which it unscrambles bag-of-words representations of sentences by determining which unscrambling has the highest likelihood. Also relevant to our work is Zhang and LeCun (2015), which trains a strictly discriminative model of text at the character level using convolutional neural networks (LeCun et al., 1989; 1998). Demonstrating success on both English- and Chinese-language datasets, their models achieve high accuracy on a number of classification tasks.

Related work on generating sequences in a supervised fashion generally follows the pattern of Sutskever et al. (2014), which uses a word-level encoder-decoder RNN to map sequences onto sequences. Their system for machine translation demonstrated that a recurrent neural network can compete with state-of-the-art machine translation systems absent any hard-coded notion of language (beyond that of words). Several papers followed up on this idea, extending it to image captioning by swapping the encoder RNN for a convolutional neural network (Mao et al., 2014; Vinyals et al., 2015; Karpathy and Fei-Fei, 2014).

5.1 KEY DIFFERENCES AND CONTRIBUTIONS

RNNs have previously been used to generate text at the character level, and to generate text in a supervised fashion at the word level. However, to our knowledge, this is the first work to demonstrate that an RNN can generate relevant text at the character level conditioned on auxiliary input. Further, while Sutskever et al. (2011) demonstrates the use of a character-level RNN as a scoring mechanism, to our knowledge this is the first paper to use such a scoring mechanism to infer labels, simultaneously learning to generate text and to perform supervised tasks like multiclass classification with high accuracy. Our work is not the first to demonstrate a character-level classifier, as Zhang and LeCun (2015) offered such an approach. However, while their model is strictly discriminative, our model’s main purpose is to generate text, a capability not present in their approach. Further, while we present a preliminary exploration of ways that our generative model can be used as a classifier, we do not train it directly to minimize classification error or with attention to generalization error; rather, we use the classifier interpretation to validate that the generative model is in fact modeling the auxiliary information meaningfully.

6 CONCLUSION

In this work, we demonstrate the first character-level recurrent neural network to generate relevant text conditioned on auxiliary input. To our knowledge, this is also the first work to generate coherent product reviews conditioned upon data such as rating and item category. Our quantitative and qualitative analysis shows that our model can accurately perform sentiment analysis and model item category. While this capability is intriguing, much work remains to investigate whether such an approach can be competitive with state-of-the-art word-level classifiers. The model learns the nonlinear dynamics of negation, and appears to respond intelligently to a wide vocabulary despite lacking any a priori notion of words.

We believe that this is only the beginning of this line of research. Next steps include extending our work to the more complex domain of individual items and users. Given users with extensive historical feedback in a review community and a set of frequently reviewed items, we would like to take a previously unseen (user, item) pair and generate a review that plausibly reflects the user’s tastes and writing style as well as the item’s attributes. We also imagine an architecture in which our concatenated input network could be paired with a neural network encoder, to leverage the strengths of both the encoder-decoder approach and our approach. Details of this proposed model are included in Appendix G.


7 ACKNOWLEDGEMENTS

Zachary C. Lipton’s research is funded by the UCSD Division of Biomedical Informatics, via NIH/NLM training grant T15LM011271. Sharad Vikram’s research is supported in part by NSF grant CNS-1446912. We would like to thank Professor Charles Elkan for his mentorship. We gratefully acknowledge the NVIDIA Corporation, whose hardware donation program furnished us with a Tesla K40 GPU, making our research possible.

REFERENCES

Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.

James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: A CPU and GPU math compiler in Python.

Jeffrey L. Elman. Finding structure in time. Cognitive science, 14(2):179–211, 1990.

Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471, 2000.

Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

Andrej Karpathy and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. arXiv preprint arXiv:1412.2306, 2014.

Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Zachary C. Lipton, John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015.

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv preprint arXiv:1412.6632, 2014.

Julian John McAuley and Jure Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web, pages 897–908. International World Wide Web Conferences Steering Committee, 2013.

John R. Surber and Mark Schroeder. Effect of prior domain knowledge and headings on processing of informative text. Contemporary Educational Psychology, 32(3):485–498, July 2007. ISSN 0361-476X. doi: 10.1016/j.cedpsych.2006.08.002. URL http://www.sciencedirect.com/science/article/pii/S0361476X06000348.

Ilya Sutskever, James Martens, and Geoffrey E. Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024, 2011.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks.





Similar works:

«Pakistan Sugar Journal January-March 2010 Contents Vol. XXV, No.01 Editorial Board 2 Morphological responses of autumn planted M. Asghar Qureshi Chairman sugarcane to planting geometry and nutrient Dr. Shahid Afghan Member management on different soils under arid conditions Dr. Muhammad Zubair Member Abdul Gaffar Suggu, Ejaz Ahmed, Haji Himayatullah, Dr. Shahid Mahboob Rana Member M. Ayaz, Haji Khalil Ahmed, Muhammad Aslam Published 10 Integrated control strategies for sugarcane disease Under...»

«About this eBook This eBook is best read in two-page view so that the text and the accompanying glosses are displayed alongside one another. If you're reading in Adobe Digital Editions, you can switch to two-page view by selecting Fit Double Pages in the Reading menu. Outlined boxes that are found in the commentary notes on the left side of the screen, as well as elsewhere in the book, indicate a link to the place referenced; clicking on the place referenced (whether it be the appropriate...»

«Successful and Sustainable Neighbourhoods: Sheffield City Council, Design Quality and the Housing Market Renewal Programme 1 Overview Nationally, the Housing Market Renewal (HMR) initiative is being used to help transform our most deprived communities by creating more successful and sustainable neighbourhoods. The aim of this publication is to illustrate how Sheffield, and the South Yorkshire Pathfinder, has put the achievement of design quality at the heart of their approach. After a brief...»

«EXECUTIVE SUMMARY Island of Impunity PUERTO RICO’S OUTLAW POLICE FORCE JUNE 2012 American Civil Liberties Union 125 Broad Street, 18th Floor New York, NY 10004 www.aclu.org I. Executive Summary The Puerto Rico Police Department (PRPD), charged with policing the Commonwealth of Puerto Rico, is the second-largest police department in the United States, second only to the New York City Police Department. The PRPD’s over 17,000 police officers police the island’s approximately 3.7 million...»

«Extracted from the Geological Conservation Review Volume 28: Coastal Geomorphology of Great Britain You can view an introduction to this volume Chapter 3: Hard-rock cliffs – GCR site reports at http://www.jncc.gov.uk/page-2731 Site: DUNBAR (GCR ID: 2301) © JNCC 1980–2007 DUNBAR J.D. Hansom OS Grid Reference: NT661778 Introduction The GCR site of Dunbar contains an excellent range of rocky coastal landforms within a 2 km stretch of coastline. Of exceptional note is a series of emerged and...»

«PRAYERS TO STRENGTHEN YOUR INNER MAN by Mike Bickle IHOP.org MikeBickle.org PRAYERS TO STRENGTHEN YOUR INNER MAN By Mike Bickle Published by Forerunner Publishing International House of Prayer 3535 E. Red Bridge Road Kansas City, MO 64137 IHOP.org MikeBickle.org Copyright © 2009 by Forerunner Publishing Our copyright is the right to copy. ISBN: 978-0-9823262-1-3 Unless otherwise noted, all Scripture quotations are from the New King James Version of the Bible. Copyright © 1979, 1980, 1982 by...»

«Economic Development for the 21st Century Collection Editors: Christopher Houser Alina Slavik Economic Development for the 21st Century Collection Editors: Christopher Houser Alina Slavik Authors: Christopher Houser Alina Slavik Ryan Stickney Online: http://legacy.cnx.org/content/col11747/1.5/ OpenStax-CNX This selection and arrangement of content as a collection is copyrighted by Christopher Houser, Alina Slavik. It is licensed under the Creative Commons Attribution License 4.0...»

«REVISED INTERIM WRITTEN DESCRIPTION GUIDELINES TRAINING MATERIALS Contents Synopsis.. 4 Decision Trees Written Description Amended or New Claims or Claims Asserting the Benefit of an Earlier Filing Date. 6 Original Claims.. 7 Example 1: Amended claims. 10 Example 2: 35 USC 120 Priority. 13 Example 2A: Essential element missing from original claim. 15 Example 2B: A preferred element missing from original claim. 17 Example 3: New claims.. 19 Example 4: Original claim. 22 Example 5: Flow...»

«P R E LI M I NAR I E S AN OVERVIEW OF THE EAST ASIA SUMMIT RAPID DISASTER RESPONSE TOOLKIT PRELIMINARIES The East Asia Summit Rapid Disaster Response Toolkit has been prepared by Emergency Management Australia and BNPB, Indonesia, in collaboration with relevant agencies from all 18 East Asia Summit participating countries and in consultation with the ASEAN Committee on Disaster Management (ACDM). Parts of the text and templates contained in this publication are quoted or reprinted from the...»

«Edition 30 September 2013 WRNM News! The village newsletter of Wykeham, Ruston and North Moor LOCAL NEWS FOR LOCAL PEOPLE!! Ruston Highland Cattle September is here heralding the end of a long and beautiful Summer celebrated in the Villages with the Village Show and the first flower festival for many years. You will find reports on both in this edition together with lots of cricket news and more about those beautiful cows on the cover. This is the 30th edition of WRNM News! Hard to believe...»

«Philosophica 85 (2012) pp. 35-66 GIVING RESPONSIBILITY A GUILT-TRIP: VIRTUE, TRAGEDY, AND PRIVILEGE Kevin M. DeLapp ABSTRACT In this paper, I argue for the ethical importance of the retributive emotion of ‗tragic-guilt,‘ namely, the feeling of self-recrimination for doing harm even if it could not be prevented. Drawing on empirical evidence concerning the phenomenology of such guilt, as well as thought-experiments concerning moral responsibility for inherited privilege, I distinguish...»

«Vision In and Out of Vehicles: Integrated Driver and Road Scene Monitoring Nicholas Apostoloff and Alexander Zelinsky The Australian National University, Robotic Systems Laboratory, Research School of Information Sciences and Engineering, Canberra ACT 0200, Australia Abstract. 1.17 million people die in road crashes around the world each year. It is estimated that up to 30% of these fatalities are caused by fatigue and inattention. This paper presents preliminary results of an Intelligent...»





 