Sentiment Analysis: An Overview
Comprehensive Exam Paper

Yelena Mejova
Computer Science Department, University of Iowa
yelena-mejova@uiowa.edu

November 16, 2009

Abstract

As a response to the growing availability of informal, opinionated texts like blog posts and product review websites, a field of Sentiment Analysis has sprung up in the past decade to address the question "What do people feel about a certain topic?" Bringing together researchers in computer science, computational linguistics, data mining, psychology, and even sociology, Sentiment Analysis expands the traditional fact-based text analysis to enable opinion-oriented information systems. This paper is an overview of Sentiment Analysis, its basic tasks, and the latest techniques developed to address the challenges of working with emotionally charged text.

Contents

1 Introduction
2 Definitions
  2.1 What is Sentiment?
  2.2 Sentiment Analysis
  2.3 Examples of Sentiment Research
3 Goals of Sentiment Analysis
4 Methodologies
  4.1 Classification
  4.2 Identifying the semantic orientation of words
  4.3 Identifying semantic orientation of sentences and phrases
  4.4 Identifying the semantic orientation of documents
  4.5 Object feature extraction
  4.6 Comparative Sentence Identification
  4.7 Performance Achieved
5 General Questions
6 Commercial Uses
7 Open Research Directions
8 Conclusions
A A selection of lists of "fundamental" or "basic" emotions

1 Introduction

When conducting serious research or making every-day decisions, we often look for other people's opinions.
We consult political discussion forums when casting a political vote, read consumer reports when buying appliances, and ask friends to recommend a restaurant for the evening. And now the Internet has made it possible to find out the opinions of millions of people on everything from the latest gadgets to political philosophies. The latest Pew study on the Internet and civic engagement says that "just under one in five internet users (19%) have posted material about political or social issues or used a social networking site for some form of civic or political engagement" [1]. Another study shows that a third (33%) of internet users read blogs, with 11% doing so on a daily basis [2]. The Internet is increasingly both a forum for discussion and a source of information for a growing number of people.

The ready availability of opinionated text has created a new area in text analysis, expanding the subject of study from the traditionally fact- and information-centric view of text to enable sentiment-aware applications. In the past decade, extraction of sentiment from text has been getting a lot of attention in both industry and academia. Businesses increasingly realize the importance of Internet users' opinions about their products and services. This paper is an overview of the area of Sentiment Analysis, which deals with subjective texts. Our first task is to define sentiment and delineate its relation to text.

2 Definitions

2.1 What is Sentiment?

One of the challenges of Sentiment Analysis is defining the objects of the study: opinions and subjectivity. Originally, subjectivity was defined by linguists, most prominently Randolph Quirk (R. Quirk and Svartvik, 1985). Quirk defines a private state as something that is not open to objective observation or verification. These private states include emotions, opinions, and speculations, among others. The very definition of a private state foreshadows difficulties in analyzing sentiment.
Subjectivity is often implied in conversation; it is highly context-sensitive, and its expression is often peculiar to each person. Note, however, that subjective does not imply not true (Wiebe et al., 2004). The sentence "Mary loves chocolate" expresses a sentiment of Mary towards chocolate, but it doesn't mean it's not true. Likewise, not all objective sentences are true. To underline the ambiguity of the concept, Pang and Lee (Pang and Lee, 2008) list the definitions of terms closely linked to the notion of sentiment:

• Opinion implies a conclusion thought out yet open to dispute ("each expert seemed to have a different opinion").

• View suggests a subjective opinion ("very assertive in stating his views").

• Belief implies often deliberate acceptance and intellectual assent ("a firm belief in her party's platform").

• Conviction applies to a firmly and seriously held belief ("the conviction that animal life is as sacred as human").

• Persuasion suggests a belief grounded on assurance (as by evidence) of its truth ("was of the persuasion that everything changes").

• Sentiment suggests a settled opinion reflective of one's feelings ("her feminist sentiments are well-known").

Wiebe, a prominent Natural Language Processing (NLP) researcher, used Quirk's definition of the private state when tracking point of view in narrative (Wiebe, 1994). She defines a private state as a tuple (p, experiencer, attitude, object) relating the experiencer's state p to his/her attitude, possibly toward an object. In practice, a simplified version of this model, where we look only at the polarity and the target of the sentiment, is usually used. In fact, many researchers define sentiment loosely, as a negative or positive opinion (Pang and Lee, 2002; Hu and Liu, 2005; Melville et al., 2009).

[1] http://www.pewinternet.org/Reports/2009/15–The-Internet-and-Civic-Engagement.aspx
[2] http://www.pewinternet.org/Commentary/2008/July/New-numbers-for-blogging-and-blog-readership.aspx
Sentiment also has several unique properties that set it apart from other qualities we may want to track in text. Often we want to categorize text by topic, which may involve dealing with whole taxonomies of topics. Sentiment classification, on the other hand, usually deals with two classes (positive vs. negative), a range of polarity (e.g., star ratings for movies), or even a range in strength of opinion (Pang and Lee, 2008). These classes span many topics, users, and kinds of documents. Although dealing with only a few classes may seem like an easier task than standard text analysis, this couldn't be further from the truth.

2.2 Sentiment Analysis

As a field of research, Sentiment Analysis is closely related to (or can be considered a part of) computational linguistics, natural language processing, and text mining. Proceeding from the study of affective state (psychology) and judgment (appraisal theory), this field seeks to answer questions long studied in other areas of discourse using the new tools provided by data mining and computational linguistics. Sentiment Analysis has many names. It is often referred to as subjectivity analysis, opinion mining, and appraisal extraction, with some connections to affective computing (computer recognition and expression of emotion) (Pang and Lee, 2008). The field usually studies subjective elements, defined by Wiebe et al. as "linguistic expressions of private states in context" (Wiebe et al., 2004). These are usually single words, phrases, or sentences. Sometimes whole documents are studied as a sentiment unit (Turney and Littman, 2003; Agrawal et al., 2003), but it is generally agreed that sentiment resides in smaller linguistic units (Pang and Lee, 2008). Since sentiment and opinion often refer to the same idea, this paper will use the terms interchangeably.
Sentiment that appears in text comes in two flavors: explicit, where the subjective sentence directly expresses an opinion ("It's a beautiful day"), and implicit, where the text implies an opinion ("The earphone broke in two days") (Liu, 2006). Most of the work done so far focuses on the first kind of sentiment, since it is the easier one to analyze.

Sentiment polarity is a particular feature of text. It is usually dichotomized into two classes, positive and negative, but polarity can also be thought of as a range. A document containing several opinionated statements would have a mixed polarity overall, which is different from not having a polarity at all (being objective). Furthermore, a distinction must be made between the polarity of a sentiment and its strength. One may feel strongly about a product being OK, not particularly good or bad; or weakly about a product being very good (because perhaps one owned it for too short a time to form a strong opinion).

Another important part of sentiment is its target: an object, a concept, a person, anything. Most work has been done on product and movie reviews, where it is easy to identify the topic of the text. But it is often useful to pay attention to which feature of this object the writer is talking about: is it the camera display or the battery life that troubles consumers the most? Because of the ready availability of product review datasets, feature extraction has been closely studied in the past decade (Liu, 2006; Hu and Liu, 2005; Popescu and Etzioni, 2005). The mention of these features in text can also be explicit ("Battery life is too short") or implicit ("Camera is too large") (Liu, 2006).

Unlike in usual topical analysis, sentiment statement authorship can be integral to the problem. One of the main problems is quotation. It is important to know that the sentiment expressed in the document is representative of the actual intent of the author.
Political commentaries and news are full of quotations and opinion citations, and can have a convoluted structure that is difficult to discern. For example, a news article about a political debate may contain a mix of quotations from the debaters, the pundits commenting on the debate, and perhaps even the author's own stance on the issues.

2.3 Examples of Sentiment Research

As mentioned before, some of the most studied texts in Sentiment Analysis are product and movie reviews (Hu and Liu, 2005; Popescu and Etzioni, 2005). The advantage is that they already have a clearly specified topic, and it is often (reasonably) assumed that the sentiments expressed in the reviews have to do with that topic. Many also have a star rating system, which serves as a quantitative indication of the opinion. Such data is often used as a gold standard when evaluating sentiment extraction/identification. A general task aimed at sentiment research would be to find opinions on a given product in any web content. Several companies offering services in brand tracking and market perception use Sentiment Analysis techniques. For example, OpSec Security [3] provides "monitoring, measuring, and analyzing consumer feedback" to their customers, helping them understand the market needs, target customer segments, and their position against competitors.

On the other hand, one of the most difficult areas for Sentiment Analysis methods is that of politics. Political discussions are fraught with quotations, sarcasm, and complex references to persons, organizations, and ideas (Gamon et al., 2008). Some work has been done on determining whether a political speech is in support of or in opposition to the issue under debate (Bansal et al., 2008; Thomas and B. Pang, 2006). There is related work on categorizing election forums into "likely to win" and "unlikely to win" (Kim and Hovy, 2007). This problem of complex discussions will be further addressed in the Open Research Directions section.
3 Goals of Sentiment Analysis

Because of the complexity of the problem (underlying concepts, expressions in text, etc.), Sentiment Analysis encompasses several separate tasks. These are usually combined to produce some knowledge about the opinions found in text. This section provides an overview of these tasks, and the next will discuss some of the tools that are used for each.

The first task is sentiment or opinion detection, which may be viewed as classification of text as objective or subjective. Usually opinion detection is based on the examination of adjectives in sentences. For example, the polarity of "this is a beautiful picture" can be determined easily by looking at the adjective. An early study by Hatzivassiloglou and Wiebe (Hatzivassiloglou and Wiebe, 2000) examines the effects of adjectives on sentence subjectivity. More recent studies (Benamara et al., 2007) have shown that adverbs may be used for a similar purpose. A survey of subjectivity recognition techniques can be found in (Wiebe et al., 2004).

The second task is that of polarity classification. Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities, or to locate its position on the continuum between these two polarities (Pang and Lee, 2008). When viewed as a binary feature, polarity classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion. Most of this research was done on product reviews, where the definitions of "positive" and "negative" are clear. Other tasks, such as classifying news as "good" or "bad", present some difficulty: a news article may contain "bad" news without actually using any subjective terms. Furthermore, these classes usually appear intermixed when a document expresses both positive and negative sentiments.

[3] http://www.opsecsecurity.com/
Then the task can be to identify the main sentiment of the document. To distinguish between different mixtures of the two opposites, polarity classification uses a multi-point scale (such as the number of stars for a movie review). This is where the task becomes a multi-class text categorization problem. But unlike topic-based multi-class classification problems, where vocabularies differ for each class (or overlap slightly), the vocabularies for the positive, neutral, and negative classes can be very much alike, differing only in a few crucial words. Since many documents have a "mixed" opinion, this class is actually a combination of positive and negative. Negations, which tend to be disregarded as unimportant in much of text analysis, play an important role in sentiment, flipping an originally positive term into a negative one, and vice versa (see section 4.1.5 for more on negations).

The above two tasks can be done at several levels: term, phrase, sentence, or document level. It is common to use the output of one level as the input for the higher layers (Turney and Littman, 2003; Dave et al., 2003; Kanayama et al., 2004). For instance, we may apply sentiment analysis to phrases, and then use this information to evaluate sentences, then paragraphs, etc. Different techniques are suitable for different levels. Techniques using n-gram classifiers or lexicons usually work at the term level, whereas Part-Of-Speech tagging is used for phrase and sentence analysis. Heuristics are often used to generalize the sentiment to the document level.

A third task that is complementary to sentiment identification is the discovery of the opinion's target. The difficulty of this task depends largely on the domain of the analysis. As mentioned earlier, it is usually safe to assume that product reviews talk about the specified product. On the other hand, general writing such as webpages and blogs doesn't always have a pre-defined topic, and often mentions many objects.
Another lively area of research is feature extraction, given an object or topic of the text (Liu et al., 2005; Popescu and Etzioni, 2005; Hu and Liu, 2005). Liu et al. define features as either components or attributes of an object (Liu, 2006), which is the definition mostly used in practice. An example of features extracted for a scanner can be found in Table 1 in section 4.5. Breaking down the discussion into features allows for a more precise analysis of the sentiments, and for a more detailed summarization of the results.

Sometimes there is more than one target in a sentiment sentence, which is the case in comparative sentences. A subjective comparative sentence orders objects by preference, for example, "this camera is better than my old one". These sentences can be identified using comparative adjectives and adverbs (more, less, better, longer), superlative adjectives (most, least, best), and other words such as same, differ, win, prefer, etc. (Liu, 2006). Once the sentences have been retrieved, the objects can be put in an order that is most representative of their merits, as described in the text.

One of the peculiarities of sentiment is that even though the notion of positive and negative opinion is a general one, the expression of these opinions differs widely across the spectrum of topical domains. Thus, topic-specific and cross-topic sentiment analysis is studied in order to improve performance in a particular domain. Here, combining general knowledge about the expression of sentiment with topic-specific knowledge is an important issue. In cross-topic analysis, the idea is to use the knowledge gathered about one domain in another (perhaps from one with labeled data to one without) (Nigam and Hurst, 2004; Blitzer et al., 2007).

4 Methodologies

A wide range of tools and techniques are used to tackle the goals described above. This section describes some of the most common and interesting ones.
First, Machine Learning and Part-Of-Speech tagging will be discussed, since these are very powerful tools that are most often used in Sentiment Analysis. Then specific techniques and approaches for tackling each of the tasks described in the previous section will be addressed.

4.1 Classification

Many of the tasks in Sentiment Analysis can be thought of as classification. Machine Learning offers many algorithms designed to do just that, but the task of classifying text according to its sentiment presents many unique challenges. These can be formulated in one question: "What kinds of features do we use?"

4.1.1 Term Presence vs. Frequency

Traditional Information Retrieval systems have long emphasized the importance of term frequency. The famous TF-IDF (Term Frequency - Inverse Document Frequency) measure is widely used in modeling documents (Jones, 1972). The intuition is that terms that appear often in a document but seldom in the whole collection are more informative about what the document is about than terms mentioned just once. In the field of Sentiment Analysis we find that instead of paying attention to the most frequent terms, it is more beneficial to seek out the most unique ones. Pang et al. (Pang and Lee, 2002) improve the performance of their system by using term presence instead of frequency. Document representations emphasizing term presence contain a 1 if the term appears in the document at least once, and a 0 otherwise. Wiebe (Wiebe et al., 2004) writes "apparently people are creative when they are being opinionated", implying the increased importance of low-frequency terms in opinionated texts.

4.1.2 n-grams

Term positions are also important in document representation for Sentiment Analysis. The position of terms determines, and sometimes reverses, the polarity of a phrase, so position information is sometimes encoded into the feature vector (Pang and Lee, 2002; Kim and Hovy, 2006).
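The presence-vs-frequency contrast of section 4.1.1 is easy to see in code. A minimal sketch (the tiny vocabulary and document below are invented for illustration):

```python
from collections import Counter

def frequency_vector(tokens, vocab):
    """Term-frequency representation: raw counts per vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

def presence_vector(tokens, vocab):
    """Term-presence representation (cf. Pang and Lee, 2002): 1 if the
    term occurs at least once in the document, 0 otherwise."""
    present = set(tokens)
    return [1 if t in present else 0 for t in vocab]

vocab = ["great", "boring", "plot"]
doc = "great great plot but boring boring boring plot".split()
print(frequency_vector(doc, vocab))  # [2, 3, 2]
print(presence_vector(doc, vocab))   # [1, 1, 1]
```

Note that although "boring" appears three times, the presence vector weights it the same as "great", which is precisely the point of the representation.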
Wiebe (Wiebe et al., 2004) selects n-grams (n = 1, 2, 3, 4) based on precision calculated using annotated documents. The n-grams are sequences of word-stem/part-of-speech pairs; for instance, (in-prep the-det can-noun) is a 3-gram. The importance of part-of-speech tagging is discussed in the next section.

4.1.3 Part-of-Speech

As mentioned earlier, it has been determined that adjectives are good indicators of sentiment in text (Hatzivassiloglou and Wiebe, 2000; Benamara et al., 2007), and in the past decade they have been commonly exploited in Sentiment Analysis (Mullen and Collier, 2004; Whitelaw et al., 2005). This holds for other fields of textual analysis as well, since part-of-speech tags can be considered a crude form of word sense disambiguation (Wilks and Stevenson, 1998). For example, Turney (Turney, 2002) uses part-of-speech patterns, most including an adjective or an adverb, for sentiment detection at the document level.

4.1.4 Syntax

Syntactic information has also been used in feature sets, though there is still discussion about the merit of this information in sentiment classification (Pang and Lee, 2008). This information may include important text features such as negation, intensifiers, and diminishers (Kennedy and Inkpen, 2006). Kudo et al. (Kudo and Matsumoto, 2004) used a subtree-based boosting algorithm with dependency-tree-based features for polarity classification, and showed that it outperforms the bag-of-words baseline.

4.1.5 Negations

Negations have long been known to be integral to Sentiment Analysis. The usual bag-of-words representation of text disconnects all of the words, and considers sentences like "I like this book" and "I don't like this book" very similar, since only one word distinguishes one from the other. But when talking about sentiment, a negation flips the polarity of a whole phrase. Negations are often considered in post-processing of results, while the original representation of text ignores them (Hu and Liu, 2005).
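One simple way to make negation visible to a bag-of-words model, in the spirit of the term-marking approach discussed next, is to tag the few tokens following a negation word. A crude sketch (the fixed window and the word lists are illustrative assumptions, not taken from any of the papers cited here; real systems use syntactic scope):

```python
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "doesn't"}

def mark_negations(tokens, window=3):
    """Append '-NOT' to the `window` tokens that follow a negation word.
    The fixed window is a crude stand-in for true syntactic scope."""
    out, to_mark = [], 0
    for tok in tokens:
        if tok.lower() in NEGATIONS:
            out.append(tok)
            to_mark = window
        elif to_mark > 0:
            out.append(tok + "-NOT")
            to_mark -= 1
        else:
            out.append(tok)
    return out

print(mark_negations("I don't like this book".split()))
# ['I', "don't", 'like-NOT', 'this-NOT', 'book-NOT']
```

After marking, "like" and "like-NOT" become distinct features, so the two example sentences above no longer look nearly identical to a classifier.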
Or, as in Das and Chen (Das and Chen, 2001a), one could explicitly include the negation in the document representation by appending it to the terms that are close to negations; for example, the term "like-NOT" would be extracted from "I don't like this book" (Pang and Lee, 2008). Using co-location this way, though, may be too crude a technique: it would be incorrect to negate the sentiment in a sentence such as "No wonder everyone loves it". To handle such cases, Na et al. (Na et al., 2004) use specific part-of-speech tag patterns to identify the negations relevant to the sentiment polarity of a phrase.

4.2 Identifying the semantic orientation of words

One of the most basic tasks in Sentiment Analysis is identifying the semantic orientation (the polarity and objectivity) of a word. A variety of techniques have been used, which can be roughly categorized as follows:

• using a lexicon, constructed manually or automatically

• using statistical techniques, such as looking at the co-occurrence of a word with words of known polarity

• using training documents, labeled or unlabeled, as a source of knowledge about the polarity of terms within the collection

Each of these techniques has its advantages and difficulties, which will be discussed in detail in this section.

4.2.1 Lexicons

Extended lexicons are a fundamental part of Sentiment Analysis, but not all of them are alike. The simplest are those with a binary classification of words into positive vs. negative polarity or objective vs. subjective. A finer distinction between the classes can be made with fuzzy lexicons, where each label has a score associated with it, conveying the "strength" of the label. A yet more sophisticated approach is to adopt one of the finer-grained affective classifications developed in areas of psychology, such as Plutchik's emotion model (Prinz, 2004).
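A fuzzy lexicon of the kind just described can be sketched as a mapping from words to scored affect categories, modeled loosely on the intensity/centrality scheme of Subasic and Huettner (2001) discussed below. All entries and scores here are invented for illustration:

```python
# Toy fuzzy lexicon: each word maps to one or more affect categories,
# with an intensity (strength of affect) and a centrality (degree of
# relatedness to the category). Entries are invented for illustration.
FUZZY_LEXICON = {
    "adore": [("affection", 0.9, 0.8), ("excitement", 0.5, 0.4)],
    "mishap": [("fear", 0.3, 0.5), ("sadness", 0.4, 0.6)],
}

def affect_profile(tokens):
    """Aggregate per-category scores for a text by summing the
    intensity-weighted centralities of its affect words."""
    profile = {}
    for tok in tokens:
        for category, intensity, centrality in FUZZY_LEXICON.get(tok, []):
            profile[category] = profile.get(category, 0.0) + intensity * centrality
    return profile

print(affect_profile("they adore every mishap".split()))
```

A single word can thus contribute to several categories at once, which is what makes the lexicon "fuzzy" rather than a hard positive/negative partition.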
Ortony, Clore, and Collins's book The Cognitive Structure of Emotions (Ortony et al., 1988) provides an overview of several theories of "fundamental" or "basic" emotions (see Appendix A). Finally, even more sophisticated knowledge can be utilized, like the common-sense knowledge bases that have been developed by researchers in Artificial Intelligence. This section describes recent work done on sentiment-annotated lexicons.

A variety of lexicons have been created for use in Sentiment Analysis, often by extending existing general-purpose lexicons. For example, Subasic and Huettner (Subasic and Huettner, 2001) manually constructed a lexicon associating words with affect categories, specifying an intensity (strength of affect level) and a centrality (degree of relatedness to the category) (Dave et al., 2003). This lexicon can be called "fuzzy" since it is able to handle the ambiguity of a term by assigning it to several semantic categories. The system is then designed to work with these "fuzzy" definitions: "After the affect words in a document are tagged, the fuzzy logic part of the system handles them by using fuzzy combination operators, set extension operators and a fuzzy thesaurus to analyze fuzzy sets representing affects" (Subasic and Huettner, 2001).

Besides manual annotation, other resources can be used to build lexicons. Existing lexicons can be augmented to include sentiment information. Princeton University's WordNet lexicon has been one of the most popular ones to be used for Sentiment Analysis. As described on http://wordnet.princeton.edu/, WordNet is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
Esuli and Sebastiani (Esuli and Sebastiani, 2006) expand WordNet by adding polarity (Positive-Negative) and objectivity (Subjective-Objective) labels for each term. To label each term, they classify the synset (a group of synonyms) to which the term belongs using a set of ternary classifiers (a device that attaches to each object exactly one out of three labels), each of them capable of deciding whether a synset is Positive, Negative, or Objective. The resulting scores range from 0.0 to 1.0, giving a graded evaluation of the opinion-related properties of the terms. These can be summed up visually as in Figure 1.

[Figure 1: Graphical representation of opinion-related properties of a term]

The edges of the triangle represent one of the three classifications (positive, negative, and objective). A term can be located in this space as a point, representing the extent to which it belongs to each of the classifications.

Another extension to WordNet is WordNet-Affect, developed by Strapparava and Valitutti (Strapparava and Valitutti, 2004). They label WordNet synsets using affective labels (a-labels) representing different affective categories like emotion, cognitive state, attitude, feeling, etc.

WordNet has also been used directly in Sentiment Analysis. For example, Kim and Hovy (Kim and Hovy, 2004) and Hu and Liu (Hu and Liu, 2005) generate lexicons of positive and negative terms by starting with a small list of "seed" terms of known polarities (e.g. love, like, nice, etc.) and then using the antonymy and synonymy properties of terms to group them into either of the polarity categories.

Other resources have been used to generate lexicons. Extensive work has been done to create common sense knowledge bases in the field of Artificial Intelligence. These are collections of facts and information that an ordinary person is expected to know. Some of the most prominent projects are Cyc [4], Open Mind Common Sense [5], and ThoughtTreasure [6] (Liu et al., 2003). Liu et
al. have used the Open Mind Common Sense knowledge base, containing close to half a million sentences, to create several models mapping different concepts to six "basic" emotions - happiness, sadness, anger, fear, disgust, and surprise - based on Ekman's research on universal facial expressions (Ekman, 1993). Furthermore, Zhou and Chaovalit have developed an ontology-supported polarity mining (OSPM) approach to semantic labeling (Zhou and Chaovalit, 2008). They manually built an ontology for movie reviews and incorporated it into the polarity classification task, significantly improving performance over a standard baseline.

There are also ways of determining the sentiment orientation of words using statistical analysis of large corpora of text. Turney and Littman (Turney and Littman, 2003), for example, use word co-occurrence to infer the semantic orientation of words, for which they explore two methods: Pointwise Mutual Information (PMI) (Church and Hanks, 1989) and Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997). The idea is that "the semantic orientation of a word tends to correspond to the semantic orientation of its neighbors". The techniques they utilize are actually quite different in nature: PMI calculates word co-occurrence by querying a search engine, whereas LSA uses a matrix factorization technique, Singular Value Decomposition, to analyze the statistical relationship between words.

As these resources are seldom compared, it is still an open question which of them is the most beneficial for building annotated lexicons. The approaches described above vary greatly in the amount of data or human supervision needed. Thus, it is an important task to understand just how much of an improvement in performance we get by using a more sophisticated lexicon instead of a basic one.

[4] http://www.cyc.com/
[5] http://openmind.media.mit.edu/
[6] http://alumni.media.mit.edu/mueller/papers/tt.html
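The PMI variant can be sketched as a log-ratio of co-occurrence counts. Following the general shape of Turney and Littman's SO-PMI (the semantic orientation of a word is its PMI with a positive paradigm word minus its PMI with a negative one, which reduces to a log-ratio once the corpus size cancels), with all hit counts below invented for illustration:

```python
import math

def so_pmi(near_excellent, near_poor, hits_excellent, hits_poor, eps=0.01):
    """Semantic orientation sketch after Turney and Littman (2003):
    SO(w) = PMI(w, 'excellent') - PMI(w, 'poor').
    Counts would come from search-engine hit counts (e.g. a NEAR query);
    `eps` smooths zero counts. Positive result = positive orientation."""
    return math.log2(((near_excellent + eps) * (hits_poor + eps)) /
                     ((near_poor + eps) * (hits_excellent + eps)))

# Hypothetical hit counts for the word 'superb':
score = so_pmi(near_excellent=2000, near_poor=50,
               hits_excellent=1_000_000, hits_poor=1_000_000)
print(score > 0)  # True: 'superb' leans positive
```

With equal base frequencies for the two paradigm words, the score is driven entirely by how much more often the word co-occurs with "excellent" than with "poor".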
4.2.2 Using Training Documents

It is possible to perform sentiment classification using statistical analysis and machine learning tools that take advantage of the vast resources of labeled documents available (labeled manually by annotators or via a star/point system). Product review websites like C-NET [7], Ebay [8], RottenTomatoes [9] and the Internet Movie Database (IMDB) [10] have all been extensively used as sources of annotated data. The star (or tomato, as it were) system provides an explicit label of the overall polarity of the review, and it is often taken as a gold standard in algorithm evaluation.

A variety of manually labeled data is available through evaluation efforts such as the Text REtrieval Conference (TREC) [11], the NII Test Collection for IR Systems (NTCIR) [12], and the Cross Language Evaluation Forum (CLEF) [13]. The datasets these efforts produce often serve as standards in the Information Retrieval community, including for Sentiment Analysis researchers.

[7] http://www.cnet.com/
[8] http://www.ebay.com/
[9] http://www.rottentomatoes.com/
[10] http://www.imdb.com/
[11] http://trec.nist.gov/

Individual researchers and research groups have also produced many interesting data sets. Here are some of these:

• Congressional floor-debate transcripts [14] - published by Thomas et al. (Thomas and B. Pang, 2006), contains political speeches that are labeled to indicate whether the speaker supported or opposed the legislation discussed.

• Economining [15] - published by the Stern School at New York University, consisting of feedback postings for merchants at Amazon.com.
• Cornell movie-review datasets [16] - introduced by Pang and Lee (Pang and Lee, 2008), containing 1000 positive and 1000 negative reviews with automatically derived document-level labels, and 5331 positive and 5331 negative sentences/snippets.

• MPQA Corpus [17] - the Multi-Perspective Question Answering Opinion Corpus contains 535 manually annotated news articles from a variety of news sources, with labels for opinions and private states (beliefs, emotions, speculations, etc.).

• Multiple-aspect restaurant reviews [18] - introduced by Snyder and Barzilay (Snyder and Barzilay, 2007), contains 4,488 reviews with an explicit 1-to-5 rating for five different aspects: food, ambiance, service, value, and overall experience.

Once a desirable data set has been obtained, a variety of machine learning algorithms can be used to train sentiment classifiers. Some of the most popular algorithms are Support Vector Machines (Pang and Lee, 2002; Dave et al., 2003; Gamon, 2004; Matsumoto et al., 2005; Airoldi et al., 2006), Naive Bayes (Wiebe et al., 1999; Yu and Hatzivassiloglou, 2003; Melville et al., 2009), and maximum-entropy-based classifiers (Nigam et al., 1999; Pang and Lee, 2002). A comparison between these can be found in (Pang and Lee, 2002).

[12] http://research.nii.ac.jp/ntcir/
[13] http://www.clef-campaign.org/
[14] http://www.cs.cornell.edu/home/llee/data/convote.html
[15] http://economining.stern.nyu.edu/datasets.html
[16] http://www.cs.cornell.edu/people/pabo/movie-review-data/
[17] http://www.cs.pitt.edu/mpqa/databaserelease/
[18] http://people.csail.mit.edu/bsnyder/naacl07

4.3 Identifying semantic orientation of sentences and phrases

Once the semantic orientation of individual words has been determined, it is often desirable to extend this to the phrase or sentence the word appears in. One of the most straightforward ways to accomplish this is to take an average of the polarities of the words in the sentence.
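Before turning to such word-combination heuristics, the supervised route of section 4.2.2 can be made concrete. A minimal from-scratch sketch of one of the algorithms listed above, multinomial Naive Bayes over bag-of-words features, with two invented training snippets (real systems would train on thousands of labeled reviews):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial Naive Bayes over bag-of-words.
    `docs` is a list of (token_list, label) pairs."""
    counts = defaultdict(Counter)   # label -> term counts
    labels = Counter()              # label -> document counts
    for tokens, label in docs:
        labels[label] += 1
        counts[label].update(tokens)
    vocab = {t for c in counts.values() for t in c}
    return counts, labels, vocab

def classify(tokens, counts, labels, vocab):
    """Pick the label maximizing log P(label) + sum of log P(term|label),
    with Laplace smoothing for unseen terms."""
    best, best_lp = None, float("-inf")
    for label in labels:
        lp = math.log(labels[label] / sum(labels.values()))
        total = sum(counts[label].values())
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [("great fun great acting".split(), "pos"),
         ("boring plot awful acting".split(), "neg")]
model = train_nb(train)
print(classify("great plot".split(), *model))  # pos
```

The same document representation choices discussed in section 4.1 (presence vs. frequency, n-grams, negation marking) determine what goes into `tokens` here.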
Hu and Liu (Hu and Liu, 2005) write: "if positive/negative opinion prevails, the opinion sentence is regarded as a positive/negative one". In the case that the number of positive and negative opinion words is the same, they take the orientation of the closest opinion sentence. Yu and Hatzivassiloglou (Yu and Hatzivassiloglou, 2003) train a Naive Bayes classifier using sentences and documents labeled as opinionated or factual as examples of the two categories. The features include words, bigrams, and trigrams, as well as the parts of speech in each sentence. They also use the presence of words with known polarities in a sentence as an indication that the sentence is subjective, and they take into consideration the effect of negation words such as "no", "not", and "yet" appearing in a window of 5 words around the word in question. Although simplistic, this heuristic has been shown to work in most cases. An even more sophisticated combination of sentiment labels is possible by taking advantage of syntactic relationships between words. For example, Popescu and Etzioni (Popescu and Etzioni, 2005) use an unsupervised classification technique, Relaxation Labeling, that extends the label attributed to a word to the sentence it appears in. This approach takes into account, among other things, negation modifiers, the significance of which is discussed in Section 4.1.5. A novel application of machine translation techniques is described in (Kanayama et al., 2004). Here, using a Japanese to English translation engine, the researchers were able to build sentence trees and then apply pattern matching to discover the sentiment orientation of sentences. Because of the sophistication of the method, they were able to incorporate many linguistic cues into the process, including negations, the use of which has been discussed earlier.
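A minimal sketch of such a negation heuristic, assuming a toy lexicon and checking only the five words preceding each polar word (a simplification of the "window around the word" described above):

```python
NEGATIONS = {"no", "not", "yet"}                           # from the text
POLARITY = {"good": 1, "great": 1, "bad": -1, "poor": -1}  # toy lexicon

def polarity_with_negation(tokens, window=5):
    """Sum word polarities, flipping a word's score when a negation
    word occurs within the preceding `window` words."""
    total = 0
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            score = POLARITY[tok]
            if NEGATIONS & set(tokens[max(0, i - window):i]):
                score = -score     # negation reverses orientation
            total += score
    return total

print(polarity_with_negation("the acting is not very good".split()))  # -1
```

This is only a sketch: a real system would also handle negations following the polar word, scope-limiting punctuation, and contrastive conjunctions.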
4.4 Identifying the semantic orientation of documents

Although most of the work has been done on determining the semantic orientation of words and phrases, some tasks like summarization and text retrieval may require semantic labeling of the whole document. It may not make much sense to do this for long documents such as articles or books, which have been a key form in traditional Information Retrieval. But in the age of social networking and internet commerce, we see a vastly increasing number and variety of short documents, often containing only a few sentences. These may be product reviews, emails, blog posts, etc. Much like approaches for identifying the semantic orientation of words, those for documents also range from simple statistical ones to ones using elaborate knowledge structures to guide the process. One of the most popular, and simple, methods is a linear combination of all polarities. For example, Dave et al. (Dave et al., 2003) and Turney et al. (Turney and Littman, 2003) use averaging to determine the polarity of documents. The polarity of a document can be expressed as

class(d_i) = \begin{cases} C & \text{if } \mathrm{eval}(d_i) > 0 \\ C' & \text{if } \mathrm{eval}(d_i) < 0 \end{cases} \quad (1)

\mathrm{eval}(d_i) = \sum_j \mathrm{score}(t_j) \quad (2)

where class C is determined by the sum of the scores of all terms: if the sum is positive, the document gets a positive label, otherwise a negative one. Notice that this approach is strictly binary - it does not take into consideration the fact that a document with strong opinions both ways should be considered to have a mixed opinion label, rather than relying on minute differences in measurement to assign either of the polarities. As a part of the Text REtrieval Conference (TREC) Blog track task in 2007, the teams had to classify documents as having either negative, positive, or mixed opinion (Macdonald et al., 2007).
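The linear combination of Equations (1) and (2) is straightforward to implement; the sketch below (with hypothetical term scores) also demonstrates the mixed-opinion weakness just noted, where strong scores of opposite sign nearly cancel:

```python
def eval_doc(term_scores):
    """eval(d_i): the sum of the polarity scores of the document's terms."""
    return sum(term_scores)

def classify_doc(term_scores):
    """class(d_i) = C when eval(d_i) > 0 and C' when eval(d_i) < 0.
    (The equations leave the eval(d_i) == 0 case undefined.)"""
    e = eval_doc(term_scores)
    if e > 0:
        return "C"     # positive
    if e < 0:
        return "C'"    # negative
    return "undefined"

# A document with strong opinions both ways still gets a hard label:
print(classify_doc([0.9, -0.8]))  # "C", despite clearly mixed opinion
```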
The team at the University of Illinois at Chicago (Voorhees and Buckland, 2007) used a set of rules with thresholds to label the documents:

• Firstly, if both positive and negative opinions are strong in the document, the document is labeled mixed.

• Otherwise, if one type of opinion is strong, the document is labeled with that type.

• Finally, if there are no strong opinions either way, the document is labeled as mixed.

Document labeling can incorporate information other than the semantic orientation of its constituent parts. Agrawal et al. (Agrawal et al., 2003) use citations in newsgroup postings to divide the group into subgroups that are for or against an issue. An explicit assumption is made (and tested) that citations represent antagonistic standpoints, i.e. it is more likely that a reply would be in disagreement with the previous post than otherwise. Although probably an oversimplification of the actual workings of newsgroups, these are first steps in evaluating documents in their larger contexts.

4.5 Object feature extraction

Now we move on to another important part of sentiment - its target. In shorter, more focused documents it is often safe to assume that the author is only talking about the topic of the document. Product reviews, for example, usually contain opinions about that product, and movie reviews talk about the movies in question. Yet it is often not enough to know the general topic of the writing. A company making a product would certainly want to know not only what people think about the product in general, but which features they like or dislike in particular. Thus, the task of feature extraction (where a feature can be any target of an opinionated statement) has been gaining popularity in the field of Sentiment Analysis. A common approach is to use part-of-speech (POS) tags to construct templates of how sentiment is applied to objects. For example, Bing Liu et al.
(Liu et al., 2005) use this process for the phrase "included memory is stingy":

1. Perform part-of-speech (POS) tagging and remove digits: "<V>included <N>memory <V>is <Adj>stingy"

2. Replace the actual feature words in the sentence with [feature]: "<V>included <N>[feature] <V>is <Adj>stingy"

3. Use n-grams to produce shorter segments from long ones: "<V>included <N>[feature] <V>is", "<N>[feature] <V>is <Adj>stingy"

4. Distinguish duplicate tags by giving them numbers: "<V1>included <N1>[feature] <V2>is"

5. Perform word stemming

They then use the association mining system CBA (Liu et al., 1998) to extract the rest of the features. Once the features are found, they are grouped using the aforementioned WordNet synsets. For example, the words "photo", "picture", and "image" all refer to the same feature of a digital camera; once found to be synonymous, they are treated as known synonyms of the same feature. Popescu and Etzioni (Popescu and Etzioni, 2005) improve on Hu and Liu's algorithm by removing those noun phrases that may not be product features. Their algorithm evaluates each noun phrase by computing a Pointwise Mutual Information (PMI) score between the noun phrase and meronymy (the property of being a part of something) discriminators associated with the product class. Here, PMI is defined as

\mathrm{PMI}(f, d) = \frac{\mathrm{hits}(f \wedge d)}{\mathrm{hits}(f)\,\mathrm{hits}(d)} \quad (3)

Given a set of relations of interest, their system calculates PMI between each feature and automatically generated discriminator phrases. For example, the "scanner" class would be compared with phrases like "of scanner", "scanner has", "scanner comes with", etc., which are used to find components or parts of scanners by searching the Web. The PMI scores are then converted to binary features for a Naive Bayes Classifier, which outputs a probability associated with each feature (Etzioni et al., 2005). In the end, a rich system of features is developed, a part of which is shown in Table 1.
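The PMI score of Equation (3) is simple to compute once hit counts are available; the sketch below uses hypothetical Web hit counts and assumed discriminator patterns (the real system queries a search engine for these counts):

```python
def discriminators(product_class):
    """Hypothetical meronymy discriminator phrases for a product class,
    following the patterns quoted above."""
    return [f"of {product_class}", f"{product_class} has",
            f"{product_class} comes with"]

def pmi(hits_both, hits_f, hits_d):
    """PMI(f, d) = hits(f AND d) / (hits(f) * hits(d)), where hits()
    would count Web search results for the phrase(s)."""
    return hits_both / (hits_f * hits_d)

# e.g. scoring candidate feature "cover" against discriminator
# "of scanner", with made-up hit counts:
score = pmi(hits_both=1200, hits_f=500000, hits_d=30000)
```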
Table 1: Feature Information

Explicit Features          | Examples
Properties                 | Scanner Size
Parts                      | Scanner Cover
Features of Parts          | Battery Life
Related Concepts           | Scanner Image
Related Concept's Features | Scanner Image Size

But the approaches above discover only explicit features - ones that are mentioned in the text. There are many implicit features in sentences like "this camera is too large", which refers to the camera's size. These can be extracted using the context of already known features. The rule mining technique described in (Liu et al., 2005) can be extended to implicit features by tagging each feature-specific template with its respective feature. In the end, features can be used to effectively summarize the sentiment found in text. In Figure 2, for example, the various features of two cameras are compared (Liu et al., 2005).

[Figure 2: Visual comparison of consumer opinions on two products]

Here, each bar indicates the range of opinions (from negative to positive) on a camera's feature. At a glance we can tell that there are more positive opinions on Camera 1's picture quality than on Camera 2's.

4.6 Comparative Sentence Identification

One last major research area in Sentiment Analysis is the study of comparative sentences. In (Liu, 2006) Liu defines a comparative sentence as "a sentence that expresses a relation based on similarities or differences of more than one object". These can be classified into types, such as gradable and non-gradable comparisons. A gradable comparison is based on a relationship of greater than, equal to, or less than. For example, "Intel chip is faster than the AMD one" ranks objects in quality. In a non-gradable comparison, features are compared but not ranked in order of preference: "Coke tastes differently from Pepsi". Both types of sentences tell us something about the relationships between different objects.
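A keyword-based first pass at flagging candidate comparative sentences can be sketched as follows; the keyword list here is hypothetical and hand-picked, and a real system would follow this step with a learned classifier to filter out false positives:

```python
# Hypothetical comparative keywords; a real list would be far larger.
COMPARATIVE_KEYWORDS = {"than", "faster", "better", "worse", "differently",
                        "same", "superior", "inferior"}

def maybe_comparative(sentence):
    """High-recall first pass: flag any sentence containing a
    comparative keyword; false positives are filtered later."""
    tokens = sentence.lower().rstrip(".").split()
    return bool(COMPARATIVE_KEYWORDS & set(tokens))

print(maybe_comparative("Intel chip is faster than the AMD one"))   # True
print(maybe_comparative("Coke tastes differently from Pepsi"))      # True
print(maybe_comparative("The camera takes good pictures"))          # False
```

This two-stage design (broad keyword filter, then precise classifier) mirrors the recall/precision trade-off reported for keyword matching in the text below.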
Thus, one of the outputs of a comparative sentence analysis system could be a ranking of products, as determined by the opinion holders. So far, though, identification of comparative sentences has been the primary focus of the computational linguistics community. Jindal and Liu (Jindal and Liu, 2006) take a data mining approach to this problem. They use class sequential rule (CSR) mining to identify comparative sentences in customer reviews, forum discussions, and news articles. As a first step they use a relatively small list of keywords (compiled with the help of WordNet, http://wordnet.princeton.edu/), which successfully identifies almost all of the comparative sentences (high recall: 94%) while also producing many false positives (low precision: 32%). Hou and Li (Hou and Li, 2008) apply another data mining technique, Conditional Random Fields (CRF), to a manually annotated corpus of Chinese comparative sentences. They identify six semantic parts of a comparative opinion - Holder, Entity 1, Comparative predicates, Entity 2, Attributes, and Sentiment - and extract them using Semantic Role Labeling, a statistical machine learning technique (Gildea and Jurafsky, 2002).

4.7 Performance Achieved

An overview of the work done in the most popular task of Sentiment Analysis, polarity classification, is shown in Table 4.7, which is extended from (Zhou and Chaovalit, 2008). This table is meant to offer a sample of the work done and is not a comprehensive overview of the works published on the topic of Sentiment Analysis. The work in this area started around 2000 and is still going strong today. As mentioned earlier, a lot of work has been done on movie and product reviews; especially popular are the Internet Movie Database (IMDb) and product reviews downloaded from Amazon. The performance achieved by these methods is difficult to judge, since each method uses a variety of resources for training and different collections of documents for testing.
Many studies, such as Blitzer et al. (2007), deal with several domains, some more "challenging" for their algorithms than others. Notice especially how much the results vary across domains: in a recent study by Melville et al. (2009) the performance of their system on blogs and on political commentary differs by nearly 30%. Some studies, such as Godbole et al. (2007), work at the level of words, sometimes achieving accuracy of over 90%. Others, working on longer documents such as blog posts and full web pages, generally achieve performance of around 65–85%. It is clear that although we may be able to build comprehensive lexicons of sentiment-annotated words, it is still a challenge to accurately locate sentiment in text. Few studies have been done outside the realm of short documents like product reviews, and especially in difficult domains like political commentary. This is true partially because there is little annotated data available for realms outside reviews. Finally, although relatively high accuracy in document polarity labeling has been achieved, it is still a challenge to extract the full private state, complete with the emotion's intensity, its holder, and its target.

5 General Questions

After exploring the tasks of Sentiment Analysis in detail, it is worth briefly discussing the general questions Sentiment Analysis brings up.

• First and foremost, the precise definition of sentiment is still an open question. Can sentiment be ascribed to a word or a phrase, or can it be extended to the whole document? Must it have a target, and how granular is this target? By choosing a representation of sentiment, each researcher implicitly defines the scope and nature of the particular flavor of sentiment they are working with.
Table 4.7: A sample of polarity classification studies (extended from Zhou and Chaovalit, 2008). Columns: study | polarity mining technique (features used) | text granularity | data sources/domains | performance (accuracy unless noted).

Hatzivassiloglou and McKeown (1997) | log-linear regression model (conjunctions, part-of-speech) | word | Wall Street Journal corpus | adjectives: precision >90%
Das and Chen (2001b) | lexicons and grammar rules | document | financial news | 62%
Pang and Lee (2002) | Naive Bayes, maximum entropy classification, support vector machines (unigrams, bigrams, contextual effect of negation, feature presence or frequency, position) | document | IMDb (movie reviews) | 82.9%
Turney (2002) | pointwise mutual information (bigrams) | document | movies, cars, banks | 66–84%
Morinaga et al. (2002) | decision tree induction (characteristic words, co-occurrence words, and phrases) | document | cellular phones, PDAs and Internet service providers | precision: 77% (positive), 84% (negative); recall: 43% (positive), 16% (negative)
Dave et al. (2003) | support vector machines (semantic features based on substitutions and proximity) | document | Amazon, C-net | 88.9%
Yi et al. (2003) | sentiment lexicon and semantic patterns | subject terms | digital cameras, music albums | N/A
Turney and Littman (2003) | SO-LSA (Latent Semantic Analysis), SO-PMI (Pointwise Mutual Information), with General Inquirer terms | word | TASA-ALL corpus (from sources such as novels and newspaper articles) | 65.27% (SO-LSA), 61.26% (SO-PMI)
Pang and Lee (2004) | Naive Bayes, support vector machines with sentence-level subjectivity summarization based on minimum cuts | document | IMDb | 86.4%
Hu and Liu (2005) | opinion word extraction and aggregation enhanced with WordNet (opinion words, opinion sentences, product features) | sentence | Amazon, C-net | cameras: 93.6%, DVD player: 73%, MP3 player: 84.2%, cellphone: 76.4%
Nigam and Hurst (2004) | syntactic-rule-based chunking (lexicon of polar phrases and their parts of speech, syntactic patterns) | sentence | online resources (e.g., Usenet, message boards) | recall: 96% (positive), 5–24% (negative and other)
Hiroshi et al. (2004) | transfer-based machine translation, full parsing, semantic analysis (principal patterns, auxiliary/nominal patterns, polarity lexicon) | sentiment unit | bulletin boards and forums on digital cameras | precision: 89%, recall: 43%
Bai et al. (2005) | two-stage Markov Blanket Classifier (dependence among words, minimal vocabulary) | document | IMDb, Infonic | movie: 87.5%, news: 89–96%
Gamon et al. (2005) | Naive Bayes classifier (stemmed terms, their frequency and weights, go-list of salient words in a domain) | sentence | car reviews | N/A
Popescu and Etzioni (2005) | relaxation labeling, clustering (syntactic dependency templates, conjunctions and disjunctions, WordNet relationships) | phrase | product reviews | opinion phrase polarity: precision: 86%, recall: 97%
Wilson et al. (2005) | AdaBoost (subjectivity lexicon) | phrase | Multi-Perspective Question Answering Opinion Corpus | contextual polarity: 65.7%
Chesley et al. (2006) | support vector machines with Wiktionary (textual features, e.g., exclamation points and question marks, and lexical semantics) | document | web sites of CNN, NPR, Atlanta Journal and Constitution, newspaper columns, reviews, political blogs, etc. | positive: 84.2%, negative: 80.3%, objective: 72.4%
Kennedy and Inkpen (2006) | support vector machines, term-counting method, and a combination of the two (General Inquirer dictionary, CTRW dictionary, adjectives) | document | IMDb (movie reviews) | enhanced combined method: 86.2%
Thomas and B. Pang (2006) | support vector machines with reference classification | speech segment | 2005 U.S. floor debates in the House of Representatives | with same-speaker links and agreement links: 71.16%
Kaji and Kitsuregawa (2007) | phrase trees and word co-occurrence, Pointwise Mutual Information (lexical relationships, word co-occurrence) | phrase | HTML documents | 82.7–95.7%
Blitzer et al. (2007) | Structural Correspondence Learning (word frequencies and co-occurrences, part-of-speech) | document | book, DVD, electronics and kitchen appliance product reviews | 66.1–86.6%
Godbole et al. (2007) | lexical (WordNet) (graph distance measurements between words based on relationships of synonymy and antonymy, commonality of words) | word | newspapers, blog posts | 62.7–92.9%
Annett and Kondrak (2008) | lexical (WordNet) and support vector machines (number of positive/negative adjectives/adverbs, presence, absence or frequency of words, minimum distance from pivot words in WordNet) | document | movie reviews, blog posts | 65.4–77.5%
Zhou and Chaovalit (2008) | ontology-supported polarity mining (n-grams, words, word senses) | document | movie reviews | 72.2%
Hou and Li (2008) | Conditional Random Fields (POS tags, comparative sentence elements) | sentence | product reviews, forum discussions; labeled manually and automatically | precision: 89% (manual), 75% (automatic); recall: 81% (manual), 71% (automatic)
Ferguson et al. (2009) | Multinomial Naive Bayes (binary word feature vectors) | document | financial blog articles | 75.25%
Tan et al. (2009) | Naive Bayes classifier with feature adaptation using Frequently Co-occurring Entropy (words) | document | education reviews, stock reviews, and computer reviews | F1 score: 69–91%
Wilson et al. (2009) | boosting, memory-based learning, rule learning, and support vector learning (words, negation, polarity modification features) | phrase | MPQA Corpus | 83.6%
Melville et al. (2009) | Bayesian classification with lexicons and training documents (words) | document | blog posts reviewing software, political blogs, movie reviews | blogs: 91.21%, political: 63.61%, movies: 81.42%

• Once the notion of sentiment is settled, we need to find out how it is expressed in text.
Is it just in the emotionally-charged words, or also in the sentence structure? Can misspelling or punctuation tell us something about the sentimental nature of the passage? Does the document's sentiment spread to other related documents, say by links or co-authorship?

• Finally, a variety of cross-domain considerations need to be examined. What is the difference between expressing an opinion on the zoom of a camera, the ambiance of a restaurant, or the fairness of a law? Are there cultural differences in the ways people express their opinions? Can people be grouped by the way they express their opinions on a subject? Is it possible to determine emotion in real time? It appears that each research project reviewed here presents its own flavor of emotional discourse.

6 Commercial Uses

Although the field of Sentiment Analysis is relatively young, there are already numerous businesses that offer the techniques developed in this field to customers interested in brand tracking and market perception. For instance, as a part of its anti-counterfeiting and online brand abuse services, OpSec Security (http://www.opsecsecurity.com/) provides sentiment analysis services such as "monitoring, measuring, and analyzing consumer feedback" so that their customers are "better informed to understand market needs, target customer segments, and position against competitors".
Specifically, these are the types of activities that may be involved:

• Tracking collective user opinions and ratings of products and services
• Analyzing consumer trends, competitors, and market buzz
• Measuring response to company-related events and incidents
• Monitoring critical issues to prevent negative viral effects
• Evaluating feedback in multiple languages

As a source of opinionated discourse, these companies look at

• Online communities
• Discussion boards
• Weblogs
• Product rating sites
• Chatrooms
• Price comparison portals
• Newsgroups

By aggregating, evaluating, and interpreting the data found on these web sites, OpSec promises to "provide insights and recommendations" and "forecast product and brand trends". Text analysis vendor Lexalytics (http://www.lexalytics.com), on the other hand, worked with Cisco, where they "used a sentiment engine to determine which executives have the highest correlation to positively moving the stock price" (Grimes, 2008). The discovery of such "opinion leaders", they claim, helps companies discover their strengths. These services may also be helpful to a government intelligence agency. Monitoring communications for spikes in negative sentiment may be of use to agencies like Homeland Security. But besides companies and government agencies, general web users can benefit from sentiment-aware tools. There are several opinion-oriented search engines available online, such as Opinmind (http://www.opinmind.com). By pre-labeling web pages and blogs, these services provide a clustered view of the results, which enhances the user's understanding of them. The topics need not be restricted to product reviews - these can be political issues or opinions about candidates running for office (Pang and Lee, 2008). Finally, opinion discovery can be a useful subcomponent of another service. Recommender systems can greatly benefit from extracting user ratings from text.
Information retrieval systems can also use subjectivity measures when dealing with certain types of information, such as when objectivity is desired in a scientific literature search.

7 Open Research Directions

A relatively new field, in its brief history Sentiment Analysis has used natural language processing, data mining, and text retrieval tools to tackle the problem of extracting opinions from text. The initial attempts at solving this problem borrowed techniques from related areas of research: statistical methods have been used to track opinionated words, machine learning algorithms have been applied to labeled text to produce polarity classifiers, etc. But the complex nature of the task requires even more sophisticated approaches (perhaps a combination of known ones). Two major problems remain:

• A lot of studies have been done on controlled collections of text like movie or product reviews, but algorithms that work for these collections fail miserably in a more complex setting. Extracting sentiment from more discursive texts such as political commentary (Bansal et al., 2008; Thomas and B. Pang, 2006) or news articles (Koppel and Shtrimberg, 2004), where the general topic is known in advance, is still difficult.

• Sentiment is topic-specific. The meaning of words changes with context, and their polarity can even be reversed. The phrase "go read the book" would be a positive statement in a book review, but in a review of a movie it may suggest that the book is better than the movie, and thus have the opposite effect (Pang and Lee, 2008). General lexicons and algorithms must be adjusted and extended to accommodate each topic and its peculiarities.

As mentioned earlier, many Sentiment Analysis approaches are lexicon-based, and many use the well-known "bag of words" text representation that disregards lexical relationships between words.
Given the complexities of human language, these techniques can take us only so far. Gamon et al. (2008), for example, show several examples of political discussions involving news article links:

• "If you liked last term's Supreme Court, you're going to love the sequel" - negative sentiment towards a state of affairs, expressed in ironic disguise

• "Leftard policy at its finest. $100,000 a year and they're in public housing? [news link] I am shocked the Washington Post would even report it." - negative sentiment expressed towards a state of affairs as reported in the news link, and negative sentiment in ironic disguise towards the news provider

• "Taking a break from not getting anywhere on Iraq, Congress looks to not get anywhere on domestic issues for a little while. [news link]" - negative sentiment towards a state of affairs, with a news article cited in support

Because of the complex nature of sentiments, more sophisticated tools are needed to fully take advantage of the semantic information in text. Some work has been done in adapting tools developed for other tasks to Sentiment Analysis. A step above pure statistical analysis is lexical analysis, which includes part-of-speech tagging and phrase-structure trees. The structure of the text is often represented in the form of POS rules (Popescu and Etzioni, 2005) and tree templates (Kanayama et al., 2004). These are created manually, and may be supplemented by bootstrapping from text (Liu et al., 2005). A step above deep lexical analysis, we can use a knowledge base to get closer to the meaning of the text. Zhou and Chaovalit (2008) and Liu et al. (2003) have used knowledge bases to construct and supplement lexicons in the task of sentiment polarity classification. As it is time to turn to more sophisticated tools, it is also time to turn to more interesting texts.
Among the untapped emotionally-charged discussions are political commentaries, inter-personal communication, and a wide variety of online discussion forums. Sentiment Analysis can be used to address larger questions on topics like

• Hate Speech. How are different kinds of hate speech expressed? How does the "hate" lexicon get developed? Which documents use this kind of language?

• Online Bullying. When does correspondence become personal? Which blog posts, emails, or articles use combative or threatening language?

• Opinion Tracking. How do opinions spread in discussions? How is a sentiment adopted and changed from author to author? Is there an "opinion drift" (like a "topic drift")?

8 Conclusions

This paper describes the field of Sentiment Analysis and its latest developments. Bringing together researchers from computer science, data mining, text retrieval, and computational linguistics, this field provides ample opportunities for both quantitative and qualitative work. Tackling the blurry definition of sentiment and the complexity of its manifestation in text, it opens doors for novel uses of techniques already developed for data mining and text analysis, and brings up new questions, prompting the development of yet better tools. The Internet provides us with an unlimited source of the most diverse and opinionated text, and as yet only a small part of the existing domains has been explored. Much work has been done on product reviews - short documents that have a well-defined topic. More general writing, such as blog posts and web pages, has recently been receiving more attention. Still, the field is struggling with more complex texts like sophisticated political discussions and formal writings. Future work in expanding existing techniques to handle more linguistic and semantic patterns will surely be an attractive opportunity for researchers and business people alike.
A A selection of lists of "fundamental" or "basic" emotions

Source: The Cognitive Structure of Emotions by Ortony, Clore, and Collins. 1988.

References

Agrawal, R., Rajagopalan, S., Srikant, R., and Xu, Y. (2003). Mining newsgroups using networks arising from social behavior. Twelfth International World Wide Web Conference.

Airoldi, E. M., Bai, X., and Padman, R. (2006). Markov blankets and meta-heuristic search: Sentiment extraction from unstructured text. Lecture Notes in Computer Science, 3932:167–187.

Annett, M. and Kondrak, G. (2008). A comparison of sentiment analysis techniques: Polarizing movie blogs. Advances in Artificial Intelligence, 5032:25–35.

Bai, X., Padman, R., and Airoldi, E. (2005). On learning parsimonious models for extracting consumer opinions. Proceedings of the Hawaii International Conference on System Sciences.

Bansal, M., Cardie, C., and Lee, L. (2008). The power of negative thinking: Exploring label disagreement in the min-cut classification framework. Proceedings of the International Conference on Computational Linguistics (COLING).

Benamara, F., Cesarano, C., Picariello, A., Reforgiato, D., and Subrahmanian, V. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. Proceedings of the International Conference on Weblogs and Social Media (ICWSM).

Blitzer, J., Dredze, M., and Pereira, F. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447.

Chesley, P., Vincent, B., Xu, L., and Srihari, R. K. (2006). Using verbs and adjectives to automatically classify blog sentiment. Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.

Church, K. W. and Hanks, P. (1989). Word association norms, mutual information and lexicography.
Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pages 76–83.

Das, S. and Chen, M. (2001a). Yahoo! for Amazon: Extracting market sentiment from stock message boards. Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).

Das, S. and Chen, M. (2001b). Yahoo! for Amazon: Sentiment parsing from small talk on the web. Proceedings of the 8th Annual Conference of the Asia Pacific Finance Association (APFA).

Dave, K., Lawrence, S., and Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the World Wide Web Conference.

Ekman, P. (1993). Facial expression of emotion. American Psychologist, (48):384–392.

Esuli, A. and Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource for opinion mining. Proceedings of the 5th Conference on Language Resources and Evaluation (LREC).

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A., Shaked, T., Soderland, S., Weld, D., and Yates, A. (2005). Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence, 165(1):91–134.

Ferguson, P., O'Hare, N., Davy, M., Bermingham, A., Tattersall, S., Sheridan, P., Gurrin, C., and Smeaton, A. F. (2009). Exploring the use of paragraph-level annotations for sentiment analysis in financial blogs. 1st Workshop on Opinion Mining and Sentiment Analysis (WOMSA).

Gamon, M. (2004). Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis. Proceedings of the International Conference on Computational Linguistics (COLING).

Gamon, M., Aue, A., Corston-Oliver, S., and Ringger, E. (2005). Pulse: Mining customer opinions from free text. Proceedings of the 6th International Symposium on Intelligent Data Analysis.

Gamon, M., Basu, S., Belenko, D., Fisher, D., Hurst, M., and Konig, A. C. (2008).
Blews: Using blogs to provide context for news articles. Proceedings of the International Conference in Weblogs and Social Media. Gildea, D. and Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Liguist, 28(3):245–288. Godbole, N., Srinivasaiah, M., and Skiena, S. (2007). Large-scale sentiment analysis for news and blogs. Proceedings of the International Conference in Weblogs and Social Media. Grimes, S. (2008). Sentiment analysis: A focus on applications. Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting semantic orientation of adjectives. Proceedings of the 8th Conference of the European Chapter of the Association for Computational Linguistics. Hatzivassiloglou, V. and Wiebe, J. (2000). Effects of adjective orientation and gradability on sentence subjectivity. In Proceedings of the International Conference on Computational Linguistics (COLING). Hiroshi, K., Tetsuya, N., and Hideo, W. (2004). Deeper sentiment analysis using machine tranlation technology. Proceedings of the International Conference on Computational Linguistics (COLING). REFERENCES 31 Hou, F. and Li, G.-H. (2008). Mining chinese comparative sentences by semantic role labeling. Proceedings of the Seventh International Conference on Machine Learning and Cybernetics. Hu, M. and Liu, B. (2005). Mining and summarizing customer reviews. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Jindal, L. and Liu, B. (2006). Identifying comparative sentences in text documents. Proceedings of the 29th annual international ACM SIGIR conference on Research and Development in Information Retrieval. Jones, K. S. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28:11–21. Kaji, N. and Kitsuregawa, M. (2007). 
Building lexicon for sentiment analysis from massive collection of html documents building lexicon for sentiment analysis from massive collection of html documents. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Kanayama, H., Nisukawa, T., and Watanabe, H. (2004). Deeper sentiment analysis using machine translation technology. Proceedings of the International Conference on Computational Linguistics. Kennedy, A. and Inkpen, D. (2006). Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22:110–125. Kim, S.-M. and Hovy, E. (2004). Determining the sentiment of opinions. Proceedings of the 20th International Conference on Computational Linguistics. Kim, S.-M. and Hovy, E. (2006). Automatic identification of pro and con reasons in online reviews. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 483–490. Association for Computational Linguistics. Kim, S.-M. and Hovy, E. (2007). Crystal: Analyzing prediction opinions on the web. Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Koppel, M. and Shtrimberg, I. (2004). Good news or bad news? let the market decide. Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, pages 86–88. Kudo, T. and Matsumoto, Y. (2004). A boosting algorithm for classification of semistructured text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). REFERENCES 32 Landauer, T. K. and Dumais, S. T. (1997). A solution to plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychology Review. Liu, B. (2006). Web Data Mining, chapter Opinion Mining. Springer. Liu, B., Hsu, W., and Ma, Y. (1998). Integrating classification and association rule mining. 
Proceedings of the Conference on Knowledge Discovery and Data Mining. Liu, B., Hu, M., and Cheng, J. (2005). Opinion observer: analyzing and comparing opinions on the web. Proceedings of the Internation Conference on World Wide Web. Liu, H., Lieberman, H., and Selker, T. (2003). A model of textual affect sensing using realworld knowledge. Proceedings of the Seventh International Conference on Intelligent User Interfaces, pages 125–132. Macdonald, C., Ounis, I., and Soboroff, I. (2007). Overview of the trec-2007 blog track. Proceedings of the Sixteenth Text REtrieval Conference (TREC 2007). Matsumoto, S., Takamura, H., and Okumara, M. (2005). Sentiment classification using word sub-sequences and dependency sub-trees. Proceedings of PAKDD’05, the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Melville, P., Gryc, W., and Lawrence, R. D. (2009). Sentiment analysis of blogs by combining lexical knowledge with text classification. Proceedings of the Conference on Knowledge Discovery and Data Mining 2009. Morinaga, S., Yamanishi, K., Tateishi, K., and Fukushima, T. (2002). Mining product reputations on the web. Proceedings of the 8th ACM SIGKDD international Conference on Knowledge Discovery and Data Mining. Mullen, T. and Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 412–418. Na, J.-C., Sui, H., Khoo, C., Chan, S., and Zhou, Y. (2004). Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews. Conference of the International Society of Knowledge Organization (ISKO), pages 49–54. Nigam, K. and Hurst, M. (2004). Towards a robust metric of opinion. The AAAI Spring Symposium on Exploring Attitude and Affect in Text. Nigam, K., Lafferty, J., and McCallum, A. (1999). Using maximum entropy for text classification. 
Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67. REFERENCES 33 Ortony, A., Clore, G., and Collins, A. (1988). The Cognitive Structure of Emotions. Cambridge University Press. Pang, B. and Lee, L. (2002). Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 10:79–86. Pang, B. and Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proceedings of the annual meeting for the Association of Computational Linguists. Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Foundation and Trends in Information Retrieval, 2(1-2):1–135. Popescu, A.-M. and Etzioni, O. (2005). Extracting product features and opinions from reviews. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Prinz, J. (2004). Gut Reactions: A Perceptual Theory of Emotion. Oxford University Press. R. Quirk, S. Greenbaum, G. L. and Svartvik, J. (1985). A comprehensive grammar of the English language. Longman. Snyder, B. and Barzilay, R. (2007). Multiple aspect ranking using the good grief algorithm. Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), pages 300–307. Strapparava, C. and Vlitutti, A. (2004). Wordnet-affect: and affective extension of wordnet. In Proceedings of the 4th International Conference on Language Resources and Evaluation. Subasic, P. and Huettner, A. (2001). Affect analysis of text using fuzzy semantic typing. IEEE-FS, (9):483–496. Tan, S., Cheng, Z., Wang, Y., and Xu, H. (2009). Adapting naive bayes to domain adaptation for sentiment analysis. Advances in Information Retrieval, 5478:337–349. Thomas, M. and B. Pang, L. L. (2006). Get out the vote: Determining support or opposition from congressional floor-debate transcripts. 
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 327–335. Turney, P. (2002). Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. Proceedings of the Association for Computational Linguistics (ACL), pages 417–424. REFERENCES 34 Turney, P. D. and Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4):315–346. Voorhees, E. M. and Buckland, L. P. (2007). Uic at trec 2007 blog track. Proceedings of the Sixteenth Text REtrieval Conference (TREC 2007). Whitelaw, C., Garg, N., and Argamon, S. (2005). Using appraisal groups for sentiment analysis. Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), pages 625–631. Wiebe, J., Bruce, R., and O’Hara, T. (1999). Development and use of a gold standard data set for subjectivity classifications. Proceedings of the 37th Annual Meeting of the Association for Computational Linguists (ACL-99), pages 246–253. Wiebe, J. M. (1994). Tracking point of view in narrative. Computational Linguistics, 20:233–287. Wiebe, J. M., Wilson, T., Bruce, R., Bell, M., and Martin, M. (2004). Learning subjective language. Computational Linguistics, 30:277–308. Wilks, Y. and Stevenson, M. (1998). The grammar of sense: Using part-of-speech tags as a fest step in semantic disambiguation. Journal of Natural Language Engineering, 4:135–144. Wilson, T., Wiebe, J., and Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. Proceedings of HLT-EMNLP. Wilson, T., Wiebe, J., and Hoffmann, P. (2009). Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis. Computational Linguistics, 35(5):399–433. Yi, J., Nasukawa, T., Bunescu, R., and Niblack, W. (2003). Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. 
Proceedings of the 3rd IEEE International Conference on Data Mining. Yu, H. and Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Zhou, L. and Chaovalit, P. (2008). Ontology-supported polarity mining. Journal of the American Society for Information Science and Technology, 69:98–110.