Customer E-Mail Categorization and Topic Modeling

Volume 3, Issue 12, December – 2018 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
Customer E-mail Categorization and Topic Modeling

Eshita Gangwar 1, Rutvi Sutaria2
Abstract:- Customer service centers are most crucial part and dislikes of customers about the company. It helps
of any company. They represent the company and companies evaluate the social sentiment of their
communicate with the customers on its behalf. These brand.Sentiment Analysis is crucial and should be
centers also impart valuable information through treatedseriously as customer feedback contains a plethora of
customer feedback. As they form the bridge between the useful information which if properly harnesses can help
company and its customers, it is important they convey companies achieve next to impossible goals. But, just
information timely and effectively. Emails are the most knowing what customers are talking about it is not enough.
popular means of business communication. In order to Companies must also understand how they feel. Sentiment
efficiently and effectively utilize time, the customer analysis is one way to discover these feelings. The customer
service centers need to extract the relevant information service center employees can thus determine the sentiment of
from these emails. Then they need organizing it and the email and prioritize or respond appropriately.
promptly respond to the customers accordingly. In this
paper, we propose categorizing emails based on the Massive amounts of data are collected on daily basis in
customer reviews. The polarity (positive or negative) of so many companies. The huge amount of information makes
the customer emails is determined along with its it difficult to understand the data or to find what we are
probability and also determine the topic of the email. searching. We have to employ different methods to organize
After preliminary data pre-processing, we use the Naive- the data systematically and extract relevant information from
Bayes algorithm based approach to classify the emails it.Topic modeling is one such powerful technique which
and then the topic modeling is performed. This way, the allows us to not only organize but also summarize large
project helps to classify emails and determine their amount of textual data. It is helpful for determining different
subjects to save valuable time for the employees. topical patterns present in the dataset which otherwise remain
invisible.Topic modelling is a method for finding subjects
Keywords:- Emails, Customer Care, Categorization, Topic (i.e. topics) from a data that best describe the content present
Modeling. in the document. There are many methods which can be used
to obtain topic models. For our project, we will be using Bag
I. INTRODUCTION of Words technique to find the topics of the emails at the
customer service centers. This will help the employees to
Emails are convenient and the most popular means of summarize the content of the emails.
communication for customers. They help in avoiding long
waiting hours for telephonic conversation as well as in Today, the algorithm-based sentiment analysis tools can
preserving the record of the communication that is happening handle huge volumes of customer feedback consistently and
between the customer service center and the customer. It is accurately. Paired with topic modeling, sentiment analysis
crucial that these emails be properly organized at the reveals the customers opinion about topics regarding
customer service so as to spontaneously and appropriately different products and services. In our research, we provide
reply. Also, if properly utilized, these emails can yield with various features for extraction for Naive Bayes.
invaluable data for the companies to access the needs and
demands of the customers.As technology advances, the Algorithm based classifier for the project and the
customers have more interactive and dynamic relation with subject of the email through topic modeling. This
the company. The amount of the emails received by classification helps the customer service employees to
customers at the customer service center is increasing prioritize emails. It also assists them to assess the customer
rapidly. Handling this enormous amount of email manually responses to particular products and formulate market
can be a very time-consuming and complex task. The strategies in accordance. Furthermore, the customers inputs
problem can be solved if emails are automatically classified are useful as they inadvertently provide the businesses with
on the basis of their content. insightful information regarding the current trends and
requires of customers.
Sentiment Analysis, also known as opinion mining
sometimes, is a method to assess written or spoken language
to determine whether the expression is a positive, negative,or
neutral, and to what extent. It is also the most common
classification tool for textual analysis.Sentiment analysis is
important because it helps businesses understand the likes
IJISRT18DC224 www.ijisrt.com 706

ISSN No:-2456-2165
II. PREVIOUS WORK collected dataset. At last, they also provide insights into
future work to be conducted on sentiment analysis.They
A. Polarity Categorization on Product Reviews initially propose an algorithm. This algorithm is executed
The focus of the research in this paper was to tackle the later for identifying negation phrases. This is followed by
problem of sentiment polarity categorization. The dataset is implementation of mathematical approach which helps in
collected from amazon.com. The dataset contains 376 computing the sentiment score on the product reviews from
instances of reviews of Nokia mobile in the form of a text the dataset.Later, a method for creating a feature vector is
file. Two classification algorithms namely Nave Bayes and shown for categorization of sentiment polarity.After this
Support Vector Machine Algorithms are taken to classify the initial implementation of algorithm, two experiments are
reviews as positive, negative or neutral. conducted to categorize sentiment polarity. They are based
on two levels: sentence level and review level. Performance
B. Approaches, Tools and Applications for Sentiment of all the three different classifiers are compared to one
Analysis Implementation another and evaluated based on their corresponding
In this paper,Andrea et al. (2015) describe the different experimental outputs.The three different classifiers used in
approaches and tools available for Sentiment Analysis.Their this paper for their research are Naive Bayesian classifier, the
elaborate on different approaches that can be used for Random Forest classifier and the Supervised Vector Machine
analyzing sentiment along with the different types of features classifier.The experiments are conducted on these classifier
and techniques associated with these and also the and their results are compared with one another to understand
corresponding advantages and disadvantages associated with the classification better.
these different types of sentiment analyzing approaches.They
also give an overview of different tools used for Sentiment D. Emotion Detection in Email Customer Care
Analysis over a period of time with respect to the different The paper shows how to extract salient features and
methods which can be used. Furthermore, they describe identify emotion in emails at customer service centers. these
different fields where sentiment analysis can be applied. features show customer anger, dissatisfaction with the
Some of these areas are politics, finance, public actions and business, and warnings like take legal action, report to higher
business and many other fields as well.The sentiments can be authorities or to leave.
classified as positive or negative depending upon the content
of the document. In their research work, the authors of this E. Using Text Mining for Automated Customer Inquiry
paper define three different levels of classification of Classification
sentiment. They suggest that the sentiments in a text can be This paper has illustrated the use of customer inquiry
classified as the document level, the sentence level and the classification and intention analysis on customer inquiries,
aspect level.The paper also explains the different approaches primarily in the form of unstructured multi-lingual text data.
as one of the major task of the project, like Machine learning Inquiry classification helps isolate inquiries related to the
approach, lexicon based approach and a hybrid of these both. product. Intention analysis isolates
A detailed study is done on these and their advantages and sales/procurement/licensing related queries within these. The
limitations are discussed. Also different features and paper has demonstrated an automated way of such analysis
techniques related to each of the approach is described in this resulting in significant savings of manual efforts, without
work.It also describes in detail the tools used for analysis, as compromising on the accuracy of the analysis.
the other major part of their research work, such as
EMOTICONS, LIWC, SentiStrength, SentiWordNet, III. APPROACH
SenticNet and Happiness Index The paper basically classified
Sentiment Analysis depending upon different approaches and In our work, the email can be viewed as a set of
tools. sentences. The purpose of our project is to help find the topic
of the email and also determine the polarity of the email.For
C. Sentiment analysis using product review data our project, we are using a dataset from customer service
The aim of this paper is to tackle one of the centers for four different products. The emails contain
fundamental problems of sentiment analysis.Fang and Zhan customer feedback on an MP3 player, a camera, a mobile
(2015) aim to find a solution for one of fundamental phone and a router.The first procedure is of topic
problems of sentiment polarity categorization.A process of determination. In this module, the email dataset isconverted
sentiment polarity categorization is outlined in this paper into a form which is easy to understand for finding the
along with which a detailed descriptions of the different probable topic. Thus data pre-processing is performed.Firstly,
processes involved at different stages is also explained.They we will employ the technique known as the tokenization. The
use a dataset in this study which contains product reviews technique we are using in our work is called the word
available from Amazon.com. This dataset is available online tokenizer. All the tokens that have been extracted may not be
and is provided by the company itself to be used in general useful for our classification. Therefore the unnecessary
for research.For their work, they conduct experiments for words,known as the stop words, are eliminated using the
both sentence-level and review-level categorization on the technique known as stop words removal.The remaining

ISSN No:-2456-2165
tokens are subjected to the process of stemming. Stemming B. Bag of Word
attempts to reduce a word to its root form. Therefore, the Topic Modeling is a Machine Learning technique used
document is then represented in the form of root words rather in NLP Natural Language Processing(NLP) where the
than the original words. Other method called Part of speech ”topic” or the subject of a set of documents is determined.
tagging is then applied. POS tagging associates a word in the BOW is a type of topic model that uses frequency to show
test to the corresponding part of speech. The topic is then why certain parts of data occur repeatedly. In BOW each
determined using Bag of Words technique.Remaining tokens email document is assumed to be composed of a set of
after pre-processing can express a negative or positive different words. The topics are determined from the content
opinion of the customers. For example, upset represent of the emails in the corpus. These words are used as features
negative emotion while satisfaction demonstrates positive which are then used for training. These topics then produce
opinion.This is then fed into the Naive Bayes classifier in the the subjects. BOW identifies topics based on occurrence of
Sentiment Analysis module.Naive Bayes classifier is trained different terms which are automatically determined. It
using a feature set which then determines the polarity of basically gives different topics that would represent the
emails from the incoming preprocessed datasets as either document.
positive or negative.
V. DATASET
The dataset contains around 200 emails. These emails

are comprised of four different datasets.The first dataset
contains emails which contain customers review for an MP3
player.The second dataset belongs to a camera manufacturing
company. The dataset contains reviews regarding a particular
camera. The third dataset consists of emails collecte from the
customer service center of a mobile company for a particular
mobile phone.The final dataset contains reviews for a router.
VI. EXPERIMENT
A. Sentiment Module
Fig 1:- Overview of the process This module helps to determine the polarity of the
incoming customer emails with the help of Naive Bayes
IV. ALGORITHM classifier.Firstly, we need to find a method to train our
Naive-Bayes classifier to classify our mails as positive or
A. Naive Bayes negative. For this purpose, we use the words in the positive
Naive Bayes algorithm is one of the Bayesian theorems and negative reviews as their features. The classifier, after
and a supervised machine learning technique. It can be used training, has thus learned to associate the words from positive
for binary as well as multiclass classification problems. It is reviews as features for positive sentiment and similarly those
named Naive because it simplifies the probabilities for all from negative reviews as features for negative sentiment.The
hypothesis to make their corresponding calculation easy. The feature set, which contains the features extracted after
algorithm works by finding out the probability of the training the classifier, is then loaded onto the module.The
different attributes of data being associated with the certain Naive Bayes algorithm based classifier is then used to
class. Naive Bayes classifier works under the assumption that determine the sentiment of the emails based on the content
the presence of one feature in a class is independent of the from the customers.The function sentiment() is then defined
presence of other features. The classier selects the most likely where the new text is finally classified.
classification for a given set of the attribute values.
B. Topic Module
This module contains two distinct functions; clean()
and topic().In the clean function, the data is pre-processed so
that only relevant words are retained for topic determination.
Pre-processing involves processes such as tokenization stop-
 P(c—x) is the posterior probability of class(target) given words removal, stemming and POS tagging. After the
predictor(attribute). cleaning of the dataset, it is now ready for determination of
• P(c) is the prior probability of class. topics and is thus sent to the topic function The topic
• P(x—c) is the likelihood which is the probability of function, which uses the Bag of Words approach, is then used
predictor given class. to choose the most appropriate topic for the corresponding
• P(x) is the prior probability of predictor. emails.

ISSN No:-2456-2165
VII. RESULT overview of the content of the incoming customer emails and
organize them according to their needs and/or interests. It
In our project we analyzed on four different datasets (1. also helps to reduce their corresponding response time and to
MP3 Player, 2. Camera, 3. Mobile phone, 4. Router). The analyse the customer feedback on a product or its parts.
analysis module 1 displayed the polarity of the emails and the
corresponding suggested topics by the Bag of Words (BOW) FUTURE WORKS
technique. It also showed individual pie-charts for each of the
datasets. In most charts, the amount of positive customer The work can be extended by adding some functionality
review is lesser than the negative customer reviews. The for the user, such as an interface, so the complaints can be
topics show a trend with certain topics occurring repeatedly arranged according to the priority. The paper can also help in
as compared to others.The topics show a trend with certain further categorization depending on the departments within
topics occurring repeatedly as compared to others. The the companies. The emails at the customer service centers
similar topics are mostly associated with the same sentiment. can be handled automatically. For this, templates can be
(For example,in dataset 3, the topics battery and charger created which will help to respond to distinctive types of
occur repeatedly and all are associated with negative emails appropriately. Also, in future more data analysis can
sentiment.) be performed on the data derived from customer service
centers as they can serve as data sources to analyze customer
needs and future trends.
REFERENCES
[1]. Alghamdi, R. and Alfalqi, K. (2015). A survey of topic

modeling in text mining. International Journal of
Advanced Computer Science and Applications, 6,
147153.
[2]. Andrea, A., Ferri, F., Grifoni, P., and Guzzo, T. (2015).
Approaches, tools and applications for sentiment
analysis implementation. International Journal of
Computer Applications, 125.
[3]. Fang, X. and Zhan, J. (2015). Approaches, tools and
applications for sentiment analysis implementation.
Journal of Big Data.
[4]. Gupta, N., Gilbert, M., Fabbrizio, G. D., Wu, Z., and
Fig 2:- Comparison graph of all Dataset Serker, N. H. M. K. (2009). Emotion detection in email
customer care. Proceedings of the NAACL HLT 2010
Workshop on Computational Approaches to Analysis
and Generation of Emotion in Text, 1016.
[5]. Jetley, R. P., Gugaliya, J. K., and Javed, S. (2015). Using
text mining for automated customer inquiry
classification. Journal of Vibration and Acoustics, 122,
Table 1:- Polarity for each Dataset 4651.
[6]. Joty, S., Carenini, G., Murray, G., and Ng, R. (2009).
VIII. CONCLUSIONS Finding topics in emails: Is lda enough?. NIPS-2009
workshop on applications for topic models: text and
The research paper focuses on quick and efficient email beyond.
classification at customer service centers. The emails are [7]. Mixymol, V. (2017). Polarity categorization on product
classified as positive or negative with the use of Naive-Bayes reviews. SInternational Journal on Recent and
algorithm based classifier. Also, the subject of the email is Innovation Trends in Computing and Communication, 5,
determined through topic modeling using LDA technique. 2831.
Ultimately, the outcome will display the polarity of the email [8]. Subramanian, S. K. and Ramaraj, N. (2007). Automated
with the probability and suggested topics with their classification of customer emails via association rule
probability count. The topics show a trend with certain topics mining. Information Technology Journal, 6, 567572.
occurring repeatedly as compared to others. The similar [9]. Xie, P. and P.Xing, E. (2013). Integrating document
topics are mostly associated with the same sentiment. (For clustering and topic modeling. Twenty-Ninth Conference
example, in dataset 1, the topics volume and sound occur on Uncertainty in Artificial Intelligence, 29.
repeatedly and all are associated with negative sentiment.)
Our work will help the customer service employees to get an

Customer E-Mail Categorization and Topic Modeling

Uploaded by

Document Information

Original Title

Copyright

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Customer E-Mail Categorization and Topic Modeling

Uploaded by

Copyright:

Volume 3, Issue 12, December – 2018 International Journal of Innovative Science and Research Technology

Customer E-mail Categorization and Topic Modeling

IJISRT18DC224 www.ijisrt.com 706

IJISRT18DC224 www.ijisrt.com 707

The dataset contains around 200 emails. These emails

IJISRT18DC224 www.ijisrt.com 708

[1]. Alghamdi, R. and Alfalqi, K. (2015). A survey of topic

IJISRT18DC224 www.ijisrt.com 709

You might also like