Blog Comments Organizer
An Interface for Organizing News Comments
Sweta Vajjhala, Nicholas Diakopoulos, Irfan Essa
Georgia Institute of Technology | College of Computing
801 Atlantic Drive, Atlanta, GA 30332
sweta@gatech.edu, nad@cc.gatech.edu, irfan@cc.gatech.edu

ABSTRACT
This paper focuses on the organization of comments on a particular blog
post. To our knowledge, this research is the first of its kind. Background
research was done in the field of computational journalism and its relation
to the blogosphere, in addition to research into the categorization of blog
posts. Several design ideas were then considered for ways to organize blog
comments. The deciding factor was whether or not quotes from the post were
used in the comment. A specific algorithm was used to determine this, and
the design was then applied to the actual blog post itself. Results indicate
that this could be a successful application for news blogs in general,
should it be adapted to each website accordingly.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Graphical user interfaces; H.5.3 [Group and
Organization Interfaces]: Collaborative computing

General Terms
Design, Human Factors

Keywords
design, computational journalism, blog, news, articles

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise,
or republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.

1. INTRODUCTION
There have been many advances in technology that have helped organize the
information that is on the Internet. One field, called computational
journalism, is specifically concerned with finding new ways to organize
media information via technical advancements.

Since the emergence of Web 2.0, interactive media has become very popular.
Not only has it allowed for sharing information across the world, but it has
created an environment that encourages collaboration around media articles.
These collaborations have formed millions of communities on the Internet.
News articles, in the form of blogs, have become very popular, allowing
readers to become contributors [1] and express their opinions.

Although there has been some research on the organization of media articles,
little has been done to organize readers’ comments on these articles. This
project focuses on that new aspect of computational journalism with the
creation of a blog comments organizer.

2. BACKGROUND
There can be many different ways to organize an article’s comments. Today,
the Internet has become the largest medium in the world for reading about
news and interactively discussing it. Not only has the number of readers
increased, but the number of blogs overall, especially news blogs, has
drastically increased [1]. Baumer et al. state that readers now have the
mentality of: “I know what’s there and I know where to find it when I need
it.” With this mindset, readers are able to read about any type of news
article that they wish. With news blogs becoming increasingly popular,
readers are slowly taking on the role of contributors as well, by posting
comments to their favorite blogs.

With the variety of news articles and comments that are posted, blogging has
become a multi-faceted and heterogeneous activity. Articles in news blogs
today are often organized into different categories. In addition, people can
add their own tags, which are collections of keywords attached to blog
entries that help describe what the entries are about [2].

Brooks & Montanez analyzed the effectiveness of tags for classifying blog
entries. Their results indicate that tags are useful for grouping articles
into broad categories, but less effective in indicating the particular
content of an article. However, the idea of sharing tags could potentially
be applied to help organize the comments, based on the text of each comment.
There are three main uses of tagging: annotating information for personal
use, placing information into broadly defined categories, and annotating
particular articles so as to describe their content [2]. Each of these uses
could also be applied to the comments on the blog post.

One problem that comes with tags is trying to identify appropriate tags
while eliminating noise and spam [3]. Another problem is that several
different tags might all be used to describe the same concept, and this
duplication creates extra clutter [2]. A similar problem needs to be
addressed in the organization of blog comments: which comments are useful to
readers, and which ones are spam or irrelevant to the topic of the post? One
solution is to automatically generate content-based tags, while also
considering when the tag was originally created. Comments, for their part,
could be organized in reverse chronological order, with the most recent
comments showing up first and the oldest ones showing up last.
After finding a way to organize the blog comments, the last thing to do is
to find a way to collect and organize the blog articles and their comments.
The online public nature of blogs provides incredible resources for data
mining. Kramer and Rodden state that, after collecting a variety of blogs,
they used clustering to group the blogs into categories based on five
different factors: melancholy, social, ranting, metaphysical, and work. They
found that blog articles are difficult to group into categories, because the
blogging community is so heterogeneous that a blog does not cleanly fit into
any single category [4]. Comments on blogs are comparable: since many
different discussions can be happening within the comments, it could be very
difficult to place the comments into one category objectively.

In the following sections, the design, algorithm, and evaluation of the
system will be presented, concluding with a discussion of the results and
future work.

3. BLOG COMMENTS ORGANIZER
3.1 Data Mining
The data that was used to implement the blog comments organizer was pulled
from Dot Earth, an environmental blog written by Andrew Revkin of The New
York Times. On average, each of his articles tends to generate over 80
comments. Because of the popularity of the blog and the variety of comments,
data from this blog was used in the testing of the blog comments organizer.

Five articles were randomly chosen to undergo an analysis by hand. During
this analysis, information and statistics about the set of comments
corresponding to each article were collected. The information that was
collected included the number of comments in each of the following groups:
comments that were multiple paragraphs long, comments that used quotes from
the article, comments that used statistics (or some other numbers) to
support their point of view, comments that referenced other related
articles, comments that were a response to a previous comment, and comments
that used the same key words (e.g. “history” or “future” or “evolution”).
Finally, the number of posts per day was recorded.

Out of the data that was collected above, the count that seemed to be the
highest was the number of comments that used quotes from the article. As a
result, it was decided that the best way to organize the comments for this
blog would be to show users a list of comments for each part of the article
that was used in a quote.

3.2 Design
The design for the blog comments organizer was done first with some
sketches. It was then implemented using PHP, HTML, JavaScript, and
Greasemonkey.

The blog comments organizer can be easily integrated into the Dot Earth
page. For each article, it highlights the parts of the article that are
quoted in a comment. When a user then moves the mouse over the highlighted
part of the article, the comment(s) that reference(s) it will show up at the
top of the page in reverse chronological order, so that the most recent
comment will show up first. A sketch of this design can be seen in Figure 1.
When the blog comments organizer is implemented, its style will match that
of the Dot Earth page, so the integration of the application will seem
transparent to the user.

Figure 1. Sketch of the blog comments organizer design. When the user moves
the mouse over the yellow highlighted text, the box at the top will show up.
When the mouse is no longer over the highlighted text, the box will
disappear.

The rationale for this design choice is supported by the fact that the data
mining showed that quotes were very often used in the comments of the Dot
Earth blog. The blog comments organizer would be a great tool for new
readers to quickly get acquainted with the traditional posting style of
contributors to the Dot Earth blog. Moreover, the blog comments organizer
offers a way for readers to find out more information on a specific part of
the article without having to read all of the 100+ comments. It gives the
reader the advantage of being able to read only the comments that he/she is
interested in, based on the parts of the article that the reader liked.

3.3 Data Collection
In order to collect the data from Dot Earth, a blog scraper script was
written in PHP5. The scraper script gets the 60 most recent articles in the
Dot Earth blog and places them into a MySQL database. For each article, the
scraper also gets all of the comments and places those in the database too.
The schema for the database is as follows: the article is linked to each of
its comments using the field articleID.

Figure 2. Schema for the database that stores all of the articles and
comments.

3.3.1 Algorithm for Gathering Data
The algorithm for gathering all of the articles and respective comments is
given here.

First connect to the Dot Earth homepage and get its HTML source. Inside the
source, look for the title of each news article based on the corresponding
HTML tags. For each of the articles, look for the corresponding HTML tags
for the comments. Read the text between all of the open and close HTML tags
for each article and its comments. Insert all of this information into a
database with the schema above. In order to get articles across multiple
pages, loop through the same process, after finding the corresponding HTML
tags for each page.
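
The gathering steps above can be sketched in a few lines. This is only an
illustration, not the project's scraper: it uses Python's standard library
instead of PHP5, an in-memory SQLite database instead of MySQL, and a
hard-coded sample page instead of a live request to Dot Earth. The markup in
SAMPLE_PAGE, the class names, the function names, and every column other than
articleID, quote_index_start, and quote_index_end are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup; the real Dot Earth page structure is not given in the paper.
SAMPLE_PAGE = """
<div class="article">
  <h2 class="title">Ice and Oceans</h2>
  <div class="body">Sea ice is thinning. Policy lags behind.</div>
  <div class="comment">I agree that "Sea ice is thinning" matters most.</div>
  <div class="comment">Nice post.</div>
</div>
<div class="article">
  <h2 class="title">Energy Choices</h2>
  <div class="body">Coal remains cheap.</div>
</div>
"""


class BlogPageParser(HTMLParser):
    """Reads the text between the open and close tags of titles, bodies and comments."""

    FIELDS = {"title", "body", "comment"}

    def __init__(self):
        super().__init__()
        self.articles = []   # one dict per article: title, body, comments
        self._field = None   # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "article":
            self.articles.append({"title": "", "body": "", "comments": []})
        elif css_class in self.FIELDS and self.articles:
            self._field = css_class
            if css_class == "comment":
                self.articles[-1]["comments"].append("")

    def handle_endtag(self, tag):
        # Simplification: fields hold plain text only, so any close tag ends the field.
        self._field = None

    def handle_data(self, data):
        if self._field is None:
            return
        article = self.articles[-1]
        if self._field == "comment":
            article["comments"][-1] += data
        else:
            article[self._field] += data


def store(articles, conn):
    """Insert articles and comments, linking each comment through articleID."""
    conn.execute(
        "CREATE TABLE articles (articleID INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE comments (commentID INTEGER PRIMARY KEY, articleID INTEGER, "
        "comment_text TEXT, quote_index_start INTEGER, quote_index_end INTEGER)"
    )
    for article in articles:
        cur = conn.execute(
            "INSERT INTO articles (title, body) VALUES (?, ?)",
            (article["title"], article["body"]),
        )
        for text in article["comments"]:
            conn.execute(
                "INSERT INTO comments (articleID, comment_text) VALUES (?, ?)",
                (cur.lastrowid, text),
            )


def scrape(pages, conn):
    """Run the same parsing process over every page, then store the result."""
    parser = BlogPageParser()
    for page_html in pages:
        parser.feed(page_html)
    store(parser.articles, conn)


conn = sqlite3.connect(":memory:")
scrape([SAMPLE_PAGE], conn)
rows = conn.execute(
    "SELECT a.title, COUNT(c.commentID) FROM articles a "
    "LEFT JOIN comments c ON c.articleID = a.articleID "
    "GROUP BY a.articleID ORDER BY a.articleID"
).fetchall()
print(rows)
```

The two quote index columns are left empty at this stage; they are filled in
by the quote search described in Section 3.4.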
Soon after this research was done, an API was introduced for Dot Earth. In
the future, it might be easier to collect all of the data via the API.
However, this would also mean that the information would be stored in an XML
file, not in a database, and this could make it harder to find quotes in the
comments.

3.4 Finding Quotes in Comments
Once the articles and comments are in the database, the next step is to go
through all of the comments for each given article and see if they contain
quotes from the article.

First, check the comment for opening quote (“) symbols and closing quote (”)
symbols. If such a pair exists, then see if the text between the two quotes
matches any phrase from the article. It is important to make sure that the
quoted text is not a link to an external page, because such text would match
the links to external pages in the article. Therefore, this case must be
excluded when checking for quotes in the comments. If a quote in the comment
matches text from the article, then the starting index of the text in the
article should be stored in quote_index_start in the comments database
table. The end of the quote should be stored in quote_index_end.

3.4.1 Algorithm for Finding Quotes
The algorithm for finding quotes from the article within a comment is below.

          for each article in the database:
            get all comments for that article

            for each comment:
              search_start = 0
              ending_quote_exists = true
              as long as ending_quote_exists is true:
                go through the text from search_start and find the
                opening quote

                if there is no opening quote:
                  exit loop by setting ending_quote_exists to false

                if there is an opening quote:
                  search for the ending quote, starting from the
                  opening quote
                  search for the text between the starting and
                  ending quotes in the article
                  if the text exists:
                    store quote_index_start and quote_index_end
                    in the database for that comment
                  else:
                    do not store anything and exit

Using the algorithm above, quotes from the article were found, and the
indices at which they were found were stored in the database.

Once the quote indices were known, another script was written to insert
<div></div> tags around the quotes in the article text, highlighting those
parts of the article. JavaScript was then used to trigger a mouse-over
event, so that if readers put their mouse over the highlighted part of the
article text, the list of comments that contained that part of the article
as a quote would show up on the right-hand side, as was shown in Figure 1
above.

4. EVALUATION
Feedback from volunteer testers revealed some advantages and disadvantages
of the blog comments organizer. First and foremost, although the design is
integrated nicely into this particular blog (Dot Earth), it would require a
lot of customization for each additional blog on which it was used. This is
because each blog will have a different style, and therefore the scraping
will have to be done all over again. However, the actual algorithm that is
used to find the quotes would still be the same. Displaying the blog
comments organizer for each blog would again differ, based on the style of
the blog. However, the algorithm for inserting the <div></div> tags would
still remain the same, once the source code of the other sites was figured
out.

One disadvantage of this blog comments organizer is that the algorithm
searches for the start and end quote characters. However, a comment might
contain text from the article that is paraphrased or presented without
quotation marks. In that case, the presented algorithm would not identify it
as a quote, because it is not located within quotation marks. Detecting such
cases would give the user more comments to see in the blog comments
organizer. However, detecting paraphrasing would also require changing the
fundamental algorithm to use some artificial intelligence techniques, in
addition to what it is already doing while searching the article text.

One major advantage of this design is that the user is given a choice of
whether he or she wants to read the comments. Since the comments show up on
a mouse-over event, a user who does not want to use the feature after the
first time will not see all of the different comments show up. Moreover, the
comments are placed strategically towards the right-hand side of the page,
where there is whitespace. This way, the comment box does not cover up any
possibly important information on the page. The blog comments organizer acts
as a supplement that makes it easier for the reader to find the comments
that he or she may be looking for, but it does not require the user to use
it.

For example, someone who just wanted to browse the Dot Earth blog and get an
idea of the contributors might want to browse all of the comments, not just
those that pertain to certain parts of the text of the article. In this
case, the user does not have to use the blog comments organizer. However, if
this user becomes a frequent visitor and contributor to the Dot Earth blog,
he or she may start to look for specific comments which pertain to parts of
an article that he or she likes. In this case, the user would find the blog
comments organizer an ideal tool to get the information needed without
having to go through hundreds of comments.

5. DISCUSSION
Because the blog comments organizer can be used in a variety of ways, there
are many different ways that this tool can be useful. Namely, it focuses on
the growing field of computational journalism: organization of media. There
are many different news sites that would benefit from organization of their
reader comments, and this tool would be well suited to that purpose.

In the Evaluation section above, there was an example of the user who just
wanted to find information about a specific part of the article. This blog
comments organizer could be useful for data
analysts in the media profession. Based on a posted news article, the author
or the company that posted it can find out which parts of the article
triggered the most comments. Based on this, the company could post more
articles that pertain to very similar topics. This would attract new users,
as well as retain the current users.

The blog comments organizer could revolutionize the way that articles are
written and read. Based on the popularity of a certain part of an article,
blogs can be tailored to suit the majority of their readers. This would
introduce a new level of specificity for the blog. If many blogs were to
follow this approach and focus on specific topics, it might make blogs
easier to categorize and make tags more universal.

6. FUTURE WORK
There are many different applications and related work that could be done
based on the blog comments organizer.

First and foremost, a different metric could be used to organize comments.
Right now, only quotes are being used, but blog comments could also be
grouped based on the themes of the posts (e.g. history, evolution, etc.) or
on whether they use statistics. Blogs have become a source for data mining,
and if users are looking for certain quotes or numbers and comments contain
those statistics, this would be very useful for the user.

The blog comments organizer could also be used to analyze different types of
media. Right now, only written blogs are being analyzed. However, video
blogs are slowly becoming more popular, so being able to find comments that
quoted parts of a video in a blog post would also prove to be very useful.

7. CONCLUSION
It is possible to organize blog comments in a plethora of different ways.
Depending on the medium and the type of blog that is being used, there could
be a number of ways to analyze and organize the comments in a meaningful way
for the users that come by. Organization of blog comments will soon become a
very powerful tool that can be used to target the type of users that the
blog is tailored towards.

While there are many different ways to organize comments, and using quotes
(as in this particular blog comments organizer) is just one, it is important
to realize that this growing field could soon redefine the way that media is
presented to the world.

8. ACKNOWLEDGMENTS
Many thanks to all volunteer evaluators, especially Sekhar Vajjhala,
Carolina Gomez, Blair Daly, and Nicholas Bowen.

9. REFERENCES
[1] Baumer, Eric, Mark Sueyoshi, and Bill Tomlinson. "Exploring the Role of
    the Reader in the Activity of Blogging." CHI 2008 (2008): 1111-20.
[2] Brooks, Christopher H., and Nancy Montanez. "An Analysis of the
    Effectiveness of Tagging in Blogs." American Association for Artificial
    Intelligence (2006).
[3] Gill, Alastair J., et al. "Emotion Rating from Short Blog Texts." CHI
    2008 (2008): 1121-24.
[4] Kramer, Adam D.I., and Kerry Rodden. "Word Usage and Posting Behaviors:
    Modeling Blogs with Unobtrusive Data Collection Methods." CHI 2008
    (2008): 1125-28.
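
For readers who want to experiment with the quote search of Section 3.4.1 and
the <div></div> insertion step that follows it, a minimal Python sketch is
given here. It is an illustration under stated assumptions, not the project's
PHP code. The names find_quotes and highlight, the CSS class "quoted", and
the URL-prefix test used to skip links are all hypothetical. Unlike the
pseudocode in Section 3.4.1, the sketch keeps scanning after a quote that
does not match and returns every matching span, and it treats the article as
plain text rather than HTML.

```python
import re

# Curly opening and closing quotation marks, as named in Section 3.4.
# Straight quotes are deliberately not handled in this sketch.
QUOTE_PATTERN = re.compile(r"“([^“”]+)”")


def find_quotes(article_text, comment_text):
    """Return (start, end) indices into article_text for every quoted phrase
    in comment_text that also occurs verbatim in the article."""
    spans = []
    for match in QUOTE_PATTERN.finditer(comment_text):
        phrase = match.group(1).strip()
        # Assumed stand-in for the exclusion of links to external pages.
        if not phrase or phrase.startswith(("http://", "https://")):
            continue
        start = article_text.find(phrase)
        if start != -1:
            spans.append((start, start + len(phrase)))
    return spans


def highlight(article_text, spans):
    """Wrap each quoted span of the article in a <div>, merging overlaps first."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching spans become one highlighted region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    pieces, cursor = [], 0
    for start, end in merged:
        pieces.append(article_text[cursor:start])
        pieces.append('<div class="quoted">' + article_text[start:end] + "</div>")
        cursor = end
    pieces.append(article_text[cursor:])
    return "".join(pieces)


article = "Sea ice is thinning. Policy lags behind."
comments = [
    "I agree that “Sea ice is thinning” matters most.",
    "See “http://example.com/report” for details.",
    "As the saying goes, “time will tell”.",
]
spans = [span for comment in comments for span in find_quotes(article, comment)]
marked_up = highlight(article, spans)
print(spans)
print(marked_up)
```

Only the first comment produces a span: the second is skipped as a link and
the third quotes text that does not appear in the article. Merging
overlapping spans before wrapping keeps the inserted tags from nesting when
two comments quote intersecting passages.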

More Related Content

What's hot

The Benefits of Buffer Entrepreneurs are Missing Out On
The Benefits of Buffer Entrepreneurs are Missing Out OnThe Benefits of Buffer Entrepreneurs are Missing Out On
The Benefits of Buffer Entrepreneurs are Missing Out OnMellissa Thomas
 
How is Web 2.0 Changing the World?
How is Web 2.0 Changing the World?How is Web 2.0 Changing the World?
How is Web 2.0 Changing the World?Jim Angus
 
Nih Angus Nov 2008
Nih Angus Nov 2008Nih Angus Nov 2008
Nih Angus Nov 2008Jim Angus
 
Modules guide
Modules guideModules guide
Modules guidenetwench
 
How to make buzz?
How to make buzz?How to make buzz?
How to make buzz?CITIZEN ACT
 
Google in the Classroom: Google Groups And Sites Presentation
Google in the Classroom: Google Groups And Sites PresentationGoogle in the Classroom: Google Groups And Sites Presentation
Google in the Classroom: Google Groups And Sites PresentationKristin Dragos
 
2020 Social Introduction To Social Media In India
2020 Social Introduction To Social Media In India2020 Social Introduction To Social Media In India
2020 Social Introduction To Social Media In India2020 Social
 
What the Tweet is an RSS Feed - Intermediate
What the Tweet is an RSS Feed - IntermediateWhat the Tweet is an RSS Feed - Intermediate
What the Tweet is an RSS Feed - IntermediateMelanie Parlette-Stewart
 
Nurun google+ overview
Nurun google+ overviewNurun google+ overview
Nurun google+ overviewNurun
 
2020 Social Workshop on Social Media for Non-Pofits
2020 Social Workshop on Social Media for Non-Pofits2020 Social Workshop on Social Media for Non-Pofits
2020 Social Workshop on Social Media for Non-Pofits2020 Social
 
Web 20-library-20-part-one-7907
Web 20-library-20-part-one-7907Web 20-library-20-part-one-7907
Web 20-library-20-part-one-7907Vrij Kishor Mishra
 
Using Google Plus Communities in the Classroom
Using Google Plus Communities in the ClassroomUsing Google Plus Communities in the Classroom
Using Google Plus Communities in the ClassroomMax Power
 
Impact Bc Community Website V004
Impact Bc Community Website V004Impact Bc Community Website V004
Impact Bc Community Website V004Julian Barabas
 
Social Media Basics
Social Media BasicsSocial Media Basics
Social Media BasicsJared Riley
 
These article
These articleThese article
These articleLucy Moy
 

What's hot (17)

The Benefits of Buffer Entrepreneurs are Missing Out On
The Benefits of Buffer Entrepreneurs are Missing Out OnThe Benefits of Buffer Entrepreneurs are Missing Out On
The Benefits of Buffer Entrepreneurs are Missing Out On
 
How is Web 2.0 Changing the World?
How is Web 2.0 Changing the World?How is Web 2.0 Changing the World?
How is Web 2.0 Changing the World?
 
Nih Angus Nov 2008
Nih Angus Nov 2008Nih Angus Nov 2008
Nih Angus Nov 2008
 
Modules guide
Modules guideModules guide
Modules guide
 
How to make buzz?
How to make buzz?How to make buzz?
How to make buzz?
 
Google in the Classroom: Google Groups And Sites Presentation
Google in the Classroom: Google Groups And Sites PresentationGoogle in the Classroom: Google Groups And Sites Presentation
Google in the Classroom: Google Groups And Sites Presentation
 
2020 Social Introduction To Social Media In India
2020 Social Introduction To Social Media In India2020 Social Introduction To Social Media In India
2020 Social Introduction To Social Media In India
 
What the Tweet is an RSS Feed - Intermediate
What the Tweet is an RSS Feed - IntermediateWhat the Tweet is an RSS Feed - Intermediate
What the Tweet is an RSS Feed - Intermediate
 
Nurun google+ overview
Nurun google+ overviewNurun google+ overview
Nurun google+ overview
 
2020 Social Workshop on Social Media for Non-Pofits
2020 Social Workshop on Social Media for Non-Pofits2020 Social Workshop on Social Media for Non-Pofits
2020 Social Workshop on Social Media for Non-Pofits
 
Web 20-library-20-part-one-7907
Web 20-library-20-part-one-7907Web 20-library-20-part-one-7907
Web 20-library-20-part-one-7907
 
Using Google Plus Communities in the Classroom
Using Google Plus Communities in the ClassroomUsing Google Plus Communities in the Classroom
Using Google Plus Communities in the Classroom
 
Impact Bc Community Website V004
Impact Bc Community Website V004Impact Bc Community Website V004
Impact Bc Community Website V004
 
Social Media Basics
Social Media BasicsSocial Media Basics
Social Media Basics
 
Aasl2011 website
Aasl2011 websiteAasl2011 website
Aasl2011 website
 
Smwpoland day2 final -ws5
Smwpoland day2 final -ws5Smwpoland day2 final -ws5
Smwpoland day2 final -ws5
 
These article
These articleThese article
These article
 

Blog Comments Organizer

An Interface for Organizing News Comments
Sweta Vajjhala, Nicholas Diakopoulous, Irfan Essa
Georgia Institute of Technology | College of Computing
801 Atlantic Drive, Atlanta, GA 30332
sweta@gatech.edu, nad@cc.gatech.edu, irfan@cc.gatech.edu

ABSTRACT
This paper focuses on the organization of comments on a particular blog post. This research is the first of its kind. Background research was done in the field of computational journalism and its relation to the blogosphere, in addition to research into the categorization of blog posts. Several design ideas were then considered for ways to organize blog comments. The deciding factor was whether or not quotes from the post were used in the comment. A specific algorithm was used to determine this, and the design was then applied to the actual blog post itself. Results indicate that this would be a successful application for all news blogs, should it be applied to the websites accordingly.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Graphical user interfaces; H.5.3 [Group and Organization Interfaces]: Collaborative computing

General Terms
Design, Human Factors

Keywords
design, computational journalism, blog, news, articles

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

1. INTRODUCTION
Many advances in technology have helped organize the information that is on the Internet. One of these fields, called computational journalism, is specifically tailored to finding new ways to organize media information via technical advancements.

Since the emergence of Web 2.0, interactive media has become very popular. Not only has it allowed for sharing information across the world, but it has created an environment that encourages collaboration around media articles. These collaborations have formed millions of communities on the Internet. News articles, in the form of blogs, have become very popular, allowing readers to become contributors [1] and express their opinions.

Although there has been some research on the organization of media articles, little has been done to organize readers’ comments on these articles. This project focuses on that new aspect of computational journalism with the creation of a blog comments organizer.

2. BACKGROUND
There can be many different ways to organize an article’s comments. Today, the Internet has become the largest medium in the world for reading about news and interactively discussing it. Not only has the number of readers increased, but the number of blogs overall, especially news ones, has drastically increased [1]. Baumer et al. state that readers now have the mentality of: “I know what’s there and I know where to find it when I need it.” With this mindset, readers are able to read about any type of news article that they wish. With news blogs becoming increasingly popular, readers are slowly taking on the role of contributors as well, by posting comments to their favorite blogs.

With the variety of different news articles and comments that are posted, blogging has become a multi-faceted and heterogeneous activity. Articles in news blogs today are often organized into different categories. In addition to this, people can add their own tags, which are collections of keywords attached to blog entries that help describe what the entries are about [2].

Brooks & Montanez analyzed the effectiveness of tags for classifying blog entries. Their results indicate that tags are useful for grouping articles into broad categories, but less effective in indicating the particular content of an article. However, the idea of sharing tags could potentially be applied to help organize the comments, based on the text of each comment. There are three main uses of tagging: annotating information for personal use, placing information into broadly defined categories, and annotating particular articles so as to describe their content [2]. Each of these uses could also be applied to the comments on the blog post.

One problem that comes with tags is trying to identify appropriate tags while eliminating noise and spam [3]. Another problem is that several different tags might all be used to describe the same concept, so this duplication also creates extra clutter [2]. A similar problem needs to be addressed in the organization of blog comments: which comments are useful to readers and which ones are spam or irrelevant to the topic of the post? One solution is to automatically generate content-based tags, while also considering when the tag was originally created. For comments, their organization could be based on chronological order, with the most recent comments showing up first and the oldest ones showing up last.
After finding a way to organize the blog comments, the last thing to do is to find a way to collect and organize the blog articles and their comments. The online public nature of blogs provides incredible resources for data mining. Kramer and Rodden state that, after collecting a variety of blogs, they used clustering to group the blogs into categories based on five different factors: melancholy, social, ranting, metaphysical, and work. They found that blog articles are difficult to group into categories, because the blogging community is so heterogeneous. So, each blog does not cleanly fit into any single category [4]. Comments on blogs are comparable: since there can be many different discussions happening in the comments, it could be very difficult to place the comments into one category objectively.

In the following sections, the design, algorithm, and evaluation of the system will be presented, concluding with a discussion of the results and future work.

3. BLOG COMMENTS ORGANIZER
3.1 Data Mining
The data that was used to implement the blog comments organizer was pulled from Dot Earth, an environmental blog written by Andrew Revkin of The New York Times newspaper. On average, each of his articles tends to generate over 80 comments. Because of the vast popularity of the blog and the variety of comments, data from this blog was used in the testing of the blog comments organizer.

Five articles were randomly chosen to undergo an analysis by hand. During this time, information and statistics about the set of comments corresponding to each article were collected. The information that was collected included the number of comments for each of the following: comments that were multiple paragraphs long, comments that used quotes from the article within them, comments that used statistics (or some other numbers) to support their point of view, comments that referenced other related articles, comments that were a response to a previous comment, comments that used the same key words (e.g. “history” or “future” or “evolution”), and finally, the number of posts per day.

Out of the data that was collected above, the number that seemed to yield the highest value was the number of comments that used quotes from the article within them. As a result of this, it was decided that the best way to organize the comments for this blog would be to show users a list of comments for each part of the article that was used in a quote.

3.2 Design
The design for the blog comments organizer was done first with some sketches. It was then implemented using PHP, HTML, JavaScript, and Greasemonkey.

The blog comments organizer can be easily integrated into the Dot Earth page. For each article, it highlights the parts of the article that are quoted in a comment. When a user then scrolls over the highlighted part of the article, the comment(s) that reference(s) it will show up at the top of the page in reverse chronological order, so that the most recent comment will show up first. A sketch of this design can be seen below. When the blog comments organizer is implemented, the style of the blog comments organizer will match that of the Dot Earth page, so the integration of the application will seem transparent to the user.

Figure 1. Sketch of the blog comments organizer design. By scrolling over the yellow highlighted text, the box at the top will show up. When the mouse is no longer over the highlighted text, the box will disappear.

The rationale for this design choice is supported by the fact that the data mining showed that quotes were very often used in the comments of the Dot Earth blog. The blog comments organizer would be a great tool for new readers to quickly get acquainted with the traditional posting style of contributors to the Dot Earth blog. Moreover, the blog comments organizer offers a way for readers to find out more information on a specific part of the article without having to read all of the 100+ comments. It provides the reader with the advantage of being able to read only the comments that he/she is interested in, based on the parts of the article that the reader liked.

3.3 Data Collection
In order to collect the data from Dot Earth, a blog scraper script was written in PHP5. The scraper script gets the 60 most recent articles in the Dot Earth blog and places them into a MySQL database. For each article, the scraper also gets all of the comments and places those in the database too. The schema for the database is as follows: the article is linked to each of its comments using the field articleID.

Figure 2. Schema for the database that stores all of the articles and comments.

3.3.1 Algorithm for Gathering Data
The algorithm for gathering all of the articles and respective comments is given here.

First, connect to the Dot Earth homepage and get its HTML source. Inside the source, look for the title of each news article based on the corresponding HTML tags. For each of the articles, look for the corresponding HTML tags for the comments. Read the text between all of the open and close HTML tags for each article and its comments. Insert all of this information into a database with the schema above. In order to get articles across multiple pages, loop through the same process, after finding the corresponding HTML tags for each page.
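The gathering steps above can be sketched in a few lines. This is only an illustration, written in Python with SQLite rather than the PHP5 and MySQL used in the original work. Apart from the articleID link, every name here is an assumption: the table and column names, the CSS class names passed to extract, and the helper functions are hypothetical and are not taken from the Dot Earth pages or from the authors' scraper.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical schema: only the articleID link comes from the paper;
# all other table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE articles (
    articleID INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE comments (
    commentID         INTEGER PRIMARY KEY,
    articleID         INTEGER NOT NULL REFERENCES articles(articleID),
    body              TEXT NOT NULL,
    quote_index_start INTEGER,
    quote_index_end   INTEGER
);
"""

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextParser(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []    # text pieces of the element currently being read
        self.results = []   # finished text, one entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.chunks).strip())
            self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, css_class):
    """Return the text between the open and close tags of matching elements."""
    parser = ClassTextParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def store_article(conn, title, body, comment_texts):
    """Insert one article and link each of its comments to it via articleID."""
    cur = conn.execute(
        "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body)
    )
    article_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO comments (articleID, body) VALUES (?, ?)",
        [(article_id, text) for text in comment_texts],
    )
    return article_id
```

A scraper along these lines would fetch each page, call extract once for titles and once for comments, and pass the results to store_article. Fetching the pages and looping over multiple pages are left out, since they depend on the markup of the site being scraped.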
Soon after this research was done, an API was introduced for Dot Earth. In the future, it might be easier to collect all of the data via the API. However, this would also mean that the information would be stored in an XML file, not in a database, and this could make it harder to find quotes in the comments.

3.4 Finding Quotes in Comments
Once the articles and comments are in the database, the next step is to go through all of the comments for each given article and see if they contain quotes from the article.

First, check the comment for opening quote (“) and closing quote (”) symbols. If both exist, then see if the text between the two quotes matches any phrase from the article. It is important to check that the quoted text is not a link to an external page, because such text would match links to external pages in the article. Therefore, this case must be excluded when checking for quotes in the comments. If a quote in the comment matches text from the article, then the starting index of the text in the article should be stored in quote_index_start in the comments database table. The end of the quote should be stored in quote_index_end.

3.4.1 Algorithm for Finding Quotes
The algorithm for finding quotes from the article within a comment is below.

    for each article in the database:
        get all comments for that article
        for each comment:
            quote_start = 0
            quote_end = true
            while quote_end is true:
                go through the comment text and find the next opening quote
                if there is no opening quote:
                    set quote_end to false to exit the loop
                if there is an opening quote:
                    search for the ending quote, starting from quote_start
                    search the article for the text between the opening and ending quotes
                    if the text exists in the article:
                        store quote_index_start and quote_index_end in the database for that comment
                    else:
                        do not store anything and exit the loop

Using the algorithm above, quotes from the article were found and the indices of where they were found were stored in the database.

Once the quote indices were known, another script was written to insert <div></div> tags around the quotes in the article text, which highlighted that part of the article. JavaScript was then used to trigger a mouse-over event, so that if the reader put their mouse over the highlighted part of the article text, the list of comments that contained that part of the article as a quote would show up on the right-hand side, as was shown in Figure 1 above.

4. EVALUATION
Feedback from volunteer testers revealed both advantages and disadvantages of the blog comments organizer. First and foremost, although the design is integrated nicely into this particular blog (Dot Earth), it would require a lot of customization for each blog for which it was used. This is because each blog will have a different style, and therefore, the scraping will have to be done all over again. However, the actual algorithm that is used to find the quotes would still be the same. Displaying the blog comments organizer for each blog would again differ, based on the style of the blog. However, the algorithms for inserting the <div></div> tags would still remain the same, once the source code of the other sites was figured out.

One disadvantage of this blog comments organizer is that the algorithm searches for the start and end quote characters. However, a comment might contain text from the article that is paraphrased or presented without quotation marks. If this was the case, then the presented algorithm would not find this as a quote, because it is not located within quotation marks. Detecting such cases would give the user more comments to see in the blog comments organizer. However, to be able to detect paraphrasing, the fundamental algorithm would have to be changed to use some artificial intelligence techniques, in addition to what it is already doing, while searching the article text.

One major advantage of this design is that the user is given a choice whether he or she wants to read the comments. Since the comments show up on a mouse-over event, a user who does not want to use the feature after the first time will not see all of the different comments show up. Moreover, the comments are placed strategically towards the right-hand side of the page, where there is whitespace. This way, the organizer does not cover up any possibly important information that is on the page. The blog comments organizer acts as a supplement that makes it easier for the reader to find the comments that he or she may be looking for, but it does not require the user to use it.

For example, someone who just wanted to browse the Dot Earth blog and get an idea of the contributors might want to browse all of the comments, not just the ones that pertain to certain parts of the text of the article. In this case, the user does not have to use the blog comments organizer. However, if this user becomes a frequent visitor and contributor to the Dot Earth blog, he or she may start to look for specific comments which pertain to parts of an article that he or she likes. In this case, the user would find the blog comments organizer an ideal tool to get the information needed without having to go through hundreds of comments.

5. DISCUSSION
Because the blog comments organizer can be used in a variety of ways, there are many different ways that this tool can be useful. Namely, it focuses on the growing field of computational journalism: organization of media. There are many different news sites that would benefit from organization of their reader comments, and this tool would suit them well.

In the Evaluation section above, there was an example of the user who just wanted to find information about a specific part of the article. This blog comments organizer could be useful for data
analysts in the media profession. Based on a posted news article, the author or the company that posted it can find out which parts of the article triggered the most comments. Based on this, the company could post more articles that pertain to very similar topics. This would attract new users, as well as retain the current users.

The blog comments organizer could revolutionize the way that articles are written and read. Based on the popularity of a certain part of an article, a blog can be tailored to suit the majority of its readers. This would introduce a new level of specificity for the blog. If many blogs were to follow this and focus on specific topics, it might make blogs easier to categorize and make tags more universal.

6. FUTURE WORK
There are many different applications and related work that could be done based on the blog comments organizer.

First and foremost, a different metric could be used to organize comments. Right now, only quotes are being used, but blog comments could also be grouped based on the themes of the posts (e.g. history or evolution) or on whether the comments used statistics. Blogs have become a source for data mining, and if users are looking for certain quotes or numbers and comments contain those statistics, this would be very useful for the user.

The blog comments organizer could also be used to analyze different types of media. Right now, only written blogs are being analyzed. However, video blogs are slowly becoming more popular, so being able to find comments that quoted parts of a video in a blog post would also prove to be very useful.

7. CONCLUSION
It is possible to organize blog comments in a plethora of different ways. Depending on the medium and the type of blog that is being used, there could be a number of ways to analyze and organize the comments in a meaningful way for the users that come by. Organization of blog comments will soon become a very powerful tool that can be used to target the type of users that the blog is tailored towards.

While there are many different ways to organize comments, and using quotes (as in this particular blog comments organizer) is just one, it is important to realize that this growing field could soon redefine the way that media is presented to the world.

8. ACKNOWLEDGMENTS
Many thanks to all volunteer evaluators, especially Sekhar Vajjhala, Carolina Gomez, Blair Daly, and Nicholas Bowen.

9. REFERENCES
[1] Baumer, Eric, Mark Sueyoshi, and Bill Tomlinson. "Exploring the Role of the Reader in the Activity of Blogging." CHI 2008 (2008): 1111-20.
[2] Brooks, Christopher H., and Nancy Montanez. "An Analysis of the Effectiveness of Tagging in Blogs." American Association for Artificial Intelligence (2006).
[3] Gill, Alastair J., et al. "Emotion Rating from Short Blog Texts." CHI 2008 (2008): 1121-24.
[4] Kramer, Adam D.I., and Kerry Rodden. "Word Usage and Posting Behaviors: Modeling Blogs with Unobtrusive Data Collection Methods." CHI 2008 (2008): 1125-28.