Topic: Natural Language Sorting for Comments (Read 4243 times)

icekin

  • Supporting Member
  • Joined in 2007
  • Posts: 264
    • icekin.com Technology,Computers and the Internet
Natural Language Sorting for Comments
« on: January 06, 2010, 05:55 PM »
I am currently pursuing an idea for better presentation of comments on blogs. I would like some suggestions on this:

The Problem:

On popular blog posts there can be hundreds or even thousands of comments, spread across several threads on the same page. Over time, these comments split into distinct threads discussing different things, and some posters address several aspects of the discussion in a single reply. For blog authors, following all the comments can be quite difficult, especially across several blog posts. I am looking for a solution that helps blog authors quickly prioritize the entire thread and reorganize it into a more presentable format, ordered by post importance.

Example 1: "A blog post supporting the use of national-level internet censorship"

Comments on such a post can easily be sorted into three categories:

- In favour of the author
- Against the author
- Uncategorized (cannot be categorized)

In this case, the author may wish to reply first to the posts that oppose his viewpoint, to move the discussion forward.
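Assuming each comment already carries one of the three labels above (however it got there), the "opposing views first" reading order is just a sort on a category priority. A minimal Python sketch; the `Comment` type, the label strings and the `PRIORITY` table are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str
    category: str  # hypothetical labels: "against", "favour", "uncategorized"


# Lower number = shown earlier; opposing comments come first, as in Example 1.
PRIORITY = {"against": 0, "favour": 1, "uncategorized": 2}


def reading_order(comments):
    """Order comments by category priority.

    sorted() is stable, so comments within one category keep their original
    (e.g. chronological) order; unknown labels are placed last.
    """
    return sorted(comments, key=lambda c: PRIORITY.get(c.category, len(PRIORITY)))


if __name__ == "__main__":
    thread = [
        Comment("a", "Great post, I agree.", "favour"),
        Comment("b", "Censorship never works.", "against"),
        Comment("c", "First!", "uncategorized"),
        Comment("d", "This sets a dangerous precedent.", "against"),
    ]
    for c in reading_order(thread):
        print(c.category, "-", c.author)
```

The sorting itself is trivial; producing the labels reliably is the open question.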

Example 2: A blog post about "Small independent hardware company announcing a new low-voltage processor for netbooks"

Comments on this post may fall into more varied categories:

- Posts pointing out other low-voltage processors already on the market (informative)
- Posts appreciating the product
- Posts criticizing the product for various reasons
- Other

I am wondering whether there is any existing natural language processing algorithm or software that can analyse complete sentences or comments and classify them into categories like the ones above.
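The post does not name a specific technique, but one classic baseline for this kind of question is a supervised text classifier such as multinomial naive Bayes, trained on comments that a person has labelled by hand. A minimal from-scratch sketch under that assumption; the tokenizer, the tiny `TRAINING` set and the label names are hypothetical, and a usable classifier would need far more (and far messier) labelled data:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; deliberately naive (no stemming, no spell-fixing)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # texts and labels are parallel lists of equal length.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text):
        """Return the label maximising log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    freq = self.word_counts[label][word]
                    score += math.log((freq + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled examples loosely following Example 2.
TRAINING = [
    ("There are already several low voltage chips on the market", "informative"),
    ("Another existing processor with similar power draw is available", "informative"),
    ("Love this, great product, can't wait to buy one", "appreciative"),
    ("Awesome chip, well done", "appreciative"),
    ("Too expensive and the benchmarks are terrible", "critical"),
    ("Terrible battery life, this will fail", "critical"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit([t for t, _ in TRAINING], [lbl for _, lbl in TRAINING])
    print(model.predict("great chip, well done"))        # appreciative
    print(model.predict("the benchmarks are terrible"))  # critical
```

Naive Bayes only counts words, so it cannot follow sarcasm, negation or a comment that argues both sides at once.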

From my research, the closest example I have found is Slashdot, where each post is categorized as informative, funny, insightful, etc. and given a score from -1 to 5. But this is done by other visitors, not by algorithms. It works on a popular site like Slashdot, but I am not sure readers of a small blog would bother doing it.
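The Slashdot-style scheme described here boils down to a per-comment score kept between -1 and 5 that readers nudge up or down while attaching a descriptive label. A minimal sketch of that bookkeeping for a small blog; the starting score of 1, the one-point step per moderation and the label sets are assumptions for illustration, not Slashdot's actual rules:

```python
from collections import Counter
from dataclasses import dataclass, field

MIN_SCORE, MAX_SCORE = -1, 5  # score range mentioned in the post
START_SCORE = 1               # assumption: a new comment starts at 1

# Hypothetical label sets; each moderation moves the score by one point.
POSITIVE = {"informative", "funny", "insightful"}
NEGATIVE = {"offtopic", "troll"}


@dataclass
class ModeratedComment:
    text: str
    score: int = START_SCORE
    labels: Counter = field(default_factory=Counter)

    def moderate(self, label):
        """Apply one reader's moderation, clamping the score to [-1, 5]."""
        if label in POSITIVE:
            delta = 1
        elif label in NEGATIVE:
            delta = -1
        else:
            raise ValueError(f"unknown label: {label!r}")
        self.labels[label] += 1
        self.score = max(MIN_SCORE, min(MAX_SCORE, self.score + delta))

    @property
    def label(self):
        """Most frequently applied label, or None if nobody has moderated yet."""
        return self.labels.most_common(1)[0][0] if self.labels else None


if __name__ == "__main__":
    c = ModeratedComment("Here is the datasheet for a competing chip.")
    for _ in range(6):
        c.moderate("informative")
    print(c.score, c.label)  # prints: 5 informative (1 + 6 is clamped to 5)
```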

Other solutions I have seen sort comments by location (IP address), time since the blog post was published, length of the post, etc.
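These metadata orderings need no language understanding at all; each is just a sort key over fields the blog already stores. A minimal sketch of two of them (time since publication and length); the `BlogComment` fields are hypothetical, and IP-based location is left out because it would need an external geolocation database:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlogComment:
    author: str
    text: str
    posted: datetime  # hypothetical field: when the comment was submitted


def by_time_since_post(comments, published):
    """Fastest responders first: order by the delay after the post went up."""
    return sorted(comments, key=lambda c: c.posted - published)


def by_length(comments):
    """Longest comments first, on the rough theory that length tracks effort."""
    return sorted(comments, key=lambda c: len(c.text.split()), reverse=True)


if __name__ == "__main__":
    published = datetime(2010, 1, 6, 17, 55)
    thread = [
        BlogComment("a", "A long reply that compares this chip with two rivals",
                    published + timedelta(hours=3)),
        BlogComment("b", "Nice!", published + timedelta(minutes=4)),
    ]
    print([c.author for c in by_time_since_post(thread, published)])  # ['b', 'a']
    print([c.author for c in by_length(thread)])                      # ['a', 'b']
```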

Any suggestions are appreciated!

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
    • f0dder's place
Re: Natural Language Sorting for Comments
« Reply #1 on: January 07, 2010, 03:50 AM »
I don't think you're going to see anything that can sort reliably by looking at the text stream... after however many years of AI research, this would be damn hard even for perfectly well-formed English text. Now consider the grammar and spelling of your average internet commenter. Ugh.
- carpe noctem

VictorM

  • Participant
  • Joined in 2009
  • Posts: 16
    • Successful failure
Re: Natural Language Sorting for Comments
« Reply #2 on: January 07, 2010, 05:11 AM »
It's a neat idea, although not applicable imho. What you could do is a twist on comment ranking/voting: instead of giving stars or OKs/KOs, let participants tag their own comments with the categories above. That might appeal to the socialite mindset. (As you've probably figured out already, I HAD to use the word socialite, just the way news anchors were forced to use metrosexual back in the '90s and then blush uncontrollably.)
When in doubt, use http://Google
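The self-tagging twist suggested above moves the classification onto the commenters: each comment is submitted with a category its writer picked, and the blog only has to group and count. A minimal sketch; the category names loosely follow Example 2 in the first post, and `group_by_self_tag` is a hypothetical name:

```python
from collections import defaultdict

# Categories a commenter may pick from (loosely following Example 2).
CATEGORIES = ("informative", "appreciative", "critical", "other")


def group_by_self_tag(comments):
    """Group (text, tag) pairs by the tag the commenter chose.

    Tags outside the allowed list fall back to "other", so a missing or
    mistyped choice never drops a comment.
    """
    groups = defaultdict(list)
    for text, tag in comments:
        groups[tag if tag in CATEGORIES else "other"].append(text)
    # Return every category in a fixed order, even if it is empty.
    return {cat: groups[cat] for cat in CATEGORIES}


if __name__ == "__main__":
    submitted = [
        ("Similar chips from other vendors are already on sale.", "informative"),
        ("Looks great!", "appreciative"),
        ("First!", ""),  # commenter skipped the category dropdown
    ]
    for category, texts in group_by_self_tag(submitted).items():
        print(f"{category} ({len(texts)})")
```

The obvious weakness is that a troll can tag a troll post as "informative"; combining self-tags with reader votes would be one way to counter that.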

icekin

Re: Natural Language Sorting for Comments
« Reply #3 on: January 08, 2010, 09:11 PM »
I also posted this on the SourceForge Natural Language Processing forum a week ago and got no replies. With all the new advances in AI being reported in the news, I thought the time might be right for an idea like this to be feasible. My main problem is that with the explosion of content, it's becoming a struggle to separate the relevant content from the rest, both for post authors and for readers of the blog.

[rant]
As a start, I would appreciate some sort of Firefox, Greasemonkey or Chrome plugin that could just filter out the trolls, LOLcats, rickroll videos, pedobears and other stuff that I'm sure is funny, but that I don't really give a toss about. I frequently read Digg's technology section, and in the average comment thread I find about 5 of every 100 comments worth reading. By that, I mean they are informative and provide links to more relevant sources on the topic that actually increase one's knowledge. Problem is, these aren't the only posts that people digg up. Someone could write a two-word comment insulting another poster or a company, and that ends up as the most dugg post, even though it has nothing to do with the topic.
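A filter like the one wished for here could start out as plain heuristics rather than NLP: keep comments that link to a source or run to a reasonable length, and hide very short ones. A minimal sketch, shown in Python (a Greasemonkey userscript would express the same logic in JavaScript); the regular expression, the 8-word threshold and the function names are assumptions, and heuristics like these will misfire in both directions:

```python
import re

URL_RE = re.compile(r"https?://\S+")
MIN_WORDS = 8  # assumption: anything shorter without a link is treated as noise


def looks_worth_reading(comment):
    """Crude relevance test: links to a source, or has a non-trivial word count."""
    if URL_RE.search(comment):
        return True
    return len(comment.split()) >= MIN_WORDS


def filter_thread(comments):
    """Keep only the comments that pass the heuristic, in their original order."""
    return [c for c in comments if looks_worth_reading(c)]


if __name__ == "__main__":
    thread = [
        "lol epic fail",
        "The spec sheet is at http://example.com/specs and lists the idle power draw.",
        "This company is a joke.",
    ]
    for c in filter_thread(thread):
        print(c)
```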
[/rant]

f0dder

Re: Natural Language Sorting for Comments
« Reply #4 on: January 09, 2010, 04:36 AM »
AI and "being fuzzy about things" is something computers are inherently bad at, unfortunately - even the biggest supercomputers running neural networks have only reached the "intelligence level" (what popular science tends to call it; number of neurons != intelligence level, there's also training to do) of a cat, iirc.

One of the places where neural nets are applied is the OCR software in post office mail-sorting machines. Dunno about the rest of the world, but in Denmark they run software from Siemens... pretty big company, pretty huge corpus of training material, lots and lots of tweaking (for each update Siemens is paid a fixed amount of cash plus a variable amount based on the added recognition efficiency - they have a lot of incentive for making the system better, and incentive is actually what the system is called :)).

Anyway, the system works pretty well and handles a very high percentage of letters automatically. But there are still a lot of letters that "slip by" and have to be sorted manually (still using computers, though, with a whole bunch of postal monkeys hammering their keyboards, entering recipient addresses). A lot of the letters where the OCR system goes "oh, dunno what to do, *toss arms*" are due to bad handwriting, but even correctly formatted addresses in legible monospace fonts (and without stuff like logos or whatever that might confuse the system) slip through... which goes to show that Being Fuzzy And Smart Is Hard™ :) - I'd wager that language recognition is a much harder task than "simple" OCR.
- carpe noctem
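The split described above, letters handled automatically versus letters passed to human sorters, is a general pattern for fuzzy classification: accept the machine's answer only when its confidence clears a threshold, and queue everything else for a person. Applied to blog comments, a minimal sketch might look like this; the `classify` callback, the 0.8 threshold and `toy_classifier` are hypothetical, and nothing here describes how the Siemens system actually works:

```python
from typing import Callable, Iterable, List, Tuple

# A classifier returns (label, confidence), with confidence in [0, 1].
Classifier = Callable[[str], Tuple[str, float]]


def triage(items: Iterable[str], classify: Classifier, threshold: float = 0.8):
    """Split items into machine-labelled ones and a queue for human review."""
    automatic: List[Tuple[str, str]] = []
    manual: List[str] = []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            automatic.append((item, label))
        else:
            manual.append(item)  # "dunno what to do": hand it to a person
    return automatic, manual


def toy_classifier(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in: only confident when the comment contains a link."""
    return ("informative", 0.95) if "http" in text else ("other", 0.4)


if __name__ == "__main__":
    auto, manual = triage(["Details: http://example.com", "meh"], toy_classifier)
    print(auto)    # [('Details: http://example.com', 'informative')]
    print(manual)  # ['meh']
```

Raising the threshold sends more comments to the manual queue but makes the automatic labels more trustworthy, the same trade-off the postal system makes.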