Show HN: I ran sentiment analysis on Show HN comments and got the meanest ones (walzr.com)
78 points by walz on July 16, 2018 | 64 comments


It is funny how polite mentions of crashes are regarded so negatively by it compared to messages. Now I wonder what can be said to get an overwhelmingly positive sentiment while being not just rude but downright horrifying messages. Say "I hope you enjoy eating your family."

Reminds me of an account of an essay-grading program's unique flaw: it scored essays higher with every mention of orangutans, with no regard to relevance or possibly even grammar.


Do you have a source on the orangutan grader? That sounds like a hilarious read.


Sadly it was a while ago - the closest I could find was an MIT one mentioning essay graders judging by length and connective words, allowing for properly structured nonsense to be judged as good writing.

https://www.theguardian.com/education/mortarboard/2013/may/2...

I guess orangutan is a sufficiently long word that replacing most nouns with it would improve your score.


I wrote another sentiment analysis tool. https://goo.gl/UoVo51

Ran it across your comment with these results:

Sentiment Analysis: This text is: positive (+0.5)

funny regarded crashes polite wonder overwhelmingly horrifying rude positive enjoy flaw regard

The full range of sentiments in this text is: positive 0.25, trust 0.178571428571429, surprise 0.142857142857143, anticipation 0.107142857142857, joy 0.107142857142857, sadness 0.0714285714285714, negative 0.0714285714285714, fear 0.0357142857142857, anger 0, disgust 0


The grading program may have been written by a librarian.


Of course, no one would ever look too far into the matter.

  We’ve got the only librarian who can rip off your arm with his leg. People respect that.


When people copy and paste error messages into the comments, they usually contain stuff like "kill", "fatal" or "stopped" which the analysis thinks is negative
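
A minimal sketch of why that happens, assuming a plain word-list scorer. The word scores here are made up for illustration and are not taken from the lexicon the site actually uses:

```python
import re

# Hypothetical word scores for illustration only; not the real lexicon.
LEXICON = {"kill": -3, "fatal": -3, "stopped": -1, "error": -2, "cool": 1}

def score(text):
    """Sum the score of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

report = "Cool project! On launch I get: fatal error, process stopped, had to kill it."
print(score(report))  # -8
```

The friendly opener earns +1, and the pasted error text drags the total down regardless of the commenter's tone.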


Looks like a cool project, but I can't scroll very far down the site before my browser crashes. I've reproduced this several times, here's the terminal output if it helps: $ conkeror https://alpha.trycarbide.com ...... JavaScript strict warning: https://alpha.trycarbide.com/, line 603: SyntaxError: test for equality (==) mistyped as assignment (=)? fault....

To me, this is a good ShowHN comment...on the other hand, it might say something about software error messages.


The further down you scroll the more the negativity becomes hit or miss.


It's not a good comment if you consider that the browser crash they were experiencing wasn't even the website's fault, it was the browser being buggy. The JavaScript error that they copied from the console was an error from inside the Conkeror browser itself (that browser is written mainly in JS). So the OP was complaining to the Show HN creator about the failings of a buggy, non-maintained (in the last 6 years) non-mainstream browser that they choose to use to view the website.

Unfortunately I have seen that type of comment quite a bit on some of the Show HN threads: someone complains that the site doesn't work without JavaScript, or doesn't work with their bizarre non standard web browsing setup.


Sorry for not being clear. From a sentiment standpoint, it's good, and that's the relevant context for my comment. [0] From a technical standpoint, it's gray. People land on both ends of the supported browser spectrum [1]...and all over the middle. Likewise with graceful degradation. Anyway, while Conkeror support is certainly near one end of the technical goodwill spectrum, I think discovery via "ShowHN" is probably better than via an upset customer.

[1]: "Best viewed in IE6/Netscape" was standard at one point. Never mind the recent trend of jackassware "Upgrade to a modern browser" popups for anyone not using Chrome.


Looks like your algorithm is classifying some neutral-toned, problem-solving feedback as “mean”. Personally, that’s exactly why I would do a Show HN. Examples:

“Looks like a cool project, but I can't scroll very far down the site before my browser crashes. I've reproduced this several times, here's the terminal output if it helps: $ conkeror https://alpha.trycarbide.com ...... JavaScript strict warning: https://alpha.trycarbide.com/, line 603: SyntaxError: test for equality (==) mistyped as assignment (=)? fault....” -16 sentiment, chriswarbo 2 years ago in reply to "Show HN: Carbide – A New Programming Environment"

“just filed an issue - but the error message is pretty obnoxious for a catch all- bound to the $(window) error event is a catch all error that blames me for not having enough data (56 public repos not enough?) This means that anyone who knows this url and decides to look me up will see a message accusing me of being a non-producer if anything goes wrong with the resume” -14 sentiment, beezee 6 years ago in reply to "Show HN: My Github résumé"

...those were within the first 10. If this similarly neutral-toned problem solving type report makes me mean in your algorithm’s view, that is a label I shall wear with pride.


> The most negative ones are shown below.

The message that is the third most negative by this metric is the following:

> Looks like a cool project, but I can't scroll very far down the site before my browser crashes. I've reproduced this several times, here's the terminal output if it helps: $ conkeror https://alpha.trycarbide.com ...... JavaScript strict warning: https://alpha.trycarbide.com/, line 603: SyntaxError: test for equality (==) mistyped as assignment (=)? fault....

That is clearly a false positive.


My favorite is #5 from the top, where a user DMCA'd themselves to get Yahoo to delete a website they had created previously:

> I used to have a Geocities containing weird bad poetry I wrote when I was a teenager. I forgot about it, until years later I stumbled upon it again. I was embarrassed. I asked Yahoo to delete it. But I'd forgotten the password, and I'd used fake personal details (wrong date of birth) to create the account, and I couldn't remember what the fake info was, so they refused to delete it because I couldn't verify that I was who I said I was. What do I do? I hit on a solution. I decided to DMCA myself. I sent Yahoo a DMCA takedown request for my old Geocities, and straight away it disappeared. Mission accomplished.

Again, not negative at all, IMHO.


Yeah, been there, done that. I once created a fake neo-Nazi-like web site with the same name as our Dark Age of Camelot guild. Mythic had a no-guild-name-change policy at the time and we wanted to change the name. I sent off a trouble ticket to customer support with the link to the "newly discovered" site and we had a new name in less than an hour.


I find something deeply poetic about the post that you quoted. It somehow reminds me of haiku.


It's probably picking up on words like "Looks ... cool ... but", "crashes", "I can't", "strict", "warning", "mistyped", "fault".


My understanding is that the scores from sentiment analysis are indicating the confidence that the message is positive or negative, not the degree of negativity. Can anyone with experience with this particular method comment?

That _is_ a negative comment, the site is crashing the browser.


Quick question. How does your sentiment analysis treat the following two sentences?

I fucking hate this thing.

VS

I fucking love this thing.


Looks like it's using hardcoded word/phrase scores based on AFINN-165[0]. Your example is actually built in: https://github.com/thisandagain/sentiment/blob/251bea96190ec...

[0]: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?i...


I just ran this through and "I fucking hate this thing" has a -7 score, and "I fucking love this thing" is -1. "Fucking" and "hate" are negative, but "love" is positive and adds to the score. It would be improved if it could tell the difference: "fuck" itself definitely sounds negative, but "fucking" can mean almost anything.
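
The arithmetic behind those two numbers can be sketched as a plain sum of per-word scores. The three values below are assumptions chosen to be consistent with the totals reported above; the real word list may differ:

```python
# Assumed per-word scores, picked to match the -7 and -1 totals reported above.
SCORES = {"fucking": -4, "hate": -3, "love": 3}

def score(sentence):
    """Add up the score of each word; words not in the list contribute 0."""
    words = sentence.lower().rstrip(".!?").split()
    return sum(SCORES.get(w, 0) for w in words)

print(score("I fucking hate this thing."))  # -7
print(score("I fucking love this thing."))  # -1
```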


>this shit is wayyy too fucking cool....next snapchat!

>-8 sentiment, gailees 5 years ago in reply to "Show HN: Vinepeek - watch the world in realtime in 6 second snippets"

Like that?


It would have to somehow know that "fucking" in this case is being used as an adverb similar to "really", e.g. "I really love dogs".

Another tough one would be "I don't fucking hate dogs", which actually means you like them. The sentence needs to be parsed together, not word for word :)
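
One common workaround, short of real parsing, is to flip the sign of the next scored word after a negator and to treat intensifiers as multipliers instead of scoring them on their own. A rough sketch under those assumptions (the word lists and weights are hypothetical, and this is not what the site does):

```python
# Hypothetical word lists; a real system would need far larger ones.
SCORES = {"hate": -3, "love": 3}
NEGATORS = {"don't", "not", "never"}
INTENSIFIERS = {"fucking": 1.5, "really": 1.5}

def score(sentence):
    """Score words left to right, tracking pending negation and emphasis."""
    total = 0.0
    negate = False
    boost = 1.0
    for word in sentence.lower().rstrip(".!?").split():
        if word in NEGATORS:
            negate = True
        elif word in INTENSIFIERS:
            boost = INTENSIFIERS[word]
        elif word in SCORES:
            value = SCORES[word] * boost
            total += -value if negate else value
            # Negation and emphasis apply to one scored word only.
            negate, boost = False, 1.0
    return total

print(score("I fucking love dogs"))        # 4.5
print(score("I really love dogs"))         # 4.5
print(score("I don't fucking hate dogs"))  # 4.5
```

Even this breaks quickly, for instance when the negator sits far from the word it modifies, which is why the sentence really does need to be parsed as a whole.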


"I don't fucking hate dogs"

A double negative is a positive.


This does kind of start to adopt a counter-accusative tone, which could be interpreted as negative. Though it would entirely depend on context. The expletive just makes it sound angry overall.


"I don't fucking hate dogs!!" sounds like something yelled with a raised fist-finger in a dispute about dog shit on the lawn between neighbours in some lowly apartment complex.


Technically this isn't a double negative, but you are right that the negative connotation of "hate" is reversed by "don't".

"I don't have nothing" is a double negative. In "proper" English (whatever that means ;p), the correct phrasing would be "I don't have anything" or "I have nothing".


When I do my next Show HN, and the negativity gets too much to take, this will be a good resource to turn to, in order to feel a little better about myself (unless, of course, I end up at the top of your list)!


The site is down now. I was going to see if I was on yet another HN leaderboard!


Even despite the false positives pointed out in other comments, I'm delightfully surprised that the worst it gets is around <15% negative comments. Sometimes, HN seems too cynical to me, at least the comments that float up to the top do. What would be interesting is negative comments weighted by their place in the comment section (since you can't see upvote scores).


The site is blocked for me at work, but if he didn't include shadow banned comments, then he's missing the biggest pool of potentially negative comments.


> how are you going to avoid head hunters' spam, either as fake candidates to discover new clients or with fake offers for CV mining?

Was given a sentiment of -9, but I'd say the sentiment is closer to 0. Anyway, it's clear there are a ton of false positives, but overall, this was a really neat idea and it would definitely be interesting to further index the posts.


I'm confused how this comment got rated "-10":

I use Backblaze now and once I get my NAS, I’ll probably end up using a B2 based backup. But let’s make an honest comparison. Backblaze does not replicate your data across data centers. The standard S3 storage class does (0.23/gb). The comparable storage class for S3 is one zone infrequent access (.01/gb). B2 still comes out ahead, but I wouldn’t use either one for primary storage. For their suggested “3-2-1” backup strategy, sure. Then again, just for backup, I could use S3 glacier for $.004/gb. That’s cheaper than B2 and I get multiple AZ storage. The data charges would be higher - but it’s backup. If catastrophe struck and I lost my primary and my local backups, getting my data fast is the last thing I would worry about.

https://news.ycombinator.com/item?id=17407275


> does not > I wouldn't use > then again > catastrophe > struck > worry

I can see it. If you bag-of-words'd it there are a lot of negative words used and effectively no positive words.


I was about to mention that comment. Especially since it is mine. I actually said that I’m a happy Backblaze customer, and couldn’t see why it was considered so negative.


Huh, so apparently the day with the highest percentage of mean comments is Sunday.

Anyone want to have a guess as to why that may be the case? Personally, I'd expect people to be least happy on Monday morning or something, not the second day they usually get a rest that week.

Similarly confused as to why everyone is supposedly so positive on Wednesday...


My guess would be that a larger percentage of respondents are not professionals in the field. The pros are afk.


The difference between the most negative day (14.18%) and the least negative day (10.66%) is very small.


A few guesses:

1. People who got into family fights over the weekend

2. People who logged on for a minute and got an upsetting work email and are dreading the coming work week.

3. People who feel isolated or bored and are online instead of enjoying their weekend.

Would be curious to see if this is random noise or if it's consistent year to year.


I'm wondering how this compares to the frequency of any comments by day. It could be that more comments are posted on Sunday in general, and the least on Wednesday.


The negativeness is a percent (number of negative comments divided by number of total comments). I don't think the data is too significant though, because they are all + or - 5 percent of each other
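
For reference, the per-day figure described here would be computed roughly like this. The sample data and the below-zero threshold for "negative" are made up; the site's actual cutoff isn't stated in this thread:

```python
from collections import defaultdict

# Made-up (weekday, sentiment score) pairs standing in for real comments.
comments = [("Sun", -4), ("Sun", 2), ("Sun", -1), ("Sun", 5),
            ("Wed", 3), ("Wed", -2), ("Wed", 1), ("Wed", 0), ("Wed", 6)]

def negative_share(rows, threshold=0):
    """Percent of comments per day whose score is below the threshold."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, score in rows:
        totals[day] += 1
        if score < threshold:
            negatives[day] += 1
    return {day: 100.0 * negatives[day] / totals[day] for day in totals}

shares = negative_share(comments)
print(shares)
```

Without the per-day comment counts alongside the percentages, it's hard to say whether the gap between days is more than noise.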


Smonday - the period of time where you realise your weekend is coming to an end and you have to go back to work tomorrow


I’m no expert in the field - I’ve only watched a few videos - but the example I’ve seen is where they use a movie’s rating by a person (1-5) and their comments to train a model and then use the model to determine sentiment. Unfortunately, since AFAIK there isn’t a way to determine how many points a post earned except for your own, he couldn’t do that.


For some reason there is a .nobreak class that's actually enabling word breaks. Weird! And it even goes one step further and enables "word-break: break-all", so that the renderer will break all your nice words apart anywhere. That's not nice.

(Yes, this was a half-arsed attempt to match the "negative sentiment tone" without actually being mean ;)


Haha, thanks for sounding nice :)

I added that because some comments had really long URLs, so I had to enable breaks so the page wouldn't be really wide, more so on phones. Didn't realize that it added hyphens to words, thanks for pointing that out, it's fixed now


I consider it a personal failure of character not to have at least gotten an honorable mention ;)


This is pretty cool - I'd love to see similar analysis for truly "contentious" comments, wherein there were an almost equal but large amount of upvoting and downvoting, controlled for accounts that have the ability to do either.


I hope you realize there is a built in sentiment analyzer on HN, based on highly advanced, natural intelligence algorithms.

Your filter actually seems to dig strongly opinionated posts. They are not automatically bad, and they can be quite good.


I think it's really funny how most of the comments in this list are genuine critiques and concerns, that get downvoted by toxic users. I went through and upvoted about half of them.


I would love to see the top nasty comments for the day similar to http://hckrnews.com/ and how it ranks



A comment on my own show HN got flagged, which is kind of funny.


It would also be great to see the most positive comments!


Great job, love the "Y tho?" logo :)


Haha thanks :)


I'm very happy to report that most sentiment analysis is awesomely, incredibly, even beautifully inane.


I'd love to see how HN's 14% negative comments compare to youtube or reddit.


This kind of automated or rule-based analysis is not sufficiently smart to use as any kind of moderation tool. And it won't be until it can interpret semantic content as well as recognize patterns.

Consider the following exchange:

Person 1: Hello! I just wanted to chime in and make you aware of the fact that according to some very cool research <link>, people with sub-equatorial ancestry exhibit markedly lower test scores, in fact very similar to many of the great apes! Your mileage may vary, but in my humble opinion I would never hire someone or work with someone from that demographic.

Person 2: Shut up and go away, you racist prick.

This pattern plays out on twitter fairly regularly, and it's usually Person 2 who gets moderated, despite the fact that the content of their message is actually more appropriate and a net positive for the community (given the context.) Meanwhile, as long as it's polite, actual hate speech can make it through most of these filters.

I've heard this referred to as the "polite Nazi" problem, and it's quite real.


It sounds like what you want is not sentiment analysis but rather some sort of political-correctness analysis, which rather than identifying aggressive language identifies "wrongthink".

I suspect you're the sort of person who would have torn down the signs 4channers put up last year saying "It's OK to be white", because despite the obviously anodyne and correct content of the message, you would have interpreted it as "polite Nazi".


I thought his point was interesting and practical, and that he used a non-controversial example. The problem he's describing is probably impossible to solve because real examples aren't so obvious, but it seems like you jumped to the conclusion that he would be willing to accept the downside of such a system in practice (punishing people who are in fact acting in good faith, and not just hiding hatred behind apparently rational wording), and you seem to be personally attacking him. However, I can only say that by assuming from the way you wrote it that you believe the 4channers you referenced were by-and-large acting in good faith. You should be more clear if that's what you're saying, since that would be a surprising conclusion to me. From everything I saw and heard, that movement seemed to be dominated by the usual hatred, and "It's OK to be white", while ostensibly an addressal of very real problems, was really just a slogan that's unassailable outside of any other context and therefore easy to hide behind.


[flagged]


So let me get this straight:

Implying that someone holds a racist ideology: unacceptable ad hominem.

Actually holding a racist ideology: just fine, or at least we can't stop it without censoring free thought.

Also I don't know why you keep going on about 4chan. Nobody else is talking about that example (mostly because it was an obvious bad-faith troll and isn't worth talking about.)


You got it: that's exactly correct. Implying someone is racist when they've not said anything racist is unacceptable ad-hominem. I think any civilised person can agree on that.

And the second part is also correct. You can't stop people holding "racist ideologies" without censoring free thought, which short of Black Mirror style sci-fi machines is impossible. But even if such technology did exist, you can't even define what "racist ideology" actually means with any level of consensus or precision. You're so far from even being able to think about doing anything here that even attempting is worthless and will cause vast collateral damage. Therefore the correct solution is to do nothing.


This is fascinating. What other things fall into this intriguing category, where doing a Very Bad Thing is ok, but observing that someone seems to be in favor of a Very Bad Thing is downright terrible?

And how is a society to defend itself against Very Bad Things, when it's worse to talk about them than to actually do them?

The only reason I can think of why someone might hold such a logically contorted position, actually, is if they actually think Very Bad Thing isn't that bad a thing at all, but won't come right out and say so.


Yes, I do in fact think that blatant racism is "wrongthink" and has no place in civilized discourse. I definitely think it is a more serious issue than using curse words or expressing negative sentiments.

Also, if you think that that 4chan stunt was "anodyne" then it seems you have very little concept of how language or speech acts work, and are unqualified to weigh in on a discussion of them.



