It's an odd way to cheat, too. If they realised they had data from the validation set, couldn't they have over-trained a model with the validation set in the training data?
Yes, that would have been harder to detect if it is allowed. My understanding is that for a kernel competition (like this one), you can't use model weights that you've trained outside the kernel. Oddly, I can't find a rule explicitly prohibiting it.
It really worries me how many people are so quick to forgive him and tell him so.
In my family if someone cheated they got called a cheater and suffered consequences. At least, they would have, if someone did something like that. But my parents didn’t raise mendacious villains.
Look at this crap on Twitter:
“Everyone makes mistakes. Thank you for the apology”.
“Kagglers will still love to have you back”
“It's great that you realize your mistakes. Looking forward to see your comeback with more cool DS solutions and ethics than before.”
“Thanks for doing this. It's okay to make errors in judgement, we've all been there to varying degrees. Y'all be gonna be fine.1!”
Those are the worst. I’m not so crazy about these below, either, although there’s just a hint of steel in them, at least:
“I’m glad to see that you had a change of heart after sleeping on it and that you will be returning the prize money. I hope you will consider donating to or volunteering at a local animal shelter as well. Atonement here is more than returning the money and apologizing.”
“I hope this can be used as a teaching moment as well. Many people clearly look up to you because of your work. What can we learn from this? Something to ponder in the days to come.”
Kaggle should be looking to the police on how to handle cheaters. He stole 10,000 dollars, he defrauded h2o.ai, etc. Those are "go to jail" level crimes, not just "get banned from a competition and fired" level crimes.
Imagine you have 100 sociopaths, all of them aiming for the top, all able to recognize each other in a crowd. Now you both compete and cooperate together to get to the top; the ones caught are written off, the uncaught ones sooner or later achieve the goal, even if only 10 out of 100 make it there.
Some headhunters specifically try to identify high-performing sociopaths for top management positions.
>In my family if someone cheated they got called a cheater and suffered consequences.
You picked a random sample of ratio'd comments from a platform which is big on performative wokeness to strengthen your opinion, which still fails to explain how Pleskov managed to zugzwang his way into pulling off a Kobayashi Maru style move, or the failure of the platform to monitor such abuses. Only time will tell whether or not he has avoided being a part of the Dark Triad; without condoning his behaviour, what more can he do to atone for his sins?
I'm confused by this reaction here - this was a _brilliant_ hack of the system and I think his work should be celebrated. Was it in keeping with the intention of the competition? Of course not. Were lives threatened by their creative solution to the competition? Also no. At the end of the day it was a fun, inventive approach to a made-up problem.
That's not a hack, that's blatant cheating: their solution literally looked at the answers. It bypassed the ML model prediction, so it's not an ML solution, which to my understanding was the constraint of the competition. And in the end, that solution is useless for the adoption site, since the objective is to get adoption predictions (the animal has not been adopted yet). I'm confused too, by how anyone can think this is acceptable behavior.
Their solution made use of the data available to them in a "prize" that was nothing more than a made-up competition for fun. Did the rules expressly say somewhere that one could not make use of available datasets?
I think you misunderstood that the team in question scraped prospective test profiles from the website in question outside of the competition data flow. They blatantly cheated the competition process using data that was not supposed to be available to them.
For this particular problem, they were working on pet adoption timing prediction algorithms. The proposed solution wouldn't work as efficiently in production as one of the other competitors' would. That leads to inaccurate predictions on pet adoption times, which increases costs for adoption centers, and maybe euthanasia rates among kill shelters. So animal lives would be impacted by this cheat.
More importantly, Kaggle does competitions across dozens of industries. If a culture of getting better at hiding your cheat pervades the platform, that could impact finance, transportation, and medical research. In those scenarios, lives would either be threatened or at least subject to sub-optimal systems.
Not trying to be argumentative here, but is what they did actually against the rules? I'm not at all familiar with this competition which might be why I'm not quite so worked up about this, so maybe I have misunderstood something. Did the competition really require that they only train on the provided data?
I compare this to my favorite sport, Formula 1 racing. In F1, teams of engineers with nearly unlimited budgets spend an absurd amount of effort doing everything in their power to bend the regulations (the "formula") to squeeze out some extra advantage.
For example, in this past season Ferrari was suddenly outperforming the pack (and their own recent performance) and it was clear something had changed on the car, they had power in places they didn't before. What finally came down is a clarification of the rules around fuel-rate metering, without directly calling out Ferrari. After the clarification, Ferrari power was back where it used to be. Nothing more was said of the matter by the FIA.
What we all _think_ happened is that Ferrari, knowing the fuel rate meters ran at 10kHz, discovered they could pulse their fuel pump so that the low-end of the flow rate cycle happened during that sampling interval. This means they could increase their overall fuel rate beyond what was technically allowed, due to how that technical requirement was being measured on the car (and reported back to the FIA).
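Here's a toy simulation of what we think happened (all numbers invented; this only illustrates the aliasing idea, not Ferrari's actual system):

    # Toy model of the suspected trick: pulse the fuel flow at exactly the
    # meter's sampling frequency, phased so every sample lands on a trough.
    # The meter then reports a lower average flow than the engine really gets.
    import numpy as np

    SAMPLE_HZ = 10_000      # meter sampling rate, as claimed above
    BASE_FLOW = 100.0       # the "legal" flow limit, arbitrary units
    RIPPLE = 20.0           # amplitude of the pump's pulsation

    # "True" flow on a time axis 100x finer than the meter's samples.
    t = np.arange(0, 1.0, 1 / (SAMPLE_HZ * 100))
    flow = BASE_FLOW + RIPPLE * (1 - np.cos(2 * np.pi * SAMPLE_HZ * t))

    true_mean = flow.mean()          # what the engine actually burns: ~120
    measured = flow[::100].mean()    # what the meter's samples see: ~100

    print(f"true mean flow: {true_mean:.1f}, meter reads: {measured:.1f}")

The engine burns roughly 20% more fuel than the limit, but every sample the meter takes happens to land on the bottom of the pulse, so the reported figure looks legal.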
Is it in keeping with the spirit of the rules? Of course not! Does it make for an interesting engineering puzzle on top of an already-exciting sport? Sure does!
Clearly I'm in the minority here, but I think this sort of problem-solving approach can be useful. If you're looking to compete against a field of entrants who are all looking for obvious and well-understood approaches to solving the problem at hand, I think sometimes the best solution to stand out is to look where the other teams aren't looking.
The issue is that the entire reason the competition exists is because the company is sponsoring it and putting forward the prize money so that the top performing models can then be put into production, thereby solving some problem the company has. This type of cheating is dishonest and against the spirit of the competition, but it also defeats the entire purpose of the exercise. Simply keeping a lookup table of answers for the data isn't machine learning, and will not generalize into a production system. As stated in the article, without these hacks, he wouldn't have even placed in the top 100.
To use your F1 analogy, this isn't the equivalent of tweaking the cars in whatever way possible is within the rules. This is the equivalent of completely cutting across the grass and bypassing 90% of the track, which is indeed illegal and would get you penalized.
Jesus fucking christ. He fucking ran MD5 on some shit he pulled down from a web crawler.
Over-training a model on the validation set would be a lot more "brilliant", and even that is a dumb script kiddie level hack. Maybe finding an algorithm that computes weights s.t. the preimage of the training algorithm on the training set matches the result of training with truly random weights using the validation set. That could be a "clever hack". And even then _brilliant_ would be.... a real fucking stretch.
That reminds me of another _brilliant_ hack I had my kid do at a youth checkers tournament. I got him to swat the other player's pieces off of the board so the opponent would eventually run out of pieces before my child. I think the opposing kids' parents were biased against me just because I'm an inventive genius.
I'm surprised by some of the overly sympathetic comments here, the guy cheated, not "cheated" etc.
Of course, we're all human, and he's come clean, but his actions potentially had a negative effect on the non-profit and the animals it places; and competing talents were denied their rightful places.
This comment isn't about condemning him or anything, just let's be honest about what happened here; it wasn't OK, and it wasn't just system-gaming that got caught out.
> competing talents were denied their rightful places
Absolutely. What are the odds he (and/or others in his team?) did this just for this one competition? It's possible, but unlikely he/they invented this (or other) code hiding technique(s) just for this single occasion.
Also, the employer did the right thing and kicked him out. Now they only have to scrutinize the last few months of his work instead of looking over his shoulder for the next few years.
I posted there, same self-addressed question, that I cannot figure out the answer to...
It seems that intensives to cheat, and environment where 'means justify the ways' -- are overpowering.
For people who are naturally gifted, successful at young age -- why cheat?
Was this historically, always like this?
These insensitive to cheat, to gain unfair advantage, to treat life opportunities without any 'honor code' just seem to be so pervasive now, it seems.
There is a cheating scandal every other week involving most prestigious institutions, competitions, and so on.
These incentives to cheat, basically destroy from inside our commercial model, academics, judicial system, political system and probably military too.
This also creates a new type of powerful currency, and therefore the 'billionaires' in that currency have infinite power -- and that currency is 'dirt on somebody'.
Dirt on somebody who cheated before -- forever makes the cheaters into tools of injustice.
---
Public shaming is reactive; we need something more proactive at various points. There need to be incentives for work verification, as an example.
I also think it is unfortunate, but at least civil/commercial law in many countries is pretty much riddled with 'more expensive lawyers produce better results'. And it skews society into basically thinking 'anything goes, really; means justify the ways, and cheating is something one can get away with'.
Hello fellow non-native speaker. You don't mean "insensitive[s]", but "incentive[s]".
On topic: My university offers an "Ethics for Nerds" [= CompSci] lecture. The lecturers have degrees in both CS and Philosophy, and the stated goal is to make compsci students more aware of ethical implications - plus giving them some tools/thinking to assess these implications.
ah, thank you for the correction. It seems that I misspelled it different ways too. I guess my lack of attention to detail could not be compensated for by re-reading my post 3 times :-).
I also took an ethics course, but in there it was mostly about AI impacts on society, and what will happen when people lose jobs that will be automated away...
I should keep up to date on it, as mine was many years ago.
After all, being a computer programmer has to be more than about VC funding, mobile apps, functional programming, AI and kubernetes. :-)
To me, the seeming prevalence of cheating throughout society, and its tacit encouragement by a lack of effective proactive and reactive deterrence, is a cultural as well as legislative problem.
> For people who are naturally gifted, successful at young age -- why cheat?
Many are used to being the best. There’s an old saying that 90% of students aren’t in the top 10% at Harvard. When your entire life, you’ve been the smartest kid in your class, it is a tough adjustment when that is no longer true. So they try to find ways to get that top rank again.
How can you ensure your kid can handle this? Make sure they are exposed to situations where they aren’t the best and reward them for giving their honest best.
This is probably true but not an excuse for the perverted behavior and incentives that result. If 90% of the students at Harvard aren’t in the top 10% but feel entitled to be and thus cheating becomes a more attractive solution than taking the medicine of reality - clearly the penalties for cheating aren’t quite high enough.
Penalties don't work when only a few cheats are found.
In this case, the cheat was obvious, but how do you detect enrollment cheats or wholly bought research projects? It's not even funny how little you can do here.
Even plain and obvious plagiarism is being missed.
I have a 5€ note I found blowing in the wind while running at an abandoned construction site. I asked everybody in the vicinity if they dropped it, but I still feel awful about picking it up, because I didn’t and don’t feel I can do justice by dropping it. It’s made me realize I may have moral failures from earlier in my life that I never bothered to commit to memory because I wouldn’t have thought of them that way. A child would have felt glee, and I don’t know when that should have or did change in me.
I'm not sure why you should feel guilt here - you actually went above and beyond what most people would have done in the situation. Most people, finding money on the ground that nobody was trying to pick up themselves, would have just picked it up, pocketed it, and walked on.
And honestly, there should be no guilt in that. Now, had there been someone going around saying "hey, I had this money here and the wind blew it out of my hand, have you seen it?" - and they named the denomination or something that made you know that the money you picked up belonged to them - and you didn't say "why yes, I found this over there; here you go!" and handed it back...well, you should feel guilty knowing you had taken their money - even if they didn't see you do it.
But if nobody is actively looking for it, then heck - what can you do, and why feel guilt over it? Sure - you could have left it lying there, and if everyone did that, maybe somebody could retrace their steps to find where they dropped it?
Or for all you know, it was dropped, not noticed, and the wind or a passing vehicle picked it up and flung it hither and thither and even if the person knew they had dropped it, they had no way of finding it. You could feel guilt - but now you are feeling guilty over something that is virtually unknown to you; you don't even know if the person who lost it even knows they did, or if they even care...
But again - it's a different thing if you see it happening. As a kid (I was probably 12 years old), I was once in a bowling alley when I was walking behind a man, and I noticed a large amount of money (bills) fall out of his back pocket, and he didn't even notice one bit. We're talking several 50 dollar bills. I saw it. I saw him walk on. I went over to pick it up. It was a lot of money...
I grabbed it all and ran to him, "Mr! You dropped all of this!" and handed it back to him; he was super thankful. I was glad I did it, because I know most other kids wouldn't have done that - heck, there are adults who wouldn't.
But I had seen the money fall out of his pocket - I knew it was his. Had I kept it, I know I would have been stealing from him, and would have felt guilty. It was the right thing to return it without any expectation. To this day (decades later) I am glad I did what I did - that very well could have been his paycheck for all I know.
Now - had (somehow) that pile of money been there and I saw it, and nobody else had claimed it? Well - even today I'd probably take it to the "lost and found" - somebody loses that kind of money (or item - like a wallet, phone, etc) - that's the right thing to do. Only if nobody ever claims it, and it's returned to you as "unclaimed" - should you take it. Even then, maybe it would be better to donate it to a good cause than to keep it.
Definitely an ethical and moral dilemma - and here I don't know if I've just refuted myself, or whether I have helped anything for you, or if I've just made myself question my own failings or faults. Probably a bit of everything - and maybe that's for the best. So thank you for your post - I think.
They would have been able to win and get away with it if they had incorporated the knowledge of the external dataset directly into the ML model, provided they had a reasonable estimate of the fraction of overlap between the external data and the test set. A weak version of this would be to just train on the external data in addition to the provided data. A stronger version would train regularly on the provided training data and, in addition, overfit on a random subset of some percentage of the external data (with some small random prediction error thrown in to obfuscate), which would get results equivalent to what they achieved with hard-coded logic.
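Roughly this, as a sketch (load_official and scrape_external are hypothetical stand-ins for the competition data and the scraped rows):

    # Train normally on the official data, then deliberately memorize a noisy
    # random subset of externally scraped rows by oversampling them.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    X_train, y_train = load_official()      # hypothetical: provided data
    X_ext, y_ext = scrape_external()        # hypothetical: scraped rows

    rng = np.random.default_rng(42)

    # Memorize only a random fraction of the external data...
    mask = rng.random(len(X_ext)) < 0.5
    X_mem, y_mem = X_ext[mask], y_ext[mask]

    # ...with small random error so the fit isn't suspiciously perfect.
    y_mem = y_mem + rng.normal(scale=0.1, size=len(y_mem))

    # Oversampling makes the model learn these rows essentially by heart
    # while it still generalizes from the legitimate training data.
    X_all = np.vstack([X_train] + [X_mem] * 20)
    y_all = np.concatenate([y_train] + [y_mem] * 20)

    model = GradientBoostingRegressor().fit(X_all, y_all)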
Considering the guy was smart (he is a Kaggle grandmaster), I would really like to know what prevented him from training on the scraped data, and what motivated him to obfuscate the known-sample lookup.
Maybe there's some technicality that made it impossible to tune the model on the additional scraped training data.
My two nuggets.
This reminded me of Ijon Tichy's saying after one of his voyages:
" Thus concluded one of the most unusual of my adventures and voyages. Notwithstanding all the hardship and pain it had occasioned me, I was glad of the outcome, since it restored my faith, shaken by corrupt cosmic officeholders, in the natural decency of electronic brains. Yes, it’s comforting to know, when you think about it, that only man can be a bastard. "
I'm not going to defend Pleskov, but the organizers shouldn't have put out a competition with money attached that can simply be solved by scraping data. A good ML competition, in fact, should even invite cheats, because the end goal is not ML for the sake of ML but rather cracking the prediction problem by whatever shortest path possible.
He scraped the test set's labels. How is that useful? It's not about "ML for the sake of ML", it's the equivalent of stealing the answers to a math test then writing them down. Why should that be rewarded?
The goal of this competition is to build a system (using ML or not) which is useful for predicting how quickly pets will be adopted. Any information used during the competition should be realistically available at inference time for future predictions... clearly, the expected answer cannot be available at the time you're trying to predict when a pet will be adopted.
People do scrape test answers and use them to optimise hyperparameters in ML models and blends of them in Kaggle. That's cheating, but at least they built a learning system here.
I would have a different interpretation. If he cheated once, he has most likely done it before and since the competition in question. Usually when 'good people' cheat there is a series of escalating transgressions before they get caught.
"Winning" by anything that can reasonably called cheating, as in this case, does not advance the general state of the art. Innovation is best served through appropriate rules and competition structure.
Yes, and that's the right thing to do in the academic research setting ("advance the state of the art"). But public competitions with monetary rewards are not the same setting. I can imagine scenarios, e.g. if the guy had stolen the test set from Kaggle servers (i.e. unlawful access), that should disqualify him permanently. But the essence of the competition should be the focus on cracking a given problem, not on a specific technique.
One test of a good ML competition: can it be solved by simply hiring lots of humans to make predictions without incurring significantly more costs than the prize money?
What value to the organizers, to society or to whatever are you imagining coming out of a free-for-all style competition?
I think the organizers now imagine that the result would be identifying good, generic prediction algorithms, along with identifying good AI programmers capable of producing general prediction algorithms.
It seems like the contest framework has already become a bit problematic, with contest winners just being good at contests and not otherwise achieving anything.
But what are you thinking of? There are already hacking competitions btw.
Again, I'm not defending Pleskov. If he had come forward with the hack, things would have been different. Instead, he pretended that he had an ML solution, pocketed the money and put extraordinary effort into making sure that people couldn't actually figure out his true doings. He was disingenuous, fully self-aware that he was in the wrong, and did his best to cover up his tracks. It wasn't fair to the other competitors and it was most certainly not fair to the organization trying to do something good. So yes, Pleskov remains indefensible.
I disagree... The hope with competitions is that we learn something new through massive semi collaborative exploration. Raising the barrier to hosting useful competitions means we learn less.
Indeed you do, and I know someone who got screwed by a couple of plagiarizing cheaters. This probably contributed to his abandoning his studies and possibly to his suicide. I take a dim view of suggestions that cheating is reasonable, let alone that it is the smart option.
How? According to the article his model without the cheat rated at ~100th place, and the article mentions him cheating the same way before (by scraping Quora for some Quora related competition).
I wouldn't necessarily describe him as having a 'good' career in quant finance. He spent 18 months as a quant and then left to go and cheat at ML competitions. It's not a traditional sign of a successful quant - giving up after 18 months.
> The goal was to create an algorithm that could predict how quickly a pet would be adopted based on its profile details, from its photo to its breed, sex, size, age, and whether it had been vaccinated or not.
> These predictions would be used to optimize and tweak future critters' profiles so that they are adopted as soon as possible.
Sorry but, how is this useful? You can't just change the age of an animal to make it more likely to be adopted. The profile is meant to be an accurate representation of the animal so people know what they're getting. What exactly was the algorithm meant to achieve aside from being a predictor?
You can use this to select which pets to put on a platform. For instance, no-kill shelters have to decide which animals they intake since they have finite room. They can save more animals if they pick animals that are likely to be adopted quickly. Obviously, kill shelters have a similar calculus when deciding which animals to cull (and indeed, animals that don't fit in the no-kill shelter go to the kill shelter).
I'm not sure how this website manages "inventory", but they might have similar problems.
I'm pretty sure so-called no-kill shelters don't outsource their killing by simply refusing less adoptable animals. And if this contest were advertised as "help us decide which animals to kill first" it probably wouldn't gain traction.
This contest sounds ridiculous. It sounds like an attempt to get in on that AI gravy but do so with some sort of feel-good element. Only there is no feel good to it, and the basic premise seems outlandish.
They limit the number of animals they have at any given time by not taking in more when they are at capacity. This is very different from running a DNN model on every applicant and refusing to intake those that aren't adoptable enough, which is a preposterous concept.
And just to provide the full picture, most no-kill shelters of course have scenarios where they euthanize -- violent animals, sick animals, etc -- but they don't need a neural network to accomplish this.
This is all neither here nor there, as the contest had positively nothing to do with any of this. Instead they wanted to determine the most adoptable traits so they could adjust the less adoptable traits toward the more adoptable ones: the poodle goes through the hair straightener and gets a blonde hair color treatment (clearly I am being satirical) to make it more like a lab, for instance.
A limited number of parameters can be genuinely altered - a better photo can be taken, and vaccinations can be administered for example.
Animal rehoming centres have to balance throughput with cost; reduced per pet costs mean that they are able to expand or support more complex cases.
Whilst keeping a pet in a "space" and feeding it does cost, this cost can easily be significantly less than vaccinating, particularly as some vaccines require a few days' hold post-vaccination. Similarly, if the vaccine will not alter a pet's rehoming chance, then it is an unnecessary cost.
Pictures may be more easily applied as a tighter feedback loop (of the 5, use the 3rd); however, they may also indicate other issues that could be addressed (over/underweight, coat damage, etc.), and addressing those issues has costs to balance and predict.
Exactly my thought when I read this also, and I think it generalizes to a lot of cases where the value of trying to predict something is dubious at best, when there's nothing obvious you can do with the prediction itself.
Sadly, many ML comps involve exactly this. Making a predictor for something, where the prediction has no real world value, and even dubious value at a thought experiment level.
I think one of the biggest skills to have in the ML space, is knowing what is worth training a model to know, and what isn't. Just like in engineering, the most successful products are those that solve a real world problem, no matter how elegantly the others might have been made.
I have my doubts too, since there are quite likely ceiling effects. Perhaps the competition was set up out of desperation, or because the non-profit still had 25k to burn, who knows?
But I would be really interested to see if it really has an effect, and if that effect can be sustained.
> because the non-profit still had 25k to burn, who knows?
I think this is the closest to the truth, but that Google spent the money.
Google wants to show usefulness of ML. Marketing person comes up with competition and backs it. Pet adoption org just has to provide some data and says why not? At the very least, it is free publicity for minimal effort.
A large number of commenters fundamentally misunderstand what happened. They are saying "why did he upload his scraped training data? Why didn't he just train on it and upload the resulting model?" If you are making this argument, it means that you don't understand the contest. User rahimnathwani explains:
> In this competition, the training code was run on Kaggle's system, so you'd still need to smuggle in the extra data.
The question then becomes, how do you smuggle in the data? This is a much more interesting discussion than pontificating about the ethics of Pleskov's actions. In particular, a better understanding of this problem could have ramifications for how Kaggle could combat hacks of this variety. (By contrast, "shame on him" and "aww but he's a nice guy" are both useless, except perhaps as a form of virtue signalling).
It's essentially a cryptography problem. Does anyone know if this has been widely studied?
AFAIK it's impossible to smuggle it in a way where you can't be caught. Maybe the use of MD5 made it easier to spot, but even so, I don't think there's a totally hidden solution if the code is examined.
Yes, he could have trained a network to recognize them and return memorized values. It would not be nearly as obvious.
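Something like this, say (purely illustrative; scraped_answers is a hypothetical dict of pet ID to true label, and hash collisions are ignored for brevity):

    # A tiny network whose embedding table memorizes answers keyed on a hash
    # of each pet ID. The weights file looks like any other trained model.
    import hashlib
    import torch
    import torch.nn as nn

    N_BUCKETS = 2**16

    def bucket(pet_id: str) -> int:
        # Stable hash of the ID into an embedding row.
        return int(hashlib.md5(pet_id.encode()).hexdigest(), 16) % N_BUCKETS

    table = nn.Embedding(N_BUCKETS, 1)

    # "Training" is just writing the scraped answer into the right row.
    with torch.no_grad():
        for pet_id, answer in scraped_answers.items():  # hypothetical dict
            table.weight[bucket(pet_id)] = float(answer)

    def predict(pet_id: str) -> float:
        return table(torch.tensor(bucket(pet_id))).item()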
Solving this would require changing the contest so that it comes with algorithm and instructions only (no data files, entropy checks) and is trained by contest operators.
This would also be very fishy to anyone who inspects the source code. Immediately they would ask "uh where did this giant file of floats come from?"
One approach would be to write some handcrafted rules/features that look like they were plausibly chosen a priori, but have the effect of memorizing the scraped data. (I don't know if this is actually possible.)
User KaoruAoiShiho is probably right that any approach along these lines would look out of place. Coupled with the fact that its removal would massively hurt accuracy, it's hard to imagine how this would get past a curious reviewer.
Perhaps peer review should be a component of the Kaggle process.
I never understood Kaggle. Most competitions don't require code to be submitted, just predictions to be made on a test set with missing labels. So, you don't even need to apply machine learning and I'd bet money that lots of winners don't and label by hand or outsource. I don't understand the fascination and appeal of these so-called "grandmasters".
> I never understood Kaggle. Most competitions don't require code to be submitted
Your point of view is outdated ;-)
In recent ML competitions, participants do submit code that is run on a held-out dataset - as was the case in the PetFinder.my challenge in question here.
Most competition platforms are migrating to this format, as otherwise you can just label by hand as you said.
Note that this competition went even further: not only was the evaluation code run on Kaggle, the training code was also run there. This means that you couldn't even train a gigantic model then submit it: your model had to be trainable within well defined time and resource constraints, which is a great way to level the playing field.
Of course, there's still some unfairness as people with more resources can try out more solutions before submitting a model to be trained on the platform. No platform has a solution for this yet!
Winners have to submit code most of the time. There are sporadic computer vision competitions where it is possible to hand-label (such as geo-int competitions where people reverse-engineered the flight paths), but in the vast majority it is impossible to hand-label. Nobody becomes a grandmaster without applying world-class machine learning skills.
The "cheater" in this competition is both a world-class data scientist and a reverse engineering hacker. Heck, people used to write papers about how they crawled the ground truth. Now they see their name in the newspaper.
Without very careful contest design, the best performers are obviously going to be over-fitting. Especially if the entire distribution is public. That's exactly what this team did.
This is true of academic contests in general, btw, even without cheating. They stop being interesting/fun/good signals as soon as people start treating them as an independent skill set. Comparing performance on chess games for the first N games between two new players might be a good signal for some general intellectual capabilities. Comparing experienced players against one another is mostly just testing who's spent more time learning about chess.
Before reading that they'd scraped public data that was likely to be in the "hidden" evaluation set, I thought they might have cheated using Python introspection: inspect the caller's frame, find some variable already loaded with the expected answer, return that.
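Roughly this (assuming the grader keeps the labels in a local variable called y_true, which is of course a guess):

    # Walk up the call stack looking for a variable that holds the expected
    # labels, and "predict" it. Purely a sketch of the idea above.
    import inspect

    def predict(X):
        frame = inspect.currentframe().f_back
        while frame is not None:
            answers = frame.f_locals.get("y_true")
            if answers is not None:
                return answers      # the grader's own ground truth
            frame = frame.f_back
        raise RuntimeError("no answers found; fall back to an honest model")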
Has anyone cheated at Kaggle/similar using that approach?
I was pretty surprised by how common this type of behavior is on Kaggle. I work in machine learning and data science, but I don't use Kaggle much because it quickly became clear competitions boiled down to who could eke out the last hundredths of a percent in accuracy from models trained for weeks on multithousand-dollar machines, and because the behavior described in the article was surprisingly common.
That said, the site is a fantastic resource for datasets. Lots of fantastic data uploaded both from old competitions and by the community.
Is a "multithousand dollar machine" supposed to be expensive? Any company with even half-decent resources should be able to put hundreds of thousands of dollars of hardware toward training a model.
It is when the competitions are aimed at the machine learning practitioner public, and not companies themselves. If having a machine with a bare minimum of $3000 of GPU power is the entry point for having a competitive model, then most of these competitions aren't actually a matter of coming up with a unique and clever model. It's simply throwing the biggest neural network you can at it. Which is fine, but it definitely flies in the face of the perception of Kaggle.
Am I missing something? It seems like "adoption speed" really isn't something you want to over optimize.
The whole point is to match abandoned animals with suitable homes. Adoption speed seems secondary to post adoption measures of adopter satisfaction and adoptee welfare.
It looks like the shelter has limited capacity and too many dogs to be taken care of; therefore, the faster dogs are adopted, the fewer need to be euthanized.
Also what you are writing are extremely sparse and weak signals compared to adoption speed.
It's a classic optimization problem. With high speed, but poor suitability, animals are bound to return. With low speed, but good suitability, there's low return rate - but slow turnover, which leads to overcrowding and rejection of new animals.
Like in business and tech, you want as high a throughput / inventory turnover rate as possible - but obviously with constraints.
The Netflix Prize was interesting, and drew a lot of attention, but ultimately wasn't it simply stuffed into the trash bin? Not long after that, Netflix basically abandoned both meaningful user ratings and realistic recommendations. Now it's just a nonsense engine with some sort of meaningless overlap score or whatever they call it.
They did not implement the winning solution as is, but a lot of good stuff came from the winning teams, including techniques (such as SVD) that are in use as of today (or maybe a few years back).
> just a nonsense engine
No, it is safe to assume the recommendation engine of Netflix is close to the state-of-the-art. A lot of money and talent went into it.
"They did not implement the winning solution as is"
Shortly after that contest finished Netflix removed the five star rating system, and dramatically subdued the recommendation engine. Now the vast majority of the content surfacing is universal beyond some small category filtering (e.g. You like crime dramas and horrors so here's a bunch of the most popular stuff from those categories).
Now they have a "match" rating, and I have not met a single person who finds it useful (it is almost at the point of farce and seems more like a randomization engine).
A lot of money and talent went into something that Netflix clearly decided just wasn't important or useful enough for them. Now they just push Don't F*ck With Cats on everyone.
I don't see a problem with that. It seems to me most companies sponsoring the contests have legitimate problems they wanted solved.
Now it really illustrates to me how data science is still an incipient field. Even the basic Titanic example has a lot of fake solutions and a bit of "cheating" (overfitting).
And don't get me started on the multitude of tutorials and examples that only show the decreasing-loss graph while the actual results are never evaluated (and thus might have glaring defects).
The Netflix Prize winners regularly compete on Kaggle. One of the top 10 Netflix Prize winners joined Kaggle early on (he got a job out of it). Kaggle competitions are a game. Most well-known players already have a good job.
I think the original intent was great. And you can learn from it still. However, as the job market for data scientists heated up, Kaggle became something else entirely.
Having read the original description of the challenge[1] I don't think you're correct. I think what they're doing is trying to identify the types of photos and descriptions that are successful so they can do more of them. Like: Do you want the dog bounding through a field or do you want them snuggling up on someone's lap in the photo? That sort of stuff.
I mean, you're right, you could just run the tool, find the ones that are unlikely to get adopted and euthanize them, but I don't think there's any reason to believe that's actually their intention.
THIS! Drop the "slightly"... I can't understand how people focus so much on the competition cheating, and so little on "wait, wtf are they doing here"... I mean, even if they are not deciding whether to euthanize or not based on this, you're still building a system that introduces "good looks" as a factor in a life-and-death decision regarding a living being.
It's not hard to jump from this to a system that would use your Facebook file to grant or deny medical health coverage or a similar life-and-death thing. Shift the Overton window a little, push it a few notches further, and you're re-inventing phrenology with deep learning...
This is bone-chilling! I mean the fact that so many people overlook this...
The euthanization is already there; machine learning doesn't justify what economics requires. The only thing machine learning promises here (and you're free to doubt the effect) is to improve adoption rates and thus reduce euthanization.
Of course this is highly problematic when applied to humans. But to be clear, some humans are already judged by models (transparent ones so far) in life-and-death decisions. That's how organ transplant decisions are taken: the candidates with the best prospects get the scarce organs. Similarly, when looking at populations, decisions that protect many are often taken in full knowledge of the danger to a few. Vaccination, for example.
Resource optimization problems mix badly with absolute morals.
Are you sure? Aren't they just trying to show the most appropriate pets to the most appropriate people in order to make adoptions quicker? Where do you read the euthanization part?
Unfortunately, most pet shelters get more pets than their facilities are able to handle. Unless they're a no-kill shelter (in order to be considered one they need to kill <10%), they will need to euthanize in order to make room for incoming animals. This is as opposed to promoting adoption, fostering, etc. An algorithm that can prematurely decide what animals will be adopted, or have the best chance to be, can create a perverse incentive to euthanize early the less optimal pets.
Not if the underlying model was bad; no tweaking of hyperparameters can change that. It's safe to say he probably did consider this (and it probably didn't work well enough).
Kaggle doesn't make a lot of sense to me. It's a good self education platform but that's it. It is kind of sad to see job interview candidates for ML positions whose only ML experience is Kaggle.
Also, there may not be much correlation between profiles and speed of adoption. Why don't they prove that first, then have the competition, instead of assuming it would be the solution?
In my opinion, keeping him onboard compromises h2o.ai's image as a trustworthy AI company. To many people, AI is magic, so the last thing you want is the impression that the magic is really a con.
This is a perfectly valid reason to fire someone, even in Europe.
If you admitted to cheating on your degree, and your university found out and revoked your degree, then you'd probably expect to get fired from your job if you no longer have the qualification you said you did. Being a top 10 kagglester is a qualification (and a highly prestigious one at that).
"bringing the company into disrepute" is gross misconduct and normally a firing offence, even in countries with more liberal employment laws than the USA
How is this not criminal fraud for $10k? He deserves to go to prison. "Boo cheater" would be an appropriate response if he did it purely for ranking, not for money (or something easily sold for money).
How _is it_ criminal though? Please do tell which law his team broke here.
Put down the pitchfork and calm down. The guy is a raging a* and a cheater, but a criminal he is not. He lost his job, was publicly shamed and his reputation is tarnished basically forever. I think he got enough coming his way.
Competitive ML model grading with a common training set, using unseen data.
The cheat was to scrape the data that would then be used as "unseen" by the organisers. The unseen data is now seen for this model. Then, instead of merely training the model with the "unseen" data, which would already have been cheating and an advantage, apparently it wasn't enough of an advantage, so they hard-coded 10% of the cases to boost metrics and win.
Having more data to train your model is Google & Facebook's competitive advantage. Their attempts to use that advantage for something actually useful to society, rather than just as a method of selling ads, seem to have been a complete bust so far. If that is wrong and you know better, please link us up.
I'm suspicious of whether their predictive power to sell ads actually works for the people who buy those ads, but I guess we aren't likely to know for sure. I do wonder, "who dominated their industry segment in sales by being an early adopter of Google ads?" I don't know of anyone. It's not a great metric, but what else do we have?
They scraped data from the website of the org that wanted the results and funded the competition. That data was supposed to be unseen for the competitors and used to grade the models. This cheat was to use that scraped data in its training set and, beyond that, hard code some predictions.
It's looking up the answers in the grading sheet while taking an exam.
This competition allowed submissions to include extra data files that can be used by the model. The cheaters added a file with data from another website that seemed innocent but secretly encoded extra information (perfect answers) in IDs. For 10% of predictions, the code retrieved this information via a set of obfuscated operations and presented it as the answer.
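In spirit, something like this, minus the obfuscation (the file and column names are guesses, and model.predict_one stands in for whatever honest model they had):

    # The smuggled file poses as an innocent external dataset but pairs MD5
    # digests of test-set pet IDs with their true labels.
    import hashlib
    import pandas as pd

    lookup = (pd.read_csv("innocent_external_data.csv")
                .set_index("digest")["label"])

    def predict(row, model):
        digest = hashlib.md5(row["PetID"].encode()).hexdigest()
        if digest in lookup.index:      # covers ~10% of the test rows
            return lookup[digest]       # scraped ground truth
        return model.predict_one(row)   # honest model for the rest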
You have a model that you have trained on some provided data - the training set. You give kaggle this model. Kaggle grades your model on some different data your model has never seen. The better your model classifies this data it hasn't seen the higher it scores and the more money you win.
So again: if you trained your model (code) on a training set that you illegally obtained (that is, you broke the rules of the competition to get additional data, data that is also going to be used for competition verification and grading), then you have cheated. Doubly so if you just hardcode outputs to boost your score, which they did here.
They took the official training set. Said, we need more. And scraped websites to get a bigger, illegal training set. This is against the rules and is cheating. They got caught.
It really is equivalent to looking up the answer sheet while taking an exam.
People are saying that you do not just submit your trained model to Kaggle. You also submit the code that was used to train the model from the training set, and the winning models are retrained from scratch on the training set with that code. That wouldn't have prevented this type of cheating, of course, but it does mean that you can't submit a model that was trained on your own private data set.
You can overfit a model to your hold out set quite easily with repeated trials. It's a trap you have to avoid in normal circumstances! (Feynman: "You are the easiest person to fool"). Even if you have to submit code to generate your model parameters from "the training set" (which hasn't been explained at all well by "People" if that is indeed the case) you could do that overfitting deliberately here with the illegal unseen data as your hold out set. Aside from the advantage of a bigger training set. Aside from the advantage in model selection, which is not done with code from a training set. Aside from the advantage to your feature engineering also not done in code. Aside from the advantage to your regularization choices, bias parameters etc etc.
So yes, you absolutely /can/ submit a model trained on your own private data set even if what you submit is model code that will be re-trained. Even if "the training set" is different from the provided one, you still have that scraped data, so you can slice it up with the provided training set so that any selected training set does well against the rest. Now the overfit you've just carefully engineered should win against the honest models unless you suck, right? It's kind of risible that they had to go further and hard-code certain results, don't you think? Perhaps if they still couldn't win with scraped, illegal additional data, then everyone else had illegal data too? Perhaps Kaggle is not a good indicator of how good ML techniques are in practice? Perhaps Kaggle systematically overstates ML effectiveness due to this kind of uncaught cheating in many of their competitions? I bet Kaggle won't look too hard at that.
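To make the repeated-trials point concrete, here's a toy run where the labels are pure noise, so no model can truly beat 50%, yet picking the best of many random tries "achieves" far more on the holdout:

    # Try many random configurations, keep whichever scores best on the
    # holdout set. The winning score is optimistically biased.
    import numpy as np

    rng = np.random.default_rng(0)
    X_hold = rng.normal(size=(200, 5))
    y_hold = rng.integers(0, 2, size=200)       # labels are pure noise

    best_acc = 0.0
    for trial in range(10_000):
        w = rng.normal(size=5)                  # a random linear "model"
        acc = ((X_hold @ w > 0) == y_hold).mean()
        best_acc = max(best_acc, acc)

    print(f"best holdout accuracy after 10k trials: {best_acc:.2f}")
    # Prints around 0.60, well above the 0.50 any model can truly achieve.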
I'm by no means an expert in ML, but my understanding is there's some code that is run to train the model. That's what I meant by "training code". My regrets if my terminology was unclear.
> They took the official training set. Said, we need more. And scraped websites to get a bigger, illegal training set. This is against the rules and is cheating. They got caught.
No, this is wrong
I suggest you look at one of the other comments where people have explained why this is wrong. They did a better job than me.
No it isn't. See above. Overfitting to a holdout set is so easy and common you usually need to take steps to avoid it. See my comment directly above. Best.
No chance. HN’s goal is to have every effective programmer in the world checking it at least once a week. Anything that gets in the way of that is de facto bad.
Everything in moderation, right? Including games we play for imaginary karma. I must say, though, I generally like the way HN does it. It isn't nearly as in-your-face, and I don't find that it motivates my behavior very much.
If I get too many positive karma posts in a row I start worrying I'm censoring myself too much. Positive karma usually means I'm just saying the most mainstream thing and that's boring.
The proposal is to see if site behavior changes as a whole when integers are removed. Individual opt-in doesn't permit such a test, nor does permitting opt-out.
It’s so fascinating to see how easily a human brain can be hijacked by status. I wonder if there’s a way to harness that power for improving individual performance...
There is, and it's very effective... A company I worked at briefly had a badge system to honor people who go the extra mile at work, and people were very keen on getting those even though there was no financial compensation behind them.
Um, that is kind of missing the point. A badge is nothing more than a number expressed in symbols, so you demonstrated just another variation of how to hijack the brain, even if you just read an article showing it to you with numbers...
If "improving individual performance" means to you that you perform better at your job, you are already lost.
Elementary school teachers use it all the time. Just some stickers and a leaderboard in the classroom, and it works wonders for behavior and achievement.
I have noticed that it doesn't have much effect after a while. The students are like, meh, someone gets an award every week, and eventually it rolls around to my turn unless I really screw up.
The effect on parents seems bigger, which is a good thing, I suppose, since parents then give their kids some positive feedback.
I participated in some of them before due to the visibility.
People from your class may know you but others, not so much. In the absence of time and other signals, management would often pick someone who is highlighted to represent something.
A lot of big names at those events generally don't mind giving contact information to young kids asking for help.
From there, you can effectively build a nice list of people that you can leverage in future for your resume or career.
It's like free consultation you otherwise would need to buy later.
Interesting idea. I have a very confident and talkative kid who is not shy about approaching adults. I bet he could collect lots of contact information.