Pure contrast isn't enough to hint at falsity: I could tell you a technical detail about what I did today and it would have 0% overlap with any article on CNN.com. In addition, correlation with your pre-existing beliefs is not sufficient evidence for truth, because today you are wrong about at least a few things. If you look at any given point in history and assume you aren't too special, the first thing you'll realize is that at least one, if not many, of the things repeated daily in our society are almost certainly false, if not outright lies. After all, that claim is true of every other historical period and group of people, so why shouldn't it be true of us today?
To suggest a specific example, imagine a person living in WWII-era Germany correlating a propaganda news article with every other news article they have read, all of which are also propaganda. Correlation-based news epistemology isn't even based on a prevailing social consensus; it's based on the consensus of broadcasters.
That is, whoever controls the corpus of "established true facts" controls the determination of whether something is or is not fake news. Which probably gets us to competing corpuses (corpi?) of "established true facts", run by those who push competing meta-narratives. Which looks rather similar to the current situation...
Exactly so. The operator of the machine gets to control what it is told is 'true' and 'false'. That said, assuming a nominally good actor and an inability to have people process every article, this process provides a means to quantitatively analyze coherence with your controlled corpus.
"Quantitatively analysing the coherence of a controlled corpus" is not exactly a solution to fake news, although it's probably useful for a few other things.
It certainly helps a lot, because today you can't practically sift through all the available information yourself: there's simply too much of it.
If we had a system like this that worked, we could at least leverage it to be able to quantify the coherence of a much larger set of inputs and with many controlled corpora at once.
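To make "quantify the coherence with many controlled corpora at once" a bit more concrete, here is a rough sketch, not anyone's actual system. It assumes coherence can be approximated by mean bag-of-words cosine similarity, which is a big simplification. The function names, the two toy corpora, and the sample article are all made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercased bag-of-words term counts (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherence(article, corpus):
    """Mean similarity of one article against every document in one corpus."""
    if not corpus:
        return 0.0
    vec = term_counts(article)
    return sum(cosine(vec, term_counts(doc)) for doc in corpus) / len(corpus)


def score_against_corpora(article, corpora):
    """Score the same article against several operator-controlled corpora."""
    return {name: coherence(article, docs) for name, docs in corpora.items()}


# Hypothetical corpora curated by two operators with competing narratives.
corpora = {
    "broadcaster_a": [
        "the harvest was a record success",
        "the harvest exceeded every target",
    ],
    "broadcaster_b": [
        "the harvest failed and shortages are spreading",
        "shortages follow the failed harvest",
    ],
}

article = "officials say the harvest was a record success"
scores = score_against_corpora(article, corpora)
print(scores)
```

The same article scores higher against one corpus than the other, so the number measures agreement with whoever curated the corpus, not truth. An unrelated but perfectly true statement would score zero against both.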
I agree with this too, and certainly articles about different topics will have high contrast and low correlation. IBM patented[1] some work we did at Blekko to figure out the topic of a document from its components and that is an essential first step.
It also fails when there are few streams to compare, as when a new story first breaks. Or when there is coordination of actors.
My suggestion is not that it is fail-safe or perfect, only that it is a useful application of an ontology, which disputes the original author's thesis that they are overrated.
>It also fails when there are few streams to compare, as when a new story first breaks. Or when there is coordination of actors.
All of these things are hallmarks of fake news: it is claimed to be new (hence "news"), it is not published very widely (you can't compare an aquariusrising.biz article on Green Space Aliens to a CNN article on the same, because there isn't a CNN article on that), and it's usually coordinated between copycat opportunists. Even worse, none of these things are distinguishable from a grassroots report unless you know the true news a priori. This better not be a mechanism to ban everything that isn't on CNN.com.