Maigret: Collect a dossier on a person by username from thousands of sites

AdmiralAsshat · on Oct 2, 2022

Seems incredibly prone to false positives. For example, I can guarantee that I'm not the AdmiralAsshat on Reddit, Gmail, or Twitter, because that username was already taken by the time I tried to sign up for them.

smeagull · on Oct 2, 2022

You always need to balance Precision and Recall. For something like this, you want to be exhaustive.

I've worked on search engines, and depending on who is using them, that balance gets struck at a different point on the ROC curve. For legal matters, for example, they want every record that might match. For ad-hoc (Google-style) queries, nobody reads the second page, so you care more about Precision@20 (or really, at 3).
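To make the Precision@k versus recall trade-off concrete, here is a minimal sketch with a made-up ranking (the document IDs and the `precision_recall_at_k` helper are hypothetical, not from any search engine's API). A short cutoff rewards precision; an exhaustive review rewards recall.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and recall@k for one ranked result list.

    ranked:   result IDs in the order the engine returned them.
    relevant: set of IDs that actually match the query.
    """
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking: only "a", "c" and "f" are truly relevant.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]
relevant = {"a", "c", "f"}

print(precision_recall_at_k(ranked, relevant, 3))  # ad-hoc search: short page
print(precision_recall_at_k(ranked, relevant, 8))  # legal review: read everything
```

With k=3, precision and recall are both 2/3; reading all 8 results pushes recall to 1.0 while precision falls to 3/8, which is the trade-off a username-matching tool like this one also faces.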

c7b · on Oct 3, 2022

> You always need to balance Precision and Recall. For something like this, you want to be exhaustive.

Respectfully disagree. You're right in principle if you build a tool like that for yourself. But since this is Open Source, you have to take into account that people who don't understand that will use the tool as well and then use that as "evidence" in whatever arguments they're having with someone.

Closi · on Oct 3, 2022

> Respectfully disagree. You're right in principle if you build a tool like that for yourself. But since this is Open Source, you have to take into account that people who don't understand that will use the tool as well and then use that as "evidence" in whatever arguments they're having with someone.

Respectfully disagree with you respectfully disagreeing - this line of thinking can be used to argue against almost any information sharing.

e.g. Should we stop governments from releasing statistics that might be misinterpreted by an uninformed press? Should we stop open access to medical journals because untrained readers might use them for incorrect medical advice? Should we stop companies from releasing public annual reports because investing consumers who are untrained in reading financial documents might misinterpret them?

c7b · on Oct 3, 2022

I'm just stating that I think the data is likely a priori garbage (without a lot of sanitizing work), in the hope that that message spreads faster than the use of the tool. I think having that discussion is both important and preferable to censorship.

d0mine · on Oct 3, 2022

Actually, there is a recent "law" of sorts (discussed on HN) that essentially forbids publishing research that may be misinterpreted by the pronoun people (i.e., any research can be banned).

BawsMcGee · on Oct 3, 2022

Do you have a link / source for this?

d0mine · on Oct 4, 2022

https://news.ycombinator.com/item?id=32595767

jhugo · on Oct 3, 2022

Hopefully they read one sentence about what the tool does before using it. If they don't, their arguments will be weak due to their own sloppiness in research, which doesn't seem like a new problem.

c7b · on Oct 3, 2022

The thing is, people who want to use this for defamation have no incentive to sanitize the data; the message might stick around regardless of whether it has any merit (we've seen this happen even at the most publicly scrutinized national level; imagine how many fewer defenses smaller players have).

voakbasda · on Oct 3, 2022

Defamation is definitely one problem with this tool. Once there is “evidence”, it can be (ab)used without consequence by someone, because they can simply point at the tool output. If the output is usually “good enough”, it will provide an alibi for their actions: “I didn’t make a mistake, the computer did.”

jhugo · on Oct 4, 2022

Nobody who bothers to check what the tool does will be swayed by this. If someone wants to make shit up, and their audience will not bother to spend 30 seconds looking at their "sources", they could just write a script that outputs whatever they wanted and then point at that script.

voakbasda · on Oct 4, 2022

You are putting a lot of faith in people to do that diligence. Most people will take the “evidence” at face value. And yes, people do write scripts to generate material to defame or hurt others.

jhugo · on Oct 5, 2022

You're making my point. People take evidence at face value, so a program like Maigret makes no difference. Why bother using Maigret when you can just make stuff up?

MonkeyClub · on Oct 3, 2022

Frankly, </Hikaru>

> But since this is Open Source,

You can add the functionality you would like it to have, and share with the others.

For more fine tuned solutions, there is always a DoD contractor available near you :-)

BlueTemplar · on Oct 3, 2022

Somewhat offtopic, I'm curious about this "nobody reads the 2nd page" meme...

I find myself reading pages 2-5 quite often, because page 1 just doesn't give enough results, and I doubt I'm in that much of a minority?

(I'm talking about actual generalist searches, not people that use a global search engine as a replacement to bookmarks or directly searching, for instance, Wikipedia.)

mjevans · on Oct 3, 2022

I imagine that each additional page presentation is exponentially less likely to be reached and considered a 'reputable' result by someone normal.

If I'm looking for a specific issue, I'll sometimes go as deep as 10 or more pages of search results if nothing in the first 100-ish hits addresses the issue, but only if I can't think of any keyword variations that might get me a better match. I don't expect the average user to go even remotely that far.

smeagull · on Oct 4, 2022

There are types of queries that people discount. Navigational queries for instance, that essentially use Google as a spellcheck + DNS.

And many queries that are solved with 0 visits, thanks to the information being pulled out from the wiki, or giving up and rephrasing the query.

After those, then yes, most queries will be answered by the first page. It's what retrieval is optimized for.

dboreham · on Oct 3, 2022

I always read the 2nd page because the 1st is all SEO crap.

daanlo · on Oct 3, 2022

Source: https://www.sistrix.com/blog/why-almost-everything-you-knew-...

TLDR: on the second results page, each result gets <1% click through rate.

So it's not nobody, but statistically speaking, not very many.

kqr · on Oct 3, 2022

That's not how statistics work. I consider myself a frequent visitor of the second results page, but even for me the CTR of the results on the second page is < 1 %, because I almost always find the thing I want on the first page.

The question ought to be "conditional on not finding the result on the first page, how likely is the user to go to the second page versus balk, or re-try a different query?"

I'm fairly confident that number is higher than 1 %, but I don't have the data.
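The difference between the unconditional rate and the conditional one asked about above can be shown with a small sketch. The session log below is entirely hypothetical (invented counts, not real search data); it only illustrates how a low overall rate can coexist with a much higher rate among users whose first page failed them.

```python
from dataclasses import dataclass


@dataclass
class Session:
    found_on_page1: bool  # the user clicked a satisfying page-1 result
    visited_page2: bool   # the user ever opened results page 2


def page2_rates(sessions):
    """Return (unconditional, conditional) rates of visiting page 2.

    The conditional rate only counts sessions where page 1 did not
    contain what the user wanted.
    """
    unconditional = sum(s.visited_page2 for s in sessions) / len(sessions)
    missed = [s for s in sessions if not s.found_on_page1]
    conditional = (
        sum(s.visited_page2 for s in missed) / len(missed) if missed else 0.0
    )
    return unconditional, conditional


# Hypothetical log: 95 sessions satisfied on page 1; of the 5 that were not,
# 2 went on to page 2 and 3 rephrased the query or gave up.
sessions = (
    [Session(True, False)] * 95
    + [Session(False, True)] * 2
    + [Session(False, False)] * 3
)
print(page2_rates(sessions))
```

Here only 2% of all sessions reach page 2, yet 40% of the sessions that actually needed it do.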

MonkeyMalarky · on Oct 3, 2022

Or alternatively, a number "significantly less than 1%" is still massive when multiplied by the number of searches google gets.

BlueTemplar · on Oct 3, 2022

But how much is this affected by my caveat inside the parentheses?

lol768 · on Oct 2, 2022

Yup; for common/short usernames it's all too common for other folks to register using the same identifier.

The end result seems to be that this tool decides you're interested in "dating", "porn", "stocks" and tags you with a "ru" country code - despite not owning any of the accounts that the determination has been based off of.

The README should come with a disclaimer really.

jhugo · on Oct 3, 2022

Isn't it obvious just from reading one sentence about what it does? As it clearly says, it's based on username only, so multiple people's results may be mixed in the dossier if they share a username. We don't need a disclaimer for every thing that a moment's thought can reveal.

voakbasda · on Oct 3, 2022

The people that use this tool to dig up dirt on someone will love to see a mixed bag, because it could be useful to cast shade on their actual target. This tool will be abused on purpose.

jhugo · on Oct 4, 2022

That makes no sense. They could just make stuff up, it wouldn't be any more or less believable.

voakbasda · on Oct 4, 2022

Have you ever been part of a civil lawsuit? The lawyers will absolutely lie through their teeth to make their cases. They will introduce any “evidence” that they think will help them. Would you really trust a jury or a judge to understand and believe that this report is not accurate? Having been through the process, let me tell you straight up: you should not put your faith in the justice system.

jhugo · on Oct 5, 2022

Not sure where the rant about the justice system came from. You are making my point — people willing to lie through their teeth can just make up any report they like, the existence or accuracy of Maigret changes nothing.

eek2121 · on Oct 2, 2022

I intentionally steal nicknames I've seen. I'm an asshole as well, so sometimes it works out. I have 13 different nicknames so far that I use/have used since 1997, though I tend to rotate between all of them regularly. I DO hope they try to use "AI" to track me. That will be fun.

smeagull · on Oct 2, 2022

I have never used the same nickname on multiple sites. Often when I get a new device, I just make a new account. Especially for HN and Reddit.

kdtsh · on Oct 3, 2022

That almost seems a shame because smeagull is a fantastic username.

orange_fritter · on Oct 3, 2022

What's the benefit? It sounds like you might have to explain yourself one day if someone posts something racist/horrible.

BuyMyBitcoins · on Oct 3, 2022

I look at it differently, even if someone isn’t going around posting racist/horrible things, people and tastes change over time. I’ve been a part of fandoms that are now seen as cringey or toxic. I’ve also grown up more as a person and I look back at a lot of my old comments as sophomoric. I write differently, and my opinions on things have changed as well. I’ve had people dig through my post history on sites like Reddit to try and find a “gotcha” based on some remark I made years ago.

All in all, I personally feel like it is a good thing to cycle through usernames throughout life.

paledot · on Oct 3, 2022

> I’ve been a part of fandoms that are now seen as cringey or toxic.

—BuyMyBitcoins

Joking aside, there's definitely value to rotating usernames frequently. I've started using random strings on various sites because I really don't see an up side (for me) to being trackable from site to site and definitely across time. (I use very long random strings for my banking usernames because I don't trust them to have enough bits of entropy in their passwords.)

orange_fritter · on Oct 3, 2022

I'm dropping some hot takes on Hacker News tonight, so I agree. Although the idea that you can stay anonymous is probably naive: between database leaks and AI text analysis, good luck.

oceanplexian · on Oct 3, 2022

It’s not so much of a benefit, as that some of us simply don’t care what others think. Actions speak louder than words and all that.

An Internet comment from 10 years ago might cost you a job, or it might cost you a friendship. But at least I’m not living a facade of being a perfect and flawless individual, and that helps me sleep better at night.

antifa · on Oct 3, 2022

Seems like the solution is built-in to their strategy. If you get accused of being racist because a racist on twitter uses the same handle, just make a new handle and start over.

dolmen · on Oct 3, 2022

What about the case where (s)he's the one that posts the dirty things?

numpad0 · on Oct 3, 2022

…that’s an interesting behavior

bee_rider · on Oct 2, 2022

Some annoying people seem to have used my Gmail address to sign up for things like Twitch. I have all the three-factor authentication stuff set up, so I don't think they can get in to click the "verify" links, but I wonder if this sort of tool would be able to verify that the account was... verified. Probably not.

romeoblade · on Oct 3, 2022

My Gmail address is first.last name. I get very sensitive documents meant for a lawyer with the same name who lives in Texas. We've actually had some decent conversations over the years.

Semaphor · on Oct 3, 2022

I have an extremely common (for German-speaking countries) first-last name combination and I have firstlast@gmail.com (not used anymore, but I check it around once a week). I get so many confidential documents or personal photos.

The most hilarious/sad case was the insurance provider domcura, which advertised that it had won some kind of award for its great processes; yet writing to 2 or 3 different service addresses to tell them they were sending me confidential documents achieved nothing until I wrote to their data protection officer.

number6 · on Oct 3, 2022

Hello Hans Maier!

Semaphor · on Oct 3, 2022

Maier is only #34 in Germany, Meyer is #6, and mine, Wagner, is at #7 ;)

https://en.wikipedia.org/wiki/List_of_the_most_common_surnam...

BuyMyBitcoins · on Oct 3, 2022

Here in the United States we choose “John Smith” as the most generic and common name for a man. What name would a German pick?

Curiously, for a supposedly common name I have never once met a John Smith.

Semaphor · on Oct 3, 2022

It depends; probably Max Müller or Hans Müller, just anything Müller. Unlike the US, we don’t normally use a real name as a placeholder name [0]; the options there would be Max or Erika Mustermann (literally "example man" or "pattern man"), and for an average person it’s Otto Normalverbraucher (Otto Average-Consumer) and Lieschen Müller.

[0]: https://en.wikipedia.org/wiki/Placeholder_name

kataklasm · on Oct 3, 2022

Holy shit. I never made the connection with Otto Normalverbraucher. Thanks for making me click with this little piece of German trivia on the day German history was made! :)

jaclaz · on Oct 3, 2022

I thought it was John/Jane Doe but I now realize that those are only for unidentified people (in police/judiciary contexts).

In Italy - for the record - it would probably be Mario Rossi (as an example on a mockup form), but conversationally it would be Pinco Pallino.

There are a few dedicated English wikipedia pages:

https://en.wikipedia.org/wiki/Placeholder_name

https://en.wikipedia.org/wiki/List_of_terms_referring_to_an_...

https://en.wikipedia.org/wiki/List_of_placeholder_names_by_l...

It is interesting to see the slightly different uses in the various languages.

frereubu · on Oct 3, 2022

There was a John Smith who was briefly the leader of the Labour Party in the UK (sadly, he died of a heart attack very soon after becoming leader), and I remember him saying that before he became well known, when he checked into hotels under his real name, receptionists often eyed him suspiciously, thinking it was a false name to cover up the fact that he was checking in with his mistress.

jcynix · on Oct 3, 2022

Peter Müller (i.e. Peter Miller) would be a good generic one.

P5fRxh5kUvp2th · on Oct 3, 2022

I had that happen once.

It was a receipt for a medical purchase; at first I thought I was getting scammed. What tipped me off was that the email was sent to firstnamelastname@gmail.com and NOT firstname.lastname@gmail.com. That was the day I realized Google would even do that.

I ended up using the phone number in the email to contact the person and forwarded the email. And yes, they had my first and last name :)

romeoblade · on Oct 3, 2022

Yup! I ended up having to look him up via the state bar association's website. I was able to get his bar association email, which was different from the Gmail one. He's the only contact I have in Google Contacts, because it happens about 3 times a year.

He got a congratulations email on a BMW purchase one time. We had a good discussion about cars; we are both gearheads.

bhrgunatha · on Oct 3, 2022

I'm subscribed to so many informal church groups, originally in Alabama, but "I've" since moved to Texas. I tried quite hard to clear up the confusion, but virtually no action was ever taken, so now I just observe.

From those emails, it's nice seeing how people support each other and apparently I'm in demand giving scripture classes.

dolmen · on Oct 3, 2022

I'm French.

But my e-mail address has been used by real people to subscribe to services in Sweden, Turkey and somewhere in South America. At least the language helps to sort things out.

hota_mazi · on Oct 3, 2022

> What tipped me off was the email was sent to firstnamelastname@gmail.com and NOT firstname.lastname@gmail.com.

Dots don't matter in Gmail, so these email addresses are the same:

https://support.google.com/mail/answer/7436150?hl=en

SpamFork · on Oct 3, 2022

Your Gmail address is actually firstlastname: periods are ignored, and so is everything after a +. So your email could be fir.st.las.t+hackernewsapp @
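Here is a minimal sketch of that rule, assuming only Gmail's documented behavior for `gmail.com` addresses (dots in the local part ignored, a `+tag` suffix ignored, case-insensitive). The `normalize_gmail` helper is hypothetical; other providers treat dots and `+` differently, so anything that is not `gmail.com` is returned untouched.

```python
def normalize_gmail(address: str) -> str:
    """Map a gmail.com address to the mailbox it is delivered to.

    Assumption: only gmail.com is normalized (dots dropped, '+tag' dropped,
    lowercased). Addresses on other domains are returned unchanged.
    """
    address = address.strip()
    local, sep, domain = address.rpartition("@")
    if not sep or domain.lower() != "gmail.com":
        return address  # not Gmail (or not an address): leave untouched
    local = local.split("+", 1)[0].replace(".", "").lower()
    return f"{local}@gmail.com"


print(normalize_gmail("fir.st.las.t+hackernewsapp@gmail.com"))
print(normalize_gmail("first.last@example.com"))
```

Under this rule, firstnamelastname@gmail.com and firstname.lastname@gmail.com normalize to the same mailbox, which is why mail for a dotted address can land in an undotted inbox.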

what_the_h321 · on Oct 3, 2022

This reminds me of gail.com: https://discu.eu/q/https://gail.com

drusepth · on Oct 3, 2022

I used to have a similar-enough email to a dentist office in Texas and would get a lot of patient scans / files as well. Reached out a dozen times before it finally stopped.

bee_rider · on Oct 3, 2022

So far I haven't gotten anything that would give me moral compunctions to ignore, thankfully!

JasonFruit · on Oct 2, 2022

I get email all the time for a Jason Fruit in Chicago who uses my email address as a dummy. I even called him once to ask him to stop. He didn't.

bee_rider · on Oct 2, 2022

The annoying thing is, my email address is based on my name, which is fairly unique (no relation to my account name here), to the point where I'm nearly 100% certain that it is intentional (taken from a leak of addresses that signed up for some service or another).

dhosek · on Oct 2, 2022

My wife has a firstname@gmail.com account and she gets a lot of other peoples’ emails including some rather sensitive stuff (bank statements, employment info, etc.)

rightbyte · on Oct 3, 2022

Me too. It is the same guy each time, though. He has a dot in his address, and I think it gets dropped in many handwritten signups.

TEP_Kim_Il_Sung · on Oct 3, 2022

Your current username at geronnimo mail dot kazooie? Fun fact: at email dot com is a functional AOL address, still recognized by all corporate services, even with completely made up user part. e.g. birdperson @ email dotcom

wruza · on Oct 3, 2022

Side question: do people still believe that email scrapers are unable to parse “username at email dotcom” or “my username at gmail”? Or is it just a cultural thing?

bee_rider · on Oct 3, 2022

Nah, decided to make a clean break from previous accounts for this site.

alex_duf · on Oct 3, 2022

Easy to solve: do a password reset on the website (in your case, twitch), take over the account, then close it.

Bonus points for checking the activity to see if it looks human or if it's been taken over by a bot.

ohgodplsno · on Oct 3, 2022

I'm more surprised by the fact that people have used my standard username on not just one, but three porn sites, and that I am now a 31 year old camgirl from Thailand, a 36 year old man from Norfolk looking for love and a 31 year old man from South Africa.

z9znz · on Oct 3, 2022

This tool finds existence of accounts by username. It does not promise that all found accounts belong to the same human.

xenocratus · on Oct 3, 2022

Project description in the README:

> Maigret collect a dossier on a person by username only...

The About field:

> Collect a dossier on a person by username from thousands of sites

"on a person" seems to imply that they'd belong to someone in particular. Obviously if you have any experience of creating accounts you'd know that's unlikely to be the case, and it's not written in a promissory tone. But it does imply it.

GoblinSlayer · on Oct 3, 2022

Sometimes I used other people's usernames too for this reason.

Findecanor · on Oct 3, 2022

Please don't steal. You may be making it difficult for the people you steal from.

I often use the same username(s) on multiple forums where people discuss similar topics because I want other people who visit the same forums to recognise me as being the same guy.

I've found this username taken a few times, and it has bugged me every time. While it is from Tolkien's Elvish, it is obscure and then misspelled (actually a portmanteau). I've had to add a prefix, such as "TheReal" to it.

ZeroGravitas · on Oct 4, 2022

An interesting extension to this tool would be to let you specify the interests and the site you want to sign up to, and it would generate a username that fits that profile and is available on the specific site.

belter · on Oct 3, 2022

The next step is when Google blocks your account based on the behavior of said similar alias, in other parts of the internet tubes....

TheAdamist · on Oct 3, 2022

As a commissioned officer, you may be responsible for everything done in your name.

More rations of rum; I think that's how the British managed it. I wouldn't complain, much.

dolmen · on Oct 3, 2022

This project is apparently a fork of another project called Sherlock [1] which is still active.

Does anyone know the story behind the fork?

[1]: https://github.com/sherlock-project/sherlock

leogout · on Oct 4, 2022

I've tried out both today. As far as I can tell, Maigret generates nice reports, whereas Sherlock only gives you the URLs and you have to dig into those yourself.

bee_rider · on Oct 2, 2022

Well, that is creepy as hell, but I guess it is obvious that such a tool could exist, and it is better not to have it exclusively in the hands of data-brokers, etc.

I wonder how hard it would be to add the functionality: go back to the email address that has registered these accounts, find any other names they've registered, and search off those. (EDIT: err, wait, I bet that's the "recursive" functionality they mention).

eurasiantiger · on Oct 3, 2022

You have probably already clicked “accept”[1] on one of those “cookie notices”, which actually make you consent to having your data processed and all your profiles and devices linked by hundreds, if not thousands, of companies all over the world.

[1]: more accurately, you didn’t go through the list of all vendors and didn’t object to their “legitimate interests”, if that was even an option.

bee_rider · on Oct 3, 2022

If your point is that it is too onerous to use the modern internet without being spied on, yes I agree.

swyx · on Oct 3, 2022

I think Clearbit is already making $100m/yr doing this as a service for businesses.

nvr219 · on Oct 2, 2022

I'm a big use-different-username-for-every-website enthusiast. It makes some things difficult but for me it's worth it.

barbariangrunge · on Oct 3, 2022

I guess with password managers, this is easier than ever

tempodox · on Oct 3, 2022

Even if you used the same username on different accounts, you should have different passwords anyway.

glitchcrab · on Oct 3, 2022

I'm pretty sure they meant that using a password manager means you don't have to remember the _username_ for every site.

kataklasm · on Oct 3, 2022

I've started to just generate a 10-char alphanumerical string with the password generator feature and use that as the username on new accounts, starting to update them retroactively, too (by scrapping the old account and making a new one).
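For anyone who wants to script this instead of using a password manager's generator, a minimal sketch with Python's standard `secrets` module (the `random_username` helper is hypothetical). A 10-character string over 62 alphanumeric symbols carries roughly 59.5 bits of randomness; some sites restrict case or leading digits, which this sketch does not handle.

```python
import secrets
import string

# Letters (both cases) and digits: 62 symbols.
ALPHABET = string.ascii_letters + string.digits


def random_username(length: int = 10) -> str:
    """Return a random alphanumeric handle drawn from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_username())
```

`secrets` is used rather than `random` because it draws from the operating system's cryptographically secure source, so handles cannot be predicted from earlier ones.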

JamesDough · on Oct 3, 2022

That's great, then you can use the same for your password.

marginalia_nu · on Oct 3, 2022

Really, why is that?

quetzthecoatl · on Oct 3, 2022

because if you get your account compromised on one site/forum because of lax security on their end, you don't want all your accounts compromised.

marginalia_nu · on Oct 3, 2022

How will your accounts be compromised if they are all different? A key is no good if you don't know which lock it leads to.

nvr219 · on Oct 3, 2022

Because it might be tied back to your email account or some other common denominator. Using different usernames is good for dox attacks like this but doesn't necessarily help you if your email is compromised. So you need different passwords (AND MFA!!!!)

marginalia_nu · on Oct 3, 2022

What if I use more than one email?

walrus01 · on Oct 2, 2022

But people are commonly lazy, and you'd find that most people who don't even consider resistance to deanonymization as a concern will just use the same name on many sites.

shanusmagnus · on Oct 2, 2022

Speaking personally, I kind of want people to know I'm the same person in different places. I do realize this is a terrible idea from a privacy perspective, and probably a lot of other perspectives. But I think a bunch of people want to use the net this way.

Entinel · on Oct 2, 2022

It's only bad from a privacy perspective if you are sharing things you don't want linked together or are sharing things that can be unintentionally linked back to your real life. If you use the same username across multiple sites and you want those linked together, then that's just branding.

trinovantes · on Oct 2, 2022

Using the same username may allow you to manually poison your own profile to make your data less useful than a shadow profile generated through stylometry

Mezzie · on Oct 3, 2022

I do this. I have a handle I've been using for 20+ years (not this one and nothing on HN) and I lie about stupid shit rather frequently.

I also create one-off usernames in addition to my 'permanent' name. That way everybody sees the permanent name and assumes I'm too stupid/ignorant to use different names, so nobody ever suspects the other names are me. (Using different emails, because I do the same thing with email addresses and - when I can afford it - phone numbers.)

walrus01 · on Oct 3, 2022

> It's only bad from a privacy perspective if you are sharing things you don't want linked together or are sharing things that can be unintentionally linked back to your real life

You must live in a location where you have very little fear of political oppression. If you're an Iranian or something these days the situation is very different.

vidarh · on Oct 3, 2022

I use recognisable usernames in places (like here) where I'm fine with my comments being linked to my real name, and less obvious ones without explicitly hiding it places where I'd prefer my comments don't show up in Google searches for my name (e.g. Reddit - if you trawl through my reddit profile, linking it to my name is easy, and it wouldn't be a problem if you found it, but I don't get a bunch of reddit comments when I Google my name), and then totally separate user names for anything I actually want to be anonymous for.

My situation is a bit unusual in that I almost certainly have a globally unique name. My last name is a corruption of an uncommon Norwegian name; that corruption occurred independently in two places, there are fewer than 500 people with that last name in Norway and probably about the same number in the US, and to date I've seen no indication of anyone combining it with my first name. My first name plus last initial is also unusual enough that I only rarely find it already taken as a username.

NikolaNovak · on Oct 2, 2022

Agree. There's a possibly flawed but possibly real notion of reputation or brand or eminence. Patio11 of course comes to mind :).

BenjiWiebe · on Oct 2, 2022

Count me in as one of those people.

I use the same username (which is my real name as well) pretty much everywhere.

bombcar · on Oct 2, 2022

Use the battery horse staple tool to generate usernames.

Or don’t, I’m not your mom.

thakoppno · on Oct 2, 2022

isn’t it:

`correct horse battery staple`

just wondering how memorable that technique is.

edit: indeed

https://xkcd.com/936/
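The comic's point is that the strength comes from picking words at random: with a list of N words, each word adds log2(N) bits, so four words from a 2048-word list give 44 bits. A minimal sketch (the helper names and the tiny word list are hypothetical; a real list needs thousands of words):

```python
import math
import secrets


def passphrase_entropy_bits(num_words: int, wordlist_size: int) -> float:
    """Entropy in bits of num_words picked uniformly from a wordlist."""
    return num_words * math.log2(wordlist_size)


def make_passphrase(wordlist, num_words=4):
    """Join words picked uniformly at random with a CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))


# The comic assumes 4 words from a 2**11 = 2048-word list.
print(passphrase_entropy_bits(4, 2048))  # 44.0

# Hypothetical tiny wordlist, only to show the call; far too small for real use.
words = ["correct", "horse", "battery", "staple", "orange", "river"]
print(make_passphrase(words))
```

Because the entropy depends only on the list size and word count, a memorable phrase you chose yourself (like the comic's own example, once it is published) is worth far less than one the generator picks.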

hifikuno · on Oct 3, 2022

I've started doing this too. I use DuckDuckGo's email forwarder, which provides random email addresses to hide your real one. Every service I sign up with now gets a random email address, and whatever that email address is becomes my username. Coupled with a password manager, it works well. If I ever lose access to the password manager, I am screwed.

lol768 · on Oct 2, 2022

This is technologically impressive, but I was disappointed (but not particularly surprised) to see no mention of the ethics involved in using this sort of project - in either the GitHub issues or the README/docs.

This seems like a great tool for stalking people; particularly the recursive functionality (for tying identifiers together).

I'm not saying the world is worse off due to the release of software like this - indeed, you could argue that publishing these sorts of OSINT tools allows folks to take a proactive and protective view of their own information and e.g. make changes to their own profiles/privacy settings.

But the question of ethics seems like something all-too-often glossed over in the infosec world. Software doesn't exist in a vacuum.

bityard · on Oct 3, 2022

Software like this _already_ exists. Credit bureaus, advertising firms, data brokers, skip trace databases, three-letter agencies across the globe, and nearly all of the Big Tech companies do their business building centralized profiles of basically everyone and that certainly includes correlating online identities.

The only difference here is that this one is more limited and open source.

"Privacy is dead, get over it."

remus · on Oct 3, 2022

Just because similar tools already exist that doesn't mean we shouldn't discuss the ethics of these tools (including the developers themselves).

hnbad · on Oct 3, 2022

There's a difference between being able to buy a gun at the gun store where your gun is registered to your name and you can only pick it up after a multi-day waiting period, and being able to pick up a gun at the free gun dispensary at the push of a button in a back alley with no supervision.

I realize this may be a failure to communicate because of a difference in shared values but I come from the perspective that harm reduction is worthwhile even if you can't fully eliminate the problem. That you can't prevent all crimes doesn't mean you shouldn't try to prevent crime.

Not to mention that the risk profile of having three-letter agencies come after you or having a random obsessive weirdo "collect a dossier" and share it with other obsessive weirdos who get a kick out of making your life miserable because they project their life failures on you is very different. Namely, if you're prominent enough to have an entire state agency try to destroy you, it's probably over an active decision you made (e.g. focusing your life on political activism) whereas for random weirdos to harass you and attempt to drive you into suicide you just need to show up on their radar long enough for them to start making up reasons to hate you.

EdwardDiego · on Oct 3, 2022

Sure, but none of them are readily available to someone you argued with on /r/politics who has taken agin you.

vintermann · on Oct 3, 2022

So, the question is: are you more afraid of random nobodies online, or are you more afraid of already organized, already powerful people who had similar capability already?

hnbad · on Oct 3, 2022

Unless you're actively plotting to overthrow the government, commit violence against politicians or public property, or participate in major civil rights movements, you should be more worried about a random nobody funnelling all their built up hate and frustration into making your life miserable than a three letter agency, yes.

The difference is that the three letter agency has a budget, process and middle management so at the end of the day someone needs to justify why they're expending resources on making your life miserable whereas the random obsessive weirdo just needs to convince a bunch of other obsessive weirdos that you're a garbage person they can turn into their "lolcow" of the day/week/month/year.

The three letter agency is more likely to inconvenience you out of apathy than to actually try to destroy you intentionally. The random nobody is more likely to try to drive you into suicide for a thrill.

fritztastic · on Oct 3, 2022

Definitely nobodies online. I don't think anything I've ever done or will do would put me in the interest radius of big agencies, government or private (outside of perhaps advertising and a few exceptions).

I have, however, encountered death threats or obsessive/hostile behavior from people online who decided they had a beef with me for one reason or another. I'm much more inclined to be wary of some unstable individual having the means to find my identity (and therefore my location), potentially using this to act on their emotions IRL.

EdwardDiego · on Oct 3, 2022

I'm a nobody, and fiercely proud of it. And I pity the NSA employee who has to monitor my Internet usage, they're going to see an awful lot of Javadoc, programming blogs, websites about plane crashes (and I hope they read some AdmiralCloudberg on Medium for themselves), and websites about rocks and faultlines.

In reality, as mentioned by other commenters, I'm not anything they're interested in, at best I'm a datapoint that advertisers try to categorise into a demographic (male, 30 - 50, doesn't give a shit about cars, into tech).

And anyway, they're already doing it, but impersonally.

It's the person who is very personal that I'm not enthused about.

vintermann · on Oct 3, 2022

This, and all the sibling comments, is just sad.

Not only are you happy to never challenge current power in any meaningful way, but you are sure you never will be. If there is a revolution tomorrow and the Trumpists are in charge, or the Marxists, or the Russians or whatever, you're still sure you will be fine with that order too and will never rock the boat.

You're also confident that if you do nothing wrong by them you have nothing to worry about.

Instead, you've been taught to fear the weak, the nobodies. The nameless savage in the night. As if they have any more reason to hate you than the powerful have.

concordDance · on Oct 3, 2022

The answer is often the first because the latter don't have it out for you.

codingdave · on Oct 3, 2022

Absolutely more worried about the "nobodies". I'm boring to big scary organizations. But individuals have a wide potential for varying levels of instability and vengeance. Especially online.

firefoxkekw · on Oct 3, 2022

LOL

Similar tool for phones -> https://github.com/sundowndev/phoneinfoga

Collection of OSINT tools -> https://osintframework.com/

And if you want to pay you can just use https://www.maltego.com/ and https://pimeyes.com/en

It is readily available.

60Vhipx7b4JL · on Oct 3, 2022

It makes a difference whether or not you make it available to the average Joe with limited technical understanding.

You can't change the NSA, but you can reduce the amount of abuse.

It's the same with guns, USA vs Europe for example.

mtrycz2 · on Oct 3, 2022

> "Privacy is dead, get over it."

I am NOT getting over it.

xena · on Oct 3, 2022

There is a difference between public and publicized information.

motoxpro · on Oct 2, 2022

As a commentary on the conundrum, not so much your post.

I always think it’s strange when people make the argument that it’s good for harmful tools to be out in the open because people can/will change their behavior because of them.

99.999999% of people will never see this repo, never know things like this exist, and don’t visit sites like HN but can, and probably will be, affected by it in some way.

lol768 · on Oct 2, 2022

> I always think it’s strange when people make the argument that it’s good for harmful tools to be out in the open because people can/will change their behavior because of them.

> 99.999999% of people will never see this repo, never know things like this exist, and don’t visit sites like HN but can, and probably will be, affected by it in some way.

Yup, I do think you're right. And honestly, the folks who'd probably most benefit from being able to run one of these tools to empower themselves with data that can help them change their behaviour/privacy settings - they're probably actually the folks least likely to be able to install and run the tool.

NoraCodes · on Oct 3, 2022

I disagree entirely. As someone with a moderately public persona who is also in the LGBTQ+ community, having tools like this is extremely helpful, both to help me keep a grasp on information about myself and for helping less technically inclined people in my community protect themselves. It's all information that a motivated attacker could find easily with or without the tool, and there are a sufficient number of motivated attackers that any moderately public queer person will, at some point, be doxed and subjected to some level of internet harassment.

javajosh · on Oct 3, 2022

Yeah but this tool just lowered the barrier-to-entry for any motivated attacker. What before was an, oh, 20 hour slog through a bunch of sites looking for the same username, is now a 15 minute run of a script.

For an analogy, consider another tool, an app you can side-load and use to unlock any Prius made between 2010 and 2013. Is this beneficial? Certainly some affected Prius owners will be helped by its release. But affected Prius owners will be (probabilistically) harmed. Even if you're one of the group that was helped (probably arguing the exploit was already well known prior to the tool's release, perhaps?) I think it is dishonest to assert that the release of the tool is "good" from a utilitarian sense.

I wish it weren't so.

NoraCodes · on Oct 3, 2022

I definitely see your point, but when there are, at minimum, dozens of people who are willing to undergo that 20 hour slog, I'm glad there's a tool that makes it take mere minutes to find OSINT leaks myself.

I'm not saying you're necessarily wrong in general. It's possible this is bad for the world. But I maintain that for myself and the people I care about, it's more useful than it is damaging.

mod · on Oct 3, 2022

I agree. I think this publication is a net-loss for the world, for sure. People will find and use this code--people who could/would never create it themselves.

It will be used for many reasons, but the stalking is the most obvious.

Not all of my usernames are as resistant as this one, unfortunately.

true_squirrel · on Oct 3, 2022

You have to squint really hard to see "many reasons". Look at the list of sites it checks. Porn / fetish sites, gambling URLs, photo hosting, hobby forums, etc. It's a tool for doxxing and stalking, and little else.

It's essentially like arguing that releasing an open-source ransomware toolkit is beneficial. I mean, maybe it's your right, and one can make some strenuous arguments about how it helps the "defense", but really, it just makes it easier to be a terrible person on the internet.

rtev · on Oct 3, 2022

Is it better for only malicious actors to have access to this tooling? If both teams don’t have the same tools, there isn’t a level playing field.

If this wasn’t on GitHub, we wouldn’t be aware of it. However, it would certainly be circulating on underground forums. Is blissful ignorance better?

Springtime · on Oct 3, 2022

I feel this is similar to Firesheep[1], a browser extension from a decade ago that put cookie session hijacking into the hands of everyday users and became wildly popular until its removal. Pre-Snowden it pushed various major sites to implement HTTPS encryption, including Facebook[2].

Granted there the solution was a rather simple one but I feel it's at least worthwhile for more to be conscientious of singular identities online and what info is disclosed publicly with them.

[1] https://en.wikipedia.org/wiki/Firesheep

[2] https://threatpost.com/facebook-kills-firesheep-new-secure-b...

archevel · on Oct 3, 2022

I think the argument is more that this tool by virtue of being easily accessible creates more malicious actors. Similar to how most people wouldn't steal a locked bike, but a larger portion would steal an unlocked one. It isn't that much harder to steal a locked bike vs an unlocked one, but the threshold is just a tiny bit higher so more people will attempt it. Conversely this tool lowers the threshold for stalking, so more people, who otherwise wouldn't, will use it maliciously. That isn't the fault of the tool or its developers, but it is something to be aware of when building any tool. When you release it, it may get abused by people for bad purposes.

rtev · on Oct 3, 2022

Assuming your online identity is secure by obscurity is not a realistic option. If all it takes is a flashlight for everyone to become “vulnerable”, we need to just assume the light is always on. Someone handing out flashlights just highlights the bigger problem.

If this truly causes worry, adopt better opsec and create generative usernames for different sites. If not, assume anyone will easily be able to link your bowel issue subreddit comments to your LinkedIn profile.

wruza · on Oct 3, 2022

> If this wasn’t on GitHub, we wouldn’t be aware of it

The more available this type of tool is, the less professional its users are. As a result, while it makes it easier for you to see what “they” can see about you, “they” become much more personal and bitchy than the credit bureaus or ad companies that already have this data. E.g. a credit bureau would never sell your comments or “private preferences” to your boss to step over you in your career. It increases the number of attack vectors enormously.

> Is blissful ignorance better?

It is in this case, I believe. It’s like handing out free, covert, time-travel-enabled surveillance devices to the general public. Someone will stick one on your back just for dark fun.

rtev · on Oct 3, 2022

The actual fix is to not reuse usernames. Does distributing password stuffing tools increase the ease of password stuffing? Yes. Is that the problem? No - people reusing passwords is the problem.

wruza · on Oct 4, 2022

It doesn’t work retrospectively. You either start a new web life or live under a risk of accidentally exposing or linking to one of your already vulnerable accounts.

Also, if we don’t raise issues like this, the next actual fix will be “don’t reuse writing styles and vocabularies on different sites”.

rtev · on Oct 4, 2022

I believe both are inevitable if you want to maintain an unlinked identity.

Realistically, there is nothing you can do to avoid that future. The best thing you can do is choose to accept the risk or act to prevent the issue.

wruza · on Oct 4, 2022

This line of thinking reminds me of a Roko’s basilisk situation: embrace it and be prepared vs. ignore this nonsense and actively stop creating/spreading it. And if it spreads enough, make it a punishable offense like hacking/piracy (e.g. via “canary” links) so that it lives only in the underground. I hope this project will just fly under the public radar because it's hard for the average person to use, or for a similar reason.

rtev · on Oct 4, 2022

This project is exceptionally minor compared to all the public hacking tools that exist. Are you advocating to start censoring any tool that can be used for malicious purposes?

JustHiThere · on Oct 3, 2022

Wrong approach imho. This means all the tools developed for ethical hacking are "bad" because they can be used for "bad". This is like saying we shouldn't sell knives because knives can be used to kill someone. It's just silly.

This tool could be used to teach people, e.g. in OSINT challenges, it could be used to gather information for a pentesting job, it can be used to teach people about best practices online etc ...

ohgodplsno · on Oct 3, 2022

A knife has many uses, from cutting wood to stabbing people. This tool is very much specialized in the stabbing (not necessarily people; as you said, you could stab a ballistic dummy to see the damage it does). Do you really think a stab-only knife should be sold with no background checks and no tracking of who buys it?

nottorp · on Oct 3, 2022

> This tool could be used to teach people, e.g. in OSINT challenges, it could be used to gather information for a pentesting job, it can be used to teach people about best practices online etc

Or just... look yourself up. Like most of the people on this HN thread have done.

motoxpro · on Oct 3, 2022

We could debate where they are, but surely there are limits? I wouldn't want everyone to have access to things like Pegasus (https://citizenlab.ca/2022/10/new-pegasus-spyware-abuses-ide...). You could argue Pegasus could be used to hack the phone of a child predator to save a life, just as you could argue this tool could be used to educate. Maybe for you this username tool falls on the acceptable side of that line, but it looks more like a switchblade than a butter knife to me.

dolmen · on Oct 3, 2022

No mention of ethics on the "Philosophy" page of the documentation.

https://maigret.readthedocs.io/en/latest/philosophy.html

manholio · on Oct 3, 2022

This is simply a demonstration in essential opsec: don't reuse handles across sites (especially uncommon ones), cycle handles often on high leak platforms (such as Hacker News), don't link your identities by creating profiles.

Anything you put online is perpetual and will be used against you; any innocent hobby or adolescent joke will become a major transgression at the right time in your life and for the right audience. Even slight grammatical idiosyncrasies in the words you type can be used to root you out by motivated parties.

Curate your official public profile like you are preparing to run for office; one day you just might. Anything else should be anonymous.

mytailorisrich · on Oct 3, 2022

Ethics is for individuals to ponder in relation to what they intend to do. There is no need for a tool's documentation to hold forth on it.

On the other hand, the docs are the right place to warn about potential legal implications.

verisimi · on Oct 3, 2022

Ethics plays no part in this reality. The term 'ethics' is only ever used to justify the unethical, eg ethics committees find that it is fine to use embryos in science. It is about applying the veneer of morality over the top.

Ethical decisions are down to the individual.

Unfortunately, lots of people think ethics is what the law allows, or what government says, or even what teachers teach. Most individuals don't take the time to realise or uncover and then act upon the ethics they have innately - aka following one's heart/conscience.

Worse still, many paychecks depend upon the unethical, so don't expect any critical introspection or changes soon.

faeriechangling · on Oct 3, 2022

This is why I use many different usernames, I must have used maybe three dozen different ones by now. The only time I use the same username is for games because I want people to know what other games I play.

My opsec is pretty awful but I at least try to not be "simply type in the username I always use into google or some tool on GitHub" bad. I at least want somebody to have to use a few hours or pay a data broker to reveal all my secrets because my bet is that nobody cares about me enough to do that.

baxtr · on Oct 3, 2022

That won’t be helpful going forward. The next gen of tools will analyze your writing style and make the connection through that.

Karunamon · on Oct 3, 2022

I would be willing to bet that, around the same time those tools become widely distributed (kind of like what stable diffusion did for AI artwork), adversarial tools that emulate or obfuscate writing styles will exist as well.

aunetx · on Oct 3, 2022

I guess to analyse your writing style from thousands of billions of messages on Internet in order to simply link two of your accounts together is... overcomplicated?

I mean of course it could work, but except if there is a very special interest for an account, we may need to wait some decades before a list of "linked accounts" containing it in particular could emerge, right?

fire · on Oct 2, 2022

hah, I thought it sounded like sherlock[1] and lo, it's a fork!

can't find a comparison of them though; any idea how they differ?

1: https://github.com/sherlock-project/sherlock

frumiousirc · on Oct 3, 2022

> sherlock

Comparing the options described in their readme files, maigret adds --html, --pdf, --tags beyond what sherlock supports. And, it adds documentation beyond a README file.

jbm · on Oct 3, 2022

I took a look at my own user names and it's pretty obvious that the results here are mostly worthless.

So many people with empty blogs and the same user name. Unless the blog is auto-created, I don't see the point.

I do find all the .ru tags to be concerning. I can imagine out-of-touch security types at Paypal deciding to sanction me based on it without any recourse.

vosper · on Oct 3, 2022

Yeah I've seen a similar tool fail hard on any wiki site that returns a placeholder page for any username whether or not it actually exists. I've never seen anything except 95% false positives, which is so many that after a few runs you just ignore the tool.
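A rough sketch of one way a checker could filter out such placeholder pages: fetch the profile URL both for the target username and for a random username that almost certainly doesn't exist (a "control"), and only count a hit if the two responses differ. This is an assumption about how such a check could work, not a description of Maigret's or Sherlock's internals; the `Response` type, the `{user}` masking and the 0.95 threshold are all hypothetical, and the actual fetching is left out so the logic runs offline.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Response:
    """Hypothetical minimal view of an HTTP response."""
    status: int
    body: str


def looks_claimed(target: Response, control: Response,
                  target_name: str, control_name: str,
                  threshold: float = 0.95) -> bool:
    """Return True only if the target's page looks meaningfully different
    from the page served for a username that should not exist."""
    if target.status != 200:
        return False  # no page at all for the target username
    if control.status != 200:
        return True  # site 404s unknown users, so a 200 is a real signal
    # Both returned 200: the site may render a placeholder for any name.
    # Mask each username so that echoing it back doesn't count as a difference.
    a = target.body.replace(target_name, "{user}")
    b = control.body.replace(control_name, "{user}")
    similarity = SequenceMatcher(None, a, b).ratio()
    return similarity < threshold


placeholder = "<h1>{name}</h1><p>This page does not exist yet.</p>"
control = Response(200, placeholder.format(name="zq8xv2kq"))
fake_hit = Response(200, placeholder.format(name="alice"))
real_hit = Response(200, "<h1>alice</h1><ul><li>Joined 2015</li><li>42 edits</li></ul>")

print(looks_claimed(fake_hit, control, "alice", "zq8xv2kq"))  # False
print(looks_claimed(real_hit, control, "alice", "zq8xv2kq"))  # True
```

The masking step is naive (a very short or common username would also replace unrelated text), but it shows why a placeholder page that merely echoes the name back should not count as a match.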

hnbad · on Oct 3, 2022

It's mostly worthless but the recursive search via aliases can give some interesting insights. I ran it on my GitHub username and while the result ended up linking me to countries I have never been to (probably via a yandex account as the username is very common and a lot of the profiles were of other people) it did include my full name and cities I've lived in. It seems the most "dangerous" link in my case was Gravatar.

Interestingly, searching by e-mail gave the most garbage results, likely because most services don't expose the e-mail address directly. I guess it's an advantage that neither my usernames nor my legal name are particularly unique.
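The "recursive search via aliases" can be pictured as a breadth-first expansion: each identifier found on a matching profile (a real name, another username, an email) is fed back in as a new query. This is a minimal sketch, assuming a hypothetical `lookup` callback that stands in for the actual site checks; it is not Maigret's code. The depth cap illustrates why results drift: on a common username, each extra hop is more likely to land on someone else.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def recursive_search(seed: str,
                     lookup: Callable[[str], Iterable[str]],
                     max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first expansion from one identifier.

    `lookup` is a hypothetical stand-in for "check all sites for this
    identifier and return any new identifiers found on matching profiles".
    Returns each identifier mapped to the hop count at which it was found.
    """
    found = {seed: 0}
    queue = deque([seed])
    while queue:
        ident = queue.popleft()
        depth = found[ident]
        if depth >= max_depth:
            continue  # stop expanding; each hop adds false positives
        for new in lookup(ident):
            if new not in found:
                found[new] = depth + 1
                queue.append(new)
    return found


# Toy link data standing in for what profile pages might reveal.
links = {
    "example_user": ["Jane Doe", "jane.doe@example.com"],
    "Jane Doe": ["jdoe1987"],
    "jdoe1987": ["someone_else"],
}
print(recursive_search("example_user", lambda x: links.get(x, [])))
# {'example_user': 0, 'Jane Doe': 1, 'jane.doe@example.com': 1, 'jdoe1987': 2}
```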

GoblinSlayer · on Oct 3, 2022

Paypal is a private company, it can sanction you for lulz too.

basicplus2 · on Oct 3, 2022

Unless all your user names are your email address

StarlaAtNight · on Oct 3, 2022

Actually pretty useful if you input your email address. It crawls through some kind of knowledge graph

I'm seeing stuff I probably need to go clean up from like 10 years ago that's no longer relevant

tartaglia0 · on Oct 3, 2022

I gave it a try feeding it with one of my community nicknames. It was rather generic so I did expect some false positives. Color me surprised when it automatically started a second search iteration using my real name!

Turns out that one of the websites I'm registered to uses Gravatar to display profile pictures. I've naively been using the same WP account to set a picture for all my emails, believing that you couldn't get information about my other accounts using a simple picture.

I guess I was wrong, and I'll probably split my Gravatar account this evening.
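Why a shared Gravatar links accounts: the avatar URL that sites embed is derived from the email address. Gravatar's long-documented scheme hashes the trimmed, lowercased address with MD5, so every site showing your Gravatar exposes the same identifier, and anyone who already knows or guesses the email can compute it. A minimal sketch of that derivation (Gravatar's current developer docs are the authority on the exact URL format and any newer hash options):

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 of the trimmed, lowercased email, as used in Gravatar avatar URLs."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str) -> str:
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


# Different spellings of the same address give the same URL, so two sites
# embedding this image are both pointing at one email address.
print(gravatar_url("  Jane.Doe@Example.com ") == gravatar_url("jane.doe@example.com"))  # True
```

Using separate Gravatar accounts (or none) for separate identities, as suggested above, breaks this link because the hashes no longer match.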

Entinel · on Oct 2, 2022

I used to do what this tool does, but manually, when networking: find someone I wanted to network with, then find all of their online accounts to try to find an "in." Definitely very creepy, but if I call it networking and not stalking, it's okay, right?

Swizec · on Oct 2, 2022

You can make it not feel creepy by using the research as a way to seed the conversation rather than replacing it. Ask good questions because you know [part of] the answer, but let the person decide how much they want to say in the moment.

An even better trick is to invite them for a podcast recording. Then all this research is suddenly very normal and expected.

joshka · on Oct 3, 2022

Related... https://bitwarden.com/blog/how-to-use-the-bitwarden-username...

djbusby · on Oct 2, 2022

Remember when Keybase was gonna make it easy to verify users across profiles? Your Keybase profile would link out to all of them, no magic needed.

styfle · on Oct 3, 2022

Yeah that or a personal website which often has links to the same person on all social networks.

sbf501 · on Oct 3, 2022

My decision to pick a username that was taken on other sites has successfully improved my anonymity score!

fritztastic · on Oct 3, 2022

>The Maigret database contains not only the original websites, but also mirrors, archives, and aggregators.

Deeply unsettling. Having had my share of encounters with people online taking a personal interest in me in an unfriendly context, I find this worrying. I don't need to run a search of my info to know to use different usernames, but an automated feature that lets just about anybody who is determined enough easily find not only linked accounts but also archived data is another matter. I'm glad I barely use social media anymore (only infrequently, and not the major platforms), but the fact that someone can find archived information? Time to make some adjustments.

mcqueenjordan · on Oct 3, 2022

I think you're supposed to do this with Stylometrics[1]. So I guess this is sort of the greedy hacky approach to try and associate users across websites.

My operating assumption of Palantir as a company is that they have a very advanced system of a similar shape that tries to accomplish the same goal of linking accounts across services, but I have no insider knowledge.

[1]: https://en.wikipedia.org/wiki/Stylometry
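A toy illustration of the stylometric idea: represent each text by its character trigram counts and compare the profiles with cosine similarity. This is a deliberately simplified sketch chosen for illustration, not what Palantir or any real system does; serious stylometry uses many more features and much more text, and a single score on a few short comments is not evidence of authorship.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams after lowercasing and collapsing whitespace."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two trigram count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


p1 = trigram_profile("Honestly, I reckon that's fair enough, innit.")
p2 = trigram_profile("Honestly I reckon it's fair enough.")
p3 = trigram_profile("Per the specification, the parser MUST reject it.")
print(cosine_similarity(p1, p2) > cosine_similarity(p1, p3))  # True
```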

INTPenis · on Oct 3, 2022

I've been fantasizing about something like this for years.

But my idea was more along the lines of supporting API keys for many different sites, taking your time to configure all these API settings for all these different "site scraping engines" and that would give you the activity of someone you might be tracing. Their comments, their posts and so forth.

I was thinking this would be great for snooping employers who want to know what their WFH employees are up to.

Funnily enough, I was even imagining support for Steam, with friend requests coming from these robots, so that in rare cases you could even see live when an employee started playing a game.

daitangio · on Oct 3, 2022

Tested docker version: nice but the report is a long list, much like an "index" of the provided username present on a set of sites.

It should provide at least some numbers/graphs to "measure" your presence on the web, like, I don't know, the number of GitHub repositories, Instagram photos and so on.

A user with few Instagram photos but a lot of GitHub repositories is different from a guy with a lot of blog posts, Pinterest entries, Twitch videos... and so on.

_abox · on Oct 5, 2022

This works pretty well but it's pretty basic. It doesn't for example list all HN comments and ascertain interests etc from there. Just the account info.

It's nice for an initial scan of a target but I expected a more comprehensive report. It's a good start but it could use a lot more.

ergodic1 · on Oct 3, 2022

Someone needs to build a 1password for usernames.

bee_rider · on Oct 3, 2022

Most password managers will already do this, right? Like the iOS and Firefox built in ones will remember a username/password pair.

weaksauce · on Oct 3, 2022

they mean generator

bee_rider · on Oct 3, 2022

Hmm.

There are various fantasy name generators, for D&D and the like.

weaksauce · on Oct 3, 2022

that is built into bitwarden actually

jonas-w · on Oct 3, 2022

Yes, also with plus-addressed email, a catch-all email generator, or, if you use Firefox Relay etc., a generator for those addresses. But the username generator is just a random English word plus a number.

retox · on Oct 3, 2022

I'd never heard of Firefox Relay, thanks for that.

https://relay.firefox.com/

_Algernon_ · on Oct 3, 2022

I find lastpass's generator acceptable. https://www.lastpass.com/features/username-generator

xena · on Oct 3, 2022

Use a D&D character name generator

walrus01 · on Oct 2, 2022

This also seems like a useful tool for hunting down sockpuppet and nation-state sponsored astroturfing campaigns.

NavinF · on Oct 3, 2022

No, this finds accounts that want to be found. Sockpuppets wouldn't use the same username.

_abox · on Oct 5, 2022

Which could be one thing that identifies them!

dymk · on Oct 3, 2022

Those accounts seem the most likely to be randomly named per-site. Why would they reuse a username?

epakai · on Oct 3, 2022

Sometimes a persona might be created to establish rapport. It's probably pretty rare because the extra effort isn't commensurate with the returns.

vincnetas · on Oct 3, 2022

This just made me think about treating both username and password fields as "put random strings here and let password manager remember them".

And then give a nickname for account in password manager. And communicate your username to your friends through other channels if you need to.
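A minimal sketch of per-site random usernames using Python's standard `secrets` module. The two formats (a word plus zero-padded digits, or a plain random string) are assumptions loosely modeled on the "random English word plus a number" generator described upthread, not any password manager's actual algorithm, and the tiny `WORDS` list is hypothetical.

```python
import secrets
import string

# Hypothetical tiny word list; a real generator would use a much larger one.
WORDS = ["amber", "basalt", "cinder", "delta", "ember", "fjord", "garnet", "harbor"]


def word_username(digits: int = 4) -> str:
    """Random word plus a zero-padded number, e.g. something like 'fjord0427'."""
    word = secrets.choice(WORDS)
    number = secrets.randbelow(10 ** digits)
    return f"{word}{number:0{digits}d}"


def random_username(length: int = 12) -> str:
    """Fully random lowercase-alphanumeric username."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(word_username(), random_username())
```

With a word list this small the first format is easy to enumerate, which is why the fully random form better fits the "let the password manager remember it" approach.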

pvillano · on Oct 3, 2022

Mai re-gret is reusing my first username for multiple accounts when I first joined the internet. I've tried to transfer accounts to separate work, personal, and anonymous emails, but nothing is ever really deleted.

pvaldes · on Oct 3, 2022

Francosphere, so perhaps not so well known in US as in Europe. For context:

https://en.wikipedia.org/wiki/Jules_Maigret

Terrific detective books

throwaway81523 · on Oct 2, 2022

Use a separate password for each site, and also a separate username.

microtherion · on Oct 3, 2022

A number of the sites this is checking seem to be returning a "hit" for any user name you check on them, whether an account exists or not.

Lucent · on Oct 3, 2022

This is going to work great on me, having spent most of my life pouncing on this name on every platform.

29athrowaway · on Oct 3, 2022

Some people have voluntarily done what this service does via Linktree.

VincentKun · on Oct 3, 2022

Yeah, I hope this won't get used by whoever wants to hire me

kodt · on Oct 3, 2022

Seems like several large forums are missing from the website list.

rishsriv · on Oct 3, 2022

Did something similar as an experiment a few years ago, except I used photos and name strings as fuzzy identifiers across social media profiles.

We also scraped individual reactions from social media apps to get a _very_ detailed profile of what people engaged with (like using the "Angry" reaction emoji when Trump said something stupid vs using the "Angry" reaction emoji when AOC said something stupid).

Never released it in the wild for obvious ethical reasons, but it was an interesting technical challenge. It also led to super interesting insights, like learning that videos and text links were watched by entirely different audiences on Facebook and Twitter [1]

[1] https://twitter.com/rishdotblog/status/1483329729302515712

DaMaGeLaB · on Oct 3, 2022

Nice xD

pseingatl · on Oct 3, 2022

maigret: command not found

_Algernon_ · on Oct 3, 2022

Try installing it