Wikipedia is under assault: rogue users keep posting AI generated nonsense

ForgottenFlux@lemmy.world · 25 days ago

schizo@forum.uncomfortable.business · 25 days ago

Further proof that humanity neither deserves nor is capable of having nice things.

Who would set up an AI bot to shit all over the one remaining useful thing on the Internet, and why?

I’m sure the answer is either ‘for the lulz’ or ‘late-stage capitalism’, but still: historically humans aren’t usually burning down libraries on purpose.

poszod@lemmy.world · 25 days ago

State actors could be interested in doing that. Same with the Internet Archive attacks.

narc0tic_bird@lemm.ee · 25 days ago

Best case is that the model used to generate this content was originally trained on data from Wikipedia, so it “just” generates a worse, hallucinated “variant” of the original information. Goes to show how stupid this idea is.

Imagine this in a loop: AI trained on Wikipedia that then alters content on Wikipedia, which in turn gets picked up by the next model being trained. It would just get worse and worse, similar to how re-encoding the same video over and over again yields progressively worse results.

8uurg@lemmy.world · 24 days ago

A very similar situation to that analysed in this paper that was recently published. The quality of what is generated degrades significantly.

Although they mostly investigate replacing the data with AI-generated data at each step, so I doubt the effect will be as pronounced in practice. Human writing will still be included, and even curation of AI-generated text by people can skew the distribution of the training data (as the filtering done by these editors inevitably would, since reasonable-looking text could slip through the cracks).
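
To make the "full replacement vs. mixing in human data" difference concrete, here is a toy sketch. It is not the paper's experiment: a one-dimensional Gaussian stands in for the "model", refitting it to its own samples stands in for retraining, and the names `simulate` and `real_fraction` are made up for illustration.

```python
import random
import statistics


def simulate(generations, n, real_fraction, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    real_fraction is the (hypothetical) share of each training set that
    still comes from the original "human" distribution N(0, 1).
    Returns the fitted standard deviation at every generation.
    """
    rng = random.Random(seed)
    n_real = round(n * real_fraction)
    # Generation 0 trains purely on real data.
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    history = []
    for _ in range(generations):
        # "Train": estimate the model parameters from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        history.append(sigma)
        # "Generate": the next training set is model output,
        # optionally topped up with fresh real samples.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        data = synthetic + real
    return history


full = simulate(generations=1000, n=10, real_fraction=0.0)
mixed = simulate(generations=1000, n=10, real_fraction=0.5)
print("only synthetic data, final spread:", full[-1])
print("half real data, final spread:     ", mixed[-1])
```

With only synthetic data the fitted spread tends to shrink toward zero, because each refit loses a bit of the tails and nothing puts them back. Keeping some real samples in every generation anchors it. The tiny sample size (10) is chosen to make the effect show up quickly, so treat the numbers as illustrative only.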

Blaster M@lemmy.world · 5 days ago

AI model makers are very well aware of this, and there is a move from ingesting everything to curating datasets more aggressively. Many upstarts have no idea that data prep is critical, but everyone is learning, sometimes the hard way.

sbv@sh.itjust.works · 25 days ago

As for why this is happening, the cleanup crew thinks there are three primary reasons.

“[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,”

That last one. Ouch.

TimLovesTech (AuDHD)(he/him)@badatbeing.social · 25 days ago

“[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,”

I think the main reason people are misinformed about AI content is that, outside of tech people, most have no idea that AI will:

1. 100% make up answers to things it doesn’t know, because the data it ingested was either too small a sample or bad. And it will do this with the same robotic confidence you get for any other answer.
2. Begin to “hallucinate” and give some wild outputs once it has been fed too much other AI-generated content, very similar to humans suffering from schizophrenia. And again, these answers will be given as “fact” with the same robotic confidence.

Wiz@midwest.social · 24 days ago

And then #2 will be copied by other people and AIs, becoming seen as fact.

drunkpostdisaster@lemmy.world · 24 days ago

It’s over. We lost.

vext01@lemmy.sdf.org · 25 days ago

Slop!