The Internet Archive and its 916 billion saved web pages are back online

gedaliyah@lemmy.world · 21 days ago

dohpaz42@lemmy.world · 21 days ago

Maybe it’s time to federate the IA.

Pasta Dental@sh.itjust.works · 21 days ago

One of the rare use cases of a blockchain actually being useful: a federated internet archive that uses a blockchain to validate that the saved data has not been altered by a malicious actor trying to tamper with proofs.

That would be really cool but horribly inefficient because of the sheer amount of storage required
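
A minimal sketch of the tamper-evidence idea, not a full blockchain: each entry stores only the SHA-256 of a saved page plus the hash of the previous entry, so changing any earlier record breaks every later link. There is no consensus or federation here, and the names `Entry`, `append`, and `verify` are made up for illustration. Keeping only hashes in the chain is one assumed way to sidestep the storage problem, since the pages themselves live elsewhere.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class Entry:
    url: str
    content_hash: str  # SHA-256 of the archived page body
    prev_hash: str     # entry_hash of the previous entry (links the chain)
    entry_hash: str    # SHA-256 over url, content_hash and prev_hash


def _digest(url: str, content_hash: str, prev_hash: str) -> str:
    # Serialize the fields unambiguously before hashing them.
    payload = json.dumps([url, content_hash, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(chain: list, url: str, body: bytes) -> list:
    """Return a new chain with one more entry for the given page body."""
    prev = chain[-1].entry_hash if chain else GENESIS
    content_hash = hashlib.sha256(body).hexdigest()
    entry = Entry(url, content_hash, prev, _digest(url, content_hash, prev))
    return chain + [entry]


def verify(chain: list) -> bool:
    """Check that every entry links to its predecessor and hashes correctly."""
    prev = GENESIS
    for e in chain:
        if e.prev_hash != prev:
            return False
        if e.entry_hash != _digest(e.url, e.content_hash, e.prev_hash):
            return False
        prev = e.entry_hash
    return True
```

Anyone holding a copy of the chain can re-run `verify` and recompute a page's SHA-256 to check it against `content_hash`. What this sketch does not solve is deciding whose copy of the chain is the real one, which is the part a blockchain's consensus is meant to handle.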

RecluseRamble@lemmy.dbzer0.com · 21 days ago

horribly inefficient

The core feature of all blockchain tech.

_NoName_@lemmy.ml · 16 days ago

We don’t need a blockchain for that.

Having multiple servers which store file checksums would have much less overhead, would be easily repeatable and appendable, with no need for unnecessary computational labor. Linux Mint currently uses the checksum process for verifying that a downloaded ISO has not been altered in any way, and it can work for any file (preferably not humongous files).
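
Roughly what that could look like, as a sketch under assumptions: the checksums are SHA-256 hex digests, they have already been fetched from several independent servers, and a file counts as good when at least `quorum` servers publish the same digest and the local file matches it. The function names and the quorum rule are invented for the example and are not how Linux Mint or the IA actually do it.

```python
import hashlib
from collections import Counter


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def agreed_checksum(published, quorum):
    """Return the digest reported by at least `quorum` servers, else None."""
    if not published:
        return None
    digest, votes = Counter(d.lower() for d in published).most_common(1)[0]
    return digest if votes >= quorum else None


def is_unaltered(path, published, quorum=2):
    """True if the local file matches the digest the servers agree on."""
    expected = agreed_checksum(published, quorum)
    return expected is not None and sha256_of_file(path) == expected
```

A single compromised server then cannot pass off a modified file, because its digest gets outvoted by the others. Hashing still has to read every byte, which is why very large files are slow to verify.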

Strive for K.I.S.S. whenever possible.

WaterSword@discuss.tchncs.de · 21 days ago

The thing is, sometimes articles must be removed from the IA (for copyright, which I disagree with, or when leaked information could threaten lives). With a blockchain this would be impossible.

tehmics@lemmy.world · 21 days ago

this would be impossible

Perfect.

I’d be interested in seeing real examples where lives are threatened. I find it unlikely that the internet archive would be the exclusive arbiter of so-called deadly information

brbposting@sh.itjust.works · 19 days ago

I thought of something but I don’t know if it’s a good example.

Here’s the hypothetical:

A criminal backs up a CSAM archive. Maybe the criminal is caught, heck say they’re executed. Pedos can now share the archive forever over encrypted messengers without fear of it being deleted? Not ideal.

tehmics@lemmy.world · edit-2 18 days ago

Yeah this is a hard one to navigate and it’s the only thing I’ve ever found that challenges my philosophy on the freedom of information.

The archive itself isn’t causing the abuse, but CSAM is a record of abuse and we restrict the distribution not because distribution or possession of it is inherently abusive, but because the creation of it was, and we don’t want to support an incentive structure for the creation of more abuse.

i.e. we don’t want more pedos abusing more kids with the intention of archival/distribution. So the archive itself isn’t the abuse, but the incentive to archive could be.

There are also a lot of ethical questions around CSAM in general that I think we aren’t ready to think about. It’s a hard topic all around and nobody wants to seriously address it beyond virtue signalling about how bad it is.

I could potentially see a scenario where the archival could be beneficial to society similar to the FBI hash libraries Apple uses to scan iCloud for CSAM. If we throw genAI at this stuff to learn about it, we may be able to identify locations, abusers and victims to track them down and save people. But it would necessitate the existence of the data to train on.

I could also see potential for using CSAM itself for psychotherapy. Imagine a sci-fi future where pedos are effectively cured by using AI trained on CSAM to expose them to increasingly mature imagery, allowing their attraction to mature with it. We won’t really know if something like that is possible if we delete everything. It seems awfully short sighted to me to delete data no matter how perverse, because it could have legitimate positive applications that we haven’t conceived of yet. So to that end, I do hope some 3 letter agencies maintain their restricted archives of data for future applications that could benefit humanity.

All said, I absolutely agree that the potential of creating incentives for abusers to abuse is a major issue with immutable archival, and it’s definitely something that we need to figure out, before such an archive actually exists. So thank you for the thought experiment.

Cyborganism@lemmy.ca · 21 days ago

I don’t know if that’s a good idea.

How would you go about implementing the infrastructure for that?

dohpaz42@lemmy.world · 20 days ago

That’s an excellent question. Unfortunately I do not have an answer. But I believe it’s worth discussing some means of redundancy for the IA; even if it’s as simple as rsync to other hosts.