the ethics of archiving

brightbluebug
Casual Poster


the ethics of archiving

« on: a Spring day »

thinking about born-digital archival (or archival in general), and the conflict between the wants and wishes of the original creator and preservation.

this thread from last year: https://forum.melonland.net/index.php?topic=3586.0 first got me thinking about it, but what mostly prompted this thread was this: https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/3

basically someone scraped ao3 and made a huge dataset of fics, then posted it on huggingface (sort of a gen-ai enthusiast community from what i can tell), so presumably with the intent of it serving as training data for generative ai. this was done without the consent of any of the authors or ao3, & is questionably legal. it was posted about on social media (reddit..?); this angered a lot of fic writers, for obvious (and i personally think rightful) reasons.

some went to the huggingface thread and there ended up being some interesting discussion (along with, as per usual, death threats): the ethics of ai, whether training it on data is the same as human learning, etc. the thread itself was started by someone asking for the dataset somewhere else, as they wanted it partially as an offline archive of the site; one reply to it reads,

Quote

Hey! If you want to archive fic or artwork from AO3, do it yourself instead of supporting a disgusting thief.

which kind of encapsulates what the big question is: is it ethical to archive things, no matter the means or the consent of its creator? if something is made without the intent of archival and rather for something like profit, but fulfills the same ends, should it still be evaluated ethically the same way?

i'm not sure where i stand on the creator consent vs archival thing, but i'm leaning towards the former; art is something so creator-centered and intertwined with its creator that i think they deserve that respect regarding their work. i would like to know all of your thoughts

a few other thoughts about this:
-if individual/local archives exist, is it necessary for larger/collective archives to be created of those?
-there's a pretty clear clash of internet subcultures on here. it's interesting
-is there necessity to archive archives?
-should certain forms of archival be more accepted than others?


« Last Edit: a Spring day by brightbluebug »

nlolnlolnlo
Newbie ⚓︎

it's the end of the world and i'm driving around

Re: the ethics of archiving

« Reply #1 on: a Spring day »

not in a state of mind to provide any sort of insight but just want to say this is something i've considered a lot lately, especially in the way that archival overlaps with piracy. intrigued to see what discussions this brings forward



nobo
Full Member ⚓︎

drainnnn

Re: the ethics of archiving

« Reply #2 on: a Spring day »

My viewpoint might be a bit old fashioned, but I don't see the Internet as a marketplace but as a public information resource. Things that are put on the Internet are easy to back up and make copies of by design. It's a feature, not a bug.

Over the years, people have realized the Internet could be transformed into a marketplace and now expect everyone to retroactively treat it like one. For example, uploading an image to the Internet and saying, "please don't archive this" would be like if Walmart shipped a pallet load of groceries to the park and just left it on the basketball court with a sign that said, "Please don't touch."

I do understand the frustration, because every time someone employs a scraper for public good (read: Aaron Swartz), they get the book thrown at them legally, and every time someone uses it for personal profit, nothing is done. It is a bullshit double standard. And it's completely not fair so I empathize.



_ghost_
Full Member ⚓︎


Re: the ethics of archiving

« Reply #3 on: a Spring day »

The thing about generative AI, in my opinion, can be summed up with two specific problems.

The first is that the vast majority of datasets, especially ones meant for widespread, public usage, just take indiscriminately from large chunks of the internet with no real care or attention to what that material is. So many kinds of AI programs, including gen-AI and algorithms meant to identify and categorize photos, are often extremely biased and sometimes trained off of data that constitutes an actual privacy issue. Any photo of you that has been uploaded to a server anywhere has the potential to have ended up in the specific part of the web any given AI crawler is scraping. Personal records get leaked online all the time. It's not impossible, and is in fact likely, that personal records end up in AI data sets eventually. This is an internet privacy issue generally, and also a consent issue; being able to opt in to allowing your photos or records to be used for this should be fully up to you, and the lack of transparency is gross.

The second is just the ways these programs are pushed into businesses in place of actual human workers when the technology is fully just Not There Yet to support full automation but it's there Enough to need a human to fix its mistakes. In the field of translation, for example, machine translation has existed for years and is now basically just being rebranded as "AI" when there's not been much of a change in how it works, just how well it works. Someone who would get hired to do a full translation ten years ago might now be hired to "Edit" an existing translation done by AI that is filled with so many errors they have to start from scratch, and they get paid less since they aren't being hired to actually do the translating, even if they still are doing it. It's doing a lot to devalue the worker and add meaningless steps to their job. AI can just only do so many things. This isn't even getting into things that become fully meaningless when done by AI (e.g. people using them to write university papers, when the entire point of writing university papers is proving that you know how to research information or understand a text and build an argument using that info).

So the long and short of it is that no, I don't find archiving unethical, and generally speaking even AI-crawling is, conceptually, neutral. The issues with the latter are because of the execution, not the idea. Some people act like archiving someone's online work is a huge deal because they didn't give permission, but by putting it on the internet, you have given other people permission to see and download it. You have actively put it out of your control. I think it's polite to remove something from a web archive if the owner/creator asks, especially if the archived page contains personal information, but archival work is always going to include things that people don't want saved or didn't even consider would be saved. Think of how many artists and writers we don't know the names of because their work was just not preserved. Surely some of them didn't want to be remembered, anyway, but what they would or wouldn't have wanted isn't really relevant anymore.

Also I just don't believe in, like, "owning" an idea or whatever; not crediting sources is unethical but I fully think a copyright system that makes it illegal for two people to write two books that are deemed too similar is stupid, and a lot of people's complaints about gen-AI or web archives are basically just regurgitating copyright law as if it's a morality thing and not a law designed to govern how businesses maintain "ownership" of an idea or image.



crazyroostereye
Full Member ⚓︎

I am most defiantly a Human

Re: the ethics of archiving

« Reply #4 on: a Spring night »

Similar to what the others already said: putting something on the Internet inherently releases that Information to be taken by anyone. That said, respecting Copyright is Important, and Copyright serves a Purpose, even though I agree that most Implementations of Copyright are broken. I believe that a person who came up with an Idea should have a head start to Market that Idea and to garner Fame and Fortune with it. But it is only a Head Start and not a practically Permanent Right to it.

But Archiving and Preservation in Particular hold a high Societal Value to me anyway, where neither Law nor the Artist's Will has a right to Prevent the Preservation. And that goes for Physical and Digital Mediums. Distribution of that Archive can be Limited or even Restricted, especially in the case of Copyrighted work, but the Archives should still reserve the Right to maintain a Copy, even Against the Will of the Artist and Copyright holder.



alexela64
Sr. Member ⚓︎

i keep reading the forum post. i am forums o'toole

Re: the ethics of archiving

« Reply #5 on: a Summer day »

Not sure if this is a valuable addition to the conversation, but just in case it is:

I think a lot of this has to do with intent. In my opinion (and this may not be correct) using datasets in ai crawling is decidedly Not archiving related. So to me it depends on whether the dataset was compiled with the intent of use for ai, or out of care for the preservation of creators' works.

There is a fundamental difference between someone saving something so that more people can see it and saving something so that they can run it through a tool that effortlessly creates a biased and in no way original work. gen-ai is 100% derivative...

that's just my 2 cents tho



Leonia
Casual Poster ⚓︎


Re: the ethics of archiving

« Reply #6 on: a Summer day »

I don't think that this ethical concern is about archival, despite the fact that companies like Nintendo, Disney, or Nexon would disagree because they all rely on a scarcity of some kind for their profit, or feel incentivized to protect the scarcity involved. Copyright is, in its current state, an absolutely immoral mechanism of government-derived power designed to quash the creative ability of human beings. Before the long road of copyright extensions that were led by massive media companies, like but not exclusive to Disney, copyright lasted 28 years and had to be declared. Before that it lasted 14 years.

Now, under the Berne Convention, copyright lasts until the death of the author + 50 years at minimum, and in the United States of America it's life + 70 years (or, for works made for hire, 95 years from publication or 120 years from creation, whichever is shorter)! Many countries together agreed upon this approach which heavily benefits businesses and does very little for the individual. It also is applied automatically on the creation of works and sometimes even applies retroactively.

Did you know the Happy Birthday song's first publications were before 1923? (source 1 and 2 for that claim) Yet the copyright holders claimed that the first authorized publication was made in 1935! It took a judge in 2015 telling them their copyright was invalid to get them to stop sending people and companies takedown notices for singing one of the most famous songs of the past 100 years.

This is what copyright law is for and does. It harms public goods like libraries and derivative works like fanfiction, making them struggle to exist in the modern day, or making them fully illegal and only alive thanks to barely being noticed or not being on the creator's radar at all. And even then, being a secret can only do so much.

I think the issue here is twofold. One is with the way laws protecting corporate interests have changed the global perception of media preservation and the creation of derivative works. The other is the moral and ethical issues that society is already dealing with in generative AI and the various unresolved issues associated with it.



Kolo
Full Member ⚓︎


Re: the ethics of archiving

« Reply #7 on: a Summer day »

I think if someone asked you not to archive their work, you should respect their wishes. Going against what they explicitly asked feels... imposing, perhaps - as if your desire to archive supersedes their desire for their own work. Not everything is created with intent to proliferate and exist in perpetuity. It's OK for some things to disappear and become irrelevant through obscurity.

I think at some points obsession with archiving can come across as a bit entitled when it completely dismisses the original creator's wishes. I've known people who have stopped posting their work entirely because they did not like it archived against their will. And I miss seeing their works dearly. And I think: can't there be a place in this world for things to be fleeting? I think there is value in being part of something temporary.

At the same time I am not opposed to archiving as a whole and I do think it is a net benefit. But also... mmm... a lot of archiving is done mindlessly and blindly. Like the OP's example, snatching tons of data through scrapers for packaging up and sale to neural nets. It would be nice if archiving was a smaller, more intimate act - if it was done by people with motivation and affection for what they were archiving, who you could trust would care for the data they are collecting and take into account ways to preserve and present it that adhere to the original intent and presentation. But I realize that's a tall ask.

Also, to some degree, archiving feels like it ties into the sensation of always being watched online. Your words... your mistakes can exist forever, encapsulated and preserved on a server you did not willingly attach them to. Or on a hard-drive, or in a screenshot. At times it makes me hesitant to speak because those echoes can linger a lot longer than the person that spoke them. I'm not the same me as five years ago, but snapshots of that version of me have become ghosts on old servers. It's a strange feeling.



Symberzite
Casual Poster ⚓︎


Re: the ethics of archiving

« Reply #8 on: a Summer night »

Maybe I'm going to be a dissenting voice here, but I think some kind of copyright protection is a good idea. I think it was Mark Twain that petitioned the US government to create the first IP laws since some publishing house would take his brand new books and sell them without his permission or without giving out any royalties. It's just he wanted stuff to be protected for five years before going into public domain. Then things gradually became more and more insane.

As a rule of thumb, I never pirate indie games or comics. And when I can't obtain them legally due to corporate shenanigans, I look for the creator's socials and try to donate some money to them directly to offset the laws.

In terms of software, yeah, it's a pain. The sad thing is that archival is the only way to play them, period. And if anyone remembers the Stop Killing Games movement, there's a push to make destruction of software permanent. That's my main gripe with Game Maker as a piece of software. I legally bought the full Pro version years ago, then they nullified it when they switched to a subscription format. That... shouldn't be a thing companies can do.



Corrupted Unicorn
Hero Member ⚓︎

Unicorns aren't meant to stay forever

Re: the ethics of archiving

« Reply #9 on: an Autumn day »

I'll be kind of brief even though I have strong feelings about this: most media owners do not care about preservation, just about making a quick buck, and if it were up to them, most art & media would be lost.

I'm also acquiring the belief that "asking for permission for EVERYTHING does nothing but bog you down", which is good for taking action, but maybe not so good for considering others' feelings.

I've never had a request to take something down, but if I get one and the ask is reasonable, sure.

