Topic: the ethics of archiving
brightbluebug
« on: May 11, 2025 @382.05 »

thinking about born-digital archival (or archival in general), and the conflict between the wants and wishes of the original creator and preservation.

this thread from last year got me thinking about it: https://forum.melonland.net/index.php?topic=3586.0, but what mostly prompted this thread was this: https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/3

basically someone scraped ao3 and made a huge dataset of fics, then posted it on huggingface (sort of a gen-ai enthusiast community from what i can tell), so with the intent of serving as training data for generative ai. this was done without the consent of the authors or ao3, & is questionably legal. it was posted about on social media (reddit..?), and this angered a lot of fic writers, for obvious (and i personally think rightful) reasons.

some went to the huggingface thread and there ended up being some interesting discussion (along with, as per usual, death threats). it covered the ethics of ai, whether training it on data is the same as human learning, etc. the thread itself was started by someone asking for the dataset somewhere else, as they wanted it partially as an offline archive of the site; one reply to it reads,
Quote
Hey! If you want to archive fic or artwork from AO3, do it yourself instead of supporting a disgusting thief.


which kind of encapsulates what the big question is: is it ethical to archive things, regardless of the means or the consent of the creator? if something is made without the intent of archival and rather for something like profit, but fulfills the same ends, should it still be evaluated ethically the same way?

i'm not sure where i stand on the creator consent vs archival thing, but i'm leaning towards the former; art is so creator-centered, so intertwined with the person who made it, that i think creators deserve that respect regarding their work. i would like to know all your thoughts :smile:

a few other thoughts about this:
-if individual/local archives exist, is it necessary for larger/collective archives to be created of those?
-there's a pretty clear clash of internet subcultures here. it's interesting
-is there a necessity to archive archives?
-should certain forms of archival be more accepted than others?
« Last Edit: May 11, 2025 @756.30 by brightbluebug »

nlolnlolnlo
« Reply #1 on: May 11, 2025 @530.51 »

not in a state of mind to provide any sort of insight but just want to say this is something i've considered a lot lately, especially in the way that archival overlaps with piracy. intrigued to see what discussions this brings forward

nobo
« Reply #2 on: May 11, 2025 @579.44 »

My viewpoint might be a bit old fashioned, but I don't see the Internet as a marketplace but as a public information resource. Things that are put on the Internet are easy to back up and make copies of by design. It's a feature, not a bug.

For a long time now, people have realized the Internet could be transformed into a marketplace, and they now expect everyone to retroactively treat it like one. For example, uploading an image to the Internet and saying, "please don't archive this" would be like if Walmart shipped a pallet load of groceries to the park and just left it on the basketball court with a sign that said, "Please don't touch."

I do understand the frustration, because every time someone employs a scraper for public good (read: Aaron Swartz), they get the book thrown at them legally, and every time someone uses it for personal profit, nothing is done. It is a bullshit double standard. And it's completely unfair, so I empathize.

_ghost_
« Reply #3 on: May 11, 2025 @610.17 »

The thing about generative AI, in my opinion, can be summed up with two specific problems.

The first is that the vast majority of datasets, especially ones meant for widespread, public usage, just take indiscriminately from large chunks of the internet with no real care or attention to what that material is. Many kinds of AI programs, including gen-AI and algorithms meant to identify and categorize photos, are extremely biased and sometimes trained on data that constitutes an actual privacy issue. Any photo of you that has been uploaded to a server anywhere has the potential to have ended up in the specific part of the web any given AI crawler is scraping. Personal records get leaked online all the time. It's not impossible, and is in fact likely, that personal records end up in AI datasets eventually. This is an internet privacy issue generally, and also a consent issue; being able to opt in to allowing your photos or records to be used for this should be fully up to you, and the lack of transparency is gross.

The second is just the ways these programs are pushed into businesses in place of actual human workers when the technology is fully just Not There Yet to support full automation, but is there Enough to be used with a human fixing its mistakes. In the field of translation, for example, machine translation has existed for years and is now basically just being rebranded as "AI" when there's not been much of a change in how it works, just how well it works. Someone who would have been hired to do a full translation ten years ago might now be hired to "Edit" an existing translation done by AI that is filled with so many errors they have to start from scratch, and they get paid less since they aren't being hired to actually do the translating, even if they still are doing it. It's doing a lot to devalue the worker and add meaningless steps to their job. AI can only do so many things. This isn't even getting into things that become fully meaningless when done by AI (e.g. people using it to write university papers, when the entire point of writing university papers is proving that you know how to research information or understand a text and build an argument using that info).

So the long and short of it is that no, I don't find archiving unethical, and generally speaking even AI-crawling is, conceptually, neutral. The issues with the latter are because of the execution, not the idea. Some people act like archiving someone's online work is a huge deal because they didn't give permission, but by putting it on the internet, you have given other people permission to see and download it. You have actively put it out of your control. I think it's polite to remove something from a web archive if the owner/creator asks, especially if the archived page contains personal information, but archival work is always going to include things that people don't want saved or didn't even consider would be saved. Think of how many artists and writers we don't know the names of because their work was just not preserved. Surely some of them didn't want to be remembered, anyway, but what they would or wouldn't have wanted isn't really relevant anymore.

Also I just don't believe in, like, "owning" an idea or whatever; not crediting sources is unethical but I fully think a copyright system that makes it illegal for two people to write two books that are deemed too similar is stupid, and a lot of people's complaints about gen-AI or web archives are basically just regurgitating copyright law as if it's a morality thing and not a law designed to govern how businesses maintain "ownership" of an idea or image.
crazyroostereye
« Reply #4 on: May 11, 2025 @860.17 »

Similar to what the others already said: putting something on the Internet inherently releases that information to be taken by anyone. Respecting copyright is still important, and copyright serves a purpose, though I agree that most implementations of it are broken. I believe that a person who came up with an idea should have a head start to market it and to garner fame and fortune from it. But it should only be a head start, not a practically permanent right.

But archiving and preservation in particular have a high societal value to me, one where neither law nor the artist's will has a right to prevent the preservation. That goes for physical and digital mediums alike. Distribution of an archive can be limited or even restricted, especially in the case of copyrighted work, but archives should still reserve the right to maintain a copy, even against the will of the artist and copyright holder.

musicobsessed107
« Reply #5 on: August 20, 2025 @267.20 »

Yes, it is ethical. My general rule of thumb is that if it's on the internet, it should be free for all to use, including the right of the public to create backups and archives of it. Think of the internet as a digital version of your local public library as opposed to your own personal book collection at home.

It's for the greater good of the web, and it's especially important now that many older websites are becoming unusable or disappearing at an alarming rate.

alexela64
« Reply #6 on: August 20, 2025 @630.14 »

Not sure if this is a valuable addition to the conversation, but just in case it is:

I think a lot of this has to do with intent. In my opinion (and this may not be correct), using datasets in ai crawling is decidedly Not archiving-related. So to me it depends on whether the dataset was compiled with the intent of using it for ai, or out of care for the preservation of creators' works.

There is a fundamental difference between someone saving something so that more people can see it and saving something so that they can run it through a tool that effortlessly creates a biased and in no way original work. gen-ai is 100% derivative...

that's just my 2 cents tho

Leonia
« Reply #7 on: August 20, 2025 @791.01 »

I don't think that this ethical concern is about archival, even though companies like Nintendo, Disney, or Nexon would disagree, because they all rely on a scarcity of some kind for their profit, or feel incentivized to protect the scarcity involved. Copyright is, in its current state, an absolutely immoral mechanism of government-derived power designed to quash the creative ability of human beings. Before the long road of copyright extensions that were led by massive media companies, including but not limited to Disney, copyright lasted 28 years and had to be declared. Before that it lasted 14 years.

Now copyright lasts until the death of the author + 50 years, and that's only the international minimum! In the United States of America it lasts until the death of the author + 70 years, or, for corporate works, 95 years from publication or 120 years from creation, whichever is shorter. Many countries together agreed upon this approach, which heavily benefits businesses and does very little for the individual. It is also applied automatically on the creation of works and sometimes even applies retroactively.

Did you know the Happy Birthday song's first publications were before 1923 (source 1 and 2 for that claim)? Yet the copyright holders claimed that the first authorized publication was made in 1935! It took a judge in 2015 telling them their copyright was invalid for them to stop sending people and companies takedown notices for singing one of the most famous songs of the past 100 years.

This is what copyright law is for and does. It harms public goods like libraries and derivative works like fanfiction, leaving them either struggling to exist in the modern day or fully illegal and alive only because they are barely noticed or not on the creator's radar at all. And even then, being a secret can only do so much.

I think the issue here is twofold. One is with the way laws protecting corporate interests have changed the global perception of media preservation and the creation of derivative works. The other is the moral and ethical issues that society is already dealing with in generative AI and the various unresolved issues associated with it.
