Author Topic: Website Archival Discussion Thread  (Read 109 times)
Corrupted Unicorn
« on: March 14, 2025 @424.02 »

Quick story first: I have a personal Discord with my BF. We're the only two members, and we use it to send each other memes and interesting bits we find on the Net. However... nothing is forever, even on the Internet  :trash: , and here on the Web Revival we should know better. I've started noticing tweets and posts disappearing, leaving empty thumbnail images behind. And since I've got an art resource directory going, I'm starting to worry that one of those tidbits of info might disappear...  :sad:

I think many of us know how to preserve our own websites; we've even got a tool for that! But what about other sites we have less control over? How do you preserve a website? A webpage? A tweet? A Tumblr post? A video? For everybody, or just for yourself? Is the Wayback Machine enough?

This topic is to discuss all methods of web archival, in order to make preservation of all information available on the internet more accessible and viable for everyone.  :pc: Feel free to discuss methods and propose your own.  :defrag:

TheFrugalGamer
« Reply #1 on: March 14, 2025 @736.25 »

I know a lot of people are intimidated by the command line, but if you're really serious about web archiving, then Wget is a tool that's worth learning:

https://www.gnu.org/software/wget/

It can automatically convert links for you so that websites are browsable locally, and it will traverse through directories any way you specify. Two other tools I use frequently and swear by are:
https://github.com/ytdl-org/youtube-dl for YouTube videos as well as videos on tons of other sites, and https://farnots.github.io/RedditToMarkdown/ for saving Reddit posts in Markdown format. I'm always looking for new tools and ways to make things easier, though.
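
For anyone curious what the link conversion and directory traversal look like in practice, here is a minimal sketch that assembles a Wget mirroring command from Python. The flags are standard Wget options; the function name, the example URL, the output folder, and the one-second wait are assumptions picked for illustration, not settings recommended anywhere in this thread.

```python
import shlex


def build_wget_mirror_command(url, output_dir="mirror", wait_seconds=1):
    """Return an argument list for a polite, locally browsable Wget mirror.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return [
        "wget",
        "--mirror",             # recursive download with timestamping
        "--convert-links",      # rewrite links so pages work offline
        "--adjust-extension",   # add .html etc. where the server omitted it
        "--page-requisites",    # also fetch images, CSS and scripts
        "--no-parent",          # never climb above the starting directory
        f"--wait={wait_seconds}",            # pause between requests
        f"--directory-prefix={output_dir}",  # where the copy is stored
        url,
    ]


if __name__ == "__main__":
    # example.com stands in for whatever site you want to keep.
    cmd = build_wget_mirror_command("https://example.com/art-resources/")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check what will be crawled before anything is downloaded.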

Bede
« Reply #2 on: March 14, 2025 @819.32 »

For website archival, I like to use HTTrack, which is a free and open-source web crawler that lets you download sites to your computer locally! It was initially released in 1998, but is still being maintained to this day, with the most recent update being in 2024. You love to see it!

crazyroostereye
« Reply #3 on: March 16, 2025 @989.34 »

Is there a reasonable alternative to the Internet Archive? I love the Archive and wish them great prosperity, but the biggest risk in preservation is not how the data is stored, but who stores it. If the Archive disappears for whatever reason, what will we do? How much will be lost? Does another entity exist that can act as parity to the Archive? Is that other entity independent enough that, if the Archive is violently destroyed, it can persist?

I am curious whether such an entity already exists, or whether something like that can even exist in parallel to the Archive. Or would its existence alone be a reason for the Archive to go under? Funding would have to be split somehow between the two, which could result in both of them not having enough money and failing.

qt
« Reply #4 on: March 18, 2025 @414.93 »

I dig archiving. I keep snapshots of various altchans cooking, and have deeply enjoyed perusing others' collections (and writeups!) covering bits of the net I care about. I come from imageboards and anon-centric spaces :4u: . Things there are very ephemeral: threads die fast, communities splinter and fragment, sites die without warning, and a lot of the culture and history of the space is lost. It's a machine that churns out lost media at breakneck speed. Fortunately there have been numerous attempts to document the history and archive threads and images. Many of these archives go down for extended periods, and some never come back, or come back with content missing. Lots and lots of the history is word-of-mouth or poorly sourced, with content going back to the 90s and English versions going back to '02 or '03.

As a result of being unable to find stuff I read on archives, wikis, and imageboards a decade ago, I've recently started archiving archives, mostly focusing on the places that document and archive these spaces directly. Wget is your friend, and Archive Team are your guides in this space. A few folks in the space have formed a loose collective and are sharing code, disks, and a wiki so all of our contributions can be shared.

If you want to get into archival start by learning to archive. Start archiving. Get some disks. Write some scripts, crawl stuff you care about and save it first! Then find a community of folks doing the same thing and share the burden and learn from their wisdom. An archive that's unknown and impossible to find is of limited use :)
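
As a starting point for the "write some scripts" step, here is a minimal sketch of a snapshot script using only the Python standard library. Everything specific in it is an assumption made for illustration: the function names, the `manifest.jsonl` file, and the timestamp-plus-hash file naming are not taken from any particular archiving project.

```python
import hashlib
import json
import time
import urllib.request
from pathlib import Path


def fetch(url):
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_snapshot(url, archive_dir, fetcher=fetch, now=None):
    """Save one copy of `url` and append a record to manifest.jsonl.

    `fetcher` and `now` are injectable so the function can be tried
    without touching the network (hypothetical design, not a standard).
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    body = fetcher(url)
    digest = hashlib.sha256(body).hexdigest()
    # UTC timestamp; time.gmtime(None) means "right now".
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))

    # File name combines capture time and a short content hash.
    path = archive_dir / f"{stamp}-{digest[:12]}.bin"
    path.write_bytes(body)

    record = {"url": url, "saved_at": stamp, "sha256": digest, "file": path.name}
    with open(archive_dir / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    # example.com is a placeholder for a page you care about.
    print(save_snapshot("https://example.com/", "my-archive"))
```

Keeping a hash and timestamp next to each file makes it possible to tell later whether a page changed between captures, and gives other archivists something to verify against when you share the collection.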


Quote from: crazyroostereye
Is there a reasonable alternative to the Internet Archive? I love the Archive and wish them great prosperity, but the biggest risk in preservation is not how the data is stored, but who stores it. If the Archive disappears for whatever reason, what will we do? How much will be lost? Does another entity exist that can act as parity to the Archive? Is that other entity independent enough that, if the Archive is violently destroyed, it can persist?

I am curious whether such an entity already exists, or whether something like that can even exist in parallel to the Archive. Or would its existence alone be a reason for the Archive to go under? Funding would have to be split somehow between the two, which could result in both of them not having enough money and failing.

No, there's no one out there as large as archive.org. Amongst archivists they're well funded and they've been at it a *really* long time. They were founded by someone with early-days-tech money who knew how to fundraise and never stopped. As of 2021 they have 750 physical servers and 30,000 storage devices (20,000 of them hard drives), adding up to over 200 petabytes of data, growing at a rate of 25% year over year. Reading around the usual sysadmin hangouts, enterprise customers that are buying petabytes of storage are paying around $75,000 per petabyte storage node. You could negotiate a volume discount for 200PB I'd imagine :^)
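
To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes one storage node holds exactly one petabyte, a single copy of the data with no redundancy, no volume discount, and no costs for power, bandwidth, or staff, so treat the result as a floor rather than an estimate.

```python
# Figures quoted in the post above (2021 numbers).
PETABYTES_STORED = 200
COST_PER_PETABYTE_USD = 75_000   # assumed: one node == one petabyte
ANNUAL_GROWTH = 0.25             # 25% year over year


def hardware_cost_usd(petabytes, cost_per_pb=COST_PER_PETABYTE_USD):
    """Hardware cost for a single copy at the quoted per-petabyte price."""
    return petabytes * cost_per_pb


def projected_petabytes(start_pb, years, growth=ANNUAL_GROWTH):
    """Collection size after `years` of compound growth."""
    return start_pb * (1 + growth) ** years


if __name__ == "__main__":
    for year in range(4):
        size = projected_petabytes(PETABYTES_STORED, year)
        cost = hardware_cost_usd(size)
        print(f"year {year}: {size:.1f} PB, about ${cost:,.0f} in storage nodes")
```

At the quoted price, a single copy of 200 PB comes to $15,000,000 in storage nodes alone, and one year of 25% growth adds another 50 PB on top of that.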

Archive Team are probably the biggest, baddest rogue archivists on the 'net. Even they offload most of their stuff to the Internet Archive. They're a very loose collective, but they have some good folks that have been fighting the good fight (e.g. the textfiles.com guy). They tried for years to grab a significant portion of the Internet Archive and failed. Storing that much data reliably in a decentralized manner is a really hard problem to solve. The Internet Archive themselves are offloading some of it to IPFS and Filecoin, but again, decentralization of that amount of data is hard, especially since they seem to be the only real game in town. The Bibliotheca Alexandrina has a copy up to 2007, but

What you can do (and what others are doing) is back up a portion of the collection that you care about. Numerous data harvesters, hoarders, and haulers keep local copies. Digging through the Archive Team wiki, there are a few backups of the Wayback Machine dating to around