An update from GitHub: https://github.com/orgs/community/discussions/159123#discussioncomment-13148279
The rates are here: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28
- 60 req/hour for unauthenticated users
- 5000 req/hour for authenticated users (personal)
- 15000 req/hour for authenticated users (enterprise org)
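If you want to check which bucket you’re currently in, the API exposes a /rate_limit endpoint which, per the docs, doesn’t count against your quota. A minimal sketch in Go (the GITHUB_TOKEN env var is just an example name for a personal access token):

```go
// Minimal sketch: query GitHub's /rate_limit endpoint to see your
// current quota. GITHUB_TOKEN is an example env var name; set it to a
// personal access token to see the authenticated bucket.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	req, _ := http.NewRequest("GET", "https://api.github.com/rate_limit", nil)
	if tok := os.Getenv("GITHUB_TOKEN"); tok != "" {
		req.Header.Set("Authorization", "Bearer "+tok)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Rate struct {
			Limit     int   `json:"limit"`
			Remaining int   `json:"remaining"`
			Reset     int64 `json:"reset"` // Unix time when the window resets
		} `json:"rate"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Printf("limit=%d remaining=%d reset=%d\n",
		out.Rate.Limit, out.Rate.Remaining, out.Rate.Reset)
}
```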
I have a question: why do the Lemmy devs keep using Microsoft’s GitHub?
Yeah, shoulda used https://gitflic.ru/
The enshittification begins (continues?)…
just now? :)
If Microsoft knows how to do one thing well, it’s killing a successful product.
RIP Skype
we could have had Bob or Clippy instead of ‘Cortana’ or ‘Copilot’
Microsoft really should have just leaned into it and named it Clippy again.
It was never named Clippy 😉
I came here looking for this comment. They bought the service to destroy it. It’s kind of their thing.
GitHub has literally never been doing better. What are you talking about??
We are talking about EEE (embrace, extend, extinguish)
What has Microsoft extinguished lately? I’m not a fan of Microsoft, but I think EEE is a silly thing to reference because it’s a strategy that worked for a little while in the 90s that Microsoft gave up on a long time ago because it doesn’t work anymore.
Like, what would be the purpose of them buying GitHub just to destroy it? And if that was their goal, why haven’t they done it already? Microsoft is interested in one thing: making money. They’ll do evil things to make money, just like any other big corporation, but they don’t do evil things just for the sake of being evil. It’s very much in their business interest to be seen as trustworthy, and being overly evil runs counter to that need.
Good thing I moved all my repos from git[lab|hub] to Codeberg recently.
Is this going to fuck over Obtainium?
I honestly don’t really see the problem here. This seems to mostly be targeting scrapers.
For unauthenticated users you are limited to public data only and 60 requests per hour, or 30k if you’re using Git LFS. And for authenticated users it’s 60k/hr.
What could you possibly be doing besides scraping that would hit those limits?
I’ve hit those many times when signed out, just scrolling through the code. The front end must be sending off tonnes of background requests.
This doesn’t include any requests from the website itself
You might be behind a shared IP with NAT or CG-NAT that shares that limit with others, or you might be fetching files from raw.githubusercontent.com as part of an update system that doesn’t have access to browser credentials, or Git cloning over https:// to avoid having to unlock your SSH key every time, or cloning a Git repo with submodules that each issue separate requests. An hour is a long time. Imagine if you let uBlock Origin update its filter lists, then you git clone something with a few submodules, and so does your coworker, and now you’re all blocked for an entire hour.
60 requests per hour per IP could easily be hit from, say, uBlock Origin updating filter lists in a household with 5-10 devices.
Probably getting hammered by ai scrapers
Everything seems to be. There was a period where you could have a mostly sane experience browsing over a VPN or from a cloud-provider IP range, but especially in the past 6 months or so things have gotten exponentially worse by the week. Everything is moving behind Cloudflare or other such systems.
you mean, doin’ what microsoft and their ai ‘partners’ do to others?
Yeah but they’re allowed to do it because they have brazillions of dollars.
They literally own GitHub. Brazillions well spent.
The funny thing is that rate limits won’t help them with genai scrapers
60 req/hour for unauthenticated users
That’s low enough that it may cause problems for a lot of infrastructure. Like, I’m pretty sure that the MELPA Emacs package repository builds out of Git, and a lot of that is on GitHub.
I didn’t think of that. Also, for nvim you typically pull plugins straight from Git repositories.
Do you think any infrastructure is pulling that often while unauthenticated? It seems like an easy fix either way (in my admittedly non-devops opinion)
It’s gonna be problematic in particular for organisations with larger offices. If you’ve got hundreds of devs/sysadmins under the same public IP address, those 60 requests/hour are shared between them.
Basically, I expect unauthenticated pulls to no longer be possible at my day job, which means repos hosted on GitHub become a pain.
Quite frankly, companies shouldn’t be pulling willy-nilly from GitHub or npm etc. anyway. It’s trivial to set up something to cache repos or artifacts (see the sketch below). Plus it guards against your builds breaking when GitHub is down.
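For instance, a read-through cache really is little code. A toy sketch in Go (the upstream host and cache directory are made-up examples; in practice you’d reach for Artifactory, Nexus, or a pull-through registry rather than rolling your own):

```go
// Toy read-through artifact cache: serve from local disk, fetch from
// the upstream on a miss, and persist what we fetched. Illustrative
// only; upstream and cacheDir are example values.
package main

import (
	"io"
	"net/http"
	"os"
	"path/filepath"
)

const (
	upstream = "https://github.com"   // example upstream
	cacheDir = "/var/cache/artifacts" // example cache location
)

func handler(w http.ResponseWriter, r *http.Request) {
	local := filepath.Join(cacheDir, filepath.Clean("/"+r.URL.Path))
	if f, err := os.Open(local); err == nil { // cache hit
		defer f.Close()
		io.Copy(w, f)
		return
	}
	resp, err := http.Get(upstream + r.URL.Path) // miss: go upstream
	if err != nil || resp.StatusCode != http.StatusOK {
		http.Error(w, "upstream fetch failed", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	os.MkdirAll(filepath.Dir(local), 0o755)
	f, err := os.Create(local)
	if err != nil { // can't cache; just pass the response through
		io.Copy(w, resp.Body)
		return
	}
	defer f.Close()
	io.Copy(w, io.TeeReader(resp.Body, f)) // stream to client and disk
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```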
It’s easy to set up a cache, but what’s hard is convincing your devs to use it.
Mainly because, well, builds generally work without configuring the cache, since your pipeline will almost always need some way of accessing the internet anyways.
But there are other reasons, too. You need authentication or a VPN for accessing a cache like that. Authentication means you have to deal with credentials, which is a pain. VPN means it’s likely slower than downloading directly from the internet, at least while you’re working from home.
Well, and it’s also just yet another moving part in your build pipeline. If that cache is ever down or broken or inaccessible from certain build infrastructure, chances are it will get removed from affected build pipelines and those devs are unlikely to come back.
Having said that, GitHub is of course promoting caches quite heavily here. This might actually make them worth using for individual devs.
Same problem for CGNAT users
Ah yeah that’s right, I didn’t consider large offices. I can definitely see how that’d be a problem
If I’m using Ansible or something to pull images it might get that high.
Of course the fix is to pull it once and copy the files over, but I could see this breaking prod for folks who didn’t write it that way in the first place
That’s low enough that it may cause problems for a lot of infrastructure.
Likely the point. If you need more, get an API key.
Or just make authenticated requests. I’d expect that to be well within the capabilities of anyone using MELPA, and 5000 requests per hour shouldn’t pose any difficulty considering MELPA only has about 6000 total packages.
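For reference, switching to authenticated requests is a one-header change, and every REST response carries X-RateLimit-* headers you can watch. A rough sketch in Go (the repo path and GITHUB_TOKEN env var are just examples):

```go
// Rough sketch: one authenticated REST call, printing the rate-limit
// headers GitHub attaches to every response. The repo path and the
// GITHUB_TOKEN env var name are just examples.
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	req, _ := http.NewRequest("GET", "https://api.github.com/repos/melpa/melpa", nil)
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("limit:    ", resp.Header.Get("X-RateLimit-Limit"))
	fmt.Println("remaining:", resp.Header.Get("X-RateLimit-Remaining"))
}
```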
This is my opinion on it, too. Everyone is crying about the death of GitHub when they’re just cutting back on unauthenticated requests to curb abuse… lol, seems pretty standard practice to me.
LOL!!! RIP GitHub
The Go module system pulls dependencies from their sources. This should be interesting.
Even if you host your project on a different provider, many libraries are on GitHub. All those unauthenticated Arch users trying to install Go-based software that pulls dependencies from GitHub.
How does the Rust module system work? How does pip?
already not looking forward to the next updates on a few systems.
Yeah, this could very well kill some package managers without some real heavy lifting.
Scoop relies on Git repos to work (scoop.sh, the Windows package manager)
Rip
Compiling any larger Go application would hit this limit almost immediately. For example, Podman is written in Go and has around 70 direct dependencies, or about 200 when including transitive ones. Not all of the dependencies are hosted on GitHub, but the vast majority are. That means that with a limit of 60 requests per hour, building Podman on a new machine would take over 3 hours.
@UnityDevice @sxan It doesn’t apply in that particular case, since in Go you’ll by default download those modules through proxy.golang.org.
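You can poke the proxy protocol directly to see that no GitHub request is involved. A tiny sketch in Go (the module path is just an example; note that per the protocol, uppercase letters in module paths are ‘!’-escaped):

```go
// Tiny sketch of the module proxy protocol: ask proxy.golang.org for a
// module's latest version; GitHub never sees the request. The module
// path is just an example.
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("https://proxy.golang.org/github.com/spf13/cobra/@latest")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body) // prints JSON like {"Version":"...","Time":"..."}
}
```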
Oh, that’s nice, TIL. But still, there are other projects that do just directly download from GitHub when building; Buildroot, for example.
@UnityDevice For sure, I was just nitpicking that Go projects in particular happen to be protected, at least as long as Google keeps providing that proxy…
I should know this, but I think Go’s module metadata server also caches, and the compiler looks there first if you don’t override it. I remember Drew got pissed at Go because the package server was pounding on sr.ht for version information; I really should look into those details. It Just Works™, so I’ve never bothered to read up on how it works. A lamentable oversight I’ll have to correct with this new rate limit. It might be no issue after all.
I also remember there being a tiny shitstorm when Google started proxying package manager requests through their own servers, maybe two years ago or so. I don’t know what happened with that, though, or if it’s actually relevant here…
Codeberg has used way stricter rate limiting since pretty much forever. Nice thought, but Codeberg will not solve this problem, like at all.
What? I have never seen a rate-limiting screen on Codeberg. Ever. If I click too much on GitHub I get rate limited. It happens so frequently that I use https://sourcegraph.com/search when I have to navigate a repository’s code.
THIS is why I clone all my commonly used repos to my personal Gitea instance.
That’s actually kind of an interesting idea.
Is there a reasonable way I could host my own UI that keeps the various repos I care about cloned and always up to date automatically?
Afaict, you should be able to follow the instructions for migrating a repo, and it will clone it to your instance and track the upstream for updates. It’s been a minute since I’ve read up on it, though.
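From memory, the migrate call looks roughly like the sketch below; treat the endpoint and field names as assumptions and check your instance’s /api/swagger before relying on it:

```go
// Hedged sketch: ask a self-hosted Gitea/Forgejo instance to create a
// pull mirror of a GitHub repo. Host, token env var, and field names
// are from memory; verify against your instance's API docs.
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"os"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"clone_addr": "https://github.com/example/project.git", // example source
		"repo_name":  "project",
		"mirror":     true, // keep syncing from upstream automatically
	})
	req, _ := http.NewRequest("POST",
		"https://gitea.example.com/api/v1/repos/migrate", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "token "+os.Getenv("GITEA_TOKEN"))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close() // a 201 here means the mirror was created
}
```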
I recently switched my instance from gitea to forgejo because everyone said to do it and it was easy to do.
What were the benefits?
Mostly, people stopped telling them to do it, I guess 🤷‍♂️
Just browsing GitHub, I’ve hit this limit.
Then log in.
Just browse authenticated and you won’t have that issue.
that is not an acceptable ‘solution’ and opens up an entirely different and more significant can o’ worms instead.
i’ve hit it many times so far… even as quick as the second page view (first internal link clicked) after more than a day or two since the last visit (yes, even with cleaned browser data or private window).
it’s fucking stupid how quick they are to throw up a roadblock.
Crazy how many people think this is okay, yet left Reddit because of their API shenanigans. GitHub is already halfway to requiring sign-in to view anything, like Twitter (X).
They make you sign in to use search, for code anyway.
Which I hate so much any time I want to quickly look for something.
The numbers actually seem reasonable…
Not at all if you’re a software developer, which is the whole point of the service. Automated requests from their own tools can easily punch through this when building a large project even once.
…
60 requests
Per hour
How is that reasonable??
You can hit the limits by just browsing GitHub for 15 minutes.
Without logging in.
Maybe charge OpenAI for scrapes instead of screwing over your actual customers.