It’s a common problem. People write bot scrapers against public data, which costs the public resource a lot of bandwidth, when they could have easily just downloaded the entire dataset from a dedicated link. Finding better ways to tell them “Hey, morons, go download the goddamn ZIP file with all of the data!” saves that bandwidth and web server CPU.
The company I worked for resorted to just detecting and blocking all bots, which sometimes led to some funny support calls: “Why can’t I just break your TOS and have bots run wild against your data?!”
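For illustration only, here is a minimal sketch of that idea in Python: spot obvious scraper user agents and answer with a pointer to the bulk download instead of serving the page. The bulk URL, the user-agent list, and the handler name are all hypothetical, not anything a real site used.

    # Minimal sketch: refuse crude scrapers and point them at a bulk export.
    # BULK_URL and BOT_MARKERS are made-up examples, not real endpoints.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BULK_URL = "https://example.org/exports/full-dataset.zip"   # hypothetical
    BOT_MARKERS = ("python-requests", "scrapy", "curl", "wget")  # crude heuristic

    class HintingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ua = self.headers.get("User-Agent", "").lower()
            if any(marker in ua for marker in BOT_MARKERS):
                # Don't serve the scrape; tell the bot author where the real data is.
                body = f"Don't scrape page by page. Full dataset: {BULK_URL}\n".encode()
                self.send_response(429)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Link", f'<{BULK_URL}>; rel="alternate"')
                self.end_headers()
                self.wfile.write(body)
                return
            # Regular visitors get the normal page.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Regular page content here.\n")

    if __name__ == "__main__":
        HTTPServer(("", 8000), HintingHandler).serve_forever()

In practice this kind of check usually lives in the reverse proxy or CDN rather than the app, and user-agent matching alone is easy to evade, but even a blunt heuristic like this cuts the accidental traffic from people who never noticed the bulk export existed.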