Causes for the limited database availability
Some of you might have noticed issues accessing the Vulkan Hardware Database since the end of July. These were caused by a targeted denial-of-service (DoS) attack combined with ever-increasing, rampant crawling by AI bots.
Note the spike in July
This led my hosting company to take precautionary measures to avoid negative effects on their own services. One such measure was to block access from certain countries, especially those the DoS attack and the AI crawlers originated from.
As a result, a lot of people were no longer able to access the database at all and were instead presented with HTTP 520 errors. It also prevented the client application from checking or uploading reports.
Taking countermeasures
I’ve spent a good chunk of the last two months trying to resolve this by adding several countermeasures, and traffic has now returned to normal. While analyzing traffic, I noticed a huge spike in AI bots trying to crawl every link in my database. At times, more than 95% of the traffic was caused by those bots, and since all views fetch data from the database, this put a huge strain on it. Just for reference: on one day, an AI bot from a single company made more than 100,000 requests, each of them resulting in one or more database queries.
Those of you who run websites might think: “Why don’t you use a robots.txt?” Well, I do (and have for years), but with the internet becoming a scraping ground for AI companies, many of them, even bigger ones, no longer really respect robots.txt. On top of that, there are AI crawlers that simply don’t care at all. One new crawler actually read the robots.txt, then just decided to ignore it and made another 50,000 requests per day.
At times, the database had to deal with roughly a ten-fold increase in requests.
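The reason robots.txt doesn’t stop these crawlers is that it is purely advisory: a crawler has to fetch it and check each URL against it voluntarily. The Python sketch below uses the standard library’s `urllib.robotparser` to show what a well-behaved crawler does before a request; the robots.txt content and the example.org URLs are hypothetical, not the rules actually served on gpuinfo.org.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely,
# and keep everyone else out of an (assumed) /api/ path.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /api/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler calls this before every request.
    Nothing enforces it: a crawler that skips the check can fetch anyway."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(polite_fetch_allowed(rp, "GPTBot", "https://example.org/reports/123"))
    print(polite_fetch_allowed(rp, "SomeBrowser", "https://example.org/reports/123"))
    print(polite_fetch_allowed(rp, "SomeBrowser", "https://example.org/api/reports"))
```

A crawler that never calls a check like `can_fetch`, or calls it and ignores the answer, faces no technical barrier at all, which is why blocking has to happen on the server side.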
After blocking all those bots and putting the site behind Cloudflare, page hits returned to normal. As a result, my hosting company lifted the geo blocks, and the database should be available to everyone again.
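Whether it is done by a proxy such as Cloudflare or in the application itself, the point of blocking is that an unwanted request is rejected before it can trigger any database queries. As a rough illustration of that idea at the application level, the Python sketch below rejects requests whose `User-Agent` header matches a list of AI crawlers. The crawler list, `is_blocked` and `handle_request` are hypothetical and not how gpuinfo.org or Cloudflare actually implement blocking.

```python
from typing import Callable, Optional, Tuple

# Illustrative User-Agent substrings of some well-known AI crawlers.
# This is NOT the block list used on gpuinfo.org.
BLOCKED_UA_SUBSTRINGS = ("gptbot", "claudebot", "ccbot", "bytespider", "amazonbot")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains any blocked crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


def handle_request(user_agent: Optional[str],
                   render_page: Callable[[], str]) -> Tuple[int, str]:
    """Reject blocked crawlers first; only allowed requests reach
    render_page, which stands in for the database-backed view."""
    if is_blocked(user_agent):
        return 403, "Forbidden"
    return 200, render_page()
```

Matching on the `User-Agent` header only catches crawlers that identify themselves honestly; a crawler that sends a browser-like header passes straight through this check.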
I have also started working on code changes to take load off the database, which will hopefully result in some general database performance improvements in the long run.
Countermeasures don’t come for free, though, so the site (and all others on gpuinfo.org) will likely load more slowly than before.
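The post doesn’t say what these code changes are. One common way to take load off a database is to cache query results, so that repeated views of the same page within a short window are served from memory instead of re-running the query. The Python sketch below assumes a simple in-memory cache with a time-to-live; `TTLCache`, `get_or_compute` and the `listdevices` key are hypothetical names, not part of the site’s code.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep query results for `ttl` seconds so repeated requests
    within that window don't hit the database again."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached result: no database query
        value = compute()  # cache miss or stale entry: run the real query
        self._store[key] = (now, value)
        return value


def demo() -> int:
    """Simulate 1,000 views of one page and count the queries issued."""
    calls = []

    def run_query():
        calls.append(1)  # stands in for a real database query
        return ["device A", "device B"]

    cache = TTLCache(ttl=60.0)
    for _ in range(1000):
        cache.get_or_compute("listdevices", run_query)
    return len(calls)
```

With a 60-second TTL, the 1,000 simulated views in `demo()` trigger a single query. The trade-off is that visitors may see data that is up to `ttl` seconds old, which matters for freshly uploaded reports.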
Feedback
If you are still having issues accessing the database, please open an issue in the GitHub repository.