• 0 Posts
  • 8 Comments
Joined 11 days ago
Cake day: June 5th, 2025

  • Of course, it depends on how you configure the crawler, e.g. how deep it goes into subdomains and how far it follows links to other domains.

    I crawl 71 sites: my index currently holds 4,576,319 documents (I crawled sites like GitHub too) and occupies just under 14 GB.

    The results depend on several factors, for example whether you run it only locally or in p2p mode. It also has a number of settings, and you can explicitly control what shapes the results down to the smallest detail. I have to be honest, though: I haven't dealt with that at all (especially since it's a bit complex in places), because I first want to expand my own list of pages to crawl, and I only use it locally. I still regularly use DuckDuckGo for searching. However, if you take the time for it, you will get the quality of results you want.

    Ah well, depending on how you set up the crawler, it consumes system resources accordingly. However, you can cap its use of RAM and storage space. The same goes for network utilization, which is pretty important, because otherwise no other connections would be possible besides the crawling xD
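
    The crawl scoping described above (depth limit, subdomains, links to other domains) can be sketched generically in Python. The software isn't named here, so the function and parameter names below are illustrative, not taken from any particular crawler:

    ```python
    from urllib.parse import urlparse

    def in_scope(url, seed_host, allow_subdomains=True, follow_external=False):
        """Decide whether a discovered link belongs to this crawl.

        seed_host: the host the crawl started from, e.g. "example.org".
        allow_subdomains: also accept hosts like "docs.example.org".
        follow_external: accept links to entirely different domains.
        (Hypothetical knobs for illustration only.)
        """
        host = urlparse(url).netloc
        if host == seed_host:
            return True
        if allow_subdomains and host.endswith("." + seed_host):
            return True
        return follow_external

    def next_frontier(links, depth, max_depth, seed_host, **scope):
        """Keep only in-scope links, and stop once the depth limit is hit."""
        if depth >= max_depth:
            return []
        return [u for u in links if in_scope(u, seed_host, **scope)]
    ```

    With the defaults, `next_frontier(links, depth=1, max_depth=3, seed_host="example.org")` would keep links on example.org and its subdomains but drop links to other domains, which is the kind of per-crawl control the comment alludes to.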
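
    The network cap mentioned above amounts to simple bandwidth throttling. A minimal sketch, assuming a single-threaded fetch loop (the class and its `max_bytes_per_sec` knob are hypothetical, not a real setting of any crawler):

    ```python
    import time

    class CrawlThrottle:
        """Cap the crawler's average download bandwidth so other
        connections on the machine stay usable (illustrative only)."""

        def __init__(self, max_bytes_per_sec):
            self.rate = max_bytes_per_sec
            self.start = time.monotonic()
            self.total_bytes = 0

        def account(self, nbytes):
            """Record nbytes just downloaded; sleep if ahead of the cap."""
            self.total_bytes += nbytes
            elapsed = time.monotonic() - self.start
            # Time this much data *should* have taken at the allowed rate.
            expected = self.total_bytes / self.rate
            if expected > elapsed:
                time.sleep(expected - elapsed)
    ```

    Calling `throttle.account(len(chunk))` after each downloaded chunk keeps the long-run rate at or below the cap, the same idea behind the RAM/storage/network limits the comment describes.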




  • Germany simply wants to be the pioneer of surveillance and would prefer to leave even China behind (there are already plans for an alternative to the Chinese social credit system). So I was all the more surprised that a "no" to chat control came, or is coming, from Germany. But surveillance and censorship keep getting worse while people try to make money from the public's personal data… Through lobbying, we also "sovereignly" screwed up the EU cloud so that it is operated only by American companies.
    Well, DNS blocks can be bypassed pretty quickly, but you can already see where all this is heading.