To address the practical problems that a parallel crawler faces, this paper proposes three optimization strategies for ChaoCrawler: collision avoidance, URL indexing, and DNS caching.