Atrax Mercator web spider crawler clone

Cancelled | Posted Apr 2, 2009 | Paid on delivery

Web spider crawler clone

Hello, I am looking for a serious software engineer who is an expert in high-performance search engine spiders/crawlers. I am only looking for an engineer who has already worked on this kind of high-performance crawler. This means I want a qualified expert with references in the crawling industry.

I need a clone of the Atrax crawler software, possibly supplemented with the Mercator software. Atrax is an extension to Mercator that:
- combines several Mercators
- uses URL hashing, and off-line URL chec

Atrax is a distributed crawler written in C++ and Python, which is composed of a "crawl manager", one or more "downloaders" and one or more "DNS resolvers". Collected URLs are added to a queue on disk and processed later, in batch mode, to check which URLs have already been seen. The politeness policy considers both third and second level domains (e.g.: [url removed, login to view] and [url removed, login to view] are third level domains) because third level domains are usually hosted by the same Web server.

Tools/languages for implementation:
• Scripting languages (Python, Perl)
• Java (performance tuning is tricky)
• C/C++ with sockets (low-level)
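To make the requirements above more concrete for bidders, here is a minimal Python sketch of three of the ideas mentioned: URL hashing to split work across several crawler nodes, a batch "seen URL" check over queued URLs, and a politeness key based on the second level domain. It is only an illustration under assumptions, not how Atrax or Mercator is actually implemented. All function names are hypothetical, SHA-1 is an arbitrary choice of hash, the queue is an in-memory list rather than a file on disk, and the domain rule naively takes the last two host labels (so it would mishandle suffixes such as co.uk).

```python
import hashlib
from urllib.parse import urlsplit


def url_fingerprint(url: str) -> str:
    """Return a fixed-size hex digest that stands in for the full URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def assign_node(url: str, num_nodes: int) -> int:
    """Pick the crawler node for a URL by hashing its host name.

    Hashing the host (not the whole URL) keeps every URL of one host
    on the same node. This is an assumed design for illustration.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def politeness_key(url: str) -> str:
    """Group URLs by second level domain (naive: last two host labels)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def batch_filter_unseen(pending_urls, seen_fingerprints):
    """Process a batch of queued URLs and return those not seen before.

    seen_fingerprints is a set of digests and is updated in place.
    """
    unseen = []
    for url in pending_urls:
        fp = url_fingerprint(url)
        if fp not in seen_fingerprints:
            seen_fingerprints.add(fp)
            unseen.append(url)
    return unseen
```

In a real crawler of this kind, the pending URLs would be read from the on-disk queue and the set of fingerprints would most likely live in a sorted file or a database rather than in memory.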

C Programming, Engineering, Java, MySQL, Oracle, PHP, Project Management, Python, Software Architecture, Software Testing

Project ID: #3781132

About the project

1 proposal | Remote project | Active Apr 24, 2009

1 freelancer is bidding an average of $1700 for this job

samia21

See private message.

$1700 USD in 14 days
(3 reviews)
0.0