Create a script to crawl a website and collect URLs.
$30-100 USD
Completed
Posted over 17 years ago
Paid on delivery
I need a script that crawls the directory section of a website and collects the complete list of TLDs listed there. The site uses redirects, so the links look like this: [login to view URL],22132 Clicking such a link opens a new URL in frames. I need the script to extract the final URL and write it to a plain text file, one URL per line.

You are free to use any scripting language or technology, but the script must run from the Linux command line. It should be configurable and throttleable so that we don't get banned for 'slamming' the directory server.

The script should accept the following parameters:
- starting URL
- time in seconds to wait between retrievals

If you are experienced in this type of work, please reply and I will provide you with the actual URL of the directory.
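To make the request concrete, here is a minimal Python sketch of the kind of script described above. It is only an illustration under stated assumptions, not the delivered solution. The actual directory URL and page layout are not given, so the `--pattern` option (a substring that identifies redirect links), the `--output` option, and all function names are hypothetical. The sketch also assumes the framed page exposes the target site as a `<frame>` or `<iframe>` `src` on a different host than the directory, and it scans only the starting page.

```python
import argparse
import sys
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndFrameParser(HTMLParser):
    """Collects href values from <a> tags and src values from <frame>/<iframe> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.frames = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(attrs["src"])


def parse_page(html, base_url):
    """Return (links, frames) as absolute URLs resolved against base_url."""
    parser = LinkAndFrameParser()
    parser.feed(html)
    links = [urljoin(base_url, href) for href in parser.links]
    frames = [urljoin(base_url, src) for src in parser.frames]
    return links, frames


def fetch(url, delay):
    """Sleep for `delay` seconds (the throttle), then fetch the page.

    urlopen follows HTTP redirects, so geturl() is the URL we landed on.
    """
    time.sleep(delay)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.geturl(), resp.read().decode("utf-8", errors="replace")


def resolve_final_url(redirect_url, delay, fetch_fn=fetch):
    """Assumption: the target site is the first frame on a different host."""
    landed_url, html = fetch_fn(redirect_url, delay)
    _, frames = parse_page(html, landed_url)
    directory_host = urlparse(redirect_url).netloc
    for frame_url in frames:
        if urlparse(frame_url).netloc != directory_host:
            return frame_url
    return landed_url  # no external frame found; fall back to the landing URL


def crawl(start_url, delay, pattern, fetch_fn=fetch):
    """Resolve every link on the starting page that contains `pattern`."""
    landed_url, html = fetch_fn(start_url, delay)
    links, _ = parse_page(html, landed_url)
    results = []
    for link in links:
        if pattern in link:
            final_url = resolve_final_url(link, delay, fetch_fn)
            if final_url not in results:  # keep order, skip duplicates
                results.append(final_url)
    return results


def main(argv=None):
    ap = argparse.ArgumentParser(description="Throttled directory crawler (sketch)")
    ap.add_argument("start_url", help="starting URL of the directory section")
    ap.add_argument("delay", type=float, help="seconds to wait before each retrieval")
    ap.add_argument("--pattern", default="", help="substring identifying redirect links (hypothetical)")
    ap.add_argument("--output", default="urls.txt", help="plain text output file")
    args = ap.parse_args(argv)
    urls = crawl(args.start_url, args.delay, args.pattern)
    with open(args.output, "w") as out:
        for url in urls:
            out.write(url + "\n")  # one URL per line


# Only start crawling when command-line arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved under a hypothetical name such as `crawl.py`, it could be run as `python3 crawl.py START_URL 5 --pattern SUBSTRING`, which waits 5 seconds before every request. Following pagination or nested directory pages would need additional logic that depends on the real site.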
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Linux, Apache, MySQL