Save all content of webpages (including FRAMES) - 20871

$200-300 USD

Closed
Posted about 12 years ago

Paid on delivery
The project is to create a piece of software that saves all content of webpages (including FRAMES) for any given list of URLs. Basically, the software should do the following:

• After execution, it should ask the user to paste a list of URLs from Excel
• For each URL, it should save the full contents (including content of all FRAMES) of the page located at that URL into a separate folder on the hard drive

Now the full details:

• The software has to be Windows-based
• It can be written using any programming language
• The most important requirement is the ability to save ALL contents, in particular content of FRAMES. It should save all files separately (related css files, images, html file and javascript files). It should basically save the page "faithfully", just as browsers see it (please see the following note)
• For this reason, it might be easier (or might not be: we don't know the best way and this is just an option to consider) to create this software in the form of a Google Chrome extension or a Mozilla Firefox add-on, because both Chrome and Firefox can save all contents of pages as they display them, with frames, images, etc. (Chrome's default "Save As" does that, while Firefox uses another add-on, "Mozilla Archive Format", to save pages "faithfully"). However, we are not sure if Chrome and Firefox have any disk write APIs, so this might not work. For your own testing purposes, it might be a good idea to compare the results with the way Chrome saves pages.
• The software must have the following adjustable parameters:
  o MIN_WAIT: minimum pause between processing one URL and the next (in seconds)
  o MAX_WAIT: maximum pause between processing one URL and the next (in seconds)
  o Download folder (folder on the hard drive)
• This is how the software should work:
  o User starts the software
  o The software asks for a list of URLs
  o It should be capable of accepting lists of up to 10,000 URLs
  o We need to make input easy. We produce links in Excel, so we should simply select a range of cells with URLs (in one column), copy them and paste them into the software.
  o Then we should be able to set the two pause parameters, MIN_WAIT and MAX_WAIT: the min and max pause between finishing processing one URL and moving on to the next one. For example, MIN_WAIT = 2 sec, MAX_WAIT = 10 sec. Then for each URL that the software is about to load, it should wait a random number of seconds between MIN_WAIT and MAX_WAIT before attempting to open and save it.
  o Then we should be able to select the download folder. By default, the software should remember the previous choice.
  o Then we should hit a "start" button and for each URL the software should do the following:
    - Create a new folder for the contents of this URL within the Download folder. The individual folder's name should follow this format: "YYYY-MM-DD-HH-MM-SS", which is basically the time of creation.
    - Save all contents of this URL into this individual folder.
    - Add a line to the program log (see below).
    - Generate a random number of seconds between MIN_WAIT and MAX_WAIT and wait that number of seconds before moving on to the next URL
  o Logging. The software should maintain a log file (text file) of all URLs that have been processed. For each URL it should save one line of text using the following format: "YYYY-MM-DD-HH-MM-SS: URL". The timestamp should be the same as the timestamp in the folder name for any given URL
  o The software must be able to work "quietly", either in the tray or (if part of a browser) in the taskbar. Basically, it shouldn't pop up for each URL or anything like it; the user should be able to use the PC for other tasks while the software is running.
  o Finally, the software should have a line with progress text to show that, for example, "120 of 1500 URLs processed".
  o There should also be a button to stop processing URLs. On click, the software should cancel processing the current URL and stop.
• The code must be comprehensively commented: if not every line, then every few lines, to explain what they do
• The deliverables are:
  o Uncompiled code with full instructions on how to compile it and run it (including all external libraries or modules that will be used to support this software)
  o Compiled version of the software
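
The per-URL workflow described above (pasted URL list, timestamped folder, log line, random pause) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not a proposed deliverable: the function names, the `log.txt` file name and its location inside the download folder, and the `save_page` callback (which stands in for the actual page-and-frames saving step) are all hypothetical.

```python
import random
import time
from datetime import datetime
from pathlib import Path
from typing import Callable, List


def parse_pasted_urls(pasted: str) -> List[str]:
    """Split text pasted from one Excel column into a list of URLs.

    Copying a single column yields one cell per line, so splitting on
    line breaks and dropping blank lines is enough for this sketch.
    """
    return [line.strip() for line in pasted.splitlines() if line.strip()]


def timestamp_name(now: datetime) -> str:
    """Format a time as YYYY-MM-DD-HH-MM-SS (folder name and log prefix)."""
    return now.strftime("%Y-%m-%d-%H-%M-%S")


def process_urls(
    urls: List[str],
    download_dir: str,
    min_wait: float,
    max_wait: float,
    save_page: Callable[[str, Path], None],  # hypothetical saving step
    should_stop: Callable[[], bool] = lambda: False,  # e.g. a "stop" button flag
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], datetime] = datetime.now,
) -> int:
    """Process each URL in turn and return how many were processed."""
    root = Path(download_dir)
    root.mkdir(parents=True, exist_ok=True)
    log_path = root / "log.txt"  # assumed name; the spec only asks for a text file
    total = len(urls)
    processed = 0
    for index, url in enumerate(urls):
        if should_stop():
            break
        # One timestamp is shared by the folder name and the log line.
        # Note: two URLs handled within the same second would share a folder.
        stamp = timestamp_name(clock())
        folder = root / stamp
        folder.mkdir(parents=True, exist_ok=True)
        save_page(url, folder)
        with log_path.open("a", encoding="utf-8") as log:
            log.write(f"{stamp}: {url}\n")
        processed += 1
        print(f"{processed} of {total} URLs processed")
        # Random pause between MIN_WAIT and MAX_WAIT before the next URL.
        if index < total - 1:
            sleep(random.uniform(min_wait, max_wait))
    return processed
```

Here `sleep` and `clock` are passed in as parameters only so the loop can be exercised without real delays; a real tool would also need to interrupt `save_page` itself to honour the "cancel the current URL" requirement, which this sketch does not attempt.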
Project ID: 16886305

About the project

6 proposals
Remote project
Active 12 years ago

About the client

Sunbury, United Kingdom
0.0
0
Member since Mar 18, 2012

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)