First of all, apologies for the change of budget. The project description is now totally different: it is just a modification/upgrade of an existing scraper, not the development of a new one :)
NOT A BIG PROJECT, BUT AN INTERESTING ONE FOR SURE :)
THE WEB SCRAPER IS ALREADY DEVELOPED ACCORDING TO THE INSTRUCTIONS BELOW, BUT NEEDS TO BE UPGRADED (GUI/UX & SOME FUNCTIONS) (you will find the dev files and documentation in the attached zip file). You can also check the project out on GitHub ([login to view URL]). We need to upgrade it so that it can adapt to changes on all the websites it scrapes data from, and we also want to add this website: www (dot) startupblink (dot) com
The tool is a web scraper that gathers data for a Startups database. It needs to gather data from known sites (more info in the attached documents):
The web scraper should be able to gather basic lead info, such as the startup name and some contact information, and possibly the startup description and logo.
The web scraper should accept a URL parameter (where to scrape for the data) and a depth level (how deep the scraper should dig, e.g. how many sub-links and sub-sections per URL, and whether the scraper should go outside the specified URL, e.g. follow external links).
At a later stage, the same web scraper should be configurable to search for leads other than startups, such as investment entities, service providers, etc.
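The URL and depth parameters described above could be sketched roughly as follows. This is only an illustration, not the existing scraper's code: `crawl` and `fetch_links` are hypothetical names, and the sketch assumes that depth 1 means "the starting page plus the pages it links to directly".

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, max_depth, follow_external, fetch_links):
    """Breadth-first walk from start_url; returns URLs in visiting order.

    max_depth: an int, or None for unlimited depth ("any").
    follow_external: if False, links to other hosts are skipped.
    fetch_links: hypothetical callable(url) -> iterable of hrefs on that page.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        # Assumption: pages at the depth limit are visited but not expanded.
        if max_depth is not None and depth >= max_depth:
            continue
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links
            if not follow_external and urlparse(link).netloc != start_host:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing the page fetcher in as a parameter keeps the traversal logic independent of whichever ready-made scraping library ends up doing the actual downloads.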
Background and strategic fit.
All scraped info should be saved into two databases: one for startups (at most 3 years old) and one for other companies. There should be a simple way to convert the DBs into CSV files.
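As a rough idea of what a "simple way to convert the DBs into CSV files" might look like, here is a minimal sketch that assumes the data lives in SQLite (the brief does not name a database engine). The function name `table_to_csv_text` is hypothetical.

```python
import csv
import io
import sqlite3


def table_to_csv_text(conn, table):
    """Return the full contents of `table` as CSV text, header row first."""
    # Table names cannot be bound as SQL parameters, so validate the name
    # against the schema before building the query.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()
```

The returned text can be written to a file or offered as a download from the operator interface.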
The web scraping tool should be configurable so that admins can enter a starting URL and define what they are looking for (for example: startups, investment entities, service providers, etc.) as well as the list of data they want, such as company (startup) name, contact data, descriptions, and/or other properties.
From a tech perspective, the tool should use an already made web scraper, regardless of its tech stack. There are some pretty cool Java, Python and Node based web scrapers.
From a tech perspective, it must also be an easily deployable tool that does not require additional server resources or a specific infrastructure stack that would create overhead. Basically, whatever can be run from a container or similar environment would work for us, as long as it is not resource-hungry or expensive to operate.
When the scraping tool is started, it should find the required data at the specified URL, then check whether we already have that data in our databases, and if not, save it into our Startup database.
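The "check first, then save" step could look something like the sketch below. It assumes SQLite and a hypothetical `startups` table keyed on a normalized name and website; the project's actual schema and matching rules may differ.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical minimal table, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS startups ("
        "name TEXT NOT NULL, website TEXT NOT NULL, "
        "name_key TEXT NOT NULL, website_key TEXT NOT NULL, "
        "UNIQUE (name_key, website_key))"
    )


def _key(text):
    # Assumption: case, surrounding spaces and trailing slashes are ignored
    # when deciding whether two leads are the same.
    return text.strip().lower().rstrip("/")


def save_if_new(conn, name, website):
    """Insert the lead unless it is already stored. Returns True if inserted."""
    keys = (_key(name), _key(website))
    exists = conn.execute(
        "SELECT 1 FROM startups WHERE name_key = ? AND website_key = ?", keys
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO startups (name, website, name_key, website_key) "
        "VALUES (?, ?, ?, ?)",
        (name, website, *keys),
    )
    conn.commit()
    return True
```

The UNIQUE constraint is a second line of defence in case two scraping runs try to save the same lead at the same time.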
1. Starting URL (MUST HAVE): As an operator, I want to be able to input the starting point (URL) for web scraping.
The operator inputs the starting URL for scraping.
2. Search params (MUST HAVE): As an operator, I want to be able to input the parameters I am looking for.
The operator inputs what type of data and properties they are looking for, such as: startup name, startup contact data, startup description, startup logo.
The params should be added dynamically because they will vary from URL to URL.
Each search param should accept multiple selectors. On some websites the startup name is titled "startup name", while on others it is "company name" or just "name". We need to be able to define multiple param names and group them under a single title.
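Grouping several param names under one title could be sketched as a simple alias table. The names `FIELD_ALIASES` and `normalize_record` are hypothetical, and the "contact" aliases are made-up examples; only the three name variants come from the requirement above.

```python
# Canonical title -> labels that different websites use for the same thing.
FIELD_ALIASES = {
    "name": ["startup name", "company name", "name"],
    "contact": ["contact", "email", "e-mail"],  # illustrative aliases
}


def normalize_record(raw, aliases=FIELD_ALIASES):
    """Map labels scraped from a page onto the canonical titles.

    Labels that match no alias are dropped; the first match per title wins.
    """
    lookup = {
        alias.lower(): title
        for title, names in aliases.items()
        for alias in names
    }
    record = {}
    for label, value in raw.items():
        title = lookup.get(label.strip().lower())
        if title is not None and title not in record:
            record[title] = value
    return record
```

Since the alias table is plain data, operators could edit it per URL without touching the scraper code.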
3. Depth level (MUST HAVE): As an operator, I want to be able to input the depth level for my starting URL.
The operator can select the depth level for scraping from a dropdown with the values "1, 2, 3, 4, 5, any", defining how deep the scraper should dig from the starting URL.
4. Follow external links (MUST HAVE): As an operator, I want to be able to choose whether my scraping tool should follow any external links from my starting URL.
The operator chooses Yes or No.
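The operator inputs from requirements 1, 3 and 4 could be validated and bundled along these lines. This is a sketch only: `ScrapeJob` and `build_job` are hypothetical names, and representing "any" as `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

DEPTH_CHOICES = ("1", "2", "3", "4", "5", "any")  # dropdown values


@dataclass(frozen=True)
class ScrapeJob:
    start_url: str
    max_depth: Optional[int]  # None stands for "any" (no limit)
    follow_external: bool


def build_job(start_url, depth_choice, follow_external_choice):
    """Turn raw form values into a validated ScrapeJob."""
    parsed = urlparse(start_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid starting URL: {start_url!r}")
    if depth_choice not in DEPTH_CHOICES:
        raise ValueError(f"depth must be one of {DEPTH_CHOICES}")
    answer = follow_external_choice.strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError("follow external links must be Yes or No")
    max_depth = None if depth_choice == "any" else int(depth_choice)
    return ScrapeJob(start_url, max_depth, answer == "yes")
```

For example, `build_job("https://example.com", "3", "Yes")` would give a job limited to three levels that is allowed to leave the starting site.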
User interaction and design
The tool needs to have a very simple interface for the operators, and it must require authorization before it can be used.
18 freelancers are bidding on average €303 for this job
Hello there. I'm good at scraping with Python scripts. I have already used that source, so improving the UI and other features is no problem. Let me know if we can work together. Thanks, David Lucas
Hi there. I have read your description carefully and I am very interested in your web scraper update project. I have extensive experience with web scraping using Python. Looking forward to hearing from you. Best wishes.
Hello there, if you are looking for high quality results for your project, we are ready to cooperate. Feel free to contact us in private for more details. Thanks & Kind Regards [login to view URL]