We need to improve the performance of a Python app.
It is a simple spider: it visits URLs submitted by users, crawls them, collects data and reports the results back to the primary app.
Currently it can handle only a few thousand URLs per day. It must be able to handle 1+ million.
Also, the URL-checking queue blocks for an unknown reason, and that must be debugged as well. It might be a Python issue or something else.
==============
SKILLS:
==============
Python,
Terminal access
WHM
cPanel
= optional =
PHP
Laravel
Vue
==============
TASK:
==============
- improve performance of the spider
- debug and fix any issues
==============
DEADLINE
==============
11th May 2018
==============
BUDGET
==============
$120
The primary requirement is Python, but PHP, Laravel and Vue knowledge would help.
You will be provided access to GitHub to check out the code.
A test user account is also available so you can see it in action.
Also, please mention word "bee" at the start of your reply to prove that you read this far.
Sir/Madam,
I am an experienced Python developer with 2 years of experience in web scraping using Selenium, Requests and Beautiful Soup. I can do this project for you. Please go through my profile. I look forward to working with you on this. Have a great day ahead.
Thank you
Yash
Hello, I'm a Python specialist with vast experience in web APIs/apps.
Looking at your requirements, I can implement an asynchronous, non-blocking architecture to boost your spider's crawl speed close to your hardware limitations.
I cannot guarantee an exact performance metric, as this will depend on server resources and network connectivity, but I expect up to a 100-fold improvement in speed.
I offer all clients a one-month warranty period to resolve any issues that may be discovered after project completion.
Kind regards, I look forward to hearing from you.
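To illustrate the kind of asynchronous approach described above, here is a minimal sketch of a bounded-concurrency crawl loop using `asyncio`. The `fetch()` function, the semaphore limit, and the example URLs are all assumptions for illustration; `fetch()` simulates network latency rather than making a real HTTP request (a real spider would use an async HTTP client such as aiohttp at that point).

```python
import asyncio

# Stand-in for a real HTTP request; here it only simulates network latency.
async def fetch(url: str) -> str:
    await asyncio.sleep(0.01)  # simulated network I/O
    return f"data from {url}"

async def crawl(urls, max_concurrency: int = 100):
    # Cap the number of simultaneous requests so the host and remote
    # servers are not overwhelmed; throughput scales until this limit.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with sem:
            return await fetch(url)

    # Launch all fetches concurrently and collect results in input order.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

results = asyncio.run(crawl([f"http://example.com/{i}" for i in range(50)]))
print(len(results))  # 50
```

Because the waits overlap, 50 simulated 10 ms fetches complete in roughly one round-trip rather than 50, which is the source of the large speedups possible on I/O-bound crawls.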
I have recently built a working high-speed spider and can implement a similar version for you very quickly.
bee!
You're asking for a vast improvement, but I can certainly take a look.
I'd look for ways to let the crawler run multiple tasks concurrently; that way you can make it run as fast as you like: just add more hardware!
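One common shape for that kind of concurrency, which also guards against the blocked-queue symptom mentioned in the brief, is a thread worker pool pulling from a `queue.Queue` with a timeout so no worker can wait forever. This is a hypothetical sketch, not the project's actual code: `process()` is a placeholder for the real fetch-and-parse step, and the worker count and timeout are illustrative.

```python
import queue
import threading

def process(url: str) -> str:
    # Placeholder for the real fetch-and-parse step.
    return f"crawled {url}"

def worker(q, results, lock):
    while True:
        try:
            # The timeout guarantees the worker never blocks indefinitely
            # on an empty or stalled queue.
            url = q.get(timeout=1.0)
        except queue.Empty:
            return  # queue drained: exit cleanly
        data = process(url)
        with lock:
            results.append(data)
        q.task_done()

def run_pool(urls, n_workers=8):
    q = queue.Queue()
    for u in urls:
        q.put(u)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    q.join()  # block until task_done() has been called for every URL
    for t in threads:
        t.join()  # workers exit via the queue.Empty timeout
    return results

out = run_pool([f"http://example.com/{i}" for i in range(20)])
print(len(out))  # 20
```

Scaling then becomes a matter of raising `n_workers` (or running more machines), while the timeout-based `get` keeps a stalled producer from hanging the whole pool.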