Screen scrape data (search results) into Postgres database

$30-250 USD

Completed

Posted

over 9 years ago

Paid on delivery

I would like to create a list of results from an online search form that covers its entire database (i.e. the whole database is screen scraped). The results should retain the same unique IDs and fields as in the search results. They should be inserted into a PostgreSQL database, and the script will run on Linux under cron. You must be comfortable with Python on UNIX/Linux (not Windows, please). The form whose results are to be scraped is here: [login to view URL]
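A minimal sketch of the storage step, assuming a hypothetical table `search_results` keyed by a hypothetical `record_id` column (the real field names would come from the search results). It builds a PostgreSQL upsert so that repeated cron runs update existing rows instead of duplicating them. `ON CONFLICT` requires PostgreSQL 9.5 or later, and `cursor` is assumed to be any DB-API cursor, such as one from psycopg2.

```python
def build_upsert(table, columns, key):
    """Build an INSERT ... ON CONFLICT statement (PostgreSQL 9.5+).

    Identifiers are interpolated directly, so pass trusted constants only.
    Assumes at least one column besides the key.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )


def store_results(cursor, rows,
                  table="search_results",                          # hypothetical name
                  columns=("record_id", "family_name", "detail"),  # hypothetical fields
                  key="record_id"):
    """Upsert scraped rows (dicts keyed by column name); return the row count."""
    sql = build_upsert(table, columns, key)
    params = [tuple(row[c] for c in columns) for row in rows]
    cursor.executemany(sql, params)
    return len(params)
```

With psycopg2 this would be called with a cursor from `psycopg2.connect(...)` and followed by a commit on the connection; a crontab entry would then run the script on whatever schedule is wanted.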

PostgreSQL

Python

UNIX

Project ID: 6620944

About this project

11 proposals

Remote project

Active 10 years ago

Awarded to:

@nitelfreelance

Hi, it's easy. I have been a Python developer for 14 years. I run only Ubuntu on my machine, and I have done many projects with PostgreSQL. The only problem with this project is that the form requires a search query. To work around this and retrieve all the data, we can search for family names from "aa", "ab" through "zy", "zz" (the minimum length of a family name search is 2). Regards, Iman
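The two-letter enumeration described in this bid could be sketched as follows. This is only an illustration: it assumes the form accepts lowercase ASCII letters, and the function name is hypothetical.

```python
from itertools import product
from string import ascii_lowercase


def two_letter_prefixes():
    """Return every two-letter query from 'aa' to 'zz' in alphabetical order."""
    return ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
```

Each prefix would then be submitted to the search form in turn. Family names containing characters outside a-z (spaces, hyphens, apostrophes, accents) would need a wider alphabet.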

$225 USD in 1 day

4.9

(29 reviews)

5.9

11 freelancers are bidding on average $229 USD for this job

@anuyadav1

I am well experienced with Python scraping on Linux.

$200 USD in 3 days

4.8

(58 reviews)

5.9

@dabing1205

I am an expert in web scraping/Scrapy and am also interested in your project. Please contact me to discuss the requirements in more detail. Thanks!

$222 USD in 5 days

5.0

(13 reviews)

4.6

@cheapexcell

Ready to start

$240 USD in 4 days

4.8

(8 reviews)

3.9

@Darflow

The proposal has not been provided yet

$200 USD in 3 days

4.9

(18 reviews)

3.7

@bsoist

I have years of programming and web development experience. I am fluent in Python and succeed at parsing website data in cases where many other programmers struggle. I know you have a strong preference for *nix, and so do I, so I completely understand. BUT in my experience, scraping data from a site like the one you linked here is MUCH easier with a dedicated Windows box, using Python to script Windows COM. That's my experience and I know it might not work for you, but I thought I'd mention it. I can provide detail on how you could use an AWS instance or something similar to run such a solution. If you are certain you cannot, or do not want to, go that route, I can scope out the site you linked and see how much it would cost to do it using *nix. I know I can do it, but I'll need to spec out a couple of things (and I know it will cost more than $250). You will not be disappointed with my work. U.S. location, Eastern timezone, milestone required.

$277 USD in 3 days

5.0

(2 reviews)

2.2

@ggenellina

So you want the site's whole database to be queried, one page at a time. It may take a while and, because the site limits each query to 50 results, it must be done with refining steps (see the note below). I've done this kind of page scraping in the past; I'd use Python + BeautifulSoup. One or two working days should be enough to finish it.

I will deliver:
* the requested tool, a Python script
* complete source code in Python
* a setup program
* and of course, complete support for setting up and using it

Note: The site limits any query's results to 50 records, so one has to refine the query terms until that limit is no longer hit. That is: 'ab' returns more than 50 records, so try 'aba' (again more than 50), then refine again with 'abaa' (no results), 'abab' (2 results), 'abac' and so on. In this scenario we cannot guarantee completeness, at least theoretically. Consider the case where there are more than 50 records for 'Smith': only the first 50 will be retrieved, and I cannot refine the query any further than that.
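The refinement procedure in the note could be sketched roughly as below. This is an illustration under stated assumptions, not the bidder's code: `search` is a hypothetical callable that submits one query and returns a list of dicts with an `"id"` key, a result count at the cap is treated as "possibly truncated", and `max_len` is an arbitrary guard against endless refinement.

```python
from string import ascii_lowercase

RESULT_LIMIT = 50  # per-query cap reported in the bid


def crawl(search, prefix, limit=RESULT_LIMIT, max_len=8,
          alphabet=ascii_lowercase):
    """Collect records matching `prefix`, refining the query while the cap is hit.

    Returns a dict keyed by record ID, so overlapping queries do not
    produce duplicates.
    """
    records = search(prefix)
    found = {r["id"]: r for r in records}
    # A full page may be truncated: extend the prefix by one letter
    # and query each longer prefix in turn.
    if len(records) >= limit and len(prefix) < max_len:
        for ch in alphabet:
            found.update(crawl(search, prefix + ch, limit, max_len, alphabet))
    return found
```

Keeping the records from a capped query, rather than discarding them, preserves the first 50 rows in the 'Smith' situation, where longer prefixes return nothing new. The completeness caveat from the note still applies.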

$200 USD in 2 days