
automation app

$250-750 USD

Closed
Posted almost 12 years ago


Paid on delivery
Here is an outline of how the script will work.

For a given input file (see attached), some key information is missing for certain records, such as telephone, company name, URL, email address, and address. This script will attempt to find this data and output it (see attached output file). All input and output files will be in CSV format. The script will use 14 IP addresses to send searches to Google and Bing. A W9 will be required from the developer who is selected.

Specifics of the logic:

1. Take any record that does not have a company name and do an address search.
   a. Do the search with Google and take the title tag from each of the top 10 results.
   b. Do the same search on Bing and take the title tag from each of the top 10 results.
   c. Pattern match the page titles; this should give a fairly unanimous company name.

2. If the company name was empty, use the pattern-matched company name; otherwise use the company name we already had. Take the company name and full address and Google it. Street names are off in many of the examples, so we would strip them down by removing 's, directions and street extensions:
   This search gives us all kinds of results, so we have to score them:
   a. Go to the home page of each of the sites in the top 10 results. Page analysis:
      i. The name of the business should be located on this page.
      ii. The name of the business should be located in the title tag.
   b. If the name of the business is located both on the page and in the title, we can be pretty sure this is the website of the company.
   c. If the company name is in neither, go to the next site in the top 10 until we achieve the pattern match.
   d. Once this subroutine is complete, we have the URL of the company. See below for an example of the search. From this PDF in the results it would find the URL of the business:

3. Get email.
   a. Search for name, email and URL.
   b. Use common business email address structures and pattern match for these anywhere in the resulting pages:
      i. firstlastname @[login to view URL]
      ii. [login to view URL] @[login to view URL]
      iii. first_lastname @[login to view URL]
      iv. First initial last name @[login to view URL]
   If one of these structures is found, success: move on to #5. If it is unsuccessful, move on to #4.

4. Email search:
   a. As the regular search for email did not work, now we do the reverse:
   b. This produces many searches looking for the right name. If a match is found, the match is graded:
      i. If the match is on the company website, +10
      ii. If the match is in a PDF, +5
      iii. If the match is in a PPT, +5
      iv. If the match is found by both Google and Bing, +5
   The match with the highest grade is the one the script will use.

5. Phone search
   a. Google search for name, phone, and actual email address.
   b. This usually returns some type of result that has “phone:” on the page; from this we will parse the page, pulling back all the digits to the right of the word “phone:”.

Thanks for your time.
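Steps 1 and 2 could be sketched roughly as follows. This is only an illustration under several assumptions: the page titles and page text have already been fetched, "pattern matching" the titles is approximated by counting title segments that recur across results, and the direction and street-extension word lists are short, hypothetical samples. All function names are made up for this sketch.

```python
import re
from collections import Counter

# Illustrative word lists only (assumptions, not exhaustive).
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west"}
STREET_EXTENSIONS = {"st", "street", "ave", "avenue", "blvd", "boulevard",
                     "rd", "road", "dr", "drive", "ln", "lane", "ct", "court"}
# Titles are commonly split with "|", ":", an en dash, or a spaced hyphen.
TITLE_SEPARATORS = re.compile(r"\s*[|:\u2013]\s*|\s+-\s+")


def normalize(text):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9]+", " ", text.lower())).strip()


def consensus_company_name(titles, min_votes=2):
    """Step 1c: return the title segment shared by the most titles, or None."""
    votes = Counter()
    display = {}
    for title in titles:
        keys_in_title = set()
        for segment in TITLE_SEPARATORS.split(title):
            key = normalize(segment)
            if key and key not in keys_in_title:
                keys_in_title.add(key)  # one vote per title
                votes[key] += 1
                display.setdefault(key, segment.strip())
    if not votes:
        return None
    key, count = votes.most_common(1)[0]
    return display[key] if count >= min_votes else None


def simplify_street(street):
    """Step 2: drop 's, directions and street extensions from a street line."""
    text = re.sub(r"['\u2019]s\b", "", street.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    kept = [t for t in tokens
            if t not in DIRECTIONS and t not in STREET_EXTENSIONS]
    return " ".join(kept)


def looks_like_company_site(company, page_title, page_text):
    """Step 2b: the name must appear both in the title tag and on the page."""
    name = normalize(company)
    return (bool(name)
            and name in normalize(page_title)
            and name in normalize(page_text))
```

For example, `simplify_street("455 West Miller's Landing Rd.")` yields `"455 miller landing"`. A simple word list like this would also drop "St" when it means "Saint", so a real script would likely need a more careful rule.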
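The email logic in steps 3 and 4 might look something like the sketch below. It assumes the company domain is already known from step 2 and that each candidate match has been tagged with where it was found. Structure ii is hidden in the posting, so only structures i, iii and iv are generated here. The names `candidate_emails`, `find_email`, `EmailMatch`, `grade` and `best_match` are hypothetical.

```python
import re
from dataclasses import dataclass


def candidate_emails(first, last, domain):
    """Step 3b: build addresses for structures i, iii and iv."""
    first, last = first.strip().lower(), last.strip().lower()
    if not first or not last:
        return []
    local_parts = [first + last, f"{first}_{last}", first[0] + last]
    return [f"{local}@{domain.lower()}" for local in local_parts]


def find_email(first, last, domain, page_text):
    """Return the first candidate address that appears in the page text."""
    text = page_text.lower()
    for candidate in candidate_emails(first, last, domain):
        # Guard both ends so we do not match inside a longer address.
        pattern = r"(?<![\w.])" + re.escape(candidate) + r"(?![\w-])"
        if re.search(pattern, text):
            return candidate
    return None


@dataclass
class EmailMatch:
    email: str
    on_company_site: bool = False
    in_pdf: bool = False
    in_ppt: bool = False
    found_by_both_engines: bool = False


def grade(match):
    """Step 4b: +10 company website, +5 PDF, +5 PPT, +5 Google and Bing."""
    return (10 * match.on_company_site + 5 * match.in_pdf
            + 5 * match.in_ppt + 5 * match.found_by_both_engines)


def best_match(matches):
    """Pick the highest-graded match, or None when there are no matches."""
    return max(matches, key=grade, default=None)
```

The posting does not say how ties between equally graded matches should be broken; `max` here simply keeps the first one it sees.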
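For step 5b, pulling the digits to the right of "phone:" could be approximated as below. This is a minimal sketch that assumes the page has already been reduced to plain text and that a plausible number has between 7 and 15 digits; `extract_phone` is a hypothetical helper name.

```python
import re

# Capture digits and common phone punctuation on the same line as the label.
PHONE_LABEL = re.compile(r"phone:\s*([0-9()+.\- ]{7,25})", re.IGNORECASE)


def extract_phone(page_text):
    """Return the digits following the first "phone:" label, or None."""
    found = PHONE_LABEL.search(page_text)
    if not found:
        return None
    digits = re.sub(r"\D", "", found.group(1))
    return digits if 7 <= len(digits) <= 15 else None
```

Called on `"Phone: (916) 555-0123 Fax: (916) 555-0199"`, this returns `"9165550123"`, because the capture stops at the first character that is not a digit or phone punctuation.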
Project ID: 1680725

About the project

1 proposal
Remote project
Active 12 years ago

1 freelancer is bidding on average $500 USD for this job
I can do it.
$500 USD in 5 days
5.0 (3 reviews)
3.4
I can do it. Regards.
$750 USD in 15 days
0.0 (0 reviews)
0.0

About the client

Sacramento, United States
5.0
1
Payment method verified
Member since Mar 19, 2012

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)