Database cleaning/merging/deduplication & fuzzy matching

Completed · Posted Jan 18, 2014 · Paid on delivery

I am building a database (in Excel) that contains records from many different sources: 77k rows and 50+ columns in total.

I would like to condense it to one row per unique address while keeping all the other unique data cells from the merged rows.

This will require some type of fuzzy matching, as the duplicate addresses are not all 100% exact, e.g.:

300 Water Street suite #3 | Portland | Oregon

300 Water Street | Portland | Oregon

300 Water St | Portland | Oregon
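A minimal Python sketch of how the three example rows could be reduced to one match key before any fuzzy comparison. The suffix map and the unit-designator pattern are assumptions for illustration (a real list would be far longer), and `address_key` is a hypothetical helper, not part of any tool mentioned in this post.

```python
import re

# Hypothetical, deliberately small map of street-suffix spellings to one form.
SUFFIXES = {
    "street": "st", "st": "st",
    "avenue": "ave", "ave": "ave",
    "road": "rd", "rd": "rd",
    "boulevard": "blvd", "blvd": "blvd",
}

# Unit designators such as "suite #3", "ste 3", "apt 4B" or a bare "# 12".
UNIT_RE = re.compile(r"\b(?:suite|ste|apt|unit)\b\s*#?\s*\w+|#\s*\w+", re.IGNORECASE)


def address_key(street: str, city: str, state: str) -> str:
    """Build a normalized match key from the street, city and state columns."""
    street = UNIT_RE.sub(" ", street.lower())   # drop suite/unit numbers
    street = re.sub(r"[^\w\s]", " ", street)    # drop remaining punctuation
    words = [SUFFIXES.get(w, w) for w in street.split()]  # unify suffixes
    return "|".join([" ".join(words), city.strip().lower(), state.strip().lower()])
```

With this key, all three Portland rows above normalize to `300 water st|portland|oregon`. Note that "St" can also mean "Saint", so a blanket suffix map needs to be checked against the real data.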

The above examples would all be the same record. Each row may have different corresponding data in the columns that needs to be condensed into one row.
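Once each row has a match key, condensing could look like the sketch below, assuming the sheet is exported to rows of column-to-value dictionaries (for example via Python's `csv.DictReader`). `condense` and its `key_func` argument are hypothetical names; distinct values in the same column are joined with "; " so that no unique cell is lost.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def condense(rows: Iterable[Dict[str, str]],
             key_func: Callable[[Dict[str, str]], Hashable]) -> List[Dict[str, str]]:
    """Merge rows that share a match key into one row per key.

    For every column, the distinct non-empty values of the merged rows are
    kept in first-seen order and joined with "; ".
    """
    merged: Dict[Hashable, Dict[str, List[str]]] = {}
    for row in rows:
        bucket = merged.setdefault(key_func(row), {})
        for column, value in row.items():
            if value is None or value.strip() == "":
                continue  # empty cells add no information
            values = bucket.setdefault(column, [])
            if value not in values:
                values.append(value)
    return [
        {column: "; ".join(values) for column, values in bucket.items()}
        for bucket in merged.values()
    ]
```

A column that is empty in every merged row is simply absent from the output dictionary; writing the result back with `csv.DictWriter` and its default `restval=""` fills those cells with blanks.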

I have normalized the data as much as I can using my limited Excel skills and PowerGREP. I have made sure the states, cities, and abbreviations are all consistent for easier duplicate recognition.

I estimate there are probably about 20k actual unique addresses, which is what this should be condensed to, while keeping all the unique cells and producing a very rich data set at the end.
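For near-duplicates that normalization misses (typos, stray punctuation), one option is pairwise string similarity. This sketch uses Python's standard-library `difflib.SequenceMatcher` and only compares streets within the same city and state, since comparing all 77k rows against each other would mean roughly three billion pairs. The 0.9 threshold, the greedy "compare against each cluster's first member" strategy, and the `fuzzy_clusters` name are assumptions to tune, not recommendations.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def fuzzy_clusters(addresses: List[Tuple[str, str, str]],
                   threshold: float = 0.9) -> Dict[int, int]:
    """Map each (street, city, state) index to a cluster id.

    Streets are compared only within the same city and state ("blocking").
    Each street joins the first existing cluster whose first member scores
    at least `threshold`; otherwise it starts a new cluster.
    """
    blocks: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for idx, (_, city, state) in enumerate(addresses):
        blocks[(city.strip().lower(), state.strip().lower())].append(idx)

    cluster_of: Dict[int, int] = {}
    for members in blocks.values():
        representatives: List[Tuple[int, str]] = []  # (index, lowered street)
        for idx in members:
            street = addresses[idx][0].strip().lower()
            for rep_idx, rep_street in representatives:
                if SequenceMatcher(None, street, rep_street).ratio() >= threshold:
                    cluster_of[idx] = rep_idx  # cluster id = first member's index
                    break
            else:  # no similar representative: start a new cluster
                cluster_of[idx] = idx
                representatives.append((idx, street))
    return cluster_of
```

`SequenceMatcher` ratios are character-based, so "300 Water St" and "310 Water St" also score above 0.9; requiring house numbers to match exactly before comparing the rest of the street is a sensible extra guard.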

I'm not sure if Excel can handle this type of project; perhaps you have a better solution using SQL, Access VBA, or some other database manipulation/deduplication tool.
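If the data moves into a database, the condensing step maps naturally onto `GROUP BY` with a concatenating aggregate. This sketch uses Python's built-in `sqlite3` module so it runs without a server; the `records` table and its three columns are hypothetical stand-ins for the real 50+ columns, and `match_key` is assumed to be precomputed. In MySQL the equivalent aggregate is `GROUP_CONCAT(DISTINCT col SEPARATOR '; ')`.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str]]


def condense_with_sql(rows: Iterable[Row]) -> List[Row]:
    """Condense (match_key, phone, email) rows to one row per match_key."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE records (match_key TEXT, phone TEXT, email TEXT)")
        con.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
        # NULLIF turns empty cells into NULL, which group_concat skips.
        # SQLite only accepts DISTINCT on single-argument aggregate calls,
        # so the default "," separator is used here.
        query = """
            SELECT match_key,
                   group_concat(DISTINCT NULLIF(phone, '')),
                   group_concat(DISTINCT NULLIF(email, ''))
            FROM records
            GROUP BY match_key
            ORDER BY match_key
        """
        return con.execute(query).fetchall()
    finally:
        con.close()
```

The order of values inside each concatenated cell is not guaranteed by SQLite, which is usually acceptable for a merged "all known values" column.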

Let me know via PM how you would best tackle this.

Big Data Sales, Database Administration, Excel, Microsoft Access, MySQL

Project ID: #5335522

About the project

18 proposals · Remote project · Active Feb 8, 2014

Awarded to:

MDavidCrompton

I have been developing applications in both Access and Excel for 20 years with extensive use of VBA. I have developed several applications for freelance clients; please see the feedback and examples in my portfolio. I am UK b…

$110 USD in 3 days
(39 reviews)
5.6

18 freelancers are bidding on this job, with an average bid of $139/hour

paris2785

VB, VBA and databases expert for over a decade. Master's in Information Systems. I have delivered similar projects in the past. Please check https://www.freelancer.gr/projects/Data-Processing-Excel/data-translation-e …

$78 USD in 5 days
(152 reviews)
6.8
tzo

Hello, I can help you with this. Common tools are not really the best fit here, so custom scripts will need to be written specifically for this project.

$147 USD in 3 days
(146 reviews)
6.3
truongngocthanh

Dear Sir, I can import all your data into a MySQL database, process it, and filter out the duplicate data. I can do it for you right now. Best regards, Thanh.

$111 USD in 2 days
(57 reviews)
6.3
diamond247

Hello Sir, we are a well-built setup with excellently skilled operators who have a lot of experience in this segment/skill, and we have completed more than 200 similar jobs. I have gone through your project description; it's really a…

$250 USD in 5 days
(150 reviews)
6.6
srinichal

I would like to discuss the project in more detail and deliver tools suited to your needs.

$252 USD in 3 days
(38 reviews)
6.3
vikas0903

Hi, my approach to your project: I believe it can be done in Excel. We may have to run the data-matching code multiple times with slight variations in keywords. My background: I have worked as a pricing analy…

$98 USD in 2 days
(38 reviews)
5.7
teeares

I have done this type of work before, though maybe not at this scale. I have easily handled up to 16,000 records and 15 columns. If Excel can handle it, I surely can. I understand that duplicates must be recognised only by the…

$100 USD in 10 days
(50 reviews)
5.0
Venicebrooks

Greetings, I have taken note of your request to clean a database in your possession. Since I am a software developer, I can write a module to achieve your objective. I have been doing this kind of…

$100 USD in 3 days
(8 reviews)
5.0
happycharle

A proposal has not yet been provided

$111 USD in 3 days
(10 reviews)
3.5
vovo4ka

Hello, I'm very interested in this work, since I have good experience and knowledge of working with big databases. There are a few possible ways to merge and sort the data: in the example you showed, it might be possible to use…

$66 USD in 3 days
(3 reviews)
3.2
TechJSolutions

Dear Palmweb, let me help you. I will use SQL, which makes querying easier. Could you send the whole data set? I will send you a sample of the result. Thanks

$56 USD in 2 days
(15 reviews)
3.3
EfficientIrish

A proposal has not yet been provided

$277 USD in 1 day
(0 reviews)
0.0
maranaxsl

Hi there, thank you for posting this project. We belong to the Microsoft Partner Network and have over 15 years of experience in MS SQL Server, MS Access, and Crystal Reports. We provide 30 days of free support on al…

$222 USD in 3 days
(0 reviews)
1.5
xiddw

Hi, I've previously worked on a similar project matching similar strings and condensing them, so I have experience in this particular task. I also have over two years of experience using R. I have a strong backgrou…

$100 USD in 4 days
(0 reviews)
0.0
hi4ppl

A proposal has not yet been provided

$155 USD in 3 days
(0 reviews)
0.0
qianshen

Hi, last month I did a 3-million-patent case on Hadoop with Pig, collecting all citations for each patent over 10 years. I think your case is somewhat similar. I am quite interested in using an Excel file as a data sourc…

$111 USD in 3 days
(0 reviews)
0.0