Process 100 million lines of domains and output the UNIQUE lines.

Closed, posted Dec 8, 2014, paid on delivery

We have a list of about 100,000,000 domains in total; I imagine about 50% are duplicates. We need to process the entire list and remove the duplicates.

The final output should be a list of UNIQUE domains.
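For illustration, here is a minimal sketch of one way to produce such a list without holding all 100 million lines in memory at once: hash each domain into one of several temporary bucket files, then deduplicate each bucket with an in-memory set. The function name `dedupe_lines`, the bucket count, and the choice to lowercase and strip each line are assumptions made for this example, not requirements stated in the posting. Output order is by bucket, not input order.

```python
import os
import tempfile
import zlib


def dedupe_lines(src_path, dst_path, buckets=64):
    """Write the unique lines of src_path to dst_path via hash-partitioned bucket files."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(buckets)]
        files = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # Pass 1: route every domain to a bucket chosen by its hash, so all
            # copies of the same domain end up in the same bucket file.
            with open(src_path, encoding="utf-8", errors="replace") as src:
                for line in src:
                    # Assumption: domains compare case-insensitively.
                    domain = line.strip().lower()
                    if domain:
                        idx = zlib.crc32(domain.encode("utf-8")) % buckets
                        files[idx].write(domain + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: each bucket holds roughly 1/buckets of the data, so a set of
        # its lines is much smaller than a set over the whole input.
        with open(dst_path, "w", encoding="utf-8") as dst:
            for p in paths:
                seen = set()
                with open(p, encoding="utf-8") as bucket:
                    for line in bucket:
                        if line not in seen:
                            seen.add(line)
                            dst.write(line)
```

A call such as `dedupe_lines("domains.txt", "unique.txt")` (both file names hypothetical) would write one unique domain per line; raising `buckets` lowers peak memory at the cost of more temporary files.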

I have been processing the list in EmEditor for the past 48 hours on an i7 PC with 16GB of RAM, and it's still nowhere near finished.

We need some massive power to process this.

Please do not bid unless you have worked with data of this size before.

Thanks!

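Data Processing, Excel, Microsoft SQL Server, MySQL, PHP

Project ID: #6841247

About the project

43 proposals, remote project, active Jan 14, 2015

43 freelancers are bidding on this job, at an average of $145/hour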

sureshdevi

I can remove duplicates from the list of 100 million lines of domains and output the unique lines in SQL or text format. I will complete this work in 2 days. Looking forward to your reply so I can start this work immediately.

$79 USD in 2 days
(1203 reviews)
8.1
OutsourceMan

Hello, this is Vaishali from "Hire WordPress Experts", and I am here to help you with "Process 100 million lines of domains and output the UNIQUE lines". We have gone through the information you provided, and I ...

$99 USD in 25 days
(378 reviews)
8.2
gomcodoctor

I have experience working with 20 million rows and have a dedicated server located in Texas. I can process your data within 24 hours, and I have many alternative ways to process the data. I will not demand a penny if I failed to ...

$50 USD in 1 day
(247 reviews)
7.3
CodeAmbassador

Hi, I'd like to help you with this. I have had to accomplish similar goals before and am ready to provide you with a custom, repeatable process in the form of a bespoke application that will run utilising all the power y ...

$389 USD in 3 days
(88 reviews)
7.5
dusmanija

Hello Sir, I can create this list for you, but it will need a different approach than the one you were using. Check my profile to see that people who worked with me are extremely satisfied with the results and speed. I have 100% comp ...

$99 USD in 3 days
(152 reviews)
7.2
sobujprantor

Hi, I am ready to work with you. If you think you need a good worker, you can hire me. Thanks.

$277 USD in 3 days
(223 reviews)
7.0
wildlily980

Hello, I can do it. Please provide me the domains list. Best regards, Bill Lee

$100 USD in 3 days
(61 reviews)
6.9
greggfletcher

Hello, removing the duplicates won't take more than 48 hours. Regards.

$200 USD in 3 days
(82 reviews)
7.0
svteam

Hi, I'm an expert in databases and data processing with very good feedback and completion rate. I'm very interested in your project and willing to do it for $100. I have processed big databases of up to 24 million re ...

$100 USD in 3 days
(62 reviews)
6.2
wilyjose

A proposal has not yet been provided

$200 USD in 3 days
(47 reviews)
5.8
binaryromel

Hi there, I'm an expert in Database Management. I can do this. Please PM me to discuss further. Thank you, FARZANA PINKY.

$222 USD in 2 days
(117 reviews)
6.2
paulcristia

What is the file format of the data? I propose a custom 3-step approach: 1. filter the data into separate files for each letter; 2. sort each file alphabetically; 3. process each file and keep unique records. Filtering is f ...

$149 USD in 5 days
(215 reviews)
5.7
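The three-step approach proposed in the bid above (split by first letter, sort each file, keep unique records) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the bidder's actual code: the function name `dedupe_by_first_char` is hypothetical, lines are lowercased and stripped before comparison, each per-letter file is assumed small enough to sort in memory, and the number of distinct first characters is assumed small enough to keep one file handle open per character.

```python
import os
import tempfile


def dedupe_by_first_char(src_path, dst_path):
    """Split by first character, sort each part, then keep one copy of each line."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = {}  # first character -> open file handle for that part
        try:
            # Step 1: filter the data into a separate file per first character.
            with open(src_path, encoding="utf-8", errors="replace") as src:
                for line in src:
                    domain = line.strip().lower()  # assumption: case-insensitive
                    if not domain:
                        continue
                    key = domain[0]
                    if key not in parts:
                        path = os.path.join(tmp, f"part_{ord(key)}.txt")
                        parts[key] = open(path, "w", encoding="utf-8")
                    parts[key].write(domain + "\n")
        finally:
            for handle in parts.values():
                handle.close()

        with open(dst_path, "w", encoding="utf-8") as dst:
            for key in sorted(parts):
                # Step 2: sort one part at a time (assumed to fit in memory).
                with open(parts[key].name, encoding="utf-8") as part:
                    lines = sorted(part)
                # Step 3: duplicates are now adjacent, so comparing each line
                # with the previous one is enough to keep unique records.
                previous = None
                for line in lines:
                    if line != previous:
                        dst.write(line)
                        previous = line
```

Because the parts are written out in sorted key order, the resulting file is also sorted overall, which a hash-based approach would not guarantee.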
f1x3r

Hi, I am an experienced systems administrator. I worked for a company processing large amounts of similar data (traffic logs for a telco), where I performed analysis and reporting on that data in both databases and flat files. W ...

$80 USD in 3 days
(15 reviews)
4.4
razvand70

Hi, expert web/data scraper here with over 17 years of experience in programming and RDBMS; please see my reviews. I use Perl for this kind of job and am able to finish fast.

$244 USD in 3 days
(5 reviews)
3.5
gperete

Software developer since 2000, specializing in Visual Basic, SQL Server, Crystal Reports, and VBA for MS Office. Please send a sample of your list of domains.

$100 USD in 3 days
(4 reviews)
3.3
smb310570

Just tell me where these "100 million lines of domains" reside: in a text file or in some other format (please specify). I will make a program / macro that will read this file and remove the duplica ...

$250 USD in 3 days
(13 reviews)
3.5
curver

Hello, I can help you with your project. If it's OK with you, could you share the file containing the 100M domains? I can give you the exact number of unique domains. Thank you.

$222 USD in 2 days
(2 reviews)
3.4
honghas

Hi! I am interested in your project. I am a software developer and I am working in data + web analysis so I strongly believe that my abilities fit to your requirements. I look forward to working with you!

$150 USD in 3 days
(1 review)
3.1
cheungkc

Hi, I worked on a US health care dataset of 20 million rows, but not yet on 100 million rows. If you could let me work on the file now, I can confirm whether my solution works prior to award. If not successful, no charge pleas ...

$166 USD in 3 days
(6 reviews)
3.1
Ogeeon

Greetings! Software developer with expert knowledge of data processing, at your service. Let me code a script for you that will go through your input and output a list of unique domains. I'm ready to start working no ...

$30 USD in 1 day
(2 reviews)
3.0