Merge, Normalize, Remove Duplicates - 3 very large data sets (over 1 million records each)

$30-250 USD

Closed
Posted over 8 years ago

Paid on delivery
I have 3 large data sets in Excel. Each file has over 1 million records. The files are similar, but not exactly alike (different columns). I'll provide how to map the columns for merging the data. I'll also provide how to normalize the data once merged. Lastly, I'll need duplicates removed. I'll need the data packaged in Excel files, CSV files, Access files. Data sets will be provided to the winning bidder. Thanks!
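The workflow described above (map columns, merge, normalize, remove duplicates) can be sketched in Python roughly as follows. This is an illustrative sketch only, not the client's actual rules: it assumes the Excel sheets have first been exported to CSV, and the column maps, the `normalize` rule, and the `email` deduplication key are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical column maps: source header -> unified header (one map per file).
COLUMN_MAPS = [
    {"Email": "email", "Full Name": "name"},
    {"e-mail": "email", "contact": "name"},
]


def normalize(record):
    """Example rule set (an assumption): trim whitespace, lowercase emails."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["email"] = cleaned["email"].lower()
    return cleaned


def merge_sources(sources, column_maps, key_fields=("email",)):
    """Stream rows from each source, rename columns, normalize, drop duplicates."""
    seen = set()
    for handle, mapping in zip(sources, column_maps):
        for row in csv.DictReader(handle):
            renamed = {new: row.get(old) or "" for old, new in mapping.items()}
            record = normalize(renamed)
            key = tuple(record[field] for field in key_fields)
            if key not in seen:  # keep the first occurrence only
                seen.add(key)
                yield record


# Tiny in-memory stand-ins for the real exports.
file_a = io.StringIO("Email,Full Name\nAnn@Example.com ,Ann Lee\nbob@example.com,Bob Ray\n")
file_b = io.StringIO("e-mail,contact\nann@example.com,Ann Lee\ncy@example.com,Cy Poe\n")

merged = list(merge_sources([file_a, file_b], COLUMN_MAPS))

# Write the merged, deduplicated records back out as CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["email", "name"])
writer.writeheader()
writer.writerows(merged)
```

Because rows are streamed, only the set of already-seen keys stays in memory, which should be manageable for a few million records.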
Project ID: 8389701

About this project

25 proposals
Remote project
Active 9 years ago

25 freelancers are bidding an average of $144 USD for this job
Coursera Data Science & R certified. Successfully completed freelance projects: https://www.freelancer.com/jobs/r/programming-Hadoop/ https://www.freelancer.com/jobs/project-7399544/ (Hadoop Pig and Impala queries)
• Applying analytics using the R programming language: RHadoop rmr2, rhdfs, rhbase.
• Applied time series analytics (ARIMA) for an oil client to predict oil production from an oil plant. Data mining on Facebook, LinkedIn, and Twitter accounts.
• 6 years of IT experience in BI and Big Data Hadoop DWH solutions for the banking and oil & gas domains. Data streaming expertise using Apache Kafka, Apache Hadoop, Apache Spark, and Apache Storm; experience with Big Data, Hadoop, and R, and with Apache suites and related stores such as Solr, Hive, HBase, CouchDB, MongoDB, Redis, and Neo4j; Kafka integration on Hadoop and Ubuntu.
• Data mining on LinkedIn, Facebook, and Twitter.
• Expert in ETL tools such as Informatica, SAP BODS, Pentaho, and Talend.
• Excellent analytical and programming skills (Java / Python / C++) with a good understanding at the conceptual level, plus excellent presentation, interpersonal, and communication skills and a strong desire to achieve specified goals.
• Building information views, stored procedures, triggers, materialized views, cursors, partitioning, exception handling, and optimization on databases like Oracle and SAP HANA.
• Expertise in software development applying SDLC practices.
$111 USD in 3 days
4.9 (5 reviews)
4.7
Hi, I have more than 3 years of experience developing big data applications. I have developed many projects using the open source technologies Hadoop, Hive, Pig, and ZooKeeper. Please elaborate on the job so that I can finalize the bid amount and time. I am open to meeting your timeline and budget.
$250 USD in 3 days
5.0 (2 reviews)
2.7
Dear Sir, Greetings from RaajVeer! I understand your job and am ready to start immediately on your terms and within your budget. I have experience working in Excel and normalizing data using functions and macros; I gained this experience working on many email marketing projects and email marketing lists. I would like to invite you for a quick discussion. Sharing your views and feedback on the same would be highly appreciated. Awaiting your response. Thanks, RaajVeer S. Tomar, Quick Search The Web Dominators
$147 USD in 3 days
5.0 (1 review)
2.1
Hello, I have experience with processing large files (tens of gigabytes) and managing large databases (over 1 TB). I also have experience with tabular data formats (CSV, XLS, ODS). I usually parse and process files using Python. Can you give me some samples or the whole files so I can get a better idea of what is required? Best wishes, Ionut
$150 USD in 3 days
5.0 (1 review)
1.1
I am a professional database developer with a certified degree in Computer Science. I have been handling ETL processes for very large databases since the start of my career.
$140 USD in 3 days
5.0 (1 review)
1.1
Hi there! I have done this type of work before! Please feel free to ping me with additional information or with any questions! Thanks! -Steve
$125 USD in 2 days
0.0 (0 reviews)
1.1
I can do that. As I understand it, you will tell me which columns are identical across the files and how you want to normalize them. I can extract the duplicate entries and give you the file of unique entries in any format you want.
$200 USD in 3 days
0.0 (0 reviews)
0.0
I have worked with very big files (CSV, Excel) and have normalized files of this type. I can send the file wherever you want.
$155 USD in 3 days
0.0 (0 reviews)
0.0
Hi, I have more than 14 years of VBA/Excel experience and I am an expert in this kind of work. I completed something similar just a month ago. I have completed more than 250 projects. Please look at the feedback left by my employers to know more about my work. Waiting for your positive response. Thanks.
$100 USD in 3 days
0.0 (0 reviews)
0.0
I am currently employed as a Sr. Quality Analyst for a large mining company in North America. I administer production and cost databases for my company, and I also create and administer my own databases for it. I am well versed in manipulating very large datasets from different databases and spreadsheets.
$111 USD in 4 days
0.0 (0 reviews)
0.0
I have expertise in handling large amounts of data in Excel, using scripts for merging and removing duplicates based on the conditions that you provide. I have handled such data in the past. Ready to start work immediately.
$155 USD in 3 days
0.0 (0 reviews)
0.0
I have relevant experience in data migration, using ETL tools for data integration and manipulation, and dumping to file formats such as CSV, TXT, and Excel. Happy to discuss.
$222 USD in 10 days
0.0 (0 reviews)
0.0
Hi, I have almost 8 years of Oracle PL/SQL development experience. I worked as a developer at Nucleus S/W Exports Ltd in Noida, India, where I worked on projects in the banking domain (for SCB, ABN AMRO, Bank of Bahrain) in which I dealt with exporting and importing huge files using different methods: SQL*Loader, bulk insert, external tables. Using my previous experience, I can certainly work on your project requirements. Please share further details about the project. Regards, Purnima Chopra
$111 USD in 5 days
0.0 (0 reviews)
0.0
I work with a database that gets a billion new records every day, so I do not expect any problems with your data sets. What I plan to do is load the records into an Oracle database, do the normalization and merge there, and then download the results in CSV format, using Microsoft tools to get the other desired formats. I expect the normalization and merge rules to be at most of medium complexity and very clear from your side. If the dataset includes special characters, the price will increase by 20e, and if you spare me the Access format, it will decrease by 10e. Do not hesitate to contact me for further details.
$111 USD in 4 days
0.0 (0 reviews)
0.0
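The database-staging approach in this bid might look roughly like the sketch below. Python's built-in `sqlite3` module stands in for the Oracle database the bidder mentions, and the table names, columns, sample rows, and normalization rule are all hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the staging database
conn.execute("CREATE TABLE staging (email TEXT, name TEXT)")

# Hypothetical rows, already mapped to the unified columns.
rows = [
    ("Ann@Example.com ", "Ann Lee"),
    ("ann@example.com", "Ann Lee"),
    ("bob@example.com", "Bob Ray"),
]
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

# Normalize in SQL (trim + lowercase), then keep one row per normalized pair.
conn.execute(
    """
    CREATE TABLE merged AS
    SELECT DISTINCT lower(trim(email)) AS email, trim(name) AS name
    FROM staging
    """
)

result = conn.execute("SELECT email, name FROM merged ORDER BY email").fetchall()

# Export the deduplicated table as CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["email", "name"])
writer.writerows(result)
```

Doing the normalization and deduplication inside the database keeps the heavy lifting out of application memory, which is presumably why the bidder prefers it for files of this size.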
Let's start. We have a team of professionals and provide high-quality, accurate work. Would you like to discuss your current requirements further?
$100 USD in 3 days
0.0 (0 reviews)
0.0
Hi. I have 2 years of Big Data experience and have worked with different frameworks and technologies in this field. I have also had a lot of data transformation tasks while working on projects. I will do this job for a minimal cost in order to build up my profile rating, because I don't have any reviews for such projects on this site. Also, please clarify the point "I'll need the data packaged in Excel files, CSV files, Access files." Do I need to save the result in all of these formats or just in one of them?
$30 USD in 1 day
0.0 (0 reviews)
0.0
I'm a graduate systems analyst with over 4 years of experience in the area. I work with a focus on quality and customers, always finding the best solutions.
$166 USD in 5 days
0.0 (0 reviews)
0.0
I work with large sheets like this a lot for my regular job. I was just merging a 2 million row sheet and a 1 million row sheet the other day. Removing dups is easy. Based on what I know now, it should be pretty straightforward.
$111 USD in 2 days
0.0 (0 reviews)
0.0
Hello, I have access to a few very powerful servers. I think the best option would be to unzip the Excel files and work at the file level: create instances that each work on part of a file, create normalized sets and merge them, then sort by each column and remove duplicates. But I would have to see these sets to know what I'm working with. I'm not worried about the number of records, but about the number of columns...
$147 USD in 3 days
0.0 (0 reviews)
0.0
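Working at the file level is possible because an `.xlsx` workbook is a ZIP archive of XML parts. A minimal sketch follows, assuming the files are `.xlsx` rather than legacy `.xls`. The helper `iter_sheet_rows` is hypothetical, and the demo uses a tiny stand-in archive that contains only one worksheet part, not a complete workbook.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def iter_sheet_rows(xlsx_file, sheet_path="xl/worksheets/sheet1.xml"):
    """Yield each row as a list of cell strings, one row at a time.

    Only inline-string cells (<is><t>) and plain <v> values are read here;
    real workbooks usually also need xl/sharedStrings.xml resolved.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        with archive.open(sheet_path) as sheet:
            for _event, elem in ET.iterparse(sheet, events=("end",)):
                if elem.tag == NS + "row":
                    yield [
                        cell.findtext(f"{NS}is/{NS}t") or cell.findtext(f"{NS}v") or ""
                        for cell in elem.iter(NS + "c")
                    ]
                    elem.clear()  # drop the parsed cells of this row


# Build a stand-in archive holding a single worksheet part (illustration only).
SHEET_XML = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1" t="inlineStr"><is><t>email</t></is></c><c r="B1"><v>1</v></c></row>'
    '<row r="2"><c r="A2" t="inlineStr"><is><t>ann@example.com</t></is></c><c r="B2"><v>2</v></c></row>'
    "</sheetData>"
    "</worksheet>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
buffer.seek(0)

sheet_rows = list(iter_sheet_rows(buffer))
```

Streaming the worksheet XML this way avoids loading a million-row sheet into a spreadsheet application, and the rows it yields could then be split across workers as the bid suggests.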
I am a software engineer with more than 10 years of experience in the industry. I am new to this site and need my first projects, so this could be a great opportunity for both of us. I am very proficient with databases and with Excel, but Big Data is my passion. For this kind of task I usually use the R programming language, which makes data manipulation easy and offers a variety of outputs.
$120 USD in 3 days
0.0 (0 reviews)
0.0

About the client

Sarasota, United States
5.0
280
Payment method verified
Member since March 22, 2007

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)