
Hadoop Spark/Scala Project

$30-250 USD

Completed
Posted over 6 years ago


Paid on delivery
Homework

A) Using Hadoop HDFS and Spark/Scala programming

Source dataset: [login to view URL]. Download data for 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008.

1) Download and combine all data for the years specified above.
2) Data cleanup: find and remove (filter out) outliers and bad data.
3) Perform statistical analysis on the data: counts, averages, sums, min, max.
4) Using Spark/Scala programming on the entire dataset, what percent (%) is:
a) On-time flights
b) Cancelled flights
c) Delayed flights
d) Top 5 causes of delays
e) Most common causes of flight delays
f) Airlines with the most delays to a destination
g) Airline with the most cancellations
h) Airline with the most on-time flights
i) National averages for on-time flights, delays, and cancellations
j) Perform some visualization in Tableau (send me the output data file; I will do the visualization myself)
k) All of the above code in a separate PDF file

B) Create a 10-15 page document (in Word) covering the following topics:

1) Data source
2) Description of the data and its schema
3) Data pre-processing required (parsing, filtering, etc.)
4) Any bad data issues encountered
5) Describe your Spark algorithm
6) Describe any other ecosystem or additional tools used
7) Describe the output
8) How did you verify that your output is correct?
9) Discuss the performance/scale characteristics
10) What would you have done differently if you did this again?
11) Draw conclusions from this exercise

Please note: this must be your original work. Someone else's code cannot be copied from online and used in this project. Doing so will result in an F grade in this course.

Deliverable timeline:

1) Code in a separate document: deliver by Nov 25
2) Documentation (10-15 pages in Word): deliver by Nov 27
3) Output dataset file: deliver by Nov 30

Deadline: Nov 30 for all of the above.

NB: Use your personal Hadoop cluster, or I can provide access to a cloud-based Hadoop cluster with the data files already downloaded into an HDFS folder.
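For orientation, items 1) through 4) of part A could be sketched roughly as below. This is only an illustrative outline, not a solution tied to the actual dataset: the dataset URL is not visible in the post, so the column names (ArrDelay, Cancelled, UniqueCarrier and the five delay-cause columns), the "NA" missing-value marker, the HDFS paths, and the outlier bounds are all assumptions that would need to be adjusted to the real schema. The 15-minute on-time threshold is a parameter, set here to the cutoff commonly used in US on-time reporting.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object FlightStats {
  // Assumed delay-cause columns (minutes of delay per cause); hypothetical names.
  val causeCols: Seq[String] =
    Seq("CarrierDelay", "WeatherDelay", "NASDelay", "SecurityDelay", "LateAircraftDelay")

  /** Step 2: drop bad rows. The bounds (-120 to 1440 minutes) are illustrative assumptions. */
  def clean(raw: DataFrame): DataFrame =
    raw
      .withColumn("ArrDelay", col("ArrDelay").cast("double"))
      .withColumn("Cancelled", col("Cancelled").cast("int"))
      .filter(col("Cancelled").isin(0, 1))
      // Keep cancelled flights (no arrival delay) and flown flights with a plausible delay.
      // Flown flights with a missing delay evaluate to null here and are dropped.
      .filter(col("Cancelled") === 1 || col("ArrDelay").between(-120, 1440))

  /** Steps 4a-4c: percentage of on-time, delayed, and cancelled flights (one-row result). */
  def shares(flights: DataFrame, onTimeThresholdMin: Int = 15): DataFrame = {
    val cancelled = col("Cancelled") === 1
    val onTime = !cancelled && col("ArrDelay") < onTimeThresholdMin
    val delayed = !cancelled && col("ArrDelay") >= onTimeThresholdMin
    flights.agg(
      (avg(when(onTime, 1).otherwise(0)) * 100).as("pct_on_time"),
      (avg(when(delayed, 1).otherwise(0)) * 100).as("pct_delayed"),
      (avg(when(cancelled, 1).otherwise(0)) * 100).as("pct_cancelled")
    )
  }

  /** Step 4d: delay causes ranked by total delay minutes attributed to each. */
  def topCauses(flights: DataFrame, n: Int = 5): Seq[(String, Double)] = {
    val sums = flights.agg(
      sum(col(causeCols.head).cast("double")).as(causeCols.head),
      causeCols.tail.map(c => sum(col(c).cast("double")).as(c)): _*
    ).first()
    causeCols.zipWithIndex
      .map { case (c, i) => c -> (if (sums.isNullAt(i)) 0.0 else sums.getDouble(i)) }
      .sortBy(-_._2)
      .take(n)
  }

  /** Step 4g: carriers ordered by number of cancelled flights, highest first. */
  def cancellationsByCarrier(flights: DataFrame): DataFrame =
    flights.filter(col("Cancelled") === 1)
      .groupBy("UniqueCarrier").count()
      .orderBy(desc("count"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FlightStats").getOrCreate()
    // Step 1: hypothetical layout with one CSV per year under /data/flights/.
    val paths = (1999 to 2008).map(y => s"hdfs:///data/flights/$y.csv")
    val raw = spark.read
      .option("header", "true")
      .option("nullValue", "NA") // assumed missing-value marker, read as null
      .csv(paths: _*)
    val flights = clean(raw).cache()
    shares(flights).show()
    topCauses(flights).foreach(println)
    cancellationsByCarrier(flights).show(1)
    spark.stop()
  }
}
```

Keeping each question as a small function over a cleaned DataFrame makes it straightforward to check the logic on a handful of hand-built rows before running on the full ten years of data, and the resulting one-row or per-carrier DataFrames can be written out as CSV for the Tableau step.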
Project ID: 15652432

About this project

6 proposals
Remote project
Active 6 years ago

Awarded to:
I am a data scientist with experience in big data technologies such as Spark and Hadoop. I have previously worked with this dataset and can complete all of your tasks before the specified date.
Relevant Skills and Experience: Spark, Hadoop, NoSQL, Scala, Python, R, Tableau
Proposed Milestones: $180 USD - Project Milestone
$135 USD in 3 days
4.9 (11 reviews)
4.8
4.8
6 freelancers are bidding on this job at an average of $168 USD
Hi, I am a Java expert with experience in big data, so this kind of data processing will be done perfectly. I hope to discuss it with you. Regards.
Relevant Skills and Experience: Java
Proposed Milestones: $133 USD - Init
$133 USD in 5 days
5.0 (2 reviews)
2.8
2.8
I have worked with Spark and previously built a recommendation system, so I will be able to complete your task.
$55 USD in 10 days
0.0 (0 reviews)
0.0
0.0

About the client

North Saint Paul, United States
5.0
29
Payment method verified
Member since Feb 16, 2011

Client verification
