Tuesday, October 2 • 3:30pm - 4:00pm
What Do Real-Life Hadoop Workloads Look Like?


 Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. These new workloads have not yet been described. We fill the gap with an empirical analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. These workloads display diverse behaviors which invalidate prior assumptions about MapReduce such as uniform data access, regular diurnal patterns, and prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like data processing benchmark for MapReduce.

 



Yanpei Chen

Software Engineer, Cloudera
Yanpei Chen works at Cloudera, a leading vendor of enterprise Hadoop. He leads the company's efforts in customer workload management, performance optimization, and large-scale testing. Yanpei holds a doctorate from the University of California, Berkeley, where he worked on workload-driven design and evaluation of big data systems and collaborated with several companies in industry.


Tuesday October 2, 2012 3:30pm - 4:00pm
Conference Theater Registration Floor - Grand Hyatt Hotel