
Published on March 25th, 2014 | by Travis Korte


5 Q’s for Big Data Legend Doug Cutting

The Center for Data Innovation spoke with Doug Cutting, Chief Architect of Palo Alto, California-based software company Cloudera and co-creator of the Hadoop data processing framework. Cutting discussed when businesses should consider an enterprise big data software solution and how he feels about Hadoop’s massive popularity.

This interview has been lightly edited.

Travis Korte: For those who may be unfamiliar, could you give a brief introduction to Cloudera, who uses it, and for what?

Doug Cutting: Cloudera provides software, support and services for enterprise data management. The Apache Hadoop open-source project heralded a new way of storing and processing data. It demonstrated that enterprises could manage their data with greater flexibility at an order of magnitude less cost. The architecture Hadoop delivers is one where, instead of moving data to applications, applications can be brought to the data. Gone are silos, replaced with a general-purpose, centralized storage and compute resource that supports the full range of analysis and processing needs, including SQL, NoSQL, search, streaming, etc.

Cloudera fills the gap between the raw, open-source software and sophisticated enterprise needs. Enterprises need support: someone they can call when things don’t go as expected. Cloudera also provides operational tools to ease installation, configuration, and monitoring of the software. We provide advanced data management tools that secure systems, audit actions, and track data provenance.

We call this combination an enterprise data hub (EDH). It provides an organization with a single place where it can confidently manage all of its data, implementing its extract-transform-load (ETL) engine, data warehouse, online archive, production search system, etc. With an EDH, folks can extract much more value from the data their business generates.

TK: What are some of the primary obstacles for organizations hoping to implement large-scale data processing initiatives?

DC: The EDH is a new platform. Lack of familiarity slows adoption. Cloudera offers training courses to help folks through this transition, but there’s still a learning curve. There are still also some missing features and rough edges here and there, as the EDH catches up with prior technologies whose implementations have had more time to mature. Lastly, there’s just inertia. Institutions are reasonably reluctant to fix what’s not broken. Folks should deploy an EDH as they need it, when existing solutions are hampering their business either through inflexibility or cost, not just because it’s the shiny new thing.

TK: Cloudera helps bring Hadoop to the enterprise. Have you worked, or do you plan to work, with government agencies in the same way?

DC: Cloudera has a substantial government business. We have customers in the defense and intelligence communities as well as in the civilian sector.

TK: Hadoop is now deployed in an impressively broad array of fields. Do you still get surprised by new use cases? Any favorites you’d care to share?

DC: I love finding when products I use are powered by Hadoop. For example, I’m a customer of Netflix, Chevron, and Citibank, who all use Hadoop. Then there are companies that I would never have guessed would come to use Hadoop, like John Deere, Caterpillar & BNSF. Lastly there are those that are just awesome, like Skybox Imaging.

TK: Relatedly, you’ve watched as Hadoop has grown from a humble search utility into a global driver of data processing. Was there a point when you thought “this might be huge,” or did you have that sense all along?

DC: I am as surprised as anyone at how popular Hadoop has become. When Google published its papers [on the Google File System and MapReduce], I realized that a general-purpose scalable computing platform like the one they described would be useful to lots of folks outside of Google and that an open-source implementation would be a great avenue to deliver this. But I was mostly thinking about those few folks building search engines and doing academic web research, not the Fortune 1000. I’d not yet realized the extent to which technology and the data it generates would permeate nearly every industry, that the web was just the vanguard and that the rest of the economy would soon follow. It’s been quite a ride so far!


About the Author

Travis Korte is a research analyst at the Center for Data Innovation specializing in data science applications and open data. He has a background in journalism, computer science and statistics. Prior to joining the Center for Data Innovation, he launched the Science vertical of The Huffington Post and served as its Associate Editor, covering a wide range of science and technology topics. He has worked on data science projects with HuffPost and other organizations. Before this, he graduated with highest honors from the University of California, Berkeley, having studied critical theory and completed coursework in computer science and economics. His research interests are in computational social science and using data to engage with complex social systems. You can follow him on Twitter @traviskorte.
