Jack Norris, CMO for MapR Technologies, examines how an enterprise data hub provides an alternative to expanding data silos and spurs innovation within larger organisations.
Organisations are increasingly becoming data driven. For commercial enterprises, data is effectively a competitive weapon that underpins innovation and differentiation. Data-driven companies are rapidly gaining market share, and big data is no longer a nice-to-have but a necessity. Hadoop is at the centre of the big data revolution, changing how data is stored, processed and analysed. It represents a new data and compute stack that delivers huge operational advantages, and organisations are using it to reshape how they compete.
Data across the enterprise is growing quickly, both in volume and in variety. Historically, each new application or data source has resulted in the creation of a dedicated information silo. Organisations today are struggling with multiple fast-growing data sources; machine-generated log files, sensor data and social media are just a few examples. Instead of erecting yet more specialised processing and analytic silos to cope with this growth, visionaries are deploying enterprise data hubs.
Structured and unstructured data
An enterprise data hub provides a nexus for data sources. The hub may contain data from CRM systems, websites and housing systems, as well as external data such as social media and a myriad of other unstructured data, including text and video.
One of the initial uses for an enterprise data hub is to offload processing and data storage from more expensive systems. For example, a data hub can act as an offload area for the Extract, Transform and Load (ETL) processes that prepare data for analysis within data warehouses. Instead of loading large volumes of raw data into a data warehouse and performing complex transforms there (an ELT process), organisations can realise significant speed and cost savings by performing the transformations directly on the Hadoop cluster.
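To make this concrete, here is a minimal sketch of what one offloaded transformation step might look like, written as a Hadoop Streaming mapper in Python. The comma-separated web-log layout (timestamp, user ID, URL, status code) is an assumed schema for illustration, not taken from any of the systems mentioned above.

#!/usr/bin/env python3
# Minimal sketch of a Hadoop Streaming mapper that cleans raw,
# comma-separated web-log records into tab-separated rows ready for
# bulk loading into a data warehouse. The four-field layout
# (timestamp, user_id, url, status) is assumed for illustration.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if len(fields) != 4:
        continue  # drop malformed records
    timestamp, user_id, url, status = fields
    if not status.strip().isdigit():
        continue  # drop rows with a non-numeric status code
    # Normalise fields and emit tab-separated output for the loader.
    print("\t".join([timestamp.strip(), user_id.strip().lower(),
                     url.strip(), status.strip()]))

Submitted through the standard Hadoop Streaming jar (with its -input, -output and -mapper options), a copy of this script runs against every block of the raw data in parallel, which is where the speed advantage over transforming inside the warehouse comes from.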
Offloading to Hadoop
Additional savings come from offloading ‘cold’ data from a data warehouse. The typical cost per terabyte of data held in a data warehouse is £10,000 or more. In contrast, data can be offloaded to Hadoop for a few hundred pounds per terabyte, provided the Hadoop platform has the requisite data availability and protection features so that the data can be stored long-term with confidence.
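A back-of-the-envelope calculation puts the saving in perspective; the £300-per-terabyte Hadoop figure below is an assumption within the ‘few hundred pounds’ range quoted above.

# Back-of-the-envelope comparison for offloading 100 TB of 'cold' data.
cold_data_tb = 100                        # terabytes to offload
warehouse_cost = cold_data_tb * 10_000    # £10,000 per TB in the warehouse
hadoop_cost = cold_data_tb * 300          # assumed £300 per TB on Hadoop

print(f"Warehouse: £{warehouse_cost:,}")                # £1,000,000
print(f"Hadoop:    £{hadoop_cost:,}")                   # £30,000
print(f"Saving:    £{warehouse_cost - hadoop_cost:,}")  # £970,000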
An enterprise data hub can also support a range of analytics performed directly on the data. In essence, Hadoop allows you to load all of these different data sets into an expandable cluster of servers and then distribute computational, analysis or indexing workloads across those servers and data sets, as the sketch below illustrates. The result is applications that combine operational processing with analytics to solve pressing business problems.
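As an illustrative sketch of that distribution in action, the following Hadoop Streaming reducer in Python counts records per user. It assumes a hypothetical upstream mapper that emits the user ID as the first, tab-separated key field, and relies on Streaming’s guarantee that mapper output arrives at each reducer sorted by key.

#!/usr/bin/env python3
# Sketch of a Hadoop Streaming reducer that counts records per user.
# Assumes the mapper emitted lines of the form "user_id\t..." and that
# Hadoop has already sorted them by key, so all lines for one user
# arrive contiguously.
import sys

current_user, count = None, 0
for line in sys.stdin:
    user_id = line.split("\t", 1)[0]
    if user_id == current_user:
        count += 1
    else:
        if current_user is not None:
            print(f"{current_user}\t{count}")
        current_user, count = user_id, 1
if current_user is not None:
    print(f"{current_user}\t{count}")

Hadoop schedules many such reduce tasks across the cluster in parallel, each handling a disjoint range of keys, so the same few lines of code scale from gigabytes to petabytes.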
Business intelligence & behavioural insights
For example, the digital marketing intelligence provider ComScore uses Hadoop to process over 1.7 trillion internet and mobile records every month. It uses this data to produce reports that give its clients behavioural insights into their mobile and online customers. The move to Hadoop removed several key bottlenecks, resulting in a tenfold increase in computation speed.
Another example is Cisco, which uses Hadoop as part of its business intelligence processes across large, globally-distributed data sets of structured and unstructured information. The complete infrastructure solution built around Hadoop lets Cisco analyse service sales opportunities in a tenth of the time and at a tenth of the cost, and it generated $40 million in incremental service bookings in the past year.
It’s no accident that organisations such as ComScore and Cisco have invested in Hadoop as the basis for platforms capable of delivering new insights. The tangible cost savings, reduced complexity and ability to scale are all key strengths of the data hub and compelling reasons to examine Hadoop technology.
Jack Norris is the CMO of MapR Technologies.