A Conversation with Ted Bardasz on the EMC Big Data Solution
Attivio is excited to be a part of EMC’s new Big Data Solution. It’s not generally available yet, so we thought we’d have a chat with Ted Bardasz from EMC to give you a look at what this new platform offering is and how Attivio fits in.
Q: Tell me about your role at EMC.
TB: I am the Senior Director of Product Management in the Converged Platform Division, responsible for our Big Data and Native Hybrid Cloud solutions.
Essentially I am responsible for the overall strategy and roadmap for both solutions. I see my job as defining and building these solutions to address the burgeoning market needs for digital business transformation. I do that by weaving together the significant assets of EMC, along with key partners, to bring compelling solutions for Big Data and Cloud Native Application development to our customers in the market.
Q: Tell me about EMC’s Big Data Solution.
TB: The succinct answer is that it’s a fully engineered, turnkey solution that delivers a Big Data platform and empowers organizations to speed “time to business value” for their Big Data initiatives. But we see it as more than that: it is one of the key ingredients at the core of Digital Transformation, enabling business execution based on Big Data analytics.
The longer answer is that it’s a platform designed to address the full analytics lifecycle. We focus on several key aspects of that lifecycle and work to bring together a well-orchestrated workflow, while addressing some of the key bottlenecks in that workflow.
It’s an open architecture: we deliver the solution with a set of integrated components, but the architecture is open to new or alternative software components, so it can respond to a Big Data market that is still nascent.
Customers can choose which emerging analytics capabilities to adopt, based on their particular Big Data needs.
Q: What market forces are driving the Big Data Solution initiative?
TB: A recent Forbes.com article stated that “Data and the ability to turn data into business value will become increasingly important in any sector.”
Companies recognize the tremendous amount of data being generated that they have access to and the latent value in that data. They also recognize the complexity in putting together a suitable environment that can extract that latent value and turn it into business value, including the challenges related to finding the resources and managing that environment over time.
The key market force driving EMC’s Big Data Solution is empowering companies to easily recognize the business value from their Big Data assets. It’s essentially a buy versus build proposition – where the end result is a powerful platform supported by a single support line. We’ve been working on this solution for over two years now.
Q: Can you tell us about some of the early adopter successes you see with the Big Data Solution?
TB: I’ll give you a few that support three key reasons for the solution:
First, there’s Pechanga, a casino in Temecula, California, that was looking to understand their customers at a deeper level - from the services and products they bought to how they play. Pechanga uses a Player Card which tracks a lot of what the customer does at the casino and hotel. They wanted to leverage all this information to engage their customers at a deeper level through additional service offers, with a large focus on providing such a personalized experience that they will want to return to Pechanga, vs. the alternatives. Pechanga realized significant benefit from EMC’s Big Data Solution in the form of much deeper customer interaction and getting return business. This is a perfect example of how our Big Data Solution supports improving customer experiences.
The second example is Inovalon, a data service provider to the healthcare and insurance market. Their whole business model is based on their ability to deliver data to that market with additional value-add in terms of the insights they can generate. Big Data is Inovalon’s business. They wanted to unlock all the power of their data scientists and enable them to focus on delivering the business value Inovalon can bring to the market without having to worry about infrastructure or the integration of the required analytics tools or workflows. They also wanted to remove some of the key bottlenecks in the analytics workflow. In this example, it’s about delivering added value on top of the data.
Finally, there’s Sisters Healthcare. Sisters had initial success by running a couple of applications in the cloud. But they wanted to bring these apps in-house and build an infrastructure they could support themselves. This change would save them US$2 million a year (think about getting rid of provider charges and high costs). In this case, it is about delivering an environment that is easy to maintain and enables a significant reduction in operational expense.
Q: What are the main components of the EMC Big Data Solution architecture?
TB: There are three main components in the EMC Big Data Solution portfolio. The first is the underlying infrastructure. Here you have the Data Lake, built on either Isilon or ECS, which provides a scalable repository. In the next layer, EMC’s Big Data Systems, in the form of our Converged Infrastructure (Block, Racks, and Appliances), come into play. These CI offerings enable us to apply the compute and networking capabilities on top of the Data Lake to host the analytics. Once we had the infrastructure part of the Big Data Solution, we focused on a few key engineered components: the Data Curator, the Platform Manager, and the Data Governor.
So let me tell you about each of these.
Data Curator - Data scientists spend 50-80% of their time finding, blending, wrangling, and cleansing data before they can run their analytics. The Curator is designed to alleviate those bottlenecks. It allows the Big Data Solution user to go out and attach and assess data sources (i.e., using Attivio’s semantic data catalog capabilities) so they can sample those sources, figure out how they fit into the analytics model they are building, and decide whether they want to bring that data into the system. The ability to rapidly attach data sources, assess them, see how they complete the model, and bring them in is a huge advantage of the Big Data Solution. We coupled Attivio’s ability to find and assess data sources with Zaloni, the other key partner we work with in the Curator. Zaloni focuses on providing a workflow engine (Bedrock) for ingesting, blending, and transforming that data into the Data Lake. This is done once the data science professionals determine, through Attivio’s DSD, that it’s a data source they want to bring in. There is a very nice coordinated workflow between Attivio and Zaloni that allows customers to land that data in the Data Lake from virtually any internal or external content store, whether on premises or in the Cloud.
Data Governor - The market is also looking for capabilities in security, lineage, and quality. In the Big Data Solution, we secure the data in the Data Lake by leveraging our partner Blue Talon for deep security. Blue Talon provides role- and attribute-based security, right down to the cell level, with complete redaction, obfuscation, and tokenization of sensitive data based on attributes and policies. The lineage portion of the platform leverages Zaloni. As data is ingested, its lineage is tracked and its metadata is captured. This way, the data scientist working with the data can tell where the data came from and what was done to it as it came into the Data Lake. That is very important for understanding whether they can trust the data. The third part, quality, is about the completeness and integrity of the data.
Platform Manager - The Platform Manager is the framework (that presents itself as a portal) that supports the different analytics tools and data sets that data scientists deploy into their workspace. Think about it as the “shopping catalog” that the data science professional uses to populate the secure sandbox area in which they will do their work.
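To make the idea of role- and attribute-based, cell-level protection concrete, here is a minimal Python sketch. It is not Blue Talon’s API or EMC’s implementation; the POLICY table, the role names, and the tokenize and apply_policy helpers are all hypothetical, chosen only to illustrate how a policy might decide, cell by cell, whether a value is shown, redacted, or tokenized.

```python
import hashlib

# Hypothetical policy: column -> (masking action, roles allowed to see the raw value).
POLICY = {
    "ssn": ("tokenize", {"compliance"}),
    "diagnosis": ("redact", {"clinician", "compliance"}),
}

def tokenize(value: str) -> str:
    """Replace a value with a stable token (illustrative only; a real
    system would use a keyed or vault-backed tokenizer)."""
    return "tok_" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def apply_policy(row: dict, role: str) -> dict:
    """Return a copy of row with each cell masked according to POLICY."""
    masked = {}
    for column, value in row.items():
        action, allowed_roles = POLICY.get(column, ("allow", set()))
        if action == "allow" or role in allowed_roles:
            masked[column] = value            # no restriction for this role
        elif action == "tokenize":
            masked[column] = tokenize(str(value))
        else:                                 # "redact"
            masked[column] = "[REDACTED]"
    return masked
```

In this sketch, calling apply_policy(row, "analyst") on a row with ssn and diagnosis columns would tokenize the first and redact the second, while a "compliance" role would see the row unchanged. Because the token is derived deterministically from the value, the same input always maps to the same token, so tokenized columns can still be joined or counted.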
Q: Are enterprises struggling to bring together unstructured content and structured data?
TB: This is certainly an issue for many organizations. The source and structure (or lack thereof) of the many kinds of data is a challenge not to be taken lightly. Again referring to the Forbes.com article, “So big data isn’t just about the volume of data; it is equally about the variety of data to which we now have access.” From videos to text to machine data to audio files, it is about variety, not just quantity.
There is a tremendous, largely unaddressed opportunity as companies look to bring together their content. One example is Financial Services. Trading activity is structured data, but there is also a wealth of unstructured information that, when combined with it, helps support better decisions and helps detect fraud or money laundering activities.
In healthcare, you have clinical data results (structured) and unstructured information such as doctor evaluations and other patient record information. The two need to be correlated to provide the best view of the patient and their care.
Many organizations see the value of both types of information, but don’t see how to close the gap between the two.
Leveraging Attivio to point at the unstructured content enables you to index it, turning it into a structured content repository you can then blend easily with your operational structured content. This enables you to get insights across the two dataset types.
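As a rough illustration of what indexing unstructured content and blending it with structured data can look like, here is a small Python sketch based on the healthcare example above. It does not use Attivio’s product or API; the build_index and blend functions, the patient_id field, and the sample data shapes are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

def build_index(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of record IDs whose note contains it."""
    index = defaultdict(set)
    for record_id, text in notes.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(record_id)
    return index

def blend(structured: list[dict], index: dict[str, set[str]], term: str) -> list[dict]:
    """Return the structured rows whose patient_id has a note mentioning term."""
    matching_ids = index.get(term.lower(), set())
    return [row for row in structured if row["patient_id"] in matching_ids]
```

For example, with notes keyed by patient ID and a list of structured lab results, blend(labs, build_index(notes), "pain") would return only the lab rows for patients whose free-text notes mention "pain". A production system would add tokenization, entity extraction, and relevance ranking well beyond this simple term lookup.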
Q: One of the key selling points of the EMC Big Data Solution is its self-service capabilities. How important is self-service as it relates to finding the right data and getting actionable insights?
TB: It’s very complex to bring together the applications and the infrastructure to stand up a Big Data environment. One thing that frustrates a data scientist is that when they have an insight or a hypothesis they want to pursue, there is a delay in having their environment stood up. It’s a lot of filing of tickets with IT and interacting with IT to get that environment set up.
One result is that there is a lot of shadow IT. The data scientist will go out to the cloud and stand up their Big Data environment. The EMC Big Data Solution’s self-service approach will make IT the hero. It gives them a policy-driven environment where they can offer self-service to data scientists, who can populate the data they need and test their ideas.
At the same time, IT also becomes the hero to the business because this self-service capability happens in a policy-driven environment. Quotas, security, and governance are applied to ensure the data scientist isn’t looking at data they shouldn’t, or pushing data to the cloud that’s proprietary to the business.
Summary
Big thanks to Ted for giving us an inside look at EMC’s Big Data Solution. You’ll be hearing much more about it in the coming months. For now, check out Attivio’s semantic data catalog and see why EMC wanted to have it as part of its platform.