Big Data Profiling – Finding and Understanding Information at Scale

You’ve got Big Data? Sure, everyone’s got it. But how many organizations have indexed it all? The volume, variety, and velocity of data in a typical enterprise have far outpaced the ability to catalog it in an orderly, easy-to-retrieve fashion. Along comes automated big data profiling.

Your data lake needs a survey, along with the data warehouse and all the silos. Many enterprises “see” only about 10 percent of their data. The other 90 percent is hidden, dark. It goes unused because it’s too difficult and time-consuming to comb through the dark data and find the connections.

Naturally, everyone looks to IT. Why don’t they have a master index of all the organization’s data? Probably because most big data profiling is done manually, and that’s slow going. Just as slow is the line of business users stretching out the door, all waiting for data sets.

From Data Chaos to Semantic Metadata Catalog

Whether your data is in a data warehouse, a data lake, on file servers, or a mashup of systems, you need an automated system to make sense out of the chaos. Otherwise, all of your information has only untapped potential, and you are missing opportunities.

Automated data profiling by Attivio does what manual processes can never do: it indexes data at scale. It crawls every one of the organization’s Hadoop clusters, data warehouses, and even the silos. It distinguishes structured database records, such as customer names and phone numbers, from data whose structure is hard to recognize. It gathers existing metadata as it goes and adds its own to form a semantic metadata catalog, a searchable index that translates DBA-speak into business-user-friendly vocabulary.
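To make the idea concrete, here is a minimal Python sketch of what profiling a column and cataloging it could look like. It is not Attivio's implementation. The patterns, the 80 percent threshold, and the names `profile_column`, `build_catalog`, and `search` are all assumptions made up for illustration, and a real profiler would use far richer detection than two regular expressions.

```python
import re

# Hypothetical detection rules: business-friendly label -> value pattern.
PATTERNS = {
    "phone number": re.compile(r"^\+?[\d\s().-]{7,20}$"),
    "email address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def profile_column(values, threshold=0.8):
    """Return the first label whose pattern matches most values, else None."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return None


def build_catalog(tables):
    """Profile every column of every table into a flat list of entries."""
    catalog = []
    for table, columns in tables.items():
        for column, values in columns.items():
            label = profile_column(values) or "unknown"
            catalog.append({"table": table, "column": column, "label": label})
    return catalog


def search(catalog, term):
    """Find catalog entries by business term rather than by column name."""
    term = term.lower()
    return [entry for entry in catalog if term in entry["label"]]


# Made-up sample data with DBA-style column names.
SAMPLE_TABLES = {
    "crm.cust": {
        "ph_no_1": ["617-555-0100", "(617) 555-0101", "+1 617 555 0102"],
        "eml": ["a@example.com", "b@example.org"],
        "notes": ["called twice", "prefers email"],
    }
}

if __name__ == "__main__":
    for entry in search(build_catalog(SAMPLE_TABLES), "phone"):
        print(entry["table"], entry["column"], "->", entry["label"])
```

The point of the sketch is the last step: a business user searches for "phone" and finds the column a DBA named `ph_no_1`, without ever needing to know that name.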

Big Data Profiling – Making Data Accessible

You say tomato, I say to-mah-to. Boston. Bahston. It’s all about understanding the meaning behind the word, or the data. A column name that makes sense to a DBA probably means nothing to a business user building data dashboards. And the business user probably doesn’t know how to write the SQL queries that would find the data to answer a business question.

For Attivio, big data profiling is about creating a semantic understanding of the data, so the data can be used to solve a business problem. Humans help the machine learn. Active learning augments machine learning with human-tagged data to guide the machine through choices that machines alone don’t make well. The machine then uses that newly tagged data to fine-tune its own accuracy.
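One common way to put humans in the loop is uncertainty sampling: the machine asks people to tag only the items it is least sure about. The Python sketch below illustrates that general technique under stated assumptions. It is not a description of Attivio's product, and the function names, the `budget` parameter, and the sample predictions are all hypothetical.

```python
def pick_for_review(predictions, budget=2):
    """Uncertainty sampling: return the columns the model is least sure of."""
    ranked = sorted(predictions, key=lambda p: p["confidence"])
    return [p["column"] for p in ranked[:budget]]


def apply_human_tags(predictions, tags):
    """Overwrite machine guesses with human tags and mark them as trusted."""
    updated = []
    for p in predictions:
        if p["column"] in tags:
            updated.append({
                "column": p["column"],
                "label": tags[p["column"]],
                "confidence": 1.0,
                "source": "human",
            })
        else:
            updated.append({**p, "source": "machine"})
    return updated


# Made-up model output: a guessed label and a confidence per column.
SAMPLE_PREDICTIONS = [
    {"column": "ph_no_1", "label": "phone number", "confidence": 0.97},
    {"column": "c_7", "label": "customer name", "confidence": 0.41},
    {"column": "amt", "label": "order total", "confidence": 0.55},
]

if __name__ == "__main__":
    print("Ask a human about:", pick_for_review(SAMPLE_PREDICTIONS))
    for row in apply_human_tags(SAMPLE_PREDICTIONS, {"c_7": "company name"}):
        print(row)
```

In a full system the human-tagged rows would be fed back as training data so the next round of guesses improves. That retraining step is left out here to keep the sketch short.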

At last, the road-blocked business user who asks “where’s that data?” can actually get an answer as easily as ordering on Amazon: fast, and in plenty of time to walk into any meeting a hero.

