When evaluating analytics and Business Intelligence solutions, people often ask whether the software supports unstructured data.

I have a standard reply to this:

“In analytics, there is no such thing as unstructured data, just data that structure has not yet been applied to.”

Now, I’m not just being a smart-ass (although that’s probably a part of it). You really can’t do any analysis — in the traditional sense — directly on unstructured data. The analysis you’re looking to do will be on meta-data that’s associated with or derived from the unstructured data in question.

Structured data

The simplest way to describe structured data is that it is any data that would make sense to organize and display in a table or a set of connected tables such as a relational data model. Structured data can be multi-dimensional, hierarchical, or highly complex, making it hard to conceptualize as tabular per se, but nevertheless having clearly defined attributes that describe and define entities in the model.

Anything you would logically put or view in a two-dimensional data table, e.g. in a spreadsheet or a database program, is by definition structured data.

Some refer to data formats such as XML and JSON, or to tagged collections of documents, as semi-structured data, but for the purposes of analytics these are typically well-structured sources that lend themselves nicely to analysis.
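To make that concrete, here is a minimal sketch of how a nested JSON document maps onto a flat table. The sample records and the `flatten` helper are made up for illustration; real tools usually ship their own way of doing this.

```python
import json

# Hypothetical sample data: two order records with a nested "customer" object.
raw = """[
  {"id": 1, "customer": {"name": "Ada", "country": "UK"}, "total": 120.5},
  {"id": 2, "customer": {"name": "Linus", "country": "FI"}, "total": 80.0}
]"""


def flatten(record, prefix=""):
    """Turn a nested dict into a single-level dict with dotted column names."""
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become extra columns, e.g. "customer.name".
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


rows = [flatten(record) for record in json.loads(raw)]
columns = list(rows[0])

print(columns)
for row in rows:
    print([row[c] for c in columns])
```

Each record ends up as one row with the same named columns, which is all an analytics tool needs.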

Unstructured data

Unstructured data is essentially everything that does not fit the description of structured data above.

There are several types of unstructured data that people want to analyze, such as:

So, do you support unstructured data?!

As explained above, what people call “analysis of unstructured data” is in fact analysis of structured meta-data about unstructured data assets. Associated meta-data is already structured and lends itself well to analysis. So is derived meta-data, but it may be harder and more expensive to obtain.

Any analytics software can analyze this data. In most analytics software, there are relatively straightforward ways to read associated meta-data, and to obtain simpler kinds of derived meta-data such as text length or file size.
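As a rough illustration of how little it takes, the sketch below builds a small table from a handful of text documents. The documents, the field names (`author`, `created`, `body`) and the `describe` helper are all invented for the example; the point is only that each free-text document becomes one ordinary row.

```python
from collections import defaultdict

# Hypothetical documents: "author" and "created" are associated meta-data,
# "body" is the unstructured part.
documents = [
    {"author": "ana", "created": "2024-01-05", "body": "Great product, fast delivery."},
    {"author": "ben", "created": "2024-01-07", "body": "Arrived late."},
    {"author": "ana", "created": "2024-02-11", "body": "Would buy again."},
]


def describe(doc):
    """Return one table row: associated meta-data plus simple derived meta-data."""
    body = doc["body"]
    return {
        "author": doc["author"],
        "created": doc["created"],
        "text_length": len(body),                # characters
        "word_count": len(body.split()),         # whitespace-separated words
        "size_bytes": len(body.encode("utf-8")),
    }


table = [describe(doc) for doc in documents]

# A typical analytics question asked of the resulting table: words per author.
words_by_author = defaultdict(int)
for row in table:
    words_by_author[row["author"]] += row["word_count"]

print(dict(words_by_author))
```

Notice that the analysis at the end never touches the text itself, only the columns derived from it.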

But if you’re looking for more sophisticated derived meta-data such as sentiment analysis, language detection, or image recognition, you’ll want to look into dedicated software that can produce this meta-data. In some cases it is useful if this software is nicely integrated with your analytics software, but in most cases the structured meta-data will land in a database or a data lake before it is ever imported into analytics software.
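The flow described here can be sketched in a few lines. The `toy_sentiment` function below is a deliberately crude keyword counter standing in for whatever dedicated software you would really use, and the word lists, table name and sample reviews are assumptions made for the example. What matters is the shape of the pipeline: text goes in, a structured score lands in a database, and the analysis runs on the score.

```python
import sqlite3

# Toy word lists; a real sentiment model would be far more sophisticated.
POSITIVE = {"great", "fast", "love"}
NEGATIVE = {"slow", "broken", "bad"}


def toy_sentiment(text):
    """Count positive words minus negative words (a stand-in, not a real model)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Hypothetical reviews: (review_id, free text).
reviews = [
    (1, "Great product, fast delivery."),
    (2, "Arrived broken and support was slow."),
    (3, "It is a kettle."),
]

# The derived meta-data lands in a database table first.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review_meta (review_id INTEGER PRIMARY KEY, sentiment INTEGER)"
)
conn.executemany(
    "INSERT INTO review_meta VALUES (?, ?)",
    [(review_id, toy_sentiment(text)) for review_id, text in reviews],
)
conn.commit()

# This is the part an analytics tool would do: plain SQL over structured columns.
summary = conn.execute(
    """
    SELECT CASE WHEN sentiment > 0 THEN 'positive'
                WHEN sentiment < 0 THEN 'negative'
                ELSE 'neutral' END AS label,
           COUNT(*) AS reviews
    FROM review_meta
    GROUP BY label
    ORDER BY label
    """
).fetchall()

print(summary)
```

Swapping the toy scorer for a proper model changes nothing downstream, which is why the integration with the analytics software often matters less than people expect.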