By Lawrence Dinga, MSc., CISSP
Corporate data volumes grow by an average of 40% per year as businesses across all industries gather and store more data every day. About 80% of this data is unstructured, typically taking the form of word processor, spreadsheet, and PowerPoint files; audio, video, sensor, and log data; or external data such as social media feeds.
By definition, unstructured data is information that either lacks a pre-defined data model or is not organized in a pre-defined manner; in other words, it does not reside in a traditional row-column database. The term big data is closely associated with unstructured data. Big data refers to extremely large datasets that are difficult to analyze with traditional tools. Big data can include both structured and unstructured data, but according to IDC, 90% of big data is unstructured.
There is as much value in unstructured data as in structured data if the organization can make fact-based decisions from it. This data is a valuable source of facts for business, legal, and regulatory needs. Gaining better visibility into it helps the organization manage business risks, protect intellectual property, maintain regulatory compliance, and respond quickly to adverse events such as cyber security breaches. With all the facts at its disposal, the business can quickly and comprehensively answer the fundamental questions of any matter: Is the claim true? What is our risk exposure? Do we want to pursue it? What strategy will we take?
However, most of this unstructured data is stored in hard-to-analyze file types and repositories. Conventional business intelligence (BI) and data warehousing technologies were not designed from the ground up to manage and analyze unstructured information. Suppliers of these technologies have been adding support for unstructured data management to their tool sets, while some IT organizations have built their own platforms for converting unstructured data into structured records, for example through knowledge management systems. But that can be a time-consuming and expensive process.
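To make the conversion idea concrete, here is a minimal sketch of turning unstructured text into structured records. The log format, field names, and sample lines below are hypothetical, chosen only to illustrate the general technique of extracting named fields from free text.

```python
import re

# Hypothetical log format: timestamp, severity level, user, free-text message.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<user>\S+) (?P<message>.+)"
)

def parse_log_line(line):
    """Convert one free-text log line into a dict of named fields."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

records = [
    parse_log_line(line)
    for line in [
        "2024-01-15T09:30:00Z ERROR jsmith Failed login attempt",
        "2024-01-15T09:31:12Z INFO adoe Report exported",
    ]
]
# Each record now has queryable columns: timestamp, level, user, message.
```

Once the fields are named, the records can be loaded into a conventional row-column database and analyzed with standard BI tooling; the hard part in practice is that real unstructured sources rarely follow one tidy pattern.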
For this human-generated, unstructured data to rocket-fuel the business, the organization needs to invest in tools that make it accessible and searchable. Because this big data is constantly and massively growing, the tool should be capable of processing data from the industry’s widest variety of formats, including enterprise email systems, archives, file shares, hard drives, forensic images, mobile devices, and cloud repositories. The tool should have an engine with the capability to index, search, analyze, and extract intelligence from massive volumes of data at the binary level with unmatched speed and forensic rigor.
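The core mechanism behind making a large document collection searchable is an inverted index: a map from each term to the documents that contain it. The toy sketch below shows only that core idea; the document ids and contents are invented, and production engines of the kind described above index far larger volumes, handle many file formats, and work at the binary level.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the ids of all documents containing the given word."""
    return index.get(word.lower(), set())

# Hypothetical mini-corpus standing in for emails, memos, and notes.
docs = {
    "email_001": "quarterly revenue forecast attached",
    "memo_017": "legal hold on revenue documents",
    "note_042": "lunch meeting moved to Friday",
}
index = build_index(docs)
# search(index, "revenue") returns {"email_001", "memo_017"}
```

Because lookups go term-to-documents rather than scanning every file per query, this structure is what lets search scale to massive collections; real engines add tokenization for many languages and file types, ranking, and incremental updates on top of it.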