Digital data grows swiftly – at a rate of about 60 percent per year for most businesses. It doubles overall every 18 months. It grows so fast that every now and then we have to invent new words just to describe how much of it there is. So far, the sequence of words (in multiples of 1,000 bytes) is kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte and yottabyte. Today there are roughly 11 zettabytes of data in the world. At the current rate, we’ll reach a yottabyte of data in 10 years.
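As a quick sanity check, these figures hang together: a few lines of Python, using only the numbers quoted above, confirm that 60 percent annual growth doubles the total in about 18 months and takes roughly ten years to turn 11 zettabytes into a yottabyte.

```python
# Sanity-checking the growth figures quoted above, using only numbers from
# the text: 60% annual growth, ~11 zettabytes today, 1,000 ZB per yottabyte.
import math

GROWTH_RATE = 0.60        # 60 percent per year
ZB_TODAY = 11             # rough current total, in zettabytes
ZB_PER_YOTTABYTE = 1_000  # 1 yottabyte = 1,000 zettabytes

# Doubling time at 60% annual growth: ln(2) / ln(1.6) is about 1.5 years,
# i.e. roughly the 18 months quoted.
doubling_years = math.log(2) / math.log(1 + GROWTH_RATE)

# Years until the total grows from 11 ZB to one yottabyte.
years_to_yottabyte = math.log(ZB_PER_YOTTABYTE / ZB_TODAY) / math.log(1 + GROWTH_RATE)

print(f"Doubling time: {doubling_years:.1f} years")       # ~1.5 years
print(f"Years to a yottabyte: {years_to_yottabyte:.1f}")  # ~9.6 years
```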
If, as IT strategist Warren Cammack famously said, “Data is the new oil,” then perhaps this is a good thing because:
• We will not run out of it. Data just keeps multiplying.
• It’s cheap to store. These days, you can hoard it on a disk for 2¢ per gigabyte.
• It’s valuable. The more you have, the more value you likely can extract from it.
That final point is the raison d’être for data. It’s valuable when we process it, and we process it all the time. We watch TV, we make phone calls, we surf the Web, we buy and sell things, we travel . . . just about everything we do involves processing data somehow, somewhere. But when Cammack proclaimed data to be the new oil, he wasn’t referring to any of that. He was talking about the value of mining massive volumes of data – so-called Big Data.
Connecting Math and Data
Mathematics is no stranger to the world of data. It has been applied in computing to a whole series of problems for decades. However, math is taking center stage in the Big Data revolution. There’s gold in those big mountains of data, and that gold can only be extracted with statistical techniques, principally machine-learning algorithms. The latest heroes of Big Data, though, are the data scientists – who are, among other things, mathematicians.
This is not to suggest that Big Data technologies are only about data analytics. These systems have also carved out roles in data archiving, image recognition, extract/transform/load (ETL) operations, content management for media, data cleansing, and many other areas. Nevertheless, Big Data analytics is the jewel in the crown, whether you are processing internal log files, unstructured data, or external data collected from the multitude of sources out there on the Web.
Where the Action Is: Analytics
Analytics extracts and distills the value in data, and is fast becoming an imperative for businesses of all sizes. According to a BMO Capital Markets report, the market is spending more than $50 billion on Big Data and advanced analytics. The specific areas of application are many and varied:
• analyzing customer behavior and preferences
• lowering the cost of new customer acquisition
• reducing churn
• improving upselling, cross-selling, and customer targeting
• measuring campaigns more precisely
• allocating investments across sales channels more accurately
• improving channel management, and more.
Analytics is not the only area where Big Data is transforming business, but it’s where most of the action is. Research from both market and academic sources suggests that enterprises that build analytics into their operations achieve productivity rates up to 6 percent higher than their peers’. In many businesses – including brick-and-mortar retail, e-commerce, social media, telecoms, banking, insurance, and healthcare – analytics now decides who has the competitive edge.
More, Better, Faster
Since we’ve had data analytics software for decades, you may wonder why Big Data analytics is suddenly such a big deal. One reason is that analytics and data-management software is now designed to run in parallel, which lets it process data at unprecedented speeds.
Another factor: computer hardware costs have continued to plummet, making it affordable to hold far more data in memory and thus further accelerate software performance. The upshot is that analytics software can now run at least 1,000 times faster than it once did, allowing companies to run analytics on ever vaster amounts of data.
In short, analytic technology has become a lot less expensive and a lot more efficient, allowing users to do more, better, faster.
Connecting Big Data and Data Algebra
Nevertheless, the question remains as to how to integrate all those zettabytes of data. Clearly, if you can extract real value from a particular heap of data, you can get more value – possibly much more – by combining it with other heaps of related data, which is often there for the asking. The government alone has masses of free data about every topic imaginable. For example, government weather data can be useful for retailers. So can population data and consumer data.
The problem is that there is no common language that can be used to easily integrate data, or there wasn’t until very recently. Early in 2015, Algebraix Data decided to publish the results of a mathematical R&D project it had been working on for more than five years. The company had produced a fundamental advance in applying mathematics to data, which it called “the algebra of data” (aka data algebra).
Data algebra has many uses. One of them is to provide the foundation for a common language for data, because data algebra can express any data structure of any kind.
This provides a very valuable piece of the data-integration jigsaw puzzle. Data algebra’s ability to function as a common language for data will dramatically facilitate data integration. And that will be particularly useful when it comes to sharing data files or databases between organizations or different applications.
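To make the “common language” idea concrete, here is a minimal plain-Python sketch of the underlying principle: records of any shape can be modeled as sets of attribute–value couplets, so set operations become a uniform way to compare and combine them. This illustrates the concept only; the names below are invented for this example, and the actual API of the open-source library differs.

```python
# A minimal plain-Python sketch of the idea behind a common language for data:
# any record-like data can be modeled as a set of (attribute, value) couplets,
# so tabular rows and key-value documents share one representation. The names
# here are invented for illustration; the real algebraixlib API differs.

def as_relation(mapping):
    """Model a record as an immutable set of (attribute, value) couplets."""
    return frozenset(mapping.items())

# A row from a relational table...
row = as_relation({"id": 1, "name": "Ada", "city": "London"})

# ...and the same facts arriving as a JSON-style document reduce to the
# same object, regardless of field order or original storage format.
doc = as_relation({"name": "Ada", "city": "London", "id": 1})
assert row == doc  # same couplets, same data

# Ordinary set intersection then acts as a structure-neutral filter.
shared = row & as_relation({"city": "London"})
print(shared)  # frozenset({('city', 'London')})
```

Because every structure reduces to sets, the usual set operations (union, intersection, difference) apply uniformly, which is what makes the approach a candidate foundation for data integration.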
Making Data Integration Far Easier to Do
To spread the word, Algebraix commissioned a book on data algebra that I co-authored with Gary J. Sherman, PhD, the mathematician who invented data algebra. Called The Algebra of Data: A Foundation for the Data Economy, the book can be downloaded free on this site. To encourage developers and users to do some hands-on experimenting with data algebra and eventually add to its many applications, Algebraix has also provided open-source access to data algebra in an online Python library; find it on GitHub and PyPI.
Of course, work needs to be done to create the standards that will enable data to be exchanged easily. However, data algebra is now available for everyone to experiment with. Given its existence and its mathematical rigor, it should be possible for IT vendors, users, and developers to agree on standards that will unite all the different ways of storing and accessing data that currently exist.
This is pretty exciting stuff, as every math-savvy developer and IT expert – not to mention every mathematician – will instantly understand.
In my next blog post, I’ll explain what data algebra is and how it specifically applies to the data-sharing problem that has beset the industry for decades.