What is data algebra, exactly? This question probably instantly came to mind when you first heard the words “data algebra.”
It is not a simple question to answer if you’re not mathematically inclined – and if you’re not, I’m writing this blog post for you. I’m going to assume only that you have a reasonable knowledge of software and data; otherwise you likely wouldn’t be reading this article at all. I’ll cut out as much of the math as I can.
But before I take mathematics totally off the table, you need to know this: Data algebra is a bona fide algebra that can be used to describe and manipulate computer data. Created by a seasoned mathematics professor, it is based entirely on classical set theory. The math is described in The Algebra of Data, a book that I co-wrote with Gary J. Sherman, PhD, who created data algebra. (The book is available on Amazon or as a free download here.)
This is not the first time anyone has tried to apply algebra to data. A previous attempt was made by E.F. Codd, the computer scientist whose ideas gave birth to the relational database and the now-ubiquitous SQL language for getting at data. Unfortunately, Codd’s effort was limited in scope and, from a mathematical perspective, it went astray anyway.
But let’s not worry about that. Instead let’s use the idea of a relational database as a vehicle for explaining data algebra.
Relational Tables and Algebra
The relational model for managing data thinks of individual items of data as attributes. For example, “name: John.” The “name” is a metadata description of the value “John.” Rows in a table consist of sets of such attributes. A simple row might be: “name: John, age: 30, gender: male.” Once you have more than one row, you have a table whose column headings (name, age, gender) are the metadata.
Now, to represent such data using data algebra, you would do it in a similar way. The basic algebraic unit of data is a couplet. “Name: John” is an example of a couplet. As the term suggests, couplets have two elements, in this case a value and a description. In data algebra, a table row would be an example of a relation, which is a set of couplets. “Name: John, age: 30, gender: male” is a relation. A table would be an example of a clan, which is a set of relations (that is, a set of sets of couplets).
By using couplets, sets of couplets, and clans (sets of sets of couplets), you can represent the tables of a relational database. There is more to relational databases than that, of course, and there is much more to data algebra. But now that we have a starting point, we can expand on what data algebra is.
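If you think better in code than in definitions, the couplet, relation, and clan described above can be sketched in a few lines of plain Python. This is only an illustration built on the standard library, not the API of the Algebraix Python library; the `Couplet` class, the `relation` and `project` helpers, and the second row of sample data are all names and values I have made up for the example.

```python
from typing import NamedTuple


class Couplet(NamedTuple):
    """A description paired with a value, e.g. name: John (illustrative type)."""
    description: str
    value: object


def relation(**attributes):
    """Build a relation (a set of couplets) from keyword arguments."""
    return frozenset(Couplet(k, v) for k, v in attributes.items())


# Two table rows, each a relation. The second row is invented sample data.
john = relation(name="John", age=30, gender="male")
mary = relation(name="Mary", age=41, gender="female")

# The table itself is a clan: a set of relations.
clan = frozenset({john, mary})


def project(table, descriptions):
    """Keep only couplets with the listed descriptions (like choosing columns)."""
    return frozenset(
        frozenset(c for c in row if c.description in descriptions)
        for row in table
    )


names = project(clan, {"name"})
```

Because relations and clans are ordinary sets here, choosing columns is just a set comprehension, and the result `names` is itself a clan.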
Data as a Computer Sees It
Let’s focus on the nuts and bolts: data held in memory. Down in the guts of your computer are stored bits of data, each with a specific memory address. This is a couplet of the form (value: memory-address). The value has a type, such as integer, float, character string, and so on. If you include this, you get ((value: memory-address): type). So your original couplet becomes part of a couplet that includes type. The value may also have a metadata tag to describe its meaning. If you include it, you get (((value: memory-address): type): metadata).
So now your first couplet has become part of a second couplet that is part of a third couplet. And naturally, you can have sets of such couplets (relations) and sets of sets of them (clans).
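The nesting is easier to see in code. Here is a minimal sketch in plain Python, assuming a generic two-part `Couplet` type of my own invention (again, not the Algebraix library's API). The memory address, the type name, and the `unwrap` helper are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Couplet(NamedTuple):
    """A generic two-part couplet written (left: right); illustrative only."""
    left: object
    right: object


ADDRESS = 0x7FFE1000  # hypothetical memory address

stored = Couplet(30, ADDRESS)        # (value: memory-address)
typed = Couplet(stored, "integer")   # ((value: memory-address): type)
tagged = Couplet(typed, "age")       # (((value: memory-address): type): metadata)


def unwrap(couplet):
    """Peel the layers from the outside in, collecting each right-hand part."""
    layers = []
    while isinstance(couplet, Couplet):
        layers.append(couplet.right)
        couplet = couplet.left
    return couplet, layers


value, layers = unwrap(tagged)
```

After unwrapping, `value` is the innermost stored value and `layers` lists the metadata tag, the type, and the address, in that order.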
Don’t worry if this seems a little complicated. I’m not asking you to manipulate these algebraic representations. I’m just pointing out this: You can use data algebra to represent data in the same way that a computer stores data in its memory. And that means you can write programs that treat data algebraically at any level.
Data Algebra Can Dance
Tables were both the virtue and the curse of relational databases. Their virtues were that they were relatively easy to understand visually and that representing data as tables turned out to be entirely adequate for many kinds of applications.
But there were curses, too. Tables have turned out to be hopeless at representing graph data or nested data (such as documents) or irregular data structures (such as objects) or semantic data (such as the meaning of text). Even in the occasional situations where such data can be forced into tables, the resulting representation is confusing rather than useful.
This is where data algebra can dance. I’m not going to illustrate this for fear of throwing you off with math symbols, so you’ll have to take my word for it (or read the book, which shows it’s provably correct). But trust me: All of these different data structures (graphs, nests, objects, etc.) can be represented algebraically and, when the need arises, can also be manipulated algebraically, changing them from one useful form to another.
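Code is not math symbols, so for readers who want something concrete, here is a rough sketch of the idea using a small graph. It treats each edge as a couplet and uses two ordinary set-theoretic operations on binary relations, reversing and chaining, and then recasts the same edges as table rows. The function names `transpose`, `compose`, and `as_clan` and the sample graph are my own choices for this illustration; I am not presenting them as the book's notation or the Algebraix library's API.

```python
from typing import NamedTuple


class Couplet(NamedTuple):
    """A generic two-part couplet; illustrative only."""
    left: object
    right: object


# A directed graph as a relation: one couplet per edge (invented sample data).
follows = frozenset({
    Couplet("ann", "bob"),
    Couplet("bob", "cat"),
    Couplet("ann", "cat"),
})


def transpose(rel):
    """Reverse every edge."""
    return frozenset(Couplet(c.right, c.left) for c in rel)


def compose(second, first):
    """Chain edges: a->b in first and b->c in second gives a->c."""
    return frozenset(
        Couplet(a.left, b.right)
        for a in first
        for b in second
        if a.right == b.left
    )


def as_clan(rel):
    """Recast each edge as a table row with 'from' and 'to' columns."""
    return frozenset(
        frozenset({Couplet("from", c.left), Couplet("to", c.right)})
        for c in rel
    )


followed_by = transpose(follows)
two_steps = compose(follows, follows)
edge_table = as_clan(follows)
```

The same three edges end up in three forms (a graph, its reverse, and a table) without leaving the world of sets and couplets.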
The Lingua Franca for Data
If you add up what I’ve described so far, data algebra has the following virtues:
• It can represent data held in tables in relational databases.
• It can equally well represent irregular data, such as might be found in some files on your PC or in XML files.
• It can represent graph data, where related data items are linked together to form meaningful graphs.
• It can represent the semantics (the meaning) in text.
• It can represent nested structures.
• It can represent data in any way that a computer needs to represent it.
Take all these virtues together and what you have is a lingua franca – a common language – for describing all data in any of the myriad structures in which it can be stored or manipulated.
If you want to know what makes data algebra important, it is exactly that. It provides a common language for data. And that is something the world has needed for decades.
To spread the word, Algebraix commissioned a book on data algebra that I co-authored with Gary J. Sherman, PhD, the mathematician who invented data algebra. Called The Algebra of Data: A Foundation for the Data Economy, the book can be downloaded free on this site. To encourage developers and users to do some hands-on experimenting with data algebra and eventually add to its many applications, Algebraix has also provided open-source access to data algebra in an online Python library; find it on GitHub and PyPI.