Thursday, November 29, 2007

In Support of the Relational Model

Every few years, a wave of people claims that the relational model falls short of modeling the data structures of modern applications. Recently this has included some really smart people, such as:

- Jim Starkey
- Brian Aker
- MIT researchers

This trend of criticizing the relational model started almost immediately after E. F. Codd published his seminal paper, "A Relational Model of Data for Large Shared Data Banks" in 1970.

Some data modeling tasks are just complex by nature, and it is necessarily a difficult, time-consuming analysis to figure out how to structure the storage for a given application in a manner that is both logically complete and also efficient.

The complexity does not originate from the technology; the complexity originates from nontrivial real-world scenarios, which are full of complexity, inconsistency, and special-case exceptions. It isn't that the database modeling technology is insufficient; it's that the job is genuinely hard.

This is an example of what I call a Waterbed Problem. When you're lying on a waterbed, and you push down with your hand to make part of the bed lower, the water inside is displaced and rises in some other area. Even if you get a lot of friends to help you with the waterbed (okay, this is starting to get racy, but don't worry), the water is still mostly incompressible. No matter how hard you push, you only displace the water; you can't make it shrink.

The waterbed analogy applies when we try to simplify a computer organization problem. We "push down" on the complexity of the task by inventing new methods for describing the problem. Sometimes this leads to nifty things like domain-specific languages. But generally there is a trade-off. We make one part of the problem easier to solve, at the cost of making another part of the problem less flexible, or harder to solve. The analogy is that the complexity rises in some other area, in response to our attempt to push the complexity down in the area we're working on currently.

We might even create a simple solution for the problem at hand, but the new solution is simply unable to address some related problems. This might be analogous to a leak in the waterbed, but probably I've exhausted this analogy by now.

Flexibility of the data modeling solution is not the only criterion. It must also support rigidity where needed. If you can't define the structure of your data model enough to disallow invalid data, then you're asking for trouble. If the data management is just a pool of data elements and relationships, and any enforcement of structure depends on application logic, then your data integrity is at risk from bugs in that application and susceptible to invalid changes made via a different client application.
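To make the point about rigidity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables, their columns, and the `make_db` and `try_insert` helpers are all hypothetical names invented for illustration. The idea is only that constraints declared in the schema reject invalid rows regardless of which client application sends them.

```python
import sqlite3


def make_db():
    """Create an in-memory database whose schema itself rejects invalid data.

    The tables and columns are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            quantity    INTEGER NOT NULL CHECK (quantity > 0)
        );
        """
    )
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    add_customer = "INSERT INTO customers (id, email) VALUES (?, ?)"
    add_order = "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)"

    print(try_insert(conn, add_customer, (1, "a@example.com")))  # valid row
    print(try_insert(conn, add_order, (1, 2)))    # valid row
    print(try_insert(conn, add_order, (99, 1)))   # refers to a nonexistent customer
    print(try_insert(conn, add_order, (1, 0)))    # violates the CHECK constraint
```

Because the rules live in the schema, a second client application that connects to the same database gets the same rejections without having to reimplement any validation logic.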

I believe the relational model strikes a good balance between flexibility and rigidity, and that's why it has been a good choice for general-purpose data modeling for so long. Other techniques have advantages in certain areas, but they always have a balance of disadvantages in other areas.

There is no magic bullet that can make the data modeling process easier. Not object databases, not hierarchical databases, not semantic databases, not network databases. Different data modeling paradigms can describe the data relationships differently, but they can't make the problem simpler.

To respond to the smart folks who say that the relational model is inadequate, I concede that the relational model is the worst form of DBMS except for all those other forms that have been tried from time to time (with apologies to Winston Churchill).
