Thursday, May 21, 2009

EAV FAIL



The photo above illustrates (by counter-example) an important characteristic of a normalized database: each logical "type" of attribute belongs in a separate column.

Just because three values happen to be numeric doesn't mean it makes sense to SUM() them together. But if dissimilar attributes are stored in the same column, it's tempting to treat them as compatible in this way.

This also shows a fallacy of the Entity-Attribute-Value antipattern. In this design, all attribute values are stored in a single column.

CREATE TABLE EntityAttributeValue (
entity        VARCHAR(20) NOT NULL,
attribute     VARCHAR(20) NOT NULL,
value         VARCHAR(1000) NOT NULL,
PRIMARY KEY (entity, attribute)
);

INSERT INTO EntityAttributeValue (entity, attribute, value)
VALUES
('New Cuyama', 'Population',          '562'),
('New Cuyama', 'Ft. above sea level', '2150'),
('New Cuyama', 'Established',         '1951'),

SELECT SUM(value) FROM EntityAttributeValue
WHERE entity = 'New Cuyama';


The Entity-Attribute-Value design does not support or conform to rules of database normalization.

To be clear, the proper way to design a database is to put different attributes in different columns. Use column names, not strings, to identify the attributes.

CREATE TABLE Cities (
 city_id          SERIAL PRIMARY KEY,
 city_name        VARCHAR(100) NOT NULL,
 population       INT UNSIGNED NOT NULL,
 feet_altitude    SMALLINT UNSIGNED NOT NULL,
 year_established SMALLINT UNSIGNED NOT NULL
);