Entity Integrity
Entity integrity means that each object represented by a row in the table should be distinguishable from any other object. In other words, there should be such a set of attributes whose unique combination of values would allow telling one object from another. Such an attribute set unambiguously identifying the object is called a candidate key. It can’t contain NULL marks because it won’t let us identify the object (like, say, a book with an unknown title).
A table can have several candidate keys. For instance, a person can be identified by her/his passport number, insurance policy number, VATIN, driver’s license number, etc.
To enforce entity integrity, SQL lets you define the PRIMARY KEY and UNIQUE specifications. A table can only have one primary key, but several unique indexes. That is, you can define the PRIMARY KEY specification for one of the candidate keys, and UNIQUE for the rest of them.
What can serve as the primary key in our case? Since a maker can produce several models and thus turn up in the Product table data several times, the maker column isn’t suitable for the role of the primary key. Similarly, the type attribute isn’t unique either.
The only attribute having no duplicates is the model code, and this is our sole contender to be the primary key. There are no other candidate keys in the table. To proof that, we can scrutinize all other column combinations and demonstrate they don’t ensure object identification. In particular, the value combination of the three columns in the example above is unique, yet fails to identify model 1232 nevertheless.
Let’s create a primary key. SQL allows changing the structure of an existing table by means of the ALTER TABLE statement. However, to avoid too much information at one time, we’ll just re-create the table, that is, delete it and create it anew with a primary key. Deleting a table isn’t difficult (destroying is easy, building is hard, as the saying goes) – you just need to carry out the DROP TABLE <table name>
statement. So then,
DROP TABLE Product;
create table Product (
maker varchar(10),
model varchar(50) PRIMARY KEY,
type varchar(50)
);
We included the primary key specification into the column definition. However, we could achieve the same effect by using a separate constraint:
CONSTRAINT <constraint name> PRIMARY KEY (<list of columns making up the primary key>)
Our code will then look like this (again, we delete the previously created table first):
DROP TABLE Product;
create table Product (
maker varchar(10),
model varchar(50),
type varchar(50),
CONSTRAINT product_PK PRIMARY KEY (model)
);
Now, the DBMS itself will make sure the primary key doesn’t have any duplicate values or NULL marks. If we try to add data using the INSERT statement from before, we’ll get the following error message: :
If we correct the input error by providing a proper model code for the printer
insert into product values
('A', '1232', 'PC'),
('A', '3001', 'Printer'),
(NULL, NULL, NULL);
we’ll get another error caused by the uncertainty of the primary key value:
The statement
insert into product values
('A', '1232', 'PC'),
('A', '3001', 'Printer'),
(NULL, '2000', NULL);
will work fine.
As you probably have noticed, the error message about the violation of the primary key constraint refers to the constraint name (‘product_PK’). But what happens if no name is specified? This has been the case in our first table definition, as we included the PRIMARY KEY specification into the model column definition. By the way, it’s also possible not to specify the constraint name in the second table definition (then, the CONSTRAINT keyword has to be omitted, as well):
DROP TABLE Product;
create table Product (
maker varchar(10),
model varchar(50),
type varchar(50),
PRIMARY KEY (model)
);
Why is it so important for us to know the constraint name? In addition to being helpful in finding out the cause of an error, this name is required for removing the corresponding constraint when we change the structure of an existing table. If we don’t specify a constraint name, the DBMS assigns one on its own. And we can learn this name, which must be unique within the scope of the database, from the information schema view – the standard method for obtaining metadata. E. g., the primary key constraint name created by the last script can be retrieved from a table in the information schema view using a regular query:
select CONSTRAINT_NAME
from INFORMATION_SCHEMA.TABLE_CONSTRAINTS
where TABLE_NAME='Product' and CONSTRAINT_TYPE ='primary key';
On my system, the constraint name is PK__Product__0B7E269E30F848ED. In all likelihood, you will get a different name, because it is generated by the DBMS.
An example of a compound primary key is the primary key in the Outcomes table (Ships database). Here, only the pair of columns {ship, battle} can be unique, because a ship can take part in more than one battle, and there are several ships participating in a battle. However, the same ship can’t be mentioned in a specific battle more than one time, and that’s exactly what the primary key will forbid to do. The Outcomes table with the aforementioned primary key can be created as follows:
create table Outcomes (
ship varchar(50),
battle varchar(20),
result varchar(10),
primary key(ship, battle)
);
It should be mentioned that the following notation
create table Outcomes (
ship varchar(50) primary key,
battle varchar(20) primary key,
result varchar(10)
);
is incorrect, since there is only one PRIMARY KEY specification possible.
If a table can have only one primary key, what should we do when our data model requires several attribute combinations not to have duplicate values? In other words, how do we create alternative keys if, e. g., we need to add a column named out_id with unique values to the Outcomes table?
For this purpose, SQL provides the UNIQUE specification. The statement for creating the Outcomes table with the additional column named out_id could look like this:
create table Outcomes (
ship varchar(50),
battle varchar(20),
result varchar(10),
out_id int,
primary key(ship, battle),
UNIQUE (out_id)
);
As mentioned before, several UNIQUE constraints can be defined for a table.
There is another aspect this constraint differs from PRIMARY KEY. The column UNIQUE is applied to can contain a NULL mark, but only one. You may ask – what about object identification in this case? Let’s recall the example with the books being sought for by unique title. Since only one NULL mark is allowed to exist, we can find the book with the unspecified title by eliminating all books with known titles.
Should we need an alternative key not allowing NULL marks, we can achieve this by applying, in addition to UNIQUE, the NOT NULL constraint. That’s what we’re now passing on to.