Integrity Constraints
In the previous section we discussed the structural part of the relational data model. As
stated in Section 2.3, a data model has two other parts: a manipulative part, defining the
types of operation that are allowed on the data, and a set of integrity constraints, which
ensure that the data is accurate. In this section we discuss the relational integrity constraints
and in the next chapter we discuss the relational manipulation operations.
We have already seen an example of an integrity constraint in Section 3.2.1: since every
attribute has an associated domain, there are constraints (called domain constraints) that
form restrictions on the set of values allowed for the attributes of relations. In addition,
there are two important integrity rules, which are constraints or restrictions that apply to
all instances of the database. The two principal rules for the relational model are known as
entity integrity and referential integrity. Other types of integrity constraint are multiplicity,
which we discuss in Section 11.6, and general constraints, which we introduce in
Section 3.3.4. Before we define entity and referential integrity, it is necessary to understand
the concept of nulls.
Nulls
Null Represents a value for an attribute that is currently unknown or is not applicable
for this tuple.
A null can be taken to mean the logical value ‘unknown’. It can mean that a value is
not applicable to a particular tuple, or it could merely mean that no value has yet been
supplied. Nulls are a way to deal with incomplete or exceptional data. However, a null is
not the same as a zero numeric value or a text string filled with spaces; zeros and spaces
are values, but a null represents the absence of a value. Therefore, nulls should be treated
differently from other values. Some authors use the term ‘null value’; however, as a null is
not a value but represents the absence of a value, the term ‘null value’ is deprecated.
For example, in the Viewing relation shown in Figure 3.3, the comment attribute may be
undefined until the potential renter has visited the property and returned his or her
comment to the agency. Without nulls, it becomes necessary to introduce false data to
represent this state or to add additional attributes that may not be meaningful to the user. In
our example, we may try to represent a null comment with the value ‘-1’. Alternatively,
we may add a new attribute hasCommentBeenSupplied to the Viewing relation, which
contains a Y (Yes) if a comment has been supplied, and N (No) otherwise. Both these
approaches can be confusing to the user.
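The difference between a null and an ordinary value such as an empty string can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the Viewing table is cut down to three columns, the two CR56 rows with missing comments follow the text, and the other two rows are hypothetical data invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Viewing (
        clientNo   TEXT NOT NULL,
        propertyNo TEXT NOT NULL,
        comment    TEXT,            -- nullable: may not have been supplied yet
        PRIMARY KEY (clientNo, propertyNo)
    )
""")

rows = [
    ("CR56", "PG36", None),              # no comment yet: stored as a null
    ("CR56", "PG4", None),               # no comment yet: stored as a null
    ("CR76", "PG4", ""),                 # hypothetical row: empty string is a value
    ("CR62", "PA14", "no dining room"),  # hypothetical row
]
conn.executemany("INSERT INTO Viewing VALUES (?, ?, ?)", rows)

# A null is found with IS NULL, not by comparing against some sentinel value.
missing = conn.execute(
    "SELECT clientNo, propertyNo FROM Viewing "
    "WHERE comment IS NULL ORDER BY propertyNo"
).fetchall()

# The empty string is an ordinary value and is matched by ordinary equality.
empty = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = ''"
).fetchone()[0]

# Comparing with NULL using '=' never evaluates to true, so no row qualifies.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = NULL"
).fetchone()[0]

print(missing, empty, equals_null)
```

With nulls available, neither a sentinel such as ‘-1’ nor an extra hasCommentBeenSupplied attribute is needed to find the viewings that still lack a comment.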
Nulls can cause implementation problems, arising from the fact that the relational model
is based on first-order predicate calculus, which is a two-valued or Boolean logic – the
only values allowed are true or false. Allowing nulls means that we have to work with a
higher-valued logic, such as three- or four-valued logic (Codd, 1986, 1987, 1990).
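As a rough illustration of what working with a three-valued logic involves, the sketch below asks SQLite (through Python's sqlite3 module) to evaluate a few Boolean expressions involving a null. SQL's behaviour is one practical realization of a three-valued logic and is shown here as an assumption for illustration; it is not claimed to match Codd's proposals in every detail. The helper name evaluate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a SQL Boolean expression.

    Returns 1 for true, 0 for false, and None for the third
    truth value, 'unknown'.
    """
    return conn.execute(f"SELECT {expr}").fetchone()[0]

unknown     = evaluate("NULL = NULL")             # comparing with a null is unknown
not_unknown = evaluate("NOT (NULL = 1)")          # NOT unknown is still unknown
or_true     = evaluate("(NULL = 1) OR (1 = 1)")   # unknown OR true is true
and_false   = evaluate("(NULL = 1) AND (1 = 0)")  # unknown AND false is false
and_true    = evaluate("(NULL = 1) AND (1 = 1)")  # unknown AND true is unknown

print(unknown, not_unknown, or_true, and_false, and_true)
```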
The incorporation of nulls in the relational model is a contentious issue. Codd later
regarded nulls as an integral part of the model (Codd, 1990). Others consider this approach
to be misguided, believing that the missing information problem is not fully understood,
that no fully satisfactory solution has been found and, consequently, that the incorporation
of nulls in the relational model is premature (see, for example, Date, 1995).
We are now in a position to define the two relational integrity rules.
Entity Integrity
The first integrity rule applies to the primary keys of base relations. For the present, we
define a base relation as a relation that corresponds to an entity in the conceptual schema
(see Section 2.1). We provide a more precise definition in Section 3.4.
Entity integrity In a base relation, no attribute of a primary key can be null.
By definition, a primary key is a minimal identifier that is used to identify tuples
uniquely. This means that no subset of the primary key is sufficient to provide unique
identification of tuples. If we allow a null for any part of a primary key, we are implying
that not all the attributes are needed to distinguish between tuples, which contradicts
the definition of the primary key. For example, as branchNo is the primary key of the
Branch relation, we should not be able to insert a tuple into the Branch relation with a null
for the branchNo attribute. As a second example, consider the composite primary key of
the Viewing relation, comprising the client number (clientNo) and the property number
(propertyNo). We should not be able to insert a tuple into the Viewing relation with either
a null for the clientNo attribute, or a null for the propertyNo attribute, or nulls for both
attributes.
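The two examples above can be sketched in SQLite using Python's sqlite3 module. The tables are simplified to the attributes named in the text, and try_insert is a hypothetical helper. NOT NULL is declared explicitly on every primary key column because SQLite, for historical reasons, does not always reject nulls in primary key columns on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (
        branchNo TEXT NOT NULL PRIMARY KEY
    );
    CREATE TABLE Viewing (
        clientNo   TEXT NOT NULL,
        propertyNo TEXT NOT NULL,
        comment    TEXT,
        PRIMARY KEY (clientNo, propertyNo)   -- composite primary key
    );
""")

def try_insert(sql, params):
    """Attempt an insert; return True if accepted, False if rejected."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    # Null for the single-attribute primary key of Branch.
    "branch_null": try_insert("INSERT INTO Branch VALUES (?)", (None,)),
    # Null in either part, or both parts, of the composite key of Viewing.
    "client_null": try_insert(
        "INSERT INTO Viewing VALUES (?, ?, ?)", (None, "PG4", None)),
    "property_null": try_insert(
        "INSERT INTO Viewing VALUES (?, ?, ?)", ("CR56", None, None)),
    "both_null": try_insert(
        "INSERT INTO Viewing VALUES (?, ?, ?)", (None, None, None)),
    # A complete key is accepted, even though comment (a non-key attribute) is null.
    "complete_key": try_insert(
        "INSERT INTO Viewing VALUES (?, ?, ?)", ("CR56", "PG4", None)),
}
print(results)
```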
If we were to examine this rule in detail, we would find some anomalies. First, why
does the rule apply only to primary keys and not more generally to candidate keys, which
also identify tuples uniquely? Secondly, why is the rule restricted to base relations? For
example, using the data of the Viewing relation shown in Figure 3.3, consider the query,
‘List all comments from viewings’. This will produce a unary relation consisting of the
attribute comment. By definition, this attribute must be a primary key, but it contains nulls
(corresponding to the viewings on PG36 and PG4 by client CR56). Since this relation is
not a base relation, the model allows the primary key to be null. There have been several
attempts to redefine this rule (see, for example, Codd, 1988; Date, 1990).
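The second anomaly can be reproduced in a small sketch using Python's sqlite3 module. The two CR56 rows with missing comments follow the text; the third row is hypothetical. SQL's SELECT DISTINCT is used here as an approximation of relational projection, which is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Viewing (
        clientNo   TEXT NOT NULL,
        propertyNo TEXT NOT NULL,
        comment    TEXT,
        PRIMARY KEY (clientNo, propertyNo)
    )
""")
conn.executemany(
    "INSERT INTO Viewing VALUES (?, ?, ?)",
    [
        ("CR56", "PG36", None),         # null comment, as described in the text
        ("CR56", "PG4", None),          # null comment, as described in the text
        ("CR56", "PA14", "too small"),  # hypothetical row
    ],
)

# 'List all comments from viewings': a derived, unary relation.
projection = conn.execute("SELECT DISTINCT comment FROM Viewing").fetchall()

# Its only attribute is its only possible key, yet it holds a null.
has_null = any(row[0] is None for row in projection)
print(projection, has_null)
```

The base relation satisfies entity integrity, but the derived relation does not, which is permitted because the rule is stated only for base relations.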
Referential Integrity
The second integrity rule applies to foreign keys.
Referential integrity If a foreign key exists in a relation, either the foreign key value must
match a candidate key value of some tuple in its home relation or the foreign key value
must be wholly null.
For example, branchNo in the Staff relation is a foreign key targeting the branchNo attribute
in the home relation, Branch. It should not be possible to create a staff record with branch
number B025, for example, unless there is already a record for branch number B025 in the
Branch relation. However, we should be able to create a new staff record with a null branch
number, to cater for the situation where a new member of staff has joined the company but
has not yet been assigned to a particular branch office.
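The Staff and Branch example can be sketched in SQLite through Python's sqlite3 module. The tables are reduced to their key attributes, the staff numbers SG37 and SG5 are hypothetical values, and try_insert is a hypothetical helper. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE Branch (
        branchNo TEXT NOT NULL PRIMARY KEY
    );
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        branchNo TEXT REFERENCES Branch(branchNo)  -- foreign key, nullable
    );
""")

def try_insert(sql, params):
    """Attempt an insert; return True if accepted, False if rejected."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# No Branch tuple for B025 exists yet, so this staff record is rejected.
before_branch = try_insert("INSERT INTO Staff VALUES (?, ?)", ("SG37", "B025"))

# Once the home relation holds B025, the same insert is accepted.
conn.execute("INSERT INTO Branch VALUES (?)", ("B025",))
after_branch = try_insert("INSERT INTO Staff VALUES (?, ?)", ("SG37", "B025"))

# A wholly null foreign key is accepted: staff not yet assigned to a branch.
unassigned = try_insert("INSERT INTO Staff VALUES (?, ?)", ("SG5", None))

print(before_branch, after_branch, unassigned)
```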
General Constraints
General constraints Additional rules specified by the users or database administrators of
a database that define or constrain some aspect of the enterprise.
It is also possible for users to specify additional constraints that the data must satisfy. For
example, if an upper limit of 20 has been placed upon the number of staff that may work
at a branch office, then the user must be able to specify this general constraint and expect
the DBMS to enforce it. In this case, it should not be possible to add a new member of staff
at a given branch to the Staff relation if the number of staff currently assigned to that branch
is 20. Unfortunately, the level of support for general constraints varies from system to
system. We discuss the implementation of relational integrity in Chapters 6 and 17.
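One possible way to enforce the staff limit is a trigger, sketched below for SQLite using Python's sqlite3 module. This is an assumption made for illustration, not the mechanism the text prescribes: other systems may offer assertions or check constraints instead. The trigger name staff_limit, the helper add_staff, the constant MAX_STAFF, and all staff and branch numbers are hypothetical.

```python
import sqlite3

MAX_STAFF = 20  # upper limit on staff per branch, taken from the example in the text

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        branchNo TEXT
    );

    -- Reject an insert when the target branch is already at the limit.
    CREATE TRIGGER staff_limit
    BEFORE INSERT ON Staff
    WHEN (SELECT COUNT(*) FROM Staff
          WHERE branchNo = NEW.branchNo) >= {MAX_STAFF}
    BEGIN
        SELECT RAISE(ABORT, 'branch already has the maximum number of staff');
    END;
""")

def add_staff(staff_no, branch_no):
    """Attempt to add a member of staff; return True if accepted."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?)", (staff_no, branch_no))
        return True
    except sqlite3.DatabaseError:
        return False

# The first 20 members of staff at branch B003 are accepted.
first_twenty = [add_staff(f"S{i:02d}", "B003") for i in range(MAX_STAFF)]

# The 21st at the same branch is rejected by the trigger.
twenty_first = add_staff("S20", "B003")

# Another branch is unaffected by the limit reached at B003.
other_branch = add_staff("S21", "B005")

print(all(first_twenty), twenty_first, other_branch)
```

This sketch guards only INSERT; a fuller version would also need a trigger on updates that move existing staff to a different branch.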