Physical design – translate the logical description of data into the
technical specifications for storing and retrieving data so as to provide adequate
performance and insure database integrity, security, and recoverability
Input
- Normalized relations
- Attribute definition – data
type, integrity control, error handling
- Data statistics: data-volume,
frequency of use
- Usage map (Fig. 6-1)
- Requirements: response time,
data security, backup. recovery, retention, and integrity
- IT specifications: hardware, DBMS, …
Data statistics – relative vs precise
numbers
- Data volume – size of
business with growth adjustment
- Access frequency – timing of
events, transaction volumes, reporting activities, querying activities
Composite usage map
- Data volumes – lower left
hand corner of an entity
- Frequency of use – dashed
arrow
Decisions
- Data type of each attribute
– minimize storage space and maximize data integrity
- Grouping of attributes into
physical records – denormalization, horizontal
and vertical partitioning
- File organizations –
sequential, indexed, hashed
- Database architectures
- Queries handling strategies
Data decisions
- Data type – storage space,
domain, validation, manipulation
- Code table, compression
& encryption
- Data integrity – data type,
default, range, null (missing or unknown), referential
- Index -- a table to
determine the location of rows in a file that satisfy some condition
Index -- for information retrieval
- Unique index -- primary key
- Secondary index -- any
attributes
- Bitmap index -- a table of
bits in which each row represents the distinct values of a key and each
entry in the table indicates whether or not the record for that bit column
position has the associated field value (Figure 7-7)
Record decisions – choosing the sequencing of fields into adjacent
storage locations to achieve efficient use of storage and data processing speed
- Partitioning
- Replication
- Denormalization
– the process of transforming normalized relations into unnormalized physical record specifications
Denormalization
- Combining tables to avoid
doing joins (tools: function-relation matrix)
- Horizontal partitioning –
placing different rows of a table into separate files (~ supertype/subtype relationship)
- Vertical partitioning –
placing different columns of a table into separate files, repeating the
primary key in each of the files
- Record partitioning – a
combination of both horizontal and vertical partitioning as in distributed
database processing
- Data replication – same
data are stored in multiple places in the database
Candidates for denormalization
Candidate
|
Why
|
How
|
Benefits
|
1:1 relationship
|
- Attributes have been
mistakenly identified as entities
- Data from both
relations is accessed together, created and deleted at the same time
|
· Combine
the two relations into one
|
· Reduce the
number of relations
· Reduce the
number of joins
|
m:m relationship
|
- One of the relation
has little data apart from its primary key
|
· Merge data
from one of the primary relations with the intersection relation
|
·
eliminating one relation out of three
|
Reference data
|
- Multiple instances of
reference data in one relation
- Code description is
the natural user view
|
· Duplicate
the code description in the relation
|
· No
reduction in number of relations
· Remove the
need to join target entities with code table
|
Detailed data
|
- Multiple rows of data
fanned out from some least significant attribute, e.g., monthly balance
for a budget item of a department in an organization
|
· Discard
the least significant attribute and use one column per possible value of that
attribute, e.g., month vs Balance_1, …, Balance_12
|
· Reduce the
number of rows in the relation containing the most detailed data
|
Derived attributes
|
- Frequent use of data
calculated from its component data
|
· Store the
calculated value together with the component data in a relation
|
· Reduce
time consuming calculations at run-time
|
Joining two tables with many rows for every access is as bad as joining more
tables with few rows
Denormalization is a
practical necessity if performance is important
Design tables that match closely the user's view of data
Query optimization
- Parallel processing -- the
same query is run against each portion of the database (horizontal
partition) in parallel on separate processors. The intermediate results
from each processor are combined to create the final query result.
- Query plan -- statistics are
kept about the database structure, number of distinct values for fields,
and the best strategy for accessing each index and table
e.g.,
S(S#, Sname) has 1,000 tuples
SP(S#,P#) has 10,000 tuples;
50 of which have P#=P2
Get the names of suppliers who supplies P2
Plan1:
Join S with SP over S# giving T1
Restrict T1 where P#="P2" giving T2
Project T2 over Sname giving T3
Plan2:
Restrict SP where P#="P2" giving T1
Join with S over S# giving T2
Project T2 over Sname giving T3