Database Design Principles (Reprint) Fourteen Tips

test · Posted on 4/9/2015 9:00:54 PM

1. The relationship between the original document and the entity

It can be one-to-one, one-to-many, and many-to-many relationships. In general, they are one-to-one relationships: i.e. a pair of original documents
should and only correspond to one entity. In special cases, they may be one-to-many or many-to-one relationships, i.e., one original document corresponds to multiple realities
body, or multiple original documents corresponding to an entity. The entity here can be understood as a basic table. After clarifying this correspondence, design for us
The entry interface is very beneficial.
〖Example 1〗: An employee resume information corresponds to three basic tables in the human resources information system: employee basic information table and society
Relationship table, work resume form. This is a typical example of "one original document corresponds to multiple entities".
2. Primary and foreign keys
In general, an entity cannot have neither a primary nor a foreign key. In the E-R diagram, the entities in the leaf part can define the primary key,
It is also possible not to define a primary key (because it has no children), but it must have a foreign key (because it has a father).
The design of primary and foreign keys occupies an important position in the design of global databases. When the design of the global database is completed, there is a
American database design experts said: "Keys, keys everywhere, nothing but keys", this is his database design experience
It also reflects his highly abstract ideas about the core of information systems (data models). Because: the primary key is a highly abstract entity, and the primary key is associated with
A pair of foreign keys that represents a connection between entities.
3. The nature of the basic table
The basic table is different from the intermediate table and the temporary table because it has the following four characteristics:
(1) Atomicity. Fields in the base table are no longer decomposable.
(2) Primitiveness. The records in the base table are records of the original data (the underlying data).
(3) Deductive. All output data can be derived from the data in the base table and the code table.
(4) Stability. The structure of the basic table is relatively stable, and the records in the table should be kept for a long time.
After understanding the nature of basic tables, when designing databases, basic tables can be distinguished from intermediate tables and temporary tables.
4. Paradigm standards
The relationship between the basic table and its fields should meet the third paradigm as much as possible. However, database designs that meet the third paradigm are often not
The best design. In order to improve the operational efficiency of databases, it is often necessary to reduce the paradigm standard: appropriately increase redundancy to achieve space for time
The purpose of.
Example 2: There is a basic table for the storage of goods, as shown in Table 1. The presence of the "Amount" field indicates that the table is not designed to be satisfied
The third paradigm is sufficient, because "amount" can be obtained by multiplying "unit price" by "quantity", indicating that "amount" is a redundant field. However, increase
The redundant field of "amount" can improve the speed of query statistics, which is the practice of exchanging space for time.
In Rose 2002, there are two types of prescribed columns: data columns and calculated columns. A column like "amount" is called a "calculation column", and
Columns like "Unit Price" and "Quantity" are called "data columns".
Table 1 Table structure of the commodity table
Product Name Product Model Unit Price Quantity Amount
TV 29 inch 2,500 40 100,000

5. Understand the three paradigms in layman's terms
Understanding the three paradigms in layman's terms is of great benefit to database design. In database design, in order to better apply the three paradigms, just
Three paradigms must be understood in layman's terms:
The first paradigm: 1NF is an atomic constraint on attributes, which requires attributes to be atomic and can no longer be decomposed;
The second paradigm: 2NF is the uniqueness constraint on records, requiring records to have a unique identification, that is, the uniqueness of the entity;
Paradigm 3: 3NF is a constraint on field redundancy, that is, no field can be derived from other fields, it requires that the field is not redundant
。
No redundant database design can do it. However, a database without redundancy is not necessarily the best database, sometimes to improve luck
To achieve efficiency, it is necessary to reduce the paradigm standard and appropriately retain redundant data. The specific approach is to adhere to the third paradigm when designing conceptual data models
, the work of reducing the paradigm standard is considered in the design of the physical data model. Lowering the paradigm is to add fields that allow for redundancy.
6. Be good at identifying and correctly handling many-to-many relationships
If there is a many-to-many relationship between two entities, the relationship should be eliminated. The way to eliminate it is to add a third real between the two
body. In this way, what used to be a many-to-many relationship has now become two one-to-many relationships. The attributes of the original two entities should be reasonably distributed
Go to the three entities. The third entity here is essentially a more complex relationship, which corresponds to a basic table. Generally speaking, numbers
The library design tool cannot recognize many-to-many relationships, but it can handle many-to-many relationships.
Example 3: In the "Library Information System", "book" is an entity, and "reader" is also an entity. These two entities are the same
The relationship between books is a typical many-to-many relationship: a book can be borrowed by multiple readers at different times, and one reader can borrow more
This book. To this end, a third entity should be added between the two, which is named "borrowing and returning books", and its properties are: borrowing time and borrowing
It also has a logo (0 means borrowing a book, 1 means returning a book), in addition, it should also have two foreign keys (the primary key of "book" and the primary key of "reader"), so that
It connects with "books" and "readers".
7. The value method of the primary key PK
PK is an inter-table connection tool for programmers, which can be a string of numbers without physical significance, which is automatically added by the program to 1. Yes
is a physically meaningful field name or a combination of field names. But the former is better than the latter. When PK is a combination of field names, suggest a field number
Don't count too much, as the index not only takes up a lot of space, but also slows down.
8. Get data redundancy right
The repetition of primary and foreign keys in multiple tables is not a concept of data redundancy, and many people are not aware of it
。 The repetition of non-key fields is data redundancy! And it is a low-level redundancy, that is, repetitive redundancy. Advanced redundancy is not field-based
Repeatedly, but derivatives of fields.
Example 4: The three fields of "unit price, quantity, and amount" in the product, "amount" is derived from the "unit price" multiplied by "quantity"
It is redundancy, and it is a kind of advanced redundancy. The purpose of redundancy is to increase processing speed. Only low-level redundancy will increase the number
inconsistency of data, because the same data may be entered multiple times from different times, places, and roles. Therefore, we advocate for advanced redundancy (pie
redundancy by nature), and opposes low-level redundancy (repetitive redundancy).
9. There is no standard answer for E--R diagrams
There is no standard answer to the E--R diagram of an information system, because its design and drawing method are not unique, as long as it covers the business required by the system
The scope and functional content are feasible. Instead, it is necessary to modify the E--R diagram. Although it does not have a single standard answer, it does not mean that it can be arbitrary
Design. The criteria for a good E-R diagram are: clear structure, concise association, moderate number of entities, reasonable attribute allocation, and no low-level redundancy.
10. View techniques are useful in database design
Unlike basic tables, code tables, and intermediate tables, views are virtual tables that depend on the real tables of the data source to exist. Views are for programmers
A window using the database is a form of base table data synthesis, a method of data processing, and a kind of user data confidentiality
means. In order to perform complex processing, increase the speed of computing, and save storage space, the definition depth of the view should generally not exceed three layers. Like three floors
If the view is still not enough, you should define a temporary table on the view and then define a view on the temporary table. In this way, the depth of the view is defined repeatedly
No restrictions.
For certain information systems related to national political, economic, technological, military and security interests, the role of views is even more important. These
After the physical design of the basic table of the system is completed, the first layer of views is immediately established on the basic table, and the number and structure of this layer view are the same as the basic table
The number and structure are exactly the same. And it is stipulated that all programmers are only allowed to operate on the view. Only the database administrator, with
The "safety key" held by multiple personnel can be operated directly on the basic table. Readers are invited to think: why is this?
11. Intermediate tables, statements and temporary tables
An intermediate table is a table that stores statistics, it is designed for data warehousing, output reports, or query results, and sometimes it does not have a primary key with
foreign keys (except data warehouses). Temporary tables are designed by programmers to store temporary records for personal use. The base and intermediate tables are maintained by the DBA
Temporary tables are automatically maintained by the programmer himself.
12. Integrity constraints are manifested in three aspects
Domain integrity: Use Check to implement constraints, and in the database design tool, there is a Ch when defining the value range of the field
eck button, through which the value city of the field is defined.
Referential integrity: Implemented with PK, FK, and table-level triggers.
User-defined integrity: It is some business rules that are implemented with stored procedures and triggers.
13. The method to prevent database design patching is the "three less" principle
(1) The fewer tables in a database, the better. Only if the number of tables is reduced can it be said that the E-R diagram of the system is small and fine, and it is removed
The duplicate and redundant entities form a high degree of abstraction of the objective world, and systematic data integration is carried out to prevent patching design;
(2) The fewer fields in a table that combine primary keys, the better. Because of the role of the primary key, one is to build the primary key index, and the other is to serve as a sub-table
foreign keys, so the number of fields in the combination of primary keys is reduced, which not only saves running time, but also saves index storage space;
(3) The fewer fields in a table, the better. Only a small number of fields indicates that there is no data duplication in the system
There is little data redundancy, and more importantly, readers are urged to learn to "change rows", which prevents fields from being pulled into the main table in the subtable
, leaving many free fields in the main table. The so-called "column change row" is to pull out part of the content of the main table and build a separate one
Sub-table. This method is very simple, some people just don't get used to it, don't adopt it, and don't implement it.
The practical principle of database design is to find the right balance between data redundancy and processing speed. The "three less" is a holistic overview
Thought, comprehensive views, cannot isolate a certain principle. The principle is relative, not absolute. The "three more" principle is definitely wrong. Try
Think: If the same function of the system is covered, the E--R diagram of 100 entities (1,000 attributes in total) is definitely better than the E--R diagram of 200 entities (2,000 attributes in total)
The E--R diagram is much better.
Advocating the principle of "three less" is to let readers learn to use database design technology for systematic data integration. The steps for data integration are to put to:
The file system is integrated into an application database, the application database is integrated into a topic database, and the topic database is integrated into a global comprehensive database.
The higher the degree of integration, the stronger the data sharing, and the less information islands are present
The number of primary keys, and the number of attributes will be smaller.
The purpose of advocating the principle of "three less" is to prevent readers from using patching technology to constantly add, delete and modify the database, so as to make enterprise data
The library has become a "garbage heap" of arbitrarily designed database tables, or a "mess" of database tables, and finally causes the basic tables and generations in the database
The code tables, intermediate tables, and temporary tables are cluttered and countless, resulting in the inability to maintain and paralyze the information systems of enterprises and institutions.
The "three more" principle can be done by anyone, which is the fallacy of the "patching method" to design databases. The principle of "three less"
It is a principle of less but fine, which requires high database design skills and art, which not everyone can do, because this principle is eliminated
The theoretical basis for designing the database using the "patching method".
14. Ways to improve database operation efficiency
Under the given system hardware and system software conditions, the methods to improve the operation efficiency of the database system are:
(1) In the physical design of the database, reduce the paradigm, increase redundancy, use less triggers, and use more stored procedures.
(2) When the calculation is very complex and the number of records is very large (such as 10 million), the complex calculation must first be outside the database
After the file system method is calculated and processed in C++ language, it is finally added to the table. This is the experience of telecom billing system design.
(3) If a table is found to have too many records, such as more than 10 million, the table should be split horizontally. The practice of horizontal segmentation is:
Divide the table's record horizontally into two tables based on a certain value of the primary key PK of the table. If a table is found to have too many fields, such as exceeding
Eighty, the table is divided vertically, and the original table is split into two tables.
(4) System optimization of the database management system DBMS, that is, optimization of various system parameters, such as the number of buffers.
(5) When using data-oriented SQL language for programming, try to adopt optimization algorithms.
In short, to improve the operation efficiency of the database, it is necessary to optimize the database system, the database design, and the program implementation
, these three levels work hard at the same time.
The above fourteen skills are gradually summarized by many people in a large number of database analysis and design practices. For these experiences
Readers should not be rigid or rote, but should digest and understand, seek truth from facts, and master flexibly. And gradually do: send in the application
exhibition, application in development.

[Source] Database Design Principles (Reprint) Fourteen Tips

Related Posts

Sections viewed