|
Since SQL Server 2005, Microsoft has provided a variety of high availability technologies to reduce downtime and increase the protection of business data, and with the continuous release of SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012, there are many high availability technologies in SQL Server to meet different scenarios. Before I start this article, I'll start with a brief overview of what determines which high availability technology to use.
What does it rely on to decide which high availability technology to use? Many companies need all or part of their data to be highly available, such as online shopping websites, online product databases must be online 24/7, otherwise in a highly competitive market environment, downtime means lost customers and revenue. For example, in a call center that relies on SQL Server, if the database goes down, all callers can only sit there and reply to the customer "Sorry, system failure", which is also unacceptable. Of course, in an ideal world, all critical data will be online at all times, but in the real world, there will be various reasons for the database to be unavailable, because it is impossible to predict the time and form of disaster, it is necessary to take measures in advance to prevent various emergencies, so SQL Server provides a variety of high availability technologies, these technologies mainly include: clustering, replication, mirroring, log delivery, AlwaysOn availability groups and other such as file group backup and restore, Single-instance high-availability technologies such as online rebuilding indexes. The use of high availability technology is not to choose a familiar technology for direct use, but to consider the business and technology comprehensively. Because there is no single technology that can achieve all functions. How to adopt these technologies based on your specific business and budget is what is known as a high availability strategy. When designing a high availability strategy, you should first consider the following factors: - RTO (Recovery Time Objective) - that is, recovery time objective, means how much downtime is allowed, usually expressed by a few 9s, for example, 99.999% availability means no more than 5 minutes of downtime per year, 99.99% availability means no more than 52.5 minutes of downtime per year, and 99.9% availability means no more than 8.75 hours of downtime per year. It is worth noting that the calculation method of RTO takes into account whether the system is 24*365 or just 6 am to 9 pm, etc. You also need to pay attention to whether the maintenance window is counted as downtime, and it is easier to achieve higher availability if database maintenance and patching are allowed during the maintenance window.
- RPO (Recovery Point Objective) – Also known as the recovery point objective, means how much data loss is allowed. Usually, as long as you make a good backup, you can easily achieve zero data loss. But when a disaster occurs, depending on the extent of the database corruption, the time it takes to restore data from a backup will cause the database to become unavailable, which will affect the implementation of RTO. An early and more famous example is a banking system in Europe and the United States, only considering RPO, there are only full backups and log backups in the system, full backups every 3 months, log backups every 15 minutes, when a disaster occurs, only through full backups and log backups can restore data, so although there is no data loss, but because it took two full days to restore data, the banking system was unavailable for 2 days, so a large number of customers were lost. Another opposite example is a domestic online video website, using SQL Server as the back-end relational database, the front-end uses No-SQL, and regularly imports No-SQL data into the relational database as a backup.
Budget – RTO and RPO are collectively known as SLAs (Service Level Agreements), and when designing a high availability strategy, you need to measure how well you meet SLAs based on your business, depending on your budget and measuring the cost of different SLAs in the event of a failure. Generally speaking, it is difficult to achieve high SLAs with a limited budget, and even if high SLAs are achieved through complex architectures, complex architectures also mean high operation and maintenance costs, so it is necessary to choose the right technology within the budget to meet the SLAs. Therefore, in general, the large framework for high availability can be determined by several order-taking questions: - What is the downtime that shareholders are willing to accept?
- What downtime is acceptable for managers?
- What is the budget provided for a high availability scenario?
- How much is the loss per hour due to downtime?
Cold, warm, and hot Depending on the degree of data synchronization between the host and the standby, backups can be divided into three situations: cold backup, warm backup, and hot backup.- Cold backup: The standby server is configured to accept the data of the primary server, and when it fails, manually restore the data to the primary database, or reconfigure the connection string or permissions of the program to bring the backup database online.
- Warm backup: The primary server data will continuously transmit logs to the backup server (at irregular intervals, it can be 15 minutes, 30 minutes, 1 minute, etc.), in this way, the primary server to the backup server is usually updated asynchronously, so the data of the primary server and the backup server cannot be guaranteed. In addition, this scheme typically does not implement automatic fault monitoring and failover.
- Hot backup: The data of the primary server is automatically synchronized on the backup server, and in most cases, automatic fault monitoring and failover are included, and the data consistency of the primary server and the backup server can be guaranteed.
As cold to warm to hot backups, costs skyrocket.
High availability features supported in SQL Server The high availability features supported in SQL Server are closely related to the version, and the Enterprise edition supports all high availability features, including: - Failover cluster
- l Database image
- l Transaction log transmission
- l Database snapshots
- l High availability upgrades
- l Hot load memory
- l Online indexing operations
- l Database partial online (only master file group or master file group and additional NDF files are restored)
For specific versions of which high availability features, see:http://msdn.microsoft.com/zh-cn/library/cc645993.aspxIt is worth noting that the free Express version can serve as a witness server for database mirroring, resulting in cost savings. Failover cluster Failover clusters provide high availability support for the entire SQL Server instance, which means that a SQL Server instance on a node on the cluster fails over to other nodes on the cluster due to hardware errors, operating system errors, etc. High availability is achieved by multiple servers (nodes) sharing one or more disks, and failover clusters appear in the network in the same way as a single computer, but with high availability characteristics. It is important to note that since failover clusters are based on shared disks, there is a single point of disk failure, so additional protections such as SAN replication need to be deployed at the disk level. The most common failover cluster is a two-node failover cluster, including the master and slave.
Transaction log transmission Transaction log shipping provides database-level high availability protection. Logging is used to maintain one or more standby databases (called "secondary databases") of the corresponding production database (called the "primary database"). Before a failover occurs, the secondary database must be fully updated by manually applying all unrestored log backups. Log delivery has the flexibility to support multiple standby databases. If multiple alternate databases are required, log delivery can be used separately or as a complement to database mirroring. When these solutions are used together, the principal database of the current database mirroring configuration is also the primary database of the current log shipping configuration. Transaction log delivery can be used to do cold and warm backups.
Database mirroring Database mirroring is actually a software solution that also provides database-level protection, providing near-instantaneous failover to improve database availability. A database mirror can be used to maintain a single standby database (or "mirror database") for the corresponding production database (called the "principal database"). Because the mirror database is always in a restore state, but the database is not restored, the mirror database cannot be accessed directly. However, for read-only loads such as reports, you can use the mirrored database indirectly by creating a database snapshot of the mirrored database. Database snapshots provide clients with read-only access to the data in the database when the snapshot is created. Each database mirroring configuration involves a "principal server" that contains the principal database, and also involves a mirror server that contains the mirrored database. The mirror server continuously updates the mirror database with the principal database. Database mirroring runs in synchronous operation in high-security mode or asynchronous operation in high-performance mode. In high-performance mode, transactions do not need to wait for the mirror server to write logs to disk before they can be submitted, which maximizes performance. In high-security mode, committed transactions are committed by both partners, but the transaction lag time is extended. The simplest configuration of database mirroring involves only the principal server and the mirror server. In this configuration, if the principal server is lost, the mirror server can be used as a standby server, but it can cause data loss. High security mode supports standby configuration high security mode with automatic failover. This configuration involves a third-party server instance called a "witness server" that enables the mirror server to be used as a hot backup server. Failover from the primary database to the mirror database typically takes a few seconds. Database mirroring can be used for both warm and hot backups.
copy Replication is not strictly a feature designed for high availability, but it can be applied to high availability. Replication provides database object-level protection. Replication uses a publish-subscribe model, where data is published by the primary server, known as the publisher, to one or more secondary or subscribers. Replication provides real-time availability and scalability between these servers. It supports filtering to provide a subset of data to subscribers, while also supporting partition updates. The subscriber is online and available for reporting or other functions without query recovery. SQL Server offers four replication types: snapshot replication, transactional replication, peer-to-peer replication, and merge replication.
AlwaysOnUsability group AlwaysOn Availability Groups is a new feature introduced in SQL Server 2012. Database-level protection is also provided. It also expands the limit that database mirroring can only be 1:1, so that one primary replica can correspond to up to 4 secondary replicas (in SQL Server 2014, this limit is expanded to 8), of which 2 secondary replicas can be synchronized as hot backups and primary replicas in real time, and the other two asynchronous secondary replicas can be used as warm backups. In addition, secondary replicas can be configured as read-only and can be used to take on the load of backups. Because of this, database mirroring is marked as "obsolete" in SQL Server 2012.
High availability strategy design After understanding the basic concepts of high availability and the high availability technologies provided in SQL Server, let's take a look at the design of a high availability strategy. Planning a high availability strategy can be divided into four phases: Gather requirements The first step in deciding on a high availability strategy is undoubtedly to gather business requirements to establish SLAs. RTO and RPO are the most critical parts, and on this basis, establish realistic expectations for availability requirements and establish a realistic high availability strategy based on these expectations. Assessment Limits Assessment limits are not only limited to the limitations of different high availability technologies in SQL Server, but also to those that are non-technical. If you only have a budget of tens of thousands of yuan, but you want to do a high-availability solution based on off-site data centers and SAN replication, it is undoubtedly a fool's dream. Another non-technical limitation is the level of operations personnel, and often, complex architectures mean more skilled operations personnel. Other non-technical limitations include the availability of disk space in the data center, whether power supply and air conditioning can meet the needs, and the time required to implement the availability strategy. Technical limitations include different high-availability functions and limitations, functions supported by different SQL Server versions, the number of CPUs, and the size of memory. It is strongly recommended that you first refer to the limitations of different SQL Server versions and features on the Microsoft MSDN website before implementing a high availability policy. Select technology After gathering requirements and assessing constraints, the next step is to select the technologies or combination of technologies described earlier in this article to meet the SLA requirements. If the selected technology fails to meet the SLA, it is easy to report what limitations are not meeting the SLA, allowing you to request missing resources or compromise on the SLA. Test, validate, and document High availability policies need to be rigorously tested and validated from the outset to ensure that current availability policies meet SLAs. However, when a high availability strategy is launched, it is also necessary to regularly test and validate it to ensure that the current policy can still meet SLAs despite data growth, business or requirements changes. At the same time, the configuration of the availability solution, the method of failover, and the disaster recovery plan should be documented at the same time so that it can be traced in case of failure or future adjustment of the high availability strategy.
SummaryThis article explains the basic concepts of high availability, the concept of SLAs, the different kinds of high availability features supported in SQL Server, and the steps required to design a high availability strategy. It is worth noting that although this article only talks about high availability at the database level, high availability is not only a matter for DBA, but also includes the collaboration of different roles such as system operation and maintenance personnel, network administrators, developers, and managers to better meet SLAs.
|