ZooKeeper is a subproject of Hadoop, and although it originated in Hadoop, I have found that it is increasingly being used by distributed frameworks outside of Hadoop. Today I want to talk about ZooKeeper. This article will not cover how to use ZooKeeper; instead it looks at its practical applications, at the kinds of applications that can take advantage of its strengths, and finally at the role ZooKeeper can play in a distributed website architecture. ZooKeeper is a highly reliable coordination system for large distributed systems. From this definition we know that ZooKeeper is a coordination system that serves distributed systems. Why do distributed systems need a coordination system? The reasons are as follows:
Developing a distributed system is very difficult, and the difficulty lies mainly in "partial failure". Partial failure arises when a message is sent between two nodes over the network and the network fails: the sender cannot know whether the receiver got the message. The cause of the failure is also hard to determine: the receiver may or may not have received the message before the network error occurred, or the receiver's process may have died. The only way for the sender to learn what really happened is to reconnect to the receiver and ask it. This is the "partial failure" problem in distributed system development.
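To make the ambiguity concrete, here is a minimal Python sketch. It is a toy simulation, not ZooKeeper code: the `send` function, the `inbox` list, and the 50% loss rate are all invented for illustration. The point is that a missing acknowledgement looks the same to the sender whether or not the receiver processed the message.

```python
import random

def send(message, inbox, rng, loss_rate=0.5):
    """Send `message` over an unreliable link; return True only if an ack comes back."""
    # The request may be lost before it reaches the receiver.
    if rng.random() < loss_rate:
        return False              # receiver never saw the message
    inbox.append(message)         # receiver processed the message
    # The acknowledgement may be lost on the way back.
    if rng.random() < loss_rate:
        return False              # receiver DID see it, but the sender cannot tell
    return True

rng = random.Random(0)            # fixed seed so the run is repeatable
inbox = []
outcomes = [send(i, inbox, rng) for i in range(20)]

# Messages the receiver processed although the sender saw a failure.
delivered_but_unacked = len(inbox) - outcomes.count(True)
print("failures seen by sender:", outcomes.count(False))
print("of those, actually delivered:", delivered_but_unacked)
```

In both failure branches the sender only sees `False`, which is exactly why it has to ask the receiver afterwards.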
ZooKeeper is a framework for dealing with partial failure in distributed systems. It does not let a distributed system avoid partial failures; rather, it lets the system handle them correctly when they occur, so that the system as a whole keeps operating normally.
Let's talk about the practical use of zookeeper:
Scenario 1: A group of servers provides a service to clients (for example, the back end of the distributed website I built earlier is a cluster of four servers that serves the front-end cluster), and on every request we want the client to be able to find a server in that cluster that can provide the service it needs. For this scenario the program must hold a list of servers, and it reads that list on every client request. The list obviously cannot be stored on a single node, because if that node goes down the entire cluster fails; we want the list to be highly available. If one of the servers storing the list breaks, another server can immediately take its place, and the broken server is removed from the list so that it drops out of the running cluster. None of these operations is carried out by the failed server; they are carried out by the healthy servers in the cluster. This is an active distributed data structure: it can modify the state of its data items on its own when external conditions change. The ZooKeeper framework provides this service. It is called the unified naming service, and it is very similar to the JNDI service in Java EE.
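The server list in Scenario 1 can be sketched roughly as follows. This is an in-memory toy in Python, not the ZooKeeper client API: `ToyRegistry`, the session ids, and the addresses are hypothetical. It only models one real ZooKeeper behaviour, namely that an ephemeral node disappears when the session that created it ends.

```python
class ToyRegistry:
    """In-memory stand-in for a directory of ephemeral nodes, one per server."""

    def __init__(self):
        self._members = {}        # server address -> id of the owning session

    def register(self, session_id, address):
        # Models creating an ephemeral node: the entry is tied to the session.
        self._members[address] = session_id

    def expire_session(self, session_id):
        # Models a session timeout: every entry owned by that session vanishes.
        self._members = {a: s for a, s in self._members.items() if s != session_id}

    def list_servers(self):
        # What a client would read before picking a server.
        return sorted(self._members)

registry = ToyRegistry()
addresses = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"]
for session_id, address in enumerate(addresses):
    registry.register(session_id, address)

registry.expire_session(2)        # the third server crashes
live = registry.list_servers()    # the crashed server is no longer listed
print(live)
```

The failed server does nothing here; its entry is removed on its behalf, which matches the "active data structure" idea above.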
Scenario 2: Distributed lock service. A distributed system that manipulates data typically reads it, analyzes it, and finally modifies it. In a distributed system these operations may be spread across different nodes in the cluster, which raises the problem of consistency during the operation; if the data is inconsistent, we get a wrong result. In a single-process program consistency is easy to achieve, but in a distributed system it is much harder, because the operations run in independent processes on different servers, and their intermediate results and progress must be transmitted over the network. ZooKeeper provides a lock service that addresses this problem, allowing us to keep data operations consistent when they are distributed.
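One common way to build such a lock on ZooKeeper is to have every client create a sequentially numbered node and let the lowest number hold the lock. The Python sketch below imitates only that ordering rule in memory; `ToyLock` and the client names are hypothetical, and no real ZooKeeper calls are made.

```python
import itertools

class ToyLock:
    """Toy model of a lock built from sequentially numbered nodes."""

    def __init__(self):
        self._counter = itertools.count()   # source of increasing sequence numbers
        self._waiting = {}                   # sequence number -> client name

    def enqueue(self, client):
        # Models creating a sequential node under the lock path.
        seq = next(self._counter)
        self._waiting[seq] = client
        return seq

    def holder(self):
        # The client with the lowest sequence number owns the lock.
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        # Models deleting the node, either explicitly or because the client died.
        self._waiting.pop(seq, None)

lock = ToyLock()
a = lock.enqueue("client-a")
b = lock.enqueue("client-b")
first = lock.holder()      # client-a arrived first
lock.release(a)
second = lock.holder()     # client-b takes over
print(first, second)
```

In a real deployment each waiting client would watch the node just ahead of its own rather than poll, but that detail is left out of this sketch.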
Scenario 3: Configuration management. In a distributed system we deploy the same service application to n servers, and the configuration files on those servers are identical (for example, in the distributed website framework I designed there are four back-end servers, all running the same program with the same configuration file). If a configuration option changes, we have to change these files one by one. When only a few servers are involved this is not too much trouble, but with a large number of servers, such as a large Internet company's Hadoop cluster with thousands of machines, changing a configuration option becomes troublesome and risky. This is where ZooKeeper comes in handy: we can use it as a highly available configuration store and hand the job over to it. We copy the cluster's configuration file to a node in ZooKeeper's file system, and every server in the distributed system watches that node. As soon as the configuration changes, each server receives a notification from ZooKeeper and synchronizes its copy from the configuration held in ZooKeeper. Updates to a ZooKeeper node are atomic, so a server never reads a half-written configuration.
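The watch-and-reload cycle from Scenario 3 might look like this in miniature. Again this is a Python toy with hypothetical names (`ToyConfigNode`, `Server`, the `timeout=...` strings), not the ZooKeeper API. Watches are modelled as one-shot, as ZooKeeper's standard watches are, so each server re-registers its watch whenever it reloads.

```python
class ToyConfigNode:
    """Toy model of a single node that holds the shared configuration."""

    def __init__(self, data):
        self.data = data
        self.version = 0
        self._watchers = []

    def get(self, watcher=None):
        # Reading can leave a one-shot watch behind.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data, self.version

    def set(self, data):
        # Replace the data in one step, then fire and clear the watches.
        self.data = data
        self.version += 1
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()

class Server:
    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.reload()

    def reload(self):
        # Re-read the configuration and re-register the watch.
        self.config, self.version = self.node.get(watcher=self.reload)

node = ToyConfigNode("timeout=30")
servers = [Server(f"server-{i}", node) for i in range(4)]
node.set("timeout=60")            # one write, every server picks it up
print([s.config for s in servers])
```

The administrator changes one node instead of four files, and the number of servers no longer matters.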
Scenario 4: Fault recovery for distributed systems. Cluster management is difficult, and adding a ZooKeeper service to a distributed system makes the cluster much easier to manage. The most troublesome part of cluster management is handling node failures. ZooKeeper lets the cluster elect a healthy node as the master, and the master knows the current health of every server in the cluster. Once a node fails, the master notifies the other servers so that the computing tasks can be redistributed among the remaining nodes. With ZooKeeper the cluster can not only detect faults but also examine the faulty server to see what kind of fault it has; if the fault is recoverable the system can repair it automatically, and otherwise it can tell the system administrator the cause of the error so that the administrator can quickly locate the problem and repair the node. You may still have a question: what if the master itself fails? ZooKeeper takes this into account as well. It has a built-in leader election algorithm, so the master is chosen dynamically, and when the master fails a new master can be elected right away to manage the cluster.
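As a rough illustration of Scenario 4, the Python sketch below picks a master and redistributes tasks after the master fails. It is a simplification under stated assumptions: the rule "lowest join number wins" and the round-robin assignment are my own choices for the example, and `elect_master`, `assign_tasks`, and the node names are hypothetical rather than part of ZooKeeper.

```python
def elect_master(candidates):
    """candidates maps node name -> sequence number assigned when the node joined."""
    if not candidates:
        return None
    # The node that joined earliest (lowest number) becomes the master.
    return min(candidates, key=candidates.get)

def assign_tasks(tasks, workers):
    """Spread tasks round-robin over the live workers."""
    workers = sorted(workers)
    if not workers:
        raise ValueError("no live workers to assign tasks to")
    plan = {w: [] for w in workers}
    for i, task in enumerate(tasks):
        plan[workers[i % len(workers)]].append(task)
    return plan

cluster = {"node-a": 0, "node-b": 1, "node-c": 2}
tasks = ["t1", "t2", "t3", "t4"]

first_master = elect_master(cluster)   # node-a joined first
del cluster["node-a"]                  # the master crashes
new_master = elect_master(cluster)     # node-b is next in line
plan = assign_tasks(tasks, cluster)    # tasks go to the surviving nodes
print(first_master, new_master, plan)
```

Nothing in the cluster has to be reconfigured by hand when the master disappears; the surviving nodes agree on the next one by the same rule.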
Let's talk about the features of zookeeper:
ZooKeeper is a streamlined file system. In this respect it is a bit like Hadoop's file system, but ZooKeeper's file system manages small files, while Hadoop manages very large files.
ZooKeeper provides a rich set of building blocks from which many coordination data structures and protocols can be built, for example distributed queues, distributed locks, and leader election among a group of peer nodes.
ZooKeeper is highly available and very stable in its own right. A distributed cluster can rely on a ZooKeeper cluster for its management and use ZooKeeper to avoid single points of failure in the distributed system.
ZooKeeper supports loosely coupled interaction. This is most evident in the fact that ZooKeeper can serve as a rendezvous mechanism: participating processes can discover and interact with each other without knowing anything about one another (or about the network) in advance. The participants do not even have to exist at the same time: one process can leave a message in ZooKeeper, and after that process has ended another process can still read the message, which decouples the nodes from each other.
ZooKeeper provides a shared repository for the cluster, from which the cluster can read and write shared information in one central place. This avoids having to program shared operations into every node and reduces the difficulty of developing distributed systems.
ZooKeeper is mainly responsible for storing and managing the data that everyone cares about and for accepting registrations from observers. Once the state of that data changes, ZooKeeper notifies the observers that have registered with it so that they can respond accordingly, which gives the cluster a master/slave style of management.
It can be seen that ZooKeeper is very helpful for distributed system development and can make distributed systems more robust and efficient.
Not long ago I took part in my department's Hadoop interest group and installed Hadoop, MapReduce, Hive, and HBase in a test environment; ZooKeeper had to be installed first, before HBase. A ZooKeeper cluster can provide service as long as more than half of its servers are available. More than half of 3 is 2, so a three-server cluster tolerates one failure; more than half of 4 is 3, so a four-server cluster also tolerates only one failure. Installing three servers therefore gives the same fault tolerance as installing four. While learning Hadoop I found ZooKeeper the hardest subproject to understand, not because it is technically complicated but because I found its application areas very confusing. So my first article on Hadoop technology starts with ZooKeeper and, rather than covering specific technical implementation, looks at its application scenarios. Once the areas where ZooKeeper applies are understood, I think learning ZooKeeper itself will take half the effort for twice the result.
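The majority rule behind the choice of three servers is easy to check with a few lines of Python. The function names `quorum` and `tolerated_failures` are just labels chosen for this example.

```python
def quorum(n):
    """Smallest number of servers that is more than half of n."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many servers can fail while a majority is still available."""
    return n - quorum(n)

# cluster size -> (servers needed for a majority, failures tolerated)
table = {n: (quorum(n), tolerated_failures(n)) for n in range(3, 7)}
for n, (needed, spare) in table.items():
    print(f"{n} servers: majority is {needed}, tolerates {spare} failure(s)")
```

A cluster of 3 and a cluster of 4 both tolerate exactly one failure, and the same holds for 5 and 6 with two failures, which is why odd cluster sizes are the usual choice.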
The reason I wanted to talk about ZooKeeper today is to fill a gap in the distributed website framework from my previous article. Although I designed the website architecture as a distributed structure and built a simple fault-handling mechanism into it, such as a heartbeat mechanism, it still cannot deal with single points of failure in the cluster: if a server breaks, clients will keep trying to connect to it, which blocks some requests and wastes server resources. However, I do not want to modify my framework for now, because I have the feeling that adding a ZooKeeper service to the existing services would affect the efficiency of the website. Fortunately, our department has noticed the same problem and is going to develop a powerful remote call framework that separates cluster management from communication management and provides efficient, highly available services centrally.
Reposted from http://www.cnblogs.com/sharpxiajun/archive/2013/06/02/3113923.html