This article is a mirror article of machine translation, please click here to jump to the original article.

View: 13077|Reply: 1

Alibaba: Get started with RocketMQ in ten minutes

[Copy link]
Posted on 7/28/2017 8:26:52 PM | | | |
This article first leads to what problems message middleware usually needs to solve, what difficulties will be encountered in solving these problems, whether Apache RocketMQ can be solved as a high-performance, high-throughput distributed message middleware open source by Alibaba, and how these problems are defined in the specification. This article will then introduce the architecture design of RocketMQ, in order to give readers a quick understanding of RocketMQ.
1. What problems does message middleware need to solve? Publish/Subscribe is the most basic function of message middleware and is also relative to traditional RPC communication. I won't go into detail here.
The priority described in the Message Priority specification refers to a message queue, each message has a different priority, generally described by integers, the message with high priority is delivered first, if the message is completely in a memory queue, then it can be sorted according to the priority before delivery, so that the high priority is delivered first.
Since all messages in RocketMQ are persistent, if they are sorted according to priority, the overhead will be very large, so RocketMQ does not specifically support message priority, but can implement similar functions in a workaround, that is, configure a queue with a high priority and a queue with a normal priority, and send different priorities to different queues.
For priority issues, they can be summarized into 2 categories:
  • As long as the priority is achieved, it is not a priority in the strict sense, and the priority is usually divided into high, medium, low, or several more levels. Each priority can be represented by a different topic, and when sending a message, specify different topics to represent the priority, which can solve most of the priority problems, but compromise the accuracy of business priorities.
  • Strict priority, priority is expressed as an integer, such as 0 ~ 65535, this kind of priority problem is generally not suitable to be solved with different topics. If you want MQ to solve this problem, it will have a very big impact on MQ's performance. Here's a point to make sure that the business really needs this strict prioritization, and if the priorities are compressed into few, how much impact will it have on the business?
Message Order refers to a type of message that can be consumed in the order in which it is sent. For example, an order generates 3 messages, namely order creation, order payment, and order completion. When consuming, it is meaningful to consume in this order. But at the same time, orders can be consumed in parallel.
RocketMQ can strictly ensure that messages are orderly.
Message FilterBroker Message filtering
In Broker, filtering according to the requirements of the consumer has the advantage of reducing the transmission of unnecessary messages to the consumer.
The disadvantage is that it increases the burden on the broker and is relatively complex to implement.
1. Taobao Notify supports a variety of filtering methods, including direct filtering by message type and flexible syntax expression filtering, which can meet almost the most demanding filtering needs.
2. Taobao RocketMQ supports filtering by simple Message Tag, as well as Message Header and body.
3. Flexible syntax expression filtering is also supported in the CORBA Notification specification.
Consumer-side message filtering
This filtering can be fully customized by the application, but the downside is that a lot of useless messages are sent to the consumer.
There are several common persistence methods used by message persistence:
  • Persist to a database, such as Mysql.
  • Persist to KV storage, such as levelDB, Berkeley DB, and other KV storage systems.
  • Persistence in the form of file records, such as Kafka, RocketMQ
  • Make a persisted image of the memory data, such as beanstalkd, VisiNotify
  • (1), (2), and (3) all three persistence methods have the ability to extend the memory queue buffer, and (4) are just a memory image, which can still restore the data from the previous memory after the broker hangs up and restarts.
The JMS and CORBA Notification specifications do not explicitly specify how to persist, but the performance of the persistence part directly determines the performance of the entire message middleware.
RocketMQ makes full use of the Linux file system memory cache to improve performance.
There are several situations in which Message Reliablity affects message reliability:
  • Broker closes normally
  • Broker crash
  • OS Crash
  • The machine loses power, but the power supply can be restored immediately.
  • The machine won't turn on (it may be damaged to key devices such as CPU, motherboard, memory, etc.)
  • Disk device damage.
(1), (2), (3), and (4) are all situations where hardware resources can be recovered immediately, and RocketMQ can ensure that messages are not lost or a small amount of data is lost (depending on whether the flashing method is synchronous or asynchronous).
(5) (6) It is a single point of failure and cannot be recovered, once it occurs, all messages on this single point are lost. In both cases, RocketMQ ensures that 99% of messages are not lost through asynchronous replication, but there are still very few messages that may be lost. Synchronous dual write technology can completely avoid single points, which will inevitably affect performance, making it suitable for situations with extremely high message reliability requirements, such as Money-related applications.
RocketMQ supports synchronous dual writing starting from version 3.0.
Low Latency Messaging can reach the consumer immediately after the message reaches the broker without the accumulation of messages.
RocketMQ uses a long polling pull method to ensure that the message is very real-time, and the real-time message is not lower than that of push.
At least once means that each message must be delivered once.
RocketMQ Consumer first pulls the message to the local area, and then returns the ack to the server after the consumption is completed.
Exactly Only Once
  • The sending message stage does not allow sending duplicate messages.
  • In the Consume Message stage, duplicate messages are not allowed to be consumed.
Only when the above two conditions are met can the message be considered "Exactly Only Once", and to achieve the above two points, huge overhead will inevitably be generated in the distributed system environment. Therefore, in order to pursue high performance, RocketMQ does not guarantee this feature, and requires deduplication in the business, which means that consumer messages must be idempotent. Although RocketMQ cannot strictly guarantee non-duplication, under normal circumstances, there are rarely repeated sending and consumption, only network abnormalities, consumer start and stop, and other abnormal situations such as message duplication.
The essential reason for this problem is that there is uncertainty in network calls, that is, the third state of neither success nor failure, so the problem of message repetition arises.
What should I do if Broker's Buffer is full? The broker's buffer usually refers to the memory buffer size of a queue in the broker, which is usually limited in size, what if the buffer is full?
Here's how it's handled in the CORBA Notification specification:
  • RejectNewEvents rejects the new message and returns the RejectNewEvents error code to the Producer.
  • Discard existing messages according to a specific policy
    • AnyOrder - Any event may be discarded on overflow. This is the default setting for this property.
    • FifoOrder - The first event received will be the first discarded.
    • LifoOrder - The last event received will be the first discarded.
    • PriorityOrder - Events should be discarded in priority order, such that lower priority events will be discarded before higher priority events.
    • DeadlineOrder - Events should be discarded in the order of shortest expiry deadline first.

RocketMQ does not have the concept of memory buffer, and the queues of RocketMQ are persistent disks, and the data is cleared regularly.
For the solution to this problem, RocketMQ has a very significant difference from other MQs, RocketMQ's memory Buffer is abstracted into an infinite length queue, no matter how much data comes in, it can be installed, this infinity is premised, the broker will regularly delete expired data, for example, the broker only saves 3 days of messages, then although the length of this Buffer is infinite, but the data from 3 days ago will be deleted from the end of the queue.
Retrospective consumption refers to the message that the consumer has successfully consumed, and the message needs to be re-consumed due to business demand. For example, due to the failure of the consumer system, the data from 1 hour ago needs to be reconsumed after recovery, then the broker should provide a mechanism to revert the consumption progress according to the time dimension.
RocketMQ supports retrospective consumption based on time, with a time dimension accurate to milliseconds, which can be backtracked forward or backward.
The main function of message stacking message middleware is asynchronous decoupling, and another important function is to block the data flood peak of the front-end and ensure the stability of the back-end system, which requires the message middleware to have a certain message stacking ability, and the message heap integrates the following two situations:
  • Messages are piled up in memory buffers, and once they exceed the memory buffer, messages can be dropped according to a certain drop policy, as described in the CORBA Notification specification. It is suitable for services that can tolerate discarding messages, in this case, the accumulation capacity of messages mainly lies in the size of the memory buffer, and the performance degradation will not be too large after the message is stacked, because the amount of data in memory has a limited impact on the access ability provided to the outside world.
  • Messages are piled up in persistent storage systems such as DB, KV storage, file record form. When messages cannot be hit in the memory cache, it is inevitable to access the disk, which will generate a large amount of read IO, and the throughput of read IO directly determines the access ability of messages after they are piled up.
There are four main points to evaluate the message accumulation ability:
  • How many messages can be piled up, how many bytes? That is, the heap capacity of the message.
  • After a message is piled up, is the throughput of the message affected by the stacking?
  • Will the normal consumption of consumers be affected after the message pile up?
  • After the messages are piled up, what is the throughput when accessing messages piled up on disk?
Distributed Transactions Several known distributed transaction specifications, such as XA, JTA, etc. Among them, the XA specification is widely supported by major database vendors, such as Oracle, Mysql, etc. Among them, XA's TM implementation leader such as Oracle Tuxedo is widely used in finance, telecommunications and other fields.
Distributed transactions involve two-stage commit problems, and in terms of data storage, KV storage must be supported, because the second stage of commit rollback needs to modify the message state, which must involve the action of finding the message according to the key. RocketMQ bypasses the problem of finding the message according to the key in the second stage, using the first stage to send the prepared message, getting the offset of the message, and the second stage to access the message through the offset and modify the state, the offset is the address of the data.
RocketMQ's transaction implementation method is not done through KV storage, but through the offset method, which has a significant flaw, that is, changing data through offset will cause too many dirty pages in the system, which requires special attention.
Scheduled messages Scheduled messages mean that messages cannot be consumed by consumers immediately after they are sent to the broker, and can only be consumed at a specific time point or after waiting for a specific time.
If you want to support arbitrary time accuracy, at the broker level, you must do message sorting, and if persistence is involved, then message sorting will inevitably incur huge performance overhead.
RocketMQ supports timing messages, but does not support arbitrary time accuracy, and supports specific levels, such as timing 5s, 10s, 1m, etc.
Message Retry After the consumer fails to consume the message, provide a retry mechanism to make the message consume again. Consumer consumption message failures can usually be considered in the following situations:
  • Due to the reason of the message itself, such as deserialization failure, the message data itself cannot be processed (such as phone bill recharge, the mobile phone number of the current message is logged out, cannot be recharged), etc. This error usually requires skipping this message and consuming other messages, and this failed message is 99% unsuccessful even if the consumption is retried immediately, so it is best to provide a timed retry mechanism, that is, retry after 10 seconds.
  • Because the dependent downstream application services are unavailable, such as the DB connection is unavailable, the external system network is unreachable, etc. When encountering this error, even if the current failed message is skipped, other messages will also be consumed. In this case, it is recommended to apply sleep 30s and consume the next message, which can reduce the pressure on the broker to retry the message.
RocketMQ OverviewLet's find out if RocketMQ solves the problems faced by the message middleware mentioned above.

What is RocketMQ?
The above figure is a typical model of message middleware sending and receiving messages, RocketMQ is also designed in this way, in short, RocketMQ has the following characteristics:
  • It is a queue model message middleware with high performance, high reliability, high real-time, and distributed characteristics.
  • Producer, Consumer, and Queue can all be distributed.
  • Producer sends messages to some queues in turn, the queue collection is called Topic, Consumer If broadcast consumption, one consumer instance consumes all queues corresponding to this topic, and if cluster consumption, multiple consumer instances consume the queue collection corresponding to this topic evenly.
  • Strict message order can be guaranteed
  • Provides rich message pull modes
  • Efficient horizontal subscriber scaling capabilities
  • Real-time message subscription mechanism
  • Hundreds of millions of messages accumulation capacity
  • Less dependency

RocketMQ physical deployment structure

As shown in the figure above, the deployment structure of RocketMQ has the following characteristics:
  • Name Server is a virtually stateless node that can be deployed in clusters without any information synchronization between nodes.
  • The deployment of Broker is relatively complex, Broker is divided into Master and Slave, a Master can correspond to multiple Slaves, but a Slave can only correspond to one Master, the correspondence between Master and Slave is defined by specifying the same BrokerName, different BrokerId, BrokerId is 0 for Master, and non-0 means Slave. Masters can also be deployed in multiple. Each broker establishes a long connection with all nodes in the Name Server cluster and registers topic information to all Name Servers at regular intervals.
  • The producer establishes a long connection with one of the nodes in the Name Server cluster (randomly selected), periodically retrieves topic routing information from the Name Server, establishes a long connection to the master that provides the topic service, and sends heartbeats to the master at regular intervals. Producer is completely stateless and can be deployed in clusters.
  • The consumer establishes a long connection with one of the nodes in the Name Server cluster (randomly selected), regularly retrieves topic routing information from the Name Server, and establishes a long connection to the Master and Slave who provide the topic service, and sends heartbeats to the Master and Slave at regular intervals. Consumers can subscribe to messages from both Master and Slave, and the subscription rules are determined by the Broker configuration.

RocketMQ logical deployment structure

As shown in the figure above, the logical deployment structure of RocketMQ has two characteristics: Producer and Consumer.
  • Producer Group
Used to represent a messaging application, a Producer Group contains multiple Producer instances, which can be multiple machines, multiple processes of a machine, or multiple Producer objects of a process. A Producer Group can send multiple Topic messages, and the Producer Group functions as follows:
  • Identify a type of Producer
  • You can query that there are multiple Producer instances in this messaging application through the O&M tool
  • When sending a distributed transaction message, if the producer goes down unexpectedly, the broker will actively call back any machine in the producer group to confirm the transaction status.
  • Consumer Group
Used to represent a consumer messaging application, a consumer group contains multiple consumer instances, which can be multiple machines, multiple processes, or multiple consumer objects of a process. Multiple Consumers in a Consumer Group consume messages in an evenly distributed manner, and if set to broadcast, each instance under this Consumer Group consumes the full amount of data.

RocketMQ data storage structure

As shown in the figure above, RocketMQ adopts a storage method that separates data from indexes. Effectively reduce the loss of file resources, IO resources, and memory resources. Even with massive data like Alibaba, high-concurrency scenarios can effectively reduce end-to-end latency and have strong horizontal scaling capabilities.






Previous:Unknown: Input variables exceeded 1000. To increase the limit change max_inpu...
Next:July 2017 WIN7\XP· GHOST System Download Encyclopedia! Updates continue, exciting!
Posted on 7/29/2017 8:09:37 AM |
ganxiefenxiang thanks for sharing
Disclaimer:
All software, programming materials or articles published by Code Farmer Network are only for learning and research purposes; The above content shall not be used for commercial or illegal purposes, otherwise, users shall bear all consequences. The information on this site comes from the Internet, and copyright disputes have nothing to do with this site. You must completely delete the above content from your computer within 24 hours of downloading. If you like the program, please support genuine software, purchase registration, and get better genuine services. If there is any infringement, please contact us by email.

Mail To:help@itsvse.com