Posted on 12/25/2014 4:18:52 PM

Background and needs
The China Railway Customer Service Center website (www.1230**), commonly called 12306, is one of the world's largest real-time transaction systems, comparable to Amazon.com, and it comes under heavy load during holidays, especially the Spring Festival. According to statistics, at the peak of the Spring Festival travel season in early 2012, 20 million people visited the site each day, with up to 1.4 billion hits per day. The volume of simultaneous requests nearly paralyzed 12306. The Institute of Electronic Computing Technology of the Chinese Academy of Railway Sciences, the contractor for the 12306 Internet ticketing system, urgently needed a solution.
Successful resolution: more than 75 times faster
Starting in March 2012, the Railway Corporation (formerly the Ministry of Railways) began investigating how to overhaul 12306. In June 2012, the Pivotal GemFire distributed in-memory computing platform was selected for the overhaul. The solution was provided by Wang Mingzhe, head of the Academy of Railway Sciences project team, and IISI Information Technology Co., Ltd., under the leadership of Zhu Jiansheng, director at the Academy of Railway Sciences. The first phase targeted the main bottleneck of 12306: the remaining-ticket query system. The code changes were completed in September and the system went live. During the online booking peak around National Day 2012, the difference was noticeable: users could log in to 12306, and although booking a ticket was still difficult, remaining-ticket queries were very fast. In October 2012, the second phase began, rebuilding the order query system (where customers look up their own order records) on GemFire. During the online booking peak of the 2013 Spring Festival, users could again log in to 12306; booking was still difficult, but remaining-ticket queries and queries of one's own bookings and orders were both very fast.
According to system operation records, after the overhaul only 10 x86 servers delivered the remaining-ticket calculation and query capacity that had previously required dozens of minicomputers, and the maximum time for a single query dropped from about 15 seconds to under 0.2 seconds, a speedup of more than 75 times. Under the extremely high concurrent traffic of the 2012 Spring Festival, the old system was almost paralyzed. After the overhaul, the system supports tens of thousands of concurrent queries per second, reaching a throughput of 26,000 queries per second at peak, and the efficiency of the whole system is markedly higher, as shown in the image above.
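The figures in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the per-server figure assumes the peak load was spread evenly across the 10 servers, which the source does not state.

```python
# Figures quoted in the text above.
old_max_query_s = 15.0   # about 15 seconds per query before the overhaul
new_max_query_s = 0.2    # under 0.2 seconds per query after the overhaul
peak_qps = 26_000        # peak queries per second after the overhaul
servers = 10             # x86 servers used for remaining-ticket queries

# Ratio of the old worst-case query time to the new one.
speedup = old_max_query_s / new_max_query_s

# Assumption (not from the source): load is balanced evenly across servers.
qps_per_server = peak_qps / servers

print(f"speedup: about {speedup:.0f}x")
print(f"average load per server: {qps_per_server:.0f} queries/s")
```

Because 0.2 seconds is an upper bound on the new query time, 75 is a lower bound on the speedup, which matches the "more than 75 times" wording.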
Before the overhaul, the order query system could sustain only 300-400 queries per second, and high-traffic concurrent querying could be achieved only by splitting the database. After the overhaul, throughput reaches tens of thousands of queries per second, with query times held at about 20 milliseconds.
The new architecture also scales elastically on demand: when concurrency increases, more x86 servers can be added dynamically to keep response times at the millisecond level.
A long-sought answer: a technological leap across three generations in one step
12306 could not have achieved such dramatic results through small technical patches; it needed a new approach that gives real leverage on performance. 12306 found that the GemFire distributed in-memory data platform is one such technology.
The technical principle of the GemFire distributed in-memory data platform is shown in the figure above: through the virtualization technology of the cloud computing platform, the memory of several x86 servers is pooled into a memory resource pool of up to tens of terabytes, and all data is loaded into memory for in-memory computation. The computation itself does not read from or write to disk; data is only written to disk periodically, either synchronously or asynchronously. GemFire stores multiple copies of the data across the distributed cluster, so if any machine fails, backups exist on other machines. Data loss is therefore usually not a concern, and the data on disk serves as a further backup. GemFire supports persisting in-memory data to a variety of traditional relational databases, Hadoop stores, and other file systems.
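The ideas in the paragraph above (data served from memory, several copies per entry, periodic writes to disk) can be sketched in a few lines. This is a toy model under stated assumptions, not GemFire's API: `TinyGrid`, `Node`, the key format, and the hash-based placement are all hypothetical names and choices made for illustration.

```python
import io
import json
import zlib


class Node:
    """One server in the cluster: holds its share of the data in memory."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.alive = True


class TinyGrid:
    """Hypothetical toy data grid; it only mimics the ideas described above."""

    def __init__(self, node_names, copies=2):
        self.nodes = [Node(n) for n in node_names]
        self.copies = copies
        self.dirty = {}  # changes not yet written to disk

    def owners(self, key):
        # Primary is picked by a stable hash; backups are the next nodes in order.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.copies)]

    def put(self, key, value):
        # Write every copy in memory, then queue the change for a later flush.
        for node in self.owners(key):
            if node.alive:
                node.memory[key] = value
        self.dirty[key] = value

    def get(self, key):
        # Reads never touch disk; a backup answers if the primary is down.
        for node in self.owners(key):
            if node.alive and key in node.memory:
                return node.memory[key]
        raise KeyError(key)

    def flush(self, sink):
        # Periodic write-behind: persist queued changes, then clear the queue.
        for key, value in self.dirty.items():
            sink.write(json.dumps({"key": key, "value": value}) + "\n")
        written = len(self.dirty)
        self.dirty.clear()
        return written


grid = TinyGrid(["node-a", "node-b", "node-c"], copies=2)
key = "G101|2013-02-08"                # hypothetical train/date key
grid.put(key, {"seats_left": 42})

grid.owners(key)[0].alive = False      # simulate the primary server failing
survivor_value = grid.get(key)         # answered from the backup copy

disk = io.StringIO()                   # stands in for a file on disk
flushed = grid.flush(disk)
```

In this sketch the read path involves only dictionaries in memory, and the disk is touched only when `flush` is called, which is the separation the paragraph describes.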
As is well known, the bottleneck of current computing architectures is storage: processor speed has doubled in line with Moore's Law, while disk storage speed has grown very slowly, resulting in a gap of up to 100,000 times (as shown in the figure above). This makes it easy to see why GemFire can improve system performance so much.
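To get a feel for a gap of this size, the sketch below compares one main-memory access with one random disk seek. Both latencies are assumed round order-of-magnitude figures for illustration, not measurements from 12306 and not data taken from the figure.

```python
# Assumed, order-of-magnitude latencies (not from the source).
memory_access_s = 100e-9   # roughly 100 nanoseconds per main-memory access
disk_seek_s = 10e-3        # roughly 10 milliseconds per random disk seek

gap = disk_seek_s / memory_access_s
print(f"disk seek vs memory access: about {gap:,.0f}x slower")

# Cost of 1,000 random lookups served from each medium, one after another.
lookups = 1_000
print(f"from memory: {lookups * memory_access_s * 1e3:.1f} ms")
print(f"from disk:   {lookups * disk_seek_s:.1f} s")
```

With these assumptions, a query that needs a thousand random lookups takes a fraction of a millisecond from memory and several seconds from disk.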
According to the relationship between computing and storage, we can divide the computing architecture into four generations:
The first generation, a single disk-based system: data must be read from disk during computation. Minicomputers and mainframes are the leaders of this generation, pushing the performance of a single system to its limit.
The second generation, a disk-based distributed cluster system: data must still be read from disk during computation, but a distributed system spreads the data across the disks of different servers to raise the processing capacity of the system as a whole. Many large Internet and e-commerce companies currently use distributed clusters of x86 servers, relying on massive x86 deployments to handle high concurrent traffic.
The third generation, a single memory-based system: the entire database is held in memory, so computation does not need to read data from disk. Overall performance is bounded by the performance of a single system. Traditional in-memory databases belong to this generation; they solve the access-speed problem well for enterprise applications but cannot address the scalability problems of massive data volumes or massive concurrent access.
The fourth generation, a memory-based distributed cluster system: GemFire is such a system. Parallel computing is one of its key technologies, so on top of in-memory computing it can scale performance linearly by adding servers.
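The fourth-generation idea, partitioning in-memory data across servers and running each query on all partitions in parallel, can be sketched as a scatter-gather query. This is an illustration only: the record fields, station names, and function names are hypothetical, and threads stand in for separate servers, so the sketch shows the structure of the technique and not real speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_servers):
    """Spread records over n_servers in-memory partitions by a stable hash."""
    parts = [[] for _ in range(n_servers)]
    for rec in records:
        index = zlib.crc32(rec["train"].encode("utf-8")) % n_servers
        parts[index].append(rec)
    return parts


def local_query(part, origin, dest):
    """Runs on one server, scanning only that server's partition."""
    return [r for r in part
            if r["origin"] == origin and r["dest"] == dest and r["seats"] > 0]


def parallel_query(parts, origin, dest):
    """Scatter the query to every partition, then gather and merge results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(lambda p: local_query(p, origin, dest), parts))
    merged = [r for partial in partials for r in partial]
    return sorted(merged, key=lambda r: r["train"])


# Hypothetical sample data: 40 trains, alternating destinations.
records = [
    {"train": f"G{i}", "origin": "Beijing",
     "dest": "Shanghai" if i % 2 == 0 else "Guangzhou", "seats": i % 5}
    for i in range(100, 140)
]

one = parallel_query(partition(records, 1), "Beijing", "Shanghai")
four = parallel_query(partition(records, 4), "Beijing", "Shanghai")
largest_share = max(len(p) for p in partition(records, 4))
```

The answer is the same with one partition or four, but each server scans only its own share of the records. Adding servers shrinks that share, which is the intuition behind scaling performance by growing the cluster.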
12306 previously ran on a Unix minicomputer architecture and used GemFire to move to a Linux/x86 server cluster architecture, which means it leapt across three generations at once. Moving from minicomputers to a cluster of large-memory x86 servers not only improves performance by an order of magnitude but also costs much less.
GemFire is part of Pivotal's enterprise-grade big data PaaS platform, which has three main layers: the cloud infrastructure layer (Cloud Fabric), the big data infrastructure layer, and the application development infrastructure layer (Application Fabric). GemFire belongs to the big data infrastructure layer, as does the Greenplum database. The technology of the cloud infrastructure layer is Cloud Foundry, and the technologies of the application development infrastructure layer include Spring Framework and RabbitMQ, among others.
Regarding the introduction of GemFire, Zhu Jiansheng, deputy director of the Institute of Electronic Computing Technology of the Chinese Academy of Railway Sciences, said: "Through this technical overhaul, we have solved the peak-traffic concurrency problem that had plagued us for a long time, so that people across the country no longer have reason to complain about technical problems, and we can finally breathe a sigh of relief. Pivotal GemFire's distributed cluster in-memory data technology played a key role in the whole overhaul. At the same time, thanks to the efforts of Pivotal and its project team, the old system kept running smoothly and the migration from the old system to the new one went smoothly during development, so the new system could be launched quickly."