

A primer on tools for large-scale architectures


I.1 Java Spring Boot

Spring Boot is well suited to building microservice systems:

  • The Spring Initializr bootstrap page can scaffold a project in seconds
  • Services are easy to expose in many forms, such as REST APIs, WebSocket, Web, Streaming, and Tasks
  • Very concise security-policy integration
  • Both relational and non-relational databases are supported
  • Embedded containers such as Tomcat and Jetty are supported at runtime
  • A powerful development toolkit with support for hot restart
  • Automatic dependency management
  • Built-in application monitoring
  • Supports the major IDEs, such as IntelliJ IDEA and NetBeans

Alternatives in other languages: .NET Core, Go, etc.
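
As a taste of how little code a Spring Boot microservice needs, here is a minimal sketch of a REST service, assuming only the spring-boot-starter-web dependency is on the classpath:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class DemoApplication {

    // Exposes GET /hello as a REST endpoint; the embedded Tomcat
    // container is started automatically on port 8080.
    @GetMapping("/hello")
    public String hello() {
        return "Hello from Spring Boot";
    }

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}
```

Running main starts the embedded container and serves the endpoint with no external application server or XML configuration.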


I.2 Jenkins

A continuous integration (CI) automation server

  • Open source and free
  • Cross-platform, supporting all major platforms (the author installed it on Ubuntu 14.04; running the Jenkins Docker image there was not successful)
  • Distributed builds via master/slave
  • A web-based visual management page
  • Very easy installation and configuration
  • Timely, helpful tips in the interface
  • Hundreds of existing plugins



I.3 GitLab

  • A self-hosted Git repository service that can be accessed through a web interface, for public or private projects.
  • It offers functionality similar to GitHub: browsing source code, managing bugs and comments, controlling team access to repositories, navigating committed versions, and viewing file history.
  • Team members can communicate using the built-in chat program (Wall). It also provides a code-snippet collection feature for easy reuse and later retrieval.

Docker

Docker is an open-source engine that makes it easy to create a lightweight, portable, self-sufficient container for any application. Containers that developers build and test on a laptop can be deployed in batches to production environments, including VMs (virtual machines), bare metal, OpenStack clusters, and other underlying platforms.

Docker is commonly used in the following scenarios:

  • automated packaging and publishing of web applications;
  • automated testing and continuous integration/release;
  • deploying and tuning databases or other backend applications in a service-oriented environment;
  • building a custom PaaS environment, either from scratch or by extending an existing platform such as OpenShift or Cloud Foundry.



I.4 Kubernetes

  • Kubernetes is a container cluster management system: an open-source platform providing automated deployment, automatic scaling, and maintenance of container clusters.
  • With Kubernetes you can:
  • deploy applications quickly
  • scale applications quickly
  • roll out new application features seamlessly
  • save resources by optimizing the use of hardware



I.5 MQ

When the speed or stability of "production" and "consumption" differ within a system, a message queue is needed as an abstraction layer to bridge the two sides. A message is a unit of data transmitted between two computers; it can be very simple, such as a plain text string, or more complex, containing embedded objects. Messages are sent to queues, which are containers that hold messages during transmission. Message queues bring the following benefits (see the sketch after this list):

  • Decoupling
  • Redundancy
  • Scalability
  • Flexibility & peak handling
  • Recoverability
  • Delivery guarantees
  • Ordering guarantees
  • Buffering
  • Insight into data flows
  • Asynchronous communication
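
The sketch below illustrates the decoupling, buffering, and asynchronous-communication points with a plain in-memory BlockingQueue; a real deployment would use a broker such as Kafka or RabbitMQ, but the shape is the same:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // The queue is the abstraction layer between producer and consumer;
        // its bounded capacity buffers bursts (peak handling).
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        // Producer: hands messages to the queue and moves on (asynchronous,
        // decoupled from how fast consumption happens).
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try {
                    queue.put("order-" + i); // blocks only if the buffer is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        // Consumer: drains messages at its own pace.
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try {
                    System.out.println("processed " + queue.take());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```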


I.6 SQL DB

  • A database is a warehouse built on computer storage devices that organizes, stores, and manages data according to a data structure.
  • Put simply, it can be regarded as an electronic filing cabinet, a place to store electronic files, in which users can add, retrieve, update, and delete data.
  • In everyday business management, related data often needs to be put into such a "warehouse" and processed according to management needs.


MySQL and PostgreSQL are representatives of traditional relational databases.

HBase is a representative of Bigtable technology (row indexing, column-oriented storage).

Neo4j (http://www.neo4j.org/) is a representative graph database, used to store complex, multi-dimensional graph-structured data.

Redis is a representative key-value NoSQL database; hosted services such as Redis To Go provide it as a service.

MongoDB and CouchDB are representative document-oriented NoSQL databases, and Couchbase fuses document and key-value technology.

VoltDB is a representative of NewSQL, offering data consistency and good scalability; it claims performance dozens of times that of MySQL.

TiDB is a distributed SQL database developed by the Chinese company PingCAP. Inspired by Google's F1 and Spanner, TiDB supports features of both traditional RDBMSs and NoSQL.

I.7 TICK stack

InfluxDB

A time-series database.

Telegraf

Telegraf is a data collection agent. It provides many input and output plugins, for example collecting local CPU, load, and network-traffic metrics and writing them to InfluxDB or Kafka.

Chronograf

A graphing and visualization tool for the stack.

Kapacitor

Kapacitor is the alerting tool from InfluxData; it reads data from InfluxDB and evaluates alert rules defined in TICKscript, a DSL.

I.8 Keepalived

Keepalived is cluster-management software that ensures high availability of a cluster; similar to Heartbeat, it prevents single points of failure.

Keepalived is based on VRRP, the Virtual Router Redundancy Protocol.

VRRP can be regarded as a protocol for making routers highly available: N routers providing the same function form a router group with one master and several backups. The master holds a VIP (the other machines on the router's LAN use this VIP as their default route) and periodically sends VRRP multicast packets. When the backups stop receiving these packets, the master is considered down, and a new master is elected from the backups according to VRRP priority. This keeps the router group highly available.

Keepalived has three main modules: Core, Check, and VRRP. The Core module is the heart of Keepalived, responsible for starting and maintaining the main process and for loading and parsing the global configuration file. Check performs health checks, including the common checking methods. The VRRP module implements the VRRP protocol.
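
The election rule is simple enough to sketch. The toy Java below is illustrative only (it is not Keepalived code, which is written in C): among the routers still sending VRRP packets, the highest priority wins the master role and the VIP.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class VrrpElection {
    // A router in the VRRP group: "alive" means its VRRP packets are still seen.
    record Router(String name, int priority, boolean alive) {}

    // The VRRP rule described above: highest-priority reachable router is master.
    static Optional<Router> electMaster(List<Router> group) {
        return group.stream()
                .filter(Router::alive)
                .max(Comparator.comparingInt(Router::priority));
    }

    public static void main(String[] args) {
        List<Router> group = List.of(
                new Router("router-a", 150, false), // former master, now down
                new Router("router-b", 100, true),
                new Router("router-c", 50, true));
        // router-b takes over the VIP after router-a fails.
        electMaster(group).ifPresent(m -> System.out.println(m.name() + " owns the VIP"));
    }
}
```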


I.9 Harbor

Harbor is an enterprise-grade registry server for storing and distributing Docker images.

I.10 Ignite / Redis

Apache Ignite is a high-performance, integrated, distributed in-memory computing and transactional platform for processing large-scale datasets. It delivers higher performance than traditional disk- or flash-based technologies while providing high-performance, distributed in-memory data management between applications and different data sources.


Comparison of Apache Ignite and Redis:

1. JCache (JSR 107)
   Ignite: fully compliant with the JCache (JSR 107) caching specification.
   Redis: not supported.

2. ACID transactions
   Ignite: fully supports ACID transactions, including optimistic and pessimistic concurrency models and the READ_COMMITTED, REPEATABLE_READ, and SERIALIZABLE isolation levels.
   Redis: provides limited support for client-side optimistic transactions, which require the client to retry the transaction manually on concurrent updates.

3. Data partitioning
   Ignite: supports partitioned caches, similar to a distributed hash table, where each node in the cluster stores a portion of the data; Ignite automatically rebalances data when the topology changes.
   Redis: does not provide partitioning itself, only sharding with replicas, which is rigid to use: whenever the topology changes, a series of rather complex manual steps is needed on both client and server.

4. Full replication
   Ignite: supports replicated caches, where every key-value pair is stored on every node in the cluster.
   Redis: does not provide direct support for full replication.

5. Native objects
   Ignite: lets users use their own domain object model, with native support for any Java/Scala, C++, and .NET/C# data type (object), so any program or domain object can easily be stored in an Ignite cache.
   Redis: does not allow custom data types; it supports only a predefined set of basic data structures, such as Set, List, Array, and a few others.

6. Client-side (near) cache
   Ignite: provides direct support for client-side caching of recently accessed data.
   Redis: not supported.

7. Server-side collocation (compute close to data)
   Ignite: supports executing any Java, C++, and .NET/C# code directly on the server side, collocated with the data.
   Redis: has essentially no collocation capabilities; the server side supports only Lua scripting and does not directly execute Java, .NET, or C++ code.

8. SQL queries
   Ignite: supports full SQL (ANSI-99) syntax for querying in-memory data.
   Redis: does not support any query language, only a client-side caching API.

9. Continuous queries
   Ignite: supports continuous queries on both the client and server side; users can set server-side filters to reduce the number of events sent to clients.
   Redis: supports client-side key-based event notifications, but without server-side filters, which significantly increases the network traffic for update notifications.

10. Database integration
    Ignite: can automatically integrate with external databases: RDBMS, NoSQL, and HDFS.
    Redis: cannot be integrated with external databases.
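
As a small illustration of item 1 (the JCache-style API), the sketch below starts a local Ignite node and uses a cache; it assumes the ignite-core artifact is on the classpath, and the cache name is illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class IgniteCacheDemo {
    public static void main(String[] args) {
        // Start a local Ignite node with default configuration.
        try (Ignite ignite = Ignition.start()) {
            // getOrCreateCache returns a JCache-compliant key-value view.
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("demo");
            cache.put(1, "Hello");
            cache.put(2, "Ignite");
            System.out.println(cache.get(1) + " " + cache.get(2));
        } // try-with-resources stops the node on exit
    }
}
```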



I.11 ELK

ELK consists of three components: Elasticsearch, Logstash, and Kibana.

Elasticsearch is an open-source distributed search engine; its features include a distributed architecture, zero configuration, auto-discovery, automatic index sharding, an index replica mechanism, a RESTful interface, multiple data sources, and automatic search load balancing.

Logstash is a fully open-source tool that collects, parses, and stores your logs for later use.

Kibana is a free, open-source tool that provides a log-analytics-friendly web interface for Logstash and Elasticsearch, helping you aggregate, analyze, and search important log data.



I.12 Kong (Nginx)

Kong is a highly available, easily extensible API gateway built on Nginx and its Lua module, open-sourced by Mashape. Because Kong is based on Nginx, multiple Kong servers can be scaled horizontally, with front-end load balancing distributing requests evenly across them to cope with large volumes of network requests.

Kong has three main components:

Kong Server: an Nginx-based server that receives API requests.

Apache Cassandra/PostgreSQL: used to store operational data.

Kong Dashboard: the officially recommended UI management tool; Kong can also be managed through its RESTful Admin API.

Kong uses a plugin mechanism for functional customization: a set of plugins (zero or more) is executed during the lifecycle of an API request/response. Plugins are written in Lua and currently cover several basic functions: HTTP basic authentication, key authentication, CORS (Cross-Origin Resource Sharing), TCP and UDP logging, file logging, API request throttling, request forwarding, and Nginx monitoring.



I.13 OpenStack

OpenStack + KVM

OpenStack: open-source management project

OpenStack is an open-source project that aims to provide software for building and managing public and private clouds. It is not a single piece of software but a combination of several main components, each doing a specific job. OpenStack is made up of the following five relatively independent components:

OpenStack Compute (Nova) is a set of controllers for virtual machine computing and for launching virtual machine instances in groups;

OpenStack Image Service (Glance) is a virtual machine image lookup and retrieval system that implements virtual machine image management;

OpenStack Object Storage (Swift) is an object-based storage system for large-scale systems, with built-in redundancy and fault tolerance, similar to Amazon S3;

OpenStack Keystone provides user identity services and resource management;

OpenStack Horizon, a Django-based dashboard, is the graphical management front end.

Launched by NASA and Rackspace in 2010, the open-source project aims to create a cloud computing platform that is easy to deploy, feature-rich, and scalable. The OpenStack project's first goal is to simplify cloud deployment and give it good scalability, aspiring to become the operating system of the data center: a cloud operating system.

KVM: Open virtualization technology

KVM (Kernel-based Virtual Machine) is an open-source system virtualization module that requires hardware support, such as Intel VT or AMD-V technology. It implements hardware-based full virtualization and is built directly into the Linux kernel.

In 2008, Red Hat acquired Qumranet to obtain KVM technology and promoted it as part of its virtualization strategy, supporting KVM as the only hypervisor from the release of RHEL 6. KVM focuses on high performance, scalability, high security, and low cost.



I.14 Disconf

Disconf focuses on the "common components" and "common platforms" of distributed-system configuration management, providing a unified configuration-management service.

I.15 Apollo

Apollo is a configuration management platform developed by Ctrip's framework department. It centrally manages application configuration across different environments and clusters, pushes configuration changes to applications in real time after they are modified, and provides standardized permissions, process governance, and other features.

The server side is built on Spring Boot and Spring Cloud and can run directly after packaging, with no need to install an additional application container such as Tomcat.
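
A minimal sketch of the Apollo Java client, assuming apollo-client is on the classpath and that app.id and the meta server address are configured (for example, via JVM system properties); the key name is illustrative:

```java
import com.ctrip.framework.apollo.Config;
import com.ctrip.framework.apollo.ConfigService;

public class ApolloDemo {
    public static void main(String[] args) {
        // Fetch the application's default namespace from the Apollo server.
        Config config = ConfigService.getAppConfig();

        // Reads are served from a local cache that Apollo keeps in sync with
        // the server, so updated values become visible in near real time.
        String timeout = config.getProperty("request.timeout", "5000");
        System.out.println("request.timeout = " + timeout);

        // React to configuration changes pushed from the server.
        config.addChangeListener(event ->
                event.changedKeys().forEach(key ->
                        System.out.println("changed: " + key)));
    }
}
```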


I.16 gRPC

gRPC is a high-performance, open-source, general-purpose RPC framework designed for mobile and HTTP/2. C, Java, and Go versions are currently available: grpc, grpc-java, and grpc-go. The C version also supports C++, Node.js, Python, Ruby, Objective-C, PHP, and C#.

gRPC is designed on top of the HTTP/2 standard, bringing features such as bidirectional streaming, flow control, header compression, and multiplexing of requests over a single TCP connection. These features make it perform especially well on mobile devices, saving power and data.
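
A sketch of a blocking gRPC call in Java over a single HTTP/2 connection. GreeterGrpc, HelloRequest, and HelloReply are hypothetical classes that protoc would generate from a greeter.proto; only the channel API is real grpc-java:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class GrpcClientDemo {
    public static void main(String[] args) {
        // One HTTP/2 connection carries all multiplexed requests.
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 50051)
                .usePlaintext() // no TLS, for local testing only
                .build();

        // Hypothetical stub generated by protoc from greeter.proto.
        GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
        HelloReply reply = stub.sayHello(
                HelloRequest.newBuilder().setName("world").build());
        System.out.println(reply.getMessage());

        channel.shutdown();
    }
}
```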



I.17 Canal

Canal is an open-source project from Alibaba, written purely in Java. Based on parsing a database's incremental log (binlog), it provides incremental data subscription and consumption; it currently supports mainly MySQL (MariaDB is also supported).

Services that can be built on log-based incremental subscription and consumption include (see the client sketch after this list):

  • Database mirroring
  • Real-time database backup
  • Multi-level index maintenance (e.g., separate indexes for sellers and buyers)
  • Search index building
  • Business cache refreshing
  • Notification of important business events, such as price changes
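
The client sketch below, based on Canal's documented Java client API, connects to a Canal server, subscribes to all tables, and acknowledges batches of binlog entries; the address, destination, and batch size are illustrative:

```java
import java.net.InetSocketAddress;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class CanalClientDemo {
    public static void main(String[] args) throws InterruptedException {
        // Connect to a Canal server that is already parsing a MySQL binlog.
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        try {
            connector.connect();
            connector.subscribe(".*\\..*"); // all schemas and tables

            while (true) {
                Message message = connector.getWithoutAck(100); // batch of <= 100 entries
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    Thread.sleep(1000); // nothing new yet
                    continue;
                }
                for (CanalEntry.Entry entry : message.getEntries()) {
                    // A real consumer would parse row-data entries here and
                    // refresh caches, rebuild search indexes, etc.
                    System.out.println(entry.getHeader().getTableName());
                }
                connector.ack(batchId); // confirm the batch was consumed
            }
        } finally {
            connector.disconnect();
        }
    }
}
```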



I.18 Spark Streaming

Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant processing of real-time streaming data. It supports ingesting data from many sources, including Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP sockets; after data is ingested, complex algorithms can be applied using high-level functions such as map, reduce, join, and window.
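
The classic word-count sketch below shows the API: a 5-second micro-batch stream reads from a TCP socket (start one with `nc -lk 9999`) and is processed with flatMap, mapToPair, and reduceByKey. It assumes spark-streaming is on the classpath; the master setting and port are illustrative:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StreamingWordCount {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("WordCount");
        // Micro-batches of 5 seconds.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Read lines from a local TCP socket.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        JavaDStream<String> words =
                lines.flatMap(l -> Arrays.asList(l.split(" ")).iterator());
        JavaPairDStream<String, Integer> counts = words
                .mapToPair(w -> new Tuple2<>(w, 1))
                .reduceByKey(Integer::sum);
        counts.print(); // print each batch's counts

        jssc.start();
        jssc.awaitTermination();
    }
}
```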


I.19 SonarQube

Sonar (SonarQube) is an open-source platform for code quality management, used to manage source-code quality and assess it along seven dimensions.

Through plugins, it supports code quality management and analysis for more than 20 programming languages, including Java, C#, C/C++, PL/SQL, COBOL, JavaScript, Groovy, and others.

I.20 DataX

DataX is an offline synchronization tool for heterogeneous data sources, dedicated to achieving stable and efficient data synchronization between various heterogeneous data sources, including relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and more.

I.21 ZenTao/Jira

ZenTao features
1) Product management: products, requirements, plans, releases, roadmaps, and other functions.
2) Project management: projects, tasks, teams, builds, burndown charts, and other functions.
3) Quality management: bugs, test cases, test tasks, test results, and other functions.
4) Document management: product document library, project document library, custom document libraries, and other functions.
5) Personal management: to-dos and personal views such as My Tasks, My Bugs, My Requirements, and My Projects.
6) Organization management: departments, users, groups, permissions, and other functions.
7) Statistics: rich statistical tables.
8) Search: find the corresponding data through search.


JIRA features
1) Issue tracking and management (issue types include New Feature, Bug, Task, and Improvement);
2) Analysis reports for issue follow-up;
3) Project category management;
4) Component/module lead assignment;
5) Per-project email addresses;
6) Unlimited workflows.



I.22 XXL-JOB

XXL-JOB is a lightweight distributed task scheduling framework whose core design goals are rapid development, ease of learning, light weight, and easy scaling. Its main features are listed below (a job-handler sketch follows the list):

  • Simple: tasks can be managed (CRUD) through web pages; operation is simple and can be learned in a minute.
  • Dynamic: task status can be modified dynamically, tasks can be paused/resumed, and running tasks can be terminated, all taking effect immediately.
  • Dispatch center HA (centralized): scheduling is designed centrally; the "dispatch center" is based on clustered Quartz and supports cluster deployment, ensuring HA of the dispatch center.
  • Executor HA (distributed): tasks are executed in a distributed manner; the task "executor" supports cluster deployment, ensuring HA of task execution.
  • Registry: executors automatically register tasks periodically, and the dispatch center automatically discovers registered tasks and triggers their execution; executor addresses can also be entered manually.
  • Elastic scaling: once a new executor machine goes online or offline, tasks are reassigned at the next scheduling.
  • Routing strategies: rich routing strategies are provided when deploying an executor cluster, including first, last, round-robin, random, consistent hash, least frequently used, least recently used, failover, busy-over, etc.
  • Failover: if the task's routing policy is set to failover and a machine in the executor cluster fails, the system automatically switches to a healthy executor to send the scheduling request.
  • Failure-handling strategies: policies for handling scheduling failures include failure alarm (default) and failure retry.
  • Failure retry: when the dispatch center fails to schedule a task and the failure-retry policy is enabled, the system automatically retries once; if the executor fails to execute and the callback reports a retriable failure state, it is also retried automatically.
  • Blocking-handling strategies: strategies for when scheduling is too dense for the executor to keep up include single-machine serial execution (default), discarding subsequent schedules, and overriding earlier schedules.
  • Shard broadcast tasks: when an executor cluster is deployed and the routing policy is "shard broadcast", one task schedule broadcasts to trigger all executors in the cluster, and the task can be developed against the shard parameters.
  • Dynamic sharding: shard broadcast tasks are sharded by executor, and the executor cluster can be expanded dynamically to increase the number of shards and share the processing load; for large-data-volume operations this significantly improves task throughput and speed.
  • Event triggering: besides cron-based and task-dependency-based triggering, event-based triggering is supported; the dispatch center provides an API service to trigger a single task execution, which can be invoked flexibly on business events.
  • Task progress monitoring: real-time monitoring of task progress is supported.
  • Rolling real-time logs: scheduling results can be viewed online, and the executor's complete execution log output can be viewed in real time in a rolling manner.
  • GLUE: a Web IDE supports developing task logic code online, with dynamic release and real-time compilation, eliminating deployment and release steps; the last 30 historical versions are retained.
  • Script tasks: script tasks can be developed and run in GLUE mode, including Shell, Python, NodeJS, and other scripts.
  • Task dependencies: subtask dependencies can be configured; when a parent task finishes successfully, its subtasks (separated by commas) are triggered.
  • Consistency: the dispatch center ensures consistency of distributed cluster scheduling through DB locks, so one task schedule triggers exactly one execution.
  • Custom task parameters: scheduling task parameters can be configured online and take effect immediately.
  • Scheduling thread pool: the scheduling system triggers scheduling with multiple threads, ensuring that scheduling runs precisely and is not blocked.
  • Data encryption: communication between the dispatch center and the executors is encrypted, improving the security of scheduling information.
  • Email alarm: email alarms can be sent on task failure, with support for configuring multiple addresses for bulk alarm mails.
  • Pushed to the Maven central repository: the latest stable version is pushed to the Maven central repository for easy access and use.
  • Running reports: running data such as the number of tasks, schedules, and executors can be viewed in real time, along with scheduling reports such as scheduling date distribution and scheduling success distribution charts.
  • Fully asynchronous: the system's lower layers are fully asynchronous, with traffic peak shaving for dense scheduling, theoretically supporting tasks of any duration.
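
A sketch of an executor-side job handler, using the method-level @XxlJob annotation of XXL-JOB 2.x (earlier versions subclassed IJobHandler instead); the handler name and logic are illustrative:

```java
import com.xxl.job.core.context.XxlJobHelper;
import com.xxl.job.core.handler.annotation.XxlJob;
import org.springframework.stereotype.Component;

@Component
public class DemoJobHandler {

    // The dispatch center schedules this handler by its registered name.
    @XxlJob("demoJobHandler")
    public void execute() {
        // Task parameter configured online in the admin console.
        String param = XxlJobHelper.getJobParam();

        // Shard parameters, used by "shard broadcast" tasks.
        int index = XxlJobHelper.getShardIndex();
        int total = XxlJobHelper.getShardTotal();

        // Rolling log, viewable in real time from the admin console.
        XxlJobHelper.log("shard {}/{} running, param={}", index, total, param);

        XxlJobHelper.handleSuccess(); // report the result to the dispatch center
    }
}
```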



I.23 SaltStack

A new approach to infrastructure management: easy to deploy, up and running in minutes, scalable enough to easily manage tens of thousands of servers, and fast, with server-to-server communication in seconds.

Salt's underlying layer uses a dynamic communication bus that can drive orchestration, remote execution, configuration management, and more.

I.24 Istio

As a cutting-edge project for managing the microservice aggregation layer, Istio is the first joint open-source project from Google, IBM, and Lyft (a ride-sharing company and Uber rival), providing a unified way to connect, secure, manage, and monitor microservices.

The first beta currently targets Kubernetes environments; the community says support for virtual machines and other environments, such as Cloud Foundry, will be added in the coming months. Istio adds traffic management to microservices and creates a foundation for value-added capabilities such as security, monitoring, routing, connection management, and policy. Its features include:

  • automatic load balancing for HTTP, gRPC, and TCP traffic;
  • rich routing rules for fine-grained control of traffic behavior;
  • traffic encryption, inter-service authentication, and strong identity assertions;
  • fleet-wide policy enforcement;
  • in-depth telemetry and reporting.



Architecture

[Istio architecture diagram omitted]

Security

[Istio security diagram omitted]
Basics

SaltStack + OpenStack + KVM + Kubernetes + Istio




