Time Series Databases (TSDB): A Brief Introduction and Summary

Little scum · Posted 13 seconds ago

Application scenarios

A time series database (TSDB) is a database optimized for continuous streams of timestamped data, such as IoT sensor readings, server metrics, and financial transactions. It is designed for high-frequency writes of very large data volumes and for fast aggregation and querying along the time dimension.

In the era of the Internet of Everything, the Industrial Internet of Things generates thousands or even tens of thousands of times more data than traditional enterprise IT systems. The data is collected in real time at high frequency and high density, and its data model can change at any time. Traditional databases struggle to store, query, and analyze such data, which creates an urgent need for database systems optimized for time series data: time series databases.

A time series database is a specialized database for storing and managing time series data. Its typical workload is write-heavy and read-light, distinguishes hot (recent) data from cold (historical) data, involves highly concurrent and continuous writes of massive data volumes, and has no transaction requirements.
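To make the write-heavy, time-ordered workload concrete, here is a minimal sketch of an append-only series that answers a time-range aggregation with binary search. The class name Series, its methods, and the sample CPU values are hypothetical and exist only for illustration; real TSDBs add compression, indexing, and on-disk storage on top of this idea.

```python
from bisect import bisect_left, bisect_right


class Series:
    """Append-only series that keeps timestamps sorted for range queries."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # This sketch only accepts non-decreasing timestamps (append-only writes).
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(ts)
        self.values.append(value)

    def range_mean(self, start, end):
        """Mean of values with start <= ts <= end, or None if the range is empty."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        window = self.values[lo:hi]
        return sum(window) / len(window) if window else None


cpu = Series()
for ts, value in [(0, 10.0), (15, 20.0), (30, 30.0), (45, 40.0)]:
    cpu.append(ts, value)

print(cpu.range_mean(15, 30))  # 25.0
```

Because writes arrive in time order, the timestamps stay sorted without extra work, and a range query only needs two binary searches to find its window.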

Characteristics of time series data

Timestamped: Every data point carries a timestamp, which is essential for computing on and analyzing the data.
Structured: Unlike the bulk data gathered by web crawlers or posted on Weibo and WeChat, data generated by networked devices and monitoring systems is structured. It has predefined data types or fixed lengths; for example, the current and voltage collected by a smart meter can each be represented as a standard 4-byte floating-point number.
Streaming: Each data source generates data at an approximately constant rate, much like an audio or video stream, and the streams are independent of one another.
Smooth and predictable traffic: Unlike traffic on e-commerce platforms or social media sites, time series traffic is stable over time and can be calculated and predicted from the number of data sources and their sampling periods.
Immutable: Time series data is generally append-only, like log data. Modification is usually neither allowed nor needed, and few scenarios require changing the raw data once it has been collected.
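The claim that traffic can be predicted from the number of data sources and the sampling period can be worked through with simple arithmetic. The sketch below assumes a hypothetical fleet of 10,000 smart meters, each reporting 2 values (current and voltage) every 10 seconds as 4-byte floats. It counts raw values only, before compression and without timestamps or index overhead.

```python
SECONDS_PER_DAY = 86_400


def ingest_estimate(num_sources, metrics_per_source, sampling_period_s, bytes_per_value=4):
    """Return (points per second, raw value bytes per day) for a steady workload."""
    points_per_second = num_sources * metrics_per_source / sampling_period_s
    raw_bytes_per_day = points_per_second * bytes_per_value * SECONDS_PER_DAY
    return points_per_second, raw_bytes_per_day


# Hypothetical fleet: 10,000 meters, 2 metrics each, one sample every 10 seconds.
points_per_second, raw_bytes_per_day = ingest_estimate(10_000, 2, 10)
print(points_per_second)            # 2000.0
print(raw_bytes_per_day / 1e6)      # 691.2 (MB of raw values per day)
```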

Ranking

The latest rankings are as follows:


1. InfluxDB

InfluxDB is an open-source distributed time series, event, and metrics database written in Go with no external dependencies. It is now used primarily to store large volumes of timestamped data, such as DevOps monitoring data, application metrics, IoT sensor data, and real-time analytics data.

As the highest-ranked open-source time series database, InfluxDB supports retention policies (RP) for data expiry and continuous queries (CQ) for downsampling and archiving. It supports real-time queries: data is indexed as it is written and is immediately available to query.
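The two ideas behind RP and CQ can be sketched in plain Python. This is a conceptual illustration only and does not use InfluxDB's API or query language; the function names downsample and apply_retention and the sample points are hypothetical.

```python
def downsample(points, bucket_seconds):
    """Continuous-query-style rollup: average raw points into fixed-width buckets."""
    sums = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds  # floor the timestamp to its bucket
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


def apply_retention(points, now, keep_seconds):
    """Retention-policy-style expiry: drop points older than the retention window."""
    cutoff = now - keep_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


# Hypothetical raw points: (seconds, value)
raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (130, 9.0)]

rollup = downsample(raw, 60)                             # compact history, kept long term
recent = apply_retention(raw, now=130, keep_seconds=60)  # full detail, kept short term

print(rollup)  # [(0, 2.0), (60, 6.0), (120, 9.0)]
print(recent)  # [(70, 7.0), (130, 9.0)]
```

Combining the two keeps full-resolution data only for a short window while the rolled-up averages preserve the long-term trend at a fraction of the storage.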

2. Kdb+

Billed by its vendor as the world's fastest time series database, kdb+/q uses a single database to process both real-time and historical data, and combines a CEP (complex event processing) engine, an in-memory database, and an on-disk database. Its columnar storage makes statistical analysis of an individual column very convenient.

According to its vendor, kdb+/q is faster and has a lower total cost of ownership than general-purpose databases or big data platforms. It is used mainly for massive data analysis, high-frequency trading, artificial intelligence, and the Internet of Things, and it has a particular advantage in the financial sector, where latency requirements are strict.
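Why columnar storage helps with per-column statistics can be shown with a small sketch. This is plain Python, not q or any kdb+ API, and the trade records and the helper to_columns are hypothetical. The point is only that a column layout lets an aggregate read one list instead of touching every field of every record.

```python
# Row-oriented layout: one record per trade (hypothetical sample data).
rows = [
    {"ts": 1, "symbol": "ABC", "price": 10.0, "size": 100},
    {"ts": 2, "symbol": "ABC", "price": 10.5, "size": 200},
    {"ts": 3, "symbol": "XYZ", "price": 20.0, "size": 50},
    {"ts": 4, "symbol": "ABC", "price": 11.0, "size": 150},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [record[name] for record in records] for name in records[0]}


columns = to_columns(rows)

# A column statistic now reads a single list.
prices = columns["price"]
average_price = sum(prices) / len(prices)
total_size = sum(columns["size"])

print(average_price, total_size)  # 12.875 500
```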

3. Prometheus

Prometheus is an open-source system monitoring and alerting framework. It was created in 2012 by former Google employees working at SoundCloud, developed as a community open-source project, and officially released in 2015. It joined the Cloud Native Computing Foundation the following year.

As a new-generation monitoring framework, Prometheus has a powerful multi-dimensional data model and several options for graphing and visualization. It collects time series data in pull mode by scraping targets over HTTP; jobs that cannot be scraped can push their metrics to a Pushgateway, which the Prometheus server then scrapes.
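In the multi-dimensional data model, a time series is identified by a metric name plus a set of key/value labels, and a scraped target exposes its current samples as plain text. The sketch below renders samples in that text style. The helper render_sample and the metric and label names are hypothetical, and the sketch skips details such as escaping special characters in label values, so treat it as an illustration of the shape rather than a complete exporter.

```python
def render_sample(name, labels, value):
    """Render one sample as: name{label="value",...} value"""
    if not labels:
        return f"{name} {value}"
    label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


# Hypothetical samples: the same metric name with different label sets
# forms separate time series.
lines = [
    "# TYPE http_requests_total counter",
    render_sample("http_requests_total", {"method": "get", "code": "200"}, 1027),
    render_sample("http_requests_total", {"method": "post", "code": "500"}, 3),
]
payload = "\n".join(lines) + "\n"
print(payload)
```

A server built on the pull model would return a payload like this from an HTTP endpoint, and the monitoring system would fetch it on a fixed interval.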

4. Graphite

Graphite is an open-source real-time graphing system for time series measurements. Graphite does not collect metrics itself; it acts as a database that receives metrics through its back end and then queries, transforms, and combines them in real time.

Graphite includes a built-in web interface for browsing metrics and graphs. It consists of several back-end and front-end components: the back end stores numeric time series data, and the front end retrieves metric data and renders graphs on demand.
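Since Graphite only receives metrics, something else has to send them. Its back end accepts a plaintext protocol in which each line is a metric path, a value, and a Unix timestamp. The sketch below formats such a line and defines a sender. The metric path is hypothetical, and the host and port are assumptions (2003 is the usual default for the plaintext listener), so adjust them for a real deployment. The sender is defined but not called here, since it needs a running server.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: '<path> <value> <unix timestamp>' plus newline."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one line over TCP. Host and port are assumptions for this sketch."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


line = format_metric("servers.web01.cpu.load", 0.42, 1700000000)
print(repr(line))  # 'servers.web01.cpu.load 0.42 1700000000\n'
```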

5. TimescaleDB

TimescaleDB describes itself as the only open-source time series database that supports full SQL, and it is optimized for fast ingest and complex queries. It is built on PostgreSQL and aims to offer the best of the NoSQL and relational worlds for time series data.

TimescaleDB is positioned as a way for developers and organizations to get more out of their data: analyze the past, understand the present, and predict the future. Unifying time series and relational data at the query level eliminates data silos and makes demos and prototypes easier to build. The combination of scalability and a full SQL interface lets a broad range of people in an organization ask questions of the data.
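What "unifying time series and relational data at the query level" looks like can be sketched with an ordinary SQL join. The example uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; it is not TimescaleDB, and the tables, columns, and values are hypothetical. In TimescaleDB the hourly grouping would typically use its time_bucket function on a real timestamp column rather than integer arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE readings (ts INTEGER, device_id INTEGER, temperature REAL);
INSERT INTO devices VALUES (1, 'boiler room'), (2, 'roof');
INSERT INTO readings VALUES
    (0, 1, 60.0), (1800, 1, 62.0), (3600, 1, 64.0),
    (0, 2, 10.0), (1800, 2, 12.0);
""")

# Join time series rows to relational metadata and average per location per hour.
rows = conn.execute("""
SELECT d.location,
       (r.ts / 3600) * 3600 AS hour_start,
       AVG(r.temperature) AS avg_temperature
FROM readings AS r
JOIN devices AS d ON d.id = r.device_id
GROUP BY d.location, hour_start
ORDER BY d.location, hour_start
""").fetchall()

for row in rows:
    print(row)
# ('boiler room', 0, 61.0)
# ('boiler room', 3600, 64.0)
# ('roof', 0, 11.0)
```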
