Requirements: Recently, I saw a video of the Redis Bloom algorithm on Bilibili to solve the problem of cache penetration, simply put, add a layer of logical judgment before accessing the database to determine whether the data exists, and if so, access the database. For example, if a website is a news system, the URLs of articles are generated by self-incrementing primary key IDs (URL format example:/news-1.html), the website may only have tens of thousands of articles and caches.
Originally:
Request news resource -> Determine if the cache exists -> Presence -> The cache returns data. Request news resources -> Determine if the cache exists -> does not exist -> Query from the database -> Present -> Cache and return data. Request a news resource -> Determine if the cache exists -> does not exist -> Query from the database -> Does not exist -> Returns a 404 error.
Right now:
Request news resource -> Bloom's algorithm -> Existence -> Follow the original logic. Request news resource -> Bloom's algorithm -> does not exist -> returns a 404 error directly.
BloomFilter
BloomFilter algorithm is a big data scheduling algorithm. In a set with a large amount of data, it can be accurately determined that an object is not in the set; It is possible to judge an object in a set and take up little space. itIt is not suitable for situations that require high accuracy and zero errors。 Efficient use of space is achieved by sacrificing partial accuracy.
Bloom's algorithm is a method based onSacrifice a certain accuracy in exchange for a filtering algorithm with low memory consumption, which can realize the filtering, deduplication and other operations of a large amount of data.
The Bloom algorithm is just an abstract concept and can be implemented in many ways, and the use of BitMap in Redis in the article is just a simple implementation.
Reference:The hyperlink login is visible.
BitMap introduction
BitMap is a bitmap, which is actually a byte array, represented in binary.There are only two numbers, 0 and 1, bitmap is to use each binary bit to store or mark the value corresponding to an element. It is usually used to determine whether a certain data exists or not, because it is stored in bits, so Bitmap itself will greatly save storage space.
As shown in the figure below, the string is stored in binary form in the computer.
BitMap data types in Redis
The data type provided by Redis is BitMap, and each bit corresponds to two states: 0 and 1. Although the internal storage is still in the String type, Redis provides some instructions for directly manipulating BitMap, which can be thought of as a bit array, and the subscript of the array is the offset.
Its advantages are:Low memory overhead and high efficiencyAnd the operation is simple.
Space-saving: A bit is used to represent the value or state of an element, where key is the value of the corresponding element. In fact, 8 bits can make up a byte, so it is space-saving. High efficiency: The time complexity of setbit and getbit is O(1), and the efficiency of other bits is also high.
Here's an example of using the set collection and BitMap storage:
| data type | Each userid takes up space | The amount of users that need to be stored | All occupies memory | | set | 32 bits is 4 bytes (assuming userid uses integers, many websites actually use long integers) | 50,000,000 | 32 bits * 50,000,000 = 200 MB | | BitMap | 1 bit | 100,000,000 | 1 bit * 100,000,000 = 12.5 MB |
Time is stretching a little
| | One day | One month | One year | | set | 200M | 6G | 72G | | BitMap | 12.5M | 375M | 4.5G |
After calculation, it was found that as time increased, the amount of data to be recorded increased, and the contrast became more obvious, and BitMap took up less space than set.
Redis provides the following instructions for operating BitMap:
Command documentation:The hyperlink login is visible.
Now that you have a brief understanding of the algorithm and the Bitmap features and syntax of Redis, let's use redis to do a simple operation.
SETBIT syntax:SETBIT key offset value
Set the article id:9, 10, 156 to 1, and the command is as follows:
GETBIT syntax: GETBIT key offset
To determine whether id: 10 or 11 exists, the command is as follows:
.NET/C# manipulates Redis's BitMap type
We learned about several BitMap commands in redis and how to operate them programmatically. Create a new .NET 3.1 console project, reference the StackExchange.Redis package, and use the following command:
The source code is as follows:
There are many other bitmap application scenarios for Redis, as follows:
- It can be used as a simple Bloom filter to determine whether a user has performed certain actions.
- Statistics of user daily activity, monthly activity, and retention rate
- Realize the statistics of the number of user launches
- User online presence and people statistics
(End)
|