This article is a mirror article of machine translation, please click here to jump to the original article.

View: 6755|Reply: 3

[Source] .NET/C# uses Redis to implement the Bloom algorithm based on BitMap

[Copy link]
Posted on 1/2/2023 5:37:01 PM | | | |
Requirements: Recently, I saw a video of the Redis Bloom algorithm on Bilibili to solve the problem of cache penetration, simply put, add a layer of logical judgment before accessing the database to determine whether the data exists, and if so, access the database. For example, if a website is a news system, the URLs of articles are generated by self-incrementing primary key IDs (URL format example:/news-1.html), the website may only have tens of thousands of articles and caches.

Originally:

Request news resource -> Determine if the cache exists -> Presence -> The cache returns data.
Request news resources -> Determine if the cache exists -> does not exist -> Query from the database -> Present -> Cache and return data.
Request a news resource -> Determine if the cache exists -> does not exist -> Query from the database -> Does not exist -> Returns a 404 error.

Right now:

Request news resource -> Bloom's algorithm -> Existence -> Follow the original logic.
Request news resource -> Bloom's algorithm -> does not exist -> returns a 404 error directly.

BloomFilter

BloomFilter algorithm is a big data scheduling algorithm. In a set with a large amount of data, it can be accurately determined that an object is not in the set; It is possible to judge an object in a set and take up little space. itIt is not suitable for situations that require high accuracy and zero errors。 Efficient use of space is achieved by sacrificing partial accuracy.

Bloom's algorithm is a method based onSacrifice a certain accuracy in exchange for a filtering algorithm with low memory consumption, which can realize the filtering, deduplication and other operations of a large amount of data.

The Bloom algorithm is just an abstract concept and can be implemented in many ways, and the use of BitMap in Redis in the article is just a simple implementation.

Reference:The hyperlink login is visible.

BitMap introduction

BitMap is a bitmap, which is actually a byte array, represented in binary.There are only two numbers, 0 and 1, bitmap is to use each binary bit to store or mark the value corresponding to an element. It is usually used to determine whether a certain data exists or not, because it is stored in bits, so Bitmap itself will greatly save storage space.

As shown in the figure below, the string is stored in binary form in the computer.



BitMap data types in Redis

The data type provided by Redis is BitMap, and each bit corresponds to two states: 0 and 1. Although the internal storage is still in the String type, Redis provides some instructions for directly manipulating BitMap, which can be thought of as a bit array, and the subscript of the array is the offset.

Its advantages are:Low memory overhead and high efficiencyAnd the operation is simple.

Space-saving: A bit is used to represent the value or state of an element, where key is the value of the corresponding element. In fact, 8 bits can make up a byte, so it is space-saving.
High efficiency: The time complexity of setbit and getbit is O(1), and the efficiency of other bits is also high.

Here's an example of using the set collection and BitMap storage:

data typeEach userid takes up spaceThe amount of users that need to be storedAll occupies memory
set32 bits is 4 bytes (assuming userid uses integers, many websites actually use long integers)50,000,00032 bits * 50,000,000 = 200 MB
BitMap1 bit100,000,0001 bit * 100,000,000 = 12.5 MB


Time is stretching a little

One dayOne monthOne year
set200M6G72G
BitMap12.5M375M4.5G


After calculation, it was found that as time increased, the amount of data to be recorded increased, and the contrast became more obvious, and BitMap took up less space than set.

Redis provides the following instructions for operating BitMap:

commandillustrateAvailable versionsTime complexity
The hyperlink login is visible.Set or clear the bits on the specified offset for the string value stored in the key.>= 2.2.0O(1)
The hyperlink login is visible.For the string value stored in the key, obtain the bits on the specified offset.>= 2.2.0O(1)
The hyperlink login is visible.Counts the number of bits in a given string that are set to 1.>= 2.6.0O(N)
The hyperlink login is visible.Returns the position of the binary bit in the bitmap where the first value is bit.>= 2.8.7O(N)
The hyperlink login is visible.Bit manipulation on one or more string keys that hold binary bits.>= 2.6.0O(N)
The hyperlink login is visible.The BITFIELD command can operate on multiple bitranges simultaneously in a single call.>= 3.2.0O(1)


Command documentation:The hyperlink login is visible.

Now that you have a brief understanding of the algorithm and the Bitmap features and syntax of Redis, let's use redis to do a simple operation.

SETBIT syntax:SETBIT key offset value

Set the article id:9, 10, 156 to 1, and the command is as follows:

GETBIT syntax: GETBIT key offset

To determine whether id: 10 or 11 exists, the command is as follows:




.NET/C# manipulates Redis's BitMap type

We learned about several BitMap commands in redis and how to operate them programmatically. Create a new .NET 3.1 console project, reference the StackExchange.Redis package, and use the following command:

The source code is as follows:



There are many other bitmap application scenarios for Redis, as follows:

  • It can be used as a simple Bloom filter to determine whether a user has performed certain actions.
  • Statistics of user daily activity, monthly activity, and retention rate
  • Realize the statistics of the number of user launches
  • User online presence and people statistics

(End)




Previous:Virtual actors: Dapr vs Orleans
Next:Alibaba Cloud SLB Load Balancing 503 Error Resolution
 Landlord| Posted on 1/2/2023 5:41:56 PM |
Posted on 1/2/2023 8:42:47 PM |
I have learned, thank you, and gained knowledge
Posted on 1/6/2023 8:34:22 PM |
Learn to learn
Disclaimer:
All software, programming materials or articles published by Code Farmer Network are only for learning and research purposes; The above content shall not be used for commercial or illegal purposes, otherwise, users shall bear all consequences. The information on this site comes from the Internet, and copyright disputes have nothing to do with this site. You must completely delete the above content from your computer within 24 hours of downloading. If you like the program, please support genuine software, purchase registration, and get better genuine services. If there is any infringement, please contact us by email.

Mail To:help@itsvse.com