

[Original] Implementing a string similarity algorithm in C#

Posted on 4/23/2019 12:59:18 PM
Recently I had to solve the following problem: compare a string entered by the user with a string obtained by the system, and if the difference between them is small enough, treat the input as meeting the requirements. Ideally, that tolerance should be configurable as a threshold.
While working on CAPTCHA recognition, I also needed to measure how similar the recognized character codes were, so I used the edit distance algorithm. This post records how it works and a C# implementation.

According to Baidu Encyclopedia:

Edit distance, also known as Levenshtein distance, is the minimum number of edit operations required to transform one string into another. The greater the distance, the more different the two strings are. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character.

For example, converting the word kitten into sitting takes three edits:

sitten (k→s)

sittin (e→i)

sitting (insert g)

The Russian scientist Vladimir Levenshtein proposed this concept in 1965, hence the name Levenshtein distance.

For example:

If str1="ivan" and str2="ivan", the distance is 0 because no conversion is needed, so similarity = 1 - 0/Math.Max(str1.Length, str2.Length) = 1.
If str1="ivan1" and str2="ivan2", the distance is 1: converting the "1" in str1 to "2" changes a single character, so similarity = 1 - 1/Math.Max(str1.Length, str2.Length) = 1 - 1/5 = 0.8.
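The similarity used in these examples can be written as a formula, where d is the Levenshtein distance and |s| is the length of s. The formula is undefined when both strings are empty; treating two empty strings as identical (similarity 1) is an assumed convention, not something stated in the original post.

```latex
\mathrm{similarity}(s_1, s_2) = 1 - \frac{d(s_1, s_2)}{\max(|s_1|, |s_2|)}
\qquad
\mathrm{similarity}(\texttt{ivan1}, \texttt{ivan2}) = 1 - \frac{1}{\max(5, 5)} = 0.8
```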

Applications:

  • DNA analysis
  • Spell check
  • Speech recognition
  • Plagiarism detection



The algorithm is implemented in C#:
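A minimal sketch of the standard dynamic-programming Levenshtein algorithm, plus the similarity formula from the examples above. The class and method names (StringSimilarity, LevenshteinDistance, Similarity) are hypothetical and not necessarily the author's; two empty strings are assumed to have similarity 1.

```csharp
using System;

// Hypothetical helper class: a sketch, not the original author's code.
public static class StringSimilarity
{
    // Classic dynamic-programming Levenshtein distance.
    // d[i, j] = edit distance between the first i chars of a and the first j chars of b.
    public static int LevenshteinDistance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        int n = a.Length, m = b.Length;
        var d = new int[n + 1, m + 1];

        // Turning a prefix into the empty string (or vice versa) costs one edit per character.
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,    // delete a[i - 1]
                             d[i, j - 1] + 1),   // insert b[j - 1]
                    d[i - 1, j - 1] + cost);     // substitute (free if the chars are equal)
            }
        }
        return d[n, m];
    }

    // similarity = 1 - distance / max(len1, len2); two empty strings count as identical (assumption).
    public static double Similarity(string a, string b)
    {
        int distance = LevenshteinDistance(a, b); // also validates the arguments
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0;
        return 1.0 - (double)distance / maxLen;
    }
}
```

With this sketch, LevenshteinDistance("kitten", "sitting") returns 3 and Similarity("ivan1", "ivan2") returns 0.8; a caller can then compare the similarity against whatever threshold fits the application. The full table uses O(n·m) memory, which is fine for short strings such as CAPTCHA codes.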

Test code:






The test results show that spaces and punctuation, and where they appear in the string, affect the similarity score. Therefore, when comparing recognized strings, it is recommended to remove all spaces and special symbols from both strings before calling the algorithm.
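A minimal sketch of that preprocessing step, assuming that "special symbols" means any character that is not a letter or digit according to char.IsLetterOrDigit. The RecognitionText class and Normalize method are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper: strips spaces and symbols before a similarity comparison.
public static class RecognitionText
{
    // Keep only letters and digits, so that spaces and punctuation
    // cannot change the edit distance between two strings.
    public static string Normalize(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Both strings are normalized first and only then compared; for example, Normalize("ivan 1!") gives "ivan1".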


On GitHub there is also a C# library for string similarity comparison, FuzzyString:

FuzzyString is a library I developed for my day-to-day work of reconciling naming conventions between different grid models. I have stripped out the power-system-specific code and packaged what remains as a string extension for determining whether two strings are approximately equal. All the algorithms used here were taken from online sources, converted to C#, and compiled into this library. I found several other similar open-source implementations, but they were not available for .NET/C#. Adding the *.dll to your project gives you access to this extension, as well as to the individual extensions that the ApproximatelyEquals() extension uses under the hood.




NuGet install:

Algorithms included in this project:

  • Hamming distance
  • Jaccard distance
  • Jaro distance
  • Jaro-Winkler distance
  • Levenshtein distance
  • Longest common subsequence
  • Longest common substring
  • Overlap coefficient
  • Ratcliff-Obershelp similarity
  • Sorensen-Dice distance
  • Tanimoto coefficient
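For comparison with the edit-distance approach, here is a sketch of two of the set-based measures from the list, the Jaccard index and the Sorensen-Dice coefficient, computed over character bigrams. The bigram representation and all names (SetSimilarity, Bigrams, JaccardIndex, DiceCoefficient) are assumptions for illustration; FuzzyString's own implementations may tokenize strings differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of set-based similarity measures; not FuzzyString's API.
public static class SetSimilarity
{
    // Split a string into its set of adjacent character pairs (bigrams).
    // A one-character string yields a set containing that character; "" yields an empty set.
    public static HashSet<string> Bigrams(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        var set = new HashSet<string>();
        if (s.Length < 2)
        {
            if (s.Length == 1) set.Add(s);
            return set;
        }
        for (int i = 0; i < s.Length - 1; i++)
            set.Add(s.Substring(i, 2));
        return set;
    }

    // Jaccard index = |intersection| / |union|; the Jaccard distance is 1 minus this value.
    public static double JaccardIndex(string a, string b)
    {
        var setA = Bigrams(a);
        var setB = Bigrams(b);
        if (setA.Count == 0 && setB.Count == 0) return 1.0;
        var intersection = new HashSet<string>(setA);
        intersection.IntersectWith(setB);
        var union = new HashSet<string>(setA);
        union.UnionWith(setB);
        return (double)intersection.Count / union.Count;
    }

    // Sorensen-Dice coefficient = 2 * |intersection| / (|A| + |B|).
    public static double DiceCoefficient(string a, string b)
    {
        var setA = Bigrams(a);
        var setB = Bigrams(b);
        if (setA.Count + setB.Count == 0) return 1.0;
        var intersection = new HashSet<string>(setA);
        intersection.IntersectWith(setB);
        return 2.0 * intersection.Count / (setA.Count + setB.Count);
    }
}
```

For "night" and "nacht" the bigram sets share only "ht", so the Jaccard index is 1/7 and the Dice coefficient is 2 * 1 / (4 + 4) = 0.25. Unlike edit distance, these measures ignore the order in which the bigrams occur.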



Usage:


Output:



(End)




Posted on 5/6/2019 1:11:34 PM
Here comes the knowledge.
Posted on 9/7/2021 3:01:05 PM
Can the code be seen after replying?