

The basic principles of the deep-learning text detector DBNet

Posted on 2025-1-19 12:26:21

At present, text detection methods can be roughly divided into two categories: regression-based and segmentation-based. The general segmentation-based pipeline is shown by the blue arrows in the figure below: the network first outputs a text segmentation result for the image (a probability map indicating whether each pixel is a positive sample), a preset threshold then converts the probability map into a binary map, and finally aggregation operations such as connected-component analysis convert the pixel-level results into detection boxes.
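The pipeline above (fixed threshold, then connected-component grouping) can be sketched in a few lines of numpy. The helper names and the toy probability map below are illustrative, not from the paper:

```python
import numpy as np

def binarize(prob_map, thresh=0.3):
    """Convert a probability map to a binary map with a fixed threshold."""
    return (prob_map > thresh).astype(np.uint8)

def connected_components(binary_map):
    """Label 4-connected foreground regions with a simple flood fill."""
    h, w = binary_map.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for i in range(h):
        for j in range(w):
            if binary_map[i, j] and labels[i, j] == 0:
                current += 1
                stack = [(i, j)]
                labels[i, j] = current
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary_map[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            stack.append((ny, nx))
    return labels, current

# Two separated blobs in a toy probability map
prob = np.zeros((5, 8))
prob[1:3, 1:3] = 0.9   # first "text region"
prob[3:5, 5:8] = 0.8   # second "text region"
labels, n = connected_components(binarize(prob))
print(n)  # two regions detected
```

Each labeled region would then be converted into a detection box (e.g. its bounding rectangle); real implementations use optimized routines such as OpenCV's `connectedComponents` instead of this flood fill.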



As the description above shows, the step that uses a fixed threshold to separate foreground from background is not differentiable, so it cannot be placed inside the network and trained end-to-end. DBNet's approach to this problem is shown by the red arrows in the figure above.

1. Network structure

The network structure of the paper is shown in the figure below. During training, the input image passes through feature extraction, upsampling, and concatenation to produce the fused feature map F (blue in the figure). From F the network predicts the probability map P and the threshold map T, and the approximate binary map B̂ is then computed from P and T. At inference time, text boxes can be obtained from either the approximate binary map or the probability map.



2. Binarization


2.1 Standard binarization
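The equation for this subsection did not survive extraction; for reference, the standard binarization in the DBNet paper thresholds each pixel of the probability map P with a fixed value t:

```latex
B_{i,j} =
\begin{cases}
1, & \text{if } P_{i,j} \ge t, \\
0, & \text{otherwise,}
\end{cases}
```

where (i, j) indexes a pixel and t is the preset threshold.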



2.2 Differentiable binarization


The binarization above is not differentiable, so it cannot be optimized during network training. To solve this problem, the paper proposes an approximate step function:
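The equation figure is missing here; as published in the DBNet paper, the differentiable binarization is:

```latex
\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}
```

i.e. a steep sigmoid applied to the difference between the probability map and the learned threshold map.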



In the equation above, B̂ is the approximate binary map, T is the threshold map learned by the network, and k is an amplifying factor, set to 50 in the paper. The graph of this function closely resembles the standard step function, as shown in panel (a) of the figure below.
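A minimal numpy sketch contrasting the two binarization rules, with k = 50 as in the paper (the sample values are illustrative):

```python
import numpy as np

def standard_binarize(P, t=0.3):
    """Hard threshold: not differentiable with respect to P."""
    return (P >= t).astype(np.float32)

def differentiable_binarize(P, T, k=50.0):
    """Approximate step function from the DBNet paper (k = 50)."""
    return 1.0 / (1.0 + np.exp(-k * (P - T)))

P = np.array([0.1, 0.45, 0.9])   # probability map values
T = np.full_like(P, 0.5)         # learned threshold map (constant here)
print(differentiable_binarize(P, T))  # ≈ [0, 0.076, 1]: near-binary, yet smooth
```

Because k is large, pixels well above or below the threshold saturate to 1 or 0, while the function remains smooth everywhere, so gradients can flow through it during training.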



3. Adaptive threshold

The previous section described how to binarize the probability map P into the approximate binary map B̂ once P and the threshold map T are obtained. This section explains how the labels for the probability map P, the threshold map T, and the binary map B̂ are generated.
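The label-generation figure is not reproduced in this post. For reference, in the DBNet paper the probability-map label is obtained by shrinking each annotated text polygon with the Vatti clipping algorithm, where the shrink offset D is computed from the polygon's area A and perimeter L:

```latex
D = \frac{A\,(1 - r^{2})}{L}, \qquad r = 0.4
```

The threshold-map label is generated in a border band around the same polygon by dilating it with the same offset D.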

3.1 Deformable convolution

Because text detection may require large receptive fields, the paper applies deformable convolution in the ResNet-18 and ResNet-50 backbones.
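For reference (the figure is not reproduced here), a deformable convolution as introduced by Dai et al. augments each sampling location of a standard convolution with a learned offset Δp_n:

```latex
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)
```

where R is the regular kernel grid (e.g. a 3×3 neighborhood), so the effective receptive field adapts to the content instead of being a fixed square.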



4. Loss function

The loss function used in the paper is defined as follows:
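The formula figure is missing; in the DBNet paper the total loss is a weighted sum of three terms:

```latex
L = L_s + \alpha \times L_b + \beta \times L_t
```

where L_s is the loss on the probability map and L_b the loss on the approximate binary map (both binary cross-entropy, with hard negative mining), L_t is an L1 loss on the threshold map, and the paper sets α = 1.0 and β = 10.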



5. Inference







