Extract individual fields from a table image to Excel with OCR

Problem description

I have scanned images which have tables as shown in this image:

I am trying to extract each box separately and perform OCR, but when I detect the horizontal and vertical lines and then detect boxes, it returns the following image:

And when I try other transformations to detect the text (erode and dilate), some remnants of the lines still come through along with the text, as shown below:

I cannot isolate the text alone to perform OCR, and proper bounding boxes aren't being generated, as shown below:

I cannot get clearly separated boxes using the real lines. I've tried this on an image that was edited in Paint (as shown below) to add digits, and it works.

I don't know which part I'm doing wrong, but if there's anything I should try, or change or add in my question, please tell me.

# Load all required libraries
# (%pylab inline is an IPython magic, so this code is meant to run in a Jupyter notebook)
%pylab inline
import cv2
import numpy as np 
import pandas as pd
import pytesseract
import matplotlib.pyplot as plt
import statistics
from time import sleep
import random

img = cv2.imread('images/scan1.jpg',0)

# for adding border to an image
img1= cv2.copyMakeBorder(img,50,50,50,50,cv2.BORDER_CONSTANT,value=[255,255])

# Thresholding the image
(thresh, th3) = cv2.threshold(img1, 255, 255,cv2.THRESH_BINARY|cv2.THRESH_OTSU)

# to flip image pixel values
th3 = 255-th3

# initialize kernels for table boundary detection
# (uint8 vectors of ones; longer kernels are used for taller images)
if th3.shape[0] < 1000:
    ver = np.ones((7, 1), np.uint8)
    hor = np.ones((1, 6), np.uint8)
else:
    ver = np.ones((19, 1), np.uint8)
    hor = np.ones((1, 15), np.uint8)

# to detect vertical lines of table borders
img_temp1 = cv2.erode(th3, ver, iterations=3)
verticle_lines_img = cv2.dilate(img_temp1, ver, iterations=3)

# to detect horizontal lines of table borders
img_hor = cv2.erode(th3, hor, iterations=3)
hor_lines_img = cv2.dilate(img_hor, hor, iterations=4)

# adding horizontal and vertical lines
hor_ver = cv2.add(hor_lines_img,verticle_lines_img)

hor_ver = 255-hor_ver

# subtracting table borders from image
temp = cv2.subtract(th3,hor_ver)

temp = 255-temp

#Doing xor operation for erasing table boundaries
tt = cv2.bitwise_xor(img1,temp)

iii = cv2.bitwise_not(tt)

tt1=iii.copy()

# kernel initialization
ver1 = np.ones((9, 2), np.uint8)
hor1 = np.ones((2, 10), np.uint8)

#morphological operation
temp1 = cv2.erode(tt1, ver1, iterations=2)
verticle_lines_img1 = cv2.dilate(temp1, ver1, iterations=1)

temp12 = cv2.erode(tt1, hor1, iterations=1)
hor_lines_img2 = cv2.dilate(temp12, hor1, iterations=1)

# doing or operation for detecting only text part and removing rest all
hor_ver = cv2.add(hor_lines_img2,verticle_lines_img1)
dim1 = (hor_ver.shape[1],hor_ver.shape[0])
dim = (hor_ver.shape[1]*2,hor_ver.shape[0]*2)

# resizing image to its double size to increase the text size
resized = cv2.resize(hor_ver, dim, interpolation = cv2.INTER_AREA)

# bitwise not to flip the pixel values so that morphological operations (dilation, erosion) can be applied
want = cv2.bitwise_not(resized)

# only kernel1 is used below; kernel2 and kernel3 are defined but never applied
if want.shape[0] < 1000:
    kernel1 = np.ones((1, 3), np.uint8)
    kernel2 = np.ones((2, 2), np.uint8)
    kernel3 = np.array([[1,0,1],[0,1,0],
                       [1,0,1]], np.uint8)
else:
    kernel1 = np.ones((1, 6), np.uint8)
    kernel2 = np.ones((4, 5), np.uint8)

tt1 = cv2.dilate(want,kernel1,iterations=2)

# getting image back to its original size
resized1 = cv2.resize(tt1, dim1, interpolation = cv2.INTER_AREA)

# Find contours for image, which will detect all the boxes
contours1, hierarchy1 = cv2.findContours(resized1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# function to sort contours by the x (left/right) or y (top/bottom) coordinate of their bounding boxes
def sort_contours(cnts, method="left-to-right"):
    # initialize the reverse flag and sort index
    reverse = False
    i = 0

    # handle if we need to sort in reverse
    if method == "right-to-left" or method == "bottom-to-top":
        reverse = True

    # handle if we are sorting against the y-coordinate rather than
    # the x-coordinate of the bounding box
    if method == "top-to-bottom" or method == "bottom-to-top":
        i = 1

    # construct the list of bounding boxes and sort them by the
    # chosen coordinate
    boundingBoxes = [cv2.boundingRect(c) for c in cnts]
    (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
        key=lambda b:b[1][i], reverse=reverse))

    # return the list of sorted contours and bounding boxes
    return (cnts, boundingBoxes)


# sort contours by calling the function
(cnts, boundingBoxes) = sort_contours(contours1, method="top-to-bottom")

# store the heights of all bounding boxes
heightlist = [b[3] for b in boundingBoxes]

#sorting height values
heightlist.sort()

sportion = int(.5*len(heightlist))
eportion = int(0.05*len(heightlist))

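# take the heights between the 50th and 95th percentiles and compute their mean
# (despite the variable name, this is a mean, not a median);
# this ignores small bounding boxes, which are basically noise
try:
    medianheight = statistics.mean(heightlist[-sportion:-eportion])
except statistics.StatisticsError:
    # eportion is 0 for short lists, which makes the slice above empty
    medianheight = statistics.mean(heightlist[-sportion:-2])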

# keep bounding boxes whose height is at least 70% of the mean height
# and whose width-to-height ratio is greater than 0.9
box =[]
imag = iii.copy()
for i in range(len(cnts)):    
    cnt = cnts[i]
    x,y,w,h = cv2.boundingRect(cnt)
    if(h>=.7*medianheight and w/h > 0.9):
        image = cv2.rectangle(imag,(x+4,y-2),(x+w-5,y+h),(0,255,0),1)
        box.append([x,y,w,h])
    # to show image

# Result: the boxes are badly detected, as shown

Recommended answer

You're on the right track. Here's a continuation of your approach with slight modifications. The idea is:

  1. Obtain binary image. Load the image, convert to grayscale, and apply Otsu's threshold.

  2. Remove all character text contours. We create a rectangular kernel and perform a morphological opening to keep only the horizontal/vertical lines. This effectively turns the text into tiny noise, so we find contours and filter by contour area to remove them.

  3. Repair horizontal/vertical lines and extract each ROI. We apply a morphological close to fix any broken lines and smooth the table. From here we find contours, sort the box field contours using imutils.contours.sort_contours() with the top-to-bottom parameter, filter by contour area, and then extract each ROI.

Here's a visualization of each box field and the extracted ROI

Code

import cv2
import numpy as np
from imutils import contours

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Remove text characters with morph open and contour filtering
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
cnts = cv2.findContours(opening, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 500:
        cv2.drawContours(opening, [c], -1, (0,0,0), -1)

# Repair table lines, sort contours, and extract ROI
close = 255 - cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel, iterations=1)
cnts = cv2.findContours(close, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
for c in cnts:
    area = cv2.contourArea(c)
    if area < 25000:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), -1)
        ROI = original[y:y+h, x:x+w]

        # Visualization
        cv2.imshow('image', image)
        cv2.imshow('ROI', ROI)
        cv2.waitKey(20)

cv2.imshow('opening', opening)
cv2.imshow('close', close)
cv2.imshow('image', image)
cv2.waitKey()
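
Sorting top-to-bottom orders the boxes only by their y coordinate, so cells in the same table row can come out in any left-to-right order when their top edges differ by a pixel or two. A minimal sketch of one way to regroup the bounding boxes into rows before writing them to a spreadsheet-friendly CSV follows. The helper names, the `y_tolerance` value, and the `cell_text` callback are assumptions for illustration; in practice `cell_text` is where an OCR call on the cropped ROI (for example `pytesseract.image_to_string`) would go.

```python
import csv
import io

def group_boxes_into_rows(boxes, y_tolerance=10):
    """Group (x, y, w, h) boxes into rows, each row sorted left to right.

    y_tolerance is a hypothetical threshold: boxes whose top edge lies
    within this many pixels of the first box in a row join that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b[0]) for row in rows]

def rows_to_csv(rows, cell_text):
    """Write one CSV line per table row; cell_text maps a box to its text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([cell_text(box) for box in row])
    return buffer.getvalue()

# Made-up boxes for a 2x3 table with slightly uneven top edges
boxes = [(120, 52, 40, 30), (20, 50, 40, 30), (70, 48, 40, 30),
         (20, 100, 40, 30), (70, 103, 40, 30), (120, 99, 40, 30)]
rows = group_boxes_into_rows(boxes)

# Placeholder text instead of OCR output, so the sketch runs without Tesseract
csv_text = rows_to_csv(rows, lambda b: f"cell@{b[0]}x{b[1]}")
print(csv_text)
```

The resulting CSV text can be saved to a file and opened in Excel. Comparing each box against the first box of the current row keeps the grouping simple, but it assumes the scan is not noticeably skewed.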
