TY - GEN
T1 - Utilising OpenCV with Tesseract to extract Bill of Materials (BOM) from Isometric Drawings
AU - Meehan, Kevin
AU - McShane, Jack
AU - McClay, Stephen
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/6/10
Y1 - 2021/6/10
N2 - Quality assurance is often a time-consuming and error-prone process for organisations. However, it is increasingly important for companies that produce fabricated products for integration into safety-critical environments. For example, creating pipe systems for the pharmaceutical industry will include additional risks. As a result, increased regulation is required, which has resulted in further paperwork and validation for companies operating in this sector. A lot of the isometric drawings provided to companies for fabrication remain in paper format (or scanned paper documents). This provides an administrative burden on these companies as the average project could generate up to 5,000 isometric drawings. This research explores techniques that could be utilised to automatically extract Bill of Materials (BOM) information from these isometric drawings. Tesseract has failed to perform OCR accurately on the extracted Region of Interest (ROI) data containing the BOM information, achieving a mean average of 43.8%. This paper explores different pre-processing techniques to increase the accuracy of recognition. Techniques such as binarisation, erosion, noise reduction and contouring were employed to increase this accuracy. In the study, the accuracy increased to a mean average of 81.2%. This has demonstrated that effective use of pre-processing can have an impact on character recognition.
AB - Quality assurance is often a time-consuming and error-prone process for organisations. However, it is increasingly important for companies that produce fabricated products for integration into safety-critical environments. For example, creating pipe systems for the pharmaceutical industry will include additional risks. As a result, increased regulation is required, which has resulted in further paperwork and validation for companies operating in this sector. A lot of the isometric drawings provided to companies for fabrication remain in paper format (or scanned paper documents). This provides an administrative burden on these companies as the average project could generate up to 5,000 isometric drawings. This research explores techniques that could be utilised to automatically extract Bill of Materials (BOM) information from these isometric drawings. Tesseract has failed to perform OCR accurately on the extracted Region of Interest (ROI) data containing the BOM information, achieving a mean average of 43.8%. This paper explores different pre-processing techniques to increase the accuracy of recognition. Techniques such as binarisation, erosion, noise reduction and contouring were employed to increase this accuracy. In the study, the accuracy increased to a mean average of 81.2%. This has demonstrated that effective use of pre-processing can have an impact on character recognition.
KW - Computer Vision
KW - OCR
KW - Pre-Processing
KW - Tesseract
KW - Text Extraction
UR - http://www.scopus.com/inward/record.url?scp=85114444448&partnerID=8YFLogxK
U2 - 10.1109/ISSC52156.2021.9467854
DO - 10.1109/ISSC52156.2021.9467854
M3 - Conference contribution
AN - SCOPUS:85114444448
T3 - 2021 32nd Irish Signals and Systems Conference, ISSC 2021
BT - 2021 32nd Irish Signals and Systems Conference, ISSC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd Irish Signals and Systems Conference, ISSC 2021
Y2 - 10 June 2021 through 11 June 2021
ER -