Utilising OpenCV with Tesseract to extract Bill of Materials (BOM) from Isometric Drawings

Kevin Meehan, Jack McShane, Stephen McClay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Quality assurance is often a time-consuming and error-prone process for organisations. However, it is increasingly important for companies that produce fabricated products for integration into safety-critical environments. For example, fabricating pipe systems for the pharmaceutical industry carries additional risks. As a result, increased regulation is required, which has led to further paperwork and validation for companies operating in this sector.

Many of the isometric drawings provided to companies for fabrication remain in paper format (or as scanned paper documents). This places an administrative burden on these companies, as the average project could generate up to 5,000 isometric drawings. This research explores techniques that could be utilised to automatically extract Bill of Materials (BOM) information from these isometric drawings.

Tesseract failed to perform OCR accurately on the extracted Region of Interest (ROI) data containing the BOM information, achieving a mean accuracy of 43.8%. This paper explores different pre-processing techniques to increase the accuracy of recognition. Techniques such as binarisation, erosion, noise reduction and contouring were employed to increase this accuracy. In the study, the mean accuracy increased to 81.2%. This demonstrates that effective use of pre-processing can substantially improve character recognition.
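To make two of the pre-processing steps named in the abstract concrete, the following is a minimal sketch of binarisation (using Otsu's method) and 3x3 erosion on a greyscale image. It is not the authors' pipeline: the abstract does not specify parameters, ordering or thresholding method, so the choice of Otsu's method, the 3x3 neighbourhood and all function names (`otsu_threshold`, `binarise`, `erode`) are assumptions made for illustration. In OpenCV, the corresponding operations would typically be `cv2.threshold` with the `cv2.THRESH_OTSU` flag and `cv2.erode`; plain Python is used here only so the sketch runs without dependencies.

```python
from typing import List

# Greyscale image as rows of integer intensities in the range 0..255.
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the threshold that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the dark class
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is at or below t
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(img: Image, t: int) -> Image:
    """Map pixels above t to 255 (white) and all others to 0 (black)."""
    return [[255 if v > t else 0 for v in row] for row in img]


def erode(img: Image) -> Image:
    """3x3 erosion: each pixel becomes the minimum of its neighbourhood.

    Neighbours outside the image are ignored, so borders are not darkened
    artificially. Bright regions shrink by one pixel on each side.
    """
    h, w = len(img), len(img[0])
    return [
        [
            min(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Hypothetical 5x5 patch: a bright 3x3 block on a dark background.
    sample = [
        [10, 10, 10, 10, 10],
        [10, 200, 200, 200, 10],
        [10, 200, 200, 200, 10],
        [10, 200, 200, 200, 10],
        [10, 10, 10, 10, 10],
    ]
    binary = binarise(sample, otsu_threshold(sample))
    for row in erode(binary):
        print(row)
```

Whether erosion thickens or thins the text depends on polarity: with dark text on a white background, eroding the white regions thickens the strokes, so a real pipeline may need to invert the image first.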

Original language: English
Title of host publication: 2021 32nd Irish Signals and Systems Conference, ISSC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665434294
DOIs
Publication status: Published - 10 Jun 2021
Event: 32nd Irish Signals and Systems Conference, ISSC 2021 - Athlone, Ireland
Duration: 10 Jun 2021 to 11 Jun 2021

Publication series

Name: 2021 32nd Irish Signals and Systems Conference, ISSC 2021

Conference

Conference: 32nd Irish Signals and Systems Conference, ISSC 2021
Country/Territory: Ireland
City: Athlone
Period: 10/06/21 to 11/06/21

Keywords

  • Computer Vision
  • OCR
  • Pre-Processing
  • Tesseract
  • Text Extraction
