TY - GEN
T1 - Resume Parsing Across Multiple Job Domains Using a BERT-Based NER Model
AU - Srivastava, Madhumita
AU - Greaney, Paul
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - This study presents a resume information extraction system using named entity recognition (NER) techniques. By harnessing the power of BERT, a state-of-the-art transfer learning model, in conjunction with NER, we develop a model which accurately extracts relevant information from resumes. Our approach involves fine-tuning the pre-trained BERT base model on a customised NER resume dataset, which comprises a limited volume of annotated resume data from across four diverse job domains: information technology, human resources, consultancy, and engineering. To achieve this, we utilised the NLP capabilities of spaCy pipelines. Our results show that even with a constrained training dataset and minimal fine-tuning, transfer learning can be successfully leveraged to extract named entities from resumes, achieving respectable accuracy tailored to our specific application. Our findings underscore the pivotal role of data size and annotation quality in custom NER training. The model's generalisation and contextual comprehension heavily depend on these factors, reinforcing the need for carefully selected training data. This paper sheds light on the relationship between transfer learning, NER, and data quality in developing a sophisticated resume information extraction system.
KW - machine learning
KW - named entity recognition
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85189937488&partnerID=8YFLogxK
U2 - 10.1109/AICS60730.2023.10470917
DO - 10.1109/AICS60730.2023.10470917
M3 - Conference contribution
AN - SCOPUS:85189937488
T3 - 2023 31st Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2023
BT - 2023 31st Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 31st Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2023
Y2 - 7 December 2023 through 8 December 2023
ER -