Automatically expanding job descriptions through knowledge graphs
Executive summary
Job listings usually describe the skills required for the advertised job as unstructured text, and several different terms can be used for the same or very similar skills. This makes it hard to search for jobs that match a given skill set, because search terms must exactly match the textual skill descriptions. The same applies when searching for candidates who describe their skills in a text-based resume. For job platforms and staffing firms, this is a missed opportunity: possible matches between candidates and positions go unnoticed because textual descriptions of very similar skills do not match.
brox IT-Solutions implemented a solution that automatically identifies skills in job descriptions and links them to structured data, so that a knowledge graph can be used to solve this issue. Using NLP technologies, the skills are automatically related to a linked open data source. For staffing firms, this allows candidates to be assigned to open positions faster, which increases revenue.
Objective
The available job listings are unstructured. The objective is to categorize them by field of work and to extract the job-specific skills they require. This helps job seekers navigate the labour market, find new occupations, and understand which skills a desired occupation requires. It also makes it easier for staffing firms to connect applicants to jobs.
Challenges
Several challenges had to be overcome to implement a solution. First, the available job data is unstructured and comes from multiple sources; knowledge graphs help to integrate it. Second, because the data comes from different sources, different authors may use different names for the same skill. Third, the data is not connected to the open data sources that are available about these skills.
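The naming problem can be illustrated with a minimal Python sketch. The hard-coded `SKILL_TO_ENTITY` dictionary is hypothetical: in the described solution, the mapping from skill terms to entities comes from entity linking, not from a hand-written table. The sketch only shows why comparing entities instead of strings finds matches that exact text search misses.

```python
# Hypothetical mapping from skill terms to canonical DBpedia entity IRIs.
# In the described solution this mapping would come from entity linking;
# here it is hard-coded purely for illustration.
SKILL_TO_ENTITY = {
    "javascript": "http://dbpedia.org/resource/JavaScript",
    "js": "http://dbpedia.org/resource/JavaScript",
    "ecmascript": "http://dbpedia.org/resource/JavaScript",
    "kubernetes": "http://dbpedia.org/resource/Kubernetes",
    "k8s": "http://dbpedia.org/resource/Kubernetes",
}


def exact_match(query: str, skills: list[str]) -> bool:
    """Plain text search: the query must equal one of the listed skill strings."""
    return query.lower() in (s.lower() for s in skills)


def entity_match(query: str, skills: list[str]) -> bool:
    """Entity-based search: compare canonical entities instead of strings."""
    target = SKILL_TO_ENTITY.get(query.lower())
    if target is None:
        return False
    return any(SKILL_TO_ENTITY.get(s.lower()) == target for s in skills)


job_skills = ["ECMAScript", "K8s"]
print(exact_match("JavaScript", job_skills))   # False: no identical string
print(entity_match("JavaScript", job_skills))  # True: same entity as "ECMAScript"
```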
Solution
The core parts of the solution were separated into two components: processing unstructured text from job descriptions and creating an integrated knowledge graph containing skills and job postings.
The text processing consisted of the following steps:
- Extract the text content of the job postings from the respective job portals using web scraping tools.
- Apply pre-processing and data cleaning techniques to keep only the information that is relevant to the skills required for a job.
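The two text-processing steps could look roughly like the following Python sketch, which uses only the standard library. The download step is left out because the original does not name its scraping tools, so the function takes HTML that has already been fetched. The keywords in `SKILL_HEADINGS` and the assumption that skills appear under a heading such as "Your profile" are illustrative; real job portals differ in layout and need their own cleaning rules.

```python
import re
from html.parser import HTMLParser

# Hypothetical headings that introduce the skill requirements of a posting.
SKILL_HEADINGS = re.compile(r"requirements|your profile|skills", re.IGNORECASE)
HEADING_TAGS = {"h1", "h2", "h3", "h4"}


class PostingParser(HTMLParser):
    """Collects (is_heading, text) blocks and ignores script/style content."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[bool, str]] = []
        self._in_heading = False
        self._skip = 0  # depth of open <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in HEADING_TAGS:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, drop blank runs
        if text and not self._skip:
            self.blocks.append((self._in_heading, text))


def extract_skill_text(html: str) -> str:
    """Return only the text found under a skills/requirements heading."""
    parser = PostingParser()
    parser.feed(html)
    parser.close()
    keep, selected = False, []
    for is_heading, text in parser.blocks:
        if is_heading:
            # Each heading decides whether the following blocks are kept.
            keep = bool(SKILL_HEADINGS.search(text))
        elif keep:
            selected.append(text)
    return "\n".join(selected)


html = """<h2>About us</h2><p>We are a growing company.</p>
<h2>Your profile</h2><ul><li>Experience with Java and Spring</li>
<li>Knowledge of SQL databases</li></ul><h2>Benefits</h2><p>Free coffee</p>
<script>track()</script>"""
print(extract_skill_text(html))
```

For this example input, only the two list items under "Your profile" are returned; the company description, the benefits section and the script content are discarded.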
Creating the integrated knowledge graph involves the following steps:
- Processing the extracted and cleaned text with the Wikifier API, which recognizes named entities and returns their Wikipedia identifiers.
- Annotating the job postings with the related DBpedia entities returned by Wikifier. An example of such an annotation is given in the following figure.
- Storing the results (job postings, skills, and linked DBpedia entities) in a knowledge graph.
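The annotation and storage steps can be sketched in Python as follows. The sketch assumes the Wikifier `annotate-article` endpoint with the `text`, `lang` and `userKey` parameters, and a JSON response whose `annotations` entries carry a `dbPediaIri` field, as described in Wikifier's public documentation; check these against the current documentation before use. The request is only built, not sent, and `sample_response` is a hand-written illustration rather than real API output. The `http://example.org/jobs/` namespace and the `title`/`requiresSkill` predicates are hypothetical, and the triples are written as N-Triples lines instead of being loaded into a specific triple store.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

WIKIFIER_URL = "http://www.wikifier.org/annotate-article"
# Hypothetical namespace for job postings and their predicates.
JOB = "http://example.org/jobs/"


def build_wikifier_request(text: str, user_key: str, lang: str = "en") -> Request:
    """Prepare (but do not send) a POST request to the Wikifier endpoint."""
    data = urlencode({"text": text, "lang": lang, "userKey": user_key}).encode("utf-8")
    return Request(WIKIFIER_URL, data=data, method="POST")


def dbpedia_entities(response: dict) -> list[str]:
    """Collect distinct DBpedia IRIs from a Wikifier-style JSON response."""
    iris: list[str] = []
    for annotation in response.get("annotations", []):
        iri = annotation.get("dbPediaIri")
        if iri and iri not in iris:
            iris.append(iri)
    return iris


def posting_triples(posting_id: str, title: str, entities: list[str]) -> list[tuple[str, str, str]]:
    """Link one job posting to its title and to the DBpedia skill entities."""
    subject = f"<{JOB}posting/{posting_id}>"
    # json.dumps yields a quoted, escaped string that is also a valid N-Triples literal.
    triples = [(subject, f"<{JOB}title>", json.dumps(title))]
    triples += [(subject, f"<{JOB}requiresSkill>", f"<{iri}>") for iri in entities]
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hand-written, illustrative response shape; not real Wikifier output.
sample_response = {
    "annotations": [
        {"title": "Java (programming language)",
         "dbPediaIri": "http://dbpedia.org/resource/Java_(programming_language)"},
        {"title": "SQL", "dbPediaIri": "http://dbpedia.org/resource/SQL"},
    ]
}
entities = dbpedia_entities(sample_response)
print(to_ntriples(posting_triples("4711", "Backend Developer", entities)))
```

To actually call the service, the prepared request could be sent with `urlopen` from `urllib.request` and the body parsed with `json.load`. A production setup would more likely use an RDF library such as rdflib together with a triple store for the storage step.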
Business benefits
Several business aspects can benefit from the described solution:
- For job platforms and staffing firms, expanding job descriptions makes it possible to find more candidates for open positions, which increases revenue.
- Structured storage of job descriptions allows automatic mappings of positions and candidates, which reduces the amount of manual labour required and decreases costs.
- The extracted data also enables more reliable analysis of market trends, because synonyms for skills are taken into account.
- For companies advertising on job platforms, roles are filled more quickly, which will increase revenue, reduce the cost of hiring and reduce the risk of not finding good candidates in time.
- Providing more value to companies that advertise on job platforms allows the platforms to attract more customers who are willing to pay higher prices.
- For job seekers, a structured representation of job advertisements allows better querying and more specialized alerts. This would attract more job seekers to platforms using these technologies, which would increase revenue.
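Once postings and candidate profiles are both annotated with DBpedia entities, automatic mapping of positions and candidates reduces to comparing entity sets. The original does not say which matching method is used; the following Python sketch uses Jaccard similarity as one simple option. The candidate names and the `dbr:`-prefixed entity names are hypothetical example data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of entities two profiles have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(position: set[str], candidates: dict[str, set[str]],
                    min_score: float = 0.0) -> list[tuple[str, float]]:
    """Rank candidates by entity overlap with a position, best match first."""
    scored = [(name, jaccard(position, skills)) for name, skills in candidates.items()]
    return sorted(
        (item for item in scored if item[1] > min_score),  # drop weak matches
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical entity sets produced by the annotation step.
position = {"dbr:Java_(programming_language)", "dbr:SQL", "dbr:Spring_Framework"}
candidates = {
    "alice": {"dbr:Java_(programming_language)", "dbr:SQL"},
    "bob": {"dbr:Python_(programming_language)", "dbr:SQL"},
    "carol": {"dbr:JavaScript"},
}
print(rank_candidates(position, candidates))
```

The same comparison could drive job alerts: a saved search is represented as an entity set, and new postings whose score exceeds `min_score` trigger a notification.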
Dr. Matthias Jurisch
Author
Manager Information Management Unit