Automatically generating lead information in the heavy machine industry domain
Executive summary
Customer relationship management in the heavy industry domain is characterized by large customer databases that sometimes lack actionable information, such as the business area in which a client operates. Gaining new actionable client information is vital for a successful sales process, but finding this information manually is cumbersome and requires many man-hours.
brox IT-solutions has approached this challenge with a knowledge-graph-based solution that investigates whether a company has potential ties to a certain business area. Using this information, potential customers can be identified and prioritized. This allows the customer to gain more revenue by generating leads faster and to cut costs, because less manual work is required to process leads.
Objective
A heavy machine industry company needed to enrich its existing CRM data with information regarding business domains. To achieve this, a lead repository with the following properties was desired:
- The repository contains data about the potential customer that is available on their website.
- It supports finding leads or information that helps the sales team tailor its sales pitch to the customer's requirements.
A further goal was to convert the available data into meaningful information, which in turn would enable:
- a focused approach to the customer, and
- the identification of areas of interest for the potential customer.
Challenges
Three significant challenges can be seen in this problem:
- Extracting the data: Scraped web data comes with noise and unwanted content, so an approach is required that reduces the noise and cleans the data. Cleaning carries the additional risk of accidentally removing important information. A process is therefore required that not only reduces noise but also retains as much usable data as possible from the mined text.
- Reconciling terminology: Different companies use different but similar terms to describe their business area. Identifying such similar terms would normally require a domain expert and many man-hours.
- Converting the data into meaningful information: Once the data is processable, the next step is to convert it into information. Here the focus is on how to present the data and how to make it meaningful. The challenge is therefore to find a method that not only makes the data actionable but also provides the context necessary to make the information understandable.
Solution
The first step was to obtain publicly available data. For that, a web crawler (Beautiful Soup, Scrapy) was applied to the target website to collect its text. The extracted text then needed to be converted into actionable information.
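The noise-reduction part of this step can be illustrated with a minimal sketch. It uses Python's standard-library `html.parser` as a dependency-free stand-in for Beautiful Soup or Scrapy, and the list of tags treated as noise (`NOISE_TAGS`) as well as the sample page are assumptions made for illustration, not the filter rules used in the project.

```python
from html.parser import HTMLParser

# Tags whose content is treated as noise in this sketch (an assumption,
# not the filter list used in the project).
NOISE_TAGS = {"script", "style", "nav", "footer", "header", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collects text that is not nested inside a noise tag."""

    def __init__(self):
        super().__init__()
        self._noise_depth = 0  # > 0 while inside at least one noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and drop empty fragments.
        text = " ".join(data.split())
        if text and self._noise_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page text with script, style and navigation content removed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample page, used only to demonstrate the filter.
SAMPLE_HTML = """
<html><head><title>Example Mining Co.</title><style>p {color: red}</style></head>
<body><nav>Home | Contact</nav>
<p>We build   excavators for open-pit mining.</p>
<script>trackVisitor();</script>
<footer>Imprint</footer></body></html>
"""

page_text = extract_text(SAMPLE_HTML)
print(page_text)
```

Filtering by tag is deliberately conservative: it removes content that is almost never descriptive of the business, while leaving all body text in place so that domain terms are not lost before keyword extraction.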
To identify which terms from the extracted text are related to the required domain, the following steps were conducted:
- Extracting keywords from the text and attaching metadata to them. This was done via Wikifier.org, which links the keywords to dbpedia.org resources to enrich the data.
- Creating a graph from the enriched data and storing it in a knowledge base repository (GraphDB, AllegroGraph).
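As a rough illustration of the graph-creation step, the sketch below turns a list of linked keywords into N-Triples, a plain-text RDF format that triple stores commonly import. The input structure (dictionaries with `label` and `dbpedia_iri` keys), the `schema.org/mentions` predicate, and the example company IRI are assumptions chosen for this example; they are not the Wikifier response format or the data model used in the project.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical choice of predicate for "company website mentions entity".
MENTIONS = "http://schema.org/mentions"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def annotations_to_ntriples(company_iri: str, annotations: list[dict]) -> str:
    """Serialize one 'mentions' triple and one label triple per distinct entity."""
    lines = []
    seen = set()
    for annotation in annotations:
        entity_iri = annotation["dbpedia_iri"]
        if entity_iri in seen:
            continue  # each entity is written only once
        seen.add(entity_iri)
        label = escape_literal(annotation["label"])
        lines.append(f"<{company_iri}> <{MENTIONS}> <{entity_iri}> .")
        lines.append(f'<{entity_iri}> <{RDFS_LABEL}> "{label}"@en .')
    return "\n".join(lines)


# Hypothetical annotations, shaped as assumed above.
sample_annotations = [
    {"label": "Excavator", "dbpedia_iri": "http://dbpedia.org/resource/Excavator"},
    {"label": "Mining", "dbpedia_iri": "http://dbpedia.org/resource/Mining"},
    {"label": "Excavator", "dbpedia_iri": "http://dbpedia.org/resource/Excavator"},
]

ntriples = annotations_to_ntriples(
    "http://example.org/company/acme", sample_annotations
)
print(ntriples)
```

Because the entities are DBpedia IRIs, the resulting graph can be connected to DBpedia's own category and type information once it is loaded into the repository.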
The intuition is that if a company works in a particular domain, domain-related terms will be mentioned repeatedly in the extracted data. Hence, the following steps were taken:
- A SPARQL query was used to explore the information in the graph.
- The query checked for entities that are directly or remotely related to the relevant domain.
The result of the query was used to determine how strongly a company is related to a domain.
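The query and the scoring can be sketched as follows. This is not the query used in the project: it assumes that the company is linked to its entities through a hypothetical `schema:mentions` predicate, that DBpedia category data (`dct:subject` and `skos:broader`) is available in the repository, and that the share of related mentions is an acceptable measure of strength. The property path `dct:subject/skos:broader*` matches entities whose category is the domain category itself (directly related) or any of its descendants (remotely related).

```python
from string import Template

# SPARQL 1.1 query template; the predicate and the graph layout are assumptions.
QUERY_TEMPLATE = Template("""
PREFIX dct:    <http://purl.org/dc/terms/>
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>

SELECT DISTINCT ?entity WHERE {
  <$company> schema:mentions ?entity .
  ?entity dct:subject/skos:broader* <$category> .
}
""")


def build_query(company_iri: str, category_iri: str) -> str:
    """Fill the company and domain category IRIs into the query template."""
    return QUERY_TEMPLATE.substitute(company=company_iri, category=category_iri)


def domain_score(mention_counts: dict, related_entities: set) -> float:
    """Share of all keyword mentions that belong to domain-related entities."""
    total = sum(mention_counts.values())
    if total == 0:
        return 0.0
    related = sum(
        count for entity, count in mention_counts.items()
        if entity in related_entities
    )
    return related / total


query = build_query(
    "http://example.org/company/acme",
    "http://dbpedia.org/resource/Category:Mining",
)

# Hypothetical data: how often each entity was mentioned on the website,
# and which entities the query returned as related to the domain.
mention_counts = {"Excavator": 3, "Mining": 1, "Football": 1}
related_entities = {"Excavator", "Mining"}

score = domain_score(mention_counts, related_entities)
print(f"{score:.2f}")
```

Weighting by mention count reflects the intuition stated above that repeated domain terms indicate a stronger tie. In practice the unbounded `skos:broader*` path may need a depth limit, since DBpedia's category hierarchy is large and loosely structured.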
Business benefits
The implementation of this solution brings some key advantages to the industry:
- Business area information is extracted automatically, so no manual searching of customer websites is required, which leads to cost savings.
- The information is already machine-filtered, which enables a more focused approach.
- The information indicates whether the customer in focus deals with a certain domain, which enables a more tailored approach that can increase revenue.
- The solution helps to set up a priority-based approach within the list of potential customers. This focuses sales on the right clients, which can increase revenue.
- The information can be utilized in multiple domains, which makes it possible to approach most customers with different agendas.
Gaurav Mukherjee
Author
Data Engineer / Linked Data Consultant
Dr. Matthias Jurisch
Author
Manager Information Management Unit