Time Ontology


Case study: a browser-embedded, ontology-driven app for finding time intervals

Friday, June 14th, 2024


RDF (Resource Description Framework), SPARQL, ontologies and other Semantic Web Stack technologies are typically associated either with academia or with large corporate data integration projects, where knowledge graph solutions enable what would be hard to achieve without the sophisticated modelling capabilities they bring. Knowledge graph projects rarely come into the spotlight, as they are mostly complex, in-house, long-term iterative efforts.

The purpose of this article is to demonstrate that Semantic Web technologies can easily be used by developers and data engineers to quickly solve concrete problems and accomplish specific tasks, all without big budgets, expensive triple stores or a PhD in applied logic. There are two motivations behind this goal (inspired by Ora Lassila’s insightful talks1):

  • RDF and SPARQL combined with ontologies can bring what would be hard to achieve using other solutions: self-describing data with accessible semantics. This semantically enriched data is easier to share between applications, APIs and people, and lends itself to creating what has recently been called data products.
  • There is a lack of dedicated tooling (specialized apps, libraries or services) built around concrete ontologies that could help solve concrete tasks. Many valuable ontologies have been published, but they do not come with the tooling that would help developers and data specialists adapt them and use them in their own context (which might also make Semantic Web technologies more popular).

To demonstrate this, we will implement an ontology-driven project and provide an interactive, browser-embedded demo. To avoid restricting our scope to a particular domain, this effort will focus on temporal relations between events and on discovering several types of time intervals. After reading this article you will gain an insight into:

  • using ontologies with your data
  • using RDF to produce self-describing data

Project outline

Event data can be found everywhere – from enterprise data warehouses, application servers and IoT devices to application logs stored on your laptop or mobile phone. It appears in every domain and very often serves as the input for analytics processes. Performing advanced analytics on raw event data can be difficult when the temporal relationships between events need to be discovered (for example, checking chains of overlapping events). Let us consider enriching the raw event data with relationships defined in a publicly available ontology and see how the generated RDF dataset could help us with further processing and sharing of our data.

Our goal is to transform an input dataset consisting of CSV records (or RDF triples) with start and end timestamps/dates into a new RDF dataset which maps the events from the input against each other using basic temporal relationships like temporal overlap, containment, etc. When given such a task one must decide whether to use an existing ontology or create a new one.

Choosing an ontology

(You may skip this section if you are an experienced ontologist).

An ontology describes the semantics of a given domain by defining properties and concepts and the relationships between them in a way that can be understood and shared by applications and people involved. A useful, well-designed ontology provides not only a vocabulary but also building blocks for creating data-driven applications and can be used as an artifact in software development. This is because the ontology-defined axioms, concepts and rules are also data which can be consumed and manipulated by application code.

It is recommended to look for an existing ontology, both to avoid reinventing one and to reuse something that has already been successfully applied in a commercial or community project. A useful domain ontology2 should:

  • Support reusability in different contexts
  • Enable partial usage of its content (cherry picking)
  • Feature a modular implementation to cover various aspects in the case of wider or more complex domains
  • Contain natural language descriptions of the introduced concepts that are understandable to domain experts and to its technical users (developers, data specialists, consultants, etc.)
  • Avoid unnecessary dependency on top-level ontologies

Software developers will quickly realize that these criteria are similar to those for choosing an appropriate software library for a project. In fact, many high-quality OWL ontologies resemble high-quality software libraries: they are open source, have a concrete maintainer and community support, and are available online.

Sometimes it is enough to extend an existing ontology rather than create a new one from scratch. When working in a less common domain or on a demanding task, you might be unsure whether an appropriate ontology is available; in that case it can be useful to reach out to the Linked Data/Semantic Web/Knowledge Graph communities or to experienced consultants.

Time Ontology

In the case of our project, we can rely on the Time Ontology, which satisfies all the above criteria of a user-friendly domain ontology. It defines generic categories and properties for modeling and representing temporal aspects in any context. Since time is one of the basic aspects of our reality, the ontology works with high-level, abstract concepts like instants and intervals. However, to remain applicable in different areas, it avoids introducing too many assumptions and concepts and clearly describes its theoretical outlook: “The basic structure of the ontology is based on an algebra of binary relations on intervals (e.g., meets, overlaps, during) developed by Allen [al-84, af-97] for representing qualitative temporal information, and to address the problem of reasoning about such information”. Below is the diagram of the thirteen elementary relations between time periods (proper intervals whose beginning and end are different):


[Figure: the thirteen elementary relations between proper time intervals defined in Allen’s interval algebra (before, meets, overlaps, starts, during, finishes, equals, and their inverses).]

For the scope of this project, we will search the data for the most common interval relations (such as temporal overlap and containment).

The Time Ontology provides precise definitions of these interval relations, which can easily be translated into SPARQL queries. Since our use case requires modeling temporal intervals but not instants, we can assume that each event to analyze has two data properties – start time and end time – each referring to an xsd:date, xsd:dateTime or xsd:dateTimeStamp value3.

Implementation

To enable processing the data entirely on the client side, we chose the Rust programming language and embedded the Oxigraph SPARQL engine in the web browser using WebAssembly (Wasm). Since Oxigraph can be used not only as a triplestore server but also as a Rust library, it is a natural choice for implementing a Wasm-driven client-side app that transforms raw event data into RDF triples.

To increase the performance of our interval detector, a separate SPARQL CONSTRUCT query with parametrized start and end timestamps is generated for each event, rather than using a single query to analyze the entire dataset.
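To illustrate the idea, the following Python sketch builds such a parametrized CONSTRUCT query for one event and runs it with rdflib. It is a simplified illustration rather than the demo’s actual Rust code; the property IRIs ex:start and ex:end, the file name events.ttl and the example timestamps are assumptions (the demo lets you configure your own properties).

    # Simplified sketch of the per-event query generation (assumed property IRIs).
    from rdflib import Graph

    QUERY_TEMPLATE = """
    PREFIX time: <http://www.w3.org/2006/time#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX ex:   <http://example.org/>

    CONSTRUCT {{ <{event}> time:intervalOverlaps ?other . }}
    WHERE {{
      ?other ex:start ?otherStart ;
             ex:end   ?otherEnd .
      FILTER(?other != <{event}>)
      # time:intervalOverlaps: this event starts before the other one starts,
      # and ends after the other one starts but before it ends
      FILTER("{start}"^^xsd:dateTime < ?otherStart)
      FILTER("{end}"^^xsd:dateTime > ?otherStart)
      FILTER("{end}"^^xsd:dateTime < ?otherEnd)
    }}
    """

    g = Graph()
    g.parse("events.ttl", format="turtle")  # hypothetical input file with event data
    query = QUERY_TEMPLATE.format(event="http://example.org/event-1",
                                  start="2024-01-01T08:00:00",
                                  end="2024-01-01T10:00:00")
    for s, p, o in g.query(query):          # a CONSTRUCT query yields triples
        print(s, p, o)

A query of this shape would be generated analogously for each of the other supported interval relations.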



Interval finder demo

The Interval finder demo is available online at https://brox-it.github.io/time-intervals-wasm/.

You can use it with your own data without any restrictions because it does not upload the submitted dataset to any server, but instead processes it internally in your browser. You can start by submitting the predefined example RDF and CSV datasets. Before processing your data, please read the instructions on the demo website to learn how to handle custom properties and bigger datasets.

Submitting the example CSV will result in a response containing each of the supported interval types.



What can be done with the RDF response? It can be easily shared between various applications and systems. You can use one of the commercial or open-source triplestores to gather analyzed interval data and perform additional data processing or data validation using SPARQL, SHACL or OWL. You could also use Linked Data integration platforms like eccenca Corporate Memory to integrate those results with even more data. Let us know if you need more information.
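As a small illustration of the validation option, the following sketch checks such a result dataset with the Python library pyshacl. The shape, the ex:start property and the file name intervals.ttl are invented for the example and are not part of the demo.

    # Sketch: validating interval results against a SHACL shape with pyshacl.
    from rdflib import Graph
    from pyshacl import validate

    data = Graph().parse("intervals.ttl", format="turtle")  # hypothetical results file

    shapes = Graph().parse(data="""
      @prefix sh:   <http://www.w3.org/ns/shacl#> .
      @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
      @prefix time: <http://www.w3.org/2006/time#> .
      @prefix ex:   <http://example.org/> .

      # Every resource that overlaps another interval must carry a start timestamp
      ex:EventShape a sh:NodeShape ;
        sh:targetSubjectsOf time:intervalOverlaps ;
        sh:property [
          sh:path ex:start ;
          sh:datatype xsd:dateTime ;
          sh:minCount 1 ;
        ] .
    """, format="turtle")

    conforms, _, report_text = validate(data, shacl_graph=shapes)
    print(conforms)
    print(report_text)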

Conclusion

Semantic Web technologies are not restricted to big projects with big budgets and centralized architectures. This article demonstrates that they can be used to solve concrete problems and make producing meaningful data much easier.

Footnotes

  1. KGC 2022 – Ora Lassila, Amazon – Will Knowledge Graphs Save Us From the Mess of Modern Data Practice – “self-describing data with accessible semantics”; Ora Lassila – Graph Abstractions Matter | CDW21 Presentations – support ontologies with predefined libraries (“ontology engines”).

  2. In a simplified way, ontologies can be categorized into top-level, domain and specialized ontologies. Top-level (sometimes called upper-level) ontologies are extremely broad in scope and aim to be applicable to every domain, which means they try to describe an outline of the entire reality (including immaterial things like dispositions, attitudes, roles, etc.). While some of their implicit or hidden ontological (or metaphysical) assumptions might fit your worldview, embracing them can make your domain ontology more verbose, less reusable, or limit its applicability because of overly restrictive and opinionated philosophical assumptions. In most cases the advice is to start with a domain ontology or its specializations (lightweight, task-specific ontologies).

  3. A full, low-level Time Ontology integration would assume a separate pair of start/end instant objects for each interval, which would lead to the creation of many additional triples. Since we want to create a lightweight client-side (web browser) solution, we want to keep the data footprint minimal to avoid unnecessary processing overhead.

Piotr Nowara

Author




Test management

Test management: your key to first-class software quality


Tuesday, March 19th, 2024


The indispensable role of test management

In the dynamic world of technology, software quality often determines the success or failure of a product. Given the unique challenges of each project, one thing remains constant: the need to deliver world-class software. This is where test management comes into play, often marking the fine line between triumph and defeat.

More than debugging: the essence of test management

Test management is far more than just tracking down bugs. It involves the careful planning, execution and monitoring of test activities to ensure compliance with the highest quality standards. By identifying and resolving problems at an early stage, not only can quality be improved, but significant time and cost savings can also be achieved.

Control and monitoring: the heart of test management

At the heart of test management lies the efficient control and monitoring of the entire test process. It is crucial that all test activities are systematically planned and executed and that their effectiveness is continuously evaluated. This process enables continuous improvement and ensures that the software meets the specified requirements.

Test automation and non-functional testing

In today’s world, test automation is essential to keep up with the speed of software releases. Automated tests ensure efficiency and consistency. At the same time, non-functional testing, which covers performance, security and usability, is critical to delivering robust software.

The power of test metrics

Test metrics are essential to objectively measure the success of testing activities and uncover areas for improvement. Regular evaluations and adjustments of test strategies can continuously increase the efficiency of test processes.

Tackle your test management

Is it time to revolutionize your test management? Start with a critical review of your current practices and identify opportunities for improvement. Whether it’s implementing new tools, expanding your automation strategy or upskilling your teams, the key is to be proactive.

We can help you optimize your test management. Together, we can ensure that your software products not only reach the market, but also dominate it with outstanding quality and performance.

Eike Näther

Author
Senior Consultant



Creation of an incident management dashboard

5 Tips for Creating an Incident Management Dashboard


Friday, November 24th, 2023


1. Know your target group

There are many potential target groups for Incident Management data. In addition to process performers and process management, this also includes middle and senior management as well as customers and users of the company’s products. 

As a general rule, there should be one dashboard per role. Management needs a different view of the process than the incident management team itself, and customers need a different perspective than hardware technicians.

2. Consider the recipient

After analyzing the target group, other fundamental factors come into focus: After the “Who?”, the “How?” needs to be answered. 

In other words, it is a matter of imagining the recipient in the very moment of reading the report: Is the dashboard actively presented and the elements explained one by one? Will the report be read at a quiet desk or in a crowded elevator? 

After answering these questions, it is clear how quickly the dashboard needs to be understandable and therefore how the metrics need to be prepared: 

The more time available and the more concentrated the reading, the more information can be incorporated into the dashboard. Deep analyses and filter options, for example, should only be included if there is enough time for the recipient to use them. 

3. Key figures, key figures, key figures

In the area of Incident Management, there is an almost unlimited number of useful metrics. Which ones are to be selected for the dashboard depends on many factors. 

Together with the target group, the focus and the observation period are also determined. A strategic dashboard usually contains key figures for several months or years, while a tactical view covers a few weeks to a few months. Operational metrics usually include only data from a few hours to a few weeks.

Below is a small selection of our most frequently used metrics:

Strategic key figures:

  • Medium and long-term trends of the overall incident process, including for example:
    • Incident volume 
    • Response, reply and resolution times 
    • Resolution depth incl. first resolution rate 
    • Compliance with Service Level Agreements (SLA) and Operational Level Agreements (OLA)
  • Trends in the allocation of CIs or services, including main centers of outages
  • Consideration of changes to the process, e.g. change of service provider 

Tactical key figures:

  • Considerations of time periods around major events such as high-priority incidents or releases 
  • Representation of seasonal or other cyclical variations
  • Use of issues to analyze and resolve common incidents 
  • Number of tickets in the current backlog by units

Operational key figures:

  • Content of currently open tickets / backlog
  • Tickets about to reach their response, reply or resolution time
  • Currently open problems for the long-term resolution of incidents (or incident clusters)
  • Currently open changes for the resolution of incidents
  • Resolution depth
  • Ticket volume by CI (for early detection of clusters)

The selection of the appropriate key figures is essential for the meaningfulness of the dashboard. Sufficient time should be planned for this step in order to be able to consider several variants. 

4. Place within current reporting

After all content-related questions have been clarified, a final question needs to be answered: 

Is this dashboard really needed? 

Many companies already have similar or even identical reports that can be used with little effort. Likewise, some key figures or considerations may already exist in other dashboards. A detailed analysis of the existing reporting system is therefore particularly important for the long-term benefit of the dashboard. 

If parts of the key figures are already used, it is quickly possible to check whether the existing report can be delivered to a larger target group. If not, the identical data source should be used in any case to avoid data-related inconsistencies and confusion. 

5. Form follows function

The question of form is usually asked too early. This should only be considered after all the key figures have been selected, so as not to cause any compromises in terms of content. 

Once all key figures have been defined, the question arises as to the sequence and form of presentation. There are three principles for this: 

1. Display similar key figures close to each other 

If a section of the process is to be illuminated from different sides, the key figures should be placed next to each other. The key figures for reaction time and response time form a logical block in terms of content and should therefore also form a visible block in the display.



2. From critical to non-critical 

Particularly important key figures should be placed at the top left along the usual reading direction, while deeper analyses tend to be given a place at the end of the report. If the reader is interrupted, the most important information will still have been conveyed.

Elements without their own information, such as filter settings, should also be placed in the lower part. 



3. Clear and distinct diagrams 

Within the company, there is usually an unwritten rule about which kinds of diagrams are used for which use cases. Adhering to these conventions ensures that recipients quickly understand the content.

If such conventions do not yet exist, it makes sense to use elements that are as simple as possible. An element is ideal if no simpler one with the same expressiveness exists. 



Patrick Heger


Author
Consultant



Embedded RDF

How webpages embed RDF and how to extract structured data from the web


Thursday, May 4th, 2023


Summary

RDF can be embedded in webpages with JSON-LD, Microdata, or RDFa. In services like search engines and social networks, this can improve visibility and drive more traffic to your website. Embedded RDF in external websites can potentially enrich your own knowledge graph.

RDF

RDF, the foundation of the Semantic Web and Linked Data, is a standard for describing and exchanging data.

One of the advantages is that external RDF data can be quickly integrated in, and utilized by, your own RDF-based Knowledge Graph.

There are many publicly available datasets which can be downloaded in an RDF serialization (and some publishers also offer an endpoint which allows querying the data with SPARQL). To get an impression, the Linked Open Data Cloud lists some RDF datasets which are published under an open license.

But there is another possible source of RDF data: regular webpages which embed RDF as part of their HTML.

Why do webpages embed RDF?

There can be countless motivations for embedding RDF, but it should be safe to assume that most publishers do this to enable certain features in services like social networks and search engines.

In social networks, RDF can enable showing a preview of the webpage when the link gets shared.

In search engines, RDF (using the vocabulary Schema.org) can enable showing a richer result snippet for that page. This is relevant for SEO, as such rich results easily catch the eyes of the searchers, and this improved visibility can increase the click-through rate to your pages.

As an example, Google Search offers rich results for datasets, Q&As, and many more. The following screenshot shows the job postings rich result, which gets displayed at the top of the results page, even before the top-ranked regular results:


[Figure: Google search results page for the query “job postings teacher düsseldorf”; a job postings panel is displayed above the first organic result.]

Google Search query “job postings teacher düsseldorf”



How do webpages embed RDF?

There are three common syntaxes for embedding RDF in webpages:

JSON-LD gets embedded within its own HTML script element (a script tag of type “application/ld+json”).



Microdata consists of attributes (e.g., itemprop) that get added to existing HTML elements.



RDFa, like Microdata, consists of attributes (e.g., property) that get added to existing HTML elements.



While Microdata and RDFa allow reusing the content that is already part of the HTML, JSON-LD requires duplicating the content.

How many webpages embed RDF?

The project Web Data Commons regularly analyzes the corpus of the project Common Crawl to find out how many of the crawled domains / pages embed triples (which includes the three syntaxes mentioned above, and certain Microformats): https://webdatacommons.org/structureddata/


[Figure: bar chart with the years 2012 to 2022 on the x-axis and the count in millions on the y-axis.]

For each year between 2012 and 2022, this bar chart shows how many of the crawled pay-level domains published Microdata, JSON-LD, hCard (Microformats), and RDFa. (Screenshot taken from webdatacommons.org, 2023-03-07)



For the October 2022 crawl almost 50 % of the crawled pages, and around 40 % of the crawled pay-level domains, contained triples.

How to notice if a webpage embeds RDF?

By default, web browsers don’t give any indication that a page contains RDF. Apart from checking the HTML source code, browser extensions could be used to detect RDF.

An example would be the Structured Data Sniffer by OpenLink Software. It displays the RDF in an overlay in the top right corner:


[Figure: browser screenshot with a job posting open on LinkedIn; the OSDS browser extension overlays a window on the page showing the extracted RDF.]



How to extract the embedded RDF?

The above-mentioned Structured Data Sniffer allows you to view, download, and upload (e.g., to a SPARQL endpoint) the extracted RDF. It supports the serializations JSON-LD, RDF/XML, and Turtle.

Another option, suitable for a programmatic approach, is the Python library and command-line tool extruct by Zyte. It outputs everything in one JSON object, which contains JSON-LD objects for the extracted RDF.
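A minimal usage sketch might look like the following; the URL is a placeholder, and the call follows the pattern shown in the extruct documentation.

    # Sketch: extracting embedded structured data from a single page with extruct.
    import requests
    import extruct
    from w3lib.html import get_base_url

    url = "https://example.org/some-page"   # placeholder URL
    response = requests.get(url)
    base_url = get_base_url(response.text, response.url)

    data = extruct.extract(response.text,
                           base_url=base_url,
                           syntaxes=["json-ld", "microdata", "rdfa"])
    print(data["json-ld"])                  # list of JSON-LD objects found on the page

The extracted objects can then be loaded into an RDF graph or a triplestore for further processing.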

Join in!

Do you want to utilize embedded RDF? For example, to integrate it in your own knowledge graph?

Do you want to embed RDF in your webpages? For example, to increase the visibility of your search engine results?

Let’s get in touch to see if we can support you.

Stefan Götz

Author
Linked Data Consultant



Transition Management

Common transition management challenges in the sourcing lifecycle


Thursday, March 16th, 2023


As a result of the constantly increasing complexity of IT landscapes, more and more companies want to employ an external partner to handle the management of these landscapes so that they can focus on their core competencies. After a partner has been successfully found by the sourcing management, the tasks are transferred to the external service provider. This process is called transition. In the following, we will look at some of the challenges of this phase using two examples:

  1. The initial transfer of service packages to the external service provider.
  2. The transfer of a service package from one service provider to a new service provider.

The transition phase can directly put the relationship between client and contractor, which is still quite young at this point, to a tough test. Based on our experience, we can say that it is precisely in this phase of the sourcing lifecycle that the greatest challenges and problems arise in the cooperation. The reasons for this are often poor project planning using the wrong methodologies, non-transparent communication between both parties, basic requirements that have not been created (e.g. with regard to the organizational structure and/or process organization), or even a lack of resources and availability.

What challenges arise in case study one and how can they be avoided?

In this case study, we assume that there is little to no experience with outsourcing and the necessary transitions in the company or on the client side. However, the lack of experience is usually not the root cause of the challenges that arise. The reasons are more likely to be an unprepared organizational structure and/or process organization.

Outsourcing a service package for the first time always means change within the company. These changes affect, among other things, previously lived processes, meeting structures, existing roles and functions, and the “mindset” of the affected employees. The contract usually strictly specifies which processes, roles and functions are to be handled and staffed on the service provider side, as well as which bodies are to be used to transparently report on the current status of service provision in the future.

This is accompanied by necessary changes on the client side, which significantly affect the employees who previously provided the outsourced services themselves. New roles and functions with a different mindset must therefore be created, as performance is to be monitored and controlled in the future.

Each individual employee will therefore have to rethink and grow into a new role in the future – away from their own provision of the service to the management of the future service provider. In order to be able to ensure adequate service provider management, new or adapted processes as well as roles and functions must therefore be described and created on the client side.

This necessary transformation within the own organization is often very underestimated and leads to a limited ability to perform the role of service provider management in practice. The result is often a defensive attitude on the part of the employees, a disturbed partnership from the start, and (in the worst case) a strategic goal that has not been achieved or only partially achieved.

It is therefore essential to prepare the employees and the organization for the changes. The current organizational structure and processes must be analyzed, particularly with regard to the currently existing roles, functions and committees as well as the delegation/escalation and information flows, in order to initiate the first necessary measures before the actual transition. Only if the company’s own organization is adapted to the future contract and service model can it effectively support the transition with the right focus.

However, a transition always involves two parties. We would therefore like to use the next question to identify further challenges caused by the service provider. Although the question relates to the second case study, the same errors can also occur in the context described above.

What challenges can arise when, as described in example two, a new service provider takes the place of the old service provider?

In this case study, it is assumed that the client side has already gained experience in supporting transitions and that the organization is already aligned with the current contract structure. Thus, this question tends to focus on the service provider side.

After selection via the tender process, the contractor must now integrate and process considerable amounts of information in its organization within the transition. This includes the application of the right project management methodology, appropriate project planning including the establishment of a functioning project organization, transparent and open communication with the client side (customer) – especially in the event of problems and challenges – and the provision of qualified resources.

A transition must be planned before it begins, and the milestones to be achieved by the contractor (service provider/managed service partner) must be agreed and described. For this purpose, the right project management and tracking methodology must be selected jointly. In most cases, this coordination does not take place until the start of the transition and takes about 2-4 weeks. Since a transition is usually tightly scheduled for cost reasons, these 2-4 weeks are often missing at the end of the transition and cause delays already at the start of the project. We therefore recommend that this rather “formal coordination” is always carried out before the official transition start in the so-called “pre-transition phase”. This period, which is based on the complexity of the service packages to be transferred, ranges from four to eight weeks. In addition, this time can be used to start onboarding for key positions in the project. In this way, any delays within this process can be mitigated and the transition team’s ability to work at the start of the transition can be ensured.

Agreeing on the transition milestones in as much detail as possible ensures that a uniform understanding of the delivery services expected by the customer is created. If this coordination does not take place initially before the start of the transition, there is no detailed and, above all, no documented agreed target definition for the future service provider. This can lead to difficult and unnecessary discussions later in the course of the transition, especially if a postponement of the transition completion is expected, or can prevent the regular acceptance of the transition.

Just as important as aligning planning and milestones is selecting the right project management methodology. In most cases, a transition via the Cynefin model can be classified as “complicated”. Therefore, the application of “classic” project management methods such as the waterfall model or PRINCE2 is recommended here. However, the situation should be analyzed and evaluated individually. The application of the wrong methodology leads to inaccurate planning, insufficient documentation and thus causes delays and/or misunderstandings, which in turn lead to commercial repercussions on the part of both parties.

The correct selection of the project management methodology, detailed documentation of the completed coordination (e.g. via minutes) including the project plan, as well as the detailed definition of individual project milestones are absolutely necessary in order to avoid delays or escalations in the further course of the transition. The documentation of these points is usually taken over by the service provider side and should be approved in writing or by email by the client.

At least as important as the documentation of the previously mentioned points is the correct communication, preparation and presentation of the transition progress. The service provider should regularly report on the transition progress and communicate transparently and openly when challenges arise.

Problems are often “dragged out” and target dates are postponed without taking a closer look at the effects. For such changes, a “change process” should be defined within the project, in which the client must agree, for example, to the postponement of a target date. Furthermore, the service provider should be required to document “project changes” and continuously assess the risks that arise. This documentation can be used as a basis for decision-making in the event of later escalation and is therefore correspondingly important.

The transition lays the foundation for further cooperation in regular operation. An unsuccessful and delayed transition usually leads to a disturbed relationship with the partner and the unexpected full delivery of services in the subsequent regular operation. With correct forward planning, consistent application of the right project management methodology and open, transparent and performance-oriented communication, many of the challenges highlighted can be avoided or reduced.

Sascha Brandt


Author
Senior Consultant



Linked Data and Job Postings

Automatically expanding job descriptions through knowledge graphs


Executive summary

Job listings usually contain the skills required for performing the advertised job as unstructured text. Several different terms can be used to represent a set of similar skills. This makes searching for jobs that match a given skill set hard, as search terms need to exactly match textual skill descriptions. This also applies when searching for candidates that describe their skill set in a text-based resume. For job platforms and staffing firms, this is a missed opportunity: possible matches between candidates and positions are not found, because the textual descriptions that represent very similar skills do not match.

brox IT-Solutions implemented a solution that helps to automatically identify skills and relate them to structured data, allowing a knowledge graph to solve this issue. The skills are automatically related to a linked open data source using NLP technologies. For staffing firms, this allows candidates to be assigned to open positions faster, which increases revenue.

Objective

The available job listings are unstructured. The objective is to categorize them by field of work, along with the required skills that are specific to each job. This makes it easier for jobseekers to navigate the labour market, find new occupations, and understand which skills relate to which occupations (and thus which skills are needed for a desired occupation). It also allows staffing firms to connect applicants to jobs more easily.

Challenges

Several challenges needed to be overcome for a solution to be implemented: Firstly, the available job data is unstructured and comes from multiple sources; knowledge graphs help to integrate this data. Secondly, as the data is sourced from different areas, different authors may use different names to describe the same skills. Finally, the data is not connected to available open data sources about these skills.

Solution

The core parts of the solution were separated into two components: processing unstructured text from job descriptions and creating an integrated knowledge graph containing skills and job postings.

The text processing consisted of the following steps:

  • Extract the text content of the job postings from the respective job portals using web scraping tools.
  • Use pre-processing and data cleaning techniques to extract only the information that is relevant to the skills required for a job (a minimal sketch of this step follows below).
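For illustration only: the portal URL, the CSS selector and the cleaning rule below are invented, since the actual portals and cleaning steps are project-specific.

    # Sketch of the scraping and cleaning step (placeholder URL and selector).
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.org/jobs/12345"         # placeholder job posting URL
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    node = soup.select_one("div.job-description")  # hypothetical selector for the posting text
    raw_text = node.get_text(separator=" ") if node else ""

    # Very simple cleaning: collapse whitespace; real pipelines would also strip
    # navigation text, legal boilerplate and other noise.
    cleaned_text = " ".join(raw_text.split())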

Creating the integrated knowledge graph uses the following components:

  • Processing the extracted and cleaned text with the Wikifier API, which recognizes named entities and their Wikipedia IDs.
  • Annotating the job postings with the related DBpedia entities returned by Wikifier.
  • Storing the results (job postings, skills, linked DBpedia entities) in a knowledge graph; a minimal sketch of this step follows below.
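The following sketch shows what that last step might look like with rdflib. The ex: vocabulary, the posting ID and the hard-coded skill entities are invented for the example; in the real solution the entities come from the Wikifier annotations.

    # Sketch: storing a job posting and its recognized skills as RDF (invented IRIs).
    from rdflib import Graph, Namespace, Literal, RDF

    EX  = Namespace("http://example.org/jobs/")
    DBR = Namespace("http://dbpedia.org/resource/")

    g = Graph()
    posting = EX["posting-42"]                     # hypothetical posting ID
    g.add((posting, RDF.type, EX.JobPosting))
    g.add((posting, EX.title, Literal("Data Engineer (m/f/d)")))

    # Entities that the annotation step would return, hard-coded here for brevity
    for skill in ["SPARQL", "Apache_Spark", "Python_(programming_language)"]:
        g.add((posting, EX.requiresSkill, DBR[skill]))

    print(g.serialize(format="turtle"))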



Business Benefits

Several business aspects can benefit from the described solution:

  • For job platforms and staffing firms, expanding job descriptions allows finding more candidates for open positions, which increases revenue.
  • Structured storage of job descriptions allows automatic mappings of positions and candidates, which reduces the amount of manual labour required and decreases costs.
  • The extracted data can also be used to do a more reliable analysis of market trends, as synonyms for skills are taken into account.
  • For companies advertising on job platforms, roles are filled more quickly, which will increase revenue, reduce the cost of hiring and reduce the risk of not finding good candidates in time.
  • Providing more value for companies advertising on job platforms will allow job platforms to attract more customers, who are willing to pay higher prices.
  • For job seekers having a structured representation of job advertisements would allow better querying and setting up more specialized alerts. This would attract more job seekers to a platform using those technologies which would increase revenue.

Dr. Matthias Jurisch


Author
Manager Information Management Unit



Storage Migration

Practical example


Friday, September 30th, 2022


Reasons for Storage Migration

Legacy landscapes are increasingly reaching their limits. The amount of data that needs to be stored is constantly growing and the requirements for storage infrastructures are constantly increasing. In our three-part blog on storage migration, we described in detail the reasons why storage migration is necessary.

Among other things, a storage migration can be carried out for pure performance optimization, but is often postponed for cost reasons, even though it leads to cost reductions in the long term. However, sooner or later, an outdated storage solution brings problems with it that make the migration unavoidable. The following describes a practical example in which a simple extinguishing system test in a data center room provided the fundamental reason for the migration.

Initial situation for the problem

 

The technology installed in data centers generates considerable heat, which is why sufficient cooling is indispensable. Data centers must basically be kept at a constant temperature level within their rooms. If the temperature rises above the specified limits of the hardware at individual points, this alone can lead to the failure of certain components.

At the same time, technical defects and short circuits generally cannot be ruled out. This combination leads to an increased risk of fire within data centers. To counteract this, legally prescribed safety precautions must be observed. These include the installation of an extinguishing system, which helps to get a fire under control in the event of an emergency. To ensure that the system functions properly, regular tests must be carried out. Failure to comply with the regulations could result in the loss of the operating license.

Furthermore, a functioning extinguishing system helps to keep the economic damage as small as possible in the event of a fire. The recovery of data after a fire is associated with high costs and effort. In the worst case, company-relevant data, such as patent, product development and warranty-relevant data, can be completely lost. Also, not to be neglected is the direct danger to employees if the fire spreads.

In our practical example, one of the server rooms posed a major problem for the upcoming extinguishing system test. Corporate-relevant data that is essential for daily business is stored on one of the servers located there. However, the data is stored on an outdated storage system with conventional hard disks (HDDs with read heads). Most systems of this type have reached their end of service by now but are often kept in use for cost reasons. Due to the condition of the system, a regulation-compliant extinguishing system test would pose a direct threat to the installed hardware and thus to the data on it.

An important part of the extinguishing system test is also the testing of the alarm sound. This comes with an increased volume and therefore leads to vibrations within the room.

The vibrations could cause the read heads of the HDDs to vibrate and damage themselves or the data carriers. This already happened in 2018 in a Swedish data center. An article on the incident can be found in Heise Online: https://www.heise.de/newsticker/meldung/Loeschanlagen-Ton-zerstoert-Festplatten-in-schwedischem-Rechenzentrum-4029730.html.

As a result, measures were necessary to protect the important data on the servers. These had to be planned and implemented from scratch, for which in our case only a small window of time was available. Postponement was not an option, as failure to meet the deadline threatened to result in the servers being shut down by the responsible authority.

To ensure that the problem would be solved in the long term and would not reoccur in the same form during the next extinguishing system test, it was decided to renew the hardware (long-term), as well as to migrate the storage (temporary solution).

Within the available time window, the hardware for the new storage solution had to be procured and installed, and the data migrated. But even the purchase of the hardware turned out to be a major hurdle due to supply chain problems resulting from the COVID-19 pandemic. It was sold out by all manufacturers and could not even be procured through direct contacts with the manufacturers. Even an intensive search on aftermarket sites was without success.

To prepare the data move, it was first necessary to identify the responsible persons as well as the knowledge holders. These work in different areas of expertise and therefore had a variety of requirements for the new storage solution, as well as different preferences for the timing of the data migration.

The totality of the given conditions led to the conclusion that the storage migration could not be carried out within the time window available for the extinguishing system test. Therefore, the migration was treated as a long-term goal for the time being. Nevertheless, a transitional solution was needed to enable the extinguishing system test to be carried out.

 

Solution Variants

On the one hand, the transitional solution had to be implemented quickly due to the tight timeframe, and on the other hand, it had to be as resource-efficient as possible due to the upcoming storage migration. In addition, the server could not be shut down, as it might not have been possible to start it up again due to its age.

First and foremost, concepts were analyzed that did not require the immediate commissioning of another storage solution. In the course of this, possibilities were investigated to move the server to other facilities before the extinguishing system test, or to shield it sufficiently against the sound. However, both options carried a residual risk due to lack of experience.

None of the solutions examined could completely rule out a possible defect of the server, so the use of additional storage solutions was unavoidable.

Therefore, virtual machines were to be put into operation. Copying the data to the virtual machines was to ensure day-to-day business, as well as data recovery in the event of a defect of the old systems. This was implemented by accessing the data via the virtual machines and not directly via the server for the duration of the extinguishing system test.

The goal of the long-term planned storage migration is to build a new infrastructure to enable the move of all corporate data to virtual machines. This brings several advantages:

  • Consolidation: Multiple virtual servers on one physical server reduces investment and operating costs and simplifies data center operations.
  • Intelligent management: Virtual servers can be managed much more intelligently and flexibly. Automating many tasks is no longer a problem.
  • Fast provisioning: Virtual workloads can be easily scaled or moved. This makes it possible to respond more quickly to new requirements.
  • Security and availability: Virtual servers avoid application downtime and significantly accelerate disaster recovery.

After finishing all backup processes, the extinguishing system test could be carried out without any problems. As predicted, one hard disk failed during the test. However, the data on it could be restored from the virtual machines to the original system via the backup without any problems.

After the extinguishing test, the data was retrieved again directly via the old server. The backup remains on the virtual machines for security reasons, as the failure of further hard disks cannot be ruled out due to their age. In addition, this created a safety net and the data is protected until the end of the upcoming storage migration.

 

Conclusion

By reacting quickly and appropriately to the circumstances, a solution was found and the extinguishing system test was possible within the specified timeframe. At the same time, the approved budget was adhered to, thus saving resources for the planned storage migration.

The practical example shows that, in addition to basic know-how, transfer knowledge is also essential for carrying out a storage migration – because the ability to react to individually occurring problems with flexible solutions is indispensable in every storage migration. Experience and best practices help to ensure that the actual goal is not lost sight of when dealing with such problems. In concrete terms, this means always balancing the long-term strategic planning of a migration strategy with the ability to act operationally. This should be reflected, among other things, in the design of decision-making paths and coordination. In this way, it should be possible to complete the project successfully despite the high dynamics and complexity of a storage migration.

Patrick Hanke


Author
Project Manager


Paul Stapf


Author
Junior Consultant



Knowledge Graph

What is a Knowledge Graph?

Thursday, September 15th, 2022


Structure of a Knowledge Graph

A knowledge graph consists of a network of real-world entities (i.e., objects, situations, events, or concepts) and illustrates the relationships between them. Those entities can, among other options, also be visualised in a graph structure, hence the name knowledge “graph”.

Typically, a knowledge graph is created from various datasets from different sources, varying in structure. By using ontologies as a schema layer, the knowledge graph allows for logical inference, retrieving implicit knowledge rather than only requesting explicit knowledge via queries. The knowledge graph thereby gives diverse data a unified structure, contextualizes and links it, and provides a framework for data integration, unification, analytics and sharing.

Use of Knowledge Graphs

Knowledge graphs are prominently associated with search engines and other question-answering services such as Google Search or Amazon’s Alexa. Their importance here lies in the ability to distinguish words with multiple meanings, for example the difference between apple, the fruit, and Apple, the tech brand. By using a knowledge graph, search engines are therefore able to provide more context than traditional results.

brox uses knowledge graphs primarily for use cases surrounding data integration. Knowledge graphs are very useful in this domain because they are very flexible but still support powerful query languages such as SPARQL, schema definitions such as SHACL, and disambiguation. These properties can speed up data integration processes significantly and will help you reduce costs and achieve more in data integration projects.
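As a small, self-contained illustration (all IRIs under example.org are invented and this is not brox tooling), the following snippet builds a two-entity knowledge graph with rdflib and uses SPARQL to pick the right “apple”:

    # Sketch: a tiny knowledge graph that disambiguates two senses of "apple".
    from rdflib import Graph, Literal, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.Apple_Inc,   RDF.type,   EX.Company))
    g.add((EX.Apple_Inc,   RDFS.label, Literal("Apple")))
    g.add((EX.Apple_Fruit, RDF.type,   EX.Fruit))
    g.add((EX.Apple_Fruit, RDFS.label, Literal("apple")))

    # "Which things labelled 'apple' are companies?"
    query = """
    PREFIX ex:   <http://example.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?thing WHERE {
      ?thing a ex:Company ;
             rdfs:label ?label .
      FILTER(LCASE(STR(?label)) = "apple")
    }
    """
    for row in g.query(query):
        print(row.thing)   # -> http://example.org/Apple_Inc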

Emelie Steglich

Author
Working Student


Dr. Matthias Jurisch

Co-Author
Manager Information Management Unit



Generating Lead Information https://brox.de/en/blog/generating-lead-information/ Thu, 04 Aug 2022 15:17:04 +0000 https://brox.de/?p=7854 Customer relationship management in the heavy industry domain is characterized by large customer databases that sometimes lack actionable information such as the business area the client is operating in. Gaining new actionable client information is vital, while manually finding this information is cumbersome…


Generating Lead Information

Automatically generating lead information in the heavy machine industry domain


Executive summary

Customer relationship management in the heavy industry domain is characterized by large customer databases that sometimes lack actionable information such as the business area the client is operating in. Gaining new actionable client information is vital for a successful sales process. Manually finding this information is cumbersome and requires many man-hours. 

brox IT-Solutions has approached this challenge with a knowledge-graph-based solution that investigates whether a company has potential ties to a certain business area. Using this information, potential customers can be prioritized and assessed. This allows clients to generate leads faster and thus gain more revenue, while cutting costs because less manual work is required to process leads.

Objective

A heavy machine industry company needed to enrich its existing CRM data with information regarding business domains. To achieve this, a lead repository with the following properties is desired:

  • The repository contains data about the potential customer that is available on their website. 
  • It supports finding leads or information that helps the sales team deliver a better sales pitch tailored to the customer‘s requirements.

A further objective is to convert the available data into meaningful information, which in turn enables:

  • A focused approach to the customer and 
  • finding areas of interest for the potential customer.

Challenges

Three significant challenges can be seen in this problem:

  1. Extracting the data: Scraping web data inevitably brings noise and unwanted content, so an approach is required that reduces this noise and cleans the data. Cleaning comes with the additional challenge of not accidentally removing important information; the process must therefore not only reduce noise but also retain as much usable data from the mined text as possible. 
  2. Harmonizing terminology: Different companies use different but similar terms to describe their business area. Identifying such similar terms normally requires a domain expert and many man-hours.
  3. Converting the data into meaningful information: Once the data is processable, the next step is to turn it into information. The focus here is on how to present the data and make it meaningful, so the challenge is to find a method that not only makes the data actionable but also provides the context needed to make the information understandable. 

Solution

The first challenge was to obtain publicly available data. For that, a web crawler (built with Beautiful Soup and Scrapy) was applied to the target website to extract a set of texts from it. The extracted text then needed to be converted into actionable information.
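
The crawling step might look roughly like the following sketch with Beautiful Soup; the URL is hypothetical, and the actual project setup (Scrapy spiders, crawl depth, politeness rules) will differ:

# A minimal crawling sketch: fetch a page, strip markup noise, keep visible text.
import requests
from bs4 import BeautifulSoup

url = "https://example.com"                    # hypothetical company website
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Remove script/style noise before extracting the text used for keyword extraction.
for tag in soup(["script", "style", "noscript"]):
    tag.decompose()

text = " ".join(soup.get_text(separator=" ").split())
print(text[:200])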

To identify which terms from the extracted text are related to the required domain, the following steps were conducted:

  • Extracting keywords from the text and attaching metadata to them via Wikifier.org, which links the terms to dbpedia.org resources to enrich the data.
  • Creating a graph from the enriched data and storing it in a knowledge base repository (GraphDB, AllegroGraph); a minimal sketch of this step follows below.
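
The graph-building step referenced above could, under illustrative assumptions (the ex: vocabulary, the example company and the keyword-to-DBpedia mappings are made up), look like this with rdflib:

# A minimal sketch: turn keyword/entity-linking results into RDF triples.
from rdflib import Graph, Namespace, URIRef, Literal

EX = Namespace("http://example.org/leads/")    # hypothetical vocabulary
g = Graph()
g.bind("ex", EX)

# Example output of the enrichment step: surface term -> linked DBpedia resource
annotations = {
    "hydraulic press": "http://dbpedia.org/resource/Hydraulic_press",
    "forging": "http://dbpedia.org/resource/Forging",
}

company = EX["acme-gmbh"]                      # hypothetical crawled company
for term, dbpedia_uri in annotations.items():
    g.add((company, EX.mentions, URIRef(dbpedia_uri)))
    g.add((URIRef(dbpedia_uri), EX.surfaceForm, Literal(term)))

print(g.serialize(format="turtle"))            # Turtle ready for the triple store

The resulting Turtle can then be loaded into GraphDB or AllegroGraph through their standard import interfaces.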

The intuition is that if a company works in a particular domain, then domain-related terms will be mentioned repeatedly in the extracted data. Hence, the following steps were taken to achieve this goal:

  • A SPARQL query was used to explore the information in the graph.
  • The query checks for entities that are directly or remotely related to the relevant domain.

The result of the query was used to determine how strongly a company is related to a domain.
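
A hedged sketch of such a scoring query is shown below; the modelling with ex:mentions, dct:subject and skos:broader* is an assumption for illustration, not the query used in the actual project:

# A hedged sketch: count, per company, how many linked entities are directly or
# transitively related to a target domain category.
from rdflib import Graph

g = Graph()
g.parse("enriched_leads.ttl", format="turtle")  # hypothetical export incl. category data

query = """
PREFIX ex:   <http://example.org/leads/>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?company (COUNT(DISTINCT ?entity) AS ?domainHits) WHERE {
  ?company ex:mentions ?entity .
  ?entity dct:subject/skos:broader* <http://dbpedia.org/resource/Category:Metal_forming> .
}
GROUP BY ?company
ORDER BY DESC(?domainHits)
"""

for company, hits in g.query(query):
    print(company, hits)    # higher counts suggest stronger ties to the domain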

Business benefits

The implementation of this solution brings some key advantages to the industry:

  • Business area information is extracted automatically; no manual searching of customer websites is required, which leads to cost savings.
  • The information is already machine-filtered, which enables a more focused approach.
  • The information provides clues as to whether the customer in focus deals with a certain domain, enabling a more tailored approach that can increase revenue.
  • The solution helps to set up a priority-based approach within the list of potential customers. This focuses sales on the right clients, which will increase revenue.
  • The information can be utilized in multiple domains, which makes it possible to approach most customers with different agendas.

Gaurav Mukherjee

Author
Data Engineer / Linked Data Consultant


Dr. Matthias Jurisch

Author
Manager Information Management Unit



Pharma Semantic Search https://brox.de/en/blog/pharma-semantic-search/ Wed, 08 Jun 2022 21:26:25 +0000 https://brox.de/?p=8223 The integration of information from regulatory documents and R&D databases often requires a manual search in various documents and databases. In cooperation with a pharmaceutical company, brox has developed a solution using Knowledge Graphs that enables a simple search in the data...


Pharma Semantic Search

PHARMA SEMANTIC SEARCH: CONNECTING REGULATORY INFORMATION TO INTERNAL R&D DATA VIA A KNOWLEDGE GRAPH


Executive Summary 

Pharmaceutical companies need to go through a regulatory submission process when filing for the registration of new products. An internal view of R&D data is usually maintained separately from the data used in the submission process. Integrating the information from regulatory documents and R&D databases often includes a manual search through documents and databases, which requires a significant amount of effort.

brox IT-Solutions has built a solution that integrates data from the regulatory process and R&D databases to overcome these challenges for a large German pharma company. The solution was built on a knowledge graph, which allows the data to be extended easily, and includes a corresponding search frontend that lets non-technical users access the data effortlessly. The reduced effort of manually searching through databases and documents can lead to a significant decrease in cost.

Objective 

A pharma company needed to integrate the data represented in regulatory submission documents with data from internal databases, such as substances and molecules, as well as organizational master data. The firm wanted to integrate the data for:

  • ensuring data quality of submission documents
  • getting information on which substances are registered in which countries
  • directing research effort to areas that result in products

The frontend for exploring the data was required to support searching and filtering for relevant information and to allow users with no data-science or analytics background to interact with it.

Challenges 

One of the main challenges for this project was relating data from the R&D and regulatory domains to one another. Data from the regulatory domain included text-mined documents. Therefore, identifiers in the documents did not always exactly match identifiers and names used in other databases. After data cleansing, the data needed to be matched to the internal master data on substances and legal entities already maintained in a knowledge graph. The results of the matching process had to be stored in a knowledge graph to allow integration with other sources.

Another challenge was making the data available to non-technical users via a frontend. To offer an interaction pattern familiar to these users, the frontend was based on a search engine. That search engine had to be integrated with the knowledge graph so that it could provide a faceted search over the data represented in the graph.

Solution 

The solution built for these challenges consisted mainly of two aspects: the data integration and the frontend implementation.

Data integration was accomplished while using the following components:

  • ETL software was used to extract data from the text mining results and create a graph.
  • Matching the text mining results was done via matching patterns created with SPARQL queries; a sketch of such a pattern follows below.
  • This data was then ingested into an RDF database.
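
The matching pattern mentioned above might, under illustrative assumptions about the vocabulary and a simple label-based match, look like this:

# A hedged sketch: link text-mined substance mentions to master-data substances
# by a normalised label comparison (vocabulary and match rule are illustrative).
from rdflib import Graph

g = Graph()
g.parse("textmining_results.ttl")              # hypothetical input files
g.parse("substance_masterdata.ttl")

g.update("""
    PREFIX ex:   <http://example.org/pharma/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    INSERT { ?mention ex:matchesSubstance ?substance . }
    WHERE {
      ?mention   a ex:TextMinedMention ; ex:surfaceForm ?label .
      ?substance a ex:Substance ;        skos:prefLabel ?prefLabel .
      FILTER (LCASE(STR(?label)) = LCASE(STR(?prefLabel)))
    }
""")

g.serialize(destination="matched.ttl", format="turtle")   # ready for the RDF database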

For the frontend implementation, the following components were a part of the solution:

  • The search engine was built using Elasticsearch.
  • Ingestion into Elasticsearch was done using rdflib to extract the data from the graph store and the Elasticsearch client library for Python to index the data; see the ingestion sketch below.
  • The frontend for searching was implemented using Searchkit, whose easy and accessible templates allowed for a fast implementation.
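
The ingestion sketch referenced above could look roughly as follows; the endpoint, index name and ex: vocabulary are assumptions, and the Python client API details vary between Elasticsearch versions:

# A hedged sketch: extract records from the RDF store with rdflib and index them
# into Elasticsearch (endpoint, index name and ex: vocabulary are assumptions).
from rdflib import Graph
from elasticsearch import Elasticsearch

g = Graph()
g.parse("matched.ttl", format="turtle")

rows = g.query("""
    PREFIX ex: <http://example.org/pharma/>
    SELECT ?substance ?label ?country WHERE {
      ?substance a ex:Substance ;
                 ex:label ?label ;
                 ex:registeredIn ?country .
    }
""")

es = Elasticsearch("http://localhost:9200")    # hypothetical local instance
for substance, label, country in rows:
    es.index(
        index="regulatory-substances",
        document={"uri": str(substance), "label": str(label), "country": str(country)},
    )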

 

Business Benefits 

The implementation of the solution allowed the pharma company access to several benefits that were not attainable without the solution:

  • Inconsistencies between regulatory and R&D data can be discovered. Finding these inconsistencies can reduce regulatory risk.
  • Connections between products, substances, and the legal entities that are allowed to sell them can now be found in the graph. Finding this information usually requires hours or even days of manually searching through documents, so this is a potential way to cut costs.
  • Regulatory data can be easily accessed and filtered by country, internal substance identifiers, related company, and other aspects. This helps with getting an overview of the company’s current market access and hence makes the search for new revenue streams easier.
  • The solution is very extendable: now that the regulatory data is contained in the graph, additional use cases can be built on top of it by extending the graph and building a new frontend.
  • Implementation was done within weeks because a graph was already present for the internal R&D data, and tools like Pentaho, graph databases, and Searchkit facilitated quick prototyping. Thus, the costs of new applications built with a similar approach can be reduced.

Dr. Matthias Jurisch

Author
Manager Information Management Unit

