NGP intelligence (NGPi)

NGPi is a data indexing solution built on Hitachi Content Intelligence (HCI). It sifts through large volumes of information and builds structured indexes of metadata sourced from objects, and from the contents of objects, stored in NGPr. HCI runs as a cluster of master nodes, worker nodes and index nodes. Worker nodes autonomously sift through data and extract relevant metadata into indexes, which are then split (sharded) across multiple index nodes to increase search speed. The indexes are held mostly in memory, backed by attached NVMe storage to ensure high-speed access. Master nodes manage the cluster and keep track of all index nodes and worker nodes. The indexing cluster can grow organically: more worker and index nodes can be added to match current needs.
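To make the idea of sharding concrete, the following is a minimal sketch of how metadata entries could be spread across several index nodes and how a search then fans out to all of them. It is an illustration only, not HCI's actual implementation: the `ShardedIndex` class, the `shard_for` function, the shard count and the hash-based routing are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of index nodes


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id to a shard number using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedIndex:
    """Toy in-memory index split across several shards (assumed design)."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # One inverted index per shard: (field, value) -> set of document ids.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def add(self, doc_id: str, metadata: dict[str, str]) -> None:
        """Store every metadata field of a document on the shard that owns it."""
        shard = self.shards[shard_for(doc_id, self.num_shards)]
        for field, value in metadata.items():
            shard[(field, value)].add(doc_id)

    def search(self, field: str, value: str) -> set[str]:
        """Query every shard and merge the partial results."""
        hits: set[str] = set()
        for shard in self.shards:
            hits |= shard.get((field, value), set())
        return hits


index = ShardedIndex()
index.add("run1.json", {"sample": "A", "assay": "WGS"})
index.add("run2.json", {"sample": "B", "assay": "WGS"})
matches = index.search("assay", "WGS")
```

Because each shard only holds part of the index, the per-shard lookups in `search` are independent of each other; in a real cluster they could run in parallel on separate index nodes, which is where the speed-up comes from.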

During indexing, data can be restructured, transformed and refined in multiple steps through an indexing pipeline. A pipeline can be designed for a specific purpose, and multiple pipelines can act on the same data. For example, one pipeline can extract data related to QA of NGS runs and aggregate it in one index, while another pipeline extracts variant information from the same data for use in a variant database index.

Input data can come from a specific bucket in NGPr, multiple buckets, a single tenant, or several tenants. An indexing pipeline can be set to process only new data coming into NGPr, so that existing data is not continuously reindexed. Using specifically designed indexing pipelines, semi-structured data and metadata coming from a GMC can be refined into structured data that follows a standard such as OMOP or HL7/FHIR.

Every metadata field in an index can be restricted to certain users or user groups. A comprehensive index can thus be split into multiple “views” depending on the person or entity accessing the data, with restricted metadata fields hidden.
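The pipeline and view concepts described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not HCI configuration or API code: the stage functions, the document fields (`run_id`, `qc`, `q30`, `variants`), the 0.8 quality threshold, the group names and the `FIELD_ACL` table are all hypothetical.

```python
from typing import Any, Callable, Iterable

Document = dict[str, Any]
Stage = Callable[[Document], Document]


def run_pipeline(doc: Document, stages: Iterable[Stage]) -> Document:
    """Pass a document through each stage in order and return the result."""
    for stage in stages:
        doc = stage(doc)
    return doc


def extract_qa(doc: Document) -> Document:
    """Keep only the fields relevant for QA of a run (hypothetical fields)."""
    return {"run_id": doc["run_id"], "q30": doc["qc"]["q30"]}


def flag_quality(doc: Document) -> Document:
    """Add a pass/fail flag; the 0.8 threshold is an arbitrary example value."""
    return {**doc, "passed": doc["q30"] >= 0.8}


def extract_variants(doc: Document) -> Document:
    """Keep only the variant information from the same input document."""
    return {"run_id": doc["run_id"], "variants": sorted(doc["variants"])}


# Two pipelines acting on the same data, each feeding its own index.
QA_PIPELINE = [extract_qa, flag_quality]
VARIANT_PIPELINE = [extract_variants]

# Hypothetical field-level access control: field -> groups allowed to see it.
FIELD_ACL = {
    "run_id": {"lab", "clinician"},
    "q30": {"lab"},
    "passed": {"lab", "clinician"},
}


def view_for(entry: Document, group: str) -> Document:
    """Return the index entry with fields hidden that the group may not see."""
    return {k: v for k, v in entry.items() if group in FIELD_ACL.get(k, set())}


raw = {
    "run_id": "R1",
    "qc": {"q30": 0.92},
    "variants": ["chr2:g.5A>T", "chr1:g.10C>G"],
}
qa_entry = run_pipeline(raw, QA_PIPELINE)
variant_entry = run_pipeline(raw, VARIANT_PIPELINE)
clinician_view = view_for(qa_entry, "clinician")
```

In this sketch `clinician_view` contains `run_id` and `passed` but not `q30`, while a member of the `lab` group would see all three fields of the same index entry. Fields that are missing from `FIELD_ACL` are hidden from everyone, a deny-by-default choice made for the example.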

Searches are performed through a REST API, almost always via a graphical front end such as a web server or service. Users can access the API directly and perform searches, but this is not encouraged because the output is complex JSON data.
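As an illustration of why a front end is helpful, the sketch below flattens a nested JSON search response into simple rows that could be shown in a table. The response shape used here (`hitCount`, `results`, `metadata` and so on) is invented for the example and is not the actual response format of the search API.

```python
import json
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one dict with dotted keys.

    Empty dicts and lists produce no keys in the result.
    """
    flat: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for position, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{position}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1] if prefix else ""] = obj
    return flat


def rows(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each hit of a (hypothetical) search response into one flat row."""
    return [flatten(hit) for hit in response["results"]]


# Hypothetical response body, for illustration only.
RESPONSE = json.loads(
    """
    {
      "hitCount": 1,
      "results": [
        {"id": "R1", "metadata": {"sample": "A", "qc": {"q30": 0.92}}}
      ]
    }
    """
)

table = rows(RESPONSE)
# table == [{"id": "R1", "metadata.sample": "A", "metadata.qc.q30": 0.92}]
```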

To ensure high security, GMS has developed a federated login solution based on SITHS cards and Inera's IdP. This enables access to NGP regardless of which region you work in. It also enables tracking and logging of individual searches within the platform.