ABSTAT (Knowledge Graph Profiling with ABstraction and STATistics) is a framework that computes and provides access to semantic profiles of Knowledge Graphs represented in RDF.

These profiles, also called “summaries”, describe the content of RDF datasets in a compact form and have proved helpful in a variety of application domains, such as data understanding, quality assessment, analytical modeling, and vocabulary suggestion.

The core profiling primitives in ABSTAT are schema-level patterns (or, more simply, patterns), which are extracted from the Knowledge Graph and for which a number of statistics are computed. A pattern is a triple < Type1, P, Type2 > stating that there are instances of Type1 linked to instances of Type2 by the property P. The term type refers to either an ontology class (e.g., foaf:Person) or a datatype (e.g., xsd:dateTime). ABSTAT applies a minimalization mechanism that takes the ontology used in the Knowledge Graph into account to make the profile more compact without sacrificing expressiveness. In addition to patterns, profiles provide several statistics for the patterns and their constituents: the frequency of types and properties, the frequency of patterns, and cardinality descriptors.
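To make the notion of a pattern more concrete, the following is a minimal sketch of pattern extraction with frequency counting. It is an illustration only, not ABSTAT's actual implementation: the toy data, the `ex:` names, and the functions `ancestors`, `minimal_types`, and `extract_patterns` are all hypothetical. It assumes that minimalization means keeping only the most specific types of each instance with respect to a subclass hierarchy, and it ignores literal objects and datatypes for brevity.

```python
from collections import Counter

# Hypothetical toy ontology: child class -> parent class.
SUBCLASS_OF = {"ex:Company": "ex:Organization"}

# Hypothetical type assertions for each instance.
TYPES = {
    "ex:acme": {"ex:Company", "ex:Organization"},
    "ex:alice": {"foaf:Person"},
    "ex:bob": {"foaf:Person"},
}

# Hypothetical instance-level triples (subject, property, object).
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]


def ancestors(cls):
    """Return all superclasses of cls by walking up the hierarchy."""
    found = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def minimal_types(types):
    """Keep only the most specific types: drop any type that is a
    superclass of another type in the same set."""
    redundant = set()
    for cls in types:
        redundant |= ancestors(cls)
    return types - redundant


def extract_patterns(triples, types):
    """Count (Type1, P, Type2) patterns over the minimal types of the
    subject and object of every triple."""
    counts = Counter()
    for subj, prop, obj in triples:
        for subj_type in minimal_types(types.get(subj, set())):
            for obj_type in minimal_types(types.get(obj, set())):
                counts[(subj_type, prop, obj_type)] += 1
    return counts


patterns = extract_patterns(TRIPLES, TYPES)
for pattern, frequency in sorted(patterns.items()):
    print(pattern, frequency)
```

In this sketch the pattern < foaf:Person, ex:worksFor, ex:Organization > is not emitted, because it can be inferred from < foaf:Person, ex:worksFor, ex:Company > together with the subclass relation. This is the sense in which a minimalized profile can be more compact without losing information.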

Users can compute and manage ABSTAT profiles through one web application (ABSTAT Backend) and explore the profiles through another (ABSTAT Frontend). The ABSTAT APIs support access to profiles for web applications.

Features

Features available before the project:

• Core profiling algorithms: algorithms for the extraction of patterns and the computation of pattern-specific statistics.
• ABSTAT Frontend: web application to access profiles (ABSTAT Browse, ABSTAT Search, ABSTAT Query).

Features developed during the project:

• ABSTAT Backend: user-oriented web application to control the profiling process and to compute, store, and manage profiles.
• Cardinality descriptor algorithms: algorithms to compute cardinality descriptors.
• ABSTAT Distributed Architecture: the distributed architecture used to control, compute, and store profiles, as well as to represent and share them over HTTP. In particular, the new architecture uses a NoSQL database to store profiles for structured access and Elasticsearch for full-text search over the profiles.
• ABSTAT APIs: APIs that make profiles accessible to machines, including the vocabulary suggestion APIs used in the ASIA semantic table annotation tool.

Usage in euBusinessGraph

ABSTAT provides profiles of the company data already published in the business graph. ASIA uses these profiles to provide mapping suggestions that make it easier for new data publishers to map their data to the euBusinessGraph company data model.

Technical information

Contact person
matteo.palmonari@unimib.it