Guide to Application building
SDaaS™ is a software platform that helps you build Semantic Web Applications and Smart Data Platforms.
SDaaS assists in constructing and managing a knowledge graph that organizes linked data resources annotated with application-specific semantics.
Instead of accessing various data silos, applications use the knowledge graph as a reliable central repository that offers a semantic query service. This repository contains a semantically enriched replica of all the data the application needs. Because of the inherently distributed nature of the web and the continuous changes in data, an application using the SDaaS platform adopts the Eventual Consistency model. This model is widely used today because it represents a reasonable trade-off between performance and complexity.
What is a semantic web application
A semantic web application is a software system designed to leverage and utilize the principles and technologies of the Semantic Web.
These applications utilize linked data, ontologies, and metadata to create richer connections between different pieces of information on the internet. They typically involve:
Structured Data Representation: Semantic web apps use RDF (Resource Description Framework) to represent data in a structured and machine-readable format. This allows for better understanding and interpretation of relationships between different data points.
Ontologies and Vocabularies: They employ ontologies and vocabularies (such as OWL - Web Ontology Language) to define relationships and meaning between entities, making it easier for systems to understand the context of the data.
Data Integration and Interoperability: These applications facilitate data integration from diverse sources, enabling different systems to exchange and use information more effectively.
Inference and Reasoning: Semantic web apps can perform logical inference and reasoning to derive new information or insights from existing data based on defined rules and relationships.
Enhanced Search and Discovery: They enable more sophisticated search functionalities by understanding the semantics of the data, providing more relevant and contextualized results.
In summary, a semantic web application harnesses Semantic Web technologies to enable machines to comprehend and process data more intelligently, facilitating better data integration, discovery, and utilization across various platforms and domains.
What is a smart data platform
A Smart Data Platform refers to a technological infrastructure designed to collect, process, analyze, and leverage data intelligently to generate insights, make decisions, and power various applications or services. These platforms often incorporate advanced technologies such as artificial intelligence (AI), machine learning (ML), data analytics, and automation to handle vast amounts of data from diverse sources.
A Smart Data Platform typically integrates multiple functionalities, including data ingestion, storage, processing, analysis, visualization, and often includes features for data governance, security, and compliance.
What is Eventual Consistency
Eventual Consistency is a concept in distributed computing where, in a system with multiple replicas of data, changes made to the data will eventually propagate through the system and all replicas will converge to the same state. However, this convergence is not instantaneous; it occurs over time due to factors like network latency, system failures, or concurrent updates. The Knowledge Graph can be considered a semantically enriched replica of the ingested distributed data.
The typical SDaaS user is a DevOps professional who utilizes the commands provided by the platform to script the building and updating of a knowledge graph. This knowledge graph is then queried by an application using SPARQL or REST APIs. SDaaS developers and system integrators can extend the platform by adding custom modules and creating new commands.
In more detail, the typical SDaaS use case scenario is summarized by the following diagram:
Calling SDaaS commands
The SDaaS Platform operates through a set of bash commands and functions. The general syntax to call an SDaaS command is sd <module> <name> [OPTIONS] [OPERANDS], while the syntax of an SDaaS function is sd_<name>.
The modules are bash script fragments that define a set of SDaaS functions, providing a namespace for them.
Before calling an SDaaS function, you must explicitly load its module with the sd_include <module> core function. Core functions are contained in the core module, which is loaded at startup. SDaaS commands automatically include the modules they require.
SDaaS commands MAY depend on a set of context variables that you can pass using options. The global configuration variable SD_DEFAULT_CONTEXT provides a default local context used by all commands.
For instance, these calls are all equivalent:
sd sparql graph urn:myapp:abox
sd sparql graph -s STORE -D "graph=urn:myapp:abox"
sd sparql graph -D "sid=STORE" -D "graph=urn:myapp:abox"
sd sparql graph -D "sid=STORE graph=urn:myapp:abox"
sd sparql graph -D "sid=OVERRIDDEN_BY-s graph=urn:myapp:overridden_by_operand" -s STORE urn:myapp:abox
SD_DEFAULT_CONTEXT="sid=STORE graph=urn:myapp:abox"; sd sparql graph
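The equivalence of the calls above suggests an order of precedence among the sources of context. The following is a minimal sketch of that idea, not SDaaS code: the resolve_context function is hypothetical, the default store id STORE is inferred from the first call in the list, and it is an assumption that options are applied in the order given, with the operand applied last.

```shell
#!/usr/bin/env bash
# Hypothetical model of context resolution; this is NOT SDaaS code.
# Weakest to strongest: built-in default, SD_DEFAULT_CONTEXT,
# options in the order given, and finally the operand.
resolve_context() {
    local pairs="${SD_DEFAULT_CONTEXT:-}" operand="" pair
    local sid="STORE" graph=""      # assumed default store id
    while [ $# -gt 0 ]; do
        case "$1" in
            -D) [ $# -ge 2 ] || return 2; pairs="$pairs $2"; shift 2 ;;      # -D "key=value ..."
            -s) [ $# -ge 2 ] || return 2; pairs="$pairs sid=$2"; shift 2 ;;  # -s sets sid
            *)  operand="graph=$1"; shift ;;                                 # operand names the graph
        esac
    done
    for pair in $pairs $operand; do     # later pairs overwrite earlier ones
        case "$pair" in
            sid=*)   sid="${pair#sid=}" ;;
            graph=*) graph="${pair#graph=}" ;;
        esac
    done
    echo "sid=$sid graph=$graph"
}

resolve_context -D "sid=OTHER graph=urn:myapp:ignored" -s STORE urn:myapp:abox
# prints: sid=STORE graph=urn:myapp:abox
```

Under these assumptions, each of the calls listed above resolves to the same pair of values, which is why they behave identically.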
SDaaS scripting
The smart data service usually includes an SDaaS script and an application config file.
An SDaaS script is a normal bash script that includes the SDaaS platform with the command source $SDAAS_INSTALL_DIR/core
Usually you also create an application config file that contains the definition of the datasets and rules used by the ingestion, reasoning, and publishing plans. The following is an example of an SDaaS script:
#!/usr/bin/env bash
source "$SDAAS_INSTALL_DIR/core"
sd store erase
## Load the language profile and the application-specific configurations
sd view ontology | sd sparql graph urn:tbox
sd_curl -s -f https://schema.org/version/latest/schemaorg-current-http.nt | sd sparql graph urn:tbox
## Load some facts from DBpedia
for ld in https://dbpedia.org/resource/Lecco https://dbpedia.org/resource/Milan; do
    sd_curl_rdf "$ld" | sd_rapper -g - "$ld" | sd sparql graph urn:abox
done
The script MAY implement a never-ending loop, similar to this pseudo-code that uses the SDaaS Enterprise Edition platform:
#!/usr/bin/env bash
source "$SDAAS_INSTALL_DIR/core" # Loads the SDaaS platform
# NEW_DATA_DISCOVERED and TIME_SLOT are placeholders to be defined by the application
while NEW_DATA_DISCOVERED ; do
    # Boot and lock the platform
    sd kees boot -L
    # Load the language profile and the application-specific configurations
    sd -A view ontology | sd sparql graph urn:myapp:tbox
    sd learn file /etc/myapp.config
    # Load facts
    sd learn dataset -D "activity_type=Learning trust=0.9" urn:myapp:facts
    # Reasoning window loop
    sd plan loop -D "activity_type=Reasoning trust=0.9" urn:myapp:reasonings
    # Publishing window
    sd -A plan run -D "activity_type=Publishing" urn:myapp:tests
    # Unlock the platform and wait for the next time slot
    sd kees unlock
    sleep "$TIME_SLOT"
done
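The NEW_DATA_DISCOVERED condition in the pseudo-code is a placeholder. One possible way to fill it in is to compare a checksum of a data source with the checksum recorded at the previous check. The sketch below assumes the source is a local file; the new_data_discovered function and the two file arguments are hypothetical and not part of SDaaS.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of SDaaS.
# Succeeds (exit 0) when the content of source_file differs from what
# was recorded in state_file by the previous call; exits 1 otherwise.
new_data_discovered() {
    local source_file="$1" state_file="$2" current previous=""
    current=$(cksum < "$source_file") || return 2   # source unreadable
    [ -f "$state_file" ] && previous=$(cat "$state_file")
    if [ "$current" = "$previous" ]; then
        return 1                                    # nothing changed since the last check
    fi
    printf '%s\n' "$current" > "$state_file"        # remember what has been seen
    return 0
}
```

A loop such as the one above could then call new_data_discovered with the path of the monitored file and the path of a state file as its condition. Note that the very first call always reports new data, because no state has been recorded yet.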
Application architectures enabled by SDaaS
This chapter presents some typical architectures that are enabled by SDaaS.
Use case 1: autonomous agent
An ETL agent that transforms raw data into linked data:
The autonomous agent uses SDaaS to upload raw data in an intermediate form to a graph store and then uses SPARQL rules to map the intermediate format into the application language profile.
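The first half of this flow, turning raw data into an intermediate RDF form, can be sketched as follows. This is an illustration under stated assumptions, not SDaaS code: the csv_to_ntriples function, the two-column "id,name" input layout, and the urn:raw: names are all hypothetical. The mapping into the application language profile would be expressed as SPARQL rules and is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical ETL step, NOT part of SDaaS.
# Reads "id,name" rows on stdin and writes one N-Triples statement per row,
# using an intermediate (assumed) urn:raw: vocabulary.
# The id is assumed to contain only characters that are safe inside an IRI.
csv_to_ntriples() {
    local id name
    # "|| [ -n "$id" ]" also handles a last row without a trailing newline
    while IFS=, read -r id name || [ -n "$id" ]; do
        [ -n "$id" ] || continue        # skip empty lines
        # escape backslashes and double quotes for the N-Triples literal
        name=$(printf '%s' "$name" | sed 's/\\/\\\\/g; s/"/\\"/g')
        printf '<urn:raw:row:%s> <urn:raw:name> "%s" .\n' "$id" "$name"
    done
}

printf '1,Lecco\n2,Milan\n' | csv_to_ntriples
# prints:
# <urn:raw:row:1> <urn:raw:name> "Lecco" .
# <urn:raw:row:2> <urn:raw:name> "Milan" .
```

The output of such a step could then be piped into sd sparql graph followed by a graph name, in the same way the scripts in this guide load RDF into the graph store.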
Use case 2: linked data platform
The SDaaS platform is used to implement an agent that transforms and loads raw data into a knowledge graph, performing some ontology mappings and providing a Linked Data Platform interface to applications. It is compatible with the SOLID protocol.
The linked-data proxy is a standard component that provides support for the VoID ontology and HTTP caching features. Linked data center provides a free open source implementation that can be used out of the box or as a reference implementation for this component.
Use case 3: smart data warehouse
The typical SDaaS application architecture to build an RDF based data warehouse is the following:
You can distinguish two distinct threads: the development of a data management platform and the development of the application. The knowledge graph built in the data platform is used by the application as the primary source of all data. The data produced by the application can be reinjected into the data management platform.
The SDaaS platform is used in the development of the data management platform, primarily in the development of the smart data service and optionally in the Autonomous Discovery Agent.
In more detail, the main components of the data platform are:
- Autonomous Discovery Agent: an application-specific ETL process triggered by changes in data. This process transforms raw data into linked data annotated with information recognized by the application and stores it in a linked-data lake. Multiple Autonomous Discovery Agents may exist and operate concurrently. Each agent can access the Graph Store to identify enrichment opportunities or to detect changes in the data source.
- Linked-data lake: a repository, for example an S3 bucket or a shared file system, that contains RDF files, that is, Linked Data Platform RDF Sources [LDP-RS] expressed with a known language profile. These files can be mirrors of existing web resources, mappings of databases, or even private data described natively in RDF.
- Smart data service: a service that includes the SDaaS platform and contains a script that processes data in conformance with the KEES specifications.
- RDF Graph Store: implements the Knowledge Graph and supports the SPARQL protocol interface. Linked data center provides a free, full-featured RDF graph database that you can use to learn and test the SDaaS platform.
Use case 4: smart data agent
All activities are performed by the same agent, which embeds its workflow.
The workflow is just a definition of the activities that the agent should complete.
Use case 5: semantic web agency
In this architecture, multiple agents run at the same time; the agents are coordinated using the knowledge graph status and locks.
The workflow is just a definition of the activities that the agents should complete.
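Lock-based coordination between concurrent agents can be illustrated with a generic shell idiom. This is an analogy only and not the SDaaS mechanism: the scripts in this guide use sd kees boot -L and sd kees unlock for locking, while the sketch below uses a directory on a shared file system. The with_lock function and the exit code 75 are hypothetical choices.

```shell
#!/usr/bin/env bash
# Generic lock idiom, NOT the SDaaS mechanism.
# mkdir either creates the directory or fails, atomically, so at most one
# agent at a time can hold the lock identified by lock_dir.
with_lock() {
    local lock_dir="$1" status=0
    shift
    mkdir "$lock_dir" 2>/dev/null || return 75   # another agent holds the lock
    "$@" || status=$?                            # run the activity
    rmdir "$lock_dir"                            # release the lock in any case
    return "$status"
}
```

An agent would wrap each activity of its workflow in a call such as with_lock followed by the lock path and the command to run, and retry later when the call returns 75.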