Guide to Application building

Best practices for building SDaaS applications.

SDaaS™ is a software platform that helps you build Semantic Web Applications and Smart Data Platforms.

SDaaS assists in constructing and managing a knowledge graph that organizes linked data resources annotated with application-specific semantics.

Instead of accessing various data silos, applications use the knowledge graph as a reliable central repository that offers a semantic query service. This repository contains a semantically enriched replica of all the data the application needs. Because of the inherently distributed nature of the web and the continuous changes in data, an application using the SDaaS platform adopts the Eventual Consistency model. This model is popular today because it represents a reasonable trade-off between performance and complexity.

What is a semantic web application

A semantic web application is a software system designed to leverage the principles and technologies of the Semantic Web.

These applications utilize linked data, ontologies, and metadata to create richer connections between different pieces of information on the internet. They typically involve:

  1. Structured Data Representation: Semantic web apps use RDF (Resource Description Framework) to represent data in a structured and machine-readable format. This allows for better understanding and interpretation of relationships between different data points (see the sketch after this list).

  2. Ontologies and Vocabularies: They employ ontologies and vocabularies (such as OWL - Web Ontology Language) to define relationships and meaning between entities, making it easier for systems to understand the context of the data.

  3. Data Integration and Interoperability: These applications facilitate data integration from diverse sources, enabling different systems to exchange and use information more effectively.

  4. Inference and Reasoning: Semantic web apps can perform logical inference and reasoning to derive new information or insights from existing data based on defined rules and relationships.

  5. Enhanced Search and Discovery: They enable more sophisticated search functionalities by understanding the semantics of the data, providing more relevant and contextualized results.
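To make points 1 and 2 concrete, the following minimal sketch loads two hand-written RDF statements (N-Triples syntax, using the schema.org vocabulary) into an illustrative named graph. It assumes the store accepts N-Triples on standard input, as in the examples later in this guide:

#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core

# two RDF statements: Milan is a schema:City whose Italian name is "Milano"
cat <<'EOF' | sd sparql graph urn:demo
<https://dbpedia.org/resource/Milan> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/City> .
<https://dbpedia.org/resource/Milan> <https://schema.org/name> "Milano"@it .
EOF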

In summary, a semantic web application harnesses Semantic Web technologies to enable machines to comprehend and process data more intelligently, facilitating better data integration, discovery, and utilization across various platforms and domains.

What is a smart data platform

A Smart Data Platform refers to a technological infrastructure designed to collect, process, analyze, and leverage data intelligently to generate insights, make decisions, and power various applications or services. These platforms often incorporate advanced technologies such as artificial intelligence (AI), machine learning (ML), data analytics, and automation to handle vast amounts of data from diverse sources.

A Smart Data Platform typically integrates multiple functionalities, including data ingestion, storage, processing, analysis, visualization, and often includes features for data governance, security, and compliance.

What is Eventual Consistency

Eventual Consistency is a concept in distributed computing where, in a system with multiple replicas of data, changes made to the data will eventually propagate through the system and all replicas will converge to the same state. However, this convergence is not instantaneous; it occurs over time due to factors like network latency, system failures, or concurrent updates. The Knowledge Graph can be considered a semantically enriched replica of the ingested distributed data.

The typical SDaaS user is a DevOps professional who utilizes the commands provided by the platform to script the building and updating of a knowledge graph. This knowledge graph is then queried by an application using SPARQL or REST APIs. SDaaS developers and system integrators can extend the platform by adding custom modules and creating new commands.

In more detail, the typical SDaaS use case scenario is summarized by the following diagram:

cloud "Linked Data cloud" as data
usecase "SDaas script\ndevelopment" as writesScript
usecase "smart data service\ndeployment" as managesSDaaS
usecase "application development" as developsApplication
usecase "queries\nKnowledge Graph" as usesKnowledge
usecase "installs\nSDaaS modules" as installsSDaaS
usecase "configure\nSDaaS" as configuresSDaaS
usecase "knowledge update" as updatesKnowledge
actor "App devops" as user
package "SDaaS distribution" as Distribution <<Docker image>>
node "smart data service" as SDaaS {
    component "SDaaS script" as Script
    package Module {
        component "SDaaS Command" as Command
        interface ConfigVariable
    }
}
database "Knowledge Graph" as Store
node Application

user .. developsApplication
user .. writesScript
user .. managesSDaaS
Command o-> ConfigVariable : uses
writesScript .. Script

managesSDaaS -- installsSDaaS
managesSDaaS -- configuresSDaaS
configuresSDaaS .. ConfigVariable 
installsSDaaS .. Module 
Command .. updatesKnowledge 
data . updatesKnowledge
updatesKnowledge . Store

Script o--> Command : calls
Distribution .. installsSDaaS

Application .. usesKnowledge

developsApplication .. Application
usesKnowledge .. Store

Calling SDaaS commands

The SDaaS Platform operates through a set of bash commands and functions. The general syntax to call an SDaaS command is sd <module> <name> [OPTIONS] [OPERANDS], while the syntax of an SDaaS function is sd_<name>.

The modules are bash script fragments that define a set of SDaaS functions, providing a namespace for them.

Before calling an SDaaS function, you must explicitly load its module with the sd_include <module> core function. Core functions are contained in the core module, which is loaded at startup. SDaaS commands automatically include the required modules.
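For example, this minimal sketch loads a module explicitly before using it (the module name sparql is inferred from the sd <module> <name> command syntax above):

source $SDAAS_INSTALL_DIR/core   # core functions (e.g. sd_include) are available after this
sd_include sparql                # explicitly loads the sparql module functions
sd sparql graph urn:myapp:abox   # commands include their required modules automatically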

SDaaS commands MAY depend on a set of context variables you can pass using options. The global configuration variable SD_DEFAULT_CONTEXT provides a default local context used by all commands.

For instance these calls are all equivalent:

sd sparql graph urn:myapp:abox
sd sparql graph -s STORE -D "graph=urn:myapp:abox"
sd sparql graph -D "sid=STORE" -D "graph=urn:myapp:abox"
sd sparql graph -D "sid=STORE graph=urn:myapp:abox"
sd sparql graph -D "sid=OVERRDEN_BY-s graph=urn:myapp:overridden_by_operand"  -s STORE  urn:myapp:abox
SD_DEFAULT_CONTEXT="sid=STORE graph=urn:myapp:abox"; sd sparql graph

SDaaS scripting

The smart data service usually includes an SDaaS script and an application config file. The SDaaS script is a normal bash script that includes the SDaaS platform with the command source $SDAAS_INSTALL_DIR/core.

Usually you create an application config file that contains the definitions of the datasets and rules used by the ingestion, reasoning, and publishing plans. For instance:

#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core

sd store erase

## loads the language profile and the application specific configurations
sd view ontology | sd sparql graph urn:tbox
sd_curl -s -f https://schema.org/version/latest/schemaorg-current-http.nt | sd sparql graph urn:tbox

## loading some facts from dbpedia
for ld in https://dbpedia.org/resource/Lecco https://dbpedia.org/resource/Milan; do
    sd_curl_rdf $ld | sd_rapper -g - $ld | sd sparql graph urn:abox
done
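Once the graphs are populated, an application can query the knowledge graph through the standard SPARQL 1.1 Protocol. A minimal sketch, assuming an illustrative endpoint URL (the actual URL depends on your graph store configuration):

# ask for the labels of a resource loaded by the previous script;
# http://localhost:8080/sdaas/sparql is an illustrative endpoint URL
curl -s -G 'http://localhost:8080/sdaas/sparql' \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=SELECT ?label WHERE { GRAPH <urn:abox> { <https://dbpedia.org/resource/Milan> <http://www.w3.org/2000/01/rdf-schema#label> ?label } }'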

The script MAY implement a never-ending loop, similar to this pseudo-code using the SDaaS Enterprise Edition Platform:

#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core # Loads the SDaaS platform
while NEW_DATA_DISCOVERED ; do

    # Boot and lock platform ######################
    sd kees boot -L

    ## loads the language profile and the application specific configurations
    sd -A view ontology | sd sparql graph urn:myapp:tbox
    sd learn file /etc/myapp.config

    ## loading facts
    sd learn dataset -D "activity_type=Learning trust=0.9" urn:myapp:facts

    # reasoning window loop #########################
    sd plan loop -D "activity_type=Reasoning trust=0.9" urn:myapp:reasonings

    # publishing window ########################
    sd -A plan run -D "activity_type=Publishing" urn:myapp:tests

    sd kees unlock

    sleep $TIME_SLOT
done

Application architectures enabled by SDaaS

In this chapter you will find some typical architectures enabled by SDaaS.

Use case 1: autonomous agent

An ETL agent that transforms raw data into linked data:

Folder "Raw data" as Data
Folder "Linked data" as RDF

note right of RDF: RDF data according to\nan application language profile

package "ETL application" #aliceblue;line:blue;line.dotted;text:blue {
    node "Autonomous Agent" as aa #white;line:blue;line.dotted;text:blue
    database "graph store" as graphStore

    aa -(0 graphStore : run mapping\nrules

Data ..> aa
aa ..> RDF

The autonomous agent uses SDaaS to upload raw data in an intermediate form to a graph store and to apply SPARQL rules that map the intermediate format into the application language profile.
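A minimal sketch of such an agent, assuming the raw data is already available in an intermediate RDF form at an illustrative URL and that the mapping rules are defined as a plan named urn:etl:mappings in the application config file:

#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core

# 1) upload the intermediate RDF data into a staging graph
sd_curl_rdf https://example.org/rawdata | sd sparql graph urn:etl:staging

# 2) apply the SPARQL mapping rules that rewrite the staging triples
#    into the application language profile
sd learn file /etc/myapp.config
sd plan run -D "activity_type=Reasoning" urn:etl:mappings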

Use case 2: linked data platform

The SDaaS platform is used to implement an agent that transforms and loads raw data into a knowledge graph, performing some ontology mappings and providing a Linked Data Platform interface to applications. It is compatible with the Solid protocol.

cloud "LinkedOpen Data Cloud" as Data

package "LOD smart cache" #aliceblue;line:blue;line.dotted;text:blue {
    node "Autonomous\nDiscovery Agent" as DiscoveryAgent #white;line:blue;line.dotted;text:blue
    database "graph store" as DataLake
    node "Linked data Proxy" as DataLakeProxy

    
    DiscoveryAgent -(0 DataLake : writes RDF data
    DataLake 0)- DataLakeProxy 
}

Data ..> DiscoveryAgent
DataLakeProxy - LDP

The linked-data proxy is a standard component providing support for the VOiD ontology and HTTP cache features. LinkedData.Center provides a free open-source implementation that can be used out of the box or as a reference implementation for this component.
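From the application's point of view, the proxy behaves like any Linked Data Platform server. A minimal sketch of a client request, assuming an illustrative proxy host name:

# dereference a cached resource, asking for Turtle via content negotiation
curl -s -H 'Accept: text/turtle' http://lod-cache.example.org/resource/Lecco

# thanks to the HTTP cache features, a conditional request avoids
# re-downloading data that did not change (the ETag value is illustrative)
curl -s -H 'Accept: text/turtle' -H 'If-None-Match: "abc123"' http://lod-cache.example.org/resource/Lecco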

Use case 3: smart data warehouse

The typical SDaaS application architecture to build an RDF based data warehouse is the following:

cloud "1st, 2nd and 3rd-party raw data" as Data

package "data management platform" #aliceblue;line:blue;line.dotted;text:blue {
    node "Fact provider" as DiscoveryAgent #white;line:blue;line.dotted;text:blue
    folder "Linked-data lake" as DataLake
    node "smart data service" as SDaaSApplication #white;line:blue;text:blue
    database "RDF\nGraph Store" as GraphStore
    
    DiscoveryAgent --(0 DataLake : writes RDF data
    DataLake 0)-- SDaaSApplication : learn data
    SDaaSApplication --(0 GraphStore : updates


    note right of DiscoveryAgent
    Here the application injects
    its specific semantic in raw data
    end note

    note left of SDaaSApplication
    Here KEES cycle
	is implemented
    end note
}

interface "SPARQL QUERY" AS SQ
GraphStore - SQ

package "application" {
    node "Application backend" as Backend
    node "Application frontend" as Frontend
    database "application local data" as firstPartyData

 
    Backend 0)- Frontend : calls
    Backend --( SQ : queries
    firstPartyData 0)-- Backend  : writes
}

Data ..> DiscoveryAgent : gets raw data
Data <..... firstPartyData : copy 1st-party data

You can distinguish two distinct threads: the development of a data management platform and the development of the application. The knowledge graph built in the data platform is used by the application as the primary source of all data. The data produced by the application can be reinjected into the data management platform.

The SDaaS platform is used in the development of the data management platform, primarily in the development of the smart data service and optionally in the Autonomous Discovery Agent.

More in detail, the main components of the data platform are:

Autonomous Discovery Agent
It is an application-specific ETL process triggered by changes in data. This process transforms raw data into linked data annotated with information recognized by the application and stores it in a linked-data lake (as sketched after this list). Multiple Autonomous Discovery Agents may exist and operate concurrently. Each agent can access the Graph Store to identify enrichment opportunities or to detect changes in the data source.
Linked-data lake
It is a repository, for example an S3 bucket or a shared file system, that contains RDF files, that is, Linked Data Platform RDF Sources [LDP-RS] expressed with a known language profile. These files can be mirrors of existing web resources, mappings of databases, or even private data described natively in RDF.
smart data service
It is a service that embeds the SDaaS platform and contains a script that processes data in conformance with the KEES specifications.
RDF Graph Store
It implements the Knowledge Graph, supporting the SPARQL protocol interface. LinkedData.Center provides a free, full-featured RDF graph database that you can use to learn and test the SDaaS platform.
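For example, a minimal sketch of one Fact provider (Autonomous Discovery Agent) step, assuming the linked-data lake is an S3 bucket with an illustrative name and csv2rdf.sh is a hypothetical application-specific mapper:

#!/usr/bin/env bash

# transform a raw 3rd-party CSV into N-Triples annotated with the
# application-specific semantics (csv2rdf.sh is a hypothetical mapper)
./csv2rdf.sh < /var/raw/products.csv > /tmp/products.nt

# drop the result into the linked-data lake, where the smart data service
# will learn it during the next KEES learning window
aws s3 cp /tmp/products.nt s3://myapp-datalake/products.nt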

Use case 4: Smart data agent

All activities are performed by the same agent, which embeds its workflow:

cloud "3rd-party data" as ldc
database "Knowledge Graph" as GraphStore
folder "Reports" as RawData

Folder "Linked data lake" as ldl

node "Smart data agent" as agent #white;line:blue;text:blue

ldl --> agent
ldl <-- agent
ldc -> agent

agent --> GraphStore
agent -> RawData

The workflow is just a definition of the activities that should be completed by the agent.

Use case 5: semantic web agency

In this architecture, multiple agents run at the same time, coordinated using the knowledge graph status and locks.

cloud "3rd-party data" as ldc
database "Knowledge Graph" as GraphStore
folder "reports" as RawData


note bottom of RawData: can be used as raw data

Folder "Linked data lake" as ldl
note top of ldl: contains activity plans

package "Agency" #aliceblue;line:blue;line.dotted;text:blue {
    node "Smart data agent" as Ingestor #white;line:blue;text:blue
    node "Reasoning\nagent(s)" as Reasoner #white;line:blue;line.dotted;text:blue
    node "Enriching\nagent(s)" as Enricher #white;line:blue;line.dotted;text:blue
    node "Publishing\nagent(s)" as Publisher #white;line:blue;line.dotted;text:blue
}

ldl --> Ingestor
ldl <-- Enricher
ldc --> Enricher

Ingestor --> GraphStore
Reasoner <--> GraphStore
Publisher <-- GraphStore
Enricher <-- GraphStore
Publisher --> RawData

As in the previous use case, the workflow is just a definition of the activities that should be completed by the agents.
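A minimal sketch of one agent of the agency, reusing the Enterprise Edition locking commands shown earlier (the plan name urn:myapp:reasonings is illustrative):

#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core

sd kees boot -L    # boot and lock: concurrent agents wait for the lock
sd plan run -D "activity_type=Reasoning trust=0.9" urn:myapp:reasonings
sd kees unlock     # release the lock so other agents can proceed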
