SDaaS™ is a software platform that helps you build Semantic Web Applications and Smart Data Platforms.
SDaaS assists in constructing and managing a knowledge graph that organizes linked data resources annotated with application-specific semantics.
Instead of accessing various data silos, applications use the knowledge graph as a reliable central repository that offers a semantic query service. This repository contains a semantically enriched replica of all the data the application needs. Because of the inherently distributed nature of the web and the continuous changes in data, an application using the SDaaS platform adopts the Eventual Consistency model. This model is widely used today because it represents a reasonable trade-off between performance and complexity.
What is a semantic web application
A semantic web application is a software system designed to leverage the principles and technologies of the Semantic Web.
These applications utilize linked data, ontologies, and metadata to create richer connections between different pieces of information on the internet. They typically involve:
- Structured Data Representation: semantic web apps use RDF (Resource Description Framework) to represent data in a structured and machine-readable format. This allows for better understanding and interpretation of relationships between different data points (a minimal RDF sketch follows this list).
- Ontologies and Vocabularies: they employ ontologies and vocabularies (such as OWL, the Web Ontology Language) to define relationships and meaning between entities, making it easier for systems to understand the context of the data.
- Data Integration and Interoperability: these applications facilitate data integration from diverse sources, enabling different systems to exchange and use information more effectively.
- Inference and Reasoning: semantic web apps can perform logical inference and reasoning to derive new information or insights from existing data, based on defined rules and relationships.
- Enhanced Search and Discovery: they enable more sophisticated search functionalities by understanding the semantics of the data, providing more relevant and contextualized results.
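As a minimal sketch, here is a two-statement RDF description in Turtle, converted to N-Triples with the sd_rapper helper that appears later in this guide; the resource URIs are hypothetical and used only for illustration:
# parse an inline Turtle document (guessed syntax, hypothetical base URI)
sd_rapper -g - http://example.org/ <<'EOF'
@prefix schema: <https://schema.org/> .
<http://example.org/alice> a schema:Person ;
    schema:name "Alice" ;
    schema:knows <http://example.org/bob> .
EOF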
In summary, a semantic web application harnesses Semantic Web technologies to enable machines to comprehend and process data more intelligently, facilitating better data integration, discovery, and utilization across various platforms and domains.
What is a smart data platform
A Smart Data Platform refers to a technological infrastructure designed to collect, process, analyze, and leverage data intelligently to generate insights, make decisions, and power various applications or services. These platforms often incorporate advanced technologies such as artificial intelligence (AI), machine learning (ML), data analytics, and automation to handle vast amounts of data from diverse sources.
A Smart Data Platform typically integrates multiple functionalities, including data ingestion, storage, processing, analysis, and visualization, and often includes features for data governance, security, and compliance.
What is Eventual Consistency
Eventual Consistency is a concept in distributed computing where, in a system with multiple replicas of data, changes made to the data will eventually propagate through the system and all replicas will converge to the same state. However, this convergence is not instantaneous; it occurs over time due to factors like network latency, system failures, or concurrent updates. The Knowledge Graph can be considered a semantically enriched replica of the ingested distributed data.
The typical SDaaS user is a DevOps professional who utilizes the commands provided by the platform to script the building and updating of a knowledge graph. This knowledge graph is then queried by an application using SPARQL or REST APIs. SDaaS developers and system integrators can extend the platform by adding custom modules and creating new commands.
In more detail, the typical SDaaS use case scenario is summarized by the following diagram:
cloud "Linked Data cloud" as data
usecase "SDaas script\ndevelopment" as writesScript
usecase "smart data service\ndeployment" as managesSDaaS
usecase "application development" as developsApplication
usecase "queries\nKnowledge Graph" as usesKnowledge
usecase "installs\nSDaaS modules" as installsSDaaS
usecase "configure\nSDaaS" as configuresSDaaS
usecase "knowledge update" as updatesKnowledge
actor "App devops" as user
package "SDaaS distribution" as Distribution <<Docker image>>
node "smart data service" as SDaaS {
component "SDaaS script" as Script
package Module {
component "SDaaS Command" as Command
interface ConfigVariable
}
}
database "Knowledge Graph" as Store
node Application
user .. developsApplication
user .. writesScript
user .. managesSDaaS
Command o-> ConfigVariable : uses
writesScript .. Script
managesSDaaS -- installsSDaaS
managesSDaaS -- configuresSDaaS
configuresSDaaS .. ConfigVariable
installsSDaaS .. Module
Command .. updatesKnowledge
data . updatesKnowledge
updatesKnowledge . Store
Script o--> Command : calls
Distribution .. installsSDaaS
Application .. usesKnowledge
developsApplication .. Application
usesKnowledge .. Store
Calling SDaaS commands
The SDaaS Platform operates through a set of bash commands and functions. The general syntax to call an SDaaS command is sd <module> <name> [*OPTIONS*] [*OPERANDS*], while the syntax of an SDaaS function is sd_<name>.
The modules are bash script fragments that define a set of SDaaS functions, providing a namespace for them.
Before calling an SDaaS function, you must explicitly load its module with the sd_include <module> core function. Core functions are contained in the core module, which is loaded at startup. SDaaS commands automatically include the required modules.
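For instance, a script that wants to call the functions defined in the sparql module (assuming the module name matches the sd sparql commands shown in this guide) would load it explicitly:
source $SDAAS_INSTALL_DIR/core  # load the SDaaS platform; the core module is loaded at startup
sd_include sparql               # explicitly load the sparql module before calling its functions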
SDaaS commands MAY depend on a set of context variables that you can pass using options. The global configuration variable SD_DEFAULT_CONTEXT
provides a default local context used by all commands.
For instance, these calls are all equivalent:
sd sparql graph urn:myapp:abox
sd sparql graph -s STORE -D "graph=urn:myapp:abox"
sd sparql graph -D "sid=STORE" -D "graph=urn:myapp:abox"
sd sparql graph -D "sid=STORE graph=urn:myapp:abox"
sd sparql graph -D "sid=OVERRDEN_BY-s graph=urn:myapp:overridden_by_operand" -s STORE urn:myapp:abox
SD_DEFAULT_CONTEXT="sid=STORE graph=urn:myapp:abox"; sd sparql graph
SDaaS scripting
The smart data service usually includes an SDaaS script and an application config file.
An SDaaS script is a normal bash script that includes the SDaaS platform with the command source $SDAAS_INSTALL_DIR/core.
Usually you create an application config file that contains the definitions of the datasets and rules used by the ingestion, reasoning, and publishing plans. For instance:
#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core
sd store erase
## loads the language profile and the application specific configurations
sd view ontology | sd sparql graph urn:tbox
sd_curl -s -f https://schema.org/version/latest/schemaorg-current-http.nt | sd sparql graph urn:tbox
## loading some facts from dbpedia
for ld in https://dbpedia.org/resource/Lecco https://dbpedia.org/resource/Milan; do
    sd_curl_rdf $ld | sd_rapper -g - $ld | sd sparql graph urn:abox
done
The script MAY implement a never-ending loop, similar to this pseudo-code using the SDaaS Enterprise Edition Platform:
#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core # Loads the SDaaS platform
while NEW_DATA_DISCOVERED ; do
    # Boot and lock platform ######################
    sd kees boot -L
    ## loads the language profile and the application specific configurations
    sd -A view ontology | sd sparql graph urn:myapp:tbox
    sd learn file /etc/myapp.config
    ## loading facts
    sd learn dataset -D "activity_type=Learning trust=0.9" urn:myapp:facts
    # reasoning window loop #########################
    sd plan loop -D "activity_type=Reasoning trust=0.9" urn:myapp:reasonings
    # publishing window ########################
    sd -A plan run -D "activity_type=Publishing" urn:myapp:tests
    sd kees unlock
    sleep $TIME_SLOT
done
Application architectures enabled by SDaaS
In this chapter you will find some typical architectures enabled by SDaaS.
Use case 1: autonomous agent
An ETL agent that transforms raw data into linked data:
Folder "Raw data" as Data
Folder "Linked data" as RDF
note right of RDF: RDF data according to\nan application language profile
package "ETL application" #aliceblue;line:blue;line.dotted;text:blue {
node "Autonomous Agent" as aa #white;line:blue;line.dotted;text:blue
database "graph store" as graphStore
aa -(0 graphStore : run mapping\nrules
}
Data ..> aa
aa ..> RDF
The autonomous agent uses SDaaS to upload raw data in an intermediate form to a graph store and to apply SPARQL rules that map the intermediate format into the application language profile.
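A minimal sketch of such an agent is shown below; the raw data file, the base URI, the graph names, and the mapping plan urn:etl:rules are hypothetical names used only for illustration:
#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core
# load the raw data (assumed to be already serialized as RDF) into a staging graph
sd_rapper -g - file:///data/raw.ttl < /data/raw.ttl | sd sparql graph urn:etl:staging
# run the hypothetical mapping plan that rewrites the staging triples
# into the application language profile
sd plan run urn:etl:rules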
Use case 2: linked data platform
The SDaaS platform is used to implement an agent that transforms and loads raw data into a knowledge graph, performing some ontology mappings and providing a Linked Data Platform interface to applications. It is compatible with the Solid protocol.
cloud "LinkedOpen Data Cloud" as Data
package "LOD smart cache" #aliceblue;line:blue;line.dotted;text:blue {
node "Autonomous\nDiscovery Agent" as DiscoveryAgent #white;line:blue;line.dotted;text:blue
database "graph store" as DataLake
node "Linked data Proxy" as DataLakeProxy
DiscoveryAgent -(0 DataLake : writes RDF data
DataLake 0)- DataLakeProxy
}
Data ..> DiscoveryAgent
interface "LDP" as LDP
DataLakeProxy - LDP
The Linked-data proxy is a standard component providing support for the VoID ontology and HTTP cache features. LinkedData.Center provides a free open source implementation that can be used out of the box or as a reference implementation for this component.
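For example, an application can fetch a resource from the Linked-data proxy with plain HTTP content negotiation; the host name and resource path below are hypothetical:
# ask the proxy for the Turtle representation of a cached resource
curl -s -f -H "Accept: text/turtle" https://lod-cache.example.org/resource/Milan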
Use case 3: smart data warehouse
The typical SDaaS application architecture to build an RDF-based data warehouse is the following:
cloud "1st, 2nd and 3rd-party raw data" as Data
package "data management platform" #aliceblue;line:blue;line.dotted;text:blue {
node "Fact provider" as DiscoveryAgent #white;line:blue;line.dotted;text:blue
folder "Linked-data lake" as DataLake
node "smart data service" as SDaaSApplication #white;line:blue;text:blue
database "RDF\nGraph Store" as GraphStore
DiscoveryAgent --(0 DataLake : writes RDF data
DataLake 0)-- SDaaSApplication : learn data
SDaaSApplication --(0 GraphStore : updates
note right of DiscoveryAgent
Here the application injects
its specific semantic in raw data
end note
note left of SDaaSApplication
Here KEES cycle
is implemented
end note
}
interface "SPARQL QUERY" AS SQ
GraphStore - SQ
package "application" {
node "Application backend" as Backend
node "Application frontend" as Frontend
database "application local data" as firstPartyData
Backend 0)- Frontend : calls
Backend --( SQ : queries
firstPartyData 0)-- Backend : writes
}
Data ..> DiscoveryAgent : gets raw data
Data <..... firstPartyData : copy 1st-party data
You can distinguish two threads: the development of the data management platform and the development of the application. The knowledge graph built in the data platform is used by the application as the primary source of all data. The data produced by the application can be reinjected into the data management platform.
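For instance, the application backend can query the knowledge graph through the standard SPARQL protocol; the endpoint URL below is hypothetical:
# POST a SPARQL query to the graph store's SPARQL QUERY interface
curl -s -f -H "Accept: application/sparql-results+json" \
     --data-urlencode 'query=SELECT ?person WHERE { ?person a <https://schema.org/Person> } LIMIT 10' \
     https://graph-store.example.org/sparql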
The SDaaS platform is used in the development of the data management platform, primarily in the development of the smart data service and optionally in the Autonomous Discovery Agent.
More in detail, the main components of the data platform are:
- Autonomous Discovery Agent
- it is an application-specific ETL process triggered by changes in data. This process transforms raw data into linked data annotated with information recognized by the application and stores it in a linked-data lake. Multiple Autonomous Discovery Agents may exist and operate concurrently. Each agent can access the Graph Store to identify enrichment opportunities or to detect changes in the data source.
- Linked-data lake
- it is a repository, for example an S3 bucket or a shared file system, that contains RDF files, that is, Linked Data Platform RDF Sources [LDP-RS] expressed with a known language profile. These files can be mirrors of existing web resources, mappings of databases, or even private data described natively in RDF.
- smart data service
- it is a service that includes the SDaaS platform and contains a script that processes data in conformance with the KEES specifications.
- RDF Graph Store
- implements the Knowledge Graph, supporting the SPARQL protocol interface. LinkedData.Center provides a free, full-featured RDF graph database you can use to learn and test the SDaaS platform.
Use case 4: Smart data agent
All activities are performed by the same agent, which embeds its workflow:
cloud "3rd-party data" as ldc
database "Knowledge Graph" as GraphStore
folder "Reports" as RawData
Folder "Linked data lake" as ldl
node "Smart data agent" as agent #white;line:blue;text:blue
ldl --> agent
ldl <-- agent
ldc -> agent
agent --> GraphStore
agent -> RawData
The workflow is just a definition of the activities that should be completed by the agent.
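A minimal sketch of such an agent, reusing commands shown earlier (the graph and plan names are hypothetical):
#!/usr/bin/env bash
source $SDAAS_INSTALL_DIR/core
sd learn dataset urn:myapp:facts   # ingest new facts from the linked data lake
sd plan run urn:myapp:reports      # hypothetical plan that produces the reports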
Use case 5: semantic web agency
In this architecture, multiple agents run at the same time, coordinated using the knowledge graph status and locks.
cloud "3rd-party data" as ldc
database "Knowledge Graph" as GraphStore
folder "reports" as RawData
note bottom of RawData: can be used as raw data
Folder "Linked data lake" as ldl
note top of ldl: contains activity plans
package "Agency" #aliceblue;line:blue;line.dotted;text:blue {
node "Smart data agent" as Ingestor #white;line:blue;text:blue
node "Reasoning\nagent(s)" as Reasoner #white;line:blue;line.dotted;text:blue
node "Enriching\nagent(s)" as Enricher #white;line:blue;line.dotted;text:blue
node "Publishing\nagent(s)" as Publisher #white;line:blue;line.dotted;text:blue
}
ldl --> Ingestor
ldl <-- Enricher
ldc --> Enricher
Ingestor --> GraphStore
Reasoner <--> GraphStore
Publisher <-- GraphStore
Enricher <-- GraphStore
Publisher --> RawData
Again, the workflow is just a definition of the activities that should be completed by the agents.
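Each agent typically guards its working window with the platform lock, as in this minimal sketch that reuses the Enterprise Edition commands shown earlier (the plan name is hypothetical):
sd kees boot -L                    # boot and lock the platform for this agent
sd plan run urn:myapp:enrichments  # hypothetical plan for an enriching agent
sd kees unlock                     # release the lock for the other agents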