Customizing the SDaaS platform
1 - Architecture overview
The SDaaS platform is used to create smart data platforms as backing services, empowering your applications to leverage the full potential of the Semantic Web.
Out of the box, the SDaaS platform contains an extensible set of modules that connect to several Knowledge Graphs through optimized driver modules.
A smart data service conforms to the KEES specifications and is realized as a customization of the SDaaS docker image. It contains one or more scripts calling a set of commands implemented by modules. The command behavior can be modified through configuration variables.
```plantuml
node "SDaaS platform" as SDaaSDocker <<docker image>> {
    component Module {
        collections "commands" as Command
        collections "configuration variables" as ConfigurationVariable
    }
}
node "smart data service" as SDaaSService <<docker image>> {
    component "SDaaS Script" as App
}
database "Knowledge Graphs" as GraphStore
cloud "Linked Data" as Data
SDaaSService ---|> SDaaSDocker : extends
Command ..(0 Data : HTTP
Command ..(0 GraphStore : SPARQL
Command o. ConfigurationVariable
App --> Command : calls
App --> ConfigurationVariable : set/get
```
It is possible to add new modules to extend the power of the platform to match special needs.
Data Model
The SDaaS data model is designed around a few concepts: Configuration Variables, Functions, Commands, and Modules.
SDaaS configuration variables
An SDaaS configuration variable is a bash environment variable that the platform uses as a configuration option. Configuration variables have a default value; they can be changed statically in the Docker image, or at runtime via `docker run`, Docker orchestration, or user scripts.
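For illustration, the override behavior can be sketched with the standard bash defaulting idiom; `STORE_TYPE` and the `w3c` driver name come from later sections of this document, and the exact assignment mechanism inside the platform may differ:

```shell
# A configuration variable is a plain environment variable with a default.
# The defaulting idiom below assigns w3c only when STORE_TYPE is still unset,
# so a value exported earlier (Dockerfile ENV, docker run -e, or a user
# script) wins.
: "${STORE_TYPE:=w3c}"   # driver module used for the sid STORE

echo "STORE_TYPE=$STORE_TYPE"
```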
The following taxonomy applies to SDaaS configuration variables:
```plantuml
class "SDaaS Configuration variable" as ConfigVariable
class "SID Variable" as SidVariable
class "Platform Variable" as PlatformVariable <<read only>>
interface EnvironmentVariable
ConfigVariable --|> EnvironmentVariable
SidVariable --|> ConfigVariable
ConfigVariable <|- PlatformVariable
```
Environment Variable
It is a shell variable.
Platform variable
It is a variable defined by the SDaaS docker image that should not be changed outside the Dockerfile.
SID Variable
It is a special configuration variable that states a graph store property. The general syntax is `<sid>_<var name>`. For example, the variable `STORE_TYPE` refers to the name of the driver module that must be used to access the graph store connected by the sid `STORE`. Some drivers can require/define other SID variables with their default values.
See all available configuration variables in the installation guide.
SDaaS Functions
An SDaaS function is a bash function embedded in the platform, for example `sd_log`. A bash function accepts a fixed set of positional parameters, writes output on stdout, and returns 0 on success or an error code otherwise.
The following taxonomy applies to SDaaS functions:
```plantuml
interface "Bash function" as BashFunction
class "Sdaas function" as SdaasFunction
class "Driver virtual function" as DriverVirtualFunction
class "Driver function implementation" as DriverFunction
SdaasFunction --|> BashFunction
DriverFunction --|> SdaasFunction
DriverVirtualFunction --|> SdaasFunction
DriverFunction <- DriverVirtualFunction : calls
```
Bash Function
It is the interface of a generic function defined in the scope of a bash process.
Driver virtual function
It is a function that acts as a proxy for a driver method; its first parameter is always the sid. A driver virtual function has the syntax `sd_driver_<method name>` (e.g. `sd_driver_load`) and requires a set of fixed positional parameters.
Driver method implementation
It is a function that implements a driver virtual function for a specific graph store engine. A driver method implementation has the syntax `sd_<driver name>_<method name>` (e.g. `sd_w3c_load`) and expects a set of fixed positional parameters (unchecked). Driver method implementations should be called only by a driver virtual function.
SDaaS commands
A Command is an SDaaS function that conforms to the SDaaS command requirements, for example `sd_sparql_update`. A command writes output on stdout, logs on stderr, and returns 0 on success or an error code otherwise.

In a script, SDaaS commands should be called through the `sd` function using the following syntax: `sd <module name> <function name> [options] [operands]`. The `sd` function allows flexible error management and auto-includes all required modules. Direct calls to the command functions should be done only inside module implementations.

For instance, calling `sd -A sparql update` executes the command function `sd_sparql_update`, including the `sparql` module and aborting the script in case of error. This is equivalent to `sd_include sparql; sd_sparql_update || sd_abort`.
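The dispatch convention can be sketched with a toy re-implementation; the stubs below only mimic the naming scheme and the `-A` behavior described above, not the real `sd` function:

```shell
# Toy sketch of the sd dispatch convention (NOT the real implementation):
# "sd -A sparql update" includes the sparql module, then calls the command
# function sd_sparql_update, aborting on failure. All bodies below are stubs.
sd_include() { :; }                        # stub: would source the module file
sd_sparql_update() { echo "update run"; }  # stub command function
sd_abort() { echo "aborted" >&2; exit 1; }

sd() {
    local abort_on_error=0
    if [ "$1" = "-A" ]; then abort_on_error=1; shift; fi
    local module="$1" func="$2"
    shift 2
    sd_include "$module"
    "sd_${module}_${func}" "$@" || { [ "$abort_on_error" -eq 1 ] && sd_abort; }
}

sd -A sparql update   # prints: update run
```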
The following taxonomy applies to commands:
```plantuml
class Command
class "Facts Provision" as DataProvidingCommand
class "Ingestion Command" as IngestionCommand
class "Query Command" as QueryCommand
class "Learning Command" as LearningCommand
class "Reasoning Command" as ReasoningCommand
class "Enriching Command" as EnrichingCommand
class "SDaaS function" as SDaaSFunction
class "Store Command" as StoreCommand
class "Compound Command" as CompoundCommand
Command --|> SDaaSFunction
DataProvidingCommand --|> Command
StoreCommand --|> Command
IngestionCommand --|> StoreCommand
QueryCommand --|> StoreCommand
LearningCommand -|> CompoundCommand
LearningCommand --|> DataProvidingCommand
LearningCommand --|> IngestionCommand
CompoundCommand <|- ReasoningCommand
ReasoningCommand --|> IngestionCommand
ReasoningCommand --|> QueryCommand
EnrichingCommand --|> ReasoningCommand
EnrichingCommand --|> LearningCommand
```
Compound Command
It is a command resulting from the composition of two or more commands, usually in a pipeline.
Facts Provision
It is a command that provides RDF triples in output.
Query Command
It is a command that can extract information from the knowledge graph.
Ingestion Command
It is a command that stores facts into a knowledge graph.
Reasoning Command
It is a command that both queries and ingests data into the same knowledge graph according to some rules.
Learning Command
It is a command that provides and ingests facts into the knowledge graph.
Enriching Command
It is a command that queries the knowledge base, discovers new data, and injects the results back into the knowledge base.
Store Command
It is a command that interacts with a knowledge base. It accepts the `-s SID` and `-D "sid=SID"` options.
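The stdout contract behind this taxonomy can be illustrated with a toy pipeline: a facts-provision step piped into an ingestion step behaves, together, like a learning command. Both functions below are stubs invented for illustration, not real SDaaS commands:

```shell
# Toy illustration of the taxonomy: a facts-provision step writes RDF triples
# on stdout; an ingestion step consumes them from stdin. Composed in a
# pipeline, they form a compound (learning-like) behavior.
provide_facts() {            # facts provision: writes RDF triples on stdout
    printf '<urn:a> <urn:b> <urn:c> .\n'
}
ingest_facts() {             # ingestion: consumes triples from stdin
    local n=0 line
    while read -r line; do n=$((n + 1)); done
    echo "ingested $n triples"
}

provide_facts | ingest_facts   # prints: ingested 1 triples
```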
SDaaS modules
An SDaaS module is a collection of commands and configuration variables that conforms to the module building requirements.
You can explicitly include a module's content with the command `sd_include`.
The module taxonomy is depicted in the following image:
```plantuml
class "SDaaS module" as Module
abstract "Driver" as AbstractDriver
class "Core Module" as Core
class "Driver implementation" as DriverImplementation
class "Command Module" as CommandModule
class "Store module" as StoreModule
Core --|> Module
CommandModule --|> Module
AbstractDriver --|> Module
DriverImplementation <- AbstractDriver : calls
StoreModule --|> CommandModule
StoreModule --> AbstractDriver : includes
Core <- CommandModule : includes
```
Command Module
Modules that implement a set of related SDaaS commands. They always include the Core Module and can depend on other modules.
Core Module
A singleton module that exposes core commands; it must be loaded before using any SDaaS feature.
Driver
A singleton module that exposes the abstract Driver function interface to handle connections with a generic graph store engine.
Driver implementation
Modules that implement the function interface exposed by the Abstract Driver for a specific graph store engine.
Store Module
Modules that export store commands connecting to a graph store through the functions exposed by the Driver module. A store module always includes the driver module.
The big picture
The resulting SDaaS platform data model big picture is:
```plantuml
package "User Application" {
    class "User Application" as Application
}
package "SDaaS platform" #aliceblue;line:blue;line.dotted;text:blue {
    class Command
    class Module
    class "SDaaS function" as SDaaSFunction
    abstract "Driver" as Driver
    class ConfigVariable
    abstract "SID Variable" as SidVariable
    interface "bash function" as Function
    interface EnvironmentVariable
}
package "Backing services" {
    interface "Backing service" as BackingService
    interface "Graph Store" as GraphStore
    interface "Knowledge Graph" as KnowledgeGraph
    interface "Linked Data Platform" as RDFResource
}
package "smart data service" #aliceblue;line:blue;line.dotted;text:blue {
    class "SDaaS script" as Script
    class "smart data Service" as SDaaS
}
KnowledgeGraph : KEES compliance
GraphStore : SPARQL endpoint
RDFResource : URL
Command --|> SDaaSFunction
SDaaSFunction --|> Function
ConfigVariable --|> EnvironmentVariable
SidVariable --|> ConfigVariable
Driver --|> Module
GraphStore --|> BackingService
RDFResource --|> BackingService
KnowledgeGraph --|> GraphStore
BackingService <|-- SDaaS
Module *-- Command : exports
Module *-- ConfigVariable : declares
Command o.. RDFResource : learns
Driver --> KnowledgeGraph : connects
Module .> Module : includes
Function --o Script
EnvironmentVariable --o Script
Script -o SDaaS : contains
SidVariable <- Driver : uses
ConfigVariable .o Command : uses
Application ..> KnowledgeGraph : access
```
Backing service
It is a type of software that operates on servers, handling data storage, resource publishing, and processing tasks for an application.
Graph Store
It is a backing service that provides support for the SPARQL protocol and, optionally, for the Graph Store Protocol.
Knowledge Graph
It is a Graph Store compliant with the KEES specification.
Linked Data Platform
It is an informative web resource that exposes RDF data in one of the supported serializations, according to the W3C LDP specifications.
SDaaS script
It is a bash script that uses SDaaS commands.
smart data service
It is a backing service that includes the SDaaS platform and implements one or more SDaaS scripts.
User application
It is a (Semantic Web) application that uses a knowledge graph.
2 - Module building
Conformance Notes
This chapter defines some conventions to extend the SDaaS platform.
Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words may not always appear in uppercase letters in this specification.
Module Requirements
An SDaaS module is a bash script file fragment that complies with the following restrictions:

- It MUST be placed in the `SDAAS_INSTALL_DIR` directory or in the `$HOME/modules` directory. Modules defined in the `$HOME/modules` directory take precedence over the ones defined in the `SDAAS_INSTALL_DIR` directory.
- Its name MUST match the regular expression `^[a-z][a-z0-9-]+$`.
- The first line of the module MUST be `if [[ ! -z ${__module_MODULENAME+x} ]]; then return ; else __module_MODULENAME=1 ; fi` where MODULENAME is the name of the module.
- All commands defined in the module SHOULD match `sd_MODULENAME_FUNCTIONNAME`, where MODULENAME is the name of the module that contains the command and FUNCTIONNAME is a unique name inside the module. Rewriting existing commands is allowed but discouraged.
- All commands MUST follow the syntax conventions described below.
A module MAY contain:

- Constants (read-only variables): constants SHOULD be prefixed by `SDAAS_` and MUST have a unique name in the platform. Overriding default values is discouraged.
- Configuration variables: they SHOULD be prefixed by `SD_` and MUST have a unique name in the platform.
- Module command definitions.
- Module initialization: a set of bash commands that always run on module loading.

For example, see the modules implementation in the SDaaS community edition.
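A minimal module honoring these requirements might look like the following sketch, for a hypothetical module named `hello` (every name besides the required prefixes and guard is invented):

```shell
# Sketch of a minimal module named "hello"; the file would live in
# $HOME/modules/hello. Required inclusion guard, first line of the module:
if [[ ! -z ${__module_hello+x} ]]; then return ; else __module_hello=1 ; fi

# Constant: SDAAS_ prefix, unique in the platform, read only.
readonly SDAAS_HELLO_VERSION="0.1"

# Configuration variable: SD_ prefix, with a default that callers may override.
: "${SD_HELLO_GREETING:=Hello}"

# Command: sd_<module name>_<function name>.
sd_hello_world() {
    echo "${SD_HELLO_GREETING}, world"
}

# Module initialization: runs once, when the module is first included.
true
```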
Command requirements
Commands MUST support the Utility Syntax Guidelines described in the Base Definitions volume of POSIX.1‐2017, Section 12.2, guidelines 3 to 10 inclusive.
All commands that accept options and/or operands SHOULD also accept the `-D` option. This option defines local key-value variables that provide a context for the command.
Options SHOULD conform to the following naming conventions:

- `-A`: abort if the command exit status is > 0.
- `-a`, `-D "accrual_method=ACCRUAL METHOD"`: accrual method, the method by which items are added to a collection. The PUT and POST methods SHOULD be recognized.
- `-f FILE NAME`, `-D "input_file=FILE NAME"`: to be used to refer to a local file object with a relative or absolute path.
- `-i INPUT FORMAT`, `-D "input_format=INPUT FORMAT"`: input format specification; these values SHOULD be recognized (from libraptor):
Format | Description |
---|---|
rdfxml | RDF/XML (default) |
ntriples | N-Triples |
turtle | Turtle Terse RDF Triple Language |
trig | TriG - Turtle with Named Graphs |
guess | Pick the parser to use using content type and URI |
rdfa | RDF/A via librdfa |
json | RDF/JSON (either Triples or Resource-Centric) |
nquads | N-Quads |
- `-h`: prints a help description.
- `-o OUTPUT FORMAT`, `-D "output_format=OUTPUT FORMAT"`: output format specification; these values SHOULD be recognized:
- csv
- csv-h
- csv-1
- csv-f1
- boolean
- tsl
- json
- ntriples
- xml
- turtle
- rdfxml
- test
- `-p PRIORITY`: to be used to reference a priority level:
level | mnemonic | explanation |
---|---|---|
2 | CRITICAL | Should be corrected immediately, but indicates failure in a primary system - fix CRITICAL problems before ALERT - an example is loss of primary ISP connection. |
3 | ERROR | Non-urgent failures - these should be relayed to developers or admins; each item must be resolved within a given time. |
4 | WARNING | Warning messages - not an error, but indicates that an error will occur if action is not taken, e.g. file system 85% full - each item must be resolved within a given time. |
5 | NOTICE | Events that are unusual but not error conditions - might be summarized in an email to developers or admins to spot potential problems - no immediate action required. |
6 | INFORMATIONAL | Normal operational messages - may be harvested for reporting, measuring throughput, etc. - no action required. |
7 | DEBUG | Info is useful to developers for debugging the app, not useful during operations. |
- `-s SID`, `-D "sid=SID"`: connect to the Graph Store named SID (`STORE` by default).
- `-S SIZE`: to be used to reference a size.
Evaluation of the local command context
The process of evaluating the local context for a command is as follows:
- First, the local context hardcoded in the command implementation is evaluated.
- The hardcoded local context can be overridden by the global configuration variable `SD_DEFAULT_CONTEXT`.
- The resulting context can be overridden by specific command options (e.g., `-s SID`). Command options are evaluated left to right.
- The resulting context can be overridden by specific command operands.
For example, all these calls will have the same result:
- `sd_sparql_graph`: the hardcoded local context ingests data into a named graph inside the graph store connected to the sid STORE, using a generated UUID URI as the name of the named graph.
- `sd_sparql_graph -s STORE $(sd_uuid)`: hardcoded local context overridden by a specific option and operand.
- `sd_sparql_graph -D "sid=STORE"`: hardcoded local context overridden by the -D option.
- `sd_sparql_graph -D "sid=STORE" -D "graph=$(sd_uuid)"`: the same as above but using multiple -D options (always evaluated left to right).
- `SD_DEFAULT_CONTEXT="sid=STORE"; sd_sparql_graph $(sd_uuid)`: hardcoded local context overridden by SD_DEFAULT_CONTEXT and an operand.
- `SD_DEFAULT_CONTEXT="sid=XXX"; sd_sparql_graph -s YYY -D "sid=STORE graph=ZZZZ" $(sd_uuid)`: a silly command call that demonstrates the overriding precedence in local context evaluation.
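The precedence chain can be sketched as a plain bash function; `demo_graph` is a hypothetical stand-in for a command such as sd_sparql_graph, and only the evaluation order is meant to be faithful:

```shell
# Illustrative re-implementation of the precedence chain (not platform code):
# hardcoded defaults -> SD_DEFAULT_CONTEXT -> options (left to right) -> operands.
demo_graph() {
    local sid="STORE" graph="urn:default"   # 1. hardcoded local context
    local pair opt OPTIND=1
    for pair in $SD_DEFAULT_CONTEXT; do     # 2. global configuration variable
        local "$pair"
    done
    while getopts "s:D:" opt; do            # 3. options, evaluated left to right
        case "$opt" in
            s) sid="$OPTARG" ;;
            D) for pair in $OPTARG; do local "$pair"; done ;;
        esac
    done
    shift $((OPTIND - 1))
    [ -n "${1:-}" ] && graph="$1"           # 4. operands win over everything
    echo "sid=$sid graph=$graph"
}

SD_DEFAULT_CONTEXT="sid=XXX"
demo_graph -s YYY -D "sid=STORE graph=urn:z" "urn:final"
# prints: sid=STORE graph=urn:final
```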
3 - Driver building
Conformance Notes
This chapter defines some conventions to extend the SDaaS platform.
Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words may not always appear in uppercase letters in this specification.
Driver Requirements
- A driver MUST be a valid [SDaaS module]({{< ref "module building" >}}).
- It MUST implement the driver interface with SDaaS functions that conform to the following naming convention: `sd_<driver name>_<driver method>`; for example, if you want to implement a driver for the `stardog` graph store, you MUST implement the `sd_stardog_query` function.
- All driver functions MUST use positional parameters, with no defaults. The validity check of the parameters is the responsibility of the caller (i.e. the driver module).
- The following methods MUST be implemented:
method name( parameters ) | description |
---|---|
erase(sid) | erases the graph store |
load(sid, graph, accrual_method) | loads a stream of N-Triples into a named graph in the graph store according to the accrual method |
query(sid, mime_type) | executes a SPARQL query read from stdin against the graph store, requesting one of the supported MIME types in output |
size(sid) | returns the number of triples in the store |
update(sid) | executes a SPARQL update request read from stdin against the graph store |
All method parameters are strings that MUST satisfy the following constraints:

- sid MUST match the `^[a-zA-Z]+$` regular expression and MUST point to an http(s) URL. It is assumed that sid is the name of a SID variable.
- graph MUST be a valid URI.
- mime_type MUST be a valid MIME type.
- accrual_method MUST match `PUT` or `POST`.
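A sketch of the caller-side checks implied by these constraints (recall that validation is the caller's responsibility, not the driver implementation's); the helper name `validate_driver_args` is hypothetical:

```shell
# Caller-side validation sketch for driver method parameters. In the real
# platform such checks would live in the abstract driver module; driver
# implementations trust their inputs.
validate_driver_args() {
    local sid="$1" accrual_method="${2:-}"
    [[ "$sid" =~ ^[a-zA-Z]+$ ]] || { echo "invalid sid: $sid" >&2; return 1; }
    if [ -n "$accrual_method" ]; then
        [[ "$accrual_method" =~ ^(PUT|POST)$ ]] \
            || { echo "invalid accrual method: $accrual_method" >&2; return 1; }
    fi
}

validate_driver_args STORE PUT && echo "ok"                      # prints: ok
validate_driver_args STORE DELETE 2>/dev/null || echo "rejected" # prints: rejected
```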
Implementing a new graph store driver
A driver implementation is described by the following UML class diagram:
```plantuml
package "SDaaS Community Edition" {
    interface DriverInterface
    DriverInterface : + erase(sid)
    DriverInterface : + load(sid, graph, accrual_method)
    DriverInterface : + query(sid, mime_type)
    DriverInterface : + size(sid)
    DriverInterface : + update(sid)
    DriverInterface : + validate(sid)
    class w3c
    class testdriver
}
package "SDaaS Enterprise Edition" {
    class blazegraph
    class gsp
    class neptune
}
testdriver ..|> DriverInterface
w3c ..|> DriverInterface
blazegraph --|> w3c
gsp --|> w3c
neptune --|> w3c
```
There is a reference implementation of the driver interface known as the `w3c` driver, compliant with the W3C SPARQL 1.1 protocol specification, along with the `testdriver` stub implementation to be used in unit tests. The SDaaS Enterprise Edition offers additional drivers that are specialized versions of the `w3c` driver, optimized for specific graph store technologies:

- gsp driver: a `w3c` driver extension that uses the SPARQL 1.1 Graph Store HTTP Protocol. It defines the configuration variable `<sid>_GSP_ENDPOINT` that contains the HTTP address of the service providing a Graph Store Protocol interface.
- blazegraph driver: an optimized implementation for the Blazegraph graph store.
- neptune driver: an optimized implementation for the AWS Neptune service.
In commands, do not call the driver method implementation function directly. Instead, call the corresponding abstract driver module functions.
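Putting the conventions together, a driver for a hypothetical `mystore` engine could start from a skeleton like this; the function bodies are placeholders, and a real driver would call the engine's HTTP/SPARQL API:

```shell
# Skeleton of a driver module for a hypothetical "mystore" engine. Note the
# positional, unchecked parameters, as required by the driver conventions.
# Required inclusion guard:
if [[ ! -z ${__module_mystore+x} ]]; then return ; else __module_mystore=1 ; fi

sd_mystore_erase()  { local sid="$1"; return 0; }                       # would wipe the store
sd_mystore_load()   { local sid="$1" graph="$2" accrual_method="$3"; cat >/dev/null; }
sd_mystore_query()  { local sid="$1" mime_type="$2"; cat >/dev/null; }  # SPARQL query from stdin
sd_mystore_size()   { local sid="$1"; echo 0; }                         # triple count
sd_mystore_update() { local sid="$1"; cat >/dev/null; }                 # SPARQL update from stdin
```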