Customizing the SDaaS platform

Learn how to tailor the platform to your needs

1 - Architecture overview

The components of the SDaaS platform. Start from here.

The SDaaS platform is used to create smart data platforms as backing services, empowering your applications to leverage the full potential of the Semantic Web.

Out of the box, the SDaaS platform contains an extensible set of modules that connect to several Knowledge Graphs through optimized driver modules.

A smart data service conforms to the KEES specifications and is realized by customizing the SDaaS Docker image. It contains one or more scripts that call a set of commands implemented by modules. Command behavior can be modified through configuration variables.

node "SDaaS platform " as SDaaSDocker <<docker image>>{
    component Module {
        collections "commands" as Command
        collections "configuration variables" as ConfigurationVariable
    }
}
node "smart data service" as SDaaSService <<docker image>>{
    component "SDaaS Script" as App
}
database "Knowledge Graphs" as GraphStore
cloud "Linked Data" as Data


SDaaSService ---|> SDaaSDocker : extends
Command ..(0 Data : HTTP
Command ..(0 GraphStore : SPARQL
Command o. ConfigurationVariable
App --> Command : calls
App --> ConfigurationVariable : set/get

It is possible to add new modules to extend the power of the platform to match special needs.

Data Model

The SDaaS data model is designed around a few concepts: Configuration Variables, Functions, Commands, and Modules.

SDaaS configuration variables

An SDaaS configuration variable is a bash environment variable that the platform uses as a configuration option. Configuration variables have a default value; they can be changed statically in the Docker image or at runtime via docker run, Docker orchestration, or user scripts.
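As a minimal sketch, the same configuration variable can be overridden in any of these ways (the image and script names are placeholders; SD_DEFAULT_CONTEXT is described later in this chapter):

# Statically, in a Dockerfile that extends the SDaaS image:
#   ENV SD_DEFAULT_CONTEXT="sid=STORE"

# At runtime, when starting the container:
docker run -e SD_DEFAULT_CONTEXT="sid=STORE" my-sdaas-image my-script.sh

# Inside a user script, before calling any command:
SD_DEFAULT_CONTEXT="sid=STORE"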

The following taxonomy applies to SDaaS configuration variables:

class "SDaaS Configuration variable" as ConfigVariable
class "SID Variable" as SidVariable
class "Platform Variable" as PlatformVariable <<read only>>
interface EnvironmentVariable

ConfigVariable --|> EnvironmentVariable
SidVariable --|> ConfigVariable
ConfigVariable <|- PlatformVariable

Environment Variable

It is a shell variable.

Platform variable

It is a variable defined by the SDaaS Docker image that should not be changed outside the Dockerfile.

SID Variable

It is a special configuration variable that states a graph store property. The general syntax is <sid>_<var name>. For example, the variable STORE_TYPE refers to the driver module name that must be used to access the graph store connected by the sid STORE. Some drivers can require/define other SID variables with their default values.
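As a sketch, a script could declare a second graph store connection by defining its SID variables before using it. Only the <sid>_TYPE pattern is described here; the endpoint variable name below is an assumption, since each driver defines its own SID variables:

# Hypothetical SID variables for a graph store identified by the sid TMP
export TMP_TYPE=w3c                                       # driver module used to access the TMP store
export TMP_SPARQL_ENDPOINT="http://tmp-store:8080/sparql" # assumed variable name; the real one is defined by the chosen driver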

See all available configuration variables in the installation guide

SDaaS Functions

An SDaaS function is a bash function embedded in the platform. For example, sd_log. A bash function accepts a fixed set of positional parameters, writes output on stdout, and returns 0 on success or an error code otherwise.
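As an illustration of this contract, a conforming bash function could look like the following sketch (the function name and logic are hypothetical, not part of the platform):

sd_example_hello() {
    # fixed positional parameters
    local name="$1"
    [[ -n "$name" ]] || return 1   # non-zero error code on failure
    echo "hello ${name}"           # output goes to stdout
    return 0                       # 0 on success
}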

The following taxonomy applies to SDaaS functions:

Interface "Bash function" as BashFunction 
Class "Sdaas function" as SdaasFunction
Class "Driver virtual function" as DriverVirtualFunction
Class "Driver function implementation" as DriverFunction
SdaasFunction --|> BashFunction
DriverFunction --|> SdaasFunction
DriverVirtualFunction --|> SdaasFunction

DriverFunction <- DriverVirtualFunction: calls

Bash Function

It is the interface of a generic function defined in the scope of a bash process.

Driver virtual function

It is a function that acts as a proxy for a driver method; its first parameter is always the sid. A driver virtual function has the syntax sd_driver_<method name> (e.g. sd_driver_load) and requires a set of fixed positional parameters.

Driver function implementation

It is a function that implements a driver virtual function for a specific graph store engine driver. A driver function implementation has the syntax sd_<driver name>_<method name> (e.g. sd_w3c_load) and expects a set of fixed positional parameters (unchecked). Driver function implementations should be called only by a driver virtual function.
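A minimal sketch of this dispatch pattern, assuming that the <sid>_TYPE SID variable holds the driver module name (the actual platform code may differ):

sd_driver_load() {
    local sid="$1" graph="$2" accrual_method="$3"
    local driver_var="${sid}_TYPE"
    # dispatch to the driver function implementation, e.g. sd_w3c_load when STORE_TYPE=w3c
    "sd_${!driver_var}_load" "$sid" "$graph" "$accrual_method"
}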

SDaaS commands

A Command is an SDaaS function that conforms to the SDaaS command requirements. For example, sd_sparql_update. A command writes output on stdout, logs on stderr, and returns 0 on success or an error code otherwise.

In a script, SDaaS commands should be called through the sd function using the following syntax: sd <module name> <function name> [options] [operands]. The sd function allows flexible error management and auto-includes all required modules. Direct calls to the command functions should be made only inside module implementations.

For instance, calling sd -A sparql update executes the command function sd_sparql_update, including the sparql module and aborting the script in case of error. This is equivalent to: sd_include sparql; sd_sparql_update || sd_abort
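For example, a fragment of an SDaaS script could look like this sketch (the SPARQL update text is illustrative, and it is assumed that sd_sparql_update reads the request from standard input, as its underlying driver method does):

# Load a single illustrative triple into the graph store connected by the sid STORE,
# aborting the script if the update fails
echo 'INSERT DATA { <urn:example:s> <urn:example:p> "o" }' | sd -A sparql update -s STORE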

The following taxonomy applies to commands:

Class Command
Class "Facts Provision" as DataProvidingCommand
Class "Ingestion Command" as IngestionCommand
Class "Query Command" as QueryCommand
Class "Learning Command" as LearningCommand
Class "Reasoning Command" as ReasoningCommand
Class "Enriching Command" as EnrichingCommand
Class "SDaaS function" as SDaaSFunction
Class "Store Command" as StoreCommand
Class "Compound Command" as CompoundCommand

Command --|>SDaaSFunction
DataProvidingCommand --|> Command
StoreCommand --|> Command
IngestionCommand --|> StoreCommand
QueryCommand --|> StoreCommand
LearningCommand -|> CompoundCommand
LearningCommand --|> DataProvidingCommand
LearningCommand --|> IngestionCommand
CompoundCommand <|- ReasoningCommand
ReasoningCommand --|> IngestionCommand
ReasoningCommand --|> QueryCommand
EnrichingCommand --|> ReasoningCommand
EnrichingCommand --|> LearningCommand

Compound Command

It is a command resulting from the composition of two or more commands, usually in a pipeline.

Facts Provision

It is a command that provides RDF triples as output.

Query Command

It is a command that can extract information from the knowledge graph.

Ingestion Command

It is a command that stores facts into a knowledge graph.

Reasoning Command

It is a command that both queries and ingests data into the same knowledge graph according to some rules.

Learning Command

It is a command that provides and ingests facts into the knowledge graph.

Enriching Command

It is a command that queries the knowledge base, discovers new data, and injects the results back into the knowledge base.

Store Command

It is a command that interacts with a knowledge base. It accepts the -s *SID* and -D "sid=*SID*" options.

SDaaS modules

An SDaaS module is a collection of commands and configuration variables that conforms to the module building requirements.

You can explicitly include a module's content with the sd_include command.

The module taxonomy is depicted in the following image:

class "SDaaS module" as Module
Abstract "Driver" as AbstractDriver
Class "Core Module"  as Core
Class "Driver implementation" as DriverImplementation
Class "Command Module" as CommandModule
Class "Store module" as StoreModule

Core --|> Module
CommandModule --|> Module
AbstractDriver --|> Module
DriverImplementation <- AbstractDriver : calls
StoreModule --|> CommandModule
StoreModule --> AbstractDriver : includes
Core <- CommandModule : includes

Command Module

Modules that implement a set of related SDaaS commands. They always include the Core Module and can depend on other modules.

Core Module

A singleton module that exposes core commands and must be loaded before using any SDaaS feature.

Driver

A singleton module that exposes the abstract driver function interface to handle connections with a generic graph store engine.

Driver implementation

Modules that implement the function interface exposed by the Abstract Driver for a specific graph store engine.

Store Module

Modules that export store commands that connect to a graph store using the functions exposed by the Driver module. A store module always includes the driver module.

The big picture

The resulting SDaaS platform data model big picture is:

package "User Application" { 
    class "User Application" as Application
}
package "SDaaS platform" #aliceblue;line:blue;line.dotted;text:blue { 
    class Command
    class Module
    class "SDaaS function" as SDaaSFunction
    Abstract "Driver" as Driver
    class ConfigVariable
    Abstract "SID Variable" as SidVariable

    interface "bash function" as Function
    interface EnvironmentVariable
}
package "Backing services" {
    interface "Backing service" as BakingService
    interface "Graph Store" as GraphStore
    interface "Knowledge Graph" as KnowledgeGraph
    interface "Linked Data Platform" as RDFResource
}
package "smart data service" #aliceblue;line:blue;line.dotted;text:blue {
    class "SDaaS script" as Script
    class "smart data Service" as SDaaS 
}

KnowledgeGraph : KEES compliance

GraphStore : SPARQL endpoint
RDFResource : URL 
Command --|> SDaaSFunction
SDaaSFunction --|> Function
ConfigVariable --|> EnvironmentVariable
SidVariable --|> ConfigVariable
Driver --|> Module
GraphStore --|> BakingService
RDFResource --|> BakingService
KnowledgeGraph --|> GraphStore

BakingService <|-- SDaaS

Module *-- Command : exports
Module *-- ConfigVariable : declares

Command o.. RDFResource : learns

Driver --> KnowledgeGraph : connects
Module .> Module : includes

Function --o Script
EnvironmentVariable --o Script
Script -o SDaaS : contains
SidVariable <- Driver : uses
ConfigVariable .o Command : uses

Application ..> KnowledgeGraph : access

Backing service

It is a type of software that operates on servers, handling data storage, resource publishing, and processing tasks for an application.

Graph Store

It is a backing service that supports the SPARQL protocol and, optionally, the Graph Store Protocol.

Knowledge Graph

It is a Graph Store compliant with the KEES specification.

Linked Data Platform

It is a web informative resource that exposes RDF data in one of the supported serializations, according to the W3C LDP specifications.

SDaaS script

It is a bash script that uses SDaaS commands.

smart data service

It is a backing service that includes the SDaaS platform and implements one or more SDaaS scripts.

User application

It is a (Semantic Web) application that uses a knowledge graph.

2 - Module building

How to extend the platform by creating new modules

Conformance Notes

This chapter defines some conventions to extend the SDaaS platform.

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words may not always appear in uppercase letters in this specification.

Module Requirements

An SDaaS module is a bash script file fragment that complies with the following restrictions:

  • It MUST be placed in the SDAAS_INSTALL_DIR directory or in the $HOME/modules directory. Modules defined in the $HOME/modules directory take precedence over the ones defined in the SDAAS_INSTALL_DIR directory.
  • Its name MUST match the regular expression ^[a-z][a-z0-9-]+$.
  • The first line of the module MUST be if [[ ! -z ${__module_MODULENAME+x} ]]; then return ; else __module_MODULENAME=1 ; fi where MODULENAME is the name of the module.
  • all commands defined in the module SHOULD match sd_MODULENAME_FUNCTIONNAME, where MODULENAME is the name of the module that contains the command and FUNCTIONNAME is a unique name inside the module. Overriding existing commands is allowed but discouraged.
  • all commands MUST follow the syntax conventions described below.

A module MAY contain:

  1. Constants (read-only variables): constants SHOULD be prefixed by SDAAS_ and MUST have a unique name in the platform. Overriding default
  2. Configuration variables: configuration variables SHOULD be prefixed by SD_ and MUST have a unique name in the platform.
  3. Module command definitions.
  4. Module initialization: a set of bash commands that always run on module loading.

For example, see the module implementations in the SDaaS community edition.
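As a sketch, a hypothetical module named mymodule that follows the conventions above could look like this (the command body, the configuration variable, and the use of sd_log are illustrative assumptions):

if [[ ! -z ${__module_mymodule+x} ]]; then return ; else __module_mymodule=1 ; fi

# configuration variable (hypothetical), with a default value that can be overridden
SD_MYMODULE_GREETING="${SD_MYMODULE_GREETING:-hello}"

# module command: sd_<module name>_<function name>
sd_mymodule_hello() {
    local name="${1:-world}"
    echo "${SD_MYMODULE_GREETING} ${name}"
    return 0
}

# module initialization: runs on module loading
# (assumes sd_log accepts a message as its first parameter)
sd_log "mymodule loaded"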

Command requirements

Commands MUST support the Utility Syntax Guidelines described in the Base Definitions volume of POSIX.1-2017, Section 12.2, from guideline 3 to guideline 10, inclusive.

All commands that accept options and/or operands SHOULD also accept the -D option. This option is used to define local variables, in the form of key=value pairs, that provide a context for the command.

Options SHOULD conform to the following naming conventions:

-A
abort if the command return code is > 0
-a, -D "accrual_method=ACCRUAL METHOD"
accrual method, the method by which items are added to a collection. The PUT and POST methods SHOULD be recognized
-f FILE NAME, -D "input_file=FILE NAME"
to be used to refer to a local file object with a relative or absolute path
-i INPUT FORMAT, -D "input_format=INPUT FORMAT"
input format specification; these values SHOULD be recognized (from libraptor):
Format      Description
rdfxml      RDF/XML (default)
ntriples    N-Triples
turtle      Turtle Terse RDF Triple Language
trig        TriG - Turtle with Named Graphs
guess       Pick the parser to use using content type and URI
rdfa        RDF/A via librdfa
json        RDF/JSON (either Triples or Resource-Centric)
nquads      N-Quads
-h
prints a help description
-o OUTPUT FORMAT, -D "output_format=OUTPUT FORMAT"
output format specification; these values SHOULD be recognized:
  • csv
  • csv-h
  • csv-1
  • csv-f1
  • boolean
  • tsl
  • json
  • ntriples
  • xml
  • turtle
  • rdfxml
  • test
-p PRIORITY
to be used to reference a priority:
Level  Mnemonic       Explanation
2      CRITICAL       Should be corrected immediately, but indicates failure in a primary system - fix CRITICAL problems before ALERT - an example is loss of primary ISP connection.
3      ERROR          Non-urgent failures - these should be relayed to developers or admins; each item must be resolved within a given time.
4      WARNING        Warning messages - not an error, but an indication that an error will occur if action is not taken, e.g. file system 85% full - each item must be resolved within a given time.
5      NOTICE         Events that are unusual but not error conditions - might be summarized in an email to developers or admins to spot potential problems - no immediate action required.
6      INFORMATIONAL  Normal operational messages - may be harvested for reporting, measuring throughput, etc. - no action required.
7      DEBUG          Info useful to developers for debugging the app, not useful during operations.
-s SID, -D "sid=SID"
connect to the graph store named SID (STORE by default)
-S SIZE
to be used to reference a size

Evaluation of the local command context

The process of evaluating the local context for a command is as follows:

  1. First, the local context hardcoded in the command implementation is evaluated.
  2. The hardcoded local context can be overridden by the global configuration variable SD_DEFAULT_CONTEXT.
  3. The resulting context can be overridden by specific command options (e.g., -s SID). Command options are evaluated left to right.
  4. The resulting context can be overridden by specific command operands.

For example, all these calls will have the same result:

  • sd_sparql_graph: the hardcoded local context ingests data into a named graph inside the graph store connected to the sid STORE, using a generated UUID URI as the name of the named graph.
  • sd_sparql_graph -s STORE $(sd_uuid): hardcoded local context overridden by specific options and operand.
  • sd_sparql_graph -D "sid=STORE": hardcoded local context overridden by the -D option.
  • sd_sparql_graph -D "sid=STORE" -D "graph=$(sd_uuid)": the same as above but using multiple -D options (always evaluated left to right).
  • SD_DEFAULT_CONTEXT="sid=STORE"; sd_sparql_graph $(sd_uuid): hardcoded local context overridden by SD_DEFAULT_CONTEXT and operand.
  • SD_DEFAULT_CONTEXT="sid=XXX"; sd_sparql_graph -s YYY -D "sid=STORE graph=ZZZZ" $(sd_uuid): a silly command call that demonstrates the overriding precedence in local context evaluation.

3 - Driver building

How to write a custom graph store driver

Conformance Notes

This chapter defines some conventions to extend the SDaaS platform.

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words may not always appear in uppercase letters in this specification.

Driver Requirements

  • a driver MUST be a valid SDaaS module (see the Module building chapter)
  • it MUST implement the driver interface with SDaaS functions that conform to the following naming convention: sd_<driver name>_<driver method>; for example, if you want to implement a driver for the Stardog graph store you MUST implement the sd_stardog_query function
  • all driver function parameters MUST be positional, with no defaults. The validity check of the parameters is the responsibility of the caller (i.e. the driver module)
  • the following methods MUST be implemented:
Method (parameters)                Description
erase(sid)                         erases the graph store
load(sid, graph, accrual_method)   loads a stream of N-Triples into a named graph in a graph store according to the accrual method
query(sid, mime_type)              executes in a graph store a SPARQL query read from standard input, requesting as output one of the supported MIME types
size(sid)                          returns the number of triples in a store
update(sid)                        executes in a graph store a SPARQL update request read from standard input

All method parameters are strings that MUST match the following rules:

  • sid MUST match the ^[a-zA-Z]+$ regular expression and MUST point to an http(s) URL. It is assumed that sid is the name of a SID variable
  • graph MUST be a valid URI
  • mime_type MUST be a valid MIME type
  • accrual_method MUST match PUT or POST
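Since driver implementations do not check their parameters, a store module command could guard the dispatch with checks like the following sketch (to be placed inside a function; the sd_log messages and its signature are illustrative assumptions):

[[ "$sid" =~ ^[a-zA-Z]+$ ]] || { sd_log "invalid sid: $sid" ; return 1 ; }
[[ "$accrual_method" == "PUT" || "$accrual_method" == "POST" ]] || { sd_log "invalid accrual method: $accrual_method" ; return 1 ; }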

Implementing a new graph store driver

A driver implementation is described by the following UML class diagram:

package "SDaaS Community Edition" {
    interface DriverInterface
    DriverInterface : + erase(sid)
    DriverInterface : + load(sid, graph, accrual_method)
    DriverInterface : + query(sid, mime_type)
    DriverInterface : + size(sid)
    DriverInterface : + update(sid)
    DriverInterface : + validate(sid)

    class w3c
    class testdriver
}
package "SDaaS Enterprise Edition" {
    class blazegraph
    class gsp
    class neptune
}

testdriver ..|> DriverInterface
w3c ..|> DriverInterface 
blazegraph --|> w3c
gsp --|> w3c
neptune --|> w3c

There is a reference implementation of the driver interface known as the w3c driver, compliant with the W3C SPARQL 1.1 Protocol specification, along with the testdriver stub driver implementation to be used in unit tests. The SDaaS Enterprise Edition offers additional drivers that are specialized versions of the w3c driver, optimized for specific graph store technologies:

  • gsp driver: a w3c driver extension that uses the SPARQL 1.1 Graph Store HTTP Protocol. It defines the configuration variable <sid>_GSP_ENDPOINT, which contains the HTTP address of the service providing a Graph Store Protocol interface.
  • blazegraph driver: an optimized implementation for the Blazegraph graph store.
  • neptune driver: an optimized implementation for the AWS Neptune service.

In commands, do not call driver method implementation functions directly. Instead, call the corresponding abstract driver module functions.
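As an example, a single method of a hypothetical mydriver driver could be sketched as follows; the <sid>_SPARQL_UPDATE_ENDPOINT variable name and the use of curl are assumptions, not part of the driver specification:

sd_mydriver_update() {
    # update(sid): execute a SPARQL update request read from standard input
    local sid="$1"
    local endpoint_var="${sid}_SPARQL_UPDATE_ENDPOINT"   # hypothetical SID variable defined by this driver
    curl -s -f -X POST \
        -H "Content-Type: application/sparql-update" \
        --data-binary @- \
        "${!endpoint_var}"
}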