Getting started
The SDaaS Platform offers a programmatic approach to building and updating Knowledge Graphs. It includes a language and a command-line interface (CLI), offering optimized access to one or more RDF graph stores.
The subsequent chapters assume you’ve installed the SDaaS Platform and have some familiarity with the bash shell, Docker, and SPARQL. Additionally, you should possess a basic understanding of the key concepts and definitions outlined in the Knowledge Exchange Engine Specifications (KEES).
Connecting to an RDF Graph Store
SDaaS requires access to an RDF SPARQL service: let’s launch a graph store using a public Docker image in a Docker network named myvpn:
docker network create myvpn
docker run --network myvpn --name kb -d linkeddatacenter/sdaas-rdfstore
This runs in the background a small, full-featured RDF Graph Store instance that complies with the SDaaS requirements.
Now you can get the SDaaS prompt with:
docker run --network myvpn --rm -ti sdaas
Your terminal will show the SDaaS command prompt as an extension of the bash shell:
____ ____ ____
/ ___|| _ \ __ _ __ _/ ___|
\___ \| | | |/ _` |/ _` \___ \
___) | |_| | (_| | (_| |___) |
|____/|____/ \__,_|\__,_|____/
Smart Data as a Service platform - Pitagora
Community Edition 4.0.0 connected to http://kb:8080/sdaas/sparql (w3c)
Copyright (C) 2018-2024 LinkedData.Center
more info at https://linkeddata.center/sdaas
sdaas >
What is happening behind the scenes?
SDaaS needs to connect to a graph store. To create such a connection you specify a sid (store ID), an environment variable containing the URL of a SPARQL service endpoint for the graph store. By convention, the default sid is named STORE. The SDaaS platform comes out of the box configured with STORE=http://kb:8080/sdaas/sparql, which you can change at any moment.
All SDaaS commands that need to access a graph store provide the -s <sid> option. If it is omitted, the SDaaS platform uses the sid named STORE.
Each sid requires a driver, specified by the variable <sid>_TYPE. For instance, the store engine driver for STORE is defined in STORE_TYPE. By default SDaaS uses the w3c driver, which is suitable for any standard SPARQL service implementation. In addition to the standard driver, the SDaaS Enterprise Edition provides some optimized drivers.
For instance, since the linkeddatacenter/sdaas-rdfstore Docker image is based on the Blazegraph engine, in the Enterprise Edition you can use an optimized driver by setting STORE_TYPE=blazegraph.
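For illustration, here is a minimal sketch of how the sid variables fit together, using a hypothetical sid named MYSTORE that points at the store launched above:
# define a new sid and, optionally, its driver (defaults to w3c)
MYSTORE=http://kb:8080/sdaas/sparql
MYSTORE_TYPE=w3c
# pass the sid explicitly; without -s the default STORE sid is used
sd store size -s MYSTORE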
A first look at the platform
SDaaS provides a set of commands to introspect the platform. Try typing:
# to get the platform version:
sd view version
# to see SDaaS configuration variables:
sd view config
# to list all installed platform modules. Modules are cached on first use. The cached modules are flagged with "--cached".
sd view modules
# to see all commands exported by the [view module](/module/view):
sd view module view
# to download the whole SDaaS Language profile in turtle RDF serialization
sd view ontology -o turtle
See the Calling SDaaS commands section in the Application building guide to learn more about SDaaS commands.
Boot the knowledge base
Clean up a knowledge graph using the sd store erase command (be careful, this zaps your default knowledge graph):
sd store erase
You can verify that the store is empty with:
sd store size
KEES compliance:
The sd kees boot command (Enterprise Edition only) uses an optimized algorithm to clean up the knowledge graph, adding all metadata required by the KEES specifications.
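For instance, assuming the Enterprise Edition is installed, sd kees boot can be used in place of the plain erase shown above:
# Enterprise Edition only: reset the knowledge graph and add the KEES metadata
sd kees boot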
Ingest facts
To ingest RDF data, you have several options available:
- using the sd sparql update command to execute SPARQL Update operations for server-side ingestion;
- using an ETL pipeline with the sd sparql graph command to transfer and load a stream of RDF triples into a named graph in the graph store;
- using the learn module, which provides some specialized shortcuts to ingest data and KEES metadata.
Using SPARQL update
The sd sparql update command is executed by the SPARQL service; therefore, the resource to be loaded must be visible to the graph store server. For example, to load the entire definition of schema.org:
echo 'LOAD <https://schema.org/version/latest/schemaorg-current-http.ttl> INTO GRAPH <urn:graph:0>' | sd sparql update
Using SPARQL graph
The sd sparql graph command is executed by the SDaaS processor, which stores a stream of RDF triples (in N-Triples serialization) into a named graph inside the graph store. This command offers increased control over the resource by allowing enforcement of the resource type. SDaaS optimizes the transfer of the resource triples to the graph store using the most driver-optimized method available.
In an ETL process, this command realizes the load stage. It is typically used in a piped command.
Some examples:
# get data from a command
sd view ontology \
| sd sparql graph urn:sdaas:tbox
# get data from a local file
sd_cat mydata.nt \
| sd sparql graph
# retrieve linked data from a remote N-Triples resource
sd_curl -s -f https://schema.org/version/latest/schemaorg-current-http.nt \
| sd sparql graph
# retrieve RDF data serialized with Turtle
sd_rapper -i turtle https://dbpedia.org/data/Milan.ttl \
| sd sparql graph https://dbpedia.org/resource/Milan
# retrieve a linked data resource with content negotiation
ld=https://dbpedia.org/resource/Lecco
sd_curl_rdf $ld \
| sd_rapper -g - $ld \
| sd sparql graph -a PUT $ld
# same as above but with KEES metadata
sd_curl_rdf $ld \
| sd_rapper -g - $ld \
| sd kees metadata -D "activity_type=Learning source=$ld trust=0.8" $ld \
| sd sparql graph -a PUT $ld
sd_curl, sd_curl_rdf, sd_rapper, and sd_cat are just wrappers for standard bash commands that trap and log errors.
The sd sparql graph command supports two methods for graph accrual:
- -a PUT to overwrite the named graph; it creates a new named graph if needed;
- -a POST (default) to append data to a named graph; it creates a new named graph if needed.
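For instance, a minimal sketch contrasting the two methods (the file names and the graph name are purely illustrative):
# replace the content of a named graph with a fresh snapshot
sd_cat snapshot.nt | sd sparql graph -a PUT urn:graph:mydata
# append additional triples to the same graph (POST is the default)
sd_cat delta.nt | sd sparql graph urn:graph:mydata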
The gsp driver implementation is capable of utilizing the SPARQL 1.1 Graph Store Protocol (GSP). To enable this support, define the driver type <sid>_TYPE=gsp and set <sid>_GSP_ENDPOINT to point to the URL of the service providing the Graph Store Protocol.
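As a sketch, assuming the graph store exposes a GSP service (the endpoint URL below is hypothetical):
STORE_TYPE=gsp
# hypothetical URL of the Graph Store Protocol service
STORE_GSP_ENDPOINT=http://kb:8080/sdaas/gsp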
WARNING: many graph store engines have limitations on the size of data that can be ingested using just SPARQL Update features with the default w3c driver. Whenever possible, use a driver optimized for your graph store or a GSP-capable endpoint.
Using the learn module (EE)
The learn module provides some shortcuts to load linked data into a graph store together with their KEES metadata.
Here are some examples that load RDF triples:
sd learn resource -D "graph=urn:dataset:dbpedia" https://dbpedia.org/resource/Milan
sd learn file /etc/app.config
sd learn dataset urn:dataset:dbpedia
sd learn datalake https://data.exemple.org/
WARNING: if the sd learn dataset command fails, the target named graph could be incomplete, or annotated with a prov:wasInvalidatedBy property.
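For instance, a quick way to spot such graphs could be a query like the following (an illustrative sketch, using the sd sparql list alias described below):
cat <<EOF | sd sparql list
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT DISTINCT ?g WHERE { ?g prov:wasInvalidatedBy ?activity }
EOF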
Querying the knowledge graph
To query the store, you can use SPARQL:
cat <<-EOF | sd sparql query -o csv
SELECT ?g (COUNT (?s) AS ?subjects) WHERE {
  GRAPH ?g { ?s ?p ?o }
} GROUP BY ?g
EOF
This command prints a CSV table with all named graphs and the number of triples they contain. The -o option specifies the format you want for the result.
The sd sparql query command outputs the XML serialization by default; you can specify a preferred serialization using the -o flag. Additionally, the sparql module provides convenient command aliases, e.g.:
- sd sparql list prints the result of a SELECT query as CSV, without header, on stdout;
- sd sparql rule prints the result of a SPARQL CONSTRUCT as a stream of N-Triples on stdout.
For example:
echo "SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 10" | sd sparql list
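Similarly, a sketch of the sd sparql rule alias (the CONSTRUCT query is purely illustrative):
echo "CONSTRUCT { ?s a ?class } WHERE { ?s a ?class } LIMIT 10" | sd sparql rule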
Reasoning about facts
To materialize inferences in the knowledge graph you have several options:
- using the sd sparql update command with SPARQL Update operations (e.g. INSERT ... WHERE);
- using the sd sparql rule command with SPARQL CONSTRUCT queries;
- using the sd learn rule command (EE) with SPARQL CONSTRUCT queries.
You can use SPARQL Update or a combination of SPARQL queries, as in the following examples:
# INSERT ... WHERE evaluates the expression on the SPARQL server side
cat <<EOF | sd sparql update
INSERT {...}
WHERE {...}
EOF
# pipe two commands
cat <<EOF | sd sparql rule | sd sparql graph
CONSTRUCT ...
EOF
# pipe four commands adding metadata
cat <<EOF | sd sparql rule | sd kees metadata -D "trust=0.9" urn:graph:mycostructor | sd sparql graph urn:graph:mycostructor
CONSTRUCT ...
EOF
# same as above, using a shortcut:
cat <<EOF | sd learn rule -D "trust=0.9" urn:graph:mycostructor
CONSTRUCT ...
EOF
Using plans (EE)
The plan module provides some specialized commands to run SDaaS scripts.
You can think of a plan as similar to a stored procedure in a SQL database.
For example, assume that STORE contains the following three plans:
@prefix sdaas: <http://linkeddata.center/sdaas/reference/v4#> .
<urn:myapp:cities> a sdaas:Plan; sdaas:script """
sd learn resource -D "graph=urn:dataset:dbpedia" http://dbpedia.org/resource/Milano
sd learn resource -D "graph=urn:dataset:dbpedia" http://dbpedia.org/resource/Lecco
""" .
<urn:reasoning:recursive> a sdaas:Plan ; sdaas:script """
sd learn rule 'CONSTRUCT ...'
sd learn rule http://example.org/rules/rule1.rq
""" .
<urn:test:acceptance> a sdaas:Plan ; sdaas:script """
sd_curl_sparql http://example.org/tests/test1.rq | sd sparql test
sd_curl_sparql http://example.org/tests/test2.rq | sd sparql test
""" .
Then these commands use the plans to automate some common activities:
sd plan run -D "activity_type=Ingestion" urn:myapp:cities
sd plan loop -D "activity_type=Reasoning trust=0.75" urn:reasoning:recursive
sd -A plan run -D "activity_type=Publishing" urn:test:acceptance
The sd plan loop command executes a plan until there are no more changes in the knowledge base. It is useful for implementing incremental or recursive reasoning rules.
Managing the Knowledge Base Status (EE)
You can signal the publication status of a specific knowledge base using the KEES status properties.
For setting, getting, and checking the status of a specific window in the knowledge graph, use:
# print the date of the last status change:
sd kees date published
# test a status
sd kees is published || echo "Knowledge Graph is not Published"
sd kees is stable || echo "Knowledge Graph is not in a stable status"
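For instance, a status check can gate other activities in a script; here is a sketch reusing the acceptance plan defined above:
# run the acceptance test plan only when the knowledge graph is in a stable status (illustrative)
sd kees is stable && sd -A plan run -D "activity_type=Publishing" urn:test:acceptance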
Connecting to multiple RDF Graph Stores
You can direct the SDaaS platform to connect to multiple RDF store instances, using standard or optimized drivers:
AWS="http://mystore.example.com/sparql"
AWS_TYPE="neptune"
WIKIDATA="https://query.wikidata.org/sparql"
WIKIDATA_TYPE="w3c"
This allows importing data or reasoning using specialized SPARQL endpoints. For instance, the following example imports all cats from Wikidata into the default graph store and then lists the first five cat names:
cat <<EOF | sd sparql rule -s WIKIDATA | sd sparql graph
DESCRIBE ?item WHERE {
?item wdt:P31 wd:Q146
}
EOF
cat <<EOF | sd sparql list
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cat WHERE {
?item wdt:P31 wd:Q146; rdfs:label ?cat
FILTER( LANG(?cat)= "en")
} ORDER BY ?cat LIMIT 5
EOF
Scripting
You can create a bash script containing various SDaaS commands.
Refer to the Application building guide for more info about SDaaS scripting.
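For instance, here is a minimal sketch of a build script that combines commands shown earlier in this guide (assuming it runs inside the SDaaS shell, where the sd commands are available):
#!/usr/bin/env bash
set -e
# rebuild the knowledge graph from scratch
sd store erase
# ingest the schema.org ontology on the server side
echo 'LOAD <https://schema.org/version/latest/schemaorg-current-http.ttl> INTO GRAPH <urn:graph:0>' | sd sparql update
# print the resulting store size
sd store size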
Quitting the platform
When you type exit, you safely destroy the SDaaS container, but the created data will persist in the external store.
Free allocated docker resources by typing:
docker rm -f kb
docker network rm myvpn