Facts' ingestion
How to populate your knowledge graph
The SDaaS Platform offers a programmatic approach to building and updating Knowledge Graphs. It includes a language and a command-line interface (CLI) that provide optimized access to one or more RDF graph stores.
The following chapters assume you have installed Docker and have some familiarity with the bash shell and SPARQL.
SDaaS requires access to an RDF SPARQL service. Let's launch a graph store from a public Docker image, attached to a Docker network named myvpn:
docker network create myvpn
docker run --network myvpn --name kb -d -p 8080:8080 linkeddatacenter/sdaas-rdfstore:2.2.1
This runs a small, full-featured RDF graph store instance in the background, compliant with SDaaS requirements. You can access the workbench at http://localhost:8080/sdaas
Once the graph store is running, you can get the SDaaS Community Edition prompt with:
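Because the container starts in the background, the SPARQL service may need a few seconds before it answers. The following is a minimal sketch of a generic retry helper that a script could use to wait for it. The function name `retry`, the attempt count, the delay, and the use of curl as a probe are all assumptions made for illustration; none of them is part of SDaaS.

```shell
# Hypothetical helper (not part of SDaaS): run a command until it succeeds.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local attempts=$1
    local delay=$2
    local i=1
    shift 2
    while ! "$@"; do
        # Give up once the maximum number of attempts has been reached.
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 0
}

# Example probe, assuming curl is installed on the host:
#   retry 30 2 curl -fsS -o /dev/null http://localhost:8080/sdaas
```

The helper returns 0 as soon as the probed command succeeds and 1 after the last failed attempt, so it can gate the next step of a script.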
docker run --network myvpn --rm -ti linkeddatacenter/sdaas-ce:4.2.1
Your terminal will show the SDaaS command prompt as an extension of the bash shell:
____ ____ ____
/ ___|| _ \ __ _ __ _/ ___|
\___ \| | | |/ _` |/ _` \___ \
___) | |_| | (_| | (_| |___) |
|____/|____/ \__,_|\__,_|____/
Smart Data as a Service platform - Pitagora
Community Edition 4.2 connected to http://kb:8080/sdaas/sparql (w3c)
Copyright (C) 2018-2024 LinkedData.Center
more info at https://linkeddata.center/sdaas
sdaas >
What is happening behind the scenes?
SDaaS needs to connect to a graph store. To create the connection you specify a sid (store ID): an environment variable containing the URL of the SPARQL service endpoint for the graph store. By convention, the default sid is named STORE. The SDaaS platform comes configured out of the box with STORE=http://kb:8080/sdaas/sparql, which you can change at any time.
All SDaaS commands that access a graph store provide the -s <sid> option. If it is omitted, the platform uses the sid named STORE.
Each sid requires a driver, specified by the variable <sid>_TYPE. For instance, the store engine driver for STORE is defined in STORE_TYPE. By default SDaaS uses the w3c driver, which is suitable for any standard SPARQL service implementation. In addition to the standard driver, SDaaS Enterprise Edition provides some optimized drivers.
For instance, because the linkeddatacenter/sdaas-rdfstore Docker image is based on the Blazegraph engine, in the Enterprise Edition you can use an optimized driver. To enable it, set STORE_TYPE=blazegraph
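The naming convention described above (a sid variable holding the endpoint, plus a `<sid>_TYPE` variable holding the driver) can be illustrated with plain shell. This is only a sketch of the convention, not SDaaS source code: the helper `describe_sid`, the second sid `STAGING`, and its URL are hypothetical, and the fallback to `w3c` when no `<sid>_TYPE` is set is an assumption based on the default mentioned above.

```shell
# Illustration of the sid convention only; this is not SDaaS source code.
STORE="http://kb:8080/sdaas/sparql"          # default sid shipped with the platform
STAGING="http://staging:8080/sdaas/sparql"   # hypothetical second store
STAGING_TYPE="w3c"

# Hypothetical helper: print "<endpoint> <driver>" for a sid (default: STORE).
# The sid must be a trusted variable name, because it is expanded through eval.
describe_sid() {
    local sid="${1:-STORE}"
    local endpoint driver
    eval "endpoint=\${$sid:-}"
    eval "driver=\${${sid}_TYPE:-w3c}"   # assumed fallback to the standard driver
    printf '%s %s\n' "$endpoint" "$driver"
}

describe_sid           # endpoint read from STORE
describe_sid STAGING   # endpoint and driver read from STAGING and STAGING_TYPE
```

Inside the sdaas prompt, a second sid defined this way would be selected with the -s option, for example `sd store size -s STAGING`.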
SDaaS provides a set of commands to introspect the platform. Try typing:
# to get the platform version:
sd view version
# to see SDaaS configuration variables:
sd view config
# to list all installed platform modules. Modules are cached on first use. The cached modules are flagged with "--cached".
sd view modules
# to see all commands exported by the sparql module:
sd view module sparql
See the Calling SDaaS commands section in the Application building guide to learn more about SDaaS commands.
Clean up a knowledge graph with the sd store erase command (be careful, this wipes your default knowledge graph):
sd store erase
You can verify that the store is empty with:
sd store size
The command should report 0 RDF triples in the knowledge graph.
SPARQL update commands are executed by the SPARQL service, so any resource they reference must be reachable from the graph store server. For example, to load the entire definition of schema.org:
echo 'LOAD <https://schema.org/version/latest/schemaorg-current-http.ttl> INTO GRAPH <urn:graph:0>' | sd sparql update
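When several resources have to be loaded, the LOAD statement can be generated instead of typed by hand. The sketch below assumes a hypothetical helper named `load_stmt`; it is not an SDaaS command and only prints the SPARQL text.

```shell
# Hypothetical helper (not an SDaaS command): print one SPARQL LOAD statement.
# $1 = URL of the RDF resource, $2 = IRI of the target named graph
load_stmt() {
    printf 'LOAD <%s> INTO GRAPH <%s>\n' "$1" "$2"
}

# Prints the same statement used in the example above.
load_stmt 'https://schema.org/version/latest/schemaorg-current-http.ttl' 'urn:graph:0'

# Inside the sdaas prompt, the output can be piped to the store:
#   load_stmt <url> <graph> | sd sparql update
```

As noted above, the URL is resolved by the graph store server, not by the shell that builds the statement.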
See more ingestion methods.
To query the store you can use SPARQL:
cat <<-EOF | sd sparql query -o csv
SELECT ?g (COUNT(?s) AS ?subjects) WHERE {
  GRAPH ?g { ?s ?p ?o }
} GROUP BY ?g
EOF
This command prints a CSV table with all named graphs and the number of triples they contain. The -o option specifies the format you want for the result.
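CSV output is easy to post-process with standard tools. As a minimal sketch, the hypothetical helper `total_triples` below sums the second column of such a table. The sample rows are made up for illustration and are not real query results; the helper also assumes that graph IRIs contain no commas.

```shell
# Hypothetical helper: sum the second CSV column, skipping the header row.
# Assumes no commas inside the first column (no quoted CSV fields).
total_triples() {
    awk -F, 'NR > 1 { total += $2 } END { print total + 0 }'
}

# Made-up sample shaped like the table described above (header: g,subjects).
printf 'g,subjects\nurn:graph:0,10\nurn:graph:1,5\n' | total_triples   # prints 15
```

Inside the sdaas prompt, the same helper could read from the query shown above in place of the printf sample.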
By default, the sd sparql query command outputs the XML serialization; the -o flag selects a different one. The sparql module also provides convenient command aliases:
sd sparql list: prints the result of a SELECT query as CSV without header on stdout
sd sparql rule: prints the result of a SPARQL CONSTRUCT as a stream of N-Triples on stdout
For example:
echo "SELECT DISTINCT ?class WHERE { ?s a ?class} LIMIT 10" | sd sparql list
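Because sd sparql list writes one headerless row per line, its output fits naturally into a shell pipeline. The sketch below assumes a hypothetical helper named `count_queries` that reads one class IRI per line and prints a COUNT query for each; it also assumes the IRIs arrive without angle brackets, as is usual for CSV results.

```shell
# Hypothetical helper: read one class IRI per line and print a COUNT query for each.
count_queries() {
    # Strip carriage returns, since CSV rows may end with CRLF.
    tr -d '\r' | while IFS= read -r class; do
        [ -n "$class" ] || continue   # skip empty lines
        printf 'SELECT (COUNT(?s) AS ?n) WHERE { ?s a <%s> }\n' "$class"
    done
}

# Stand-in input; inside the sdaas prompt the lines would come from sd sparql list.
printf 'http://schema.org/Person\n' | count_queries
```

Each generated query could then be piped back to sd sparql list to obtain the number of instances of that class.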
When you type exit you can safely destroy the sdaas container; the data you created persists in the external store.
Free the allocated Docker resources by typing:
docker rm -f kb
docker network rm myvpn