Facts' ingestion
To ingest RDF data with SDaaS, you have several options available:
- using the sd sparql update command to execute SPARQL Update operations for server-side ingestion;
- using an ETL pipeline with the sd sparql graph command to transfer and load a stream of RDF triples into a named graph in the graph store;
- using the learn module, which provides some specialized shortcuts to ingest data and KEES metadata.
Using SPARQL update
The sd sparql update command is executed by the SPARQL service. Therefore, the resource to be loaded must be visible to the graph store server. For example, to load the entire definition of schema.org:
echo 'LOAD <https://schema.org/version/latest/schemaorg-current-http.ttl> INTO GRAPH <urn:graph:0>' | sd sparql update
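Because LOAD is a standard SPARQL 1.1 Update operation, you can combine it with other operations in a single request. As a sketch, assuming your endpoint accepts multiple operations separated by semicolons, you can refresh a graph by clearing it before reloading:
# refresh a named graph: clear it, then reload the source in a single update
echo 'CLEAR SILENT GRAPH <urn:graph:0> ;
LOAD <https://schema.org/version/latest/schemaorg-current-http.ttl> INTO GRAPH <urn:graph:0>' \
| sd sparql update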
Using SPARQL graph
The sd sparql graph command is executed by the SDaaS processor, which stores a stream of RDF triples (in N-Triples serialization) into a named graph inside the graph store. This command offers increased control over the resource, for example by allowing you to enforce the resource type. SDaaS optimizes the transfer of resource triples to the graph store, using the most efficient method the driver provides.
In an ETL process, this command implements the load stage. It is typically used in a piped command.
Some examples:
# get data from a command
sd view ontology \
| sd sparql graph urn:sdaas:tbox
# get data from a local file
sd_cat mydata.nt \
| sd sparql graph
# retrieve linked data from a remote N-Triples resource
sd_curl -s -f https://schema.org/version/latest/schemaorg-current-http.nt \
| sd sparql graph
# retrieve RDF data serialized as Turtle
sd_rapper -i turtle https://dbpedia.org/data/Milan.ttl \
| sd sparql graph https://dbpedia.org/resource/Milan
# retrieve a linked data resource with content negotiation
ld=https://dbpedia.org/resource/Lecco
sd_curl_rdf $ld \
| sd_rapper -g - $ld \
| sd sparql graph -a PUT $ld
# same as above but with KEES metadata
sd_curl_rdf $ld \
| sd_rapper -g - $ld \
| sd kees metadata -D "activity_type=Learning source=$ld trust=0.8" $ld \
| sd sparql graph -a PUT $ld
sd_curl, sd_curl_rdf, sd_rapper, and sd_cat are just wrappers around the corresponding standard commands that trap and log errors.
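As a rough idea of what these helpers do (the real implementations ship with SDaaS and may differ), such a wrapper could look like:
# hypothetical sketch of an error-trapping wrapper, not the actual SDaaS code
sd_curl() {
    curl "$@" || {
        echo "ERROR: curl failed with exit code $? (args: $*)" >&2
        return 1
    }
}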
The sd sparql graph command supports two methods for graph accrual:
- -a PUT to overwrite the named graph; it creates a new named graph if needed.
- -a POST (default) to append data to a named graph; it creates a new named graph if needed.
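For example, the following sketch (using urn:graph:example as an arbitrary scratch graph and two hypothetical local files) first replaces the graph contents, then appends to them:
# replace whatever is in the graph with the contents of first.nt
sd_cat first.nt | sd sparql graph -a PUT urn:graph:example
# append the contents of second.nt to the same graph
sd_cat second.nt | sd sparql graph -a POST urn:graph:example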
The gsp driver implementation supports the SPARQL 1.1 Graph Store Protocol (GSP). To enable this support, define the driver type <sid>_TYPE=gsp and set <sid>_GSP_ENDPOINT to the URL of the service providing the Graph Store Protocol.
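For instance, for a store registered with the sid STORE, the configuration could look like this sketch (the endpoint URL is an assumption and depends on your graph store engine):
# hypothetical GSP endpoint; adjust to your graph store engine
STORE_TYPE=gsp
STORE_GSP_ENDPOINT=http://localhost:8890/sparql-graph-crud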
WARNING: many graph store engines have limitations on the size of data that can be ingested using plain SPARQL Update features with the default w3c driver. Whenever possible, use a driver optimized for your graph store or a GSP-capable endpoint.
Using the learn module (EE)
The learn module provides some shortcuts to load linked data into a graph store together with their KEES metadata.
Here are some examples that load RDF triples:
sd learn resource -D "graph=urn:dataset:dbpedia" https://dbpedia.org/resource/Milan
sd learn file /etc/app.config
sd learn dataset urn:dataset:dbpedia
sd learn datalake https://data.exemple.org/
WARNING: if the sd learn dataset command fails, the target named graph could be incomplete or annotated with a prov:wasInvalidatedBy property.
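Assuming your SDaaS build provides the sd sparql query command and that the KEES provenance metadata is visible to the query, a sketch like the following could detect an invalidated dataset graph (reusing the urn:dataset:dbpedia name from the examples above):
# hedged sketch: ask whether the dataset graph was marked as invalidated
echo '
PREFIX prov: <http://www.w3.org/ns/prov#>
ASK { <urn:dataset:dbpedia> prov:wasInvalidatedBy [] }
' | sd sparql query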