Guide to Application building

Best practices for building SDaaS applications.

    SDaaS™ is a software platform that helps you build Semantic Web Applications and Smart Data Platforms.

    SDaaS assists in constructing and managing a knowledge graph that organizes linked data resources annotated with application-specific semantics.

    Instead of accessing various data silos, applications use the knowledge graph as a reliable central repository that offers a semantic query service. This repository contains a semantically enriched replica of all the data the application needs. Because of the inherently distributed nature of the web and the continuous changes in data, an application using the SDaaS platform adopts the Eventual Consistency model. This model is very popular today because it represents a reasonable trade-off between performance and complexity.

    What is a semantic web application

    A semantic web application is a software system designed to leverage the principles and technologies of the Semantic Web.

    These applications utilize linked data, ontologies, and metadata to create richer connections between different pieces of information on the internet. They typically involve:

    1. Structured Data Representation: Semantic web apps use RDF (Resource Description Framework) to represent data in a structured and machine-readable format. This allows for better understanding and interpretation of relationships between different data points.

    2. Ontologies and Vocabularies: They employ ontologies and vocabularies (such as OWL - Web Ontology Language) to define relationships and meaning between entities, making it easier for systems to understand the context of the data.

    3. Data Integration and Interoperability: These applications facilitate data integration from diverse sources, enabling different systems to exchange and use information more effectively.

    4. Inference and Reasoning: Semantic web apps can perform logical inference and reasoning to derive new information or insights from existing data based on defined rules and relationships.

    5. Enhanced Search and Discovery: They enable more sophisticated search functionalities by understanding the semantics of the data, providing more relevant and contextualized results.

    In summary, a semantic web application harnesses Semantic Web technologies to enable machines to comprehend and process data more intelligently, facilitating better data integration, discovery, and utilization across various platforms and domains.
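
    For a concrete feel of points 1 and 2, here is a minimal sketch that loads two hand-written RDF triples into a named graph, using the SDaaS commands described later in this guide (the graph name and the triples themselves are illustrative):

    #!/usr/bin/env bash
    source $SDAAS_INSTALL_DIR/core
    
    # Two N-Triples statements: the resource is a schema.org City named "Lecco"
    cat <<'EOF' | sd sparql graph urn:example:abox
    <https://dbpedia.org/resource/Lecco> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/City> .
    <https://dbpedia.org/resource/Lecco> <http://schema.org/name> "Lecco" .
    EOF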

    What is a smart data platform

    A Smart Data Platform refers to a technological infrastructure designed to collect, process, analyze, and leverage data intelligently to generate insights, make decisions, and power various applications or services. These platforms often incorporate advanced technologies such as artificial intelligence (AI), machine learning (ML), data analytics, and automation to handle vast amounts of data from diverse sources.

    A Smart Data Platform typically integrates multiple functionalities, including data ingestion, storage, processing, analysis, and visualization, and often includes features for data governance, security, and compliance.

    What is Eventual Consistency

    Eventual Consistency is a concept in distributed computing where, in a system with multiple replicas of data, changes made to the data will eventually propagate through the system and all replicas will converge to the same state. However, this convergence is not instantaneous; it occurs over time due to factors like network latency, system failures, or concurrent updates. The Knowledge Graph can be considered a semantically enriched replica of the ingested distributed data.

    The typical SDaaS user is a DevOps professional who utilizes the commands provided by the platform to script the building and updating of a knowledge graph. This knowledge graph is then queried by an application using SPARQL or REST APIs. SDaaS developers and system integrators can extend the platform by adding custom modules and creating new commands.
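
    For example, once the knowledge graph is built, a backend can retrieve data with a plain SPARQL SELECT over HTTP. The sketch below uses curl and the standard SPARQL protocol; the endpoint URL and the queried vocabulary are assumptions, not SDaaS specifics:

    # Ask the knowledge graph for all resources typed as schema.org City.
    # Adapt the endpoint URL and vocabulary to your own store.
    curl -s -H "Accept: application/sparql-results+json" \
        --data-urlencode 'query=SELECT ?city ?name WHERE { ?city a <http://schema.org/City> ; <http://schema.org/name> ?name }' \
        http://localhost:8080/sparql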

    In more detail, the typical SDaaS use case scenario is summarized by the following diagram:

    cloud "Linked Data cloud" as data
    usecase "SDaas script\ndevelopment" as writesScript
    usecase "smart data service\ndeployment" as managesSDaaS
    usecase "application development" as developsApplication
    usecase "queries\nKnowledge Graph" as usesKnowledge
    usecase "installs\nSDaaS modules" as installsSDaaS
    usecase "configure\nSDaaS" as configuresSDaaS
    usecase "knowledge update" as updatesKnowledge
    actor "App devops" as user
    package "SDaaS distribution" as Distribution <<Docker image>>
    node "smart data service" as SDaaS {
        component "SDaaS script" as Script
        package Module {
            component "SDaaS Command" as Command
            interface ConfigVariable
        }
    }
    database "Knowledge Graph" as Store
    node Application
    
    user .. developsApplication
    user .. writesScript
    user .. managesSDaaS
    Command o-> ConfigVariable : uses
    writesScript .. Script
    
    managesSDaaS -- installsSDaaS
    managesSDaaS -- configuresSDaaS
    configuresSDaaS .. ConfigVariable 
    installsSDaaS .. Module 
    Command .. updatesKnowledge 
    data . updatesKnowledge
    updatesKnowledge . Store
    
    Script o--> Command : calls
    Distribution .. installsSDaaS
    
    Application .. usesKnowledge
    
    developsApplication .. Application
    usesKnowledge .. Store
    

    Calling SDaaS commands

    The SDaaS Platform operates through a set of bash commands and functions. The general syntax to call an SDaaS command is sd <module> <name> [*OPTIONS*] [*OPERANDS*], while the syntax of an SDaaS function is sd_<name>.

    The modules are bash script fragments that define a set of SDaaS functions, providing a namespace for them.

    Before calling an SDaaS function, you must explicitly load its module with the sd_include <module> core function. Core functions are contained in the core module, which is loaded at startup. SDaaS commands automatically include the required modules.
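
    For example, the same graph-loading step can be performed through a command or through the underlying function (the module and function names below are illustrative; check your distribution for the actual ones):

    # A command loads its module on demand:
    sd sparql graph urn:myapp:abox < data.nt
    
    # A function requires an explicit include first (names are illustrative):
    sd_include sparql
    sd_sparql_graph urn:myapp:abox < data.nt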

    SDaaS commands MAY depend on a set of context variables that you can pass using options. The global configuration variable SD_DEFAULT_CONTEXT provides a default local context used by all commands.

    For instance, these calls are all equivalent:

    sd sparql graph urn:myapp:abox
    sd sparql graph -s STORE -D "graph=urn:myapp:abox"
    sd sparql graph -D "sid=STORE" -D "graph=urn:myapp:abox"
    sd sparql graph -D "sid=STORE graph=urn:myapp:abox"
    sd sparql graph -D "sid=OVERRIDDEN_BY-s graph=urn:myapp:overridden_by_operand" -s STORE urn:myapp:abox
    SD_DEFAULT_CONTEXT="sid=STORE graph=urn:myapp:abox"; sd sparql graph
    

    SDaaS scripting

    The smart data service usually includes an SDaaS script and an application config file. An SDaaS script is a normal bash script that loads the SDaaS platform with the command source $SDAAS_INSTALL_DIR/core.

    Usually you create an application config file that contains the definitions of the datasets and rules used by the ingestion, reasoning, and publishing plans. For instance:

    #!/usr/bin/env bash
    source $SDAAS_INSTALL_DIR/core
    
    sd store erase
    
    ## loads the language profile and the application specific configurations
    sd view ontology | sd sparql graph urn:tbox
    sd_curl -s -f https://schema.org/version/latest/schemaorg-current-http.nt | sd sparql graph urn:tbox
    
    ## loading some facts from dbpedia
    for ld in https://dbpedia.org/resource/Lecco https://dbpedia.org/resource/Milan; do
    	sd_curl_rdf $ld | sd_rapper -g - $ld | sd sparql graph urn:abox
    done
    

    The script MAY implement a never-ending loop, similar to this pseudo-code using the SDaaS Enterprise Edition Platform:

    #!/usr/bin/env bash
    source $SDAAS_INSTALL_DIR/core # Loads the SDaaS platform
    while NEW_DATA_DISCOVERED ; do 
    
        # Boot and lock platform ######################
        sd kees boot -L 
    
        ## loads the language profile and the application specific configurations
        sd -A view ontology | sd sparql graph urn:myapp:tbox
        sd learn file /etc/myapp.config
    
        ## loading facts
        sd learn dataset -D "activity_type=Learning trust=0.9" urn:myapp:facts
    
        # reasoning window loop #########################
        sd plan loop -D "activity_type=Reasoning trust=0.9" urn:myapp:reasonings
    
        # publishing window  ########################
        sd -A plan run -D "activity_type=Publishing" urn:myapp:tests
    
        sd kees unlock
    
        sleep $TIME_SLOT
    done
    

    Application architectures enabled by SDaaS

    In this chapter you find some typical architectures enabled by SDaaS.

    Use case 1: autonomous agent

    An ETL agent that transforms raw data into linked data:

    Folder "Raw data" as Data
    Folder "Linked data" as RDF
    
    note right of RDF: RDF data according to\nan application language profile
    
    package "ETL application" #aliceblue;line:blue;line.dotted;text:blue {
        node "Autonomous Agent" as aa #white;line:blue;line.dotted;text:blue
        database "graph store" as graphStore
    
        aa -(0 graphStore : run mapping\nrules
    }
    
    Data ..> aa
    aa ..> RDF
    

    The autonomous agent uses SDaaS to upload raw data in an intermediate form to a graph store and then to run SPARQL rules that map the intermediate format into the application language profile.
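
    A mapping rule can be expressed as a SPARQL UPDATE executed against the graph store. The sketch below uses curl with the standard SPARQL 1.1 protocol; the endpoint URL, graph names, and predicates are assumptions, not SDaaS specifics:

    # Rewrite an intermediate predicate into the application language profile.
    curl -s -X POST http://localhost:8080/update \
        -H "Content-Type: application/sparql-update" \
        --data-binary 'INSERT { GRAPH <urn:myapp:abox> { ?s <http://schema.org/name> ?o } }
    WHERE { GRAPH <urn:myapp:staging> { ?s <urn:raw:label> ?o } }'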

    Use case 2: linked data platform

    The SDaaS platform is used to implement an agent that transforms and loads raw data into a knowledge graph, performing ontology mappings and providing a Linked Data Platform interface to applications. It is compatible with the SOLID protocol.

    cloud "Linked Open Data Cloud" as Data
    
    package "LOD smart cache" #aliceblue;line:blue;line.dotted;text:blue {
        node "Autonomous\nDiscovery Agent" as DiscoveryAgent #white;line:blue;line.dotted;text:blue
        database "graph store" as DataLake
        node "Linked data Proxy" as DataLakeProxy
    
        
        DiscoveryAgent -(0 DataLake : writes RDF data
        DataLake 0)- DataLakeProxy 
    }
    
    Data ..> DiscoveryAgent
    DataLakeProxy - LDP
    

    The linked-data proxy is a standard component providing support for the VoID ontology and HTTP cache features. Linked Data Center provides a free open-source implementation that can be used out of the box or as a reference implementation for this component.
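
    For instance, a client can exploit the proxy HTTP cache with ordinary conditional requests (the host name below is illustrative):

    # Capture the ETag of a resource, then replay the request with
    # If-None-Match: a 304 Not Modified means the cached copy is still fresh.
    etag=$(curl -sI -H "Accept: text/turtle" https://lod-cache.example.org/resource/Lecco \
        | awk 'tolower($1)=="etag:" {print $2}' | tr -d '\r')
    curl -s -o /dev/null -w "%{http_code}\n" \
        -H "Accept: text/turtle" -H "If-None-Match: $etag" \
        https://lod-cache.example.org/resource/Lecco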

    Use case 3: smart data warehouse

    The typical SDaaS application architecture to build an RDF based data warehouse is the following:

    cloud "1st, 2nd and 3rd-party raw data" as Data
    
    package "data management platform" #aliceblue;line:blue;line.dotted;text:blue {
        node "Fact provider" as DiscoveryAgent #white;line:blue;line.dotted;text:blue
        folder "Linked-data lake" as DataLake
        node "smart data service" as SDaaSApplication #white;line:blue;text:blue
        database "RDF\nGraph Store" as GraphStore
        
        DiscoveryAgent --(0 DataLake : writes RDF data
        DataLake 0)-- SDaaSApplication : learn data
        SDaaSApplication --(0 GraphStore : updates
    
    
        note right of DiscoveryAgent
        Here the application injects
        its specific semantic in raw data
        end note
    
        note left of SDaaSApplication
        Here KEES cycle
    	is implemented
        end note
    }
    
    interface "SPARQL QUERY" AS SQ
    GraphStore - SQ
    
    package "application" {
        node "Application backend" as Backend
        node "Application frontend" as Frontend
        database "application local data" as firstPartyData
    
     
        Backend 0)- Frontend : calls
        Backend --( SQ : queries
        firstPartyData 0)-- Backend  : writes
    }
    
    Data ..> DiscoveryAgent : gets raw data
    Data <..... firstPartyData : copy 1st-party data
    

    You can identify two distinct threads: the development of the data management platform and the development of the application. The knowledge graph built in the data platform is used by the application as the primary source of all data. The data produced by the application can be reinjected into the data management platform.

    The SDaaS platform is used in the development of the data management platform, primarily in the development of the smart data service and optionally in the Autonomous Discovery Agent.

    More in detail, the main components of the data platform are:

    Autonomous Discovery Agent
    It is an application-specific ETL process triggered by changes in data. This process transforms raw data into linked data annotated with information recognized by the application and stores it in a linked-data lake. Multiple Autonomous Discovery Agents may exist and operate concurrently. Each agent can access the Graph Store to identify enrichment opportunities or to detect changes in the data source.
    Linked-data lake
    It is a repository, for example an S3 bucket or a shared file system, that contains RDF files, that is, Linked Data Platform RDF Sources [LDP-RS] expressed with a known language profile. These files can be mirrors of existing web resources, mappings of databases, or even private data described natively in RDF (see the population sketch after this list).
    smart data service
    It is a service that includes the SDaaS platform and contains a script that processes data in conformance with the KEES specifications.
    RDF Graph Store
    It implements the Knowledge Graph, supporting the SPARQL protocol interface. Linked Data Center provides a free, full-featured RDF graph database that you can use to learn and test the SDaaS platform.
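
    As an example of the population step mentioned in the linked-data lake description above, an agent can normalize its output to N-Triples with the sd_rapper function and copy the file to the shared location (the base URI, file names, and bucket are illustrative):

    # Normalize an RDF file to N-Triples, then drop it into the lake.
    sd_rapper -g - https://example.org/dataset/orders < orders.rdf > orders.nt
    aws s3 cp orders.nt s3://myapp-linked-data-lake/orders.nt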

    Use case 4: Smart data agent

    All activities are performed by the same agent, which embeds its own workflow:

    cloud "3rd-party data" as ldc
    database "Knowledge Graph" as GraphStore
    folder "Reports" as RawData
    
    Folder "Linked data lake" as ldl
    
    node "Smart data agent" as agent #white;line:blue;text:blue
    
    ldl --> agent
    ldl <-- agent
    ldc -> agent
    
    agent --> GraphStore
    agent -> RawData
    

    The workflow is just a definition of the activities that should be completed by the agent.

    Use case 5: semantic web agency

    In this architecture, multiple agents run at the same time, coordinating through the knowledge graph status and locks.

    cloud "3rd-party data" as ldc
    database "Knowledge Graph" as GraphStore
    folder "reports" as RawData
    
    
    note bottom of RawData: can be used as raw data
    
    Folder "Linked data lake" as ldl
    note top of ldl: contains activity plans
    
    package "Agency" #aliceblue;line:blue;line.dotted;text:blue {
        node "Smart data agent" as Ingestor #white;line:blue;text:blue
        node "Reasoning\nagent(s)" as Reasoner #white;line:blue;line.dotted;text:blue
        node "Enriching\nagent(s)" as Enricher #white;line:blue;line.dotted;text:blue
        node "Publishing\nagent(s)" as Publisher #white;line:blue;line.dotted;text:blue
    }
    
    ldl --> Ingestor
    ldl <-- Enricher
    ldc --> Enricher
    
    Ingestor --> GraphStore
    Reasoner <--> GraphStore
    Publisher <-- GraphStore
    Enricher <-- GraphStore
    Publisher --> RawData
    

    Here too, each workflow is just a definition of activities to be completed, but each agent executes its own plan and coordinates with the others through the knowledge graph status and locks.
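
    A minimal sketch of one such agent, reusing the lock and plan commands shown in the never-ending loop above (the plan name and timing are illustrative):

    #!/usr/bin/env bash
    source $SDAAS_INSTALL_DIR/core
    
    # A reasoning agent: lock the platform, run its plan, unlock, wait.
    while true; do
        sd kees boot -L
        sd plan loop -D "activity_type=Reasoning trust=0.9" urn:myapp:reasonings
        sd kees unlock
        sleep "$TIME_SLOT"
    done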