LUPOSDATE Demonstration

Logo LUPOSDATE

Table of Contents

LUPOSDATE System
Starting the Demonstration
SPARQL Operators

Logical Optimization Rules
Demonstration Examples
Literature

In the DFG project LUPOSDATE, we develop logically and physically optimized Semantic Web Database engines, which we call LUPOSDATE engines, or LUPOSDATE system. The LUPOSDATE system aims to efficiently process the RDF query language SPARQL [16] [11] over large-scale Semantice Web data. In the meanwhile also the rule language RIF BLD and the ontology languages OWL and RDFS are supported.

In order to present the funcationalities of LUPOSDATE engines, we have developed an online demonstration of our system in form of an applet or as application using Java Web Start. Please see Starting the Demonstration. In this article, we introduce the LUPOSDATE system, its functionalities, usage of the demo, and related knowledge.

To read more about Semantic Web databases and LUPOSDATE, please take a look into our Semantic Web book [9].

LUPOSDATE System

The LUPOSDATE SPARQL system supports various approaches to manage RDF data and process SPARQL queries: Index, RDF3X, Stream, Jena and Sesame. Jena [21] and Sesame [3] refer to third-party SPARQL engines. Index is our in-memory Engine presented in [6]. Stream is our stream-based implementation (see [10]). RDF3X is a re-implementation of [14], but is further enhanced with additional optimization strategies. 

All our approaches and reimplementations except of our Stream engine support full SPARQL 1.0 [16] and SPARQL 1.1 [11] and run the over 200 W3C test cases for SPARQL 1.0 [4] and nearly all of SPARQL 1.1 [15] successfully, which cover a broad range of technical details of the SPARQL language. The stream engine does not support named graphs, and thus it runs the W3C test cases successfully except of those, which require named graphs support.

The Index, RDF3X and Stream engine can be utilized to run also RIF BLD [2] rules. Our engines support the ontology languages OWL [12] and RDFS [13] by expressing their inference in RIF rules [17]  and processing them afterwards. The LUPOSDATE system supports to use general RIF inference rules applied to ontology and instance data, or to generate RIF inference rules based on the concrete ontology data and apply them on the instance data. The latter is supported in two versions: The first one follows the W3C description in [17], the second uses a more optimized variant. Furthermore, for SPARQL queries on inferred data LUPOSDATE have the options to materialize first all inferred triples before evaluating the given SPARQL query, or to integrate the execution plans of the inference rules and the SPARQL query and optimize them together with the goal to infer only the data which is asked for during processing the given SPARQL query. 

Phases of LUPOSDATE implementations

Figure 1: Architecture of LUPOSDATE system

Figure 1 presents the architecture of the LUPOSDATE engines. A given SPARQL query is first transformed into a coreSPARQL query. coreSPARQL is a core fragment of SPARQL, which eliminates redundant language constructs and allows only machine-friendly syntax (see [7]). The coreSPARQL hence simplifies the following processing of SPARQL. An operator graph is generated from the coreSPARQL query in order to further logically and physically optimize queries. Logical optimization aims to reorganize the operator graph into an equivalent operator graph, which generates the same output for any input as the original operator graph, but optimizes the execution time of query evaluation. Physical optimization aims to obtain the operator graph with the best estimated execution times by choosing physical operators for the logical ones. For all implementations except of Stream, the data is indexed. Stream is a stream-based implementation, which indexes only that fragment of data, which is necessary to answer the query during query evaluation. Therefore, Stream does not have an extra indexing phase and the data is directly transmitted to the query evaluation phase.

A similar approach applies to the evaluation of RIF rules, where RIF rules are first parsed into an abstract syntax tree and afterwards transformed into a simplified version of the abstract syntax tree before the evaluation continues with the transformation into an operator graph.

OWL and RDFS ontologies are supported by generating corresponding RIF inference rules, which are afterwards processed as described above. Figure 1 presents the architecture for the approach, where the operator graphs of RIF inference rules and the given SPARQL query are merged and afterwards optimized together to avoid inferring unnecessary triples. The other (not presented) approach would be to first infer all triples by evaluating the RIF inference rules and add all these triples to the index, such that they are considered for all subsequent SPARQL query evaluations.

Note that depending on the individual approach, the final operator graphs consist of different sets of operators. We present an overview of the supported logical and physical operators in the subsection after describing the Demonstration system.

Demonstration

Starting the Demonstration

Please use this link for loading the LUPOSDATE Web Client into your browser.

SPARQL Operators

Our LUPOSDATE SPARQL engnies support various operators, which are able to process full SPARQL [16] [11], and most of which are enumerated in the following table. For more details on these logical and physical operators, please have a look into the literature (about databases), if we do not refer to a special publication there.

Logical Operator

Physical Operator(s)

Approaches

Description

Join

HashJoin

All

HashJoin implements a round based hash join algorithm with a building phase and a probing phase. In the building phase, the smaller input data is divided into several partitions by using hash functions repeatedly until each partition fits into main memory. The larger input data is partitioned in the same way. Afterwards in the probing phase, the corresponding partitions are joined.

HashMapIndexJoin,

DBBPTreeIndexJoin,

HybridIndexJoin

All

These physical operators build an index for the input data of its operands. For each received input data, these join algorithms access the index of the other operand in order to compute the join result. The used indices are hash maps, disk based B+-trees or an index, which uses hash maps for data, which fits into main memory, otherwise disk based B+-trees.

NestedLoopJoin

All

The Nested-Loop-Join uses a nested loop in order to compute the join result.

DBMergeSortedBagMergeJoin, TreeBagMergeJoin

All

These operators first sort their input data and use then the merge join algorithm in order to compute the join result. For sorting their input data, TreeBagMergeJoin uses the TreeBag java class for sorting its input data, DBMergeSortedBagMergeJoin uses a disk-based merge sort for bags algorithm. Note that if all the input data fits into main memory, the latter does not use a hard disk and works fully in main memory.

MergeJoinWithoutSorting

RDF3X (- Presorting), Hexastore (- Presorting)

MergeJoinWithoutSorting expects its input data in the correct sorted way and applies the merge join algorithm directly on its input data.

SIPFilterOperator(Iterator)

SIPFilterOperator(Iterator)

RDF3X (- Presorting), Hexastore (- Presorting)

This operator constructs a bloom filter of its operands data and informs corresponding triple patterns about this bloom filter. The informed triple patterns use the bloom filter to early discard irrelevant intermediate results.

Union

Union

All

Union builds the union of the results of its operands.

MergeUnion

RDF3X (- Presorting), Hexastore (- Presorting)

MergeUnion expects its input data to be sorted and merges the results of its operands, such that the final result is still sorted.

Filter

Filter

All

Filter evaluates filter expressions of SPARQL.

Projection

Projection

All

Projection projects the result to a given list of variables.

Result

Result

All

This operator is the final operator and provides access to the final result of the query.

Construct

Construct

All

This operator transforms a bag of environments consisting of variable bindings into a set of triples based on templates of a given Construct clause in the query.

MakeBooleanResult

MakeBooleanResult

All

This operator transforms a bag of environments consisting of variable bindings into a Boolean value for an Ask-query.

(Fast)Sort

DBMergeSortedBagSort, HybridSortedBagSort, InsertionSort, QuickSort, TreeMapSort, FastSortLazyLiteral, FastSortPresortingNumbers, SortLimit

All

These operators sort their input data using a disk based merge sort algorithms for bags, the insertion sort algorithm, the quicksort algorithm or the TreeMap java class. The FastSort operators use literal ids or attached presorting numbers for fast sorting. SortLimit optimizes a sort operator with a succeeding Limit operator by early discarding irrelevant (larger) data.

Distinct

DBSetBlockingDistinct

All

DBSetBlockingDistinct uses the disk based merge sort algorithm for sets for duplicate elimination.

HashBlockingDistinct

All

HashBlockingDistinct uses a hash set for duplicate elimination.

HybridBlockingDistinct

All

HybridBlockingDistinct uses hash sets for input data, which fits into main memory, and otherwise a disk based merge sort algorithm for sets for duplicate elimination.

InMemoryDistinct

All

InMemoryDistinct uses a hash set for duplicate elimination, but is not blocking in comparison to HashBlockingDistinct, i.e. returns an item as early as possible if it is different to the previous items.

SortedDataDistinct

RDF3X ( - Presorting), Hexastore ( - Presorting)

SortedDataDistinct expects its input data to be sorted, such that the duplicate elimination is efficient by just comparing the current item with its previous item.

Limit

Limit

All

Limit implements the support for the Limit-clause.

Offset

Offset

All

Offset implements the support for the Offset-clause.

AddBinding

AddBinding

All

This operator might occur because of a logical optimization rule for constant propagation.

AddBindingFromOtherVar

AddBindingFromOtherVar

All

This operator might occur because of a logical optimization rule for variable propagation.

IndexCollection

IndexCollection

Index, RDF3X ( - Presorting), Hexastore ( - Presorting)

This is the root node for the indexing approaches and start point for their evaluations.

Index, RDF3XIndex, HexastoreIndex

Index,

RDF3XIndex,

HexastoreIndex

Index, RDF3X ( - Presorting), Hexastore ( - Presorting)

These operators implement the index access based on one (or more) triple patterns for the Index, RDF3X (-Presorting) and Hexastore (- Presorting) approaches.

PatternMatcher

SimplePatternMatcher

Stream

For each input RDF triple, the SimplePatternMatcher operator tests each triple pattern, if it matches the triple and transmits the triple to this triple pattern (see [10]).

HashPatternMatcher

Stream

The HashPatternMatcher uses a hash function in order to directly determine those triple patterns, which match an input triple (see [10]).

TriplePattern

TriplePattern

Stream

The Triple Pattern operator tests, if it matches an input RDF triple and compute the resultant variable bindings in this case (see [10]).

The Optional operator, which is not contained in this table, corresponds to a left-outer-semi-join. We support similar algorithms for the Optional operator as for the Join operator. The names of the Optional operators contain “Optional” instead of “Join” for the different implementations, i.e. DBBPTreeIndexOptional, DBMergeSortedBagOptional, HashMapIndexOptional, HybridIndexOptional, MergeOptionalSort, MergeWithoutSortingOptional, NaiveOptional (similar to NestedLoopJoin) and TreeBagOptional.

There are more operators for the SPARUL support, which we do not enumerate here.

Logical Optimization Rules

All the implemented logical optimization rules are presented on this webpage.

Demonstration Examples

We present some queries and RDF data used in the demo to demonstrate, e.g. which example queries can be optimized with which logical optimization rules and where the different engines are especially efficient. We present only queries from the SP2B benchmark here. Nevertherless, an analogous table could be created for the LUBM queries [5], which are also used in our demo.

Query (Queries)

Data

Engine(s)

Description

sp2b_q1.sparql

sp2b_demo.n3

Index ↔ RDF3X (- Presorting), Hexastore (- Presorting) ↔ Stream

All engines first transform the query into a CoreSPARQL query, which e.g. eliminates prefixes (see [7]). The operator graphs of this simple query show the different evaluation strategies of the different evaluators. For the Index engine, all three triple patterns are in an Index operator, which correspond to an evaluation of a right-deep join tree. For RDF3X (- Presorting) and Hexastore (- Presorting), collation orders of indices are used to apply merge joins in an optimized join order. The Stream engine applies the operators on RDF triples just when they are read from the input, such that they have to pass a pattern matcher, which distribute them to the specific triple patterns.

sp2b_q2.sparql

sp2b_demo.n3

RDF3X/Hexastore ↔ RDF3X/Hexastore - Presorting

As special optimization capability of RDF3X/Hexastore - Presorting, in the physical optimization, a Sort operator is replaced by using a MergeOptionalSort operator instead of a MergeWithoutSortingOptional operator.

sp2b_q3a.sparql,
sp2b_q3b.sparql,
sp2b_q3c.sparql

sp2b_demo.n3

Index, RDF3X (- Presorting), Hexastore (- Presorting)

These queries are examples, where the rule Constant Propagation can be used.

sp2b_q4.sparql,
sp2b_q8.sparql,
sp2b_q12a.sparql,
sp2b_q12b.sparql

sp2b_demo.n3

RDF3X (- Presorting), Hexastore (- Presorting)

These queries are examples, where the rule Push Filter can be used.

sp2b_q4.sparql,
sp2b_q5a.sparql,
sp2b_q5b.sparql,

sp2b_demo.n3

RDF3X/Hexastore ↔ RDF3X/Hexastore - Presorting

Here, as special optimization capability of RDF3X/Hexastore - Presorting, in the physical optimization, MergeJoinSort operators are used, such that Merge Join operators can be applied instead of HashMapIndexJoin.

sp2b_q4.sparql,
sp2b_q5a.sparql,
sp2b_q5b.sparql,

sp2b_demo.n3

RDF3X/Hexastore ↔ RDF3X/Hexastore - Presorting

Here, as special optimization capability of RDF3X/Hexastore - Presorting, in the physical optimization, SortedDataDistinct can be used instead of DBSetBlockingDistinct by using a previous MergeJoinSort or MergeOptionalSort.

sp2b_q5a.sparql

sp2b_demo.n3

Index, RDF3X (- Presorting), Hexastore (- Presorting)

The rule Variable Propagation cannot be applied, as ?name1 and ?name2 only appear as objects.

sp2b_q6.sparql

sp2b_demo.n3

Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream

The rule Bound Variable in Optional is applied for this query.

sp2b_q1.sparql,
sp2b_q2.sparql,
sp2b_q4.sparql,
sp2b_q5a.sparql,
sp2b_q5b.sparql,
sp2b_q6.sparql,
sp2b_q7.sparql,
sp2b_q8.sparql,
sp2b_q12a.sparql,
sp2b_q12b.sparql

sp2b_demo.n3

Stream

For the Stream engine, the rule Binary Join replaces Join operators with more than two operands with binary Join operators.

sp2b_E1.sparql

sp2b_demo.n3

Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream

The operator graph of this query demonstrates the application of the rule Bound Variable In Union.

sp2b_E2.sparql

sp2b_demo.n3

Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream

The rule Constant Propagation of Filter with ORs is applied during optimizing this query.

sp2b_E3.sparql

sp2b_demo.n3

Index, RDF3X (- Presorting), Hexastore (- Presorting), Stream

The application of the rule Variable Propagation is demonstrated when optimizing this query.

sp2b_E4.sparql

sp2b_demo.n3

RDF3X (- Presorting), Hexastore (- Presorting)

A Sort operator can be eliminated using an already existing order in the intermediate results during the physical optimization.

sp2b_E4.sparql

sp2b_demo.n3

RDF3X (- Presorting), Hexastore (- Presorting)

A SortedDataDistinct operator can be applied using an already existing order in the intermediate results during the physical optimization.

Literature

  1. T. Berners-Lee, Notation 3 - An RDF language for the Semantic Web. W3C, 1998. Available at: http://www.w3.org/DesignIssues/Notation3.html
  2. Boley, H., Kifer, M., RIF Basic Logic Dialect, W3C Recommendation, 2010.
  3. Broekstra, J., Kampman, A., and van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. ISWC, Springer, Sardinia, 2002.
  4. Lee Feigenbaum (editor), DAWG Testcases, http://www.w3.org/2001/sw/DataAccess/tests/r2, 2008.
  5. Guo, Y., Pan, Z., and Heflin, J. LUBM: A Benchmark for OWL Knowledge Base Systems. Web Semantics 3(2), 2005. 
  6. Jinghua Groppe, Sven Groppe, Sebastian Ebers, Volker Linnemann, Efficient Processing of SPARQL Joins in Memory by Dynamically Restricting Triple Patterns, 24th ACM Symposium on Applied Computing (ACM SAC 2009), Waikiki Beach, Honolulu, Hawaii, USA, 2009.
  7. Groppe, J., Groppe, S., and Kolbaum, J., Optimization of SPARQL by using coreSPARQL, 11th International Conference on Enterprise Information Systems, Milan, Italy, 2009.
  8. Jinghua Groppe, Sven Groppe, Andreas Schleifer, Volker Linnemann, LuposDate: A Semantic Web Database System, 18th ACM Conference on Information and Knowledge Management (ACM CIKM 2009), Hong Kong, China, 2009.
  9. Sven Groppe, Data Management and Query Processing in Semantic Web Databases, Springer, 2011. ISBN 978-3-642-19356-9 Book Webpage
  10. Sven Groppe, Jinghua Groppe, Dirk Kukulenz, Volker Linnemann, A SPARQL Engine for Streaming RDF Data, 3rd International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2007), Shanghai, China, 2007. This paper received an honorable mention at the SITIS'07 Conference.
  11. Harris, S., Seaborne, A., SPARQL 1.1 Query Language, W3C Working Draft, 2012.
  12. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P. F., Rudolph, S., OWL 2 Web Ontology Language Primer, W3C Recommendation, 2009.
  13. Manola, F., Miller, E., RDF Primer, W3C Recommendation, 2004.
  14. Thomas Neumann, Gerhard Weikum, RDF3X: a RISCstyle Engine for RDF, Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008.
  15. Polleres, A., SPARQL1.1: Test case structure, Working Document, 2012.
  16. Prud’hommeaux E., Seaborne A. SPARQL Query Language for RDF, W3C Recommendation, 2008.
  17. Reynolds, D., OWL 2 RL in RIF, W3C Working Group Note, 2010.
  18. Schmidt, M., Hornung, T., Lausen, G., and Pinkel, C. SP2Bench: A SPARQL Performance Benchmark, 25th International Conference on Data Engineering (ICDE), Shanghai, China, 2009.
  19. A. Seaborne and G. Manjunath. SPARQL/Update, A language for updating RDF graphs. http://jena.hpl.hp.com/~afs/SPARQL-Update.html, 2008.
  20. Cathrin Weiss, Panagiotis Karras, Abraham Bernstein, Hexastore: Sextuple Indexing for Semantic Web Data Management, Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008.
  21. Wilkinson, K., Sayers, C., Kuno, H. A., and Reynolds, D. Efficient RDF Storage and Retrieval in Jena2. In SWDB’03 co-located with VLDB 2003, Berlin, Germany, 2003.