ESWC 2007 Tutorial: SPARQL - Where are we?
Current state, theory and practice

Marcelo Arenas

Claudio Gutierrez

Bijan Parsia

Jorge Pérez

Axel Polleres

Andy Seaborne

Motivation and Objectives

After the data and ontology layers of the Semantic Web stack have achieved considerable stability through standard recommendations such as RDF and OWL, the query layer is the next item to be completed on W3C's agenda. This layer is realized by the SPARQL Protocol and RDF Query Language (SPARQL) currently under development by W3C's Data Access working group (DAWG). Although the SPARQL specification is not yet fully stable (it recently dropped a step back from candidate recommendation to working draft), people are taking up this specification at a tremendous pace, driven by the strong need for a long-awaited standard for querying the Semantic Web and for making use of the advantages of RDF together with common metadata vocabularies at large scale.

This is just the right moment to reflect on the current state of the language and its applications, which we aim to provide in the proposed tutorial. The contributions of this tutorial will be along two complementary streams. On the one hand, we will provide a practical introduction to SPARQL for newcomers, giving examples from various application domains, providing formal underpinnings and guiding attendees through the jungle of existing implementations, including those which reach beyond the current specification to query more expressive Semantic Web languages than RDF alone. Thus, participants will get a clear sense of the language as it is specified and as it exists in implementations. On the other hand, we will go deeper into the theoretical foundations of SPARQL, presenting recent results on SPARQL's complexity, formal foundations in terms of database theory, as well as its exact semantic relation to the other building blocks in the SW stack, namely RDF Schema, OWL and the upcoming rules layer. Finally, we aim to bring these two streams together, and will identify the current limitations and challenges around SPARQL, pointing to possible extensions and emerging application fields.

The presenters of this tutorial tackle the topic of Semantic Web querying and SPARQL from various, complementary viewpoints. Andy Seaborne co-chairs the Data Access working group (DAWG) which is responsible for the development of SPARQL. The group from the Center for Web Research - Chile¹, Marcelo Arenas, Claudio Gutierrez, and Jorge Pérez, with their long-term experience and excellent record in database technologies for the Web and Semantic Web foundations, is chiefly responsible for recent progress towards a more formal foundation of SPARQL, and has provided best-paper-winning results on the semantics and complexity of the language. Bijan Parsia was a long-term member of the DAWG and was also involved in the development of OWL. Axel Polleres has a strong background in deductive databases and rules and is a member of the Rule Interchange Format (RIF) working group which currently works on the Semantic Web rules layer, the subsequent building block of the SW stack.

Goals

After the tutorial, attendees new to SPARQL should be able to formulate queries, understand the differences and overlaps of SPARQL with traditional Database query languages like SQL and have sufficient insight to understand issues in existing SPARQL engines that might affect their applications. The theoretical background given in the second half of the tutorial will provide deeper understanding of SPARQL's underlying semantics and complexity. We will see how corner cases are handled and understand limitations of the current language. Moreover, we will provide a detailed picture of SPARQL's position in the space of related Semantic Web standards. Finally we will give an outlook to emerging research challenges and possible future directions. Compared with previous, related tutorials at other conferences such as at WWW2005, we can offer a much more comprehensive view reporting significant progress on the language specification especially with respect to theoretical foundations of SPARQL, but also with respect to implementation experience, all of which influenced the selection of covered topics.

Outline and Intended Audience

The tutorial will be divided in two main parts: The morning part covering primarily the practical side of SPARQL, and an afternoon part going more into depth towards the foundational aspects of SPARQL and discussing Semantic Web data access in the bigger context of related standards.

The first part of the tutorial will emphasize providing a solid basis in the semantics and use of the language. This part will be of special interest for a wide range of attendees, from new users wanting to take their first steps in SPARQL, to advanced developers willing to use the full potential of the SPARQL language in Semantic Web applications.

The second part will be centered around the formalization of the language, discussing corner cases and interesting open problems raised by SPARQL and RDF data access, and relating this theory to well-known topics from database theory. Moreover, the relation to OWL and RDFS as well as aspects of practically evaluating SPARQL over ontologies will be covered in detail. Finally, we will discuss novel aspects such as the emerging Semantic Web rules layer, including the use of SPARQL itself as a powerful rules language as well as its relation to other deductive rules languages. Practically useful extensions to the normative specification of the language will also be discussed, with emphasis on the theoretical implications and problems that they raise. The goal of this second part of the tutorial is to give people a theoretical framework for further research in, around and beyond SPARQL and RDF+OWL data access. The intended audience of this part is primarily researchers with some background in RDF, query languages, and complexity theory, although we will provide sufficient detail to convey the most important take-home messages for any attendee with a solid computer science background.

The whole tutorial is conceived as a full-day event which will take beginners from simple examples all the way through recent research and standardization challenges. Attendees with a mainly practical emphasis will receive a complete introduction with useful hints to start using and implementing SPARQL and SPARQL-based applications in the morning session. People with a more theoretical interest in the language who are already familiar with the language basics might decide to attend only the afternoon session, which complements and deepens the practical part from the morning session in several ways. Without claiming that these two parts should be viewed as completely self-contained, we aim to allow for a logical entry/exit point also for people interested in parallel tutorials, thus also making the single parts "consumable" independently with the suggested structure.

More specifically, the tutorial will be organized in six units, where Units 1-3 mark the morning part and Units 4-6 the afternoon part, as follows.

Unit 1 - SPARQL Basics (90min): The first Unit is tailored to give a gentle introduction to all the features of SPARQL, starting from simple queries and moving towards more complex, less known, but interesting features of the language. In this session we aim to guide new users, but also users already roughly familiar with SPARQL, through all major features of the language. Moreover, we will provide interesting insights into the design rationale and requirements which guided the inclusion of these features in the process of the working group. This unit will be presented by Andy Seaborne, a core member of the W3C Data Access working group and editor of the current SPARQL query language specification.
Unit 2 - SPARQL Semantics (45min): In this Unit we will present the algebra underlying and defining a formal semantics for SPARQL ². The formal semantics presented here will be exemplified by several queries from Unit 1 and practitioners will get sufficient insight to understand the formal underpinnings of SPARQL. Here, we will restrict ourselves to only the level of detail necessary to understand implementation aspects covered in Unit 3. More in-depth considerations on theoretical foundations will be covered in Unit 4. This session will be presented by Andy Seaborne and Jorge Pérez, who received the best paper award at ISWC 2006 for his foundational work underlying SPARQL's formal semantics in its current state.
Unit 3 - SPARQL Implementations and Applications (45min): Following some basic implementation strategies for SPARQL, we will present several engines and their deployment in actual use cases. The focus in this Unit is to give practitioners hints on tools, available implementations, and APIs which they can use off-the-shelf or optimize in order to develop Semantic Web applications on top of SPARQL. This will include examining implementations that go beyond the current specification to evaluate SPARQL queries against RDFS and OWL datasets. This session will be presented by Andy Seaborne and Bijan Parsia.
Unit 4 - SPARQL Foundations (90min): Although taken one by one the features of SPARQL are simple to describe and understand, it turns out that the combination of them makes SPARQL into a complex language even when querying ground RDF data with no special vocabulary. Thus, a clear and formal semantics is of paramount importance to developers and researchers. In this session of the tutorial we address this topic covering: I) Formal Semantics of SPARQL: algebraic syntax; compositional semantics for a core fragment; extension to the whole language; and comparison with the normative full semantics (continuation of Unit 2). II) Complexity of SPARQL: the computational complexity of evaluating a query for several fragments of the language; identifying the main sources of complexity. III) Ad-hoc optimization procedures: well-designed queries; operational (greedy) semantics; reordering and normal forms; optimization based on these normal forms. IV) Related work in the Databases community: Databases with incomplete information; classical optimization procedures, etc. This unit will be presented by Marcelo Arenas and Claudio Gutierrez, who have a long-term excellent research record in database as well as Semantic Web foundations.
Unit 5 - SPARQL and its neighbour components in the Semantic Web stack (90min): The definitions in the current specification of SPARQL focus mainly on RDF simple entailment. In this unit we show how they can be extended towards coverage of the ontology layer in terms of RDFS and OWL entailment. As it turns out, this extension is not straightforward and a complete coverage of SPARQL imposes new challenges on OWL reasoners. Next, we will treat the emerging Semantic Web rules layer and its relation to SPARQL. On the one hand, we will see that a large part of SPARQL can be mapped to extended Datalog, a deductive rule-based query language. On the other hand, we will discuss the use of SPARQL itself as a declarative rules language on top of RDF and OWL. As it turns out, the challenges arising in such a combination are closely related to those of combining deductive non-monotonic rules languages with ontologies. The first part of this unit will be presented by Bijan Parsia, an active member of the OWL community who is currently driving the extension of OWL towards its next version, OWL 1.1. The second part will be presented by Axel Polleres, an active member of W3C's Rule Interchange Format working group.
Unit 6 - SPARQL Extensions and Outlook (45min): In the last unit of this tutorial we will discuss further practical extensions of the current standard, from simple extensions which did not yet find their way into the first version of the specification, to other extensions which, although seemingly easy, will require significantly more investigation and raise new research problems. This unit aims to spark further ideas on solving open issues by providing a down-to-earth analysis of current limitations of the language and giving an outlook on its future.
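
To give a flavour of the compositional semantics mentioned under Unit 4, the following is a minimal sketch in the set-of-mappings style of the Pérez, Arenas and Gutierrez paper listed in the presenters' publications. It is an illustration only, not the normative SPARQL algebra: RDF terms are modelled as plain strings, variables as strings starting with "?", and the toy graph and all function names are assumptions made for this example.

```python
# Illustrative sketch only: a tiny evaluator for triple patterns combined
# with AND (join), UNION and OPT (left outer join) over sets of mappings.
# A mapping is a dict from variable names ("?x") to RDF terms (strings).

def is_var(term):
    return term.startswith("?")

def dedupe(mappings):
    """Keep the first occurrence of each mapping (set semantics)."""
    unique = []
    for mu in mappings:
        if mu not in unique:
            unique.append(mu)
    return unique

def eval_triple(pattern, graph):
    """All mappings mu with dom(mu) = var(pattern) and mu(pattern) in graph."""
    results = []
    for triple in graph:
        mu = {}
        matches = True
        for p, value in zip(pattern, triple):
            if is_var(p):
                # A repeated variable must bind to the same term each time.
                if mu.setdefault(p, value) != value:
                    matches = False
                    break
            elif p != value:
                matches = False
                break
        if matches:
            results.append(mu)
    return dedupe(results)

def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(o1, o2):
    return dedupe([{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)])

def union(o1, o2):
    return dedupe(o1 + o2)

def diff(o1, o2):
    """Mappings of o1 that are compatible with no mapping of o2."""
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]

def left_join(o1, o2):
    """OPT: extend mappings where possible, keep the rest unextended."""
    return join(o1, o2) + diff(o1, o2)

# Hypothetical toy graph: Alice has a name and an email, Bob only a name.
graph = [
    ("alice", "name", "Alice"),
    ("alice", "email", "alice@example.org"),
    ("bob", "name", "Bob"),
]

names = eval_triple(("?p", "name", "?n"), graph)
emails = eval_triple(("?p", "email", "?e"), graph)

# (?p name ?n) OPT (?p email ?e)
result = left_join(names, emails)
for mu in result:
    print(mu)
```

In this sketch the optional pattern leaves the mapping for Bob without a binding for ?e instead of dropping it, which is the kind of partial answer that makes the combination of operators more subtle than each feature taken alone.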

Hands-On Experience: The sessions focusing on practical aspects (especially Units 1, 3, but also 5) will have a hands-on part at the end of the respective unit where participants will solve simple practical examples under the guidance of the presenters. Engines and tools needed will be made available via a common Web interface on a dedicated tutorial webpage (see also Section 3.1).

Breaks: Coffee Breaks are foreseen after Unit 1 and Unit 4, but lengths of each unit may be adapted according to the fixed schedule prescribed by the ESWC organizers, if necessary.

Presentation Style and Technical Requirements

The attendee will be able to access a SPARQL engine via an online interface on the tutorial homepage to interactively follow presented examples. Especially in the first part we will make use of this interface in order to directly practice the learned features. On the tutorial homepage, we also plan to make accessible alternative engines and tools presented in Unit 3, as far as possible. Further, the complete slide sets shall be available in PDF format online on this page for attendees to conveniently follow the presentations on their laptops which has didactic advantages compared to hard copies only (e.g. color, overlays).

Technical requirements. We need a room with a beamer and a wireless internet connection. We expect participants to practice using their own laptops. We already had positive experiences in a similar setting for a tutorial on "Answer Set Programming for the Semantic Web" at last year's ESWC³. There, we provided a backup WiFi access point (which we can provide again) to a dedicated, locally accessible server only for tutorial attendees, in order to guarantee independence from network availability.

Intended Audience and Prerequisites

The tutorial is mainly directed to two categories of attendees:

-Beginners. Attendees with minimal knowledge about Semantic Web data access and querying will especially take advantage of Units 1, 2 and 3 and deepen their understanding in the remaining sessions.

-Expert and intermediate. The researcher with good general background in Database theory, query languages, and complexity theory, possibly seeking new and open research challenges in the context of SPARQL will benefit most from Units 4, 5, and 6.

Prerequisites. Although no specific knowledge beyond basic RDF is needed as a prerequisite, a certain background in computer science and database theory will allow attendees to better understand and follow the tutorial.

Relevance of the Tutorial to ESWC 2007

Semantic Web data access and querying are the key enablers for making Tim Berners-Lee's often-cited vision of making "all the data in the world look like one huge database" come true. In this sense, the query layer, which is now close to completion, will play a central part in the Semantic Web's further development and take-up. The large number of paper submissions to ESWC's research track concerned with issues around SPARQL, despite the standard's pre-final state, underlines the importance of this issue for the whole community. As mentioned before, we think this is an ideal moment to reflect on the current state of the specification and its applications, which we aim to provide in the proposed tutorial. We are also convinced that the topic will attract both practitioners from industry and scientists, and we carefully chose the topics presented to serve both these audiences. As it is one of the main objectives of the conference to cater for both these groups of possible attendees and bring them together in a common event, we hope to have presented an attractive proposal for a tutorial meeting precisely this objective.

Information on Presenters

Marcelo Arenas, Pontificia Universidad Católica de Chile.

Home Page: http://www.ing.puc.cl/~marenas

Short Bio: Prof. Marcelo Arenas received B.Sc. degrees in Mathematics (1997) and Computer Engineering (1998) and a M.Sc. degree in Computer Science (1998) from the Pontificia Universidad Católica de Chile, and a Ph.D. degree in Computer Science (2005) from the University of Toronto, Canada. In 2005, he joined the Computer Science Department at the Pontificia Universidad Católica de Chile as an Assistant Professor. His research interests are in different aspects of database theory, such as expressive power of query languages, database semantics, integrity constraints, inconsistency handling, database design, XML databases, data exchange and database aspects of the semantic web. Marcelo has received an IBM Ph.D. Fellowship (2004), three best paper awards (PODS 2003 in San Diego, California, PODS 2005 in Baltimore, Maryland and ISWC 2006 in Athens, Georgia) and an Honorable Mention Award in 2006 from the ACM Special Interest Group on Management of Data (SIGMOD) for his Ph.D. dissertation, "Design Principles for XML Data."

Teaching experience: Dr. Arenas has experience in teaching several university lectures in the topics of Databases, XML and Logic for Computer Science. Moreover, in January of 2007 he will be giving a tutorial on foundations of RDF and SPARQL at the University of Edinburgh.

Selected Related Publications:

J. Pérez, M. Arenas and C. Gutierrez. Semantics and Complexity of SPARQL. In Proceedings of the 5th International Semantic Web Conference (ISWC'06), Athens, GA, USA, volume 4273 of LNCS, pages 30-43, Springer, 2006.
M. Arenas, P. Barcelo and L. Libkin. Combining Temporal Logics for Querying XML Documents. In Proceedings of the 11th International Conference on Database Theory (ICDT'07), Barcelona, Spain, volume 4353 of LNCS, pages 359-373, 2007.

Claudio Gutierrez, Department of Computer Science, Universidad de Chile.

Home page: http://www.dcc.uchile.cl/cgutierr/.

Short Bio: Claudio Gutierrez received degrees in mathematics and mathematical logic from Universidad de Chile and Pontificia Universidad Católica de Chile, and a Ph.D. degree in computer science from Wesleyan University, U.S.A. Currently, he is an associate professor in the Computer Science Department at the Universidad de Chile, and an associate researcher at the Center for Web Research. His research interest lies in the intersection of databases and the Semantic Web. He has received best research paper awards at the European Semantic Web Conference in 2005, and at the International Semantic Web Conference in 2006.

Teaching experience: C. Gutierrez has taught in several universities at undergraduate and graduate level, particularly on databases and Semantic Web.

Selected Related Publications:

C. Gutierrez, C. Hurtado, A. Vaisman. Introducing Time into RDF. IEEE Trans. Knowl. Data Eng. 19(2): 207-218 (2007)
J. Pérez, M. Arenas and C. Gutierrez. Semantics and Complexity of SPARQL. In Proceedings of the 5th International Semantic Web Conference (ISWC'06), Athens, GA, USA, volume 4273 of LNCS, pages 30-43, Springer, 2006.
C. Gutierrez, C. Hurtado, A. Vaisman. The Meaning of Erasing in RDF under the Katsuno-Mendelzon Approach. 9th. International Workshop on the Web and Databases 2006 (Co-located with SIGMOD), June 30, 2006.
C. Gutierrez, C. Hurtado, A. O. Mendelzon. Foundations of Semantic Web Databases. ACM Symposium on Principles of Database Systems (PODS), June 2004, pp. 95-106

Bijan Parsia, Information Management Group, School of Computer Science - University of Manchester, UK

Home page: http://homepages.manchester.ac.uk/~bparsia/.

Short Bio: Bijan Parsia is a lecturer (since 2006) in the School of Computer Science at the University of Manchester, UK. He has published over 50 papers in such areas as description logic reasoning, explanation, trust, ontology editing, planning, web service composition, ontology partitioning, and ontology visualization. He has been a member of the WSDL, WS-Architecture, Data Access, and WS-Policy working groups.

Teaching experience: He has experience in teaching several university lectures in Knowledge Representation and the Semantic Web. He co-organized a tutorial entitled "Learning from the Masters: Understanding Ontologies on the Web" at ISWC 2007 and lectured on SPARQL at the 2006 Reasoning Web Summer School.

Selected Related Publications:

E. Sirin, B. Parsia. Optimizations for Answering Conjunctive ABox Queries. International Workshop on Description Logics, 2006.
B. Parsia. Querying the Web with SPARQL. Reasoning Web 2006, vol. 4126 of LNCS. Springer, 2006.
C. Halaschek-Wiener, B. Parsia, E. Sirin. Description Logic Reasoning with Syntactic Updates. OTM Conferences (1), 2006.

Jorge Pérez, Universidad de Talca - Chile.

Home Page: http://ing.utalca.cl/~jperez

Short Bio: Jorge Pérez received a B.Sc. degree in Computer Engineering and a M.Sc. degree in Computer Science from the Pontificia Universidad Católica de Chile. He is currently an Instructor Professor of the Computer Science Department at Universidad de Talca, and a Ph.D. student under the supervision of Prof. Marcelo Arenas. His research interests are primarily in database theory and the application of database technologies to the Web. Jorge has received the best research paper award at the 5th International Semantic Web Conference for work on SPARQL formalization from a database perspective.

Teaching experience: Jorge Pérez has experience in teaching several undergraduate courses lying in the core part of Computer Science curricula like Discrete Mathematics, Automata Theory, Algorithms and Datastructures, and Databases.

Selected Related Publications:

J. Pérez, M. Arenas and C. Gutierrez. Semantics and Complexity of SPARQL. In Proceedings of the 5th International Semantic Web Conference (ISWC'06), Athens, GA, USA, volume 4273 of LNCS, pages 30-43, Springer, 2006.

Axel Polleres, Universidad Rey Juan Carlos, Madrid.

Home page: http://www.polleres.net.

Short Bio: Dr. Axel Polleres obtained his PhD in Computer Science at the Vienna University of Technology in 2003. He worked at DERI at the Leopold-Franzens Universitaet Innsbruck in the areas of Semantic Web Services, Ontologies, Rules Languages and Logic Programming from 2003 to early 2006. Continuing this research he currently works at Universidad Rey Juan Carlos, Madrid, under a "Juan de la Cierva" research fellowship. Dr. Polleres has published more than 30 articles in journals, books and as refereed conference and workshop contributions. Ongoing research projects and working groups he is participating in include WSMO, WSML, and the W3C Rule Interchange Format (RIF) WG.

Teaching experience: Dr. Polleres has experience in teaching several university lectures and training courses in the topics of Logic Programming, Artificial Intelligence, Semantic Web and Web Services. Moreover, he co-organized a full-day tutorial on the topic of "Answer Set Programming for the Semantic Web" at last year's ESWC and will be a presenter at this year's Reasoning Web summer school.

Selected Related Publications:

Web Rule Language (WRL), September 2005. W3C member submission.
A. Polleres and R. Schindlauer. SPARQL: From SPARQL to rules. In International Semantic Web Conference (ISWC2006 - Posters Track), Athens, GA, USA, 2006.
T. Eiter, G. Ianni, A. Polleres, R. Schindlauer, and H. Tompits. Reasoning with rules and ontologies. Reasoning Web 2006, vol. 4126 of LNCS, pages 93-127. Springer, 2006.
J. de Bruijn, T. Eiter, A. Polleres, and H. Tompits. On representational issues about combinations of classical theories with nonmonotonic rules. 1st Int'l Conf. on Knowledge Science, Engineering and Management (KSEM'06), vol. 4092 of LNCS, pages 1-22, 2006. Springer. Invited paper.

Andy Seaborne, Hewlett-Packard Laboratories.

Home page: http://www.hpl.hp.com/people/afs/.

Short Bio: Dr. Andy Seaborne is a member of the Semantic Web Research Group in Hewlett-Packard Laboratories and he is based in Bristol, UK. He has been involved in RDF query languages since 2001, firstly with the development of RDQL for the Jena framework and latterly with the development of SPARQL. He is co-editor of the SPARQL query language specification. In addition, he has built two implementations of SPARQL: one is a reference implementation of SPARQL and the other is a query engine that is based on SQL.

Teaching experience: Dr. Seaborne gave, among others, tutorials on Jena at ISWC 2002, and on SPARQL at WWW2005 and at the 2006 Jena User Conference.

Selected Related Publications:

E. Prud'hommeaux, A. Seaborne (eds.). SPARQL Query Language for RDF. RDF Data Access Working Group.
J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: Implementing the Semantic Web Recommendations. WWW2004.
L. Miller, A. Seaborne, A. Reggiori. Three Implementations of SquishQL, a Simple RDF Query Language. International Semantic Web Conference 2002.
W3C Submissions: RDQL - A Query Language for RDF.

Footnotes

¹ http://www.cwr.cl
² http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html
³ http://www.eswc2006.org/tutorials.html#tutorial1

Axel Polleres 2007-02-01
