How do MemGraph and RedisGraph

Graph Query Language

GQL (Graph Query Language) is a proposed standard graph query language. In September 2019 a proposal for a project to create a new standard graph query language (ISO/IEC 39075 Information Technology - Database Languages - GQL) was approved by a vote of the national standards bodies that are members of ISO/IEC Joint Technical Committee 1 (ISO/IEC JTC 1). JTC 1 is responsible for international information technology standards. GQL is intended to be a declarative database query language, like SQL.

Project for a new International Standard Graph Query Language

The GQL project proposal states:

"Using graph as a fundamental representation for data modeling is an emerging approach in data management. In this approach, the data set is modeled as a graph, representing each data entity as a vertex (also called a node) of the graph and each relationship between two entities as an edge between corresponding nodes. The graph data model has been drawing attention for its unique advantages. Firstly, the graph model can be a natural fit for data sets that have hierarchical, complex, or even arbitrary structures. Such structures can be easily encoded into the graph model as edges. This can be more convenient than the relational model, which requires the normalization of the data set into a set of tables with fixed row types. Secondly, the graph model enables efficient execution of expensive queries or data analytic functions that need to observe multi-hop relationships among data entities, such as reachability queries, shortest or cheapest path queries, or centrality analysis. There are two graph models in current use: the Resource Description Framework (RDF) model and the Property Graph model. The RDF model has been standardized by W3C in a number of specifications. The Property Graph model, on the other hand, has a multitude of implementations in graph databases, graph algorithms, and graph processing facilities. However, a common, standardized query language for property graphs (like SQL for relational database systems) is missing. GQL is proposed to fill this void."

The GQL project is the culmination of converging initiatives dating from 2016, notably a private proposal from Neo4j to other database vendors in July 2016 and a proposal from Oracle technical staff within the ISO/IEC JTC 1 standards process later that year.

The GQL project is led by Stefan Plantikow (who was the first lead engineer on Neo4j's Cypher for Apache Spark project) and Stephen Cannan (Technical Corrigenda Editor of SQL). They are also the editors of the first early working drafts of the GQL specification.

As originally motivated, the GQL project aims to complement the work of creating an implementable normative natural-language specification with supportive community efforts that enable contributions from those who are unable to take part, or not interested in taking part, in the formal process of defining a JTC 1 International Standard. In July 2019 the Linked Data Benchmark Council (LDBC) agreed to become the umbrella organization for the efforts of the community technical working groups. The Existing Languages and Property Graph Schema working groups were formed in late 2018 and early 2019, respectively. A working group to define formal denotational semantics for GQL was proposed at the third GQL Community Update in October 2019.

The data model of the GQL property graph

GQL is a query language specifically for property graphs. A property graph closely resembles a conceptual data model, as expressed in an entity-relationship model or in a UML class diagram (although it does not include n-ary relationships linking more than two entities). Entities or concepts are modeled as nodes, and relationships as edges, in a graph. Property graphs are multigraphs: there can be many edges between the same pair of nodes. GQL graphs can be mixed: they can contain directed edges, where one of the endpoint nodes of an edge is the tail (or source) and the other node is the head (or target or destination), but they can also contain undirected (bidirectional or reflexive) edges.

Nodes and edges, collectively known as elements, have attributes. Those attributes may be data values or labels. Values of properties cannot be elements of graphs, nor can they be whole graphs: these restrictions intentionally enforce a clean separation between the topology of a graph and the attributes that carry data values in the context of a graph topology. The property graph data model therefore deliberately prevents the nesting of graphs, or treating nodes in one graph as edges in another. Each property graph may have a set of labels and a set of properties that are associated with the graph as a whole.
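The data model described above can be sketched as a small in-memory structure. This is a minimal illustration only, not part of any GQL draft: the class names `Node`, `Edge` and `PropertyGraph`, the `directed` flag and the helper `edges_between` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union

# Property values are plain data only: never a node, an edge or a graph.
Value = Union[str, int, float, bool]


@dataclass
class Node:
    id: str
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Value] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    source: str             # tail node id
    target: str             # head node id
    directed: bool = True   # False models an undirected edge
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Value] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    labels: Set[str] = field(default_factory=set)                # labels of the whole graph
    properties: Dict[str, Value] = field(default_factory=dict)   # properties of the whole graph
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, Edge] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Edges are keyed by their own id, so several edges may connect
        # the same pair of nodes (multigraph).
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must already be in the graph")
        self.edges[edge.id] = edge

    def edges_between(self, a: str, b: str) -> List[Edge]:
        """Edges that can be followed from a to b."""
        found = []
        for e in self.edges.values():
            forward = (e.source, e.target) == (a, b)
            backward = (e.source, e.target) == (b, a)
            if forward or (not e.directed and backward):
                found.append(e)
        return found


g = PropertyGraph(labels={"Social"}, properties={"name": "demo"})
g.add_node(Node("n1", {"Person"}, {"first_name": "Alice"}))
g.add_node(Node("n2", {"Person"}, {"first_name": "Bob"}))
g.add_edge(Edge("e1", "n1", "n2", labels={"KNOWS"}))
g.add_edge(Edge("e2", "n1", "n2", labels={"MANAGES"}))  # parallel edge
g.add_edge(Edge("e3", "n2", "n1", directed=False, labels={"COLLEAGUE_OF"}))
```

Because the undirected edge `e3` can be followed in either direction, `g.edges_between("n1", "n2")` finds all three edges, while `g.edges_between("n2", "n1")` finds only `e3`.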

Current graph database products and projects often support a limited version of the model described here. For example, Apache TinkerPop forces each node and each edge to have a single label; Cypher allows nodes to have zero to many labels, but relationships have only a single label (called a reltype). Neo4j's database supports undocumented graph-wide properties; TinkerPop has graph values, which play the same role, and also supports "metaproperties", or properties on properties. Oracle's PGQL supports zero to many labels on nodes and on edges, whereas SQL/PGQ supports one to many labels for each kind of element. The NGSI-LD information model specified by ETSI is an attempt to formally specify property graphs, with node and relationship (edge) types that play the role of labels in the models mentioned above and that support semantic referencing by inheriting classes defined in shared ontologies.

The GQL project will define a standard data model, which is likely to be the superset of these variants, and at least the first version of GQL is likely to permit vendors to decide on the cardinalities of labels in each implementation, as SQL/PGQ does, and to choose whether to support undirected relationships.

Additional aspects of the ERM or UML models (such as generalization or subtyping or entity or relationship cardinalities) can be captured by GQL schemas or types that describe possible instances of the general data model.

WG3: Extend SQL and create GQL

The GQL project will run for four years. Seven national standards bodies (those of the United States, China, Korea, the Netherlands, the United Kingdom, Denmark and Sweden) have nominated national subject-matter experts to work on the project, which is carried out by Working Group 3 (Database Languages) of Subcommittee 32 (Data Management and Interchange) of ISO/IEC JTC 1, usually abbreviated as ISO/IEC JTC 1/SC 32 WG3, or just WG3 for short. WG3 (and its direct predecessor committees within JTC 1) has been responsible for the SQL standard since 1987.

Extending existing graph query languages

The GQL project draws on multiple sources or inputs, notably existing industrial languages and a new section of the SQL standard. In preparatory discussions within WG3, surveys of the history and comparative content of some of these inputs were presented. GQL will be a declarative language with its own distinct syntax, playing a role similar to that of SQL in building a database application. Other graph query languages have been defined that offer direct procedural features such as branching and loops (Apache TinkerPop's Gremlin), or that, like GSQL, make it possible to traverse a graph iteratively in order to perform a class of graph algorithms, but GQL will not directly incorporate such features. However, GQL is envisaged as a specific case of a more general class of graph languages that share a graph type system and a calling interface for procedures that process graphs.

SQL/PGQ Property Graph Query

Prior work by WG3 and SC 32 mirror bodies, particularly in INCITS DM32, has helped to define a new planned Part 16 of the SQL standard, which allows a read-only graph query to be called inside an SQL SELECT statement, matching a graph pattern using syntax that is very close to Cypher, PGQL and G-CORE, and returning a table of data values as the result. SQL/PGQ also contains DDL that allows SQL tables to be mapped to a graph view schema object, with nodes and edges associated with sets of labels and data properties. The GQL project coordinates closely with the SQL/PGQ "project split" of (extension to) ISO 9075 SQL, and the technical working groups in the U.S. (INCITS DM32) and at the international level (SC 32/WG3) have several expert contributors who work on both projects. The GQL project proposal mandates close alignment of SQL/PGQ and GQL, indicating that GQL will in general be a superset of SQL/PGQ.
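The idea of mapping SQL tables to a graph view, then answering a read-only graph query with a table, can be illustrated with a rough sketch. This is not SQL/PGQ syntax: the tables, the mapping dictionaries `NODE_TABLES` and `EDGE_TABLES`, and the function `build_graph` are hypothetical names chosen for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lives_in (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO city VALUES (10, 'Boston');
    INSERT INTO lives_in VALUES (1, 10), (2, 10);
""")

# Hypothetical mapping that plays the role of the graph DDL: which tables
# supply nodes, which supply edges, and which label each one carries.
NODE_TABLES = {"person": "Person", "city": "City"}
EDGE_TABLES = {"lives_in": ("LIVES_IN", "person", "person_id", "city", "city_id")}


def build_graph(connection):
    """Read the mapped tables and return (nodes, edges) of the graph view."""
    connection.row_factory = sqlite3.Row
    nodes, edges = {}, []
    for table, label in NODE_TABLES.items():
        for row in connection.execute(f"SELECT * FROM {table}"):
            props = dict(row)
            key = (table, props.pop("id"))   # the primary key identifies the node
            nodes[key] = {"label": label, "props": props}
    for table, (label, src_t, src_c, dst_t, dst_c) in EDGE_TABLES.items():
        for row in connection.execute(f"SELECT * FROM {table}"):
            edges.append({"label": label,
                          "source": (src_t, row[src_c]),
                          "target": (dst_t, row[dst_c])})
    return nodes, edges


nodes, edges = build_graph(conn)

# A read-only "graph query" whose result is again a table of data values.
rows = sorted(
    (nodes[e["source"]]["props"]["first_name"], nodes[e["target"]]["props"]["name"])
    for e in edges
    if e["label"] == "LIVES_IN"
)
```

The point of the design is that the graph is only a view: the data stays in the tables, and the query result is an ordinary table that the surrounding SQL statement could consume.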

Cypher

Cypher is a language originally designed by Andrés Taylor and colleagues at Neo4j Inc., and first implemented by that company in 2011. Since 2015 it has been made available as an open source language description with grammar tooling, a JVM front end that parses Cypher queries, and a Technology Compatibility Kit (TCK) of over 2000 test scenarios, which uses Cucumber for implementation language portability. The TCK reflects the language description and an extension for temporal data types and functions documented in a Cypher Improvement Proposal.

Cypher allows graph elements to be created, read, updated and deleted, so it is a language that can be used for both analytics engines and transactional databases.

Queries with visual path patterns

Cypher uses compact fixed- and variable-length patterns that combine visual representations of node and relationship (edge) topologies with predicates on the existence of labels and on property values. (These patterns are usually referred to as "ASCII art" patterns, and arose originally as a way of commenting programs that used a lower-level graph API.) By matching such a pattern against graph data elements, a query can extract references to nodes, relationships and paths of interest. Those references are emitted as a "binding table" in which column names are bound to a collection of graph elements. The name of a column becomes the name of a "binding variable", whose value is a specific graph element reference for each row of the table.

For example, a pattern MATCH (p:Person)-[:LIVES_IN]->(c:City) generates a two-column output table. The first column, named p, contains references to nodes with the label Person. The second column, named c, contains references to nodes with the label City, denoting the city in which the person lives.

The binding variables p and c can then be dereferenced to gain access to the property values associated with the elements referenced by a variable. The example query might end with a RETURN clause, resulting in a complete query like the following:

MATCH (p:Person)-[:LIVES_IN]->(c:City)
RETURN p.first_name, p.last_name, c.name, c.state

This would result in a final four column table listing the names of the residents of the cities stored in the graph.
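How such a pattern produces a binding table, and how the RETURN step dereferences it, can be sketched in a few lines. This is a toy evaluation over hard-coded data, not how any Cypher engine is implemented: `NODES`, `EDGES`, `match` and `prop` are names invented for this illustration.

```python
# Toy graph: node id -> (label, properties); edges as (source, type, target).
NODES = {
    1: ("Person", {"first_name": "Alice", "last_name": "Smith"}),
    2: ("Person", {"first_name": "Bob", "last_name": "Jones"}),
    3: ("City", {"name": "Boston", "state": "Massachusetts"}),
    4: ("City", {"name": "Austin", "state": "Texas"}),
}
EDGES = [(1, "LIVES_IN", 3), (2, "LIVES_IN", 3), (1, "VISITED", 4)]


def match(src_var, src_label, rel_type, dst_var, dst_label):
    """Binding table for (src_var:src_label)-[:rel_type]->(dst_var:dst_label)."""
    table = []
    for source, edge_type, target in EDGES:
        if (edge_type == rel_type
                and NODES[source][0] == src_label
                and NODES[target][0] == dst_label):
            # Each row binds the variable names to node references (ids here).
            table.append({src_var: source, dst_var: target})
    return table


def prop(node_id, key):
    """Dereference a node reference to one of its property values."""
    return NODES[node_id][1][key]


# MATCH (p:Person)-[:LIVES_IN]->(c:City)
bindings = match("p", "Person", "LIVES_IN", "c", "City")

# RETURN p.first_name, p.last_name, c.name, c.state
result = [
    (prop(row["p"], "first_name"), prop(row["p"], "last_name"),
     prop(row["c"], "name"), prop(row["c"], "state"))
    for row in bindings
]
```

Here `bindings` is the two-column table of references, and `result` is the four-column table of data values; the `VISITED` edge does not match the pattern and contributes no row.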

Pattern-based queries can express joins by combining multiple patterns that use the same binding variable. A natural join is expressed with the MATCH clause:

MATCH (p:Person)-[:LIVES_IN]->(c:City), (p:Person)-[:NATIONAL_OF]->(:EUCountry)
RETURN p.first_name, p.last_name, c.name, c.state

This query would only return the place of residence of EU citizens.
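The effect of sharing the binding variable p between two patterns can be modelled as a natural join of two binding tables. The following sketch assumes each pattern has already been matched separately; the function `natural_join` and the sample tables are hypothetical, with integers standing in for node references.

```python
def natural_join(left, right):
    """Join two binding tables on every variable name they share."""
    joined = []
    for lrow in left:
        for rrow in right:
            shared = lrow.keys() & rrow.keys()
            if all(lrow[var] == rrow[var] for var in shared):
                joined.append({**lrow, **rrow})
    return joined


# Bindings of (p:Person)-[:LIVES_IN]->(c:City)
lives_in = [{"p": 1, "c": 10}, {"p": 2, "c": 11}, {"p": 3, "c": 10}]

# Bindings of (p:Person)-[:NATIONAL_OF]->(:EUCountry); the country node is
# anonymous in the pattern, so it binds no variable.
national_of = [{"p": 1}, {"p": 3}]

eu_residents = natural_join(lives_in, national_of)
```

Person 2 has no row in `national_of`, so the join drops that person, which is why the query returns places of residence for EU nationals only.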

An outer join can be expressed with OPTIONAL MATCH:

MATCH (p:Person)-[:LIVES_IN]->(c:City)
OPTIONAL MATCH (p:Person)-[:NATIONAL_OF]->(ec:EUCountry)
RETURN p.first_name, p.last_name, c.name, c.state, ec.name

This query would return the city of residence of each person in the graph for whom residence information is stored and, if that person is an EU national, the country they come from.
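The outer-join behaviour can be sketched in the same style as a left outer join over binding tables, with unmatched optional variables bound to a null value. The function `left_outer_join`, its `right_vars` parameter and the sample tables are assumptions made for this illustration.

```python
def left_outer_join(left, right, right_vars):
    """Keep every left row; fill right_vars with None when nothing matches."""
    result = []
    for lrow in left:
        matches = [
            rrow for rrow in right
            if all(lrow[var] == rrow[var] for var in lrow.keys() & rrow.keys())
        ]
        if matches:
            result.extend({**lrow, **rrow} for rrow in matches)
        else:
            result.append({**lrow, **{var: None for var in right_vars}})
    return result


# Bindings of (p:Person)-[:LIVES_IN]->(c:City)
lives_in = [{"p": 1, "c": 10}, {"p": 2, "c": 11}]

# Bindings of (p:Person)-[:NATIONAL_OF]->(ec:EUCountry)
national_of = [{"p": 1, "ec": 20}]

rows = left_outer_join(lives_in, national_of, ["ec"])
```

Person 2 is kept in `rows` with `ec` set to `None`, mirroring the null that OPTIONAL MATCH produces when the optional pattern finds no match.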

Queries can therefore first project a subgraph of the graph that is input to the query, and then extract the data values associated with that subgraph. Data values can also be processed by functions, including aggregation functions, leading to the projection of computed values that render the information held in the projected graph in various ways. Following the lead of G-CORE and Morpheus, GQL aims to project the subgraphs defined by matching patterns (and graphs then computed over those subgraphs) as new graphs to be returned by a query.

Patterns of this kind have become pervasive in property graph query languages, and are the basis for the advanced pattern sub-language being defined in SQL/PGQ, which is likely to become a subset of the GQL language. Cypher also uses patterns for insertion and modification clauses (CREATE and MERGE), and proposals have been made in the GQL project for collecting node and edge patterns to describe graph types.

Cypher implementations

Cypher is implemented in the Neo4j database, in SAP's HANA Graph, in RedisGraph, in Cambridge Semantics' AnzoGraph, in Bitnine's AgensGraph and in Memgraph. It is also implemented in the open source projects Cypher for Gremlin, maintained by Neueda Labs in Riga, and Cypher for Apache Spark (now renamed Morpheus), as well as in research projects such as Cypher.PL and Ingraph. Cypher as a language is governed as the openCypher project by an informal community, which has held five face-to-face openCypher Implementers Meetings since February 2017.

Cypher 9 and Cypher 10

The current version of Cypher (including the temporal extension) is referred to as Cypher 9. Prior to the GQL project it was planned to create a new version, Cypher 10, that would incorporate features such as schema and composable graph queries and views. The first designs for Cypher 10, including graph construction and projection, were implemented in the Cypher for Apache Spark project starting in 2016.

PGQL

PGQL is a language designed and implemented by Oracle Inc., but made available as an open source specification along with JVM parsing software. PGQL combines familiar SQL SELECT syntax, including SQL expressions and result ordering and aggregation, with a pattern matching language very similar to that of Cypher. It allows the graph to be queried to be specified, and includes a facility for macros to capture "pattern views", or named sub-patterns. It does not support insertion or update operations, having been designed primarily for an analytics environment such as Oracle's PGX product. PGQL has also been implemented in Oracle Big Data Spatial and Graph, and in a research project, PGX.D/Async.

G-CORE

G-CORE is a research language designed by a group of academic and industrial researchers and language designers, drawing on features of Cypher, PGQL and SPARQL. The project was conducted under the auspices of the Linked Data Benchmark Council (LDBC), starting with the formation of a Graph Query Language Task Force in late 2015, with the bulk of the paper writing done in 2017. G-CORE is a composable language that is closed over graphs: graph inputs are processed to create a graph output, using graph projections and graph set operations to construct the new graph. G-CORE queries are pure functions over graphs with no side effects, which means that the language does not define any operations that mutate (update or delete) stored data. G-CORE introduces views (named queries). It also incorporates paths as elements in a graph ("paths as first-class citizens"), which can be queried independently of projected paths (which are computed at query time over node and edge elements). G-CORE has been partially implemented in open source research projects in the LDBC GitHub organization.
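The notion of a language that is closed over graphs can be illustrated with a small sketch in which every operation takes graphs and returns a new graph, leaving its inputs untouched. This is not G-CORE syntax or semantics in any detail: the immutable `Graph` type and the functions `project` and `union` are assumptions for this example.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Graph(NamedTuple):
    """Immutable graph value: operations must build a new Graph."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str, str]]   # (source, label, target)


def project(graph: Graph, label: str) -> Graph:
    """Subgraph made of the edges with the given label and their endpoints."""
    edges = frozenset(e for e in graph.edges if e[1] == label)
    nodes = frozenset(n for source, _, target in edges for n in (source, target))
    return Graph(nodes, edges)


def union(a: Graph, b: Graph) -> Graph:
    """Graph set operation: union of node sets and of edge sets."""
    return Graph(a.nodes | b.nodes, a.edges | b.edges)


social = Graph(
    frozenset({"ann", "bob", "cat", "dan"}),
    frozenset({
        ("ann", "KNOWS", "bob"),
        ("bob", "WORKS_WITH", "cat"),
        ("cat", "KNOWS", "dan"),
        ("dan", "FOLLOWS", "ann"),
    }),
)

# Outputs are graphs, so they can feed further graph operations.
knows = project(social, "KNOWS")
contacts = union(knows, project(social, "WORKS_WITH"))
```

Because each result is itself a `Graph`, queries compose freely, and because the values are immutable, `social` is guaranteed to be unchanged after the query, matching the side-effect-free behaviour described above.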

GSQL

GSQL is a language designed for TigerGraph Inc.'s property graph database. TigerGraph language designers have been promoting and working on the GQL project since October 2018. GSQL is a Turing-complete language that includes procedural flow control and iteration, as well as accumulators: a feature for gathering and modifying computed values associated with a program execution, either for the whole graph or for individual elements of a graph. These features allow iterative graph computations to be combined with data exploration and retrieval. GSQL graphs must be described by a schema of vertices and edges, which constrains all insertions and updates. This schema therefore has the closed-world property of an SQL schema, and this aspect of GSQL (also reflected in design proposals deriving from the Morpheus project) is proposed as an important optional feature of GQL.
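The combination of iteration with per-vertex accumulators can be sketched as follows. This is plain Python, not GSQL, and it does not reproduce GSQL's accumulator semantics in detail: the `MinAccum` class here is a simplified stand-in, and `hop_distances` is a hypothetical function chosen for illustration.

```python
class MinAccum:
    """Simplified accumulator that keeps the smallest value fed into it."""

    def __init__(self, initial):
        self.value = initial

    def accumulate(self, candidate):
        """Fold in a new value; report whether the stored value changed."""
        changed = candidate < self.value
        if changed:
            self.value = candidate
        return changed


def hop_distances(adjacency, start):
    """Iteratively expand a frontier, accumulating a hop count per vertex."""
    dist = {vertex: MinAccum(float("inf")) for vertex in adjacency}
    dist[start].accumulate(0)
    frontier = {start}
    while frontier:   # iterate until no accumulator changes any more
        next_frontier = set()
        for vertex in frontier:
            for neighbour in adjacency[vertex]:
                if dist[neighbour].accumulate(dist[vertex].value + 1):
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return {vertex: acc.value for vertex, acc in dist.items()}


adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
distances = hop_distances(adjacency, "a")
```

Each vertex carries its own accumulator, many edges may write to it in the same round, and the loop stops once a round changes nothing. Vertex `e` is never reached from `a`, so its accumulator keeps its initial value.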

Vertices and edges are named schema objects that contain data but also define an implied type, much as SQL tables are data containers with an associated implicit row type. GSQL graphs are then composed from these vertex and edge sets, and multiple named graphs can include the same vertex or edge set. GSQL has developed new features since its release in September 2017, most notably the introduction of variable-length edge pattern matching, using a syntax similar to that of Cypher, PGQL and SQL/PGQ, but also close to the fixed-length patterns offered by Microsoft SQL Server Graph.

GSQL also supports the concept of multigraphs, which allow subsets of a graph to be given role-based access control. Multigraphs are important for enterprise-scale graphs that need fine-grained access control for different users.

Morpheus: multiple graphs and composable graph queries in Apache Spark

The openCypher Morpheus project implements Cypher for Apache Spark users. Starting in 2016, this project originally ran alongside three related efforts in which Morpheus designers also took part: SQL/PGQ, G-CORE, and the design of Cypher extensions for querying and constructing multiple graphs. The Morpheus project acted as a testbed for extensions to Cypher (known as "Cypher 10") in two areas: graph DDL and query language extensions.

Graph DDL features include:

  1. Definition of property graph views over JDBC-connected SQL tables and Spark DataFrames
  2. Definition of graph schemas or types, built by assembling node type and edge type patterns, with subtyping
  3. Constraining the content of a graph by a closed or fixed schema
  4. Creating catalog entries for multiple named graphs in a hierarchically organized catalog
  5. Graph data sources to form a federated, heterogeneous catalog
  6. Creating catalog entries for named queries (views)

Graph query language extensions include:

  1. Graph union
  2. Projection of graphs computed from the results of pattern matches on multiple input graphs
  3. Support for tables (Spark DataFrames) as inputs to queries ("driving tables")
  4. Views that accept named or projected graphs as parameters

These features have been proposed as inputs to the standardization of property graph query languages in the GQL project.
