
Groupings of queries


One option is to group query languages by database category.

  • relational (Microsoft SQL Server, Oracle, MySQL, MariaDB)
  • object-relational (PostgreSQL)
  • NoSQL
    • Key-value (Riak, Redis, Couchbase Server, MemcacheDB)
    • Columnar (HBase)
    • Document (MongoDB, CouchDB)
    • Graph (Neo4j)

So far, so good, but in reality the borderline between the categories is becoming thinner and thinner.

For example, Microsoft SQL Server has graph support, and T-SQL has syntax like the following:

-- Find Restaurants that John's friends like
SELECT Restaurant.name
FROM Person person1, Person person2, likes, friendOf, Restaurant
WHERE MATCH(person1-(friendOf)->person2-(likes)->Restaurant)
AND person1.name='John';

In MongoDB, we have graph support too, using $graphLookup:

{
   $graphLookup: {
      from: <collection>,
      startWith: <expression>,
      connectFromField: <string>,
      connectToField: <string>,
      as: <string>,
      maxDepth: <number>,
      depthField: <string>,
      restrictSearchWithMatch: <document>
   }
}

So maybe the highest-level grouping is simply the database management systems that follow the American National Standards Institute (ANSI) standards (relational and object-relational) on one side, and all the others on the other.


I will try to answer this question from an analytics perspective.

Relational Database (DBMS):

SQL is one of the most common declarative query languages, and it is used to work with the relationships between tables.

  • In terms of data analytics, we use the GROUP BY clause to summarize our data:

An important component for Analyst to summarize the data such as sales, profit, cost, and salary. Data Summarization is very helpful for the Analyst to create a visualization, conclude findings, and report writing. In SQL, GROUP BY Clause is one of the tools to summarize or aggregate the data series. For example, sum up the daily sales and combine in a single quarter and show it to senior management. Similarly, if you want to count how many employees in each department of the company. It groups the databases based on one or more column and aggregates the results. (From "GROUP BY and HAVING Clause in SQL" by Avinash Navlani)

In more detail:

Grouping in SQL is used to arrange identical data into groups with the help of some functions. i.e if a particular column has the same values in different rows then it will arrange these rows in a group.(1)

Simple Syntax

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;

function_name: Name of the function used, for example SUM(), AVG().
table_name: Name of the table.
condition: Condition used.
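
To make the syntax above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The employees table, its columns, and the figures are made up for illustration; the point is only to show GROUP BY producing one row per department, with HAVING filtering whole groups afterwards.

```python
import sqlite3

# Hypothetical in-memory table; names and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 50000),
        ("Bob", "Sales", 60000),
        ("Cid", "IT", 70000),
        ("Dee", "IT", 80000),
        ("Eve", "HR", 40000),
    ],
)

# One output row per department: head count and average salary.
# HAVING filters whole groups after aggregation (WHERE filters rows before it).
query = """
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING COUNT(*) > 1
    ORDER BY department
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

HR has a single employee, so the HAVING condition drops it and only the IT and Sales groups are returned.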

Documents

  • Our example here will be about MongoDB.

When we talk about grouping in MongoDB, we have to mention the aggregation process, which is what we use when dealing with multiple documents.

  • Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. In SQL, count(*) combined with GROUP BY is the equivalent of a MongoDB aggregation. (2)
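
As a rough illustration of the SQL equivalence mentioned above, the sketch below writes a $group stage the way it could be passed to pymongo's collection.aggregate(pipeline), and then emulates it in plain Python so the result can be inspected without a running server. The documents, field names, and the emulate_group helper are all hypothetical, and the helper only understands $sum accumulators.

```python
# Hypothetical documents, as they might sit in an "employees" collection.
employees = [
    {"name": "Ann", "department": "Sales", "salary": 50000},
    {"name": "Bob", "department": "Sales", "salary": 60000},
    {"name": "Cid", "department": "IT", "salary": 70000},
    {"name": "Dee", "department": "IT", "salary": 80000},
    {"name": "Eve", "department": "HR", "salary": 40000},
]

# Roughly: SELECT department, COUNT(*), SUM(salary) ... GROUP BY department
pipeline = [
    {
        "$group": {
            "_id": "$department",
            "count": {"$sum": 1},
            "total_salary": {"$sum": "$salary"},
        }
    }
]


def emulate_group(docs, group_spec):
    """Emulate a $group stage that uses only $sum accumulators (illustration only)."""
    key_field = group_spec["_id"].lstrip("$")
    groups = {}
    for doc in docs:
        key = doc[key_field]
        out = groups.setdefault(key, {"_id": key})
        for field, accumulator in group_spec.items():
            if field == "_id":
                continue
            operand = accumulator["$sum"]
            # "$field" means "read that field"; a bare number is added as-is.
            value = doc[operand[1:]] if isinstance(operand, str) else operand
            out[field] = out.get(field, 0) + value
    return list(groups.values())


result = emulate_group(employees, pipeline[0]["$group"])
for group in result:
    print(group)
```

Against a real deployment, the same pipeline list would be handed to the driver instead of the emulation helper.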

What's the difference between grouping in a table and in a document?

  • Answering this question comes down to three key points: (3)

    1- What kind of data are you using?

    • If your data is highly connected (relational), SQL is the best approach.

    2- What type of processing do you want to do?

    • SQL databases are better for multi-row transactions; NoSQL databases are better for unstructured data like documents or JSON.

    3- How does your data need to scale?

    • SQL databases scale vertically, while NoSQL databases scale horizontally. For high-level grouping, this means SQL is the stronger choice for heavy, in-depth grouping, and it is also more flexible when it comes to normalization.

Graph

Example: Cypher (4)

Like SQL, Cypher is a declarative, textual query language, but for graphs.

It consists of clauses, keywords and expressions like predicates and functions, many of which will be familiar (like WHERE, ORDER BY, SKIP, LIMIT, AND, p.unitPrice > 10).

  • Unlike SQL, Cypher is all about expressing graph patterns.

  • Grouping in Cypher focuses on the visualization aspect of the data, to give you the big picture, rather than on the processing aspect. With heavy data it will not be as efficient as relational tables, but on the other hand the data can be visualized.

  • For high-level grouping, Cypher is not recommended.
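
For reference, Cypher has no GROUP BY clause: any non-aggregated item in RETURN acts as the grouping key. The sketch below shows such a query as a string (the Person and Restaurant labels and the LIKES relationship type are hypothetical) and mirrors its grouping logic in plain Python over a made-up edge list, so it runs without a graph database.

```python
from collections import Counter

# r.name is not aggregated, so it becomes the implicit grouping key;
# count(p) is computed once per distinct restaurant name.
cypher_query = """
MATCH (p:Person)-[:LIKES]->(r:Restaurant)
RETURN r.name AS restaurant, count(p) AS fans
ORDER BY fans DESC
"""

# Hypothetical (person, restaurant) pairs standing in for LIKES relationships.
likes = [
    ("John", "Noodle Bar"),
    ("Mary", "Noodle Bar"),
    ("Mary", "Taco Stand"),
    ("Alice", "Noodle Bar"),
]

# Plain-Python equivalent of the grouping: count people per restaurant,
# then order by the count, highest first.
fans = Counter(restaurant for _person, restaurant in likes)
result = fans.most_common()
print(result)
```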


Other possibilities

Example: Dataframe/pandas

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas dataframe.groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.(5)

Syntax

Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters:
    by: mapping, function, str, or iterable
    axis: int, default 0
    level: If the axis is a MultiIndex (hierarchical), group by a particular level or levels
    as_index: For aggregated output, return an object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output
    sort: Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.
    group_keys: When calling apply, add group keys to an index to identify pieces
    squeeze: Reduce the dimensionality of the return type if possible, otherwise return a consistent type

Returns: GroupBy object

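
A small sketch of the call described above, assuming pandas is installed. The DataFrame contents are made up for illustration; as_index=False is used to get the "SQL-style" output mentioned in the parameter list, with the group key as an ordinary column.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame(
    {
        "department": ["Sales", "Sales", "IT", "IT", "HR"],
        "salary": [50000, 60000, 70000, 80000, 40000],
    }
)

# Roughly: SELECT department, COUNT(salary), AVG(salary) ... GROUP BY department
summary = df.groupby("department", as_index=False).agg(
    headcount=("salary", "count"),
    avg_salary=("salary", "mean"),
)
print(summary)
```

Because sort=True is the default, the groups come back ordered by the department key.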
  • If we compare pandas with the other methods mentioned above in terms of data analytics, pandas is definitely the go-to choice:

    • It scales very well, as long as the data fits in memory.

    • It is lightweight compared with SQL-based approaches.

    • It works well with large amounts of data.


Conclusion

In the end, which of these methods works best depends on the three things I mentioned before:

  1. What kind of data are you using?

  2. What type of processing do you want to do?

  3. How does your data need to scale?


Reference

References are attached to each section so that they are easy to find.


You probably already have the answer...

This grouping is also what I would have come up with.

I haven't used graph databases, but I can speak to the other two, relational and NoSQL. SQL, or relational languages in general, are used to query multiple types of relationships, as the name suggests. That is their distinctive feature, and they also have a fixed schema.

In the case of document-based NoSQL databases, the distinctive feature is that the schema is very flexible, and related data is usually stored inside the same document.

I know little about graph databases. As far as I know, they are NoSQL databases with the ability to query relationships, combining the distinctive features of RDBMS and non-RDBMS (NoSQL) systems.

Dataframes are usually used for the fast operations required in data processing. They are in-memory data stores. They can't fetch relations on their own; we have to perform those operations on them from scratch.
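
To illustrate that last point, here is a minimal sketch, assuming pandas is installed, in which the relation between two dataframes has to be resolved by hand with an explicit merge before any grouping can happen. The frames and column names are hypothetical.

```python
import pandas as pd

# Two hypothetical in-memory frames; nothing links them except a shared column.
employees = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "dept_id": [1, 1, 2]}
)
departments = pd.DataFrame(
    {"dept_id": [1, 2], "dept_name": ["Sales", "IT"]}
)

# The "join" is an explicit operation we perform ourselves.
joined = employees.merge(departments, on="dept_id", how="left")

# Only after merging can we group by the department name.
counts = joined.groupby("dept_name")["name"].count()
print(counts)
```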