Why is selecting from stored procedure not supported in relational databases?
TL;DR: you can select from (table-valued) functions, or from any sort of function in PostgreSQL. But not from stored procedures.
Here's an "intuitive", somewhat database-agnostic explanation, for I believe that SQL and its many dialects is too much of an organically grown language / concept for there to be a fundamental, "scientific" explanation for this.
Procedures vs. Functions, historically
I don't really see the point of selecting from stored procedures, but I'm biased by years of experience and accepting the status quo, and I certainly see how the distinction between procedures and functions can be confusing and how one would wish them to be more versatile and powerful. Specifically in SQL Server, Sybase or MySQL, procedures can return an arbitrary number of result sets / update counts, although this is not the same as a function that returns a well-defined type.
Think of procedures as imperative routines (with side effects) and of functions as pure routines without side-effects. A SELECT
statement itself is also "pure" without side-effects (apart from potential locking effects), so it makes sense to think of functions as the only types of routines that can be used in a SELECT
statement.
In fact, think of functions as being routines with strong constraints on behaviour, whereas procedures are allowed to execute arbitrary programs.
4GL vs. 3GL languages
Another way to look at this is from the perspective of SQL being a 4th generation programming language (4GL). A 4GL can only work reasonably if it is restricted heavily in what it can do. Common Table Expressions made SQL turing-complete, yes, but the declarative nature of SQL still prevents its being a general-purpose language from a practical, every day perspective.
Stored procedures are a way to circumvent this limitation. Sometimes, you want to be turing complete and practical. So, stored procedures resort to being imperative, having side-effects, being transactional, etc.
Stored functions are a clever way to introduce some 3GL / procedural language features into the purer 4GL world at the price of forbidding side-effects inside of them (unless you want to open pandora's box and have completely unpredictable SELECT
statements).
The fact that some databases allow for their stored procedures to return arbitrary numbers of result sets / cursors is a trait of their allowing arbitrary behaviour, including side-effects. In principle, nothing I said would prevent this particular behaviour also in stored functions, but it would be very unpractical and hard to manage if they were allowed to do so within the context of SQL, the 4GL language.
Thus:
- Procedures can call procedures, any function and SQL
- "Pure" functions can call "pure" functions and SQL
- SQL can call "pure" functions and SQL
But:
- "Pure" functions calling procedures become "impure" functions (like procedures)
And:
- SQL cannot call procedures
- SQL cannot call "impure" functions
Examples of "pure" table-valued functions:
Here are some examples of using table-valued, "pure" functions:
Oracle
CREATE TYPE numbers AS TABLE OF number(10);/CREATE OR REPLACE FUNCTION my_function (a number, b number)RETURN numbersISBEGIN return numbers(a, b);END my_function;/
And then:
SELECT * FROM TABLE (my_function(1, 2))
SQL Server
CREATE FUNCTION my_function(@v1 INTEGER, @v2 INTEGER)RETURNS @out_table TABLE ( column_value INTEGER)ASBEGIN INSERT @out_table VALUES (@v1), (@v2) RETURNEND
And then
SELECT * FROM my_function(1, 2)
PostgreSQL
Let me have a word on PostgreSQL.
PostgreSQL is awesome and thus an exception. It is also weird and probably 50% of its features shouldn't be used in production. It only supports "functions", not "procedures", but those functions can act as anything. Check out the following:
CREATE OR REPLACE FUNCTION wow ()RETURNS SETOF INTAS $$BEGIN CREATE TABLE boom (i INT); RETURN QUERY INSERT INTO boom VALUES (1) RETURNING *;END;$$ LANGUAGE plpgsql;
Side-effects:
- A table is created
- A record is inserted
Yet:
SELECT * FROM wow();
Yields
wow---1
Speaking only for Microsoft SQL Server: Stored Procedures and Functions (at least scalar UDFs and Multi-statement TVFs) are different constructs.
Stored Procedures are pre-parsed query batches. You execute ad hoc queries batches or query batches saved in Stored Procedures. For example, from .NET there are two different Command Types:
Text
andStoredProcedure
. You cannot just execute a Function.Functions, even TVFs (which, if they are Inline-TVFs, are essentially Views that can take parameters) are not independently runnable pieces of code. They are not queries in themselves and hence need to be called within a query.
Also, unlike Stored Procedures, Functions can be optimized, meaning that they can be moved around the execution plan. The timing and frequency of their execution is not guaranteed to be how you specified in the query (such as, per row vs once and the result cached). In fact, this sometimes causes problems when non-deterministic results are desired but only a single value is returned for all rows. This is probably the main reason (of maybe a few) that Functions do not allow for changing state of the database and some other handy things: because you have no control over whether or not those things would actually happen or in what order, or how many times. Stored Procedures, on the other hand, are the execution plan.
That being said, for what it's worth, it is possible to select from a Stored Procedure without using OPENQUERY
/ OPENROWSET
, but it requires SQLCLR. In fact, most of the restrictions placed on T-SQL Functions can be overcome in SQLCLR code (such as "no Dynamic SQL"). However, this does not make SQLCLR functions immune from the Query Optimizer changing the timing and frequency of the execution from what you want / expect.
I don't think your question is really about stored procedures. I think it is about the limitations of table valued functions, presumably from a SQL Server perspective:
- You cannot use dynamic SQL.
- You cannot modify tables or the database.
- You have to specify the output columns and types.
- Gosh, you can't even use
rand()
andnewid()
(directly)
(Oracle's restrictions are slightly different.)
The simplest answer is that databases are both a powerful querying language and an environment that supports ACID properties of transactional databases. The ACID properties require a consistent view, so if you could modify existing tables, what would happen when you do this:
select t.*, (select count(*) from functionThatModifiesT()) -- f() modifies "t"from t;
Which t
is used in the from
? Actually, SQL Server sort of has answer to this question, but you get the same issue with multiple references in the same clause. In a sense, user defined functions are limited in the same way that this is not accepted:
select a = 1, a + 1
Defining the semantics is very, very tricky and not worth the effort because there are other powerful features that are more important.
In my opinion, though, the final straw in SQL Server is the ability for stored procedures to "return" multiple result sets. That simply has no meaning in the world of tables.
EDIT:
Postgres's use of create function
is very powerful. It does allow the function to modify the underlying database, which brings up interesting transactional issues. However, you still do have to define the columns and their types.