2023-04-06
One of the most valuable features of Power BI is the ease with which I can mash up data from various data sources. This, of course, requires importing data into Power BI's internal xVelocity engine. For many others, however, it is critical to creating Power BI reports and dashboards that reflect the current state of live data as it exists in an underlying relational database. Let's look at how this is accomplished and what some of the trade-offs are.
DirectQuery
Live data can only be visualized in Power BI through what Microsoft calls "DirectQuery." There are two ways to use DirectQuery with Power BI. First, one directly connects to the relational database with the "DirectQuery" option. Unfortunately, as of May 2018, this can only be done with SQL Server, Oracle, or Teradata data.
The second option is to create a DirectQuery mode tabular Analysis Services database. If Power BI can directly connect to SQL Server, why would anyone go to the trouble of creating an intermediate SSAS database? That depends on whatever additional measures might need to be defined. By defining advanced measures in a tabular mode SSAS database, those measures become immediately available to any Power BI report that connects to it. Measures defined within Power BI are only accessible to reports in that instance.
How Does DirectQuery Work?
Whether Power BI has directly connected to SQL Server in DirectQuery mode or whether Power BI has connected to a DirectQuery SSAS database, DirectQuery works in essentially the same way. Dax queries generated by dragging and dropping data onto visualizations are converted into SQL queries for the underlying database. Since no data is saved in Power BI, the visualizations always reflect current data. (It is possible, for performance reasons, to cache query results in Power BI's RAM.)
Since the conversion of DAX into SQL is not perfect, some Power BI operations will not work in DirectQuery mode, and some may work incorrectly. Special attention must be paid to DAX's time intelligence functions; they will often not behave correctly in DirectQuery mode.
You can use the SQL Server Profiler to examine the SQL queries generated by DirectQuery. The following is the translation of a straightforward DAX query. Note the TOP statement; DirectQuery cannot return more than one million rows, and the SQL query is modified to reflect this fact. This is not a limitation on the number of rows queried and aggregated to yield the desired results; it is only a limitation on the number of rows returned.
SELECT
TOP (10000001) [c36],SUM([c155])
AS [a0]
FROM
(
SELECT [t7].[c36],[t7].[c155]
FROM
(
SELECT [t1].[c36] AS [c36],[t7].[SalesAmount] AS [c155]
FROM
(
((SELECT * FROM FactSales) AS [t7]
left outer join
(
SELECT [t5].[StoreKey] AS [c92],[t5].[GeographyKey] AS [c93]
FROM
(
( SELECT [dbo].[DimStore].* FROM [dbo].[DimStore] )
)
AS [t5]
)
AS [t5] on
(
[t7].[StoreKey] = [t5].[c92]
)
)
left outer join
(
SELECT [t1].[GeographyKey] AS [c31],[t1].[RegionCountryName] AS [c36]
FROM
(
( SELECT [dbo].[DimGeography].* FROM [dbo].[DimGeography] )
)
AS [t1]
)
AS [t1] on
(
[t5].[c93] = [t1].[c31]
)
)
)
AS [t7]
)
AS [t7]
GROUP BY [c36]
Creating a DirectQuery Mode SSAS Database
Suppose you will be reporting on live relational data in various ways in the long run. In that case, it is probably better to create an SSAS database specifically for this purpose. This is essentially the same, but there are a few twists. First, since DirectQuery mode does not import data, there is no data to look at in the design environment as you develop measures. This is unpleasant and confusing, but Microsoft has provided for the creation of sample data. This is done by assigning a second partition and marking it as a "Sample." You are not permitted more than a single partition in DirectQuery mode unless it is a sample partition. Data generated by the query defining the sample partition will be imported into the design environment. From there, things are the same as designing any other tabular-mode database.
Strictly speaking, it is not necessary to mark the workspace database as DirectQuery; you can convert it to DirectQuery when you deploy. This is not a good idea. However, If the workspace database is marked as DirectQuery, you will be warned if you try to do something that is not permitted. Without this warning, your mistakes may be much harder to correct later.
Another Advantage of DirectQuery
When you import data into Power BI, those data must have sets of permissions applied to them. For an administrator, security for the same data must be managed in multiple locations. However, since data is not imported in DirectQuery mode, permissions are applied by the existing definitions in the underlying relational database, reducing the admin's workload.
Conclusions
When mashing up data from different sources, I cannot use DirectQuery. Since DirectQuery generates an SQL Query, there is no provision for getting data from multiple relational sources. However, the ease of creating a DirectQuery SSAS database or the ease of creating a DirectQuery connection to Oracle, SQL Server, or Teradata makes DirectQuery a terrific tool for developers and report writers who need access live data in Power BI.
Have queries about improving your relational database skills? Check out our SQL Training Courses!
This piece was originally posted on May 15, 2018, and has been refreshed with updated styling.