`
oxidy
  • 浏览: 162690 次
  • 性别: Icon_minigender_1
  • 来自: 吉林长春
社区版块
存档分类
最新评论

在数据库中如何分类、分组并总计SQL数据[builder.com.cn]

SQL 
阅读更多

您需要了解如何使用某些SQL子句和运算符来安排SQL数据,从而对它进行高效分析。下面这些建议告诉您如何建立语句,获得您希望的结果。

以有意义的方式安排数据可能是一种挑战。有时您只需进行简单分类。通常您必须进行更多处理——进行分组以利于分析与总计。可喜的是,SQL提供了大 量用于分类、分组和总计的子句及运算符。下面的建议将有助于您了解何时进行分类、何时分组、何时及如何进行总计。欲了解每个子句和运算符的详细信息,请查 看在线书籍

#1 :分类排序

通常,我们确实需要对所有数据进行排序。SQL的ORDER BY子句将数据按字母或数字顺序进行排列。因此,同类数据明显分类到各个组中。然而,这些组只是分类的结果,它们并不是真正的组。ORDER BY显示每一个记录,而一个组可能代表多个记录。

#2 :减少组中的相似数据

分类与分组的最大不同在于:分类数据显示(任何限定标准内的)所有记录,而分组数据不显示这些记录。GROUP BY子句减少一个记录中的相似数据。例如,GROUP BY能够从重复那些值的源文件中返回一个唯一的邮政编码列表:

SELECT ZIP

FROM Customers

GROUP BY ZIP

仅包括那些在GROUP BY和SELECT列列表中字义组的列。换句话说,SELECT列表必须与GROUP列表相匹配。只有一种情况例外:SELECT列表能够包含聚合函数。(而GROUP BY不支持聚合函数。)

记住,GROUP BY不会对作为结果产生的组分类。要对组按字母或数字顺序排序,增加一个ORDER BY子句(#1)。另外,在GROUP BY子句中您不能引用一个有别名的域。组列必须在根本数据中,但它们不必出现在结果中。

#3 :分组前限定数据

您可以增加一个WHERE子句限定由GROUP BY分组的数据。例如,下面的语句仅返回肯塔基地区顾客的邮政编码列表。

SELECT ZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ZIP

在GROUP BY子句求数据的值之前,WHERE对数据进行过滤,记住这一点很重要。

和GROUP BY一样,WHERE不支持聚合函数。

#4 :返回所有组

当您用WHERE过滤数据时,得到的组只显示那些您指定的记录。符合组定义但不满足子句条件的数据将不会出现在组中。不管WHERE条件如何,如果 您想包括所有数据,增加一个ALL子句。例如,在前面的语句中增加一个ALL子句会返回所有邮政编码组,而不仅仅是肯塔基地区的组。

SELECT ZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ALL ZIP

照这个样子,这两个子句会造成冲突,您可能不会以这种方式使用ALL子句。当您用聚合求一个列的值时,应用ALL子句很方便。例如,下面的语句计算每个肯塔基邮政编码的顾客数目,同时显示其它邮政编码值。

SELECT ZIP, Count(ZIP) AS KYCustomersByZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ALL ZIP

得到的组由根本数据中的所有邮政编码值构成。但是,聚合列(KYCustomerByZIP)显示为0,因为除肯塔基邮政编码组外没有别的组。

远程查询不支持GROUP BY ALL。

#5 :分组后限定数据

WHERE子句(#3)在 GROUP BY子句之前求数据的值。当您希望在分组以后限定数据时,使用HAVING。通常,不管您使用WHERE还是HAVING,得到的结果相同。但要记住,这 两个子句不能互换,这点很重要。如果您存在疑问,这里有一条应用指南:过滤记录时使用WHERE;过滤组时使用HAVING。

一般,您会用HAVING,利用聚合来求一个组的值。例如,下面的语句返回一个邮政编码列表,但这个表内可能不包含根本数据源中的每个邮政编码:

SELECT ZIP, Count(ZIP) AS CustomersByZIP

FROM Customers

GROUP BY ZIP

HAVING Count(ZIP) = 1

仅仅那些只有一名顾客的组出现在结果中。

#6 :详细了解WHERE 和HAVING

如果您仍然对WHERE和HAVING的用法感到迷惑,应用下面的指导方法:

  • WHERE出现在GROUP BY之前;SQL在它分组记录前求WHERE子句的值。
  • HAVING出现在GROUP BY之后;SQL在它分组记录后求HAVING子句的值。

#7 :用聚合总计分组值

分组数据有助于对数据进行分析,但有时您还需要组本身以外的其它信息。您可以增加一个聚合函数来总计分组数据。例如,下面的语句为每次排序显示一个小计:

SELECT OrderID, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY OrderID

与其它的组一样,SELECT和GROUP BY列表必须相匹配。在SELECT子句中包含一个聚合是这一规则的唯一例外。

#8 :总计聚合

您可以通过显示每个组的小计进一步总计数据。SQL的ROLLUP运算符为每个组显示一个额外的记录,一个小计。那个记录是用聚合函数在每个组中求所有记录的值的结果。下面的语句为每个组合计OrderTotal列。

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH ROLLUP

一个包含20和25这两个OrderTotal值的组的ROLLUP行将显示OrderTotal值45。ROLLUP结果的第一个值是唯一的,因为它求所有组记录的值。那个值是整个记录集的总和。

ROLLUP不支持聚合函数中的DISTINCT或GROUP BY ALL子句。

#9 :总计每一列

CUBE运算符比ROLLUP更进一步,它返回每个组中每个值的总数。得到的结果与ROLLUP相似,但CUBE包括组中每一列的一个额外记录。下面的语句显示每个组的小计和每名顾客的一个额外总数。

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH CUBE

用CUBE得到的总计最为复杂。不仅完成聚合与ROLLUP的工作,而且还求定义组的其它列的值。也就是说,CUBE总计每一个可能的列组合。

CUBE不支持GROUP BY ALL。

#10 :给总计排序

当CUBE的结果杂乱无章时(一般都是这样),可以增加一个GROUPING函数,如下所示:

SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH CUBE

其结果包括每一行的两个额外的值。

  • 值1表明左边的值是一个总计值——ROLLUP或CUBE的运算符的结果。
  • 值0表明左边的值是一个原始GROUP BY子句产生的详细记录。

英文原版

10 tips for sorting, grouping, and summarizing SQL data

Takeaway: Arranging SQL data so that you can effectively analyze it requires an understanding of how to use certain SQL clauses and operators. These tips will help you figure out how to build statements that will give you the results you want.

This article is also available as a PDF download .

Arranging data in a manner that's meaningful can be a challenge. Sometimes all you need is a simple sort. Often, you need more--you need groups you can analyze and summarize. Fortunately, SQL offers a number of clauses and operators for sorting, grouping, and summarizing. The following tips will help you discern when to sort, when to group, and when and how to summarize. For detailed information on each clause and operator, see Books Online .

#1: Bring order with a sort

More often than not, all your data really needs is a little order. SQL's ORDER BY clause organizes data in alphabetic or numeric order. Consequently, similar values sort together in what appear to be groups. However, the apparent groups are a result of the sort; they aren't true groups. ORDER BY displays each record whereas a group may represent multiple records.

#2: Reduce similar values into a group

The biggest difference between sorting and grouping is this: Sorted data displays all the records (within the confines of any limiting criteria) and grouped data doesn't. The GROUP BY clause reduces similar values into one record. For instance, a GROUP BY clause can return a unique list of ZIP codes from a source that repeats those values:

SELECT ZIP

FROM Customers

GROUP BY ZIP

Include only those columns that define the group in both the GROUP BY and SELECT column lists. In other words, the SELECT list must match the GROUP BY list, with one exception: The SELECT list can include aggregate functions. (GROUP BY doesn't allow aggregate functions.)

Keep in mind that GROUP BY won't sort the resulting groups. To arrange groups alphabetically or numerically, add an ORDER BY clause (# 1). In addition, you can't refer to an aliased field in the GROUP BY clause. Group columns must be in the underlying data, but they don't have to appear in the results.

#3: Limit data before it's grouped

You can limit the data that GROUP BY groups by adding a WHERE clause. For instance, the following statement returns a unique list of ZIP codes for just Kentucky customers:

SELECT ZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ZIP

It's important to remember that WHERE filters data before the GROUP BY clause evaluates it.

Like GROUP BY, WHERE doesn't support aggregate functions.

#4: Return all groups

When you use WHERE to filter data, the resulting groups display only those records you specify. Data that fits the group's definition but does not meet the clause's conditions won't make it to a group. Include ALL when you want to include all data, regardless of the WHERE condition. For instance, adding ALL to the previous statement returns all of the ZIP groups, not just those in Kentucky:

SELECT ZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ALL ZIP

As is, the two clauses are in conflict, and you probably wouldn't use ALL in this way. ALL comes in handy when you use an aggregate to evaluate a column. For example, the following statement counts the number of customers in each Kentucky ZIP, while also displaying other ZIP values:

SELECT ZIP, Count(ZIP) AS KYCustomersByZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ALL ZIP

The resulting groups comprise all ZIP values in the underlying data. However, the aggregate column (KYCustomersByZIP) would display 0 for any group other than a Kentucky ZIP.

Remote queries don't support GROUP BY ALL.

#5: Limit data after it's grouped

The WHERE clause (# 3) evaluates data before the GROUP BY clause does. When you want to limit data after it's grouped, use HAVING. Often, the result will be the same whether you use WHERE or HAVING, but it's important to remember that the clauses are not interchangeable. Here's a good guideline to follow when you're in doubt: Use WHERE to filter records; use HAVING to filter groups.

Usually, you'll use HAVING to evaluate a group using an aggregate. For instance, the following statement returns a unique list of ZIP codes, but the list might not include every ZIP code in the underlying data source:

SELECT ZIP, Count(ZIP) AS CustomersByZIP

FROM Customers

GROUP BY ZIP

HAVING Count(ZIP) = 1

Only those groups with just one customer make it to the results.

#6: Get a closer look at WHERE and HAVING

If you're still confused about when to use WHERE and when to use HAVING, apply the following guidelines:

  • WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
  • HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.

#7: Summarize grouped values with aggregates

Grouping data can help you analyze your data, but sometimes you'll need a bit more information than just the groups themselves. You can add an aggregate function to summarize grouped data. For instance, the following statement displays a subtotal for each order:

SELECT OrderID, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY OrderID

As with any other group, the SELECT and GROUP BY lists must match. Including an aggregate in the SELECT clause is the only exception to this rule.

#8: Summarize the aggregate

You can further summarize data by displaying a subtotal for each group. SQL's ROLLUP operator displays an extra record, a subtotal, for each group. That record is the result of evaluating all the records within each group using an aggregate function. The following statement totals the OrderTotal column for each group:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH ROLLUP

The ROLLUP row for a group with two OrderTotal values of 20 and 25 would display an OrderTotal of 45. The first record in a ROLLUP result is unique because it evaluates all of the group records. That value is a grand total for the entire recordset.

ROLLUP doesn't support DISTINCT in aggregate functions or the GROUP BY ALL clause.

#9: Summarize each column

The CUBE operator goes a step further than ROLLUP by returning totals for each value in each group. The results are similar to ROLLUP, but CUBE includes an additional record for each column in the group. The following statement displays a subtotal for each group and an additional total for each customer:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH CUBE

CUBE gives the most comprehensive summarization. It not only does the work of both the aggregate and ROLLUP, but also evaluates the other columns that define the group. In other words, CUBE summarizes every possible column combination.

CUBE doesn't support GROUP BY ALL.

#10: Bring order to summaries

When the results of a CUBE are confusing (and they usually are), add the GROUPING function as follows:

SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH CUBE

The results include two additional values for each row:

  • The value 1 indicates that the value to the left is a summary value--the result of the ROLLUP or CUBE operator.
  • The value 0 indicates that the value to the left is a detail record produced by the original GROUP BY clause.
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics