0

Selecting colums that are not in GROUP BY

Posted May 14th, 2007 in Databases by Olexandr Melnyk

After reading one more post on SitePoint forums about selecting not aggregated values of columns, which were not specified in the GROUP BY clause, I decided it is worth to highlight in my blog what is wrong with queries that make use of this approach.

At the start, I’d like to mention such queries are allowed only in MySQL, which is known for its unstrict behaviour. Other database servers with produce errors in such case.

One of the most common reasoning for such queries, for example:

select user_id
     , post_time
  from posts
group
    by user_id

is the wrong assumption that query will return the most recent entry for ungrouped columns (latest post time for each user in this case).

As MySQL manual says, returned value for ungrouped column can be any value from the group it belongs to. So the query above, is free to return time of user’s any post.

A solution for such cases is to use an aggregate function:

select user_id
     , max(post_time)
  from posts
group
    by user_id

This will guarant that returned post time for each group is the latest within the group.

Although, MySQL manual states that in case you are sure that values within a group are constant, using columns in SELECT (or HAVING), but not in GROUP BY can give performance increase by avoiding unnecessary grouping, I highly suggest to keep away from this technique as it is not standard (supported by MySQL only), and it makes behaviour of queries unclear, increasing possibility of queries working not as expected.

Share and Enjoy:
  • Facebook
  • VKontakte
  • MySpace
  • LinkedIn
  • Twitter
  • Identi.ca
  • Tumblr
  • Digg
  • Reddit
  • del.icio.us
  • StumbleUpon
  • Google Bookmarks
  • Google Buzz
  • Live
  • Technorati
  • HackerNews
  • Slashdot
  • Hyves
  • Tuenti
  • PDF
blog comments powered by Disqus