Breeding: sql - Efficient way of getting group ID without sorting -

Thursday, 15 May 2014

Imagine I have a denormalized table like so:

    CREATE TABLE persons (
        id           INT IDENTITY PRIMARY KEY,
        firstname    NVARCHAR(100),
        countryname  NVARCHAR(100)
    );

    INSERT INTO persons (firstname, countryname)
    VALUES ('mark',    'germany'),
           ('chris',   'france'),
           ('grace',   'italy'),
           ('antonio', 'italy'),
           ('francis', 'france'),
           ('amanda',  'italy');

I need to build a query that returns the name of each person and a unique ID for their country. The IDs do not have to be contiguous; more importantly, they do not have to be in any order. What is the most efficient way of achieving this?

The simplest solution appears to be DENSE_RANK:

    SELECT firstname,
           countryname,
           DENSE_RANK() OVER (ORDER BY countryname) AS countryid
    FROM persons;

    -- firstname  countryname  countryid
    -- chris      france       1
    -- francis    france       1
    -- mark       germany      2
    -- amanda     italy        3
    -- grace      italy        3
    -- antonio    italy        3

However, this incurs a sort on the countryname column, which is a wasteful performance hog. I came up with this alternative, which uses ROW_NUMBER with the well-known trick for suppressing its sort:

    SELECT p.firstname,
           p.countryname,
           c.countryid
    FROM persons p
        JOIN (
            SELECT countryname,
                   ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS countryid
            FROM persons
            GROUP BY countryname
        ) c
        ON c.countryname = p.countryname;

    -- firstname  countryname  countryid
    -- mark       germany      2
    -- chris      france       1
    -- grace      italy        3
    -- antonio    italy        3
    -- francis    france       1
    -- amanda     italy        3

Am I right in assuming that the second query would perform better in general (not just on my contrived data set)? Are there factors that might make a difference either way (such as an index on countryname)? Is there a more elegant way of expressing it?

Why do you think an aggregation would be cheaper than a window function? I ask because I have experience with both and don't have a strong opinion on the matter. If pressed, I would guess the window function is faster, because it does not have to aggregate the data and then join the result back in.

The two queries will have different execution paths. The right way to see which performs better is to try it out: run both queries on large enough samples of data in your environment.
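One way to run that comparison is sketched below. It assumes SQL Server and the persons table from the question, loaded with a realistic volume of data; SET STATISTICS TIME and SET STATISTICS IO make the server report CPU time, elapsed time, and logical reads for each statement, so the two candidates can be compared side by side. Treat the numbers as indicative only, since caching and parallelism can skew a single run.

```sql
-- Report timing and I/O figures for every statement in this session.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Candidate 1: window function only.
SELECT firstname,
       countryname,
       DENSE_RANK() OVER (ORDER BY countryname) AS countryid
FROM persons;

-- Candidate 2: aggregate, number the groups, then join back.
SELECT p.firstname,
       p.countryname,
       c.countryid
FROM persons AS p
JOIN (
    SELECT countryname,
           ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS countryid
    FROM persons
    GROUP BY countryname
) AS c
    ON c.countryname = p.countryname;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the actual execution plans of the two statements shows whether a Sort operator appears at all, which is the cost the question is worried about.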

By the way, I don't think there is a single right answer, because performance depends on several factors:

Which columns are indexed? How big is the data? Does it fit in memory? How many different countries are there?
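To illustrate the first factor, here is a minimal sketch of an index that could change the picture. The index name is hypothetical and not part of the original question. With rows available in countryname order, the optimizer may be able to satisfy the ORDER BY inside DENSE_RANK from the index instead of adding a Sort operator; whether it actually does so should be confirmed from the execution plan.

```sql
-- Hypothetical index name, chosen for illustration only.
-- INCLUDE (firstname) lets the index cover the queries above,
-- so they need not touch the clustered index at all.
CREATE NONCLUSTERED INDEX IX_persons_countryname
    ON persons (countryname)
    INCLUDE (firstname);
```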

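If you are concerned about performance and just want a unique number, consider using checksum() instead. This does run the risk of collisions, but that risk is very, very small for 200 or so countries. Plus, you can test for collisions and handle them if they occur. The query would be: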

    SELECT firstname, countryname, CHECKSUM(countryname) AS countryid
    FROM persons;
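The collision test mentioned above could look something like the following sketch, which assumes the same persons table. It groups rows by checksum value and reports any value shared by more than one distinct country name; an empty result means no collisions in the current data.

```sql
-- List checksum values that map to more than one distinct country name.
SELECT CHECKSUM(countryname)       AS countryid,
       COUNT(DISTINCT countryname) AS distinct_names
FROM persons
GROUP BY CHECKSUM(countryname)
HAVING COUNT(DISTINCT countryname) > 1;
```

For the six sample rows this would be expected to return nothing. If it ever does return rows, falling back to DENSE_RANK or to the ROW_NUMBER join is one way to handle it.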

sql sql-server tsql row-number dense-rank
