Thursday, 15 May 2014

sql - Efficient way of getting group ID without sorting -




Imagine I have a denormalized table like so:

create table persons
(
    id int identity primary key,
    firstname nvarchar(100),
    countryname nvarchar(100)
)

insert persons
values
    ('mark', 'germany'),
    ('chris', 'france'),
    ('grace', 'italy'),
    ('antonio', 'italy'),
    ('francis', 'france'),
    ('amanda', 'italy');

I need to build a query that returns the name of each person and a unique ID for their country. The IDs do not have to be contiguous; more importantly, they do not have to be in any order. What is the most efficient way of achieving this?

The simplest solution appears to be dense_rank:

select firstname, countryname,
       dense_rank() over (order by countryname) as countryid
from persons

-- firstname  countryname  countryid
-- chris      france       1
-- francis    france       1
-- mark       germany      2
-- amanda     italy        3
-- grace      italy        3
-- antonio    italy        3

However, this incurs a sort on the countryname column, which is a wasteful performance hog. I came up with this alternative, which uses row_number with the well-known trick for suppressing its sort:

select p.firstname, p.countryname, c.countryid
from persons p
join (
    select countryname,
           row_number() over (order by (select 1)) as countryid
    from persons
    group by countryname
) c on c.countryname = p.countryname

-- firstname  countryname  countryid
-- mark       germany      2
-- chris      france       1
-- grace      italy        3
-- antonio    italy        3
-- francis    france       1
-- amanda     italy        3

Am I right in assuming that the second query would perform better in general (not just on my contrived data set)? Are there factors that might make a difference either way (such as an index on countryname)? Is there a more elegant way of expressing it?

Why do you think that aggregation would be cheaper than a window function? I ask because I have some experience with both, and I don't have a strong opinion on the matter. If pressed, I would guess that the window function is faster, because it does not have to aggregate the data and then join the result back in.

The two queries will have very different execution paths. The right way to see which performs better is to try it out. Run both queries on large enough samples of data in your environment.
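One way to run that comparison in SQL Server is to switch on the session's timing and I/O statistics and execute both queries back to back. This is only a sketch: it assumes the persons table from the question has been loaded with a realistic volume of data, and the figures reported will vary with caching and hardware.

```sql
-- Report CPU/elapsed time and logical reads for each statement in this session.
set statistics time on;
set statistics io on;

-- Candidate 1: window function over the whole table.
select firstname, countryname,
       dense_rank() over (order by countryname) as countryid
from persons;

-- Candidate 2: number the distinct countries, then join back.
select p.firstname, p.countryname, c.countryid
from persons p
join (
    select countryname,
           row_number() over (order by (select 1)) as countryid
    from persons
    group by countryname
) c on c.countryname = p.countryname;

set statistics time off;
set statistics io off;
```

Comparing the actual execution plans of the two statements alongside these numbers shows where a sort, hash aggregate, or join accounts for the cost.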

By the way, I don't think there is a single right answer, because performance depends on several factors:

Which columns are indexed?
How big is the data? Does it fit in memory?
How many different countries are there?
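To probe the first of these factors, one could add an index on countryname and compare the plans again. The index name below is made up for illustration. With such an index the optimizer may be able to read the countries in order and skip an explicit sort, but that should be confirmed against the actual execution plan rather than assumed.

```sql
-- Hypothetical index name; firstname is included so the query
-- can be answered from the index without touching the base table.
create nonclustered index ix_persons_countryname
    on persons (countryname)
    include (firstname);
```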

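If you are concerned about performance and just want a unique number, you could consider using checksum() instead. This does run the risk of collisions, but that risk is very, very small for 200 or so countries. Plus, you can test for a collision and do something about it if one does occur. The query would be: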

select firstname, countryname, checksum(countryname) as countryid
from persons;
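The collision test mentioned above could look something like the following sketch. It groups the rows by checksum value and reports any value that is shared by more than one distinct country name. The column aliases are illustrative only.

```sql
-- Each row returned is a checksum value shared by two or more distinct country names.
select checksum(countryname)       as countryid,
       count(distinct countryname) as countries,
       min(countryname)            as example_a,
       max(countryname)            as example_b
from persons
group by checksum(countryname)
having count(distinct countryname) > 1;
```

An empty result means the checksum values are unique for the current data. The check would need to be rerun whenever new country names are added.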

sql sql-server tsql row-number dense-rank
