Monday, 15 September 2014

normalization - Normalizing data in Redshift -

I've started using Redshift to house millions of data points. The schema looks like the following:

create table metrics (
  name  varchar(100),
  value decimal(18,4),
  time  timestamp
)
sortkey (name, time);

(The real schema is a bit more complex, but this should be enough for the question.)

I'm wondering whether it makes sense to normalize the metric name (currently a varchar(100)) by mapping it to an integer and storing the integer instead (e.g. {id: 1, name: metric1}). The cardinality of name is ~100. Adding the mapping would make the application logic quite a bit more complex, since it has many streams of input. Also, querying would require doing the reverse mapping ahead of time. A sketch of what that layout might look like is shown below.
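For illustration, a normalized layout along those lines might look roughly like this (table and column names here are hypothetical, not from the original schema):

-- Small dimension table holding the ~100 distinct metric names.
create table metric_names (
  id   smallint     not null,
  name varchar(100) not null
);

-- Fact table now stores only the integer id.
create table metrics (
  metric_id smallint,
  value     decimal(18,4),
  time      timestamp
)
sortkey (metric_id, time);

-- Every query on a metric name then needs the reverse mapping via a join:
select m.time, m.value
from metrics m
join metric_names n on n.id = m.metric_id
where n.name = 'metric1';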

In a traditional SQL database the answer would obviously be yes, but I'm not sure how Redshift handles this with its columnar data store. I think it would be nice to have in general, but I'd assume Redshift would/could do a similar mapping under the hood, since some columns in the table have much lower cardinality than others.

The answer is no. Redshift makes first-class use of compression and will store very few duplicates of the name field on disk.

However, you do need to make sure you are taking advantage of Redshift's compression options. This section of the docs should tell you what you need to know: http://docs.aws.amazon.com/redshift/latest/dg/t_compressing_data_on_disk.html

tl;dr: run ANALYZE COMPRESSION on the table to see which encodings Redshift recommends, create a new table using those encodings, and insert the data into that table.
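As a rough sketch of that workflow (the encodings and the new table name below are placeholders; use whatever ANALYZE COMPRESSION actually reports for your data):

-- 1. Ask Redshift what it recommends for the existing table.
analyze compression metrics;

-- 2. Create a new table with the recommended encodings (examples only).
create table metrics_compressed (
  name  varchar(100)  encode bytedict,
  value decimal(18,4) encode delta32k,
  time  timestamp     encode delta
)
sortkey (name, time);

-- 3. Copy the data across, then swap the tables.
insert into metrics_compressed
select name, value, time from metrics;

-- drop table metrics;
-- alter table metrics_compressed rename to metrics;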

normalization amazon-redshift
