hadoop - Hive output to s3 with comma separated values and a .csv or .txt file format. An alternative like using sqoop to export from hive to s3 will also work
I've been trying to get Hive output onto S3. I have been successful at it, but the resultant output is not comma separated; it uses a delimiter such as ^A, I suppose. I had worked on using Sqoop to import and export data between S3 and psql, but I haven't been able to do the same with Hive, so any working solution would help.
What I have tried doing:
set hive.io.output.fileformat=csvtextfile;
insert overwrite directory "s3n://akshayhazari/results" select * from books;

This runs and gives:
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1403776308919_0011, Tracking URL = http://localhost:8088/proxy/application_1403776308919_0011/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1403776308919_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-26 16:51:07,188 Stage-1 map = 0%, reduce = 0%
2014-06-26 16:51:29,868 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1403776308919_0011
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: s3n://akshayhazari/tmp/hive-hduser/hive_2014-06-26_16-50-41_646_3052840892739735120-1/-ext-10000
Moving data to: s3n://akshayhazari/results
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.95 sec   HDFS Read: 188 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 950 msec
OK
Time taken: 55.726 seconds

The resulting file (such as 000000_0) is unreadable after downloading; converting it to txt gives me a ^A delimited file. I want the output to be a csv or txt file straight away, with comma or tab separated values. Even if I were only able to use the insert overwrite directory syntax to produce that locally, it would be of great help to then make it work on S3.
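For what it's worth, the ^A is just Hive's default field delimiter (\001), so as a crude last resort I could convert a downloaded part file by hand, roughly like this (only a sketch, assuming the part file is uncompressed plain text and no field value itself contains a comma):

hadoop fs -get s3n://akshayhazari/results/000000_0 part_raw    # fetch the part file locally
tr '\001' ',' < part_raw > results.csv                         # replace the ^A (\001) delimiter with commas

But I would much rather have Hive write the delimiter I want in the first place.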
Adding a detail to the original question (the question itself still remains the same): I figured I have to produce gzipped output on S3 to minimize S3 usage, since Hive also puts temp files on S3. To optimize usage I did this:
hive> set hive.exec.compress.output=true;
hive> set io.seqfile.compression.type=BLOCK;
hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
hive> insert overwrite directory "books" select * from books;

This is the output in HDFS:
hduser@akshay:~$ hadoop fs -ls books
Found 1 items
-rw-r--r--   1 hduser supergroup        161 2014-06-27 11:45 books/000000_0.gz

Then I use this to copy the stuff to S3:
hadoop fs -cp books/000000_0.gz s3n://akshayhazari/results

The output is not text or csv; it is unreadable, and the delimiters are unreadable too. Is there a workaround in Hive, or do I have to create a script to fix the file and the delimiters?
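To at least peek at what ended up in the gzipped file I can do something like the following (a sketch; hadoop fs -text decompresses known codecs such as gzip, though the fields are of course still ^A separated):

hadoop fs -text books/000000_0.gz | head    # decompress and show the first few rows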
Any help is appreciated.
Depending on the version of Hive you're using, you may be able to do:
insert overwrite directory 's3n://akshayhazari/results'
row format delimited fields terminated by ','
select * from books;

I think this was added in Hive 0.11 or so.
Edit: it turns out the above only works for local directories.
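If going through a local directory first is acceptable, a rough sketch of that route would be the following (the local path and the final copy step are just placeholders for wherever you want things to land):

insert overwrite local directory '/tmp/results'
row format delimited fields terminated by ','
select * from books;

-- then, from a shell, push the part file(s) up to S3:
-- hadoop fs -put /tmp/results/000000_0 s3n://akshayhazari/results/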
To write directly to S3 instead, you can do:
create external table tmp_table (cols...)
row format delimited fields terminated by ','
location 's3n://akshayhazari/results';

insert into table tmp_table select * from books;
drop table tmp_table;
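For a concrete sketch, supposing books had a schema like (id INT, title STRING, author STRING) (the column list here is made up, so substitute your real one), it would look like this:

create external table tmp_table (id INT, title STRING, author STRING)
row format delimited fields terminated by ','
location 's3n://akshayhazari/results';

insert into table tmp_table select * from books;
drop table tmp_table;   -- external table, so the comma-separated files stay on S3

Since the table is external, dropping it only removes the metadata; the CSV data remains at the S3 location.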
To do pretty much the same thing without specifying the columns explicitly, you can try:

create table tmp_table
row format delimited fields terminated by ','
location 's3n://akshayhazari/results'
as select * from books;

alter table tmp_table set tblproperties('external'='true');
drop table tmp_table;

Create-table-as-select has a restriction that it cannot create an external table directly, but I think you should be able to mark it external after the fact and then drop it. Marking it external first matters because dropping a managed table would delete the data at its location, while dropping an external one leaves the results on S3.
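As a quick sanity check afterwards (just a sketch; the part file names may differ), you can dump the first lines of the result straight from S3 and confirm they are comma separated:

hadoop fs -cat 's3n://akshayhazari/results/*' | head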
Tags: csv, hadoop, amazon-s3, hive