hadoop - Hive output to s3 with comma separated values and a .csv or .txt file format. An alternative like using sqoop to export from hive to s3 will also work
I've been trying to get Hive output onto S3. I have been successful at it, but the resultant output is not comma separated; it uses a delimiter such as ^A, I suppose. I had worked on using Sqoop to import and export data between S3 and psql, but I haven't been able to do the same with Hive, so any working solution would help.
What I have tried doing:
set hive.io.output.fileformat=csvtextfile;
insert overwrite directory "s3n://akshayhazari/results" select * from books;

This runs and gives:
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1403776308919_0011, Tracking URL = http://localhost:8088/proxy/application_1403776308919_0011/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1403776308919_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-26 16:51:07,188 Stage-1 map = 0%, reduce = 0%
2014-06-26 16:51:29,868 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1403776308919_0011
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: s3n://akshayhazari/tmp/hive-hduser/hive_2014-06-26_16-50-41_646_3052840892739735120-1/-ext-10000
Moving data to: s3n://akshayhazari/results
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.95 sec   HDFS Read: 188 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 950 msec
OK
Time taken: 55.726 seconds

The resulting file (such as 000000_0) is unreadable after downloading; converting it to txt gives me a ^A delimited file. I want the output to be a csv or txt file straight away, with comma or tab separated values. Even if I were only able to use the insert overwrite directory syntax to produce that locally, it would be of great help to then make it work on S3.
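For what it's worth, the ^A is just Hive's default field delimiter (\001), so as a crude last resort I could convert a downloaded part file by hand, roughly like this (only a sketch, assuming the part file is uncompressed plain text and no field value itself contains a comma):

hadoop fs -get s3n://akshayhazari/results/000000_0 part_raw    # fetch the part file locally
tr '\001' ',' < part_raw > results.csv                         # replace the ^A (\001) delimiter with commas

But I would much rather have Hive write the delimiter I want in the first place.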
Adding a detail to the original question (the question itself still remains the same): I figured I have to produce gzipped output on S3 to minimize S3 usage, since Hive also puts temp files on S3. To optimize usage I did this:
hive> set hive.exec.compress.output=true;
hive> set io.seqfile.compression.type=BLOCK;
hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
hive> insert overwrite directory "books" select * from books;

This is the output in HDFS:
hduser@akshay:~$ hadoop fs -ls books
Found 1 items
-rw-r--r--   1 hduser supergroup        161 2014-06-27 11:45 books/000000_0.gz

Then I use this to copy the stuff to S3:
hadoop fs -cp books/000000_0.gz s3n://akshayhazari/results

The output is not text or csv; it is unreadable, and the delimiters are unreadable too. Is there a workaround in Hive, or do I have to create a script to fix the file and the delimiters?
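To at least peek at what ended up in the gzipped file I can do something like the following (a sketch; hadoop fs -text decompresses known codecs such as gzip, though the fields are of course still ^A separated):

hadoop fs -text books/000000_0.gz | head    # decompress and show the first few rows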
Any help is appreciated.
Depending on the version of Hive you're using, you may be able to do:
insert overwrite directory 's3n://akshayhazari/results'
row format delimited fields terminated by ','
select * from books;

I think this was added in Hive 0.11 or so.
Edit: it turns out the above only works for local directories.
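If going through a local directory first is acceptable, a rough sketch of that route would be the following (the local path and the final copy step are just placeholders for wherever you want things to land):

insert overwrite local directory '/tmp/results'
row format delimited fields terminated by ','
select * from books;

-- then, from a shell, push the part file(s) up to S3:
-- hadoop fs -put /tmp/results/000000_0 s3n://akshayhazari/results/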
To write directly to S3 instead, you can do:
create external table tmp_table (cols...)
row format delimited fields terminated by ','
location 's3n://akshayhazari/results';

insert into table tmp_table select * from books;
drop table tmp_table;
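For a concrete sketch, supposing books had a schema like (id INT, title STRING, author STRING) (the column list here is made up, so substitute your real one), it would look like this:

create external table tmp_table (id INT, title STRING, author STRING)
row format delimited fields terminated by ','
location 's3n://akshayhazari/results';

insert into table tmp_table select * from books;
drop table tmp_table;   -- external table, so the comma-separated files stay on S3

Since the table is external, dropping it only removes the metadata; the CSV data remains at the S3 location.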
To do pretty much the same thing without specifying the columns explicitly, you can try:

create table tmp_table
row format delimited fields terminated by ','
location 's3n://akshayhazari/results'
as select * from books;

alter table tmp_table set tblproperties('external'='true');
drop table tmp_table;

Create-table-as-select has a restriction that it cannot create an external table directly, but I think you should be able to mark it external after the fact and then drop it. Marking it external first matters because dropping a managed table would delete the data at its location, while dropping an external one leaves the results on S3.
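As a quick sanity check afterwards (just a sketch; the part file names may differ), you can dump the first lines of the result straight from S3 and confirm they are comma separated:

hadoop fs -cat 's3n://akshayhazari/results/*' | head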
Tags: csv, hadoop, amazon-s3, hive