Wednesday, 15 May 2013

apache pig - Bulk loading JSON to HBase using Pig




Hi, I'm looking for a way to load a large number of JSON documents, one per line.

Each line has the format:

    '{"id": "id123", "c1": "v1", "c2": "v2", "c3": "v3", ...}'

Each JSON document can have an unknown number of fields. Is there a way to do this in Pig? I want to load the fields into separate columns in HBase.

You'll want to use a UDF, for instance one written in Python (Jython):

UDF:

    from com.xhaus.jyson import JysonCodec as json
    from com.xhaus.jyson import JSONDecodeError

    @outputSchema("rels:{t:(id:chararray, c1:chararray, c2:chararray, c3:chararray, ...)}")
    def parse_json(line):
        try:
            parsed_json = json.loads(line)
        except JSONDecodeError:
            return None
        return tuple(parsed_json.values())
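The UDF above runs under Jython inside Pig, which makes it awkward to test. As a sketch, the same parsing logic can be checked outside Pig with CPython's standard json module (the sample line below is made up for illustration):

```python
import json

def parse_json(line):
    """Parse one JSON document into a tuple of its values.

    Mirrors the Jython UDF's logic: malformed lines yield None
    instead of raising, so bad records are skipped rather than
    failing the job.
    """
    try:
        parsed = json.loads(line)
    except ValueError:  # covers json.JSONDecodeError
        return None
    return tuple(parsed.values())

line = '{"id": "id123", "c1": "v1", "c2": "v2", "c3": "v3"}'
print(parse_json(line))        # ('id123', 'v1', 'v2', 'v3')
print(parse_json('not json'))  # None
```

One caveat worth noting: tuple(parsed.values()) relies on the order of keys in the dict. Modern Python preserves the document's key order, but if the JSON documents list their fields in varying order, the values will land in varying column positions.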

Pig:

    REGISTER 'path-to-udf.py' USING jython AS py_udf;

    raw_data = LOAD 'path-to-your-data'
               USING PigStorage('\n')
               AS (line:chararray);

    -- Parse the lines using the UDF
    parsed_data = FOREACH raw_data GENERATE FLATTEN(py_udf.parse_json(line));
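The script above stops at parsing. To actually write the parsed tuples into HBase, the relation can then be stored with HBaseStorage; this is a sketch where the table name 'mytable' and column family 'cf' are hypothetical, and HBaseStorage uses the first field of each tuple (here, id) as the row key:

```
-- Hypothetical table and column-family names; adjust to your schema.
STORE parsed_data INTO 'hbase://mytable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:c1 cf:c2 cf:c3');
```

Note this only works cleanly when the set of columns is fixed; truly variable fields would need a map-typed column family (e.g. 'cf:*') or per-schema handling.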

Tags: hbase, apache-pig, hbasestorage
