Apache Pig - Bulk loading JSON to HBase using Pig
Hi, I'm looking for a way to load a large number of JSON documents, one per line.
Each line has the format:

'{"id": "id123", "c1": "v1", "c2": "v2", "c3": "v3"...}'

Each JSON document can have an unknown number of fields. Is there a way to do this in Pig? I want to load the fields as separate columns in HBase.
You'll want to use a UDF, for instance one written in Python, like the one provided below.
UDF:

from com.xhaus.jyson import JysonCodec as json
from com.xhaus.jyson import JSONDecodeError

@outputSchema("rels:{t:('{id:chararray, c1:chararray, c2:chararray, c3:chararray...}')}")
def parse_json(line):
    try:
        parsed_json = json.loads(line)
    except:
        return None
    return tuple(parsed_json.values())
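As a rough illustration of what the UDF does, here is the same parse-and-flatten logic in plain CPython (the UDF above runs under Jython and uses Jyson, but the behaviour is analogous; the sample line is hypothetical):

```python
import json  # CPython stdlib; the Jython UDF above uses Jyson instead


def parse_json(line):
    """Return the document's values as a tuple, or None for bad input."""
    try:
        parsed = json.loads(line)
    except ValueError:  # malformed line: return None so Pig can drop it
        return None
    return tuple(parsed.values())


line = '{"id": "id123", "c1": "v1", "c2": "v2", "c3": "v3"}'
print(parse_json(line))        # -> ('id123', 'v1', 'v2', 'v3')
print(parse_json('not json'))  # -> None
```

Note that returning a tuple of values relies on the fields appearing in a consistent order in every document, which is why the schema annotation lists them positionally.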
Pig:

register 'path-to-udf.py' using jython as py_udf;

raw_data = load 'path-to-your-data' using PigStorage('\n') as (line:chararray);

-- parse lines using the UDF
parsed_data = foreach raw_data generate flatten(py_udf.parse_json(line));
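Since each document can have an unknown set of fields, a fixed tuple schema is fragile; one common alternative (an assumption on my part, not part of the answer above) is to split each document into a row key plus a map of column-qualifier/value pairs, which is the shape an HBase row takes. A minimal Python sketch, with hypothetical field and column-family names:

```python
import json


def to_hbase_row(line, rowkey_field="id", family="d"):
    """Split one JSON document into (rowkey, {'family:qualifier': value})."""
    doc = json.loads(line)
    rowkey = doc.pop(rowkey_field)  # row key taken from the 'id' field (assumed)
    # every remaining field becomes a cell in the chosen column family
    cells = {family + ":" + k: v for k, v in doc.items()}
    return rowkey, cells


rowkey, cells = to_hbase_row('{"id": "id123", "c1": "v1", "c2": "v2"}')
print(rowkey)  # -> id123
print(cells)   # -> {'d:c1': 'v1', 'd:c2': 'v2'}
```

In Pig terms this would correspond to having the UDF emit a chararray row key plus a map, which HBaseStorage can then write as individual columns.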