Parse a CSV file using regex in Java with '|' as separator
Can someone help me with a regex? I've got a Java program that reads in .csv files to load a database.
Currently it uses Pattern csvPattern = Pattern.compile("\\s*(\"[^\"]*\"|[^|]*)\\s*,?");
but when I use Matcher matcher = csvPattern.matcher(line);
to read the files line by line, I get null values. The files have the following format (many more lines, with commas in the text, and '|' as the separator and also at the end of each line):
Extract of the first file:
0|algeria|0| haggle. final deposits observe slyly agai|
1|argentina|1|al foxes promise slyly according regular accounts. bold requests alon|
2|brazil|1|y alongside of pending deposits. special packages ironic forges. slyly special |
Second:
|customer#000000001|ivhziaperb ot,c,e|15|25-989-741-2988|711.56|building|to even, regular platelets. regular, ironic epitaphs nag e|
2|customer#000000002|xstf4,ncwdvawne6tegvwfmrchlxak|13|23-768-687-3665|121.65|automobile|l accounts. blithely ironic theodolites integrate boldly: caref|
3|customer#000000003|mg9kdtd2wbhm|1|11-719-748-3364|7498.12|automobile| deposits eat slyly ironic, instructions. express foxes observe slyly. blithely accounts abov|
4|customer#000000004|xxvsjslagtn|4|14-128-190-5944|2866.83|machinery| requests. final, regular ideas sleep final accou|
Third:
5|supplier#000000005|gcdm2rjrzl5qltvzc|11|21-151-690-3663|-283.84|. slyly regular pinto bea|
6|supplier#000000006|tqxuvm7s7cnk|14|24-696-997-4969|1365.79|final accounts. regular dolphins utilize against furiously ironic decoys. |
7|supplier#000000007|s,4ticngb4uo6pasqnbuq|23|33-990-965-2201|6820.35|s unwind silently furiously regular courts. final requests deposits. requests wake quietly blit|
8|supplier#000000008|9sq4bbh2fqemafoocy45srtxo6yuog|17|27-498-742-3860|7627.85|al pinto beans. asymptotes haggl|
9|supplier#000000009|1khugzegwm3ua7dsymekybsk|10|20-403-398-8662|5302.37|s. unusual, requests along furiously regular pac|
Fourth:
1|2|3325|771.64|, theodolites. regular, final theodolites eat after pending foxes. furiously regular deposits sleep slyly. bold realms above ironic dependencies haggle careful|
1|2502|8076|993.49|ven ideas. packages print. pending multipliers must have fluff|
1|5002|3956|337.09|after fluffily ironic deposits? blithely special dependencies integrate furiously excuses. blithely silent theodolites have haggle pending, express requests; fu|
1|7502|4069|357.84|al, regular dependencies serve after final pinto beans. furiously deposits sleep final, silent pinto beans. fluffily reg|
Fifth:
1|155190|7706|1|17|21168.23|0.04|0.02|n|o|1996-03-13|1996-02-12|1996-03-22|deliver in person|truck|egular courts above the|
1|67310|7311|2|36|45983.16|0.09|0.06|n|o|1996-04-12|1996-02-28|1996-04-20|take return|mail|ly final dependencies: slyly bold |
Sixth:
134823|saddle midnight thistle honeydew lime|manufacturer#4|brand#43|standard burnished brass|44|wrap can|1857.82|ges. furiously ir|
134824|coral reddish indian thistle sandy|manufacturer#5|brand#55|promo burnished copper|29|lg jar|1858.82|final p|
134825|saddle violet orchid cornsilk medium|manufacturer#4|brand#44|promo polished nickel|21|lg case|1859.82|nal accounts us|
134826|turquoise sky lime cornsilk peach|manufacturer#1|brand#11|small burnished tin|25|sm can|1860.82| haggle|
Seventh:
0|africa|lar deposits. blithely final packages cajole. regular waters final requests. regular accounts according |
1|america|hs utilize ironic, requests. s|
Eighth:
4|136777|o|32151.78|1995-10-11|5-low|clerk#000000124|0|sits. slyly regular warthogs cajole. regular, regular theodolites acro|
5|44485|f|144659.20|1994-07-30|5-low|clerk#000000925|0|quickly. bold deposits sleep slyly. packages utilize slyly|
(The CSV files were created using the dbgen tool from TPC for TPC-H, in case you wonder.)
I hope you understand what I need and can help me out with this. Thanks very much!
Edit: Using String.split("|"); sure seems obvious, but the thing is, the program I'm working with is quite complex and relies on regex.Pattern and regex.Matcher in various places. Since I'm not familiar with the program or with Java itself, the best solution for me would be to use the given code and replace the regular expression with one that works for me.
Edit 2: The thing I'm trying to use is the TPC-H implementation of OLTP-Bench: https://github.com/ben-reilly/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/tpchloader.java#l347
where the problematic line is 347. It's a complete implementation of the TPC-H database benchmark, but without the data generator. I use the dbgen tool provided by TPC to generate the CSV files. I can't get in contact with the developer, sadly.
Given the source, you should just replace the comma with the pipe, since according to the comments, the pattern is meant to split the string on the delimiter (except the ones inside double quotes).
E.g., from
\\s*(\"[^\"]*\"|[^,]*)\\s*,?
to
\\s*(\"[^\"]*\"|[^|]*)\\s*\\|?
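The adjusted pattern can be tried in isolation on one of the sample lines from the question. The following is only a minimal sketch: the class name PipeCsvDemo and the helper splitLine are made up for illustration and are not part of OLTP-Bench, and the loop here simply stops at the empty match the pattern produces at the end of the line, which may differ from how the loader iterates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeCsvDemo {
    // The pattern from the answer: '|' as the delimiter instead of ','.
    static final Pattern CSV_PATTERN =
            Pattern.compile("\\s*(\"[^\"]*\"|[^|]*)\\s*\\|?");

    // Hypothetical helper: collects group(1) of every match on one line.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = CSV_PATTERN.matcher(line);
        while (matcher.find()) {
            // The pattern also matches the empty string at the very end
            // of the input; stop there instead of adding a bogus field.
            if (matcher.start() == line.length()) {
                break;
            }
            fields.add(matcher.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "0|algeria|0| haggle. final deposits observe slyly agai|";
        for (String field : splitLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Commas inside a field are left alone because only '|' ends a field, and a field wrapped in double quotes is returned with its quotes, since the quotes are inside the capturing group. Leading whitespace of a field is skipped by the initial \\s*.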
As for the number exception, you need to debug the way you're calling the CSV loader.
I've never used this tool before, but if you look at line 352:
for (int i = 0; i < types.length; ++i) {
Now look at the switch block that starts at line 362: it defines the type each field should be cast to.
switch(types[i]) { case DOUBLE: prepStmt.setDouble(i+1, Double.parseDouble(field)); break; ...
This type of conversion is going to cause issues if you don't specify the types correctly.
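To see why the declared types matter, here is a small stand-alone sketch of the same idea. The enum ColumnType and the methods convert and failsToConvert are hypothetical names chosen for this example; they are not taken from the loader, which works on a PreparedStatement instead of returning objects.

```java
public class FieldCastDemo {
    // Hypothetical column types, standing in for the loader's own type list.
    enum ColumnType { INTEGER, DOUBLE, STRING }

    // Converts one text field according to its declared column type.
    static Object convert(String field, ColumnType type) {
        switch (type) {
            case INTEGER:
                return Integer.parseInt(field);
            case DOUBLE:
                return Double.parseDouble(field);
            default:
                return field;
        }
    }

    // Returns true if the field cannot be parsed as the declared type.
    static boolean failsToConvert(String field, ColumnType type) {
        try {
            convert(field, type);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("711.56", ColumnType.DOUBLE));
        // A text column declared as numeric, or an empty field, fails:
        System.out.println(failsToConvert("algeria", ColumnType.INTEGER));
        System.out.println(failsToConvert("", ColumnType.DOUBLE));
    }
}
```

Both Integer.parseInt and Double.parseDouble throw NumberFormatException on text that is not a number, including the empty string, so a type list that does not line up with the columns of the file, or a pattern that yields empty fields, will surface as that exception.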
java regex parsing csv