grep - Perl - How to handle huge files for searching similar words
I'm working with huge files. I would like to know what is, in your opinion, the best way to handle huge files when I need to know whether a word "x" in file1 is present in a sentence "y" in file2. The files have more than 20000 lines.
Example:
This is the content of the first file:
    eat
    take
    breath
    alpha
This is the content of the second file:
    eat,hungry
    love,lovers
    me,mine
    take,taken,give
    you,u,yo
    fun,funny
This is the content I would expect in the third file:
    eat : eat,hungry
    take : take,taken,give
    : you,u,yo
So, as you can see, I want to find the matches by searching the second file for each word of the first file.
My solutions (the loop never ends):
Solution 1:
    $file1 = "words.txt";
    $file2 = "expressions.txt";
    $out = "out.txt";
    open (W, "<", $file1);
    open (E, "<", $file2);
    open (OUT, ">", $out);
    # Read every word into @w
    while (defined($l = <W>)) {
        @a = split (/\n/, $l);
        push @w, @a;
    }
    # For each expression line, test every word against it
    while (defined($l2 = <E>)) {
        for ($i = 0; $i < @w; $i++) {
            if (grep /\b\Q$w[$i]\E\b/, $l2) {   # or /\b$w[$i]\b/
                print OUT "$w[$i] : $l2\n";
            }
        }
    }
Solution 2:
    $file1 = "words.txt";
    $file2 = "expressions.txt";
    $out = "out.txt";
    open (W, "<", $file1);
    open (E, "<", $file2);
    open (OUT, ">", $out);
    # Read every word into @w
    while (defined($l = <W>)) {
        @a = split (/\n/, $l);
        push @w, @a;
    }
    # Read every expression line into @e
    while (defined($l2 = <E>)) {
        @b = split (/\n/, $l2);
        push @e, @b;
    }
    # Compare every expression line against every word
    for ($k = 0; $k < @e; $k++) {
        for ($i = 0; $i < @w; $i++) {
            if (grep /\b$w[$i]\b/, $e[$k]) {
                print OUT "$w[$i] : $e[$k]\n";
            }
        }
    }
How about processing the search file first to create a dictionary that maps every word to its sentence, and then checking whether each word in words.txt is in the dictionary? I guess it may be faster. Source code below:
    #! /opt/vrtsperl/bin/perl
    $words = "words.txt";
    $expressions = "expressions.txt";
    $out = "out.txt";
    open (E, "<", $expressions);
    open (W, "<", $words);
    open (OUT, ">", $out);
    # Map every word to the expression line it appears in
    my %dic;
    while (my $sentence = <E>) {
        chomp($sentence);
        my @words = split(/,/, $sentence);
        foreach my $word (@words) {
            $dic{$word} .= "$sentence";
        }
    }
    # One hash lookup per word instead of a scan over every line
    while (my $word = <W>) {
        chomp($word);
        if ($dic{$word}) {
            print OUT "$word : $dic{$word}\n";
        }
    }
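For what it's worth, here is a minimal in-memory sketch of the same dictionary idea, using the sample data from the question so it can be run without any files. The sub names build_index and match_words are made up for this illustration, and joining several matching lines with " | " is my own assumption about what the output should look like when a word appears on more than one line. Note that it matches whole comma-separated tokens exactly, which is not the same as the \b regex match in the question.

```perl
use strict;
use warnings;

# Build a lookup table from expression lines (comma-separated words).
# Each word maps to a list of every line it appears in.
# (build_index is a hypothetical helper name, not from the original post.)
sub build_index {
    my (@lines) = @_;
    my %index;
    for my $raw (@lines) {
        # Strip a trailing newline on a copy, leaving the caller's data alone.
        (my $line = $raw) =~ s/\r?\n\z//;
        next unless length $line;
        push @{ $index{$_} }, $line for split /,/, $line;
    }
    return \%index;
}

# Return one "word : line" string for each word found in the index.
# Several matching lines are joined with " | " (an assumption).
sub match_words {
    my ($index, @words) = @_;
    my @out;
    for my $word (@words) {
        next unless exists $index->{$word};
        push @out, "$word : " . join(' | ', @{ $index->{$word} });
    }
    return @out;
}

# Sample data taken from the question.
my @expressions = ('eat,hungry', 'love,lovers', 'me,mine',
                   'take,taken,give', 'you,u,yo', 'fun,funny');
my @words = qw(eat take breath alpha);

my $index = build_index(@expressions);
print "$_\n" for match_words($index, @words);
```

With real files you would pass the lines read from expressions.txt to build_index and the chomped lines of words.txt to match_words. The point of the design is that the expressions file is read once and each word then costs a single hash lookup, instead of every word being tested against every line.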
perl grep filehandle