Thursday, 15 May 2014

grep - Perl - How to handle huge files for searching similar words -




I'm working with huge files. I would like to know, in your opinion, the best way to handle huge files when I want to know whether a word "x" in $file1 is present in a sentence "y" in $file2. The files have more than 20,000 lines.

Example:

This is the content of the first file:

    eat
    take
    breath
    alpha

This is the content of the second file:

    eat,hungry
    love,lovers
    me,mine
    take,taken,give
    you,u,yo
    fun,funny

This is the content I would expect in the third file:

    eat : eat,hungry
    take : take,taken,give
    : you,u,yo

So, as you can see, for each word of the first file I want to find the matching line in the second file.

My solutions (the loop never ends):

Solution 1:

    $file1 = "words.txt";
    $file2 = "expressions.txt";
    $out = "out.txt";
    open (W, "<", $file1);
    open (E, "<", $file2);
    open (OUT, ">", $out);
    while (defined($l = <W>)) {
        @a = split (/\n/, $l);
        push @w, @a;
    }
    while (defined($l2 = <E>)) {
        for ($i = 0; $i < @w; $i++) {
            if (grep /\Q\b$w[$i]\b\E/, $l2) { #or /\b$w[$i]\b/
                print OUT "$w[$i] : $l2\n";
            }
        }
    }
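The two regex variants mentioned in solution 1 do not behave the same way. As a small sketch (the variable names $quoted_all and $quoted_word are made up for this illustration): when \Q...\E wraps the whole pattern, the \b anchors are quoted as well, so the pattern looks for a literal backslash followed by "b" instead of a word boundary.

```perl
use strict;
use warnings;

my $word = 'eat';
my $line = 'eat,hungry';

# \Q...\E around the whole pattern also quotes the \b anchors,
# so this searches for the literal text "\beat\b" and does not match.
my $quoted_all = ($line =~ /\Q\b$word\b\E/) ? 1 : 0;

# Quoting only the interpolated word keeps \b as a word boundary.
my $quoted_word = ($line =~ /\b\Q$word\E\b/) ? 1 : 0;

print "whole pattern quoted: $quoted_all\n";
print "only the word quoted: $quoted_word\n";
```

If the words can contain regex metacharacters, quoting only the word, as in the second pattern, is probably the form that was intended.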

Solution 2:

    $file1 = "words.txt";
    $file2 = "expressions.txt";
    $out = "out.txt";
    open (W, "<", $file1);
    open (E, "<", $file2);
    open (OUT, ">", $out);
    while (defined($l = <W>)) {
        @a = split (/\n/, $l);
        push @w, @a;
    }
    while (defined($l2 = <E>)) {
        @b = split (/\n/, $l2);
        push @e, @b;
    }
    for ($k = 0; $k < @e; $k++) {
        for ($i = 0; $i < @w; $i++) {
            if (grep /\b$w[$i]\b/, $e[$k]) {
                # as posted; $e[$k] was probably intended instead of $w[$l]
                print OUT "$w[$i] : $w[$l]\n";
            }
        }
    }

How about processing the expressions file first to create a dictionary that maps every word to its sentence, and then checking whether each word in words.txt is in the dictionary? I guess this may be faster. Source code below:

    #! /opt/vrtsperl/bin/perl
    $words = "words.txt";
    $expressions = "expressions.txt";
    $out = "out.txt";
    open (E, "<", $expressions);
    open (W, "<", $words);
    open (OUT, ">", $out);
    my %dic;
    # Map every word of each expressions line to that whole line.
    while (my $sentence = <E>) {
        chomp($sentence);
        @words = split(/,/, $sentence);
        foreach $word (@words) {
            $dic{$word} .= "$sentence";
        }
    }
    # Look up each word from words.txt in the dictionary.
    while (my $word = <W>) {
        chomp($word);
        if ($dic{$word}) {
            print OUT "$word : $dic{$word}\n";
        }
    }
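The same dictionary idea can be sketched with strict and warnings enabled, using in-memory sample data from the question instead of files. The helper names build_index and match_words are hypothetical, chosen only for this illustration. Storing an array of lines per word, rather than appending to a string, is an assumption about what should happen when a word appears on more than one line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup table: each comma-separated word maps to the list of
# lines (groups) it appears in.
sub build_index {
    my @lines = @_;
    my %index;
    for my $line (@lines) {
        next unless length $line;    # skip empty lines
        push @{ $index{$_} }, $line for split /,/, $line;
    }
    return \%index;
}

# For each word, return one "word : group" string per matching group.
sub match_words {
    my ($index, @words) = @_;
    my @out;
    for my $word (@words) {
        next unless exists $index->{$word};
        push @out, "$word : $_" for @{ $index->{$word} };
    }
    return @out;
}

# Sample data taken from the question.
my $index = build_index(
    'eat,hungry', 'love,lovers', 'me,mine',
    'take,taken,give', 'you,u,yo', 'fun,funny',
);
print "$_\n" for match_words($index, qw(eat take breath alpha));
```

With this sample data the script prints "eat : eat,hungry" and "take : take,taken,give". Each line of both files is visited once and every lookup is a single hash access, which avoids running a regex for every pair of word and line.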

perl grep filehandle
