Breeding: grep - Perl - How to handle huge files for searching similar words -

Thursday, 15 May 2014


I'm working with huge files. I'd like to know, in your opinion, the best way to handle huge files when I need to find out whether a word "x" in $file1 is present in a sentence "y" in $file2. The files have more than 20,000 lines.

Example:

This is the content of the first file:

    eat
    take
    you
    breath
    alpha

This is the content of the second file:

    eat,hungry
    love,lovers
    me,mine
    take,taken,give
    you,u,yo
    fun,funny

This is the content I expect in the third file:

    eat : eat,hungry
    take : take,taken,give
    you : you,u,yo

So, as you can see, each word of the first file is looked up in the second file to find the matching line.

My solutions (the loop never ends):

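Solution 1:

    $file1 = "words.txt";
    $file2 = "expressions.txt";
    $out   = "out.txt";

    open (W, "<", $file1)  or die "Cannot open $file1: $!";
    open (E, "<", $file2)  or die "Cannot open $file2: $!";
    open (OUT, ">", $out)  or die "Cannot open $out: $!";

    # Read every word of the first file into @w.
    while (defined($l = <W>)) {
        @a = split (/\n/, $l);
        push @w, @a;
    }

    # For each expression line, try every word: one regex match per (word, line) pair.
    while (defined($l2 = <E>)) {
        chomp($l2);
        for ($i = 0; $i < @w; $i++) {
            # \Q...\E wraps only the word; if it also wraps \b, the anchors are quoted
            if (grep /\b\Q$w[$i]\E\b/, $l2) {   # or /\b$w[$i]\b/
                print OUT "$w[$i] : $l2\n";
            }
        }
    }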

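Solution 2:

    $file1 = "words.txt";
    $file2 = "expressions.txt";
    $out   = "out.txt";

    open (W, "<", $file1)  or die "Cannot open $file1: $!";
    open (E, "<", $file2)  or die "Cannot open $file2: $!";
    open (OUT, ">", $out)  or die "Cannot open $out: $!";

    # Load both files into arrays first.
    while (defined($l = <W>)) {
        @a = split (/\n/, $l);
        push @w, @a;
    }

    while (defined($l2 = <E>)) {
        @b = split (/\n/, $l2);
        push @e, @b;
    }

    # Compare every expression line with every word.
    for ($k = 0; $k < @e; $k++) {
        for ($i = 0; $i < @w; $i++) {
            if (grep /\b$w[$i]\b/, $e[$k]) {
                print OUT "$w[$i] : $e[$k]\n";   # print the matching line, not $w[$l]
            }
        }
    }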
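Both attempts run one regex match for every word against every line. With more than 20,000 lines in each file that is on the order of 400 million matches, which may be why the loop seems never to finish. A minimal sketch of one alternative, assuming the word list fits in memory: join all words into a single precompiled alternation and scan each expression line only once. The helper name `match_lines` and the in-memory arrays are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "word : line" strings for each listed word found in a line.
sub match_lines {
    my ($words, $lines) = @_;
    return () unless @$words;

    # Build one pattern for all words; quotemeta escapes regex metacharacters.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @$words;
    my $re = qr/\b($alternation)\b/;

    my @out;
    for my $line (@$lines) {
        my %seen;    # report each word at most once per line
        while ($line =~ /$re/g) {
            push @out, "$1 : $line" unless $seen{$1}++;
        }
    }
    return @out;
}

my @words = qw(eat take breath alpha);
my @lines = ('eat,hungry', 'love,lovers', 'me,mine', 'take,taken,give');
print "$_\n" for match_lines(\@words, \@lines);
```

The word-boundary anchors keep `take` from matching inside `taken`, as in the original attempts.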

How about processing the lookup file first to create a dictionary that maps every word to its sentence, and then checking whether each word in words.txt is in the dictionary? I guess this may be faster. Source code below:

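    #! /opt/vrtsperl/bin/perl
    use strict;
    use warnings;

    my $words       = "words.txt";
    my $expressions = "expressions.txt";
    my $out         = "out.txt";

    open (E, "<", $expressions) or die "Cannot open $expressions: $!";
    open (W, "<", $words)       or die "Cannot open $words: $!";
    open (OUT, ">", $out)       or die "Cannot open $out: $!";

    my %dic;    # word => sentence(s) containing that word

    # Pass 1: index every comma-separated word of each sentence.
    while (my $sentence = <E>) {
        chomp($sentence);
        my @words = split(/,/, $sentence);
        foreach my $word (@words) {
            $dic{$word} .= "$sentence";
        }
    }

    # Pass 2: one hash lookup per word instead of a scan over the whole file.
    while (my $word = <W>) {
        chomp($word);
        if ($dic{$word}) {
            print OUT "$word : $dic{$word}\n";
        }
    }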
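One thing to watch in the dictionary code above: `.=` appends sentences without a separator, so a word that occurs in several sentences gets them run together. A minimal sketch of a variant that keeps a list of sentences per word instead, assuming the expressions fit in memory. The helper name `build_index` and the extra sample line `give,gift` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: map each comma-separated word to the list of sentences containing it.
sub build_index {
    my (@sentences) = @_;
    my %index;
    for my $sentence (@sentences) {
        for my $word (split /,/, $sentence) {
            push @{ $index{$word} }, $sentence;
        }
    }
    return \%index;
}

my $index = build_index('eat,hungry', 'take,taken,give', 'give,gift');
for my $word (qw(eat give alpha)) {
    next unless exists $index->{$word};    # skip words that are not in the index
    print "$word : ", join(' | ', @{ $index->{$word} }), "\n";
}
```

Each lookup is still a single hash access, so the cost stays roughly proportional to the combined size of the two files.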

perl grep filehandle
