Monday, June 16, 2008

Recursive search using "grep"

Typically, the answer is to use find, xargs, and grep. That's
horribly slow for a full filesystem search, and it's painfully
difficult to construct a pipeline that skips binaries when you
want it to, doesn't get stuck on named pipes, and doesn't blow up
on funky filenames (beginning with -, or containing spaces,
punctuation, etc.). There are ways around all these things, but
they are all ugly.

BTW, something that almost never gets mentioned, but that I
frequently use when conditions suit it, is a simple:


grep pattern * */* */*/* 2>/dev/null

It only reaches as deep as the globs you type (three levels here),
and may not even be good at that except from certain starting
points, but it's faster than any find and xargs pipeline can ever
be if the set of files is small enough.


The simplistic approach using find is





find /whereveryouwantostart -exec grep whatever {} /dev/null \;
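
If your find supports the POSIX "{} +" terminator, the same idea
can be sketched with far fewer grep processes. Treat this as a
rough sketch, not a definitive recipe: the function name rgrep_exec
is made up for illustration, and -type f is my own addition to keep
grep away from directories and named pipes.

```shell
# rgrep_exec PATTERN [DIR]
# Hypothetical helper (the name is not from any standard tool).
rgrep_exec() {
    pattern=$1
    dir=${2:-.}
    # -type f    : only regular files, so named pipes and devices are skipped
    # -e PATTERN : protects patterns that begin with "-"
    # /dev/null  : an extra file argument so grep always prints filenames
    # {} +       : hand many filenames to each grep instead of one per process
    find "$dir" -type f -exec grep -e "$pattern" /dev/null {} +
}
```

Something like "rgrep_exec whatever /whereveryouwantostart" would
then print matches as "filename:line".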







That runs one grep process per file, which is not very efficient.
Using xargs can help:





find . | xargs grep whatever







But it misbehaves on awkward filenames: xargs splits names on
whitespace and quotes, and a name that begins with "-" can be
taken as a grep option. Fixing that can be a little nasty.
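
One way around the filename problem, assuming your find and xargs
support the -print0 and -0 options (GNU and BSD versions do; older
systems may not), is to separate names with NUL bytes. The function
name rgrep_safe below is hypothetical, and this is only a sketch of
the idea.

```shell
# rgrep_safe PATTERN [DIR]
# Hypothetical helper; assumes find -print0 and xargs -0 are available.
rgrep_safe() {
    pattern=$1
    dir=${2:-.}
    # -print0 / -0 : NUL-separated names, so spaces, quotes and newlines
    #                inside filenames reach grep intact
    # -e PATTERN   : protects patterns that begin with "-"
    # --           : ends option parsing, so no filename is read as an option
    # /dev/null    : extra file argument so grep always prints filenames
    find "$dir" -type f -print0 | xargs -0 grep -e "$pattern" -- /dev/null
}
```

Note that xargs reports a non-zero exit status when any grep it ran
found nothing, so don't read too much into the return code.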


You may not want to grep binary files:





find . -type f -print | xargs file | grep -i text | cut -f1 -d: | xargs grep whatever







That's pretty awful, but it's what you have to get into if you
have special cases. Special cases are what make this question
difficult. If you have a small number of files and subdirs to
search, the simple approach may work fine for you. If not, you have
to get more creative.
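
If your grep has the -I option (GNU and BSD grep do; it is not in
POSIX), a shorter way to skip binaries might look like the sketch
below. The name rgrep_text is hypothetical, and grep's idea of
"binary" is a heuristic that will not always agree with what the
file command reports.

```shell
# rgrep_text PATTERN [DIR]
# Hypothetical helper; assumes a grep that understands -I.
rgrep_text() {
    pattern=$1
    dir=${2:-.}
    # -I         : treat files that look binary as if they had no match
    # -e PATTERN : protects patterns that begin with "-"
    # /dev/null  : extra file argument so grep always prints filenames
    find "$dir" -type f -exec grep -I -e "$pattern" /dev/null {} +
}
```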


Bill Campbell offers this Perl script:





I have a perl script I call ``textfiles'' that I use for many
things like this:

textfiles dirname [dirname... ] | xargs ...

Essentially it runs ``gfind @ARGV -type f'', then uses perl's -T
file test on each file to determine whether it's a text file.

My textfiles script also has options to pass options like -xdev,
-mindepth, and -maxdepth to the gnu find command.

Hell, it's short so I'm attaching it for anybody who wants to use
it. It does assume that the gnu version of find is in your PATH
named gfind (I make a symlink to /usr/bin/find on Linux systems
so that it works there as well).






#!/usr/local/bin/perl
eval ' exec /usr/local/bin/perl -S $0 "$@" '
    if $running_under_some_shell;

# $Header: /u/usr/cvs/lbin/textfiles,v 1.7 2000/06/22 18:29:08 bill Exp $
# $Date: 2000/06/22 18:29:08 $
# @(#) $Id: textfiles,v 1.7 2000/06/22 18:29:08 bill Exp $
#
# find text files

( $progname = $0 ) =~ s!.*/!!; # save this very early

$USAGE = "
# Find text files
#
# Usage: $progname [-v] [file [file...]]
#
# Options   Argument    Description
#   -f                  Follow symlinks
#   -M      maxdepth    maxdepth argument to gfind
#   -m      mindepth    mindepth argument to gfind
#   -x                  Don't cross device boundaries
#   -v                  Verbose
#
";

sub usage {
    die join("\n",@_) .
        "\n$USAGE\n";
}

# getopts() sets $opt_f, $opt_M, $opt_m, ... for each switch given
use Getopt::Std;

&usage("Invalid Option") unless getopts("fM:m:xvV");

$verbose = '-v' if $opt_v;
$suffix = $$ unless $opt_v;

$\ = "\n"; # use newlines as separators.

# use current directory if there aren't any arguments
push(@ARGV, '.') unless defined($ARGV[0]);

$args = join(" ", @ARGV);
$xdev = '-xdev' if $opt_x;
$opt_f = '-follow' if $opt_f;
$opt_m = "-mindepth $opt_m" if $opt_m;
$opt_M = "-maxdepth $opt_M" if $opt_M;
$cmd = "gfind @ARGV -type f $xdev $opt_f $opt_m $opt_M ";
print STDERR "cmd = >$cmd<" if $verbose;

# the trailing "|" makes open() run the command and read its output
open(INPUT, "$cmd |") or die "$progname: cannot run gfind: $!\n";
while(<INPUT>) {
    chomp($name = $_);      # strip the newline that find printed
    print STDERR "testing $name..." if $verbose;
    print $name if -T $name;    # -T: file looks like text
}
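
If you don't want to carry a Perl script around, a rough shell
approximation of the same idea is sketched below. It is not Bill's
method: the function name list_textfiles is hypothetical, it
assumes a grep that supports -I (GNU and BSD grep do), and it
relies on grep's binary-file heuristic, not perl's -T test.

```shell
# list_textfiles [DIR]
# Hypothetical helper; prints the names of files that grep considers text.
list_textfiles() {
    # -I    : treat files that look binary as if they had no match
    # -l    : print only the names of files that match
    # -e '' : the empty pattern matches every line, so any text file
    #         with at least one line is listed
    find "${1:-.}" -type f -exec grep -Il -e '' {} +
}
```

One difference to be aware of: an empty file has no lines to
match, so this sketch skips it, while perl's -T counts an empty
file as text.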








