Wednesday, April 6, 2011

Linux command to replace string in LARGE file with another string

I have a huge SQL file that gets executed on the server. The dump is from my machine, and it contains a few settings specific to my machine. So basically, I want every occurrence of "c://temp" to be replaced by "//home//some//blah".

How can this be done from the command line?

From stackoverflow
  • sed is a good choice for large files.

    sed -i.bak -e 's%C://temp%//home//some//blah%g' large_file.sql
    

    It is a good choice because it doesn't read the whole file at once to change it. Quoting the manual:

    A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

    The relevant manual section is here. A small explanation follows:

    -i.bak enables in-place editing, leaving a backup copy with a .bak extension.

    s%foo%bar% uses s, the substitution command, which replaces matches of the first string between the % signs, 'foo', with the second string, 'bar'. It's usually written as s/foo/bar/, but because your strings have plenty of slashes, it's more convenient to pick a different delimiter so you avoid having to escape them. Appending g, as in s%foo%bar%g, replaces every match on a line rather than only the first.

    Example

    vinko@mithril:~$ sed -i.bak -e 's%C://temp%//home//some//blah%' a.txt
    vinko@mithril:~$ more a.txt
    //home//some//blah
    D://temp
    //home//some//blah
    D://temp
    vinko@mithril:~$ more a.txt.bak
    C://temp
    D://temp
    C://temp
    D://temp
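
    The sample file has only one match per line, so it doesn't show what the g flag does. A dump with long INSERT lines may well have several matches on one line. A minimal sketch of the difference, using a hypothetical scratch file created with mktemp (not part of the original example):

    ```shell
    # Hypothetical scratch file; its single line holds two matches.
    demo=$(mktemp)
    printf 'C://temp and C://temp\n' > "$demo"

    # Without g: only the first match on the line is replaced.
    first_only=$(sed -e 's%C://temp%//home//some//blah%' "$demo")

    # With g: every match on the line is replaced.
    all_matches=$(sed -e 's%C://temp%//home//some//blah%g' "$demo")

    printf '%s\n%s\n' "$first_only" "$all_matches"
    rm -f "$demo"
    ```

    The first printed line still contains one C://temp; the second contains none.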
    
    dalloliogm : You can use a different character to avoid having to quote the slashes, for example sed -e "s%C://temp%/home//some//blah%". Also, the -i option allows you to save the file inplace, when you are sure of the options.
    RD : This is the command I'm typing: sed -i.bak -e 's%C:\\temp\%/home/liveon/public_html/tmp' liveon.sql and this is the error I'm getting: sed: -e expression #1, char 41: unterminated `s' command Anyone?
    Vinko Vrsalovic : You are missing the final %, the command is s%foo%bar%
    Dave Jarvis : Also, RD, make sure to escape backslashes properly.
  • The sed command can do that. Rather than escaping the slashes, you can choose a different delimiter (_ in this case):

    sed -e 's_c://temp/_/home//some//blah/_' file1.txt > file2.txt
    
    dalloliogm : you missed the last underscore: "s_c://temp/_/home//some//blah_"
    stefanw : thanks! It's now fixed.
  • Try sed? Something like:

    sed 's/c:\/\/temp/\/\/home\/\/some\/\/blah/' mydump.sql > fixeddump.sql
    

    Escaping all those slashes makes this look horrible, though. Here's a simpler example which changes foo to bar.

    sed 's/foo/bar/' mydump.sql > fixeddump.sql
    

    As others have noted, you can choose your own delimiter, which would prevent the leaning toothpick syndrome in this case:

    sed 's|c://temp|//home//some//blah|' mydump.sql > fixeddump.sql
    

    The clever thing about sed is that it operates on a stream rather than on a file all at once, so you can process huge files using only a modest amount of memory.
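
    Because sed works on a stream, it can also sit in the middle of a pipeline, so a compressed dump never has to be unpacked to disk. A rough sketch, assuming gzip is available; the directory and file names are hypothetical stand-ins for a real dump:

    ```shell
    # Hypothetical scratch directory with a tiny compressed "dump".
    dir=$(mktemp -d)
    printf 'c://temp\nd://temp\n' | gzip > "$dir/mydump.sql.gz"

    # Decompress, substitute, and recompress in one pass.
    gzip -dc "$dir/mydump.sql.gz" \
      | sed 's|c://temp|//home//some//blah|g' \
      | gzip > "$dir/fixeddump.sql.gz"

    # Inspect the result without writing an uncompressed copy.
    fixed=$(gzip -dc "$dir/fixeddump.sql.gz")
    printf '%s\n' "$fixed"
    ```

    Only the line containing c://temp changes; the d://temp line passes through untouched.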

  • Just for completeness. In place replacement using perl.

    perl -i -p -e 's{c://temp}{//home//some//blah}g' mysql.dmp
    

    No backslash escapes required either. ;)

    Telemachus : Please note that if you use the `-i` flag without an extension, you get *no backup*. If you want a backup, try `-i.bak` which will do the in-place edit *and* give you a backup of the original as `original.bak`, pretty much for free.
    jrockway : I let my version control system handle making the backups.
    Telemachus : @Jrockway: that's lovely for you I'm sure, but it assumes that the files in question are under version control and that you know what -i.bak does and have chosen not to use it. I just wish people who recommend the -i switch would take two seconds to explain the difference between -i and -i.bak. It will really hurt if the files you play with are not under version control and you make a simple typo (e.g, forget the -p flag).
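
    The difference between -i and -i.bak discussed in these comments can be checked on throwaway files before touching a real dump. A minimal sketch, assuming perl is installed; the scratch directory and file names are hypothetical:

    ```shell
    # Hypothetical scratch directory with two identical one-line files.
    dir=$(mktemp -d)
    printf 'c://temp\n' > "$dir/plain.dmp"
    cp "$dir/plain.dmp" "$dir/backed.dmp"

    # -i with no extension: the file is rewritten and no copy is kept.
    perl -i -p -e 's{c://temp}{//home//some//blah}g' "$dir/plain.dmp"

    # -i.bak: the original is kept as backed.dmp.bak alongside the edited file.
    perl -i.bak -p -e 's{c://temp}{//home//some//blah}g' "$dir/backed.dmp"

    ls "$dir"
    ```

    Afterwards only backed.dmp has a .bak sibling; nothing preserves the original contents of plain.dmp.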
  • There's also a non-standard UNIX utility, rpl, which does the exact same thing that the sed examples do; however, I'm not sure whether rpl operates streamwise, so sed may be the better option here.

    Vinko Vrsalovic : Heh, per chance, are you a friend of the developer of rpl? :-)
    Meredith L. Patterson : Nope, never heard of the guy outside of the util; it came in handy for doing a batch-replace job on a few thousand text files once and I've kept it in my toolbox.
    Telemachus : It would be worth saying *why* you recommend it in this case (or why you might, since you half take back the recommendation). That is, rather than just throw up the name of a utility, tell us what you liked about it, please.
    Tyler McHenry : rpl is nice for simple replacements because it has a much more user-friendly syntax than the combination of sed and find that it replaces. It also has a neat dry-run feature where it will tell you what it would replace without actually doing the replacement. Its main limitation is that it only does straight replacements and no regular expressions.
    Meredith L. Patterson : @Telemachus - Tyler nailed it.
  • perl -pi -e 's#c://temp#//home//some//blah#g' yourfilename
    

    -p wraps the script in a loop: Perl reads the specified file line by line, runs the search and replace on each line, and prints the result.

    -i tells Perl to edit the file in place. It should be used in conjunction with the -p flag.

    -e just means execute this Perl code.

    Good luck

  • gawk

    awk '{gsub("c://temp","//home//some//blah")}1' file
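
    Unlike the -i examples, this awk command writes the result to standard output and leaves the input file untouched, so the output needs to be redirected to a new file. A minimal sketch with hypothetical scratch files:

    ```shell
    # Hypothetical scratch files standing in for the dump and its fixed copy.
    src=$(mktemp)
    dst=$(mktemp)
    printf 'c://temp c://temp\n' > "$src"

    # gsub replaces every match on the line; the trailing 1 is an
    # always-true pattern, so awk prints each (modified) line.
    awk '{gsub("c://temp","//home//some//blah")}1' "$src" > "$dst"

    result=$(cat "$dst")
    original=$(cat "$src")
    printf '%s\n' "$result"
    rm -f "$src" "$dst"
    ```

    The source file keeps its original contents; only the redirected copy carries the new path.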
    
