Monday, April 25, 2011

What is the best way to do string manipulation in a shell script?

I have a path as a string in a shell script; it could be absolute or relative:

/usr/userName/config.cfg

or

../config.cfg

I want to extract the file name (part after the last /, so in this case: "config.cfg")

I figure the best way to do this is with a simple regex. Is that correct, or should I use sed or awk instead?

Shell scripting's string manipulation features seem pretty primitive by themselves, and appear very esoteric.

Any example solutions are also appreciated.

From stackoverflow
  • Instead of string manipulation I'd just use

    file=`basename Filename`
    

    Edit:

    Thanks to unwind for some newer syntax for this (which assumes your filename is held in $filename):

    file=$(basename "$filename")
    
    unwind : +1, but I'd recommend the newer syntax: "file=$(basename $filename)" assuming $filename holds the input name.
    RobS : @unwind. Thanks for this I wasn't aware of that syntax. Is it just preferred now (but essentially the same) or is it quicker?
    Paolo Tedesco : I think it's preferred as considered more readable, but that's all
    guns : The $() construct works in modern shells like ksh and bash, but I don't think it's strict POSIX sh. It would be a mistake to use it in a /bin/sh script, though it would still work on most systems (as it's usually a symlink to bash).
    innaM : If your script does this (very) often, you will see a significant speed-up with the new syntax as your script doesn't have to shell out to determine the base name.
  • Most environments have access to perl and I'm more comfortable with that for most string manipulation.

    But as mentioned, for something this simple, you can use basename.

  • If you're okay with using bash, you can use bash string expansions:

    FILE="/path/to/file.example"
    FILE_BASENAME="${FILE##*/}"
    

    It's a little cryptic, but the braces start the variable expansion, and the double hash does a greedy removal of the specified glob pattern from the beginning of the string.

    Double %% does the same thing from the end of a string, and a single percent or hash does a non-greedy removal.

    Also, a simple replace construct is available too:

    FILE=${FILE// /_}
    

    would replace all spaces with underscores for instance.

    A single slash, again, is non-greedy: it replaces only the first match.

  • I typically use sed with a simple regex, like this:

    echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
    

    result:

    >echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
    config.cfg
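
The prefix and suffix removals described in the answers above can be sketched as small shell functions. This is a minimal sketch, not a replacement for basename: the function names (path_basename, strip_extension, strip_all_extensions) and the archive.tar.gz example are hypothetical and do not come from the original answers. It assumes the path does not end in a trailing slash (in that case `${1##*/}` yields an empty string, whereas basename would return the last directory name). Only the #, ##, % and %% forms are used here; the `${FILE// /_}` replacement form is left out because it is a bash extension.

```shell
#!/bin/sh
# Hypothetical helper names, chosen for illustration only.

# Remove the longest prefix matching "*/", i.e. everything up to
# and including the last slash.
path_basename() {
    printf '%s\n' "${1##*/}"
}

# Remove the shortest suffix matching ".*", i.e. only the last extension.
strip_extension() {
    printf '%s\n' "${1%.*}"
}

# Remove the longest suffix matching ".*", i.e. everything from the
# first dot onward. Intended for bare file names, not full paths.
strip_all_extensions() {
    printf '%s\n' "${1%%.*}"
}

path_basename "/usr/userName/config.cfg"
path_basename "../config.cfg"
strip_extension "config.cfg"
strip_all_extensions "archive.tar.gz"
```

Because the arguments are quoted and no external command is run, the functions also cope with names containing spaces, and they avoid starting a new process for each call.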
    
