public:books:linux_shell_scripting_cookbook:chapter_4

  • Usage of grep:
    # searching for lines containing a pattern:
    $ grep "pattern" filename
    this is the line containing pattern
    
    # read from stdin:
    $ echo -e "this is a word\nnext line" | grep word 
    this is a word
    
    # search in multiple files:
    $ grep "match_text" file1 file2 file3 ... 
    
    # highlight the word in the line:
    $ grep word filename --color=auto
    this is the line containing word
    
    # use full set of regex:
    $ grep -E "[a-z]+" filename
    # or (egrep is deprecated in recent GNU grep releases; prefer grep -E):
    $ egrep "[a-z]+" filename
    
    # output only the matching portion of the line:
    $ echo this is a line. | egrep -o "[a-z]+\."
    line.
    
    # Print all lines except the ones containing match_pattern:
    $ grep -v match_pattern file # -v inverts the matches.
    
    # Count the number of matching lines:
    $ grep -c "text" filename
    10
    
    # count the number of matching items (could be many per line):
    $ echo -e "1 2 3 4\nhello\n5 6" | egrep -o "[0-9]" | wc -l
    6
    
    # print the line number of matching strings:
    $ cat sample1.txt
    gnu is not unix
    linux is fun
    bash is art
    
    $ cat sample2.txt
    planetlinux
    
    $ grep linux -n sample1.txt
    2:linux is fun
    
    # or
    
    $ cat sample1.txt | grep linux -n
    
    # print the byte offset of each match (counted from the start of the input, beginning at 0):
    $ echo gnu is not unix | grep -b -o "not" # without -o, -b reports the offset of the start of the matching line
    7:not
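  • The difference between counting matching lines (-c) and counting individual matches (-o piped to wc -l) can be sketched as follows. The sample text is made up for illustration:

```shell
# Hypothetical sample input: two of the three lines contain digits.
sample=$(printf '1 2 3 4\nhello\n5 6\n')

# -c counts lines that contain at least one match.
matching_lines=$(printf '%s\n' "$sample" | grep -c '[0-9]')

# -o prints every match on its own line, so wc -l counts individual matches.
# tr strips the padding that some wc implementations add.
matching_items=$(printf '%s\n' "$sample" | grep -o '[0-9]' | wc -l | tr -d ' ')

echo "lines: $matching_lines, items: $matching_items"   # lines: 2, items: 6
```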
  • Recursively search in all text files in a directory:
    $ grep "text" . -R -n
    # for instance:
    $ cd src_dir
    $ grep "test_function()" . -R -n
    ./miscutils/test.c:16:test_function();
  • Ignore case of pattern:
    $ echo hello world | grep -i "HELLO"
    hello world
  • grep by matching multiple patterns:
    $ echo this is a line of text | grep -e "this" -e "line" -o
    this
    line
    
    # or we could use a pattern file:
    $ cat pat_file
    hello
    cool
    
    $ echo hello this is cool | grep -f pat_file
    hello this is cool
  • Including and excluding files in a grep search:
    $ grep "main()" . -r --include="*.c" --include="*.cpp"
    
    # or 
    $ grep "main()" . -r --exclude "README" 
  • Using grep with xargs:
    $ echo "test" > file1
    $ echo "cool" > file2
    $ echo "test" > file3
    
    $ grep "test" file* -lZ | xargs -0 rm
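  • A small sketch of why -Z and xargs -0 are paired: NUL-terminated names survive spaces in file names. It assumes GNU grep and works in a throwaway directory; the file names are invented for the example:

```shell
# Work in a temporary directory so nothing real is deleted.
workdir=$(mktemp -d)
echo "test" > "$workdir/file one"
echo "cool" > "$workdir/file two"
echo "test" > "$workdir/file three"

# -l lists matching files, -Z terminates each name with NUL;
# xargs -0 splits on NUL, so names with spaces stay intact.
grep -lZ "test" "$workdir"/file* | xargs -0 rm

remaining=$(ls "$workdir")
rm -rf "$workdir"
echo "$remaining"   # file two
```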
  • Silent output for grep (when we only want to know if there was a match or not):
    #!/bin/bash 
    #Filename: silent_grep.sh
    #Desc: Testing whether a file contains a text or not 
    if [ $# -ne 2 ]; then
      echo "Usage: $0 match_text filename"
      exit 1
    fi
    match_text=$1 
    filename=$2
    grep -q "$match_text" "$filename"
    if [ $? -eq 0 ]; then
      echo "The text exists in the file"
    else
      echo "Text does not exist in the file"
    fi
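  • The same check can be written without inspecting $? by testing the exit status of grep -q directly. This is a sketch; contains_text is a hypothetical helper name, not part of grep:

```shell
# contains_text PATTERN FILE: succeeds if FILE has a line matching PATTERN.
contains_text() {
  # -q prints nothing; -- protects patterns that begin with a dash.
  grep -q -- "$1" "$2"
}

tmpfile=$(mktemp)
echo "linux is fun" > "$tmpfile"

# The if statement uses the exit status of the function directly.
if contains_text "linux" "$tmpfile"; then
  found="yes"
else
  found="no"
fi
rm -f "$tmpfile"
echo "found: $found"   # found: yes
```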
  • Print lines before and after a match:
    # In order to print three lines after a match, use the -A option:
    $ seq 10 | grep 5 -A 3
    5
    6
    7
    8
    
    # In order to print three lines before the match, use the -B option:
    $ seq 10 | grep 5 -B 3
    2
    3
    4
    5
    
    # To print three lines both before and after the match, use the -C option:
    $ seq 10 | grep 5 -C 3
    2
    3
    4
    5
    6
    7
    8
  • Usage of cut:
    # Prototype (fields are tab-separated by default):
    cut -f FIELD_LIST filename
    
    # Example:
    $ cat student_data.txt 
    No  Name  Mark  Percent
    1  Sarath  45  90
    2  Alex  49  98
    3  Anu  45  90
    
    $ cut -f1 student_data.txt
    No 
    1 
    2 
    3 
    
    $ cut -f2,4 student_data.txt
    Name     Percent
    Sarath   90
    Alex     98
    Anu       90
    
    # print every column except the selected one:
    $ cut -f3 --complement student_data.txt
    No  Name    Percent 
    1   Sarath  90
    2   Alex    98
    3   Anu     90
  • The delimiter character can be specified with -d:
    $ cut -f2 -d";" delimited_data.txt
  • Besides fields (-f), cut can also select ranges of characters (-c) or bytes (-b), e.g. -c1-5, -c3-, -c-3.
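  • A short sketch of delimiter and range selection with cut. The record and the alphabet string are invented sample data:

```shell
record="alice;42;admin"

# -d sets the field delimiter, -f picks the field.
second_field=$(echo "$record" | cut -d';' -f2)

# -c selects character ranges: N-M, N- (to end of line), -M (from start).
first_five=$(echo "abcdefghij" | cut -c1-5)
from_third=$(echo "abcdefghij" | cut -c3-)
up_to_third=$(echo "abcdefghij" | cut -c-3)

echo "$second_field $first_five $from_third $up_to_third"
# 42 abcde cdefghij abc
```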
  • sed usage:
    # Prototype:
    $ sed 's/pattern/replace_string/' file
    # or:
    $ cat file | sed 's/pattern/replace_string/'
  • To save the changes in the source file we use the -i flag:
    $ sed -i 's/text/replace/' file
  • Additional usage:
    # for global replacement:
    $ sed 's/pattern/replace_string/g' file
    
    # replace from the Nth occurrence onwards (combining N with g is a GNU sed extension):
    $ echo thisthisthisthis | sed 's/this/THIS/2g' 
    thisTHISTHISTHIS
    $ echo thisthisthisthis | sed 's/this/THIS/3g' 
    thisthisTHISTHIS
    $ echo thisthisthisthis | sed 's/this/THIS/4g' 
    thisthisthisTHIS
    
    # we can use any delimiter in sed:
    sed 's:text:replace:g'
    sed 's|text|replace|g'
    
    # escape the delimiter when it appears inside the pattern or replacement:
    sed 's|te\|xt|replace|g'
    
    # remove blank lines:
    $ sed '/^$/d' file
    
    # Use the match string:
    $ echo this is an example | sed 's/\w\+/[&]/g'
    [this] [is] [an] [example]
    
    # use the substring matches:
    $ echo this is digit 7 in a number | sed 's/digit \([0-9]\)/\1/'
    this is 7 in a number
    
    $ echo seven EIGHT | sed 's/\([a-z]\+\) \([A-Z]\+\)/\2 \1/'
    EIGHT seven
    
    # Combination of expressions:
    $ sed 'expression' | sed 'expression'
    $ sed 'expression; expression'
    $ sed -e 'expression' -e 'expression'
    
    # supporting string evaluation (with double quotes)
    $ text=hello
    $ echo hello world | sed "s/$text/HELLO/" 
    HELLO world 
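  • Choosing another delimiter is most useful when a shell variable in the expression contains slashes. A minimal sketch with an invented path and placeholder text:

```shell
path="/usr/local/bin"

# Double quotes let the shell expand $path; using | as the delimiter
# avoids having to escape the slashes in its value.
with_var=$(echo "install to PREFIX" | sed "s|PREFIX|$path|")

# Several expressions can be chained with repeated -e options.
chained=$(echo "a b" | sed -e 's/a/A/' -e 's/b/B/')

echo "$with_var"   # install to /usr/local/bin
echo "$chained"    # A B
```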
  • Structure of awk script:
    awk ' BEGIN{  print "start" } pattern { commands } END{ print "end" }' file
    
    # for example:
    $ awk 'BEGIN { i=0 } { i++ } END{ print i}' filename
  • When the arguments of print are comma separated, they are printed with a space delimiter:
    $ echo | awk '{ var1="v1"; var2="v2"; var3="v3"; \
    print var1,var2,var3 ; }'
    v1 v2 v3
    
    # otherwise we could do:
    $ echo | awk '{ var1="v1"; var2="v2"; var3="v3"; \
    print var1 "-" var2 "-" var3 ; }'
    v1-v2-v3
  • Special variables in awk:
    NR: current record number (e.g. the current line number when lines are used as records)
    NF: number of fields in the current record (fields are separated by whitespace by default)
    $0: text content of the current line
    $1: text of the first field
    $2: text of the second field
    
    # for instance:
    $ echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | \
    awk '{
    print "Line no:"NR",No of fields:"NF, "$0="$0, "$1="$1,"$2="$2,"$3="$3 
    }' 
    Line no:1,No of fields:3 $0=line1 f2 f3 $1=line1 $2=f2 $3=f3 
    Line no:2,No of fields:3 $0=line2 f4 f5 $1=line2 $2=f4 $3=f5 
    Line no:3,No of fields:3 $0=line3 f6 f7 $1=line3 $2=f6 $3=f7
    
    # print the last field with:
    print $NF
    
    # and the second-to-last field with:
    print $(NF-1)
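  • NR, NF, $NF and $(NF-1) together, as a small sketch on two made-up input lines with different field counts:

```shell
# Each line reports its record number, field count, last and second-to-last field.
out=$(printf 'a b c\nd e\n' | \
  awk '{ print NR ": " NF " fields, last=" $NF ", prev=" $(NF-1) }')

echo "$out"
# 1: 3 fields, last=c, prev=b
# 2: 2 fields, last=e, prev=d
```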
  • Perform summation:
    $ seq 5 | awk 'BEGIN{ sum=0; print "Summation:" } 
    { print $1"+"; sum+=$1 } END { print "=="; print sum }' 
    Summation: 
    1+ 
    2+ 
    3+ 
    4+ 
    5+ 
    ==
    15
  • Passing variable to awk:
    $ VAR=10000
    $ echo | awk -v VARIABLE=$VAR '{ print VARIABLE }'
    10000
    
    # Or:
    $ var1="Variable1" ; var2="Variable2"
    $ echo | awk '{ print v1,v2 }' v1=$var1 v2=$var2
    Variable1 Variable2
    
    # When using a file input:
    $ awk '{ print v1,v2 }' v1=$var1 v2=$var2 filename
  • Explicitly read a line:
    $ seq 5 | awk 'BEGIN { getline; print "Read ahead first line", $0 } { 
    print $0 }'
    Read ahead first line 1
    2
    3
    4
    5
  • Specify conditions for line processing:
    $ awk 'NR < 5' # first four lines
    $ awk 'NR==1,NR==4' #First four lines
    $ awk '/linux/' # Lines containing the pattern linux (we can specify a regex)
    $ awk '!/linux/' # Lines not containing the pattern linux
  • We can set the delimiter with -F:
    $ awk -F: '{ print $NF }' /etc/passwd
    
    # or
    $ awk 'BEGIN { FS=":" } { print $NF }' /etc/passwd
    
    # We can set the output fields separator by setting OFS="delimiter" in the BEGIN block.
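  • Setting FS and OFS together in the BEGIN block might look like this. The input line mimics a passwd entry but is supplied inline, so no system file is read:

```shell
# FS splits the input on ":"; OFS is placed between comma-separated print arguments.
out=$(echo "root:x:0:0:root:/root:/bin/bash" | \
  awk 'BEGIN { FS=":"; OFS="," } { print $1, $NF }')

echo "$out"   # root,/bin/bash
```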
  • Read output of command from awk:
    $ echo | awk '{ "grep root /etc/passwd" | getline cmdout ; print cmdout }'
    root:x:0:0:root:/root:/bin/bash
  • Using for loop in awk:
    # Prototype:
    for(i=0;i<10;i++) { print $i ; }
    # or:
    for(i in array) { print array[i]; }
  • String manipulation in awk:
    length(string): This returns the string length.
    index(string, search_string): This returns the position at which search_string is found in the string (0 if it is not found).
    split(string, array, delimiter): This stores the list of strings generated by using the delimiter in the array.
    substr(string, start-position, length): This returns the substring of the given length starting at start-position (positions are counted from 1).
    sub(regex, replacement_str, string): This replaces the first occurring regular expression match from the string with replacement_str.
    gsub(regex, replacement_str, string): This is similar to sub(), but it replaces every regular expression match.
    match(string, regex): This returns the position at which the regular expression (regex) first matches in the string,
    or zero if there is no match. Two special variables are associated with match(). They are RSTART and RLENGTH. 
    The RSTART variable contains the position at which the regular expression match starts (0 if there is no match). 
    The RLENGTH variable contains the length of the string matched by the regular expression (-1 if there is no 
    match).
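  • A quick sketch exercising the string functions above in a BEGIN block, so no input file is needed. The sample strings are arbitrary:

```shell
out=$(awk 'BEGIN {
  s = "hello world"
  print length(s)                    # number of characters
  print index(s, "world")            # 1-based position of the substring
  print substr(s, 1, 5)              # 5 characters starting at position 1
  n = split("a:b:c", parts, ":")     # n is the number of pieces
  print n, parts[2]
  t = s; sub(/o/, "0", t); print t   # first match only
  t = s; gsub(/o/, "0", t); print t  # every match
  if (match(s, /wor/)) print RSTART, RLENGTH
}')

echo "$out"
# 11
# 7
# hello
# 3 b
# hell0 world
# hell0 w0rld
# 7 3
```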
  • Script to count word frequency:
    #!/bin/bash
    #Name: word_freq.sh
    #Desc: Find out frequency of words in a file
    if [ $# -ne 1 ];
    then
      echo "Usage: $0 filename";
      exit 1
    fi
    filename=$1
    grep -E -o "\b[[:alpha:]]+\b" "$filename" | \
    awk '{ count[$0]++ }
    END{ printf("%-14s%s\n","Word","Count") ;
    for(ind in count)
    {  printf("%-14s%d\n",ind,count[ind]);  }
    }'
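  • awk's for(ind in count) visits keys in an unspecified order, so the report above is unsorted. One possible variation, sketched on an inline sentence instead of a file, pipes the counts through sort:

```shell
text="the cat and the hat and the bat"

# Count each word, print "count word", then sort by count (descending)
# and by word (ascending) to get a stable order.
freq=$(echo "$text" | grep -oE '[[:alpha:]]+' | \
  awk '{ count[$0]++ } END { for (w in count) print count[w], w }' | \
  sort -k1,1nr -k2,2)

echo "$freq"
# 3 the
# 2 and
# 1 bat
# 1 cat
# 1 hat
```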
  • To compress JavaScript (remove newlines, extra spaces, and comments), we could use a script such as:
    $ cat sample.js |  \
    tr -d '\n\t' |  tr -s ' ' \
    | sed 's:/\*.*\*/::g' \
    | sed 's/ \?\([{}();,:]\) \?/\1/g' 
  • For decompression:
    $ cat obfuscated.txt | sed 's/;/;\n/g; s/{/{\n\n/g; s/}/\n\n}/g' 
  • paste can be used to do column wise concatenation:
    $ cat file1.txt
    1
    2
    3
    4
    5
    
    $ cat file2.txt
    slynux
    gnu
    bash
    hack
    
    $ paste file1.txt file2.txt -d ","
    1,slynux
    2,gnu
    3,bash
    4,hack
    5,
  • Print a given column with awk:
    $ awk '{ print $5 }' filename
    
    # or (the file name is column $8 here; with many ls date formats it is $9):
    $ ls -l | awk '{ print $1 " :  " $8 }'
    -rw-r--r-- :  delimited_data.txt
    -rw-r--r-- :  obfuscated.txt
    -rw-r--r-- :  paste1.txt
    -rw-r--r-- :  paste2.txt
  • Print a range of lines with awk:
    # To print the lines of a text in a range of line numbers, M to N, use the following syntax:
    $ awk 'NR==M, NR==N' filename
    
    # Or using stdin:
    $ cat filename | awk 'NR==M, NR==N'
    
    # to print lines in a section starting with start_pattern and ending with end_pattern, we use:
    $ awk '/start_pattern/, /end_pattern/' filename
    
    # for instance:
    $ cat section.txt 
    line with pattern1 
    line with pattern2 
    line with pattern3 
    line end with pattern4 
    line with pattern5 
    
    $ awk '/pa.*3/, /end/' section.txt 
    line with pattern3 
    line end with pattern4
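  • M and N in the line-range form can be supplied from the shell with -v. A sketch, where m and n are variable names chosen for this example:

```shell
# Print lines 3 through 5 of the numbers 1..10.
out=$(seq 10 | awk -v m=3 -v n=5 'NR==m, NR==n')

echo "$out"
# 3
# 4
# 5
```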
  • Print lines in reverse order with tac (the reverse of cat):
    tac file1 file2 ...
    
    # for instance:
    $ seq 5 | tac
    5 
    4 
    3 
    2 
    1
    
    # separator can be specified with -s "separator"
  • Same thing with awk:
    $ seq 9 | \
    awk '{ lifo[NR]=$0 } 
    END{ for(lno=NR;lno>0;lno--){ print lifo[lno]; } 
    }'
    
    # Note that in shell \ is used to break a single line command into multiple lines.
  • For email, the regex to use is:
    [A-Za-z0-9._]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}
    
    # for instance:
    $ egrep -o '[A-Za-z0-9._]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}'  url_email.txt
    slynux@slynux.com 
    test@yahoo.com 
    cool.hacks@gmail.com
  • For HTTP URL the regex pattern is:
    http://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,4}
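  • Both patterns applied to one line of text with grep -oE, as a sketch. The address and URL use reserved example domains and are invented for the demonstration:

```shell
text='Visit http://www.example.com or mail admin@example.org today'

# -o prints only the matched part; -E enables the {2,4} interval syntax.
urls=$(echo "$text" | grep -oE 'http://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}')
emails=$(echo "$text" | grep -oE '[A-Za-z0-9._]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}')

echo "$urls"     # http://www.example.com
echo "$emails"   # admin@example.org
```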
  • Remove a sentence containing a given phrase with sed, for instance:
    $ sed 's/ [^.]*mobile phones[^.]*\.//g' sentence.txt
  • Replace text in all matching files with find and sed:
    $ find . -name "*.cpp" -print0 |  xargs -I{} -0 sed -i 's/Copyright/Copyleft/g' {}
  • Or we can use the exec form:
    $ find . -name "*.cpp" -exec sed -i 's/Copyright/Copyleft/g' \{\} \;
    
    # or:
    $ find . -name "*.cpp" -exec sed -i 's/Copyright/Copyleft/g' \{\} \+
    # This second form will combine multiple filenames together before sending them to sed.
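  • The find and sed combination can be tried safely in a scratch directory. This sketch assumes GNU sed (for -i without a suffix argument); the file names are made up:

```shell
workdir=$(mktemp -d)
echo "Copyright 2020" > "$workdir/a.cpp"
echo "Copyright 2020" > "$workdir/b.txt"

# Quoting the pattern stops the shell from expanding *.cpp before find sees it.
find "$workdir" -name '*.cpp' -exec sed -i 's/Copyright/Copyleft/g' {} +

cpp_content=$(cat "$workdir/a.cpp")
txt_content=$(cat "$workdir/b.txt")
rm -rf "$workdir"

echo "$cpp_content"   # Copyleft 2020
echo "$txt_content"   # Copyright 2020 (not a .cpp file, so left untouched)
```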
  • Replacing text with bash parameter expansion:
    $ var="This is a line of text"
    $ echo ${var/line/REPLACED} 
    This is a REPLACED of text
  • Produce a substring:
    ${variable_name:start_position:length}
    
    # for instance:
    $ string=abcdefghijklmnopqrstuvwxyz
    $ echo ${string:4}
    efghijklmnopqrstuvwxyz
    
    $ echo ${string:4:8}
    efghijkl
    
    # We can also specify counting from the end of the string:
    $ echo ${string:(-1)}
    z
    $ echo ${string:(-2):2}
    yz
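  • A few related expansions side by side, as a sketch. These are bash features rather than POSIX sh, so the snippet assumes it runs under bash:

```shell
var="This is a line of text"

# A single slash replaces the first match, a double slash replaces all of them.
first=${var/i/I}
all=${var//i/I}

string=abcdefghijklmnopqrstuvwxyz
# A space before the minus sign works like the parentheses shown above.
last_three=${string: -3}

echo "$first"        # ThIs is a line of text
echo "$all"          # ThIs Is a lIne of text
echo "$last_three"   # xyz
```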
  • Last modified: 2020/07/10 12:11