public:books:linux_shell_scripting_cookbook:chapter_2 [NervTech's Wiki]

===== 2. Have a Good Command =====

==== Concatenating with cat ====

  * General way to read content with cat:<code>cat file1 file2 file3 ...</code>
  * Combine stdin with a file:<code>echo 'Text through stdin' | cat - file.txt</code>
  * Squeezing repeated blank lines (two or more in a row) into a single blank line:<code>cat -s file</code>
  * Other cat flags:<code># Display tabs as ^I:
cat -T file.py

# Display line numbers:
cat -n file.txt</code>
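
You can check the effect of -s and of the - placeholder without any real input files. This is a minimal sketch, assuming bash and GNU coreutils. The temporary file is a hypothetical stand-in for file.txt.

<code bash>
#!/bin/bash
# Three blank lines between "a" and "b": cat -s squeezes them into one.
squeezed=$(printf 'a\n\n\n\nb\n' | cat -s)
printf '%s\n' "$squeezed"

# '-' stands for stdin, so the piped text is printed before the file content.
tmp=$(mktemp)
printf 'from file\n' > "$tmp"
combined=$(echo 'from stdin' | cat - "$tmp")
printf '%s\n' "$combined"
rm -f "$tmp"
</code>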

==== Recording and playing back of terminal sessions ====

  * We can start recording a session with:<code>script -t 2> timing.log -a output.session
...
# type commands here
...
exit</code>

  * Replay the commands with:<code>scriptreplay timing.log output.session</code>

==== Finding files and file listing ====
 
  * List every file and directory below a base path (find descends recursively):<code>find base_path

# For instance:
find . -print

# Use -print0 to separate names with '\0' instead of a newline.
# This is useful when filenames contain spaces or newlines.</code>

  * Search based on filename or regular expression:<code>find /home/slynux -name "*.txt" -print

# Use -iname to ignore case:
find . -iname "example*" -print

# Combine several criteria with -o (OR); the escaped parentheses group them:
find . \( -name "*.txt" -o -name "*.pdf" \) -print

# Match against the whole path with -path:
find /home/users -path "*/slynux/*" -print

# Use -regex to match whole paths against a regular expression:
find . -regex ".*\(\.py\|\.sh\)$"

# or -iregex to ignore case:
find . -iregex ".*\(\.py\|\.sh\)$"
</code>

  * Negating arguments:<code># Exclude things that match a pattern:
find . ! -name "*.txt" -print
</code>

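  * Search based on directory depth:<code># Only look in the current directory (do not descend into subdirectories):
find . -maxdepth 1 -name "f*" -print

# Only report matches at least two levels below the starting point:
find . -mindepth 2 -name "f*" -print

# Give -maxdepth/-mindepth right after the path (as the third argument) so that
# find applies them before the other tests, which is more efficient.
# GNU find prints a warning if they come after other tests.</code>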

  * Search based on file type:<code>
find . -type d -print  # find directories
find . -type f -print # find regular files
find . -type l -print # find symlinks
</code>

  * Search on file times:<code># Available flags:
#   -atime : last access time
#   -mtime : last modification time (file content)
#   -ctime : last change time (inode metadata such as permissions or owner)

# The integer argument is a number of days:
find . -type f -atime -7 -print # files accessed within the last 7 days
find . -type f -atime 7 -print  # files accessed exactly 7 days ago
find . -type f -atime +7 -print # files accessed more than 7 days ago

# Minute-based equivalents:
#   -amin, -mmin, -cmin

# Find files modified more recently than a given file:
find . -type f -newer file.txt -print
</code>

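  * Search based on file size:<code>find . -type f -size +2k # files bigger than 2 KiB
find . -type f -size -2k # files smaller than 2 KiB (after rounding, see below)

# Instead of 'k', we can use 'c' (bytes), 'M' or 'G'.
# GNU find rounds sizes up to whole units, so -size -2k really matches files of
# at most 1 KiB; use -size -2048c for an exact byte limit.
</code>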

  * Deleting file matches:<code>find . -type f -name "*.swp" -delete</code>
  * Match based on file permissions:<code>find . -type f -perm 644 -print

# For instance:
find . -type f -name "*.php" ! -perm 644 -print

# Or search based on user:
find . -type f -user slynux -print
</code>

  * Executing commands with find:<code>find . -type f -user root -exec chown slynux {} \;

# In the previous command:
# '{}' will be replaced by each filename.

# To run the command once with many filenames as arguments, end -exec with '+'
# instead of '\;':
find . -type f -user root -exec chown slynux {} +</code>

  * To concatenate multiple files for instance:<code>find . -type f -name "*.c" -exec cat {} \;>all_c_files.txt</code>
  
  * To copy all the .txt files that are older than 10 days to a directory OLD:<code> find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD  \;</code>
  
  * -exec runs a single command. To run several commands per file, put them in a script file or run them through a shell with sh -c.
  * Combine exec with printf:<code>find . -type f -name "*.txt" -exec printf "Text file: %s\n" {} \;</code>

  * Skipping specified directories in find:<code>find devel/source_path  \( -name ".git" -prune \) -o \( -type f -print \)</code>
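
Several of the find features above can be tried safely in a throwaway tree. This is a minimal sketch, assuming bash, GNU find and GNU touch (needed for -d '10 days ago'). The src/, lib/ and .git/ names are hypothetical.

<code bash>
#!/bin/bash
# Throwaway tree for the demo (directory and file names are hypothetical).
root=$(mktemp -d)
mkdir -p "$root/src/.git" "$root/src/lib"
touch "$root/src/main.c" "$root/src/lib/util.c" "$root/src/.git/config"
touch -d '10 days ago' "$root/src/lib/util.c"   # GNU touch: backdate the mtime

# -prune stops find from descending into .git, so .git/config is never printed.
pruned=$(find "$root/src" \( -name ".git" -prune \) -o \( -type f -print \) | sort)

# -mtime +7: modified more than 7 days ago (only lib/util.c here).
old=$(find "$root/src" -type f -mtime +7)

# '+' passes all matches to a single echo (one output line);
# '\;' runs echo once per file (one line per .c file).
batched=$(find "$root/src" -type f -name "*.c" -exec echo {} + | wc -l)
single=$(find "$root/src" -type f -name "*.c" -exec echo {} \; | wc -l)

printf '%s\n' "$pruned"
echo "old: $old"
echo "echo output lines: $batched with '+', $single with ';'"
rm -rf "$root"
</code>

Using '+' is usually faster, because the command is started once for a whole batch of files instead of once per file.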

==== Playing with xargs ====

  * Converting multiple lines to a single line output:<code>cat example.txt | xargs</code>
  * Converting single-line into multiple-line output:<code>cat example.txt | xargs -n 3</code>
  * Specify delimiter:<code>echo "splitXsplitXsplitXsplit" | xargs -d X</code>
  * Provide one or more arguments from a file listing to a command:<code>cat args.txt | xargs -n 1 ./cecho.sh

# To provide n arguments we use the prototype:
INPUT | xargs -n X

# to provide all the arguments at once:
cat args.txt | xargs ./ccat.sh
</code>
  * We can specify the -I flag to provide a replacement string (only with one argument per command execution):<code>cat args.txt | xargs -I {} ./cecho.sh -p {} -l</code>
  * Using xargs with find:<code>find . -type f -name "*.txt"  -print | xargs rm -f

# Safer implementation is:
find . -type f -name "*.txt" -print0 | xargs -0 rm -f
</code>

  * Count number of lines of C code:<code>find source_code_dir_path -type f -name "*.c" -print0 | xargs -0 wc -l

# One could also consider using the SLOCCount utility</code>

  * Using a subshell loop instead of xargs:<code>cat files.txt | xargs -I {} cat {}

# is equivalent to:
cat files.txt | ( while IFS= read -r arg; do cat "$arg"; done )
</code>
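
The reason -print0 with xargs -0 is safer shows up as soon as a filename contains a space. This is a minimal sketch, assuming bash, GNU find and GNU xargs. The scratch directory and file names are hypothetical. The check uses echo instead of rm, so nothing is deleted.

<code bash>
#!/bin/bash
# Hypothetical scratch directory with one filename containing a space.
dir=$(mktemp -d)
touch "$dir/my notes.txt" "$dir/todo.txt"

# Plain xargs splits on blanks, so "my notes.txt" becomes two bogus arguments.
naive=$(find "$dir" -type f -name "*.txt" -print | xargs -n 1 echo | wc -l)

# NUL-delimited names survive intact: one argument per real file.
safe=$(find "$dir" -type f -name "*.txt" -print0 | xargs -0 -n 1 echo | wc -l)

echo "naive: $naive arguments, safe: $safe arguments"
rm -rf "$dir"
</code>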

==== Translating with tr ====
 
  * Simple translation:<code>echo "HELLO WHO IS THIS" | tr 'A-Z' 'a-z'</code>
  * Simple substitution ciphers:<code>echo 12345 | tr '0-9' '9876543210' # encrypt
echo 87654 | tr '9876543210' '0-9' # decrypt

# ROT13 encryption:
echo "tr came, tr saw, tr conquered." | tr 'a-zA-Z' 'n-za-mN-ZA-M'
# ROT13 is its own inverse, so the same command decrypts:
echo "ge pnzr, ge fnj, ge pbadhrerq." | tr 'a-zA-Z' 'n-za-mN-ZA-M'
</code>

  * Converting tab to space:<code>tr '\t' ' ' < file.txt</code>
  
  * Deleting characters:<code>echo "Hello 123 world 456" | tr -d '0-9'</code>
  * Complementing character set:<code>echo hello 1 char 2 next 4 | tr -d -c '0-9 \n'</code>

  * Squeezing characters with tr:<code>echo "GNU is       not     UNIX. Recursive   right ?" | tr -s ' '</code>
  * Compute a sum from a file:<code># Assuming sum.txt contains one number per line and ends with a newline:
echo $(( $(tr '\n' '+' < sum.txt) 0 ))</code>
  * tr accepts character classes such as alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper and xdigit. Quote them so the shell does not expand them:<code>tr '[:class:]' '[:class:]'

# For instance:
tr '[:lower:]' '[:upper:]'</code>
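
The tr recipes above can be checked on fixed strings. This is a minimal sketch, assuming bash and GNU coreutils tr. The sample numbers are made up, and the sum uses bash's $(( )) arithmetic.

<code bash>
#!/bin/bash
# ROT13 is its own inverse: applying the same mapping twice restores the text.
plain="tr came, tr saw, tr conquered."
rot=$(echo "$plain" | tr 'a-zA-Z' 'n-za-mN-ZA-M')
back=$(echo "$rot" | tr 'a-zA-Z' 'n-za-mN-ZA-M')
echo "$rot"

# Quoted character classes.
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')

# -c complements the set and -d deletes: keep only digits and newlines.
digits=$(echo "Hello 123 world 456" | tr -cd '0-9\n')

# Sum one number per line: join with '+' and close the expression with 0.
sum=$(( $(printf '1\n2\n3\n' | tr '\n' '+') 0 ))

echo "$upper / $digits / $sum"
</code>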

==== Checksum and verification ====

  * To compute the checksum we can use:<code>$ md5sum filename
68b329da9893e34099c7d8ad5cb9c940 filename

# We can redirect the output to file:
$ md5sum filename > file_sum.md5

# Prototype is:
$ md5sum file1 file2 file3 ...

# This will output one line per file.
</code>

  * To verify the integrity of a file:<code>$ md5sum -c file_sum.md5
# For each listed file this prints "filename: OK" or "filename: FAILED".
# Alternatively, check several checksum files at once:
$ md5sum -c *.md5
</code>

  * Usage of SHA-1 is similar: replace **md5sum** with **sha1sum**.
  * We can compute checksums for a whole directory with md5deep or sha1deep:<code>$ md5deep -rl directory_path > directory.md5
# -r to enable recursive traversal
# -l to use relative paths. By default it writes absolute file paths in the output

# Alternatively we can use find:
$ find directory_path -type f -print0 | xargs -0 md5sum >> directory.md5</code>
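
The exit status of md5sum -c makes verification easy to script. This is a minimal sketch, assuming bash and GNU coreutils (for --quiet, which suppresses the per-file OK lines). The data.txt and data.md5 names are hypothetical.

<code bash>
#!/bin/bash
# Scratch file and checksum list (names are hypothetical).
dir=$(mktemp -d)
echo "original" > "$dir/data.txt"
md5sum "$dir/data.txt" > "$dir/data.md5"

# md5sum -c exits with status 0 only if every listed file still matches.
if md5sum -c --quiet "$dir/data.md5"; then before=ok; else before=failed; fi

# Changing the content changes the digest, so the same check now fails.
echo "tampered" > "$dir/data.txt"
if md5sum -c --quiet "$dir/data.md5" >/dev/null 2>&1; then after=ok; else after=failed; fi

echo "before: $before, after: $after"
rm -rf "$dir"
</code>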

==== Cryptographic tools and hashes ====

  * Encryption with crypt:<code>$ crypt <input_file >output_file
Enter passphrase:

# alternatively, we can provide the passphrase on the command line:
$ crypt PASSPHRASE <input_file >encrypted_file

# to decrypt:
$ crypt PASSPHRASE -d <encrypted_file >output_file</code>

  * Encryption with gpg:<code>$ gpg -c filename

# to decrypt:
$ gpg filename.gpg</code>

  * Encoding with base64 (this is an encoding, not encryption: anyone can decode it):<code>$ base64 filename > outputfile

# or:
$ cat file | base64 > outputfile

# To decode:
$ base64 -d file > outputfile

# or:
$ cat base64_file | base64 -d > outputfile
</code>

  * md5sum and sha1sum can also be used to hash passwords, but these fast, unsalted hashes are weak for that purpose. bcrypt or sha512sum are recommended instead.

  * Generate a shadow-style password hash with openssl (-1 selects the MD5-based crypt scheme):<code>$ openssl passwd -1 -salt SALT_STRING PASSWORD
$1$SALT_STRING$323VkWkSLHuhbt1zkSsUG.</code>
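
The difference between reversible encoding and a one-way hash can be shown in a few lines. This is a minimal sketch, assuming bash and GNU coreutils base64/sha512sum. The string "secret" is only sample data.

<code bash>
#!/bin/bash
# base64 round trip: decoding needs no key, so it only hides data from casual view.
encoded=$(printf 'secret' | base64)
decoded=$(printf '%s\n' "$encoded" | base64 -d)
echo "$encoded -> $decoded"

# A hash is one-way: sha512sum prints a 128-hex-digit digest that cannot be decoded.
digest=$(printf 'secret' | sha512sum | cut -d ' ' -f 1)
echo "${#digest} hex digits"
</code>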

==== Sorting unique and duplicates ====

  * Sort a given set of files:<code>$ sort file1.txt file2.txt > sorted.txt

# or:
$ sort file1.txt file2.txt -o sorted.txt

# For numerical sorting:
$ sort -n file.txt

# To sort in reverse order:
$ sort -r file.txt

# To sort by month:
$ sort -M months.txt

# To merge 2 sorted files:
$ sort -m sorted1 sorted2

# To print each distinct line once (duplicates removed):
$ sort file1.txt file2.txt | uniq
</code>

  * To check whether a file is already sorted, test the exit status of sort -C:<code>#!/bin/bash
#Desc: Sort
sort -C filename ;
if [ $? -eq 0 ]; then
   echo Sorted;
else
   echo Unsorted;
fi</code>

  * Sort by a column in a text file:<code>$ cat data.txt
1  mac    2000
2  winxp    4000
3  bsd    1000
4  linux    1000

# we use the -k flag to specify the column to use:

# Sort by column 1, numerically and in reverse:
$ sort -nrk 1  data.txt
4  linux    1000 
3  bsd    1000 
2  winxp    4000 
1  mac    2000 
# -nr means numeric and reverse

# Sort by column 2
$ sort -k 2  data.txt
3  bsd    1000 
4  linux    1000 
1  mac    2000 
2  winxp    4000
</code>

  * Specify a range for the key:<code>$ cat data.txt
1010hellothis
2189ababbba
7464dfddfdfd

# -k FIELD.CHAR,FIELD.CHAR selects character positions inside a field;
# use characters 2 to 3 of field 1 as a numeric key:
$ sort -n -k 1.2,1.3 data.txt

# To use the first character as key:
$ sort -n -k 1.1,1.1 data.txt

# To use a \0 separator:
$ sort -z data.txt | xargs -0
# The zero terminator makes the output safe to use with xargs -0

# To ignore leading blanks and use dictionary order:
$ sort -bd unsorted.txt
</code>

  * Usage of uniq:<code>$ sort unsorted.txt | uniq</code>
  * Display only the lines that appear exactly once (the input must be sorted):<code>$ uniq -u sorted.txt</code>
  * Count how many times each line appears:<code>$ sort unsorted.txt | uniq -c
      1 bash
      1 foss
      2 hack</code>
  * Find the duplicate lines:<code>$ sort unsorted.txt  | uniq -d
hack</code>

  * Specify start and width for uniqueness computation:<code>$ cat data.txt
u:01:gnu 
d:04:linux 
u:01:bash 
u:01:hack

$ sort data.txt | uniq -s 2 -w 2
d:04:linux 
u:01:bash 
</code>

  * Use \0 instead of a newline as the line terminator:<code>$ uniq -z file.txt</code>
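
The sort and uniq options above compose in short pipelines. This is a minimal sketch, assuming bash and GNU coreutils. The word list and the ab30x-style records are hypothetical sample data.

<code bash>
#!/bin/bash
# Hypothetical word list containing one duplicate ("hack").
words=$'hack\nbash\nfoss\nhack'

distinct=$(printf '%s\n' "$words" | sort | uniq)     # each line once
dups=$(printf '%s\n' "$words" | sort | uniq -d)      # lines seen more than once
singles=$(printf '%s\n' "$words" | sort | uniq -u)   # lines seen exactly once

# Most frequent line: count with uniq -c, then sort the counts numerically.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# sort -C prints nothing; the answer is in its exit status.
if printf '%s\n' "$distinct" | sort -C; then state=sorted; else state=unsorted; fi

# FIELD.CHAR key: characters 3-4 of field 1 form the numeric key.
keyed=$(printf '%s\n' ab30x cd05y ef12z | sort -n -k 1.3,1.4)

printf 'distinct:\n%s\nduplicates: %s\nmost frequent: %s\n%s\n' "$distinct" "$dups" "$top" "$state"
</code>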

==== Temporary file naming and random numbers ====

  * Create a temporary file:<code>$ filename=`mktemp`
$ echo $filename
/tmp/tmp.8xvhkjF5fH</code>
  
  * Create a temporary directory:<code>$ dirname=`mktemp -d`
$ echo $dirname
/tmp/tmp.NI8xzW7VRX</code>

  * To just generate a filename without actually creating it:<code>$ tmpfile=`mktemp -u`
$ echo $tmpfile
/tmp/tmp.RsGmilRpcT</code>

  * Create a temporary file from a template (the X characters are replaced with random characters):<code>$ mktemp test.XXX
test.2tc</code>
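
In scripts, temporary files are usually paired with a trap so they are removed on exit. This is a minimal sketch, assuming bash, GNU mktemp and GNU stat (for -c %a). The demo.XXXXXX template is a hypothetical name.

<code bash>
#!/bin/bash
# Create a private scratch file and directory, removed automatically on exit.
tmpfile=$(mktemp) || exit 1
# With a template, at least three trailing X's are replaced at random.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/demo.XXXXXX") || exit 1
trap 'rm -f "$tmpfile"; rm -rf "$tmpdir"' EXIT

echo "intermediate data" > "$tmpfile"
# The file is created readable and writable by its owner only.
perms=$(stat -c %a "$tmpfile")
echo "$tmpfile ($perms), $tmpdir"
</code>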

==== Splitting files and data ====

  * Splitting a file:<code>$ split -b 10k data.file
$ ls
data.file  xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj</code>

  * To use numeric suffixes (-d) that are 4 digits long (-a 4):<code>$ split -b 10k data.file -d -a 4</code>

  * Specify a filename prefix:<code>$ split -b 10k data.file -d -a 4 split_file</code>

  * To split based on number of lines:<code>$ split -l 10 data.file</code>
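  * csplit can split a file based on its content. Here a new piece starts at every line matching SERVER, and the first piece (the text before the first match) is removed: <code>csplit server.log /SERVER/ -n 2 -s '{*}' -f server -b "%02d.log" ; rm server00.log</code>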
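
Pieces produced by split can be joined back with cat, because the suffixes sort in creation order. This is a minimal sketch, assuming bash and GNU coreutils. The data.file content and the part_ prefix are hypothetical.

<code bash>
#!/bin/bash
# Scratch directory and a 25-line sample file (names are hypothetical).
dir=$(mktemp -d)
seq 1 25 > "$dir/data.file"

# 10 lines per piece, 2-digit numeric suffixes, custom prefix "part_".
split -l 10 -d -a 2 "$dir/data.file" "$dir/part_"
pieces=$(ls "$dir" | grep -c '^part_')    # part_00, part_01, part_02
last=$(wc -l < "$dir/part_02")           # the remainder: 5 lines

# The glob expands in suffix order, so cat restores the original exactly.
cat "$dir"/part_* > "$dir/rebuilt.file"
if cmp -s "$dir/data.file" "$dir/rebuilt.file"; then same=yes; else same=no; fi

echo "$pieces pieces, last has $last lines, rebuilt identical: $same"
rm -rf "$dir"
</code>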

==== Slicing filenames based on extension ====

  * Extracting the name from **name.extension**:<code>file_jpg="sample.jpg"
name=${file_jpg%.*}
echo File name is: $name</code>

  * Extracting the extension from **name.extension**:<code>extension=${file_jpg#*.}</code>
  * Note that the operator % is non-greedy (i.e. it removes the shortest match), whereas the operator %% is greedy (it removes the longest match):<code>$ VAR=hack.fun.book.txt
$ echo ${VAR%.*}
hack.fun.book

$ echo ${VAR%%.*}
hack
</code>
  * Likewise, the operator ## is the greedy version of #:<code>$ VAR=hack.fun.book.txt
$ echo ${VAR#*.}
fun.book.txt

$ echo ${VAR##*.}
txt
</code>
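
All four operators applied to a name with two extensions, plus the no-match case. This is a minimal sketch in plain bash; archive.tar.gz and README are sample names.

<code bash>
#!/bin/bash
file="archive.tar.gz"
no_last_ext=${file%.*}   # removes the shortest ".*" suffix: archive.tar
base=${file%%.*}         # removes the longest ".*" suffix: archive
all_exts=${file#*.}      # removes the shortest "*." prefix: tar.gz
last_ext=${file##*.}     # removes the longest "*." prefix: gz
echo "$no_last_ext $base $all_exts $last_ext"

# When the pattern does not match, the value is returned unchanged,
# so a name without a dot yields itself as both "name" and "extension".
plain="README"
echo "${plain%.*} ${plain##*.}"
</code>

Because of the no-match behaviour, it is worth checking for a dot before treating ${name##*.} as an extension.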

==== Renaming and moving files in bulk ====

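  * Rename all image files in the current directory:<code>#!/bin/bash
#Filename: rename.sh
#Desc: Rename jpg and png files in the current directory to image-N.ext
count=1
# -maxdepth comes first, and \( ... \) makes -type f apply to both patterns.
# Reading NUL-delimited names keeps filenames with spaces intact.
while IFS= read -r -d '' img
do
  new=image-$count.${img##*.}
  echo "Renaming $img to $new"
  mv "$img" "$new"
  let count++
done < <(find . -maxdepth 1 -type f \( -iname '*.png' -o -iname '*.jpg' \) -print0)</code>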

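  * The following examples assume the Perl-based **rename** utility (the default rename on Debian/Ubuntu).
  * Rename *.JPG to *.jpg:<code>$ rename 's/\.JPG$/.jpg/' *.JPG</code>
  * Replace spaces with underscores:<code>$ rename 's/ /_/g' *</code>
  * Convert from upper case to lower case or the opposite:<code>$ rename 'y/A-Z/a-z/' *
$ rename 'y/a-z/A-Z/' *</code>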
  * Move all .mp3 files into a target directory:<code>$ find path -type f -name "*.mp3" -exec mv {} target_dir \;</code>
  * Recursive rename:<code>$ find path -type f -exec rename 's/ /_/g' {} \;</code>
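
When no rename utility is available, bash parameter expansion can do the extension change on its own. This is a minimal sketch and a swapped-in technique, not the rename command. It assumes bash, and the file names are hypothetical.

<code bash>
#!/bin/bash
# Scratch directory with upper-case extensions (file names are hypothetical).
dir=$(mktemp -d)
touch "$dir/IMG 001.JPG" "$dir/IMG_002.JPG"

for f in "$dir"/*.JPG; do
  [ -e "$f" ] || continue          # no match: the glob stays literal, skip it
  mv -- "$f" "${f%.JPG}.jpg"       # strip ".JPG" with %, then append ".jpg"
done

lower=$(find "$dir" -name '*.jpg' | wc -l)
upper=$(find "$dir" -name '*.JPG' | wc -l)
echo "$lower renamed, $upper left"
rm -rf "$dir"
</code>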

==== Spell checking and dictionary manipulation ====

  * Dictionary files are found in /usr/share/dict/
  * Check if word is part of dictionary:<code>#!/bin/bash
#Filename: checkword.sh
word=$1
grep "^$1$" /usr/share/dict/british-english -q 
if [ $? -eq 0 ]; then
  echo $word is a dictionary word;
else
  echo $word is not a dictionary word;
fi

# Usage as:
$ ./checkword.sh ful 
ful is not a dictionary word 

$ ./checkword.sh fool 
fool is a dictionary word
</code>
  * or we can use **aspell**.
  * List all words in a file that start with a given prefix (look uses a binary search, so the file must be sorted): <code>$ look word filepath

# or:
$ grep "^word" filepath</code>
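
The dictionary check can also be written as a reusable function. This is a minimal sketch, assuming bash and grep. check_word is a hypothetical helper, and /usr/share/dict/words is only an assumed default that may not exist on every system, so the demo uses a tiny word list instead.

<code bash>
#!/bin/bash
# check_word WORD [DICT] (hypothetical helper): exit status 0 if WORD is a
# whole line of DICT. The default path is an assumption, not guaranteed.
check_word() {
  local word=$1 dict=${2:-/usr/share/dict/words}
  # -x matches whole lines, -F treats the word literally, -q prints nothing.
  grep -qxF -- "$word" "$dict"
}

# Demo against a tiny hypothetical word list instead of a real dictionary.
dict=$(mktemp)
trap 'rm -f "$dict"' EXIT
printf '%s\n' fool food foot > "$dict"
for w in fool ful; do
  if check_word "$w" "$dict"; then
    echo "$w is a dictionary word"
  else
    echo "$w is not a dictionary word"
  fi
done
</code>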

==== Automating interactive input ====

  * Automate an input for a command with:<code>$ echo -e "1\nhello\n" | ./interactive.sh 
You have entered 1, hello

# -e flag for echo means 'interpret escape sequences'</code>
  * The **expect** program can be used when the input order is not always the same.
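
The same canned input can be fed with printf or a here-document. This is a minimal sketch, assuming bash. The interactive function is a hypothetical stand-in for ./interactive.sh. Note that read -p shows its prompt only when input comes from a terminal, so the prompts do not appear here.

<code bash>
#!/bin/bash
# Hypothetical stand-in for ./interactive.sh: reads a number and a name.
interactive() {
  read -r -p "Enter number: " no
  read -r -p "Enter name: " name
  echo "You have entered $no, $name"
}

# printf is a portable alternative to echo -e for the answers.
out1=$(printf '1\nhello\n' | interactive)

# A here-document keeps longer answer lists readable.
out2=$(interactive <<'EOF'
1
hello
EOF
)
echo "$out1"
echo "$out2"
</code>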

==== Making commands quicker by running parallel processes ====

  * Run several commands in parallel as background jobs, then wait for all of them:<code>#!/bin/bash
#filename: generate_checksums.sh
PIDARRAY=()
for file in File1.iso File2.iso
do
  md5sum "$file" &
  PIDARRAY+=("$!") # $! : retrieves the PID of the last background process.
done
wait "${PIDARRAY[@]}"</code>
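
Waiting on each PID separately also lets a script find out which background jobs failed. This is a minimal sketch, assuming bash and GNU sleep (for fractional seconds). The job function is hypothetical, and one job fails on purpose.

<code bash>
#!/bin/bash
# Hypothetical job: sleep for $1 seconds, then exit with status $2.
job() { sleep "$1"; return "$2"; }

pids=()
job 0.2 0 & pids+=("$!")
job 0.1 3 & pids+=("$!")
job 0.3 0 & pids+=("$!")

# 'wait PID' returns that job's exit status, so failures can be counted.
failures=0
for pid in "${pids[@]}"; do
  wait "$pid" || failures=$((failures + 1))
done
echo "$failures job(s) failed"
</code>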