public:books:linux_shell_scripting_cookbook:chapter_5

  • Using wget:
    wget URL
    
    # specify output file with -O
    # specify log file (instead of stdout) with -o:
    
    $ wget ftp://example_domain.com/somefile.img -O dloaded_file.img -o log
    
    # Specify number of retries with -t:
    $ wget -t 5 URL
    $ wget -t 0 URL  # retries infinitely.
    
    # Restrict the download speed: (k for kilobyte, m for megabyte)
    $ wget --limit-rate 20k http://example.com/file.iso
    
    # resume downloading:
    $ wget -c URL
    
    # copy (mirror) a complete website:
    $ wget --mirror --convert-links exampledomain.com
    
    # or limit the depth of the copy (-r: recursive, -N: timestamping, -l: depth, -k: convert links):
    $ wget -r -N -l DEPTH -k URL
    
    # Access pages with authentication:
    $ wget --user username --password pass URL
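  • These wget options combine freely; a minimal sketch (with a hypothetical URL) resuming an interrupted download with 3 retries and a rate cap:
    $ wget -c -t 3 --limit-rate 100k http://example.com/file.iso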
  • Usage of lynx:
    $ lynx URL -dump > webpage_as_text.txt
  • We can use the -nolist option to remove the numbered link references from the output.
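  • For example, a sketch (hypothetical URL) that dumps a page as plain text without link numbering:
    $ lynx -dump -nolist http://example.com > plain_text_page.txt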
  • Prevent curl from displaying progress information with the --silent option.
  • Curl usage:
    # Download to a file named after the remote file (-O), silently:
    $ curl URL --silent -O
    
    # to show a progress bar while downloading:
    $ curl http://slynux.org -o index.html --progress-bar
    
    # resume download:
    $ curl -C - URL
    
    # specify the referer string:
    $ curl --referer Referer_URL target_URL
    
    # specify cookies:
    $ curl http://example.com --cookie "user=slynux;pass=hack"
    
    # Set user agent:
    $ curl URL --user-agent "Mozilla/5.0"
    
    # pass additional header:
    $ curl -H "Host: www.slynux.org" -H "Accept-language: en" URL
    
    # specify speed limit:
    $ curl URL --limit-rate 20k
    
    # authenticate with curl:
    $ curl -u user:pass http://test_auth.com
    # or with password prompt:
    $ curl -u user http://test_auth.com 
    
    # Use the -I or --head option with curl to dump only the HTTP headers, without
    # downloading the remote file. For example:
    $ curl -I http://slynux.org
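  • curl options also combine; a sketch (hypothetical URL) resuming a rate-limited download with a custom user agent:
    $ curl -C - --limit-rate 50k --user-agent "Mozilla/5.0" -O http://example.com/file.iso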
  • To check unread Gmail messages from the command line, we could use a script such as:
    #!/bin/bash
    #Desc: Fetch gmail tool
    username='PUT_USERNAME_HERE'
    password='PUT_PASSWORD_HERE'
    SHOW_COUNT=5 # No of recent unread mails to be shown
    echo
    
    curl -u $username:$password --silent "https://mail.google.com/mail/feed/atom" | \
    tr -d '\n' | sed 's:</entry>:\n:g' | \
    sed -n 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/From: \2 [\3] \nSubject: \1\n/p' | \
    head -n $(( $SHOW_COUNT * 3 ))
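  • Assuming the script is saved under a hypothetical name such as fetch_gmail.sh, it would be run as:
    $ chmod +x fetch_gmail.sh
    $ ./fetch_gmail.sh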
  • Parsing content is usually done with sed and awk:
    $ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | \
    grep -o "Rank-.*" | \
    sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | \
    sort -nk 1 > actresslist.txt
  • To download all images linked from a web page, we could use a script such as:
    #!/bin/bash
    #Desc: Images downloader
    #Filename: img_downloader.sh
    if [ $# -ne 3 ];
    then
      echo "Usage: $0 URL -d DIRECTORY"
      exit -1
    fi
    # parse the three arguments: URL and -d DIRECTORY, in either order
    for i in {1..4}
    do
      case $1 in
      -d) shift; directory=$1; shift ;;
       *) url=${url:-$1}; shift;;
      esac
    done
    mkdir -p $directory;
    baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
    echo Downloading $url
    curl -s $url | egrep -o "<img src=[^>]*>" |  
    sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
    sed -i "s|^/|$baseurl/|" /tmp/$$.list
    cd $directory;
    while read filename;
    do
      echo Downloading $filename
      curl -s -O "$filename"
    done < /tmp/$$.list
  • Usage example:
    $ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images
  • Typical script for thumbnail generation:
    #!/bin/bash
    #Filename: generate_album.sh
    #Description: Create a photo album using images in current directory
    echo "Creating album..."
    mkdir -p thumbs
    cat <<EOF1 > index.html
    <html>
    <head>
    <style>
    body 
    { 
      width:470px;
      margin:auto;
      border: 1px dashed grey;
      padding:10px; 
    } 
    img
    { 
      margin:5px;
      border: 1px solid black;
    } 
    </style>
    </head>
    <body>
    <center><h1> #Album title </h1></center>
    <p>
    EOF1
    for img in *.jpg;
    do 
      convert "$img" -resize "100x" "thumbs/$img"
      echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" /></
    a>" >> index.html
    done
    cat <<EOF2 >> index.html
    </p>
    </body>
    </html>
    EOF2 
    echo Album generated to index.html
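  • The script relies on ImageMagick's convert command and expects the .jpg files in the current directory; a hypothetical run:
    $ cd my_photos    # directory containing the .jpg files and the script
    $ ./generate_album.sh
    Creating album...
    Album generated to index.html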
  1. We need to download the bash-oauth library from https://github.com/livibetter/bash-oauth/archive/master.zip
  2. Then install it from the bash-oauth-master subdirectory with:
    # make install-all
  3. Go to https://dev.twitter.com/apps/new and register a new app.
  4. Provide read/write access to the new app.
  5. Retrieve the consumer key and the consumer secret.
  6. Then use the following script:
    #!/bin/bash
    #Filename: twitter.sh
    #Description: Basic twitter client
    oauth_consumer_key=YOUR_CONSUMER_KEY
    oauth_consumer_secret=YOUR_CONSUMER_SECRET
    config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc 
    if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]];
    then 
      echo -e "Usage: $0 tweet status_message\n   OR\n      $0 read\n"
      exit -1;
    fi
    source TwitterOAuth.sh
    TO_init
    if [ ! -e $config_file ]; then
     TO_access_token_helper
     if (( $? == 0 )); then
       echo oauth_token=${TO_ret[0]} > $config_file
       echo oauth_token_secret=${TO_ret[1]} >> $config_file
     fi
    fi
    source $config_file
    if [[ "$1" = "read" ]];
    then
      TO_statuses_home_timeline '' 'shantanutushar' '10'
      echo $TO_ret | sed 's/<\([a-z]\)/\n<\1/g' | \
      grep -e '^<text>' -e '^<name>' | sed 's/<name>/\ - by /g' | \
      sed 's$</*[a-z]*>$$g'
    elif [[ "$1" = "tweet" ]];
    then 
      shift
      TO_statuses_update '' "$@"
      echo 'Tweeted :)'
    fi
  • Then to use the script:
    $ ./twitter.sh read
    Please go to the following link to get the PIN: https://api.twitter.com/oauth/authorize?oauth_token=GaZcfsdnhMO4HiBQuUTdeLJAzeaUamnOljWGnU
    PIN: 4727143
    Now you can create, edit and present Slides offline.
     - by A Googler
    
    $ ./twitter.sh tweet "I am reading Packt Shell Scripting Cookbook"
    Tweeted :)
    $ ./twitter.sh read | head -2
    I am reading Packt Shell Scripting Cookbook 
     - by Shantanu Tushar Jha
     
  • Register for an account at a dictionary site that offers an API, such as dictionaryapi.com, and obtain an API key.
  • Then use a script such as:
    #!/bin/bash
    #Filename: define.sh
    #Desc: A script to fetch definitions from dictionaryapi.com
    apikey=YOUR_API_KEY_HERE
    if  [ $# -ne 2 ];
    then
      echo -e "Usage: $0 WORD NUMBER"
      exit -1;
    fi
    curl --silent "http://www.dictionaryapi.com/api/v1/references/learners/xml/$1?key=$apikey" | \
    grep -o \<dt\>.*\</dt\> | \
    sed 's$</*[a-z]*>$$g' | \
    head -n $2 | nl
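  • A hypothetical invocation, listing the first two definitions of a word:
    $ ./define.sh usage 2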
  • lynx and curl can be used to find broken links:
    #!/bin/bash 
    #Filename: find_broken.sh
    #Desc: Find broken links in a website
    if [ $# -ne 1 ]; 
    then 
      echo -e "$Usage: $0 URL\n" 
      exit 1; 
    fi 
    echo Broken links: 
    mkdir /tmp/$$.lynx 
    cd /tmp/$$.lynx 
    lynx -traversal $1 > /dev/null 
    count=0; 
    sort -u reject.dat > links.txt 
    while read link; 
    do 
      output=`curl -I $link -s | grep "HTTP/.*OK"`; 
      if [[ -z $output ]]; 
      then 
        echo $link; 
        let count++ 
      fi 
    done < links.txt 
    [ $count -eq 0 ] && echo No broken links found.
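  • A hypothetical run against a site with no dead links:
    $ ./find_broken.sh http://example.com
    Broken links:
    No broken links found.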
  • We use curl and diff to do this:
    #!/bin/bash
    #Filename: change_track.sh
    #Desc: Script to track changes to webpage
    if [ $# -ne 1 ];
    then 
      echo -e "$Usage: $0 URL\n"
      exit 1;
    fi
    first_time=0
    # assume this is not the first run
    if [ ! -e "last.html" ];
    then
      first_time=1
      # no archived copy exists yet, so mark this as the first run
    fi
    curl --silent $1 -o recent.html
    if [ $first_time -ne 1 ];
    then
      changes=$(diff -u last.html recent.html)
      if [ -n "$changes" ];
      then
        echo -e "Changes:\n"
        echo "$changes"
      else
        echo -e "\nWebsite has no changes"
      fi
    else
      echo "[First run] Archiving.."
    fi
      
    cp recent.html last.html
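  • A hypothetical pair of runs, assuming the page does not change in between:
    $ ./change_track.sh http://example.com
    [First run] Archiving..
    $ ./change_track.sh http://example.com
    
    Website has no changes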
  • Automating POST request with curl:
    $ curl URL -d "postvar=postdata1&postvar2=postdata2"
    
    # for instance:
    $ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php -d "host=test-host&user=slynux"
    <html>
    You have entered :
    <p>HOST : test-host</p>
    <p>USER : slynux</p>
    </html>
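  • If a field value contains spaces or other special characters, curl can URL-encode it for us; a sketch with hypothetical data:
    $ curl URL --data-urlencode "comment=hello world"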
  • With wget we can post with the --post-data argument:
    $ wget http://book.sarathlakshman.com/lsc/mlogs/submit.php --post-data "host=test-host&user=slynux" -O output.html
    $ cat output.html
    <html>
    You have entered :
    <p>HOST : test-host</p>
    <p>USER : slynux</p>
    </html>