===== 5. Tangled Web? Not At All! =====

==== Downloading from a web page ====

  * Using wget:
<code>
$ wget URL

# Specify the output file with -O and a log file (instead of stdout) with -o:
$ wget ftp://example_domain.com/somefile.img -O dloaded_file.img -o log

# Specify the number of retries with -t:
$ wget -t 5 URL
$ wget -t 0 URL    # retry indefinitely

# Restrict the download speed (k for kilobyte, m for megabyte):
$ wget --limit-rate 20k http://example.com/file.iso

# Resume a download:
$ wget -c URL

# Copy a complete website:
$ wget --mirror --convert-links exampledomain.com

# Or limit the depth of the copy:
$ wget -r -N -l DEPTH -k URL

# Access pages that require authentication:
$ wget --user username --password pass URL
</code>

==== Downloading a web page as plain text ====

  * Usage of lynx:
<code>
$ lynx URL -dump > webpage_as_text.txt
</code>
  * The -nolist option removes the numbered link references that lynx adds.

==== A primer on cURL ====

  * Prevent curl from displaying progress information with the --silent option.
  * curl usage:
<code>
# -O writes the output to a file named after the remote filename:
$ curl URL --silent -O

# Show a progress bar:
$ curl http://slynux.org -o index.html --progress-bar

# Resume a download:
$ curl -C - URL

# Specify the referer string:
$ curl --referer Referer_URL target_URL

# Specify cookies:
$ curl http://example.com --cookie "user=slynux;pass=hack"

# Set the user agent:
$ curl URL --user-agent "Mozilla/5.0"

# Pass additional headers:
$ curl -H "Host: www.slynux.org" -H "Accept-language: en" URL

# Specify a speed limit:
$ curl URL --limit-rate 20k

# Authenticate with curl:
$ curl -u user:pass http://test_auth.com
# or with a password prompt:
$ curl -u user http://test_auth.com
</code>
  * Use the -I or --head option with curl to dump only the HTTP headers, without downloading the remote file. For example:
<code>
$ curl -I http://slynux.org
</code>

==== Accessing Gmail e-mails from the command line ====

  * Could use a script such as:
<code>
#!/bin/bash
#Desc: Fetch gmail tool

username='PUT_USERNAME_HERE'
password='PUT_PASSWORD_HERE'

SHOW_COUNT=5 # No of recent unread mails to be shown

echo
curl -u $username:$password --silent "https://mail.google.com/mail/feed/atom" | \
  tr -d '\n' | sed 's:</entry>:\n:g' | \
  sed -n 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/From: \2 [\3] \nSubject: \1\n/p' | \
  head -n $(( $SHOW_COUNT * 3 ))
</code>

==== Parsing data from a website ====

  * Parsing content is usually done with sed and awk:
<code>
$ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | \
    grep -o "Rank-.*" | \
    sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | \
    sort -nk 1 > actresslist.txt
</code>

==== Image crawler and downloader ====

  * Could use a script such as:
<code>
#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh

if [ $# -ne 3 ];
then
  echo "Usage: $0 URL -d DIRECTORY"
  exit -1
fi

for i in {1..4}
do
  case $1 in
    -d) shift; directory=$1; shift ;;
     *) url=${url:-$1}; shift ;;
  esac
done

mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")

echo Downloading $url
curl -s $url | egrep -o "<img src=[^>]*>" | \
  sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list

sed -i "s|^/|$baseurl/|" /tmp/$$.list

cd $directory;

while read filename;
do
  echo Downloading $filename
  curl -s -O "$filename" --silent
done < /tmp/$$.list
</code>
  * Usage example:
<code>
$ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images
</code>
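  * As a rough alternative (a hedged sketch, not from the book), wget's recursive mode can often pull the images from a single page on its own; the URL below is only a placeholder:
<code>
# -r: recurse, -l 1: only one level deep, -nd: no directory hierarchy,
# -A: keep only files with these extensions, -P: save into the images/ directory
$ wget -r -l 1 -nd -A jpg,jpeg,png,gif -P images "http://example.com/gallery"
</code>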
"Creating album..." mkdir -p thumbs cat <<EOF1 > index.html <html> <head> <style> body { width:470px; margin:auto; border: 1px dashed grey; padding:10px; } img { margin:5px; border: 1px solid black; } </style> </head> <body> <center><h1> #Album title </h1></center> <p> EOF1 for img in *.jpg; do convert "$img" -resize "100x" "thumbs/$img" echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" /></ a>" >> index.html done cat <<EOF2 >> index.html </p> </body> </html> EOF2 echo Album generated to index.html </code> ==== Twitter command-line client ==== - We need to download the bash-oauth library from https://github.com/livibetter/bash-oauth/archive/master.zip - Then install from the sub dir bash-oauth-master with: <code># make install-all</code> - Go to https://dev.twitter.com/apps/new and register a new app. - Provide read/write access to the new app. - Retrieve the consumer key and the consumer secret - Then use the following script: <code>#!/bin/bash #Filename: twitter.sh #Description: Basic twitter client oauth_consumer_key=YOUR_CONSUMER_KEY oauth_consumer_secret=YOUR_CONSUMER_SECRET config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]]; then echo -e "Usage: $0 tweet status_message\n OR\n $0 read\n" exit -1; fi source TwitterOAuth.sh TO_init if [ ! -e $config_file ]; then TO_access_token_helper if (( $? == 0 )); then echo oauth_token=${TO_ret[0]} > $config_file echo oauth_token_secret=${TO_ret[1]} >> $config_file fi fi source $config_file if [[ "$1" = "read" ]]; then TO_statuses_home_timeline '' 'shantanutushar' '10' echo $TO_ret | sed 's/<\([a-z]\)/\n<\1/g' | \ grep -e '^<text>' -e '^<name>' | sed 's/<name>/\ - by /g' | \ sed 's$</*[a-z]*>$$g' elif [[ "$1" = "tweet" ]]; then shift TO_statuses_update '' "$@" echo 'Tweeted :)' fi </code> * Then to use the script: <code>$ ./twitter.sh read Please go to the following link to get the PIN: https://api.twitter.com/ oauth/authorize?oauth_token=GaZcfsdnhMO4HiBQuUTdeLJAzeaUamnOljWGnU PIN: 4727143 Now you can create, edit and present Slides offline. - by A Googler $ ./twitter.sh tweet "I am reading Packt Shell Scripting Cookbook" Tweeted :) $ ./twitter.sh read | head -2 I am reading Packt Shell Scripting Cookbook - by Shantanu Tushar Jha </code> ==== Creating a "define" utility by using the Web backend ==== * Register for an account on a dictionary website. 
==== Creating a "define" utility by using the Web backend ====

  * Register for an account on a dictionary website (the script below uses dictionaryapi.com).
  * Then use a script such as:
<code>
#!/bin/bash
#Filename: define.sh
#Desc: A script to fetch definitions from dictionaryapi.com

apikey=YOUR_API_KEY_HERE

if [ $# -ne 2 ];
then
  echo -e "Usage: $0 WORD NUMBER"
  exit -1;
fi

curl --silent http://www.dictionaryapi.com/api/v1/references/learners/xml/$1?key=$apikey | \
  grep -o \<dt\>.*\</dt\> | \
  sed 's$</*[a-z]*>$$g' | \
  head -n $2 | nl
</code>

==== Finding broken links in a website ====

  * lynx and curl can be used to find broken links:
<code>
#!/bin/bash
#Filename: find_broken.sh
#Desc: Find broken links in a website

if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi

echo Broken links:

mkdir /tmp/$$.lynx
cd /tmp/$$.lynx

lynx -traversal $1 > /dev/null
count=0;

sort -u reject.dat > links.txt

while read link;
do
  output=`curl -I $link -s | grep "HTTP/.*OK"`;
  if [[ -z $output ]]; then
    echo $link;
    let count++
  fi
done < links.txt

[ $count -eq 0 ] && echo No broken links found.
</code>

==== Tracking changes to a website ====

  * We use curl and diff to do this:
<code>
#!/bin/bash
#Filename: change_track.sh
#Desc: Script to track changes to a webpage

if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi

first_time=0
# Not first time

if [ ! -e "last.html" ];
then
  first_time=1
  # Set it is the first time run
fi

curl --silent $1 -o recent.html

if [ $first_time -ne 1 ];
then
  changes=$(diff -u last.html recent.html)
  if [ -n "$changes" ];
  then
    echo -e "Changes:\n"
    echo "$changes"
  else
    echo -e "\nWebsite has no changes"
  fi
else
  echo "[First run] Archiving.."
fi

cp recent.html last.html
</code>

==== Posting to a web page and reading the response ====

  * Automating a POST request with curl:
<code>
$ curl URL -d "postvar=postdata2&postvar2=postdata2"

# For instance:
$ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php -d "host=test-host&user=slynux"
<html>
You have entered :
<p>HOST : test-host</p>
<p>USER : slynux</p>
<html>
</code>
  * With wget we can post with the --post-data argument:
<code>
$ wget http://book.sarathlakshman.com/lsc/mlogs/submit.php --post-data "host=test-host&user=slynux" -O output.html

$ cat output.html
<html>
You have entered :
<p>HOST : test-host</p>
<p>USER : slynux</p>
<html>
</code>
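  * If a form value contains spaces or other special characters, curl can URL-encode each field with --data-urlencode (a hedged addition; the endpoint is the same example server used above, so the exact response is not guaranteed):
<code>
$ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php \
    --data-urlencode "host=test host" \
    --data-urlencode "user=slynux"
</code>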