==== Downloading from a web page ====
$ wget URL
# specify output file with -O
# specify log file (instead of stdout) with -o:
$ wget ftp://example_domain.com/somefile.img -O dloaded_file.img -o log
# Specify number of retries with -t:
$ wget -t 5 URL
$ wget -t 0 URL # retries infinitely.
# Restrict the download speed: (k for kilobyte, m for megabyte)
$ wget --limit-rate 20k http://example.com/file.iso
# resume downloading:
$ wget -c URL
# copy complete website:
$ wget --mirror --convert-links exampledomain.com
# or limit the depth of the copy:
$ wget -r -N -l DEPTH -k URL
# Access pages with authentication:
$ wget --user username --password pass URL
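* Several of these options combine naturally; a small sketch (URL and the filenames are placeholders):
$ wget -c -t 3 --limit-rate 50k URL -O dloaded_file.img -o log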
==== Downloading a web page as plain text ====
* Usage of lynx: $ lynx URL -dump > webpage_as_text.txt
* We can use the -nolist option to remove the link-reference numbers that lynx adds.
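* For example (the output filename is illustrative):
$ lynx URL -dump -nolist > webpage_as_text.txt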
==== A primer on cURL ====
* Prevent curl from displaying progress information with the --silent option.
* Curl usage: # Write the output to a file named after the remote file (-O):
$ curl URL --silent -O
# to show progress bar:
$ curl http://slynux.org -o index.html --progress-bar
# resume download:
$ curl -C - URL
# specify the referer string:
$ curl --referer Referer_URL target_URL
# specify cookies:
$ curl http://example.com --cookie "user=slynux;pass=hack"
# Set user agent:
$ curl URL --user-agent "Mozilla/5.0"
# pass additional header:
$ curl -H "Host: www.slynux.org" -H "Accept-language: en" URL
# specify speed limit:
$ curl URL --limit-rate 20k
# authenticate with curl:
$ curl -u user:pass http://test_auth.com
# or with password prompt:
$ curl -u user http://test_auth.com
# Use the -I or --head option with curl to dump only the HTTP headers, without downloading
# the remote file. For example:
$ curl -I http://slynux.org
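* The header dump is handy, for instance, to check a remote file's size before downloading it; a small sketch (URL is a placeholder):
$ curl -sI http://example.com/file.iso | grep -i Content-Length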
==== Accessing Gmail e-mails from the command line ====
* Could use a script such as: #!/bin/bash
#Desc: Fetch gmail tool
username='PUT_USERNAME_HERE'
password='PUT_PASSWORD_HERE'
SHOW_COUNT=5 # No of recent unread mails to be shown
echo
curl -u $username:$password --silent "https://mail.google.com/mail/feed/atom" | \
tr -d '\n' | sed 's:</entry>:\n:g' | \
sed -n 's/.*<title>\(.*\)<\/title.*<name>\([^<]*\)<\/name><email>\([^<]*\).*/From: \2 [\3] \nSubject: \1\n/p' | \
head -n $(( $SHOW_COUNT * 3 ))
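* A usage sketch, assuming the script is saved as fetch_gmail.sh (the filename is illustrative):
$ chmod +x fetch_gmail.sh
$ ./fetch_gmail.sh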
==== Parsing data from a website ====
* Parsing content is usually done with sed and awk: $ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | \
grep -o "Rank-.*" | \
sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | \
sort -nk 1 > actresslist.txt
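* To see the sed substitution in isolation: a hypothetical input line such as "  Rank-1  Some Actress" becomes a tab-separated rank and name:
$ echo "  Rank-1  Some Actress" | sed 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/'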
==== Image crawler and downloader ====
* Could use a script such as: #!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh
if [ $# -ne 3 ];
then
echo "Usage: $0 URL -d DIRECTORY"
exit -1
fi
for i in {1..4}
do
case $1 in
-d) shift; directory=$1; shift ;;
*) url=${url:-$1}; shift;;
esac
done
mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
echo Downloading $url
curl -s $url | egrep -o "<img src=[^>]*>" |
sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
sed -i "s|^/|$baseurl/|" /tmp/$$.list
cd $directory;
while read filename;
do
echo Downloading $filename
curl -s -O "$filename" --silent
done < /tmp/$$.list
* usage example: $ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images
==== Web photo album generator ====
* Typical script for thumbnail generation: #!/bin/bash
#Filename: generate_album.sh
#Description: Create a photo album using images in current directory
echo "Creating album..."
mkdir -p thumbs
cat <<EOF1 > index.html
<html>
<head><style>img { margin:5px; border: 1px solid black; }</style></head>
<body>
<center><h1> #Album title </h1></center>
<p>
EOF1
for img in *.jpg;
do
convert "$img" -resize "100x" "thumbs/$img"
echo "<a href=\"$img\"><img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
done
cat <<EOF2 >> index.html
</p>
</body>
</html>
EOF2
echo Album generated to index.html
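* A usage sketch: run the script from the directory containing the .jpg files (convert is part of ImageMagick, which must be installed):
$ chmod +x generate_album.sh
$ ./generate_album.sh
Creating album...
Album generated to index.html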
==== Twitter command-line client ====
- We need to download the bash-oauth library from https://github.com/livibetter/bash-oauth/archive/master.zip
- Then install from the bash-oauth-master subdirectory with: # make install-all
- Go to https://dev.twitter.com/apps/new and register a new app.
- Provide read/write access to the new app.
- Retrieve the consumer key and the consumer secret
- Then use the following script: #!/bin/bash
#Filename: twitter.sh
#Description: Basic twitter client
oauth_consumer_key=YOUR_CONSUMER_KEY
oauth_consumer_secret=YOUR_CONSUMER_SECRET
config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc
if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]];
then
echo -e "Usage: $0 tweet status_message\n OR\n $0 read\n"
exit -1;
fi
source TwitterOAuth.sh
TO_init
if [ ! -e $config_file ]; then
TO_access_token_helper
if (( $? == 0 )); then
echo oauth_token=${TO_ret[0]} > $config_file
echo oauth_token_secret=${TO_ret[1]} >> $config_file
fi
fi
source $config_file
if [[ "$1" = "read" ]];
then
TO_statuses_home_timeline '' 'shantanutushar' '10'
echo $TO_ret | sed 's/<\([a-z]\)/\n<\1/g' | \
grep -e '^<text>' -e '^<name>' | sed 's/<name>/\ - by /g' | \
sed 's$</*[a-z]*>$$g'
elif [[ "$1" = "tweet" ]];
then
shift
TO_statuses_update '' "$@"
echo 'Tweeted :)'
fi
* Then to use the script: $ ./twitter.sh read
Please go to the following link to get the PIN: https://api.twitter.com/oauth/authorize?oauth_token=GaZcfsdnhMO4HiBQuUTdeLJAzeaUamnOljWGnU
PIN: 4727143
Now you can create, edit and present Slides offline.
- by A Googler
$ ./twitter.sh tweet "I am reading Packt Shell Scripting Cookbook"
Tweeted :)
$ ./twitter.sh read | head -2
I am reading Packt Shell Scripting Cookbook
- by Shantanu Tushar Jha
==== Creating a "define" utility by using the Web backend ====
* Register for an account on a dictionary website.
* Then use a script such as: #!/bin/bash
#Filename: define.sh
#Desc: A script to fetch definitions from dictionaryapi.com
apikey=YOUR_API_KEY_HERE
if [ $# -ne 2 ];
then
echo -e "Usage: $0 WORD NUMBER"
exit -1;
fi
curl --silent http://www.dictionaryapi.com/api/v1/references/learners/xml/$1?key=$apikey | \
grep -o '<dt>.*</dt>' | \
sed 's$</*[a-z]*>$$g' | \
head -n $2 | nl
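* A usage sketch (the word and count are illustrative; a dictionaryapi.com API key must be set in the script):
$ ./define.sh usher 1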
==== Finding broken links in a website ====
* lynx and curl can be used to find broken links: #!/bin/bash
#Filename: find_broken.sh
#Desc: Find broken links in a website
if [ $# -ne 1 ];
then
echo -e "Usage: $0 URL\n"
exit 1;
fi
echo Broken links:
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx
lynx -traversal $1 > /dev/null
count=0;
sort -u reject.dat > links.txt
while read link;
do
output=`curl -I $link -s | grep "HTTP/.*OK"`;
if [[ -z $output ]];
then
echo $link;
let count++
fi
done < links.txt
[ $count -eq 0 ] && echo No broken links found.
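* A usage sketch (URL is a placeholder):
$ ./find_broken.sh http://example.com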
==== Tracking changes to a website ====
* We use curl and diff to do this: #!/bin/bash
#Filename: change_track.sh
#Desc: Script to track changes to webpage
if [ $# -ne 1 ];
then
echo -e "Usage: $0 URL\n"
exit 1;
fi
first_time=0
# Not first time
if [ ! -e "last.html" ];
then
first_time=1
# Set it is first time run
fi
curl --silent $1 -o recent.html
if [ $first_time -ne 1 ];
then
changes=$(diff -u last.html recent.html)
if [ -n "$changes" ];
then
echo -e "Changes:\n"
echo "$changes"
else
echo -e "\nWebsite has no changes"
fi
else
echo "[First run] Archiving.."
fi
cp recent.html last.html
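* A usage sketch (URL is a placeholder); the first run only archives the page, later runs report differences:
$ ./change_track.sh http://example.com
[First run] Archiving..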
==== Posting to a web page and reading the response ====
* Automating POST request with curl: $ curl URL -d "postvar=postdata2&postvar2=postdata2"
# for instance:
$ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php -d "host=test-host&user=slynux"
You have entered :
HOST : test-host
USER : slynux
* With wget we can post with the --post-data argument: $ wget http://book.sarathlakshman.com/lsc/mlogs/submit.php --post-data "host=test-host&user=slynux" -O output.html
$ cat output.html
You have entered :
HOST : test-host
USER : slynux
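* If the form values contain spaces or other special characters, curl can URL-encode them; a small sketch with placeholder values:
$ curl URL --data-urlencode "host=test host" --data-urlencode "user=slynux"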