===== 3. File In, File Out =====
==== Generating files of any size ====
* **loopback** files: contain a filesystem image and can be mounted like a device.
* Create a file with dd:$ dd if=/dev/zero of=junk.data bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00767266 s, 137 MB/s
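* A minimal end-to-end sketch (assumes GNU coreutils; `junk.data` is just a throwaway name):

```shell
# Work in a scratch directory so nothing is overwritten
cd "$(mktemp -d)"
# Create a 10 MiB file filled with zeros
dd if=/dev/zero of=junk.data bs=1M count=10 2>/dev/null
# Confirm the size in bytes (10 * 1048576 = 10485760)
stat -c %s junk.data
```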
==== The intersection and set difference (A-B) on text files ====
* comm utility operations:
* Intersection: prints the lines the specified files have in common.
* Difference: prints all the lines that are not common to the files (the lines unique to each file).
* Set difference: prints the lines in file "A" that do not appear in any of the other specified files ("B" plus "C", for example).
* Comparison:$ cat A.txt
apple
orange
gold
silver
steel
iron
$ cat B.txt
orange
gold
cookies
carrot
$ sort A.txt -o A.txt ; sort B.txt -o B.txt
$ comm A.txt B.txt
apple
carrot
cookies
gold
iron
orange
silver
steel
* To print the intersection, suppress columns 1 and 2 (the lines unique to each file):$ comm A.txt B.txt -1 -2
gold
orange
* To print the lines that are uncommon in the two files:$ comm A.txt B.txt -3
apple
carrot
cookies
iron
silver
steel
# To produce a single column:
$ comm A.txt B.txt -3 | sed 's/^\t//'
apple
carrot
cookies
iron
silver
steel
# Difference for A.txt:
$ comm A.txt B.txt -2 -3
# Difference for B.txt:
$ comm A.txt B.txt -1 -3
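* The whole comm workflow above can be reproduced end to end (same file names as in the example):

```shell
cd "$(mktemp -d)"
printf '%s\n' apple orange gold silver steel iron > A.txt
printf '%s\n' orange gold cookies carrot > B.txt
sort -o A.txt A.txt ; sort -o B.txt B.txt   # comm requires sorted input
comm -12 A.txt B.txt > intersection.txt     # lines common to both files
comm -23 A.txt B.txt > only_in_A.txt        # set difference A - B
paste -sd, intersection.txt                 # gold,orange
paste -sd, only_in_A.txt                    # apple,iron,silver,steel
```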
==== Finding and deleting duplicate files ====
* Script to remove duplicate files by content:#!/bin/bash
#Filename: remove_duplicates.sh
#Description: Find and remove duplicate files and keep one sample of each file.
ls -lS --time-style=long-iso | awk 'BEGIN {
  getline; getline;
  name1=$8; size=$5
}
{
  name2=$8;
  if (size==$5)
  {
    "md5sum "name1 | getline; csum1=$1;
    "md5sum "name2 | getline; csum2=$1;
    if ( csum1==csum2 )
    {
      print name1; print name2
    }
  };
  size=$5; name1=name2;
}' | sort -u > duplicate_files
cat duplicate_files | xargs -I {} md5sum {} | sort | uniq -w 32 |
awk '{ print $2 }' | sort -u > duplicate_sample
echo Removing...
comm duplicate_files duplicate_sample -2 -3 | tee /dev/stderr |
xargs rm
echo Removed duplicate files successfully.
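* The core idea — group files by checksum and keep one file per group — can be sketched more simply (file names here are hypothetical):

```shell
cd "$(mktemp -d)"
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt      # duplicate of a.txt
printf 'world\n' > c.txt      # unique content
# Sort by checksum, then print every file whose checksum repeats the previous one
md5sum *.txt | sort | awk '{ if ($1 == prev) print $2; prev = $1 }' > dupes
cat dupes                     # the redundant copies; the first file of each group is kept
```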
==== Working with file permissions, ownership, and the sticky bit ====
* The setuid bit appears in the owner's execute position: -rwS------ (capital S means the execute bit itself is unset; lowercase s means it is set)
* The setgid bit appears in the group's execute position: ----rwS---
* The sticky bit for directories appears in the others' execute position: -------rwT
* Set sticky bit: $ chmod a+t directory_name
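* A quick sketch of setting these bits numerically and reading them back with GNU stat:

```shell
cd "$(mktemp -d)"
mkdir shared
chmod 1777 shared     # mode 777 plus the sticky bit (the leading 1)
stat -c %A shared     # drwxrwxrwt -- trailing t shows the sticky bit
touch tool
chmod 4755 tool       # mode 755 plus the setuid bit (the leading 4)
stat -c %A tool       # -rwsr-xr-x -- s in the owner execute slot
```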
==== Making files immutable ====
* Make a file immutable with: # chattr +i file
* Remove immutable state with: # chattr -i file
==== Generating blank files in bulk ====
* Create blank file with:$ touch filename
* Generate files in bulk:for name in {1..100}.txt
do
touch $name
done
# We can use different patterns such as:
test{1..200}.c
testing{a..z}.txt
* We can modify only access time or modification time with touch:touch -a filename
touch -m filename
* We can specify the time to use:$ touch -d "Fri Jun 25 20:50:14 IST 1999" filename
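* The bulk-creation loop and timestamp options together, as a small sketch (brace expansion needs Bash; `-d` and `date -r` are GNU extensions):

```shell
cd "$(mktemp -d)"
# Create five blank files in bulk
for name in file{1..5}.txt; do touch "$name"; done
ls | wc -l               # 5
# Back-date one file's timestamps
touch -d "2020-01-02 03:04:05" file1.txt
date -r file1.txt +%F    # prints the file's modification date
```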
==== Finding symbolic links and their targets ====
* Create symlink with: $ ln -s target symbolic_link_name
* Print only symlinks: $ ls -l | grep "^l"
* Find symlinks: $ find . -type l -print
* Read the target of a symlink: $ readlink web
/var/www
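* The three commands above in one runnable sketch (`web` is just an example link name, as in the notes):

```shell
cd "$(mktemp -d)"
mkdir target_dir
ln -s target_dir web        # create the symlink
readlink web                # print its target: target_dir
find . -type l              # find all symlinks here: ./web
```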
==== Enumerating file type statistics ====
* To print the type of a file:$ file filename
# For instance:
$ file /etc/passwd
/etc/passwd: ASCII text
# To exclude the filename:
$ file -b /etc/passwd
ASCII text
* Script for file statistics:#!/bin/bash
# Filename: filestat.sh
if [ $# -ne 1 ];
then
echo "Usage is $0 basepath";
exit
fi
path=$1
declare -A statarray;
while read line;
do
ftype=`file -b "$line" | cut -d, -f1`
let statarray["$ftype"]++;
done < <(find $path -type f -print)
echo ============ File types and counts =============
for ftype in "${!statarray[@]}";
do
echo $ftype : ${statarray["$ftype"]}
done
* Usage of the previous script:$ ./filestat.sh /home/slynux/programs
============ File types and counts =============
Vim swap file : 1
ELF 32-bit LSB executable : 6
ASCII text : 2
ASCII C program text : 10
* in Bash 3.x we could have used: done <<< "`find $path -type f -print`"
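* The counting trick the script relies on — a Bash 4+ associative array incremented with `let` — in isolation (the keys here are placeholders, not real `file` output):

```shell
declare -A statarray
for ftype in txt txt txt sh; do
  let statarray["$ftype"]++     # increment the counter for this type
done
echo "${statarray[txt]}"        # 3
echo "${statarray[sh]}"         # 1
```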
==== Using loopback files ====
* Create a 1 GB file: $ dd if=/dev/zero of=loopbackfile.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 37.3155 s, 28.8 MB/s
* Then format the file:$ mkfs.ext4 loopbackfile.img
# Check the file type:
$ file loopbackfile.img
loopbackfile.img: Linux rev 1.0 ext4 filesystem data,
UUID=c9d56c42-f8e6-4cbd-aeab-369d5056660a (extents) (large files)
(huge files)
* Mount the file:# mkdir /mnt/loopback
# mount -o loop loopbackfile.img /mnt/loopback
# Or manually:
# losetup /dev/loop1 loopbackfile.img
# mount /dev/loop1 /mnt/loopback
# To unmount:
# umount mount_point
* We can create partitions inside loopback images:# losetup /dev/loop1 loopback.img
# fdisk /dev/loop1
* Quicker way to mount loopback disk images with partitions using kpartx: # kpartx -v -a diskimage.img
add map loop0p1 (252:0): 0 114688 linear /dev/loop0 8192
add map loop0p2 (252:1): 0 15628288 linear /dev/loop0 122880
# Then
# mount /dev/mapper/loop0p1 /mnt/disk1
# Then remove the mapping:
# kpartx -d diskimage.img
loop deleted : /dev/loop0
* Mounting ISO files as loopback:# mkdir /mnt/iso
# mount -o loop linux.iso /mnt/iso
* Flush changes immediately: $ sync
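* The mkfs and mount steps need root, but creating the image does not; a sparse file (GNU `truncate`) is a quick alternative to dd, since blocks are allocated only as they are written:

```shell
cd "$(mktemp -d)"
truncate -s 1G sparse.img   # 1 GB logical size, almost no disk used yet
stat -c %s sparse.img       # logical size: 1073741824 bytes
du -k sparse.img            # on-disk usage is tiny by comparison
```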
==== Creating ISO files and hybrid ISO ====
* To create an ISO from a CD-ROM we can use: # cat /dev/cdrom > image.iso
* But the preferred method is:# dd if=/dev/cdrom of=image.iso
* We can create an iso from a folder with:$ mkisofs -V "Label" -o image.iso source_dir/
* Convert standard ISO to hybrid ISO:# isohybrid image.iso
# Then to write the ISO to a USB device:
# dd if=image.iso of=/dev/sdb # the whole device, not a partition
# or:
# cat image.iso > /dev/sdb
* Burning ISO from command line:# cdrecord -v dev=/dev/cdrom image.iso
# extra options:
-speed SPEED # write speed (for example, 8 or 16 for 8x or 16x)
-multi # for multi session.
# eject cd rom:
$ eject
$ eject -t # to close the tray
==== Finding the difference between files, patching ====
* Say we have the file contents: # File 1: version1.txt
this is the original text
line2
line3
line4
happy hacking !
# File 2: version2.txt
this is the original text
line2
line4
happy hacking !
GNU is not UNIX
* Nonunified diff will be:$ diff version1.txt version2.txt
3d2
< line3
5a5
> GNU is not UNIX
* Unified diff output will be:$ diff -u version1.txt version2.txt
--- version1.txt 2010-06-27 10:26:54.384884455 +0530
+++ version2.txt 2010-06-27 10:27:28.782140889 +0530
@@ -1,5 +1,5 @@
 this is the original text
 line2
-line3
 line4
 happy hacking !
+GNU is not UNIX
* Generate a patch file by redirecting the output of diff to a file:$ diff -u version1.txt version2.txt > version.patch
* Apply a patch with patch command: $ patch -p1 version1.txt < version.patch
patching file version1.txt
* To revert the patch: $ patch -p1 version1.txt < version.patch
patching file version1.txt
Reversed (or previously applied) patch detected! Assume -R? [n] y
#Changes are reverted.
# We could use the -R flag directly here to avoid the message.
* Generate difference between directories:$ diff -Naur directory1 directory2
# -N: treat absent files as empty
# -a: treat all files as text files
# -u: produce unified output
# -r: recursively traverse directories
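* The full diff/patch/revert round trip can be exercised on throwaway files (names here are illustrative):

```shell
cd "$(mktemp -d)"
printf 'one\ntwo\nthree\n' > v1.txt
printf 'one\nTWO\nthree\n' > v2.txt
diff -u v1.txt v2.txt > version.patch || true  # diff exits 1 when files differ
patch v1.txt < version.patch                   # v1.txt now matches v2.txt
cmp -s v1.txt v2.txt && echo identical
patch -R v1.txt < version.patch                # -R reverts the patch
paste -sd, v1.txt                              # one,two,three
```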
==== Using head and tail for printing the last or first 10 lines ====
* Usage of head: # print the first 10 lines:
$ head file
# read data from stdin:
$ cat text | head
# specify the number of lines to display:
$ head -n 4 file
# print all lines except last M lines:
$ head -n -M file
* Usage of tail:# print the last 10 lines:
$ tail file
# read from stdin:
$ cat text | tail
# print the last 5 lines:
$ tail -n 5 file
# print all lines except the first M lines:
$ tail -n +(M+1) file
# Monitor the change in the file content with:
$ tail -f growing_file
# terminate tail when a process dies:
$ PID=$(pidof Foo)
$ tail -f file --pid $PID
==== Listing only directories – alternative methods ====
* techniques to list directories:$ ls -d */
$ ls -F | grep "/$"
$ ls -l | grep "^d"
$ find . -maxdepth 1 -type d -print
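* The techniques agree on a small test tree:

```shell
cd "$(mktemp -d)"
mkdir dir1 dir2
touch file1
ls -d */                                 # dir1/ dir2/
ls -F | grep "/$"                        # same result, one per line
find . -maxdepth 1 -type d ! -name '.'   # ./dir1 ./dir2
```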
==== Fast command-line navigation using pushd and popd ====
* We use pushd and popd as:pushd /var/www
# then the stack contains [/var/www | ~ ]
# and current path is /var/www
# push additional paths:
pushd /usr/src
# list all the paths in the stack:
$ dirs
/usr/src /var/www ~ /usr/share /etc
# switching to a given path in the list (indexes start from 0):
$ pushd +3
# To remove the top path from the stack and change to the next one:
$ popd
# remove a specific path from the stack:
popd +num # indexes from 0.
* When we deal only with 2 paths, then we can use instead:
$ cd /var/www
$ cd /usr/src
$ cd - # switch to alternate path.
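* The two-path shortcut can be seen by toggling between two standard directories (`cd -` prints the directory it switches to, hence the redirect):

```shell
cd /tmp
cd /usr
cd - > /dev/null   # back to /tmp
pwd
cd - > /dev/null   # back to /usr again
pwd
```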
==== Counting the number of lines, words, and characters in a file ====
* Usage of wc utility: # Count number of lines:
$ wc -l file
# use stdin:
$ cat file | wc -l
# count num words:
$ wc -w file
$ cat file | wc -w
# count num characters:
$ wc -c file
$ cat file | wc -c
# print num lines words and characters:
$ wc file
1435 15763 112200
# print the length of longest line:
$ wc file -L
205
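* The three counters on a known input (note that -c counts bytes, newlines included):

```shell
cd "$(mktemp -d)"
printf 'hello world\nbye\n' > file
wc -l < file   # 2  (lines)
wc -w < file   # 3  (words)
wc -c < file   # 16 (bytes: 12 + 4)
```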
==== Printing the directory tree ====
* with the tree command: $ tree some_folder
# We can also provide a pattern:
$ tree path -P PATTERN # list only files matching the wildcard PATTERN
$ tree path -I PATTERN # list only files not matching PATTERN
$ tree -h # print the size along the files and directories.
* HTML output with tree: tree PATH -H http://localhost -o out.html