3. File In, File Out
Generating files of any size
- Files of a given size are useful, for example, as loopback files: files that contain a filesystem themselves and can be mounted like a device.
- Create a file with dd:
$ dd if=/dev/zero of=junk.data bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00767266 s, 137 MB/s
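The dd call above can be wrapped so the size is a parameter and the result is checked. This is a minimal sketch assuming bash and GNU coreutils (`stat -c %s`); `make_file` is a hypothetical helper name, not a standard command.

```shell
#!/bin/bash
# Sketch: create a zero-filled file of N MiB with dd and print its size in bytes.
# make_file is a hypothetical helper; stat -c %s is GNU coreutils.
make_file() {
    local path=$1 size_mb=$2
    # bs=1M count=N writes N blocks of 1 MiB; dd's summary goes to stderr
    dd if=/dev/zero of="$path" bs=1M count="$size_mb" 2>/dev/null
    stat -c %s "$path"
}

tmp=$(mktemp -d)
make_file "$tmp/junk.data" 1    # prints 1048576
rm -r "$tmp"
```

The file size is always `count` times `bs`, so the same helper covers any size that is a whole number of MiB.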
The intersection and set difference (A-B) on text files
- comm utility operations:
- Intersection: The intersection operation will print the lines that the specified files have in common with one another.
- Difference: The difference operation will print the lines that the specified files contain and that are not the same in all of those files.
- Set difference: The set difference operation will print the lines in file “A” that do not match those in all of the set of files specified (“B” plus “C”, for example).
- Comparison (comm expects both input files to be sorted):
$ cat A.txt
apple
orange
gold
silver
steel
iron
$ cat B.txt
orange
gold
cookies
carrot
$ sort A.txt -o A.txt ; sort B.txt -o B.txt
$ comm A.txt B.txt
apple
	carrot
	cookies
		gold
iron
		orange
silver
steel
- To print the intersection, suppress columns 1 and 2:
$ comm A.txt B.txt -1 -2
gold
orange
- To print the lines that are uncommon in the two files:
$ comm A.txt B.txt -3
apple
	carrot
	cookies
iron
silver
steel
# To produce a single column:
$ comm A.txt B.txt -3 | sed 's/^\t//'
apple
carrot
cookies
iron
silver
steel
# Set difference A - B (lines only in A.txt):
$ comm A.txt B.txt -2 -3
# Set difference B - A (lines only in B.txt):
$ comm A.txt B.txt -1 -3
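The three comm operations can be checked end to end on the same sample data. This is a minimal sketch assuming bash and GNU coreutils; `set_ops_demo` is a hypothetical helper name, and the files live in a throwaway temp directory.

```shell
#!/bin/bash
# Sketch: intersection and set differences with comm on the A.txt/B.txt sample.
# comm needs sorted input, so each file is sorted as it is written.
set_ops_demo() {
    local dir
    dir=$(mktemp -d)
    printf '%s\n' apple orange gold silver steel iron | sort > "$dir/A.txt"
    printf '%s\n' orange gold cookies carrot | sort > "$dir/B.txt"
    echo "intersection:"
    comm -12 "$dir/A.txt" "$dir/B.txt"   # hide columns 1 and 2: lines in both
    echo "A - B:"
    comm -23 "$dir/A.txt" "$dir/B.txt"   # hide columns 2 and 3: lines only in A
    echo "B - A:"
    comm -13 "$dir/A.txt" "$dir/B.txt"   # hide columns 1 and 3: lines only in B
    rm -r "$dir"
}

set_ops_demo
```

Combining the suppression flags into one argument (`-12`) is equivalent to passing `-1 -2` separately.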
Finding and deleting duplicate files
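- Script to remove duplicate files by content (it keeps one copy of each set of identical files):
#!/bin/bash
#Filename: remove_duplicates.sh
#Description: Find and remove duplicate files and keep one sample of each file.
# Assumes file names without spaces; run it inside the directory to clean up.
ls -lS --time-style=long-iso | awk '
BEGIN { getline; getline; name1=$8; size=$5 }  # skip the "total" line, read the first file
{
  name2=$8;
  if (size==$5) {
    # read into variables so $0/$5 are not overwritten; close() lets the command run again
    cmd1="md5sum " name1; cmd1 | getline out1; close(cmd1); split(out1, a, " "); csum1=a[1];
    cmd2="md5sum " name2; cmd2 | getline out2; close(cmd2); split(out2, b, " "); csum2=b[1];
    if (csum1==csum2) { print name1; print name2 }
  }
  size=$5; name1=name2;
}' | sort -u > duplicate_files
# Keep one sample (the first name for each checksum):
cat duplicate_files | xargs -I {} md5sum {} | sort | uniq -w 32 | awk '{ print $2 }' | sort -u > duplicate_sample
echo Removing...
# Lines only in duplicate_files are the extra copies to delete:
comm duplicate_files duplicate_sample -2 -3 | tee /dev/stderr | xargs rm
echo Removed duplicate files successfully.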
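Before deleting anything, it can help to see which files the checksum comparison considers identical. This is a minimal, non-destructive sketch of that grouping step, assuming GNU coreutils (`uniq -w` and `-D`) and file names without newlines; `list_duplicates` is a hypothetical helper name, and it hashes every file rather than pre-filtering by size as the script does.

```shell
#!/bin/bash
# Sketch: print every file in a directory whose content matches another file's.
# md5sum prints "<32-char hash>  <name>"; sorting puts equal hashes together,
# uniq -w 32 compares only the hash and -D prints all lines of each repeated group.
list_duplicates() {
    local dir=$1
    find "$dir" -maxdepth 1 -type f -exec md5sum {} + | sort | uniq -w 32 -D
}

tmp=$(mktemp -d)
echo same > "$tmp/a"; echo same > "$tmp/b"; echo other > "$tmp/c"
list_duplicates "$tmp"    # lists a and b, not c
rm -r "$tmp"
```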
Working with file permissions, ownership, and the sticky bit
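- The setuid bit appears in the owner's execute position: -rwS------ (s when the owner also has execute permission, S when not)
- The setgid bit appears in the group's execute position: ----rwS---
- The sticky bit applies to directories and appears in the others' execute position: d------rwT (t when others also have execute permission, T when not)
- Set the sticky bit: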
$ chmod a+t directory_name
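The upper/lowercase rule for these bits is easiest to see by setting them and reading the mode string back. This is a minimal sketch assuming GNU coreutils (`stat -c %A`); `show_mode` is a hypothetical helper, and the paths are scratch files in a temp directory.

```shell
#!/bin/bash
# Sketch: show how the sticky and setuid bits appear in the symbolic mode.
show_mode() { stat -c %A "$1"; }

tmp=$(mktemp -d)
mkdir "$tmp/shared"
chmod 755 "$tmp/shared"
chmod a+t "$tmp/shared"      # others have execute, so the sticky bit shows as t
show_mode "$tmp/shared"      # drwxr-xr-t

touch "$tmp/file"
chmod 644 "$tmp/file"
chmod u+s "$tmp/file"        # owner lacks execute, so setuid shows as S
show_mode "$tmp/file"        # -rwSr--r--
rm -r "$tmp"
```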
Making files immutable
- Make a file immutable with:
# chattr +i file
- Remove immutable state with:
# chattr -i file
Generating blank files in bulk
- Create a blank file with:
$ touch filename
- Generate files in bulk:
for name in {1..100}.txt
do
  touch "$name"
done
# Other brace patterns work too, such as test{1..200}.c or testing{a..z}.txt
- Change only the access time, or only the modification time:
$ touch -a filename
$ touch -m filename
- We can specify the time to use:
$ touch -d "Fri Jun 25 20:50:14 IST 1999" filename
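Bulk creation and explicit timestamps can be combined in one script. This is a minimal sketch assuming bash (for the C-style loop) and GNU coreutils (`date -r`); `make_blank_files` is a hypothetical helper name, and the date string is given in UTC to avoid time zone ambiguity.

```shell
#!/bin/bash
# Sketch: create N empty files, then pin one file's timestamp with touch -d.
make_blank_files() {
    local dir=$1 count=$2 i
    for ((i = 1; i <= count; i++)); do
        touch "$dir/test$i.c"    # creates an empty file if it does not exist
    done
}

tmp=$(mktemp -d)
make_blank_files "$tmp" 5
ls "$tmp" | wc -l                                   # 5
touch -d "1999-06-25 20:50:14 UTC" "$tmp/test1.c"
# GNU date -r prints the modification time of the given file
date -u -r "$tmp/test1.c" '+%Y-%m-%d %H:%M:%S'      # 1999-06-25 20:50:14
rm -r "$tmp"
```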
Finding symbolic links and their targets
- Create a symlink with:
$ ln -s target symbolic_link_name
- List only the symlinks in the current directory:
$ ls -l | grep "^l"
- Find symlinks recursively:
$ find . -type l -print
- Read the target of a symlink:
$ readlink web
/var/www
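Finding symlinks and reading their targets can be combined into a single listing. This is a minimal sketch assuming bash; `list_symlinks` is a hypothetical helper, and the `web`/`www` names only mirror the example above inside a temp directory.

```shell
#!/bin/bash
# Sketch: print "name -> target" for every symbolic link under a directory.
list_symlinks() {
    find "$1" -type l -print | while IFS= read -r link; do
        # ${link##*/} strips the leading path; readlink prints the stored target
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

tmp=$(mktemp -d)
mkdir "$tmp/www"
ln -s www "$tmp/web"          # relative target, stored as-is
list_symlinks "$tmp"          # web -> www
rm -r "$tmp"
```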
Enumerating file type statistics
- To print the type of a file:
$ file filename
# For instance:
$ file /etc/passwd
/etc/passwd: ASCII text
# To exclude the filename:
$ file -b /etc/passwd
ASCII text
- Script for file statistics:
#!/bin/bash
# Filename: filestat.sh
if [ $# -ne 1 ]; then
  echo "Usage is $0 basepath"
  exit 1
fi
path=$1
declare -A statarray   # associative array (requires Bash 4)
while IFS= read -r line; do
  ftype=$(file -b "$line" | cut -d, -f1)
  let statarray["$ftype"]++
done < <(find "$path" -type f -print)
echo ============ File types and counts =============
for ftype in "${!statarray[@]}"; do
  echo "$ftype : ${statarray["$ftype"]}"
done
- Usage of the previous script:
$ ./filestat.sh /home/slynux/programs
============ File types and counts =============
Vim swap file : 1
ELF 32-bit LSB executable : 6
ASCII text : 2
ASCII C program text : 10
- A here-string can replace the process substitution (suggested for Bash 3.x, although declare -A itself requires Bash 4):
done <<< "`find $path -type f -print`"
Using loopback files
- Create a 1 GB file:
$ dd if=/dev/zero of=loopbackfile.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 37.3155 s, 28.8 MB/s
- Then format the file:
$ mkfs.ext4 loopbackfile.img
# Check the file type:
$ file loopbackfile.img
loopbackfile.img: Linux rev 1.0 ext4 filesystem data, UUID=c9d56c42-f8e6-4cbd-aeab-369d5056660a (extents) (large files) (huge files)
- Mount the file:
# mkdir /mnt/loopback
# mount -o loop loopbackfile.img /mnt/loopback
# Or manually:
# losetup /dev/loop1 loopbackfile.img
# mount /dev/loop1 /mnt/loopback
# To unmount:
# umount mount_point
- We can create partitions inside loopback images:
# losetup /dev/loop1 loopback.img
# fdisk /dev/loop1
- A quicker way to mount loopback disk images that contain partitions is kpartx:
# kpartx -v -a diskimage.img
add map loop0p1 (252:0): 0 114688 linear /dev/loop0 8192
add map loop0p2 (252:1): 0 15628288 linear /dev/loop0 122880
# Then mount a partition:
# mount /dev/mapper/loop0p1 /mnt/disk1
# Then remove the mapping:
# kpartx -d diskimage.img
loop deleted : /dev/loop0
- Mounting ISO files as loopback:
# mkdir /mnt/iso
# mount -o loop linux.iso /mnt/iso
- Write buffered changes to disk immediately:
$ sync
Creating ISO files and hybrid ISO
- To create an ISO image from a CD-ROM:
# cat /dev/cdrom > image.iso
- The preferred method, however, is:
# dd if=/dev/cdrom of=image.iso
- We can create an ISO from a folder with:
$ mkisofs -V "Label" -o image.iso source_dir/
- Convert a standard ISO to a hybrid ISO:
# isohybrid image.iso
# Then write the ISO to a USB device (the whole device, e.g. /dev/sdb, not a partition such as /dev/sdb1):
# dd if=image.iso of=/dev/sdb
# or:
# cat image.iso > /dev/sdb
- Burn an ISO from the command line:
# cdrecord -v dev=/dev/cdrom image.iso
# Extra options:
#   speed=SPEED   # write speed as a multiple of x (e.g. 8 or 16)
#   -multi        # multi-session disc
- Eject the CD-ROM:
$ eject
$ eject -t   # close the tray
Finding the difference between files, patching
- Say we have the file contents:
# File 1: version1.txt (its sixth and last line is empty)
this is the original text
line2
line3
line4
happy hacking !

# File 2: version2.txt
this is the original text
line2
line4
happy hacking !
GNU is not UNIX
- Non-unified (normal) diff output:
$ diff version1.txt version2.txt
3d2
< line3
6c5
<
---
> GNU is not UNIX
- Unified diff output:
$ diff -u version1.txt version2.txt
--- version1.txt	2010-06-27 10:26:54.384884455 +0530
+++ version2.txt	2010-06-27 10:27:28.782140889 +0530
@@ -1,6 +1,5 @@
 this is the original text
 line2
-line3
 line4
 happy hacking !
-
+GNU is not UNIX
- Generate a patch file by redirecting the output of diff to a file:
$ diff -u version1.txt version2.txt > version.patch
- Apply a patch with patch command:
$ patch -p1 version1.txt < version.patch
patching file version1.txt
- To revert the patch:
$ patch -p1 version1.txt < version.patch
patching file version1.txt
Reversed (or previously applied) patch detected!  Assume -R? [n] y
# Changes are reverted.
# Passing -R directly reverts the patch without this prompt:
$ patch -p1 -R version1.txt < version.patch
- Generate difference between directories:
$ diff -Naur directory1 directory2
# -N: treat absent files as empty
# -a: treat all files as text
# -u: produce unified output
# -r: recursively compare subdirectories
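The diff/patch round trip can be verified with `cmp`. This is a minimal sketch assuming bash, GNU diffutils and GNU patch; `diff_patch_demo` is a hypothetical helper, and for simplicity version1.txt here has no trailing empty line.

```shell
#!/bin/bash
# Sketch: create a unified diff, apply it with patch, then revert it with patch -R.
diff_patch_demo() {
    local dir
    dir=$(mktemp -d)
    cd "$dir" || return 1
    printf '%s\n' 'this is the original text' line2 line3 line4 'happy hacking !' > version1.txt
    printf '%s\n' 'this is the original text' line2 line4 'happy hacking !' 'GNU is not UNIX' > version2.txt
    cp version1.txt original.txt

    # diff exits with status 1 when the files differ; that is expected here
    diff -u version1.txt version2.txt > version.patch || true
    patch -s version1.txt < version.patch          # -s: no "patching file" message
    cmp -s version1.txt version2.txt && echo "patched: matches version2"

    patch -s -R version1.txt < version.patch       # -R: apply the patch in reverse
    cmp -s version1.txt original.txt && echo "reverted: matches original"

    cd - > /dev/null && rm -r "$dir"
}

diff_patch_demo
```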
Using head and tail for printing the last or first 10 lines
- Usage of head:
# Print the first 10 lines:
$ head file
# Read data from stdin:
$ cat text | head
# Specify the number of lines to display:
$ head -n 4 file
# Print all lines except the last M lines:
$ head -n -M file
- Usage of tail:
# Print the last 10 lines:
$ tail file
# Read from stdin:
$ cat text | tail
# Print the last 5 lines:
$ tail -n 5 file
# Print all lines except the first M lines:
$ tail -n +(M+1) file
# Monitor the file for newly appended content:
$ tail -f growing_file
# Terminate tail when a given process dies:
$ PID=$(pidof Foo)
$ tail -f file --pid $PID
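The two "all lines except M" forms are easy to get off by one, so they can be wrapped and checked on a numbered stream. This is a minimal sketch assuming GNU coreutils (a negative count for `head -n` is a GNU extension); `all_but_last` and `all_but_first` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: "all but M lines" helpers built on head and tail.
all_but_last()  { head -n "-$1"; }              # drop the last $1 lines of stdin
all_but_first() { tail -n "+$(( $1 + 1 ))"; }   # drop the first $1 lines of stdin

seq 1 10 | all_but_last 7     # prints 1, 2, 3 (one per line)
seq 1 10 | all_but_first 8    # prints 9, 10 (one per line)
```

The `+1` in `all_but_first` is needed because `tail -n +K` starts output at line K rather than skipping K lines.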
Listing only directories – alternative methods
- Techniques to list only directories:
$ ls -d */
$ ls -F | grep "/$"
$ ls -l | grep "^d"
$ find . -maxdepth 1 -type d -print
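The find variant also prints `.` itself; adding `-mindepth 1` leaves only the subdirectories. This is a minimal sketch assuming GNU find (for `-printf '%f\n'`, which prints just the entry name); `list_dirs` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list only the immediate subdirectories of a directory, one per line.
list_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | sort
}

tmp=$(mktemp -d)
mkdir "$tmp/alpha" "$tmp/beta"
touch "$tmp/notes.txt"
list_dirs "$tmp"              # alpha, beta (notes.txt is skipped)
( cd "$tmp" && ls -d */ )     # alpha/ beta/ via the shell glob
rm -r "$tmp"
```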
Fast command-line navigation using pushd and popd
- We use pushd and popd as:
$ pushd /var/www
# The stack now contains [/var/www | ~] and the current path is /var/www
# Push additional paths:
$ pushd /usr/src
# List all the paths in the stack (after a few more pushes):
$ dirs
/usr/src /var/www ~ /usr/share /etc
# Switch to a given path in the list (indexes start from 0):
$ pushd +3
# Remove the top path and change to the next one on the stack:
$ popd
# Remove a specific path from the stack (indexes start from 0):
$ popd +num
- When only two paths are involved, cd - is simpler:
$ cd /var/www
$ cd /usr/src
$ cd -   # switch back to the previous path
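Because pushd and popd are bash builtins, their effect on the stack and on `$PWD` can be checked inside a script. This is a minimal sketch assuming bash; two temp directories stand in for /var/www and /usr/src, and `dir_stack_demo` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: push two directories, count the stack entries, then pop back.
dir_stack_demo() {
    local a b
    a=$(mktemp -d); b=$(mktemp -d)
    pushd "$a" > /dev/null       # stack: a, starting dir
    pushd "$b" > /dev/null       # stack: b, a, starting dir
    dirs -v | wc -l              # 3 entries
    popd > /dev/null             # drop b and change to a
    [ "$PWD" = "$a" ] && echo "back in first dir"
    popd > /dev/null             # drop a and return to the starting dir
    rm -r "$a" "$b"
}

dir_stack_demo
```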
Counting the number of lines, words, and characters in a file
- Usage of the wc utility:
# Count the number of lines:
$ wc -l file
# Read from stdin:
$ cat file | wc -l
# Count the number of words:
$ wc -w file
$ cat file | wc -w
# Count the number of bytes (use -m to count multibyte characters):
$ wc -c file
$ cat file | wc -c
# Print the number of lines, words, and bytes:
$ wc file
1435 15763 112200 file
# Print the length of the longest line:
$ wc file -L
205 file
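Running wc on a tiny input whose counts can be worked out by hand makes each flag's meaning concrete. This is a minimal sketch assuming GNU coreutils (`-L` is a GNU extension); `sample` is a hypothetical helper. The input "one two\nthree\n" has 2 lines, 3 words, 8 + 6 = 14 bytes, and its longest line is 7 characters.

```shell
#!/bin/bash
# Sketch: wc counts on a small known input.
sample() { printf 'one two\nthree\n'; }

sample | wc -l    # 2
sample | wc -w    # 3
sample | wc -c    # 14 (bytes)
sample | wc -L    # 7 (length of the longest line)
```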
Printing the directory tree
- With the tree command:
$ tree some_folder
# List only files that match a wildcard pattern:
$ tree path -P PATTERN
# List only files that do not match the pattern:
$ tree path -I PATTERN
# Print sizes in human-readable form alongside files and directories:
$ tree -h
- HTML output with tree:
$ tree PATH -H http://localhost -o out.html