===== 3. File In, File Out =====

==== Generating files of any size ====

  * **Loopback** files contain a filesystem themselves and can be mounted.
  * Create a file of a given size with dd:
    $ dd if=/dev/zero of=junk.data bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.00767266 s, 137 MB/s

==== The intersection and set difference (A-B) on text files ====

  * The comm utility provides three operations (it requires sorted input, hence the sort step below):
    * Intersection: print the lines that the specified files have in common.
    * Difference: print the lines that appear in only one of the specified files (all non-common lines).
    * Set difference: print the lines of file A that do not appear in any of the other specified files (B, C, and so on).
  * Comparison of two sample files:
    $ cat A.txt
    apple
    orange
    gold
    silver
    steel
    iron
    $ cat B.txt
    orange
    gold
    cookies
    carrot
    $ sort A.txt -o A.txt ; sort B.txt -o B.txt
    $ comm A.txt B.txt
    apple
            carrot
            cookies
                    gold
    iron
                    orange
    silver
    steel
  * The output has three tab-separated columns: lines only in A.txt, lines only in B.txt, and lines common to both.
  * To print the intersection, remove columns 1 and 2:
    $ comm A.txt B.txt -1 -2
    gold
    orange
  * To print the lines that are not common to the two files, remove column 3:
    $ comm A.txt B.txt -3
    apple
            carrot
            cookies
    iron
    silver
    steel
    # To produce a single column, strip the leading tab:
    $ comm A.txt B.txt -3 | sed 's/^\t//'
    apple
    carrot
    cookies
    iron
    silver
    steel
  * Set difference for A.txt (lines present only in A.txt):
    $ comm A.txt B.txt -2 -3
  * Set difference for B.txt (lines present only in B.txt):
    $ comm A.txt B.txt -1 -3

==== Finding and deleting duplicate files ====

  * Script to remove duplicate files by content, keeping one sample of each:
    #!/bin/bash
    #Filename: remove_duplicates.sh
    #Description: Find and remove duplicate files and keep one sample of each file.

    # List files sorted by size so that files of equal size are adjacent, then
    # compare adjacent same-size files by md5 checksum.
    ls -lS --time-style=long-iso | awk 'BEGIN {
      getline; getline;
      name1=$8; size=$5
    }
    {
      name2=$8;
      if (size==$5)
      {
        # Read each checksum into a variable and strip the trailing filename.
        "md5sum "name1 | getline csum1; sub(/ .*/, "", csum1);
        "md5sum "name2 | getline csum2; sub(/ .*/, "", csum2);
        if ( csum1==csum2 )
        {
          print name1; print name2
        }
      };
      size=$5; name1=name2;
    }' | sort -u > duplicate_files

    # Keep one filename per checksum group as the sample to preserve.
    cat duplicate_files | xargs -I {} md5sum {} | sort | uniq -w 32 | awk '{ print $2 }' | sort -u > duplicate_sample

    echo Removing...
    # Remove every duplicate that is not in the sample list.
    comm duplicate_files duplicate_sample -2 -3 | tee /dev/stderr | xargs rm
    echo Removed duplicate files successfully.

==== Working with file permissions, ownership, and the sticky bit ====

  * The setuid bit appears in the owner's execute position: -rwS------
  * The setgid bit appears in the group's execute position: ----rwS---
  * The sticky bit (used on directories) appears in the others' execute position: -------rwT
  * Set the sticky bit on a directory:
    $ chmod a+t directory_name

==== Making files immutable ====

  * Make a file immutable:
    # chattr +i file
  * Remove the immutable attribute:
    # chattr -i file

==== Generating blank files in bulk ====

  * Create a blank file:
    $ touch filename
  * Generate files in bulk:
    for name in {1..100}.txt
    do
      touch $name
    done
    # We can use other brace-expansion patterns, such as: test{1..200}.c  testing{a..z}.txt
  * Update only the access time or only the modification time with touch:
    touch -a filename
    touch -m filename
  * Specify the timestamp to use:
    $ touch -d "Fri Jun 25 20:50:14 IST 1999" filename

==== Finding symbolic links and their targets ====

  * Create a symlink:
    $ ln -s target symbolic_link_name
  * Print only symlinks from a listing:
    $ ls -l | grep "^l"
  * Find symlinks under the current directory:
    $ find . -type l -print
  * Read the target of a symlink (here web is a link to /var/www):
    $ readlink web
    /var/www
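  * The find and readlink commands above can be combined into a small script that lists every symlink under a path together with its target. This is a minimal sketch, assuming bash and GNU find; the script name, default path and output format are illustrative, not from the original recipe:
    #!/bin/bash
    # Filename: list_symlinks.sh (hypothetical name)
    # List every symbolic link under the given base path and print "link -> target".
    # Usage: ./list_symlinks.sh [basepath]

    base=${1:-.}   # default to the current directory (assumption for this sketch)

    # -print0 / read -d '' keeps the loop safe for names containing spaces.
    find "$base" -type l -print0 |
    while IFS= read -r -d '' link; do
        target=$(readlink "$link")
        printf '%s -> %s\n' "$link" "$target"
    done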
==== Enumerating file type statistics ====

  * Print the type of a file:
    $ file filename
    # For instance:
    $ file /etc/passwd
    /etc/passwd: ASCII text
    # To exclude the filename from the output:
    $ file -b /etc/passwd
    ASCII text
  * Script for file type statistics:
    #!/bin/bash
    # Filename: filestat.sh

    if [ $# -ne 1 ]; then
      echo "Usage is $0 basepath";
      exit
    fi
    path=$1

    declare -A statarray;

    while read line;
    do
      # Keep only the first part of the file description (up to the first comma).
      ftype=`file -b "$line" | cut -d, -f1`
      let statarray["$ftype"]++;
    done < <(find $path -type f -print)

    echo ============ File types and counts =============
    for ftype in "${!statarray[@]}";
    do
      echo $ftype : ${statarray["$ftype"]}
    done
  * Usage of the previous script:
    $ ./filestat.sh /home/slynux/programs
    ============ File types and counts =============
    Vim swap file : 1
    ELF 32-bit LSB executable : 6
    ASCII text : 2
    ASCII C program text : 10
  * Instead of the process substitution, the loop could also be fed with a here string (available since Bash 3.x):
    done <<< "`find $path -type f -print`"

==== Using loopback files ====

  * Create a 1 GB file:
    $ dd if=/dev/zero of=loopbackfile.img bs=1G count=1
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 37.3155 s, 28.8 MB/s
  * Then format the file:
    $ mkfs.ext4 loopbackfile.img
    # Check the file type:
    $ file loopbackfile.img
    loopbackfile.img: Linux rev 1.0 ext4 filesystem data, UUID=c9d56c42-f8e6-4cbd-aeab-369d5056660a (extents) (large files) (huge files)
  * Mount the file:
    # mkdir /mnt/loopback
    # mount -o loop loopbackfile.img /mnt/loopback
    # Or attach it to a loop device manually:
    # losetup /dev/loop1 loopbackfile.img
    # mount /dev/loop1 /mnt/loopback
    # To unmount:
    # umount mount_point
  * We can also create partitions inside loopback images:
    # losetup /dev/loop1 loopback.img
    # fdisk /dev/loop1
  * A quicker way to mount loopback disk images that contain partitions is kpartx:
    # kpartx -v -a diskimage.img
    add map loop0p1 (252:0): 0 114688 linear /dev/loop0 8192
    add map loop0p2 (252:1): 0 15628288 linear /dev/loop0 122880
    # Then mount a partition mapping:
    # mount /dev/mapper/loop0p1 /mnt/disk1
    # Finally remove the mappings:
    # kpartx -d diskimage.img
    loop deleted : /dev/loop0
  * Mounting ISO files as loopback:
    # mkdir /mnt/iso
    # mount -o loop linux.iso /mnt/iso
  * Flush changes to disk immediately:
    $ sync

==== Creating ISO files and hybrid ISO ====

  * To create an ISO from a CD-ROM we can use:
    # cat /dev/cdrom > image.iso
  * But the preferred method is:
    # dd if=/dev/cdrom of=image.iso
  * Create an ISO from a folder:
    $ mkisofs -V "Label" -o image.iso source_dir/
  * Convert a standard ISO to a hybrid ISO that can boot from USB:
    # isohybrid image.iso
    # Then write the ISO to the USB device (the whole device, not a partition):
    # dd if=image.iso of=/dev/sdb
    # or:
    # cat image.iso > /dev/sdb
  * Burn an ISO from the command line:
    # cdrecord -v dev=/dev/cdrom image.iso
    # Extra options:
    #   -speed SPEED : write speed multiplier (e.g. -speed 8 for 8x, -speed 16 for 16x)
    #   -multi       : write a multisession disc
    # Eject the tray:
    $ eject
    # Close the tray:
    $ eject -t
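  * The ISO and loopback recipes can be combined to check an image before burning it: build the ISO from a directory, inspect it with file, and loop-mount it to look at the contents. A minimal sketch, assuming mkisofs is installed and the mount step runs as root; the label, script name and mount point are only examples:
    #!/bin/bash
    # Filename: make_and_check_iso.sh (hypothetical name)
    # Build an ISO image from a directory and loop-mount it to verify the contents.
    # Usage: ./make_and_check_iso.sh source_dir

    src_dir=${1:?usage: $0 source_dir}
    iso=image.iso
    mnt=/mnt/iso_check     # example mount point (assumption for this sketch)

    # Create the image from the directory.
    mkisofs -V "BACKUP" -o "$iso" "$src_dir"

    # Inspect the result; file should report an ISO 9660 filesystem image.
    file "$iso"

    # Loop-mount the image and list its contents, then clean up.
    mkdir -p "$mnt"
    mount -o loop "$iso" "$mnt"
    ls "$mnt"
    umount "$mnt"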
==== Finding the difference between files, patching ====

  * Say we have the following file contents:
    # File 1: version1.txt
    this is the original text
    line2
    line3
    line4
    happy hacking !

    # File 2: version2.txt
    this is the original text
    line2
    line4
    happy hacking !
    GNU is not UNIX
  * The nonunified diff output will be:
    $ diff version1.txt version2.txt
    3d2
    < line3
    5a5
    > GNU is not UNIX
  * The unified diff output will be:
    $ diff -u version1.txt version2.txt
    --- version1.txt	2010-06-27 10:26:54.384884455 +0530
    +++ version2.txt	2010-06-27 10:27:28.782140889 +0530
    @@ -1,5 +1,5 @@
     this is the original text
     line2
    -line3
     line4
     happy hacking !
    +GNU is not UNIX
  * Generate a patch file by redirecting the output of diff to a file:
    $ diff -u version1.txt version2.txt > version.patch
  * Apply a patch with the patch command:
    $ patch -p1 version1.txt < version.patch
    patching file version1.txt
  * Running the same command again reverts the patch:
    $ patch -p1 version1.txt < version.patch
    patching file version1.txt
    Reversed (or previously applied) patch detected! Assume -R? [n] y
    # Changes are reverted.
    # We could pass the -R flag directly to revert without being prompted.
  * Generate the difference between directories:
    $ diff -Naur directory1 directory2
    # -N : treat absent files as empty
    # -a : treat all files as text files
    # -u : produce unified output
    # -r : recursively traverse directories

==== Using head and tail for printing the last or first 10 lines ====

  * Usage of head:
    # Print the first 10 lines:
    $ head file
    # Read data from stdin:
    $ cat text | head
    # Specify the number of lines to display:
    $ head -n 4 file
    # Print all lines except the last M lines:
    $ head -n -M file
  * Usage of tail:
    # Print the last 10 lines:
    $ tail file
    # Read from stdin:
    $ cat text | tail
    # Print the last 5 lines:
    $ tail -n 5 file
    # Print all lines except the first M lines, i.e. start from line M+1
    # (for example, tail -n +6 skips the first five lines):
    $ tail -n +(M+1) file
    # Monitor a growing file:
    $ tail -f growing_file
    # Terminate tail when a given process dies:
    $ PID=$(pidof Foo)
    $ tail -f file --pid $PID

==== Listing only directories – alternative methods ====

  * Techniques to list only directories:
    $ ls -d */
    $ ls -F | grep "/$"
    $ ls -l | grep "^d"
    $ find . -maxdepth 1 -type d -print

==== Fast command-line navigation using pushd and popd ====

  * We use pushd and popd as follows:
    $ pushd /var/www
    # The stack now contains [/var/www | ~] and the current path is /var/www.
    # Push additional paths:
    $ pushd /usr/src
    # List all the paths in the stack:
    $ dirs
    /usr/src /var/www ~ /usr/share /etc
    # Switch to a given path in the list (indexes start from 0):
    $ pushd +3
    # Remove the last pushed path and change to the previous directory:
    $ popd
    # Remove a specific path from the stack (indexes start from 0):
    $ popd +num
  * When we deal with only two paths, we can instead use:
    $ cd /var/www
    $ cd /usr/src
    $ cd -    # switch to the alternate path

==== Counting the number of lines, words, and characters in a file ====

  * Usage of the wc utility:
    # Count the number of lines:
    $ wc -l file
    # Use stdin:
    $ cat file | wc -l
    # Count the number of words:
    $ wc -w file
    $ cat file | wc -w
    # Count the number of characters:
    $ wc -c file
    $ cat file | wc -c
    # Print the number of lines, words, and characters:
    $ wc file
    1435 15763 112200
    # Print the length of the longest line:
    $ wc file -L
    205

==== Printing the directory tree ====

  * With the tree command:
    $ tree some_folder
    # List only files matching a wildcard pattern:
    $ tree path -P PATTERN
    # List only files that do not match the pattern:
    $ tree path -I PATTERN
    # Print the size along with the files and directories:
    $ tree -h
  * HTML output with tree:
    $ tree PATH -H http://localhost -o out.html
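  * The last two recipes can be combined into a quick "directory report": print the tree with sizes, then the total line/word/character counts over the regular files. A minimal sketch, assuming GNU coreutils and the tree package; the script name and report format are illustrative, and the wc total assumes the file list fits into a single xargs invocation:
    #!/bin/bash
    # Filename: dir_report.sh (hypothetical name)
    # Print the directory tree with sizes, then aggregate wc counts for all regular files.
    # Usage: ./dir_report.sh [path]

    path=${1:-.}

    echo "== Directory tree =="
    tree -h "$path"

    echo "== Total lines, words, characters =="
    # wc prints a final "total" line when given several files; tail keeps just that line.
    find "$path" -type f -print0 | xargs -0 wc | tail -n 1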