Posts for the month of September 2007

FAT32 perl utilities

As noted before, my work laptop dual boots into WinXP and Fedora Core 7, and the two share a large FAT32 partition. Yesterday I finally got a 500GB external drive at work to back up my stuff. It's also FAT32, which caps a single file at 4GB. So I whipped up a quick script, splitter.pl, that splits a large data stream and dumps it in 2GB slices. It reads from STDIN, so for a regular file just use redirection or cat.

The second script, filler.pl, is a modified version that instead fills up the hard drive with zeroes, which makes a backup image of it more compressible. On a Linux box, I normally just do dd if=/dev/zero of=delme bs=102400 || rm -rf delme but that would exceed the file size limit of FAT32. The first iteration of the filler was simply cat /dev/zero | perl splitter.pl fill but then I realized that there was a lot of actual reading going on, instead of just dumping zeroes, so I changed it to write from a buffer of zeroes held in memory.

In filler, I tried to pre-allocate the 2GB slice file and then fill it with zeroes, to avoid even more fragmentation and FAT table manipulation. However, when I re-opened the file and seeked back to offset zero, its size dropped back down. I didn't have time to research it further; if anyone has a good solution, please let me know.
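
One variation that might be worth trying is to skip the close and re-open entirely and rewind on the same handle. This is a minimal sketch, not a verified fix: prealloc_and_zero is a hypothetical helper, the file name and sizes are placeholders, and it has not been tried on FAT32 or under Cygwin. Since FAT32 has no sparse files, writing one byte at the last offset should still force the clusters to be allocated up front.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :seek); # O_* flags and SEEK_* constants

# Hypothetical helper: reserve $size bytes for $path, then overwrite the
# whole file with zeroes through the SAME handle (no close, no re-open).
# Returns the final size of the file on disk.
sub prealloc_and_zero {
  my ($path, $size, $blksize) = @_;
  sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)
    or die "Could not create $path: $!\n";
  binmode($fh);

  # Extend the file by writing a single byte at the last offset.
  sysseek($fh, $size - 1, SEEK_SET) or die "Seek failed: $!\n";
  defined(syswrite($fh, "\0", 1)) or die "Write failed: $!\n";

  # Rewind; sysseek returns "0 but true" for offset 0, so "or die" is safe.
  sysseek($fh, 0, SEEK_SET) or die "Rewind failed: $!\n";

  my $buffer = "\0" x $blksize;
  my $left   = $size;
  while ($left > 0) {
    my $want    = $left < $blksize ? $left : $blksize;
    my $written = syswrite($fh, $buffer, $want);
    die "Write failed: $!\n" unless $written; # undef (error) or 0 bytes
    $left -= $written;                        # handles partial writes
  }
  close($fh) or die "Close failed: $!\n";
  return -s $path;
}
```

Calling prealloc_and_zero("FILL_00", 2*1024*1024*1024, 16384) would mirror one slice of filler.pl. Whether it is any faster than plain sequential writes is an open question, given the timings recorded in the script.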

I've also run filler under Cygwin to fill another partition.

splitter.pl:

#!/usr/bin/perl -w
# This program splits incoming data into 2GB chunks (for dumping a file
# on the fly to FAT32 partitions for example).
# Data is STDIN, and first argument is prefix of output (optional).
#
# To recombine the output, simply:
# cat FILE_* > /path/to/better/fs/OriginalFile

BEGIN {
  push(@INC, "/mnt/hd/usr/lib/perl5/5.8.8/");
  push(@INC, "/mnt/hd/usr/lib/perl5/5.8.8/i386-linux-thread-multi/");
}
use strict;
use Fcntl; # import the O_* flags used by sysopen

binmode(STDIN);

use constant FULL_SIZE => (2*1024*1024*1024); # 2 GB

my $chunk_byte_count = FULL_SIZE+1; # Force an open on first output byte
my $chunk_file_count = 0; # Start at file 0
my ($read_count, $buffer);
my $blksize = 1024; # This might get overwritten later
my $prefix = $ARGV[0] || "FILE";

# The framework of this is from camel page 231

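while (1) {
  $read_count = sysread STDIN, $buffer, $blksize;
  if (!defined $read_count) { # undef means error; checked before the EOF test
    next if $!{EINTR}; # Interrupted by a signal, just retry
    die "System read error: $!\n";
  }
  last unless $read_count; # 0 bytes means end of input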
  # Decide if we need another file
  if ($chunk_byte_count >= FULL_SIZE) { # Need a new file
    close OUTFILE if $chunk_file_count;
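    my $chunk_name = sprintf "${prefix}_%02d", $chunk_file_count++;
    # O_BINARY is not defined on every platform (Fcntl can croak on it),
    # so fall back to 0 where it is missing.
    sysopen OUTFILE, $chunk_name,
      O_WRONLY | O_TRUNC | O_CREAT | (eval { O_BINARY() } || 0)
      or die "Could not open $chunk_name for write: $!\n";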
    $blksize = (stat OUTFILE)[11] || 16384; # Get preferred block size
    # print STDERR "(New output file from $0 (blksize $blksize))\n";
    $chunk_byte_count = 0;
  } # New file
  my $wr_ptr = 0; # Pointer within buffer
  while ($read_count) { # This handles partial writes
    my $written = syswrite OUTFILE, $buffer, $read_count, $wr_ptr;
    die "System write error: $!\n" unless defined $written;
    $read_count -= $written;
    $wr_ptr += $written;
  } # Writing a chunk
  $chunk_byte_count += $wr_ptr;
  #print "(\$wr_ptr = $wr_ptr), (\$chunk_byte_count = $chunk_byte_count), (\$chunk_file_count = $chunk_file_count)\n";
} # Main read loop

# Report on it
print "Wrote out $chunk_file_count chunk files.\n";
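
One caveat with recombining: the chunk names use a two-digit "%02d" suffix, so once there are more than 100 chunks (about 200GB at 2GB each), the shell glob in cat FILE_* sorts FILE_100 ahead of FILE_11. Below is a rough sketch of a joiner that orders chunks by their numeric suffix instead; sorted_chunks and join_chunks are hypothetical helper names, not part of the scripts in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob); # glob that does not split on whitespace

# Hypothetical helper: list the chunk files for $prefix, ordered by
# numeric suffix so that PREFIX_100 comes after PREFIX_99.
sub sorted_chunks {
  my ($prefix) = @_;
  my @found;
  for my $name (bsd_glob("${prefix}_*")) {
    push @found, [$1, $name] if $name =~ /_(\d+)$/;
  }
  return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @found;
}

# Hypothetical helper: concatenate all chunks into $out.
# Returns the total number of bytes copied.
sub join_chunks {
  my ($prefix, $out) = @_;
  open(my $dst, '>', $out) or die "Could not open $out: $!\n";
  binmode($dst);
  my $total = 0;
  for my $chunk (sorted_chunks($prefix)) {
    open(my $src, '<', $chunk) or die "Could not open $chunk: $!\n";
    binmode($src);
    while (1) {
      my $n = read($src, my $buffer, 65536);
      die "Read from $chunk failed: $!\n" unless defined $n;
      last unless $n; # end of this chunk
      print {$dst} $buffer or die "Write to $out failed: $!\n";
      $total += $n;
    }
    close($src);
  }
  close($dst) or die "Close of $out failed: $!\n";
  return $total;
}
```

Something like join_chunks("FILE", "/path/to/better/fs/OriginalFile") would then stand in for the cat command, and it also works where cat is not available.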

filler.pl:

#!/usr/bin/perl -w
# This program fills a hard drive with 2GB files all NULL.
# (This makes compressed images of the hard drive smaller.)
# First argument is prefix of output (optional).
#

BEGIN {
  push(@INC, "/mnt/hd/usr/lib/perl5/5.8.8/");
  push(@INC, "/mnt/hd/usr/lib/perl5/5.8.8/i386-linux-thread-multi/");
}

use strict;
use Fcntl qw(:DEFAULT :seek); # import the O_* open flags and SEEK_* constants

use constant FULL_SIZE => 2*(1024*1024*1024); # 2 GB

my $chunk_byte_count = FULL_SIZE+1; # Force an open on first output byte
my $chunk_file_count = 0; # Start at file 0
my ($read_count, $buffer);
my $blksize = 16384; # This might get overwritten later
my $prefix = $ARGV[0] || "FILL";
my $last_show = -1;
$| = 1; # always flush

# The framework of this is from camel page 231
$buffer = "\0" x $blksize;

# Without pre-alloc:
#real    1m20.860s
#user    0m10.155s
#sys     0m32.531s

# With pre-alloc:
#real    8m56.391s
#user    0m16.359s
#sys     1m11.921s

# Which makes NO sense, but hey, that's Cygwin... maybe because FAT32?

# Note: It was O_RDWR but switching to O_WRONLY didn't seem to help.
# However, maybe if Norton is disabled?

while (1) {
  # Decide if we need another file
  if ($chunk_byte_count >= FULL_SIZE) { # Need a new file
    close OUTFILE if $chunk_file_count;
    print STDERR "\rNew fill output file ($prefix)... \n";
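    my $chunk_name = sprintf "${prefix}_%02d", $chunk_file_count++;
    # O_BINARY is not defined on every platform (Fcntl can croak on it),
    # so fall back to 0 where it is missing.
    sysopen OUTFILE, $chunk_name,
      O_WRONLY | O_TRUNC | O_CREAT | O_EXCL | (eval { O_BINARY() } || 0)
      or die "Could not open $chunk_name for write: $!\n";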
    # Pre-allocate the file
#    print STDERR "New fill output file ($prefix) pre-allocating, expect freeze... \n";
#    sysseek OUTFILE, FULL_SIZE-1, SEEK_SET;
#    syswrite OUTFILE, $buffer, 1, 0;
#    close OUTFILE;
#    print STDERR "\tdone, now blanking out the file.\n";
#    sysopen OUTFILE, (sprintf "${prefix}_%02d", $chunk_file_count - 1), # same name; the counter was already bumped above
#      O_WRONLY | O_BINARY or die "Could not re-open output file for write!\n";
#    sysseek OUTFILE, 0, SEEK_SET; # This might just be ignored?
    # Done pre-allocating
    my $blk = $blksize;
    $blksize = (stat OUTFILE)[11] || 16384; # Get preferred block size
    if ($blksize != $blk) {
      # new block size, should only happen once
      $buffer = "\0"x$blksize;
    }
    $chunk_byte_count = 0;
    $last_show = -1;
  } # New file
  $read_count = $blksize;
  while ($read_count) { # This handles partial writes
    my $written = syswrite OUTFILE, $buffer, $read_count, 0;
    die "System write error: $!\n" unless defined $written;
    $read_count -= $written;
    $chunk_byte_count += $written;
  } # Writing a chunk
  # End of a chunk
  my $new_show = int ($chunk_byte_count/(1024*1024));
  if ($new_show > $last_show) {
    print STDERR "\r${new_show}MB";
    $last_show = $new_show;
  }
#  print "(\$chunk_byte_count = $chunk_byte_count), (\$chunk_file_count = $chunk_file_count)\n";
} # Main while loop

# Report on it [think it always crashes before this ;)]
print "\rWrote out $chunk_file_count chunk files.\n";
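
The report line really is never reached as written: the main loop is while (1) with no exit, so the script only ends when syswrite fails and die fires. A minimal sketch of treating "disk full" as the normal way out, assuming the failure shows up as ENOSPC (not checked on FAT32 or Cygwin); fill_one_file is a hypothetical helper, not a drop-in patch for filler.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl; # O_* flags for sysopen

# Hypothetical helper: write up to $limit zero bytes into a new file.
# Returns (bytes written, 1 if the disk filled up first, else 0).
sub fill_one_file {
  my ($path, $limit, $blksize) = @_;
  sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)
    or die "Could not open $path for write: $!\n";
  binmode($fh);
  my $buffer = "\0" x $blksize;
  my ($done, $full) = (0, 0);
  while ($done < $limit) {
    my $want = $limit - $done;
    $want = $blksize if $want > $blksize;
    my $written = syswrite($fh, $buffer, $want);
    if (!defined $written) {
      next if $!{EINTR};                     # interrupted, retry
      if ($!{ENOSPC}) { $full = 1; last; }   # disk full: stop cleanly
      die "System write error: $!\n";
    }
    die "Wrote 0 bytes to $path\n" unless $written;
    $done += $written;                       # handles partial writes
  }
  close($fh);
  return ($done, $full);
}
```

A caller could loop over FILL_00, FILL_01, and so on, stop as soon as the second return value is true, and then print the chunk count.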