[Ham-Computers] RE: USB memory stick file space

Philip, KO6BB ko6bb at ko6bb.com
Sat Jul 21 01:28:04 EDT 2007


Thank you Aaron,

A nice clear explanation, and yes, my computer shows it as a FAT formatted 
disk drive.

73 de Phil, KO6BB

----- Original Message ----- 

Phil (et al),

It's not the USB stick, it's the file system format.  Most file systems are 
based on the "classic" 512-byte sector size.  Back in the old days, tracking 
the sheer number of 512-byte sectors would require quite a bit of overhead, 
so "clustering" was developed.  With clustering, the OS only needs to keep 
track of "clusters" of sectors rather than individual sectors.

The cluster size depends on the OS and how large the drive is, but it is 
always a whole number of sectors (a power of two in FAT).  From this point forward, I'll use MS-DOS's 
File Allocation Table (FAT) structure as it's relatively easy to follow 
(specifically FAT16).

FAT16 (using a 16-bit table) can track just under 64K "allocation units" 
(aka clusters).  So, if each "cluster" were the size of one sector, then the 
cluster size would be 512 bytes and the max partition size would be 64K x 
512 = 32MB.  MS-DOS allowed cluster sizes up to 32KB (64 sectors per 
cluster), so the max partition size is 64K x 32K = 2GB.

Here's where the problem comes in...realize that the smallest unit that 
FAT16 tracks is a cluster.  If the cluster size is 32KB and a file is only 
1K in size, the file will still take up 32KB of drive space - the rest is 
called "slack".  With this same example, if you have 100 1-byte files, 
there's only 100 bytes of data, but it's taking up 3.2MB of drive space, most 
of which is slack!
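
The slack example can be checked with a few lines of Python.  This is only a 
sketch: the function name is invented for illustration, and it assumes every 
non-empty file is rounded up to a whole number of clusters.

```python
def allocated_bytes(file_size, cluster_size):
    """Disk space a file occupies when rounded up to whole clusters."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size

CLUSTER = 32 * 1024            # 32KB clusters, as in the example
files = [1] * 100              # one hundred 1-byte files

data = sum(files)
used = sum(allocated_bytes(size, CLUSTER) for size in files)
slack = used - data

print(f"data: {data} bytes, on disk: {used} bytes, slack: {slack} bytes")
```

That works out to 3,276,800 bytes on disk for 100 bytes of data.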

For reference, FAT32 uses a 28-bit allocation table for a max of 2^28 (256M) 
clusters.  This leads to an official drive limit of 8TB using "standard" 
32KB clusters or 16TB using non-standard 64KB clusters.  And, although NTFS 
supports 2^64 clusters, WinXP uses a max of 2^32 clusters = 4G x 64KB = 256TB.
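
The same multiplication covers these limits too.  A quick sketch (the labels 
and the helper name are just for illustration):

```python
KB, TB = 2 ** 10, 2 ** 40

# (label, bits used for cluster numbers, cluster size in bytes)
cases = [
    ("FAT32, 32KB clusters", 28, 32 * KB),
    ("FAT32, 64KB clusters", 28, 64 * KB),
    ("NTFS on WinXP, 64KB clusters", 32, 64 * KB),
]

def max_volume_tb(bits, cluster_size):
    """Cluster count times cluster size, expressed in TB."""
    return (2 ** bits) * cluster_size // TB

for label, bits, cluster in cases:
    print(f"{label}: {max_volume_tb(bits, cluster)} TB")
```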

And, to throw a wrench into the works...512-byte sectors, by today's 
standards, are highly inefficient and not directly compatible with certain 
memory addressing schemes.  For example, in order for "fast" access, RAM 
needs to be addressed in blocks.  With flash drives, these "blocks" are 
often larger than a cluster so when the OS requests a cluster, the flash 
drive is actually reading a larger "block".  Intelligence in the flash drive 
makes this "sub-block allocation" transparent to the OS, but what if you 
need to access a bunch of single clusters from random points on the drive? 
This is where things slow down.  Ever notice that writing a bunch of small 
files is slower than writing a few large ones to a flash drive?  BTW, 
sub-block allocation isn't a new concept...it's been used in server OSes for 
years (Netware, for example).

So, Paul, the amount of overhead you see may be correct depending on the 
file format and the size of the clusters.  NTFS typically uses 4KB clusters 
for most drives.  The USB key might be formatted with FAT with a larger 
cluster size, thus more slack space.  You can determine the cluster size of 
any drive by running CHKDSK.  The summary report will tell you the size of 
the "allocation units" (cluster size) and the total and remaining number of 
"allocation units".  The cluster size is automatically determined by the 
drive size, but you can "force" format a partition to any cluster size 
(which may limit the size of the partition).  Flash drives are fastest with 
larger cluster sizes (up to the "block" transfer size) - unfortunately, it's 
usually not possible to know what the block size is, as manufacturers generally 
don't release this info.  Larger cluster sizes also mean more 
wasted space when storing a lot of small files.
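
If you want to see how much slack a given cluster size would cost for your own 
files, something like the following Python sketch can estimate it.  It is an 
approximation only: the function name is hypothetical, the cluster size is 
whatever you pass in (for example, the "allocation unit" size CHKDSK reports), 
and it ignores directory entries and other file system overhead.

```python
import os

def slack_report(root, cluster_size):
    """Return (data_bytes, allocated_bytes) for all files under root."""
    data = used = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue                      # skip unreadable entries
            data += size
            # Round each file up to a whole number of clusters.
            used += -(-size // cluster_size) * cluster_size
    return data, used
```

Calling it on the same folder with 4 * 1024 and then 32 * 1024 shows how much 
extra space the larger clusters would take for that particular mix of files.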

Anyways, I've digressed enough.  If you want more detailed info about FAT or 
NTFS, do a Google search or read up about them in Wikipedia:

http://en.wikipedia.org/wiki/File_Allocation_Table
http://en.wikipedia.org/wiki/NTFS

73,

  - Aaron, NN6O



