[Ham-Computers] RE: USB memory stick file space
Philip, KO6BB
ko6bb at ko6bb.com
Sat Jul 21 01:28:04 EDT 2007
Thank you Aaron,
A nice clear explanation, and yes, my computer shows it as a FAT formatted
disk drive.
73 de Phil, KO6BB
----- Original Message -----
Phil (et al),
It's not the USB stick, it's the file system format. Most file systems are
based on the "classic" 512-byte sector size. Back in the old days, tracking
the sheer number of 512-byte sectors would require quite a bit of overhead,
so "clustering" was developed. With clustering, the OS only needs to keep
track of "clusters" of sectors rather than individual sectors.
The cluster size depends on the OS and how large the drive is, but is
always a whole number of sectors. From this point forward, I'll use MS-DOS's
File Allocation Table (FAT) structure as it's relatively easy to follow
(specifically FAT16).
FAT16 (using a 16-bit table) can track just under 64K "allocation units"
(aka clusters). So, if each "cluster" was the size of one sector, then the
cluster size would be 512 bytes and the max partition size would be 64K x
512 = 32MB. MS-DOS allowed cluster sizes up to 32KB (64 sectors per
cluster), so the max partition size is 64K x 32K = 2GB.
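As a quick check of the arithmetic above, here is a minimal Python sketch.
The function name and constant are my own, and it treats the table as a
full 2^16 entries (the real figure is "just under" that because a few
entries are reserved), so read the results as upper bounds.

```python
SECTOR_BYTES = 512  # the "classic" sector size


def max_partition_bytes(table_bits, sectors_per_cluster):
    """Upper bound on partition size: table entries x cluster size."""
    clusters = 2 ** table_bits  # slightly optimistic, some entries are reserved
    return clusters * sectors_per_cluster * SECTOR_BYTES


# FAT16 with 1 sector per cluster, then with 64 sectors (32KB) per cluster
for spc in (1, 64):
    size = max_partition_bytes(16, spc)
    print(f"{spc:2d} sectors/cluster -> {size // 2**20} MB")
```

This prints 32 MB for one sector per cluster and 2048 MB (2GB) for 64
sectors per cluster.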
Here's where the problem comes in...realize that the smallest unit that
FAT16 tracks is a cluster. If the cluster size is 32KB and a file is only
1K in size, the file will still take up 32KB of drive space - the rest is
called "slack". With this same example, if you have 100 1-byte files,
there's only 100 bytes of data, but it's taking up 3.2MB of drive space,
most of which is slack!
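The slack figures can be reproduced with a few lines of Python. This is
only a sketch of the rounding rule (a file occupies a whole number of
clusters); the function name is made up for illustration, and it ignores
directory entries and the FAT itself.

```python
def allocated_bytes(file_bytes, cluster_bytes):
    """Space a file occupies on disk: its size rounded up to whole clusters."""
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


CLUSTER = 32 * 1024  # 32KB clusters, as in the example

# One 1K file
one_file = allocated_bytes(1024, CLUSTER)
print(one_file, "bytes on disk,", one_file - 1024, "bytes of slack")

# One hundred 1-byte files
hundred_tiny = sum(allocated_bytes(1, CLUSTER) for _ in range(100))
print(hundred_tiny, "bytes on disk,", hundred_tiny - 100, "bytes of slack")
```

The second line of output shows 3,276,800 bytes on disk (the "3.2MB"
above) holding just 100 bytes of data.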
For reference, FAT32 uses a 28-bit allocation table for a max of 2^28 (256M)
clusters. This leads to an official drive limit of 8TB using "standard"
32KB clusters or 16TB using non-standard 64KB clusters. And, although NTFS
supports 2^64 clusters, WinXP uses a max of 2^32 clusters = 4G x 64K = 256TB.
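The same multiplication covers the FAT32 and NTFS figures. A small Python
sketch, with names of my own choosing and assuming the limit is simply
cluster count times cluster size (other limits, such as sector counts in
the boot record, are ignored here):

```python
KB, TB = 2**10, 2**40


def volume_limit_tb(cluster_count_bits, cluster_kb):
    """Volume limit in TB for 2^bits clusters of cluster_kb kilobytes each."""
    return (2 ** cluster_count_bits) * cluster_kb * KB // TB


print("FAT32, 32KB clusters:", volume_limit_tb(28, 32), "TB")
print("FAT32, 64KB clusters:", volume_limit_tb(28, 64), "TB")
print("NTFS (WinXP), 64KB clusters:", volume_limit_tb(32, 64), "TB")
```

That gives 8, 16 and 256 TB respectively.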
And, to throw a wrench into the works...512-byte sectors, by today's
standards, are highly inefficient and not directly compatible with certain
memory addressing schemes. For example, in order for "fast" access, RAM
needs to be addressed in blocks. With flash drives, these "blocks" are
often larger than a cluster so when the OS requests a cluster, the flash
drive is actually reading a larger "block". Intelligence in the flash drive
makes this "sub-block allocation" transparent to the OS, but what if you
need to access a bunch of single clusters from random points on the drive?
This is where things slow down. Ever notice that writing a bunch of small
files is slower than writing a few large ones to a flash drive? BTW,
sub-block allocation isn't a new concept...it's been used in server OSes for
years (Netware, for example).
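To picture why scattered clusters hurt, here is a rough Python sketch that
counts how many flash "blocks" a set of cluster requests lands in. The
128KB block size is purely hypothetical (the real value is rarely
published), and the model assumes the block size is a whole multiple of
the cluster size and that clusters are laid out back to back.

```python
def blocks_touched(cluster_numbers, cluster_bytes, block_bytes):
    """Count the distinct flash blocks that hold the requested clusters."""
    clusters_per_block = block_bytes // cluster_bytes
    return len({c // clusters_per_block for c in cluster_numbers})


CLUSTER = 4 * 1024   # 4KB clusters
BLOCK = 128 * 1024   # hypothetical flash block size

sequential = range(32)            # 32 clusters side by side
scattered = range(0, 32 * 32, 32) # 32 clusters, each in a different block

print(blocks_touched(sequential, CLUSTER, BLOCK))
print(blocks_touched(scattered, CLUSTER, BLOCK))
```

In this model the same 128KB of data costs one block access when it is
contiguous and 32 block accesses when it is spread around the drive.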
So, Paul, the amount of overhead you see may be correct depending on the
file format and the size of the clusters. NTFS typically uses 4KB clusters
for most drives. The USB key might be formatted with FAT with a larger
cluster size, thus more slack space. You can determine the cluster size of
any drive by running CHKDSK. The summary report will tell you the size of
the "allocation units" (cluster size) and the total and remaining number of
"allocation units". The cluster size is automatically determined by the
drive size, but you can "force" format a partition to any cluster size
(which may limit the size of the partition). Flash drives are fastest with
larger cluster sizes (up to the "block" transfer size) - unfortunately, it's
not usually possible to find out what the block size is, and manufacturers
generally don't release this info. Larger cluster sizes also mean more
wasted space if storing a lot of small files.
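If you want to see the trade-off for your own files before reformatting,
something like the following Python sketch estimates how much space a
folder would occupy at a few cluster sizes. The function names are mine,
and it is only an estimate: it counts file data rounded up to whole
clusters and ignores directory entries and file system overhead.

```python
import os


def on_disk_bytes(root, cluster_bytes):
    """Estimate space used by the files under root for a given cluster size."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            size = os.path.getsize(os.path.join(dirpath, name))
            # round each file up to a whole number of clusters
            total += -(-size // cluster_bytes) * cluster_bytes
    return total


def compare_cluster_sizes(root, sizes_kb=(4, 16, 32)):
    """Map each cluster size (in KB) to the estimated bytes used under root."""
    return {kb: on_disk_bytes(root, kb * 1024) for kb in sizes_kb}
```

Calling compare_cluster_sizes() on the folder you plan to copy to the
stick returns one total per cluster size, so you can see how quickly the
slack grows with lots of small files.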
Anyways, I've digressed enough. If you want more detailed info about FAT or
NTFS, do a Google search or read up about them in Wikipedia:
http://en.wikipedia.org/wiki/File_Allocation_Table
http://en.wikipedia.org/wiki/NTFS
73,
- Aaron, NN6O