[Ham-Computers] RE: USB memory stick file space

Hsu, Aaron (NBC Universal) aaron.hsu at nbcuni.com
Fri Jul 20 14:55:33 EDT 2007


Phil (et al),

It's not the USB stick, it's the file system format.  Most file systems are based on the "classic" 512-byte sector size.  Back in the old days, tracking the sheer number of 512-byte sectors would require quite a bit of overhead, so "clustering" was developed.  With clustering, the OS only needs to keep track of "clusters" of sectors rather than individual sectors.

The cluster size depends on the OS and how large the drive is, but is usually a multiple of sectors.  From this point forward, I'll use MS-DOS's File Allocation Table (FAT) structure as it's relatively easy to follow (specifically FAT16).

FAT16 (using a 16-bit table) can track just under 64K "allocation units" (aka clusters).  So, if each "cluster" was the size of one sector, then the cluster size would be 512-bytes and the max partition size would be 64K x 512 = 32MB.  MS-DOS allowed cluster sizes up to 32KB (64 sectors per cluster), so the max partition size is 64K x 32K = 2GB.
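
The arithmetic above can be checked with a few lines of Python.  This is only a rough sketch: the function name and constant are made up for illustration, and it uses the round figure of 2^16 clusters even though real FAT16 reserves a few table entries.

```python
# Sketch: maximum FAT16 volume size = number of clusters x cluster size.
# Assumes the round figure of 2**16 (64K) clusters; real FAT16 reserves
# a few table entries, so the true count is slightly lower.
SECTOR_BYTES = 512

def max_volume_bytes(table_bits, sectors_per_cluster):
    """Upper bound on volume size for a given table width and cluster size."""
    clusters = 2 ** table_bits
    return clusters * sectors_per_cluster * SECTOR_BYTES

one_sector = max_volume_bytes(16, 1)     # 512-byte clusters
max_cluster = max_volume_bytes(16, 64)   # 32KB clusters (64 sectors)
print(one_sector // 2**20, "MB")   # 32 MB
print(max_cluster // 2**30, "GB")  # 2 GB
```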

Here's where the problem comes in...realize that the smallest unit that FAT16 tracks is a cluster.  If the cluster size is 32KB and a file is only 1K in size, the file will still take up 32KB of drive space - the rest is called "slack".  With this same example, if you have 100 1-byte files, there's only 100 bytes of data, but it takes up 3.2MB of drive space, most of which is slack!
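
To make the slack calculation concrete, here is a minimal Python sketch.  The helper name size_on_disk is hypothetical, and it only counts data clusters (directory entries and the FAT itself are ignored).

```python
def size_on_disk(file_bytes, cluster_bytes):
    """Bytes actually allocated: file size rounded up to whole clusters."""
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes

CLUSTER = 32 * 1024           # 32KB clusters, as in the example above
files = [1] * 100             # one hundred 1-byte files

data = sum(files)
allocated = sum(size_on_disk(n, CLUSTER) for n in files)
slack = allocated - data

print(data, "bytes of data")                 # 100
print(allocated // 1024, "KB allocated")     # 3200
print(slack, "bytes of slack")               # 3276700
```

Each file occupies one full cluster no matter how small it is, which is why the total comes out to 100 x 32KB.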

For reference, FAT32 uses a 28-bit allocation table for a max of 2^28 (256M) clusters.  This leads to an official drive limit of 8TB using "standard" 32KB clusters or 16TB using non-standard 64KB clusters.  And, although NTFS supports 2^64 clusters, WinXP uses a max of 2^32 clusters = 4G x 64K = 256TB.
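
The same clusters-times-cluster-size arithmetic gives the FAT32 and NTFS figures.  A quick Python check, taking 64KB clusters for the WinXP NTFS case (limit_tb is just an illustrative name):

```python
KB = 2**10
TB = 2**40

def limit_tb(cluster_count_bits, cluster_kb):
    """Volume limit in TB for 2**bits clusters of cluster_kb kilobytes each."""
    return (2 ** cluster_count_bits) * cluster_kb * KB // TB

print(limit_tb(28, 32))  # FAT32, 32KB clusters: 8
print(limit_tb(28, 64))  # FAT32, 64KB clusters: 16
print(limit_tb(32, 64))  # NTFS on WinXP, 64KB clusters: 256
```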

And, to throw a wrench into the works...512-byte sectors, by today's standards, are highly inefficient and not directly compatible with certain memory addressing schemes.  For example, in order for "fast" access, RAM needs to be addressed in blocks.  With flash drives, these "blocks" are often larger than a cluster so when the OS requests a cluster, the flash drive is actually reading a larger "block".  Intelligence in the flash drive makes this "sub-block allocation" transparent to the OS, but what if you need to access a bunch of single clusters from random points on the drive?  This is where things slow down.  Ever notice that writing a bunch of small files is slower than writing a few large ones to a flash drive?  BTW, sub-block allocation isn't a new concept...it's been used in server OS's for years (Netware, for example).

So, Phil, the amount of overhead you see may be correct depending on the file format and the size of the clusters.  NTFS typically uses 4KB clusters for most drives.  The USB key might be formatted with FAT with a larger cluster size, thus more slack space.  You can determine the cluster size of any drive by running CHKDSK.  The summary report will tell you the size of the "allocation units" (cluster size) and the total and remaining number of "allocation units".  The cluster size is automatically determined by the drive size, but you can "force" format a partition to any cluster size (which may limit the size of the partition).  Flash drives are fastest with larger cluster sizes (up to the "block" transfer size) - unfortunately, it's not possible to know what the block size is and manufacturers generally don't release this info.  Also, larger cluster sizes could mean more wasted space if storing a lot of small files.
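
As a rough sanity check against the figures in the original message (1209 text files, 37.7MB on the stick, 4.9MB on the D: drive), the average space allocated per file hints at the cluster size.  This sketch assumes most of the files fit in a single cluster, which seems likely since they average about 1.5KB each; it is an estimate, not a substitute for the CHKDSK report.

```python
def avg_allocation_kb(size_on_disk_mb, file_count):
    """Average allocated space per file in KB (taking 1MB = 1024KB)."""
    return size_on_disk_mb * 1024 / file_count

FILES = 1209
usb_kb = avg_allocation_kb(37.7, FILES)   # USB stick
hdd_kb = avg_allocation_kb(4.9, FILES)    # D: drive

print("USB stick: about %.1f KB per file" % usb_kb)
print("D: drive:  about %.1f KB per file" % hdd_kb)
```

The stick works out to just under 32KB per file and the hard drive to a little over 4KB, which is what you would expect from 32KB FAT16 clusters versus 4KB NTFS clusters.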

Anyways, I've digressed enough.  If you want more detailed info about FAT or NTFS, do a Google search or read up about them in Wikipedia:

http://en.wikipedia.org/wiki/File_Allocation_Table
http://en.wikipedia.org/wiki/NTFS

73,

  - Aaron, NN6O


-----Original Message-----
Sent: Friday, July 20, 2007 9:18 AM
Subject: [Ham-Computers] USB memory stick file space

Hi All,

I have a question about how memory sticks use/allocate memory.  Yesterday I 
bought a Memorex USB 2GB stick and loaded a couple directories that I wanted 
to load into another computer.

One directory (spoken Bible with background music) contains 1255 mp3 files, 
total size of 840MB, SIZE ON DISK 861MB.  OK, that's reasonable. Another 
folder, containing sheet music in text format consists of 1209 text files, 
total size of 1.75MB, SIZE ON DISK 37.7MB.  That seems like an excessive 
overhead!

I 'think' I know what's happening here, something to do with "sector size", 
it looks like a hard disk in "My Computer".  But the same 1209 files on my 
"D" drive only occupies 4.9 MB, roughly 1/7th of what it does on the memory 
stick.  Is it normal for USB memory sticks to waste storage space like this?

73 de Phil,  KO6BB
http://www.ko6bb.com
DX begins at the noise floor!
Ten Meter CW Beacon KO6BB/B, 20 Watts 24/7 on 28.248MCs.
Merced, Central California,    37.3N  120.48W  CM97sh

