[PVRCNC] SS prefills file accuracy in SSB 2008

Sat Nov 22 13:40:04 EST 2008

Just in case anyone gives a rat's fuzzy hind-quarters....

Here are a few observations and notes about the prefill file some of you 
may have used for SS SSB.

After the contest, I solicited logs from AA4NC, N1LN, and KA1ARB.  My 
intention is to use their logs to build a more robust sample set for the 
2009 season.  They have all agreed and sent me their data.  (Thanks!)

I was also curious after the SSB Sweeps to see how accurate the prefills 
data may have been, so I thought I'd pass along what I have found.

Background:
The most comprehensive of the prefills files for 2008 SSB was built from 
3 sources: actual log data from N4AF, ARRL CW SS log data from the ARRL 
scores database, and ARRL SSB SS log data from the ARRL scores database.

The three data sources were merged in the order listed above, with each 
source's records applied in chronological order.
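For anyone who wants to picture the merge, here is a rough sketch in 
Python.  It is NOT the actual script used to build the file.  It assumes 
that a later record for a callsign simply overwrites an earlier one, and 
the record layout (year, call, check, section), the function name, and 
the callsigns are all made up for illustration.

```python
def build_prefills(sources):
    """Merge sources in the order given; later records overwrite earlier ones.

    Each source is a list of (year, call, check, section) tuples.
    This layout is an assumption, not the real file format.
    """
    prefills = {}
    for records in sources:
        # Within one source, apply the oldest records first.
        for year, call, check, section in sorted(records, key=lambda r: r[0]):
            prefills[call.upper()] = {"check": check, "section": section}
    return prefills


# Made-up sample records, not taken from any real log.
n4af_log = [(2007, "K1ABC", "76", "NC")]
arrl_cw = [(2006, "K1ABC", "75", "NC"), (2007, "W9XYZ", "62", "IL")]
arrl_ssb = [(2007, "K1ABC", "76", "VA")]

merged = build_prefills([n4af_log, arrl_cw, arrl_ssb])
```

With this ordering, the last source applied wins, so an error in the 
most recent entry for a callsign ends up in the prefill file.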

The intention for 2009 is to include the additional log data from AA4NC, 
N1LN, and KA1ARB.  This should add value by including data for casual 
contesters who may not ever actually submit logs to ARRL.

Observations:
After the contest AA4NC brought up a few questions about discrepancies 
he had found.  I expected there would be some, so once I got the logs 
from the gang I was able to slice and dice a bit.

I looked at only the discrepancies/errors for the "check" and the 
"section".  I'm sure the discrepancy in the precedence field is much 
higher - but folks change class a lot, so I don't care.

Check number errors:
Across all three sample logs, the error rate in the prefills seems to be 
about four percent.
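A sketch of how a rate like this can be computed, for the curious.  The 
assumption here is that the rate is counted only over logged callsigns 
that also appear in the prefill file; the dictionary layout, function 
name, and callsigns are made up for illustration.

```python
def discrepancy_rate(prefills, logged, field):
    """Fraction of logged calls present in the prefills whose field differs."""
    compared = 0
    mismatched = 0
    for call, exchange in logged.items():
        if call not in prefills:
            continue  # no prefill for this call, so nothing to compare
        compared += 1
        if prefills[call][field] != exchange[field]:
            mismatched += 1
    return mismatched / compared if compared else 0.0


# Made-up sample data, not taken from any real log.
prefills = {
    "K1ABC": {"check": "76", "section": "NC"},
    "W9XYZ": {"check": "62", "section": "IL"},
    "N0AAA": {"check": "88", "section": "MN"},
    "W6BBB": {"check": "54", "section": "SF"},
}
logged = {
    "K1ABC": {"check": "76", "section": "NC"},
    "W9XYZ": {"check": "63", "section": "IL"},   # check differs from prefill
    "N0AAA": {"check": "88", "section": "MN"},
    "W6BBB": {"check": "54", "section": "SF"},
    "K5CCC": {"check": "99", "section": "STX"},  # not in prefills, skipped
}

check_rate = discrepancy_rate(prefills, logged, "check")
section_rate = discrepancy_rate(prefills, logged, "section")
```

The same function works for the section field by passing "section" 
instead of "check".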

I have found one source of error that seems easily explained.  For some 
of these errors, the ARRL database contained a '00' check number as the 
callsign's MOST RECENT entry, while the scores data for prior years 
shows a uniform check value for the same callsign.

This seems to be a case of someone submitting a log without having 
filled in their own exchange info. (Note: As a WriteLog user, I know 
that this is possible with WriteLog.  All too easy a mistake to make!)

That accounted for just under a third of the errors.
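One possible way to guard against the '00' problem is sketched below.  
This is only an idea, not something that was applied to the 2008 file.  
Since '00' can be a perfectly legitimate check, the sketch only 
distrusts it when every earlier entry for the callsign agrees on one 
other value.  The function name and the (year, check) layout are 
assumptions.

```python
def resolve_check(entries):
    """Pick a check from a non-empty list of (year, check) tuples.

    Normally the most recent check wins.  If the most recent check is '00'
    and all earlier years agree on a single other value, use that value.
    """
    entries = sorted(entries)
    latest = entries[-1][1]
    earlier = {check for _, check in entries[:-1]}
    if latest == "00" and len(earlier) == 1 and "00" not in earlier:
        return earlier.pop()
    return latest


# Made-up histories for illustration.
suspect = resolve_check([(2005, "76"), (2006, "76"), (2007, "00")])
only_entry = resolve_check([(2007, "00")])
changed = resolve_check([(2006, "98"), (2007, "99")])
```

A callsign whose only entry is '00' is left alone, since there is no 
history to contradict it.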

In comparing the discrepancies in AA4NC vs KA1ARB vs N1LN, I found that 
the contestants had indeed logged the same check numbers.  I expect that 
means the prefills file was incorrect, as the three top NC ops are 
unlikely to all make the same logging error.  This covers almost all of 
the other errors.  Some of that is probably folks using different 
checks, guest ops, etc.  But some is just plain errors in the prefills. 
No known explanation for these errors (yet).

The final group are uniques in each of the three logs.  I figure the 
prefills are wrong rather than the contestants, but it could be either.
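The cross-checking between logs described above can be sketched roughly 
like this.  The category names, the function name, and the callsigns are 
made up, and the "two or more logs agree" rule is an assumption about 
how to separate shared discrepancies from uniques.

```python
def classify(call, prefill_check, logs):
    """Classify one callsign's prefill check against several logs.

    logs maps an operator to a dict of {worked call: logged check}.
    """
    seen = [log[call] for log in logs.values() if call in log]
    if not seen:
        return "not worked"
    differing = [check for check in seen if check != prefill_check]
    if not differing:
        return "match"
    # Two or more logs agreeing on the same different value points at the
    # prefill file rather than at the operators.
    if len(differing) >= 2 and len(set(differing)) == 1:
        return "prefill likely wrong"
    # A single log disagreeing (a unique) could be either side's error.
    return "undetermined"


# Made-up contacts; only the log owners' callsigns come from the post.
logs = {
    "AA4NC": {"K1ABC": "76"},
    "N1LN": {"K1ABC": "76"},
    "KA1ARB": {"W9XYZ": "62"},
}

shared = classify("K1ABC", "75", logs)
unique = classify("W9XYZ", "60", logs)
agrees = classify("W9XYZ", "62", logs)
```

This is only a heuristic.  Guest ops or a station using a different 
check would look the same as a prefill error here.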

Summary on check numbers:
Expect to see about a four percent error rate in the prefills data for 
the check number.

Section Errors:
The discrepancies in the section ran just under half of the 
discrepancies in the check number.  It comes out to be about 1.5 
percent.  No research on the possible reasons for those yet.

Conclusions:
Looking at only three sample logs showed very similar discrepancy rates 
with the prefills, so I expect the following advice to be valid.

1) Log what you hear!

2) Expect four percent error on the check, about two percent error on 
the section.

3) I expected higher discrepancy rates than these, so at this point I 
don't see much reason to change my basic methodology.  For 2009, I'll 
produce the data in much the same way, sequences adjusted for the mode 
of the current contest.

I don't know what sort of accuracy anyone expected with these exchange 
files, but now you have a baseline on the error rates in the one 
produced for 2008.

73 es cu in ARRL 160m de w4kaz