[TheForge] Re: searchable theforge archive revisited

Mike Spencer mspencer at tallships.ca
Wed Nov 24 21:00:59 EST 2004


> 0. e-mail addresses in theforge archives. should they be:

     b. munged 

        Human-readable but in a format that will prevent (or at least
        hinder) automated harvesting.
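A minimal Python sketch of one munging scheme. It assumes the same "user at host" substitution this archive already applies to message headers. The regex and the name munge_addresses are hypothetical, and the pattern is deliberately simple rather than a full RFC 2822 address parser.

```python
import re

# Hypothetical, deliberately simple pattern for user@host.domain addresses.
ADDR_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")

def munge_addresses(text: str) -> str:
    """Replace 'user@host' with 'user at host': still readable by a human,
    but no longer matches a naive harvester's '@' pattern."""
    return ADDR_RE.sub(r"\1 at \2", text)

print(munge_addresses("Write to mspencer@tallships.ca for details."))
# prints: Write to mspencer at tallships.ca for details.
```

A determined harvester can undo a fixed substitution like this, so it only hinders harvesting. That matches the "prevent (or at least hinder)" wording above.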

> 1. urls. should they be:
>
     b. left as is.

        A 404 URL is at worst annoying.  At best it may offer a clue to
        finding the/a current location.  Since this will be an
        archive, old URLs may have historical value.

> 2. signatures

     b. left as is.

        Personal sigs left as is.  Listserv boilerplate can be deleted.

        In any case, many mailers don't adhere rigidly to the '-- '
        convention for the .sig marker, so it can't realistically be
        used as an EOT flag.
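A small Python sketch (the function name split_signature is hypothetical) illustrates the problem. A strict check for a line that is exactly "-- " (dash, dash, space) finds nothing when a mailer writes "--" without the trailing space, or omits the marker entirely, so the "signature" is never separated.

```python
def split_signature(body: str):
    """Split at the last line that is exactly '-- ' (dash, dash, space).
    Returns (text, sig); sig is None when no conforming marker exists."""
    lines = body.split("\n")
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == "-- ":
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return body, None

print(split_signature("Hello\n-- \nMike"))  # marker found
print(split_signature("Hello\n--\nMike"))   # no trailing space: sig is None
```

Any looser heuristic (e.g. accepting "--" or a line of underscores) risks cutting off real text, which is why the marker is unsafe as an end-of-text flag.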

> general comments on the searchable theforge archive.
>
> 0. blank lines are being deleted.

   No.  A blank line is only 1 or 2 bytes.  Some writers carefully
   format their ASCII text and in some cases -- say, tables -- blank
   lines may even be essential for readability.

> 1. the various footers inserted by qth.net are being deleted.

  Yes.
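One conservative way to do this, sketched in Python under the assumption that the list software appends its footer verbatim: remove the footer only on an exact match at the end of the message. The function strip_footer and the footer text passed to it are hypothetical; the actual qth.net footer wording would have to be supplied.

```python
def strip_footer(body: str, footer: str) -> str:
    """If the body ends with the footer verbatim, remove the footer and the
    blank lines just before it. Otherwise return the body unchanged, so a
    footer that was altered in transit never causes author text to be lost."""
    trimmed = body.rstrip("\n")
    footer = footer.strip("\n")
    if footer and trimmed.endswith(footer):
        return trimmed[: -len(footer)].rstrip("\n")
    return body
```

For example, strip_footer("Hello\n\n_____\nTheForge mailing list\n", "_____\nTheForge mailing list") returns "Hello", while a message without that exact footer comes back untouched.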

> 2. lines which contain only '>' are being deleted.

   No.  Often original messages are very poorly formatted as ASCII,
   e.g. when the sender's mailer uses a variable-width font.  Good
   form for quoting may result in '>'-quoted blank lines that enhance
   readability.  Cf. item 0 of "general comments...", above.

   Trailing blank lines, i.e. those that come after all text, with
   or without '>' quoting, are content-free and won't be missed if
   elided. 
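A Python sketch of that rule (the name trim_trailing_blank_lines is hypothetical): only lines at the very end that hold nothing but whitespace and '>' marks are dropped, while '>'-quoted blank lines between paragraphs are kept.

```python
import re

# A line counts as blank if it holds only whitespace and '>' quote marks.
QUOTED_BLANK = re.compile(r"[>\s]*")

def trim_trailing_blank_lines(body: str) -> str:
    lines = body.split("\n")
    # Pop from the end only; interior blank or '>'-only lines are untouched.
    while lines and QUOTED_BLANK.fullmatch(lines[-1]):
        lines.pop()
    return "\n".join(lines)
```

For example, "> quoted\n>\nreply\n\n>\n> \n" becomes "> quoted\n>\nreply": the interior ">" line survives, the trailing ones go.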

In general, an archiver should never omit or change content.  Archival
meta-data additions should be clearly flagged.  Format changes are
optional.

HTH, IMHO, IMNSHO, YMMV, IANAL etc. etc.,
- Mike

-- 
Michael Spencer                  Nova Scotia, Canada       .~. 
                                                           /V\ 
mspencer at tallships.ca                                     /( )\
http://home.tallships.ca/mspencer/                        ^^-^^