
Re: [ddlm-group] CIF header

A special initial comment is really not unusual. It is a standard part of 
POSIX shell scripts (the #! line), in the same way as described below, 
where the directive line does not have to be stripped off before passing 
it to the shell parser.

If you think about it, an embedded comment directive is not really a 
problem. A parser should be able to strip off comments, but the correct 
parser is already active before that happens, assuming you follow James' 
3-step procedure below. It is OK to strip off the comment directive, 
because it is meant only for use by the parser front-end, and the file 
output formatter will add its own file-specific directive, which may 
not even be the same STAR/CIF version.
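
A minimal Python sketch of that round trip, for illustration only. The 
header string `#\#CIF_2.0` and the helper names are assumptions (the 
thread had not yet settled on the exact header text). The reader drops 
every whole-line comment, directive included, and the writer prepends 
its own directive:

```python
# Assumed header text; the thread had not yet fixed the exact string.
CIF2_HEADER = "#\\#CIF_2.0"


def strip_comments(lines):
    """Drop whole-line comments, including any initial directive line.

    Simplification: only lines whose first non-blank character is '#' are
    removed. Trailing comments, and '#' lines inside ';'-delimited text
    fields, are not handled here.
    """
    return [ln for ln in lines if not ln.lstrip().startswith("#")]


def write_cif(lines, header=CIF2_HEADER):
    """Emit the writer's own directive first, then the content lines.

    The writer is responsible for making the content actually conform to
    the version its header declares; this sketch does no conversion.
    """
    return "\n".join([header, *lines]) + "\n"


# Usage: an input carrying a CIF1.1-style header is rewritten with a new one.
src = ["#\\#CIF_1.1", "# a plain comment", "data_example", "_cell.length_a 10.0"]
print(write_cif(strip_comments(src)), end="")
```

Because the old directive is discarded on input, the output header always 
reflects what the writer produced, not what it happened to read.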

A directive embedded in an initial comment really does make sense, 
because it is irrelevant once the correct parser is selected. It might 
make sense to add a specific 2nd character, similar to the ! in the POSIX 
shell #!. For example, the STAR format could define an initial line 
beginning with #% as a parsing directive rather than just a plain comment. 
That makes the abuse of a comment line a bit less of a hack.
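
To make the #% idea concrete, here is a small Python sketch of how a 
parser front end might classify the first line. The #% prefix is only 
the proposal above, not an adopted convention, and the function name is 
hypothetical. A legacy parser would still see such a line as an ordinary 
comment, which is the point of the scheme:

```python
from typing import Tuple


def classify_first_line(line: str) -> Tuple[str, str]:
    """Classify line one under the hypothetical '#%' directive scheme.

    Returns ("directive", text) for a line starting with '#%',
    ("comment", text) for any other '#' line, and ("data", line) otherwise.
    """
    if line.startswith("#%"):
        return ("directive", line[2:].strip())
    if line.startswith("#"):
        return ("comment", line[1:].strip())
    return ("data", line)


print(classify_first_line("#% STAR 2.0"))  # ('directive', 'STAR 2.0')
```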

Joe

James Hester wrote:
> Yes, I didn't pick up on this point in your previous email. I don't
> think it is a problem.  The way I think of it is that the convention
> for file identification that we are proposing logically applies
> *before* CIF syntax is relevant, where 'before' refers to a time
> sequence, not a file position.  So in that sense we could choose
> anything we like to be the identifying characters. However, these
> characters are used to fire up the appropriate parser, which for
> simplicity runs over the whole file; if what it sees at the beginning
> of the file is simply a meaningless (from a CIF syntax point of view)
> comment, the process works smoothly.  So the notional sequence is as
> follows:
> 
> 1. Request parse of file
> 2. Look for magic number to make decision on CIF1/CIF2
> 3. Direct the file to a CIF1/CIF2 parser
> 
> It would indeed be possible to pass everything but the first line to
> the parser in step 3, but doing so adds a bit more complexity which is
> unnecessary given the alternative solution.
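
The three steps above can be sketched in Python roughly as follows. The 
magic string `#\#CIF_2.0` is an assumption, and `parse_cif1`/`parse_cif2` 
are hypothetical placeholders for real parsers. Following the suggestion 
later in this thread, a file without the header falls back to CIF1. Note 
that the whole file, header line included, goes to the chosen parser, so 
nothing has to be stripped:

```python
import io
from typing import Callable, TextIO, Tuple

# Assumed magic string; the thread leaves the exact text open.
CIF2_MAGIC = "#\\#CIF_2.0"

Result = Tuple[str, str]


def parse_cif1(stream: TextIO) -> Result:
    # Hypothetical placeholder: a real CIF1 parser would tokenize here.
    return ("CIF1", stream.read())


def parse_cif2(stream: TextIO) -> Result:
    # Hypothetical placeholder: a real CIF2 parser would tokenize here.
    return ("CIF2", stream.read())


def choose_parser(first_line: str) -> Callable[[TextIO], Result]:
    # Step 2: decide CIF1/CIF2 from the magic number; no header means CIF1.
    return parse_cif2 if first_line.startswith(CIF2_MAGIC) else parse_cif1


def parse_stream(stream: TextIO) -> Result:
    # Step 1: a parse has been requested for this (seekable) stream.
    first_line = stream.readline()
    stream.seek(0)  # rewind: to the chosen parser the header is just a comment
    # Step 3: hand the whole file, header line included, to that parser.
    return choose_parser(first_line)(stream)


version, text = parse_stream(io.StringIO("#\\#CIF_2.0\ndata_a\n"))
print(version)  # CIF2
```

Peeking and rewinding keeps the detection step separate from parsing, 
which is the "before in time, not in file position" distinction above.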

> On Thu, Oct 29, 2009 at 12:38 AM, David Brown <idbrown@mcmaster.ca> wrote:
>> I agree that a header is needed.  I am concerned about starting it with a #
>> which could still get lost, though I can find no good example of where it
>> would cause a problem.  However, as a matter of principle I think it bad
>> form to have a convention (# = comment) that applies everywhere except in
>> this one location.  Almost any other character would do but this might upset
>> CIF1 parsers.  I suppose the real advantage of starting with # is that it
>> would be ignored by a legacy CIF1 parser without causing a problem (though
>> the parser might have problems later).
>>
>> David
>>
>>
>> James Hester wrote:
>>
>> I believe that Joe's suggestion of mandating a CIF2.0 header comment
>> coincides with Brian's earlier suggestion that this should now be
>> mandatory.  We should also note David's comment that we must now be
>> careful about stating that comments can be discarded from files, as
>> the first line comment may be a special case.
>>
>> Regarding David's comment, I think that we can proceed by stating that
>> any program that writes a CIF must put in the mandatory CIF2.0 (or
>> whatever it turns out to be) comment in the header.  This would
>> include programs that simply strip comments and then write something
>> out.
>>
>> Are we all agreed on having a mandatory header?
>>
>> On Wed, Oct 28, 2009 at 2:19 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>>
>> IMHO, there should be some sort of header to distinguish CIF variants,
>> sort of like the DOCTYPE line at the top of XML files. This will help
>> deal with CIF1 files that are not CIF2 compliant, and could also better
>> handle more extreme variants, like binary CIF. The current syntax
>> suggests, but does not require, an initial comment line starting with #CIF.
>>
>> I proposed to the COMCIFS list that global_ could be used for this
>> purpose. In this way, global_ would only be read by the parser, and not
>> be considered as part of the actual CIF data. The idea is to use the
>> existing STAR syntax instead of designing something new. The
>> disadvantage is that the global_ section itself would have to maintain a
>> restricted 7-bit ASCII format, and not allow any of the newer STAR/CIF
>> syntax. So, the "simplicity" of just using the existing STAR syntax
>> really is not there.
>>
>> Alternatively, the initial CIF comment line could be made a requirement
>> rather than a suggestion, and also define a way to include additional
>> file attributes in the form of param=value pairs. For example, a CIF2
>> file could add "binary=true" to indicate the presence of binary
>> sections, rather than binary-CIF having to be a completely separate format.
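
A rough Python sketch of reading such a header line into a version token 
plus param=value attributes. The layout (`#` + version token + 
whitespace-separated pairs, e.g. `#CIF_2.0 binary=true`) and the function 
name are hypothetical; no such syntax was agreed in this thread:

```python
from typing import Dict, Optional, Tuple


def parse_header(line: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split a hypothetical header line into (version, attributes).

    Assumed layout: '#' + version token + whitespace-separated param=value
    pairs, e.g. '#CIF_2.0 binary=true'. Values containing spaces are not
    supported, and tokens without '=' are ignored.
    """
    if not line.startswith("#"):
        return None, {}  # no header: caller may treat the file as CIF1
    tokens = line[1:].split()
    if not tokens:
        return None, {}
    attrs: Dict[str, str] = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key:
            attrs[key] = value
    return tokens[0], attrs


print(parse_header("#CIF_2.0 binary=true"))  # ('CIF_2.0', {'binary': 'true'})
```

As written, any plain # comment on line one would also yield a "version", 
so a real scheme would need to check that token against known values.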
>>
>> If extra file attributes seem like an unnecessary complication, then
>> maybe at least the simple comment line could be made a requirement?
>> Then, you can distinguish CIF2 files, and assume that any file without a
>> comment is CIF1.
>>
>> Joe Krahn
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group