Re: [Imgcif-l] proposed change in first line of imgcif files

Dear James,

   I have read your remarks and Harry's.  How about we say what we all seem 
to agree on:

   If both a magic number and one or more CIF tags specify values for the 
same or related parameter(s), those values should agree according to the 
dictionary specified relationships among the parameter(s).  Similarly, if 
two CIF tags specify values for the same or related parameters, those 
values should agree according to the dictionary specified relationships 
among the parameter(s).  In all dictionaries, those relationships should 
be clearly explained in the explanatory text of the dictionaries.  In DDLm 
dictionaries, the ability to algorithmically specify those relationships 
should be exploited where appropriate.

   This would clearly specify our common intent for clearly documented 
agreement among multiple presentations of the same information, and leave
it to specific applications to follow whatever approach seems appropriate
to the application developer in dealing with the error case of 
disagreement.

   Now what we really need to do is to agree on the CIF tags that should
agree with the imgCIF magic number.

   To get everything in the same place, here is the magic number proposal 
along with a _ws...  tag to allow the information to be referenced 
algorithmically and, finally, a variant on James' specific tags for the 
newly added style and style_version.

1.  What problem is being solved?  As the use of imgCIF has increased, 
two very distinct sets of files have appeared: the "miniCBFs" used for the 
Pilatus 6m detector and more fully populated imgCIF files, such as the 
ones produced for ADSC detectors.  While the information necessary for 
processing can be discovered from context in handling a miniCBF, it may be 
necessary to read fairly far into the file to discover that the file is 
indeed a miniCBF, complicating the design of reading software.

2.  The proposed solution.  Currently CBF files begin with a magic number 
comment line
             1         2         3         4         5
    12345678901234567890123456789012345678901234567890
    ###CBF: VERSION n.m

We propose to extend the magic number comment line with two optional 
fields to read

             1         2         3         4         5
    12345678901234567890123456789012345678901234567890
    ###CBF: VERSION n.m     style     style_version

where "style" is a unique CBF style identifier left justified as a single
word in columns 25-34 and "style_version" is a left justified integer in
columns 35-44.

Each style will be registered in a central repository along with 
information on the tags that will be carried for that style and a template 
of the tags that would be needed to fully populate the file.
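
As an illustration only, a reader might pick the two optional fields out of
the first line along the following lines.  This is a minimal Python sketch
that assumes the fixed-column layout proposed above; the names MagicNumber
and parse_magic_number are made up for this example and are not part of
CBFlib or of any existing package.

```python
from typing import NamedTuple, Optional


class MagicNumber(NamedTuple):
    version: str
    style: Optional[str]
    style_version: Optional[int]


def parse_magic_number(line: str) -> MagicNumber:
    """Split a CBF magic number line into its fixed-column fields."""
    line = line.rstrip("\r\n")
    if not line.startswith("###CBF: VERSION "):
        raise ValueError("not a CBF magic number line")
    # The proposal counts columns from 1; Python slices count from 0 and
    # exclude the end index, so columns 25-34 become line[24:34].
    version = line[16:24].strip()
    style = line[24:34].strip() or None
    version_field = line[34:44].strip()
    # Both new fields are optional, so a short line simply yields None.
    style_version = int(version_field) if version_field else None
    return MagicNumber(version, style, style_version)
```

A first line without the new fields still parses, with style and
style_version both reported as None, so files written under the current
convention remain readable.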

3.  To facilitate writing DDLm methods to work with this or any other 
magic number convention, a pseudo-tag _ws.prologue would allow application 
manipulation of the comments and whitespace from before a data block. 
The prefix ws would be reserved for this purpose and for similar, related 
tags.  No parser would have to work with this tag.  It is provided simply 
to have an unambiguous algorithmic way to state the relationship with the 
following actual CIF tags.

4.  James Hester has proposed two new tags to be carried within an imgCIF 
file to agree with the style and style_version: 
_diffrn_detector.data_style and _diffrn_detector.data_style_version. 
Ignoring the 0-based vs. 1-based indexing issue, just to state the 
relationship between the first comment line and these tags in pseudocode:

_diffrn_detector.data_style = trim(_ws.prologue[25:34])
_diffrn_detector.data_style_version = trim(_ws.prologue[35:44])

I would suggest, however, that these two tags do not quite fit in the 
diffrn_detector category, inasmuch as they do not really describe the 
detector.  They actually describe the format of the data block being used 
to present the detector information.  Therefore I suggest that we start a 
new category:  data_block_format and define

_data_block_format.data_style = trim(_ws.prologue[25:34])
_data_block_format.data_style_version = trim(_ws.prologue[35:44])
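
To make the intended agreement concrete, here is a rough Python sketch of
the kind of check a validator could apply.  It assumes the prologue text and
the tag values have already been obtained by some parser; the function name
check_style_agreement and the plain dictionary of tags are stand-ins for
this example, and the tag names are the ones proposed above, not yet part of
any dictionary.  It only reports disagreements and leaves the choice of what
to do about them to the application.

```python
from typing import Dict, List

# Tag names as proposed in this message; they are not yet in any dictionary.
STYLE_TAG = "_data_block_format.data_style"
STYLE_VERSION_TAG = "_data_block_format.data_style_version"


def check_style_agreement(prologue: str, tags: Dict[str, str]) -> List[str]:
    """Return a description of each header/tag disagreement (empty if none)."""
    lines = prologue.splitlines()
    first_line = lines[0] if lines else ""
    # Columns 25-34 and 35-44 (1-based) become 0-based, end-exclusive slices.
    header_values = {
        STYLE_TAG: first_line[24:34].strip(),
        STYLE_VERSION_TAG: first_line[34:44].strip(),
    }
    problems = []
    for tag, header_value in header_values.items():
        tag_value = tags.get(tag)
        if tag_value is None or not header_value:
            continue  # only one side present, so there is nothing to compare
        if str(tag_value).strip() != header_value:
            problems.append(
                f"{tag}: header has {header_value!r}, tag has {tag_value!r}"
            )
    return problems
```

The comparison here is a plain string match after trimming, so a header
value of 2 and a tag value of 02 would be reported as a disagreement; a
real dictionary method might well compare the versions as integers instead.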


Regards,
   Herbert

P.S.  I think we should explore formally creating a standard following
the ISO processes, working under the IUCr, but seeing if we can
eventually get ISO to accept what we do.

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Thu, 2 Oct 2008, James Hester wrote:

> Dear Herbert:
>
> Our bone of contention in a nutshell is that you want the header to take
> precedence, and I want the tags to take precedence.   I think that if the
> contents of a comment are allowed to override values in the body of the CIF,
> we are changing a fundamental aspect of the CIF standard (i.e. that comments
> are irrelevant and that all information is contained in tag-value pairs), so
> if we do end up wanting priority for the header information, then I think we
> should run it by COMCIFS.
>
> You write:
>
>> We are _not_ providing just a convenience header.  We are providing
>> critical information that major software packages will count on to
>> correctly parse certain CIF files.
>
>
> Just to be clear, by 'parse' I understand 'separate into tags and values'.
> I prefer the term 'interpret' i.e. I would say that this is critical
> information for *interpretation* of the frame data.  What is your intended
> meaning?
>
> The critical information that we are providing in the header is *already
> available* within the CIF file, but these major software packages would
> prefer not to access that, presumably because it is inefficient to do so
> and/or it would imply a more flexible internal representation of the
> experimental geometry.  The header is therefore more convenient, and the
> critical information is available in a less convenient form even without the
> header.  How is this not a convenience header??
>
>> No matter what we say, that magic
>> number will override any conflicting information deeper into the file
>> for those packages.  In an ideal world, the information deeper in
>> will not conflict, but when there is a conflict, it is highly unlikely
>> that processing will be stopped if the resulting data makes any sense
>> at all to the processing software.
>
>
> But I agree with you.  If software chooses to rely on the header
> information, it can do that and ignore the non-frame-data.  It is just that
> the guarantee that header and dataframes match is given by the CIF file
> provider, not by the (img)CIF standard.  Software using a convenience header
> will not interpret the CIF tags so will not be in a position to detect a
> mismatch.
>
>> We have a better chance of reducing further growth of conflicting
>> dialects by being realistic about what people will do than by
>> prescribing rules they are not likely to follow.
>
>
> Which rules are you talking about?  I am saying, use the header information
> all you want, but in the unlikely case that it does not match the file
> contents the read might fail.
>
>
>> We would be
>> doing a disservice to users if we tell them they can override a
>> magic number by changing a tag further into the CIF when it is not
>> likely to work.
>
>
> But if these major software packages state up front that they rely on the
> header, so changing certain tags will not work, what is the problem? And why
> is a user editing a raw imgCIF anyway?   And under DDLm, if a user edits a
> tag, it is possible to pick up that the 'style'/'version' values are no
> longer correct, so there is some safety net possible even in this (highly
> unusual) use case.
>
>>
>>
>>  Once we agree on the relevant tags, I will add the necessary checks
>> to vcif2, so that people who care can check for such conflicts, and
>> give cif2cbf the ability to regenerate files with the conflicts
>> resolved, but I think we need to accept the reality that we have
>> no real control over what people will do in the field, especially if
>> what we specify conflicts with commonly accepted practice.  A standard
>> will only be accepted by this community if it grows organically from
>> within the community, and we have had a strong, consistent and reasonable
>> request for clear specification of the necessary parse in the first
>> line of imgCIF files.
>
>
> I'm *not* saying that we shouldn't have a statement of how the frame data
> should be interpreted right there in the first line.  I'm *not* saying that
> a program which uses this information must then read the relevant tags as
> well to check for conflicts.  I *am* saying that if, somehow, the datablock
> tags and the header mismatch, a program which relies on the header might
> fail if the header is in error.  Of course, if instead the tags are in
> error, then it will not fail.  Where is the issue?  Harry, (if you are still
> reading along), is this an acceptable position from your point of view?
>
>>
>>
>>  I think it would be fine to move CIF in the direction of really being
>> a standard.  The ISO rules are given at
>>
>>
>> http://www.iso.org/iso/standards_development/processes_and_procedures/how_are_standards_developed.htm
>>
>> ANSI has similar guidelines.  The standards process is time-consuming and
>> involves a great deal of consensus building.  I think it would be worth
>> trying to do.
>
>
> I do appreciate the need for consensus, but at the moment it seems that you
> and I are the only ones searching for consensus. I am also thinking that the
> time has come to instigate some sort of standards process.  Are you thinking
> of CIF becoming an actual ISO or ANSI standard, or rather of implementing
> similar processes under the auspices of the IUCr?
>
> Best wishes,
> James.
>
>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://scripts.iucr.org/mailman/listinfo/imgcif-l
