Re: [Imgcif-l] proposed change in first line of imgcif files
- To: "The Crystallographic Binary File and its imgCIF application to image data" <imgcif-l@iucr.org>
- Subject: Re: [Imgcif-l] proposed change in first line of imgcif files
- From: "James Hester" <jamesrhester@gmail.com>
- Date: Wed, 24 Sep 2008 14:35:50 +1000
- In-Reply-To: <20080923224741.M51207@epsilon.pair.com>
- References: <20080826195337.H76753@epsilon.pair.com><279aad2a0809172141u3034905bq6ba660c89703b4bb@mail.gmail.com><84F0D152-F08A-485B-B9FD-AA2011B1836E@mrc-lmb.cam.ac.uk><20080918083938.I64013@epsilon.pair.com><279aad2a0809231659g560a410ds3e1bcdf3a6809ae3@mail.gmail.com><20080923224741.M51207@epsilon.pair.com>
Firstly, I need some clarification for a few statements:

On Wed, Sep 24, 2008 at 1:14 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
> The problem with the proposition
>
> "that the
>>
>> information derived from the 'style' and 'version' parts of the header
>> would not contain anything that couldn't be derived from the CIF file
>> proper"
>
> is one of chickens versus eggs. We well may need magic number or other
> file-type identifier information to determine the parse logic for a
> CIF.

How is this so? I thought that the CIF 1.1 syntax specification is sufficient to parse a data file into datablocks containing text datanames and text data values. You may then (selectively, if you like) examine values that you are interested in to obtain further information, convert to numbers, create headers etc.

> Once we know that, we certainly can have tags within a CIF to
> confirm those parse decisions, but even then, we may not be able
> to glean everything without some external dialect specifier, e.g.
> when dealing with the differences between mmCIF and pdbx CIF.

I thought that the differences between mmCIF and pdbx CIF occur above the syntactical level, i.e. in the selection of datanames and data types available. Are there syntactical differences?

> At first I thought of solving this problem by simply cloning all
> style variant flags as the values of tags with the CIF, but what
> do we then do if the magic number information and the CIF tags
> disagree or one or the other is missing? We need to specify the
> handling of those cases, and not simply by declaring them to
> be errors. People still will want to know how to recover their
> data.

I don't think this is a problem at all if we adopt the philosophy that a commentless CIF is semantically equivalent to one with comments (and I thought this was the philosophy all along). Therefore, the contents of the datablocks take precedence over anything that might appear in a comment.

Case (1): If the header information disagrees with the tags, the header is wrong.
Case (2): If the header is missing, it can be regenerated from the relevant tags.
Case (3): If the relevant data tags are missing but the header is present, this particular data file is doing an end run around the standard and we slap it on the wrists, but pragmatically it will probably survive.

The only case in which people might have trouble accessing their data is (3), if some idealistic program strips off all comments. But (3) should be explicitly rejected by us in any case.

> So, what I am proposing is:
>
> 1. We make a clean, unambiguous statement of how to handle magic
> numbers, comments and whitespace in CIF. I think what I have proposed
> will do that job, but someone may have a better approach

As you can probably tell, my clean, unambiguous statement is:
(1) A commented CIF must always be semantically equivalent to that CIF with all comments removed.
(2) A convenience header may be supplied, but must be recoverable from information within the CIF itself.

I see no need for the standard to deal with whitespace or comments anywhere apart from the header. What are the motivations to concern ourselves with comments etc. in other parts of the CIF? I appreciate that particular implementations may want to preserve comments, but does that have to be within the purview of a general standard? Indeed, it seems to me that the standard is failing if we are putting anything worthwhile into comments (with the exception of a header for programming convenience).

> 2. For information that can be carried in the same CIF in multiple,
> perhaps conflicting ways, we specify a precedence of interpretations.
> As a practical matter, I think magic numbers have to take precedence
> over conflicting or missing tag values within the CIF. The magic
> number will have been read and interpreted well before the tag value
> is encountered. This may then call for a warning, but the users
> will expect a rational effort at completing the parse, and perhaps
> even an automatic correction to the CIF to remove the conflict.

See above - I go the reverse way, but that is based on my understanding that any CIF-formatted file can be parsed based on the standards documents, without reference to any supplementary header information.

> That being said, I have no objection to encouraging the use of the
> tags James has proposed, but the alignment between that content
> and the magic number information needs to be explicitly stated, and
> the simplest way to do that unambiguously within a CIF, especially
> in a DDLm CIF, would be by stating that relationship in terms of
> the values of James' new tags and the value of _ws.prologue. In DDLm
> we could even include the parse algorithms for decomposing the
> magic number and for creating it.

This looks like a promising solution in that it keeps the specification of _ws.prologue within the imgCIF dictionary, and gives it an interpretation beyond "the first comment in the file". From the point of view of the standard, I would not specify that this *must* be the first comment in the file, only that it may be output as the first line, because I still think it should be possible for a commentless CIF to be viable. Likewise, I don't think it should be automatically set to the value of the first comment in the file when reading in: the two should simply match, with a resolution for mismatches as set out above. I hope this is seen as a formality which does not have significant practical impact.

> This is not quite the same as James's prescription of an equivalence
> between a CIF with comments and the same CIF with those comments
> removed, but I think it is a pragmatic compromise and comes closer
> to that goal than we have been in the past.

If my modification above is acceptable, then we are all happy.
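
As a rough illustration of cases (1) to (3) above, a minimal Python sketch of the resolution rule might look like the following. The function names, the status strings and the "###EXAMPLE" header text are hypothetical, and the sketch assumes that the value of _ws.prologue has already been extracted by a CIF parser and holds the complete header line, including the leading "#".

```python
from typing import Optional, Tuple


def first_line_comment(cif_text: str) -> Optional[str]:
    """Return the first line of the file if it is a comment, else None."""
    first_line = cif_text.split("\n", 1)[0].rstrip()
    return first_line if first_line.startswith("#") else None


def resolve_header(header: Optional[str],
                   prologue: Optional[str]) -> Tuple[Optional[str], str]:
    """Decide which header text to trust.

    header   -- first-line comment found in the file, or None
    prologue -- value of _ws.prologue from a datablock, or None
    The datablock contents always win over the comment.
    """
    if prologue is not None:
        wanted = prologue.strip()
        if header is None:
            # Case (2): header missing, regenerate it from the tag.
            return wanted, "header regenerated"
        if header.strip() != wanted:
            # Case (1): header disagrees with the tag, so the header is wrong.
            return wanted, "header overridden"
        return wanted, "consistent"
    if header is not None:
        # Case (3): only the comment carries the information.
        # Usable in practice, but the file does not conform.
        return header.strip(), "nonconforming"
    return None, "no header"


if __name__ == "__main__":
    text = "###EXAMPLE: STYLE A\ndata_x\n"
    print(resolve_header(first_line_comment(text), "###EXAMPLE: STYLE B"))
```

Under this rule a program that strips all comments loses nothing except in case (3), which is exactly the case I think we should reject.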

A comment-stripping CIF program will not see any comments in the input file, but will see _ws.prologue and may or may not include it as a header when outputting (but will carry through the _ws.prologue data item). A comment-aware CIF program will see the header and happily use it, perhaps checking that it matches _ws.prologue. A program which needs the header but finds a CIF without one can farm the CIF off to a little utility which gets hold of _ws.prologue (perhaps even using DDLm methods to generate it) and prepends it to the file.

A technical issue: there is one header for a given file, but there are possibly multiple datablocks. What is the behaviour if multiple versions/styles are present in different datablocks? This is a corner case which presumably doesn't relate to manufacturer-produced CIFs, but we need to specify the behaviour. I would suggest that the definition of _ws.prologue state that

'The value of _ws.prologue may be output as the first line of an output CIF file. Where multiple datablocks are present in a file, a value of _ws.prologue from any one of those datablocks can be used'.

James.

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://scripts.iucr.org/mailman/listinfo/imgcif-l