Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <[email protected]>
- Subject: Re: CIF Infoset
- From: David Brown <[email protected]>
- Date: Mon, 30 Aug 2004 15:15:17 -0400
- In-Reply-To: <[email protected]>
- References: <[email protected]>
Here are a few more comments from IDB:

> So how do you intend to get around this namespace issue? No CIFs that I have encountered have ever declared their conformance to any dictionary. Even if they did, there is something called the dictionary stacking protocol which allows those definitions to be overridden without declaring a namespace. On top of that there is the boundless capacity for making up your own data names on the fly, for which there may never be any dictionary definition at all. How can you reliably assign anything but a generic namespace to an infoset? It's all just ad hoc guesswork.

The core dictionary defines three items which can be looped:

    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location   # Contains the URL where the dictionary can be found

As far as I know these have not been widely used. Acta Cryst. should start insisting that these be included in submitted papers. There is no need to give the dictionary version in anything as ephemeral as a comment.
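As a rough illustration of how software reading a CIF might look for these three items, here is a minimal Python sketch. It is not taken from any existing CIF library. The tokenizer is deliberately simplified (no semicolon-delimited text fields, no strict quote rules), the function names are made up for this example, and the file names and URLs in SAMPLE are hypothetical. An empty result would mean the CIF declares no dictionary, so only a generic namespace could be assigned.

```python
import re

CONFORM_TAGS = (
    "_audit_conform_dict_name",
    "_audit_conform_dict_version",
    "_audit_conform_dict_location",
)

# Simplified tokens: 'quoted', "quoted", or a run of non-whitespace.
# Assumption: real CIF syntax is richer than this sketch handles.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|(\S+)""")
KEYWORDS = ("data_", "save_", "loop_", "stop_", "global_")


def tokenize(cif_text):
    """Yield (text, quoted) pairs; '#' outside quotes starts a comment."""
    for line in cif_text.splitlines():
        for m in TOKEN.finditer(line):
            bare = m.group(3)
            if bare is None:
                yield (m.group(1) if m.group(1) is not None else m.group(2)), True
            elif bare.startswith("#"):
                break  # ignore the rest of the line
            else:
                yield bare, False


def is_name(tok):
    text, quoted = tok
    return not quoted and text.startswith("_")


def is_keyword(tok):
    text, quoted = tok
    return not quoted and text.lower().startswith(KEYWORDS)


def declared_dictionaries(cif_text):
    """Return one dict per declared dictionary; [] if none is declared."""
    toks = list(tokenize(cif_text))
    found, single, i = [], {}, 0
    while i < len(toks):
        text, quoted = toks[i]
        if not quoted and text.lower() == "loop_":
            i += 1
            tags = []
            while i < len(toks) and is_name(toks[i]):
                tags.append(toks[i][0].lower())
                i += 1
            values = []
            while i < len(toks) and not is_name(toks[i]) and not is_keyword(toks[i]):
                values.append(toks[i][0])
                i += 1
            if tags and any(t in CONFORM_TAGS for t in tags):
                # Split the flat value list into rows of len(tags) values.
                for start in range(0, len(values) - len(tags) + 1, len(tags)):
                    row = dict(zip(tags, values[start:start + len(tags)]))
                    found.append({t: row[t] for t in CONFORM_TAGS if t in row})
        elif is_name(toks[i]) and i + 1 < len(toks):
            # Non-looped "name value" pair; ad hoc names are simply skipped.
            if text.lower() in CONFORM_TAGS:
                single[text.lower()] = toks[i + 1][0]
            i += 2
        else:
            i += 1
    if single:
        found.insert(0, single)
    return found


# Hypothetical input: dictionary names, versions and URLs are invented.
SAMPLE = """data_test
loop_
_audit_conform_dict_name
_audit_conform_dict_version
_audit_conform_dict_location   # URL of the dictionary
cif_core.dic 2.3 ftp://example.org/cif_core.dic
cif_sym.dic  1.0 ftp://example.org/cif_sym.dic
_my_adhoc_item 'made up on the fly'
"""
```

Calling declared_dictionaries(SAMPLE) returns two entries, one per loop row, and the ad hoc item _my_adhoc_item passes through without error.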
This would tidy things up, but the parser must be able to handle ad hoc data names without choking.

This was a serious omission in the first version of CIF (you have to remember that this was produced before we even considered writing dictionaries in STAR format). As you point out, we have introduced the list reference _symmetry_equiv_posi_site_id (which incidentally has now been superseded by _space_group_symop_id, taken from the symmetry_cif dictionary - a dictionary which takes a more systematic and forward-looking approach to symmetry). Again, Acta Cryst. should insist on the inclusion of these ids.

> I had a hazy recollection that "this is a string" and this_is_a_string were equally valid CIF constructs containing identical information content, used for example in space group names. Would they be formally identical in an infoset? Does the white space in all strings have to be normalised (is that the right word?)?

We had a discussion of this point while preparing the symmetry_CIF dictionary and came to the decision that these two strings were not equivalent, i.e., underscore is not white space. For that reason P_21/c is no longer regarded as a valid space group symbol, although there is a warning that some heritage CIFs may use that convention. There is an enumeration list for _space_group_name_H-M_ref which explicitly allows only 'P 21/c'. Other space group symbols are similarly defined.

> Would 1.2(2) and 1.3(2) be equivalent in an infoset? Lexically they are different, but semantically they are the same value, within error.

They are not semantically the same, though they are not (scientifically) significantly different. The distinction is important.

> The difficulty is not preserving the data type, but the semantics of downstream decisions. If one author writes _my_phone "123-45678" they are announcing this is not a number, while if another writes _my_phone 123-45678 they are announcing it is a number.

It is much worse than this.
There is a definition of what constitutes a number in DDL1, but it is given only as text, and that only by way of examples (which incidentally do not include 123-45678). The examples may not be intended as an exhaustive list, but no other guidance is given. DDL2 is both better and worse: although numbers are defined in terms of regular expressions, each dictionary defines its own set of data types, and there appears to be no limit on how many data types are defined.

It sounds to me as if all values should be treated as data strings unless a dictionary is used and the appropriate data types are defined in the infoset. Then some means is needed to preserve these types (if possible) in any realization of the infoset, e.g., by writing them in XML or a different version of CIF. In any case, DDL1 certainly needs to tighten up its definition of a number if typing is going to be important.

Good luck!

David

--
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy
McMaster University, Hamilton Ontario, Canada
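The suggestion above, to treat every value as a data string unless a dictionary supplies a type, might be sketched in Python as follows. This is only an illustration under stated assumptions. The regular expression is a guess at a reasonable number syntax (no exponents) and is not the DDL1 definition. The item_types mapping stands in for type information that would come from a dictionary, using the DDL1 type code "numb". The function names are invented for this example.

```python
import re
from decimal import Decimal

# Assumed number syntax: optional sign, digits with optional decimal point,
# optional standard uncertainty in parentheses. Exponents are left out.
NUMBER = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?")


def parse_number(raw):
    """Return (value, su) as Decimals, or None if raw is not a number."""
    m = NUMBER.fullmatch(raw)
    if m is None:
        return None
    mantissa, su_digits = m.groups()
    value = Decimal(mantissa)
    if su_digits is None:
        return value, None
    # The su digits apply to the last quoted digits of the value,
    # so 1.2(2) means 1.2 with a standard uncertainty of 0.2.
    decimals = len(mantissa.partition(".")[2])
    return value, Decimal(int(su_digits)).scaleb(-decimals)


def typed_value(name, raw, item_types):
    """Keep raw as a string unless item_types says the item is numeric."""
    if item_types.get(name.lower()) == "numb":
        parsed = parse_number(raw)
        if parsed is not None:
            return parsed
    return raw
```

With no dictionary loaded (an empty item_types), typed_value("_my_phone", "123-45678", {}) simply returns the string, whether or not the author quoted it. With a type supplied, 1.2(2) and 1.3(2) come back as distinct values that each carry their own standard uncertainty, which keeps the lexical difference and leaves any "same within error" judgement to downstream code.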

