Re: [ddlm-group] CIF2 semantics
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF2 semantics
- From: Brian McMahon <bm@iucr.org>
- Date: Tue, 26 Jul 2011 15:25:04 +0100
- In-Reply-To: <CAM+dB2f7ONotZTG-wHZBzj82ao7PzPpYv4JxZ5q8uB1BmTDO3w@mail.gmail.com>
- References: <CAM+dB2eL5jrEFBcmGpDe6RTvpv4qfmxXa722XXzaS_zgCjsxKw@mail.gmail.com><a06240802ca545e68933f@192.168.2.101><CAM+dB2f7ONotZTG-wHZBzj82ao7PzPpYv4JxZ5q8uB1BmTDO3w@mail.gmail.com>
> Do we all agree with the following assertion regarding full point
> and question mark?
> (1) A full point/question mark inside string delimiters is *not*
> equivalent to an undelimited full point/question mark

I agree with this assertion. The ITG description "The special values
of '.' and '?' represent data that are inapplicable and unknown,
respectively." is at pains to stress that there is a semantic
distinction, albeit a subtle one, between the two cases. As James
notes, there is often a "relatively harmless" confusion between the
two, though it can be significant. CIFs generated from the CCDC
typically have ? for the symmetry operators in geometry loops. I think
this is correct: the database has not recorded the symmetry of these
positions (probably because they were not supplied in the original
publication from which the information was abstracted), and while
you'll probably get away with guessing that's because they are 1_555,
it's not guaranteed.

However, I do have some concern over the way that the unquoted literal
full point is also advanced as an enumeration value with a specific
(implied) value in a number of coreCIF definitions (for example, the
symmetry operation associated with an atom site). Volume G explains it
thus:

    The substitution of the full-point character `.' in place of a CIF
    data value serves two similar, but not identical, purposes. If it
    is used in looped lists of data it is normally a signal that a
    value in a particular packet (i.e. a value in the row of the
    table) is `inapplicable' or `inappropriate'. In some CIF
    applications involving access to a data dictionary it is used to
    signal that the default value of the item is defined in its
    definition in the dictionary. Consequently, the interpretation of
    this signal is an application-specific matter and its use must be
    determined according to the application. For example, in a CIF
    submitted for publication in Acta Crystallographica the presence
    of a `.' value for the item _geom_bond_site_symmetry_1 is
    predetermined as the default value 1_555 (as per the dictionary
    definition). Note that, in this instance, it is also equivalent to
    `no additional symmetry' or `inapplicable'.

That phrase "no ADDITIONAL symmetry" feels somewhat forced. Given the
number of existing CIF1 data files, I propose that we live with this,
but I would be interested if anyone could come up with a cleaner or
clearer rationalisation.

Note how the extracts from Volume G unhelpfully place the characters
under discussion within quote marks, though the eagle-eyed will notice
that in the printed volume the quote marks are in Times font!

----

>>> Numbers: I believe that strings that could be interpreted as numbers
>>> are nevertheless (in a formal sense) just strings in the context of
>>> the post-parse abstract data model. Therefore, whether or not a
>>> numerical string is delimited does not change its value: 4.5 and
>>> "4.5" are identical values.

I guess the distinction is that in isolation you don't know whether
4.5 means the quantity halfway between four and five, or the software
revision preceding 4.5.1 (or even 4.5.2beta). The assumption behind
seeking to differentiate these cases with syntactic quoting is that
you're not relying on type declarations in a dictionary to tell an
application how to treat this - as an unalterable string or as a
quantity that can be subjected to arithmetic manipulations.

I do wonder how maintaining the distinction actually does help
non-dictionary-based software. I can see that fixed-format FORTRAN i/o
benefits from knowing that columns 27-32 represent a floating-point
number, but I suppose that even FORTRAN CIF parsers must account for
the free-format nature of the CIF by isolating the string value and
subsequently determining how to convert it to a number.
If the decisions on how to do so are based only on hand-coding
according to known tags, then I see no reason why one cannot add a
"delimiter-stripper" function to the necessary routines.

I'm genuinely curious here. I don't have a strong a priori prejudice
against or in favour of maintaining the formal distinction between
non-quoted and quoted numbers.

Regards
Brian

On Tue, Jul 26, 2011 at 11:24:15PM +1000, James Hester wrote:
> I take it from the comment below that Herbert agrees to continue with the IT
> Vol G descriptions of the meanings of . and ?. I am aware that one often
> finds a relatively harmless confusion between the two, most obviously when ?
> is used as a placeholder in a loop instead of the usually more appropriate
> <full point>. This confusion should encourage us to provide clarification
> in the formal specification.
>
> Regarding numbers, could Herbert or others who wish 4.5 and "4.5" to have
> different abstract types, whereas kkkkk and "kkkkk" have the same abstract
> type, please explain why this behaviour is preferable, how it allows useful
> work to be done etc. Meanwhile I'll prepare a post describing my reasoning
> for more uniform behaviour.
>
> On Tue, Jul 26, 2011 at 10:13 PM, Herbert J. Bernstein <
> yaya@bernstein-plus-sons.com> wrote:
>
>> On null values, I believe "." and "?" are different in meaning from
>> their unquoted versions, but that unquoted . and ? are both essentially
>> equivalent null values.
>>
>> On numbers, past practice has been to treat 4.5 and "4.5" as very
>> different, the former being a type numb value and the latter being
>> a type char value. This was an important and significant early
>> difference between CIF and STAR and has been used in the handling of
>> the number-like strings that arise in PDB bib entries, e.g.
>> 1234-5678 is the number 1234e-5678, but "1234-5678" is a string
>>
>>
>> At 1:24 PM +1000 7/26/11, James Hester wrote:
>>> Dear DDLm group,
>>>
>>> In order to minimise the number of issues we have to discuss in
>>> Madrid to clean up CIF2, I would like to turn discussion to those
>>> semantic issues which are relevant to the syntax. I believe that
>>> there are three possible types of datavalue: "inapplicable",
>>> "unknown" and "string", represented by <full point> (commonly called
>>> a "full stop" or "period"), <question mark> and everything else,
>>> respectively.
>>>
>>> Do we all agree with the following assertion regarding full point
>>> and question mark?
>>> (1) A full point/question mark inside string delimiters is *not*
>>> equivalent to an undelimited full point/question mark
>>>
>>> Numbers: I believe that strings that could be interpreted as numbers
>>> are nevertheless (in a formal sense) just strings in the context of
>>> the post-parse abstract data model. Therefore, whether or not a
>>> numerical string is delimited does not change its value: 4.5 and
>>> "4.5" are identical values.
>>>
>>> Note that this latter assertion does *not* require that
>>> CIF-conformant software must always handle numbers as strings; I am
>>> making these statements in order to clarify the abstract data model
>>> on which the various DDLs and domain dictionaries operate, not to
>>> dictate software design. If your software can manage any potential
>>> need to swap between string and number representation of your data
>>> value, then more power to you.
>>>
>>> Please state whether you agree or disagree with the above.
>>>
>>> James.
>>> --
>>> T +61 (02) 9717 9907
>>> F +61 (02) 9717 3145
>>> M +61 (04) 0249 4148
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>> --
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
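
For illustration only, the three-kind abstract data model that James
proposes in this thread (inapplicable, unknown, string) might be
sketched in Python roughly as follows. This is not taken from any CIF
library: the names Kind, Value and parse_token are hypothetical, the
quoting rule is deliberately simplified to single-line '...' and "..."
delimiters, and numbers follow James's proposal (4.5 and "4.5" compare
equal) rather than the numb/char distinction Herbert describes.

```python
from dataclasses import dataclass
from enum import Enum
import re


class Kind(Enum):
    INAPPLICABLE = "inapplicable"  # undelimited full point
    UNKNOWN = "unknown"            # undelimited question mark
    STRING = "string"              # everything else


@dataclass(frozen=True)
class Value:
    kind: Kind
    text: str = ""


# Assumption: only simple single-line quoting is recognised here;
# real CIF syntax has further delimiter forms not handled below.
_QUOTED = re.compile(r"""^(['"])(.*)\1$""")


def parse_token(token: str) -> Value:
    """Map one raw token to a value in the three-kind model."""
    match = _QUOTED.match(token)
    if match:
        # Delimited: always a string, even if the content is . or ?
        return Value(Kind.STRING, match.group(2))
    if token == ".":
        return Value(Kind.INAPPLICABLE)
    if token == "?":
        return Value(Kind.UNKNOWN)
    # Undelimited and not a special value: a plain string, so 4.5
    # and "4.5" end up as identical values under this model.
    return Value(Kind.STRING, token)
```

Under these assumptions parse_token('.') and parse_token('"."') give
different values, while parse_token('4.5') and parse_token('"4.5"')
compare equal. Software following the numb/char practice would instead
need to record whether the token was delimited.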