Re: [ddlm-group] CIF2 semantics
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF2 semantics
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Tue, 26 Jul 2011 14:41:28 -0400
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA543C16565B24@11.stjude.org>
- References: <CAM+dB2eL5jrEFBcmGpDe6RTvpv4qfmxXa722XXzaS_zgCjsxKw@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA543C16565B24@11.stjude.org>
To understand the problem with conflating strings and numbers, look at
the following tags and values:

_citation.journal_id_ISSN 0036-8075
_citation.journal_id_CSD 0038

If you have a dictionary, you know both items are strings, not numbers,
and you will reliably keep the leading zeros and not treat the first
as 36*10^(-8075). If you don't have a dictionary and are just using,
say, CIFtbx, you might treat both values as numbers. Under current
rules you can protect the values from the numeric interpretation even
without a dictionary by saying

_citation.journal_id_ISSN "0036-8075"
_citation.journal_id_CSD "0038"

and all is well. Without that mechanism, you need a dictionary.

At 10:23 AM -0500 7/26/11, Bollinger, John C wrote:
>On Monday, July 25, 2011 10:25 PM, James Hester wrote:
>>In order to minimise the number of issues we have to discuss in
>>Madrid to clean up CIF2, I would like to turn discussion to those
>>semantic issues which are relevant to the syntax. I believe that
>>there are three possible types of datavalue: "inapplicable",
>>"unknown" and "string", represented by <full point> (commonly
>>called a "full stop" or "period"), <question mark> and everything
>>else, respectively.
>>
>>Do we all agree with the following assertion regarding full point
>>and question mark?
>>(1) A full point/question mark inside string delimiters is *not*
>>equivalent to an undelimited full point/question mark
>>
>>Numbers: I believe that strings that could be interpreted as
>>numbers are nevertheless (in a formal sense) just strings in the
>>context of the post-parse abstract data model. Therefore, whether
>>or not a numerical string is delimited does not change its value:
>>4.5 and "4.5" are identical values.
>>
>>Note that this latter assertion does *not* require that
>>CIF-conformant software must always handle numbers as strings; I am
>>making these statements in order to clarify the abstract data model
>>on which the various DDLs and domain dictionaries operate, not to
>>dictate software design. If your software can manage any potential
>>need to swap between string and number representation of your data
>>value, then more power to you.
>>
>>Please state whether you agree or disagree with the above.
>
>
>I agree that a CIF data value comprising only a full point or
>question mark character is a place-holder value where it is
>whitespace-delimited, but is an ordinary string value otherwise. No
>other data values are place-holders in the CIF sense. CIF 1.1
>distinguishes between the meanings of these place-holders, and that
>distinction may occasionally be useful.
>
>
>>From before the advent of CIF dictionaries, CIF 1 specified that
>>data values of certain forms were of numeric type, and values of
>>all other forms were of string type. Although CIF 1.1 describes
>>this among the common semantic features rather than the syntax
>>specifications, I am uncertain whether that should be interpreted
>>as an intentional technical decision. Certainly many computer
>>languages treat data typing for literal values as a syntactic
>>issue, but others are very successful with a more freewheeling
>>approach.
>
>I agree with James and Brian that it comes down to the practical
>advantages of making a distinction, and from that perspective I
>assert
>
>
>1) The distinction is useful only where the appropriate data type
>would otherwise be unknown, AND the data type is needed for decision
>making.
>
>Knowledge of the appropriate data type could be dynamically derived
>from a dictionary, but I suspect that most CIF software simply
>encodes its data type requirements algorithmically (e.g. programs
>know that _cell_length_a must be numeric).
>Since Herbert raises PDB
>software in particular, I am curious about whether there is any
>practical ambiguity there: what are some of the CIF data items whose
>data type that software needs but cannot determine other than from
>their lexical form? What is a specific consequence that could arise
>from the software choosing the wrong data type for those items?
>
>One of the areas that would be affected is general-purpose CIF
>tools, such as pretty printers, that rely only on the content of the
>CIFs presented to them. Such programs may safely reformat numbers
>(e.g. switch among pure decimal form and various recognized forms of
>scientific notation, convert s.u.s from rule-of-29 to rule-of-19)
>only if they can reliably recognize them as numbers.
>
>
>2) The distinction may be practical where it isn't otherwise useful,
>especially in the sense that it may be built in to a lot of existing
>software.
>
>I know it's built into most CIF software I've ever written. I'm not
>sure offhand how significant the impact would be of lifting the
>distinction.
>
>
>Overall, I am apprehensive about lifting the formal distinction for
>CIF 1.x, but I am open to considering it for CIF 2.0. I am not yet
>persuaded that it would be advantageous, but neither am I persuaded
>that it would be harmful.
>
>
>Regards,
>
>John
>--
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>
>
>Email Disclaimer: www.stjude.org/emaildisclaimer
>
>_______________________________________________
>ddlm-group mailing list
>ddlm-group@iucr.org
>http://scripts.iucr.org/mailman/listinfo/ddlm-group

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
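
Illustration of the quoting point made at the top of this message: a
minimal Python sketch of how a reader with no dictionary might type a
data value from its lexical form alone. Everything here is an
assumption for illustration. The function name interpret, the
LENIENT_NUMBER pattern, and the choice to accept an exponent without
an "E" (Fortran-style) are hypothetical; this is not CIFtbx's code and
not the CIF 1.1 number grammar.

```python
import re
from decimal import Decimal

# Hypothetical lenient numeric pattern: a mantissa followed by an optional
# exponent. The "E" may be omitted when the exponent carries a sign, which
# is what lets 0036-8075 be read as 36*10^(-8075). Assumption for
# illustration only; standard uncertainties such as 1.23(4) are not handled.
LENIENT_NUMBER = re.compile(
    r"^(?P<mantissa>[+-]?(?:\d+\.?\d*|\.\d+))"
    r"(?:[eE](?P<exp_e>[+-]?\d+)|(?P<exp_bare>[+-]\d+))?$"
)


def interpret(text, quoted):
    """Return (kind, value) for one data value, without any dictionary."""
    if quoted:
        # Delimiters protect the value: it is always an ordinary string.
        return ("string", text)
    if text == ".":
        return ("inapplicable", None)
    if text == "?":
        return ("unknown", None)
    match = LENIENT_NUMBER.match(text)
    if match:
        exponent = match.group("exp_e") or match.group("exp_bare") or "0"
        return ("number", Decimal(f"{match.group('mantissa')}E{exponent}"))
    return ("string", text)


# Unquoted: both identifiers are taken as numbers and the leading zeros go.
print(interpret("0036-8075", quoted=False))
print(interpret("0038", quoted=False))
# Quoted: both survive unchanged as strings.
print(interpret("0036-8075", quoted=True))
print(interpret("0038", quoted=True))
```

Passing the delimiter information (quoted) through to the typing step
is the whole mechanism: once it is discarded, only a dictionary can
say that the ISSN and CSD codes are strings.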