Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 15 Oct 2009 15:33:14 -0400 (EDT)
- In-Reply-To: <4AD76FCD.2090304@mcmaster.ca>
- References: <C6FD6D53.120FF%nick@csse.uwa.edu.au><4AD76FCD.2090304@mcmaster.ca>
Dear Colleagues,

First let me say that I very much agree with Nick's latest version. I think it is a reasonable, pragmatic compromise that introduces the changes that are needed while providing a workable conversion path for existing CIF 1 data files.

Now to David's comments:

> For clarification are two ' (i.e. '') the same as one "? Some of the
> illustrations seem to indicate that this is so, but this may be the result of
> the fonts used which do not distinguish between '' and ". (This is why I am
> using Courier.) If they are not the same then '' is not a delimiter and
> ''red'' would be interpreted as three items '' red '' whereas "red" would be
> interpreted as the single item red without any quotes. Are both ''' and """
> legal delimiters?

Both Nick and I believe that two sequential apostrophes are different from the double-quote character, and that both are different from the Microsoft smart quotes. That is why the ASCII characters were cited.

> As Simon points out, the tags are all undelimited strings and therefore are
> restricted in their allowed character set in CIF2.0. I don't think that any
> existing tags violate Nick's latest set of excluded characters. Characters
> such as % and / do appear in DDL1, though they will probably be removed in
> DDLm. If any illegal characters do appear they present no problem as long as
> a CIF2 application recognizes that it is reading a CIF1. Where the name
> appears as an alias in a DDLm dictionary it can be made legal by being
> quoted. Providing a CIF1 application can be taught to recognize _ and . as
> interchangeable it should have no problem in reading CIF2 names, but it may
> not recognize the names which will be different or absent in the DDL1
> dictionary. This could result in a loss of information, which may or may not
> be important. It would clearly have serious problems with arrays and other
> cases where the new delimiters were used.

Yes, the list of excluded characters for tags will be the same as the list of excluded characters for non-delimited strings. I am sure that somebody, somewhere has a CIF with a tag that may need to be changed because it uses one of the small list of excluded characters, but I think this will be a sufficiently rare problem to make hand correction an acceptable cure. I will add checks to my software to flag such cases and provide a warning. Certainly, for conversion utilities, we will need to have access from the applications to all the relevant dictionaries, except for the simplest, cleanest cases.

> I assume that CIF2.0 applies to both the dictionaries and the CIFs
> themselves. Are there conditions (like global_) that only apply to
> dictionaries? CIFs prepared using the CIF2.0 standards are likely in the
> first instance to code matrices and vectors as separate elements. Existing
> methods can combine these into arrays. Eventually I foresee that such values
> will be coded directly as arrays as this is more efficient. Methods will
> then be needed to decompose these arrays into their elements in case
> individual elements need to be retrieved. I see no problem except that a
> CIF2.0 coded in this way clearly could not be read by a CIF1 application.

Certainly it would be nice to move over to bracketed formats for matrices and vectors in data files. Once you do that, it would make sense to just allow the full recursive use of bracketed values in data files. In order to go back to CIF-1, we will need to, at the very least, embed the resulting complex data item into a text field. In many cases, with the dictionaries available, it should be possible to redistribute the complex data value items among the appropriate CIF-1 data items, but I am not sure this will be needed.

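To make the two routes back to CIF-1 concrete, here is a minimal sketch, not a proposal for the actual syntax. It assumes a square-bracketed, whitespace-separated list notation (the separator is not settled in this message), and the tag `_example_matrix` and the `_{i}{j}` element naming are hypothetical. In practice the element names would have to come from the relevant dictionary.

```python
def to_bracketed(value):
    """Render a nested Python list as a bracketed string (assumed syntax)."""
    if isinstance(value, list):
        return "[" + " ".join(to_bracketed(v) for v in value) + "]"
    return str(value)


def as_cif1_text_field(tag, value):
    """Route 1: embed the whole bracketed value in a CIF-1 text field.

    A CIF-1 text field opens and closes with ';' as the first character
    of a line. The bracketed value here is a single line that starts
    with '[', so it cannot be mistaken for the closing ';'.
    """
    return f"{tag}\n;\n{to_bracketed(value)}\n;\n"


def redistribute(tag_template, matrix):
    """Route 2: spread matrix elements over separate CIF-1 data items.

    tag_template is a hypothetical naming pattern such as
    '_example_matrix_{i}{j}' with 1-based row and column indices.
    """
    items = []
    for i, row in enumerate(matrix, start=1):
        for j, element in enumerate(row, start=1):
            items.append((tag_template.format(i=i, j=j), str(element)))
    return items


matrix = [[1, 0], [0, 1]]
print(as_cif1_text_field("_example_matrix", matrix))
for tag, element in redistribute("_example_matrix_{i}{j}", matrix):
    print(tag, element)
```

The first route needs no dictionary at all, which is why it is the fallback; the second only works when the dictionaries tell the converter which CIF-1 items the elements belong to.
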
> The use of an expression such as #CIF2,0 as a magic number as the first
> string in a CIF could cause problems since the CIF standard states that
> anything after # is not part of the CIF and can be stripped out without
> destroying the integrity of the CIF, i.e., anything following # has no
> bearing on either the syntax or the semantics of the CIF. Have I missed
> something here? Software designed, e.g., to strip out the comments in a
> template could easily strip out the magic number. No problem if this is a
> CIF1 file, but it would create an illegal file if it did this to a CIF2 file.
> Some legacy software might not be sophisticated enough to recognize the
> problem. In general I would strongly advocate using a different initial
> character for this string.

At some point, software that expects to process a CIF2 will have to get the information that it needs to follow CIF2 rules from somewhere. If all it expects to deal with are CIF2 documents, then it is fine. But if it expects to handle both CIF1 and CIF2, then it should pay attention to the first couple of comments. However, to help, I would suggest that for CIF2 we adopt the convention of using .cif2 or .cf2, rather than .cif, as the file extension for a data file, and .dic3 or .cd3, rather than .dic, as the file extension for a DDLm dictionary.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 15 Oct 2009, David Brown wrote:

>
> I have just returned from a cruise along the coast of Labrador which started
> before the current discussions began. I have spent the last couple of days
> reading through all 74 contributions that have subsequently arrived in my
> computer.
> Most of the discussion is a little outside my concern with
> dictionaries, but I have been trying to see if there are any obvious problems
> with what is proposed. Perhaps someone can point out if I have
> misinterpreted anything.
>
> I am adopting James' suggestion that the new standard is sufficiently
> different to require a major version change to CIF2.0 rather than CIF1.2.
>
> For clarification are two ' (i.e. '') the same as one "? Some of the
> illustrations seem to indicate that this is so, but this may be the result of
> the fonts used which do not distinguish between '' and ". (This is why I am
> using Courier.) If they are not the same then '' is not a delimiter and
> ''red'' would be interpreted as three items '' red '' whereas "red" would be
> interpreted as the single item red without any quotes. Are both ''' and """
> legal delimiters?
>
> I have been considering the problems of CIF 1.0 and 1.1 files being read by
> CIF2.0 applications and vice versa.
>
> As Simon points out, the tags are all undelimited strings and therefore are
> restricted in their allowed character set in CIF2.0. I don't think that any
> existing tags violate Nick's latest set of excluded characters. Characters
> such as % and / do appear in DDL1, though they will probably be removed in
> DDLm. If any illegal characters do appear they present no problem as long as
> a CIF2 application recognizes that it is reading a CIF1. Where the name
> appears as an alias in a DDLm dictionary it can be made legal by being
> quoted. Providing a CIF1 application can be taught to recognize _ and . as
> interchangeable it should have no problem in reading CIF2 names, but it may
> not recognize the names which will be different or absent in the DDL1
> dictionary. This could result in a loss of information, which may or may not
> be important. It would clearly have serious problems with arrays and other
> cases where the new delimiters were used.
>
> I assume that CIF2.0 applies to both the dictionaries and the CIFs
> themselves. Are there conditions (like global_) that only apply to
> dictionaries? CIFs prepared using the CIF2.0 standards are likely in the
> first instance to code matrices and vectors as separate elements. Existing
> methods can combine these into arrays. Eventually I foresee that such values
> will be coded directly as arrays as this is more efficient. Methods will
> then be needed to decompose these arrays into their elements in case
> individual elements need to be retrieved. I see no problem except that a
> CIF2.0 coded in this way clearly could not be read by a CIF1 application.
>
> The use of an expression such as #CIF2,0 as a magic number as the first
> string in a CIF could cause problems since the CIF standard states that
> anything after # is not part of the CIF and can be stripped out without
> destroying the integrity of the CIF, i.e., anything following # has no
> bearing on either the syntax or the semantics of the CIF. Have I missed
> something here? Software designed, e.g., to strip out the comments in a
> template could easily strip out the magic number. No problem if this is a
> CIF1 file, but it would create an illegal file if it did this to a CIF2 file.
> Some legacy software might not be sophisticated enough to recognize the
> problem. In general I would strongly advocate using a different initial
> character for this string.
>
> David
>
>
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
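
As an illustration of the version check discussed in this message (look at the leading comment first, then fall back on the suggested file extension), here is a minimal sketch. The marker spelling `#\#CIF_2` is an assumption modelled on the `#\#CIF_1.1` comment of CIF 1.1 and is not fixed anywhere in this thread. The function name `guess_cif_version` is hypothetical, and `.cif2`/`.cf2` are only the extensions proposed above.

```python
import os

# Extensions proposed in this message for CIF2 data files (a suggestion,
# not an adopted convention).
CIF2_EXTENSIONS = {".cif2", ".cf2"}


def guess_cif_version(path, first_line):
    """Return 2 or 1 as a best guess at the CIF major version.

    The leading comment wins when present. If it has been stripped out,
    for example by software that removes comments, the file extension is
    the only remaining hint. Anything else is treated as CIF1.
    """
    marker = first_line.strip()
    if marker.startswith("#\\#CIF_2"):  # assumed spelling of the CIF2 marker
        return 2
    if marker.startswith("#\\#CIF_1"):
        return 1
    extension = os.path.splitext(path)[1].lower()
    if extension in CIF2_EXTENSIONS:
        return 2
    return 1


print(guess_cif_version("structure.cif", "#\\#CIF_2.0\n"))
print(guess_cif_version("structure.cf2", "data_example\n"))
print(guess_cif_version("structure.cif", "data_example\n"))
```

The extension fallback addresses the concern that comment-stripping software might silently remove the marker: a file named with a CIF2 extension is still routed to the CIF2 rules.
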
- References:
- Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (Nick Spadaccini)
- Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (David Brown)