I am having difficulty keeping up with the discussion - just as
everything seems to be falling into place, we are suddenly off in some
As with Joe, I am not close to the elide problem, which is why I have not weighed in earlier or voted. If James' call for a vote is still active, I vote for no elides, joining James, Nick and Herbert.
James Hester wrote:
Thanks for the quick reply over Thanksgiving, John. I take from your message that the PDB does not need any elide mechanism to be defined in the CIF2 syntax. Would you therefore be prepared to vote in favour of not defining any elides, or would you prefer to abstain?

Votes so far:
No elides: James, Nick, Herbert (if the IUCr + PDB say it is OK)
Elides: ?
Unknown: John, Joe, David B., Brian, Simon

On Wed, Nov 25, 2009 at 12:03 PM, John Westbrook <firstname.lastname@example.org> wrote:

I confess that I am having difficulty keeping up with all aspects of this discussion. Following Herb's suggestion I will try to summarize the quoting issues from the PDB perspective.

1. As there are multiple ways of quoting a string, our tools and files surround embedded quotes with quotes of the opposite sense, or with semicolons in the mixed case. I think that this point has been covered a number of times now, and I believe that Nick has suggested that all reasonable cases can be handled by using this approach.

2. I too was not aware that the original definition of terminators had changed and did not include either a leading or trailing whitespace. Certainly this must still be the case for single and double quotes. I cannot recall ever seeing an example where the terminator \n; was followed by a whitespace character, but about half of the codes that I am familiar with would fall over on \n;next_token.

3. Line folding has never been an issue for PDB, nor has line length.

Regards,
John

Herbert J. Bernstein wrote:

My major concern about anything we do is to be able to preserve the functionality of the practices followed by the IUCr in journal publications and by the PDB. Inasmuch as they seem able to cope with no elide in CIF 1.1, the remaining question is whether they will be negatively impacted by the change in string termination without any elide. If they can use CIF 2 with these changes, my objections are purely academic and irrelevant.
-- Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
email@example.com
=====================================================

On Wed, 25 Nov 2009, James Hester wrote:

Herbert: I have the dubious advantage of not having participated in all those CIF1.0/1.1 discussions, so I only have the spec as written down to rely on. Anyway, how do you feel about abandoning any specification of elides in CIF2 syntax, as suggested by Nick?

On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein <firstname.lastname@example.org> wrote:

Dear James,

I started to write: "No, in CIF 1.1, none of the terminal quote marks, including the \n;, are effective unless followed by whitespace (\n, space, tab, or end of file). This is a well-established, and very tricky, part of the CIF spec going back to 1990. That is why Nick had to explicitly specify that a terminal quote mark would be effective no matter what it was followed by."

But the grammar currently on the IUCr web site is _not_ the one that I recall COMCIFs discussing and approving. It now explicitly removes the requirement for terminal whitespace in the special case of the \n; text field terminator. I don't recall when that change was adopted, but it appears that you are right under the current spec about the example I chose.

Inasmuch as there is a lot of working code that enforces and uses the original whitespace handling and uses it in line-folding, I will not revise CIFtbx 3, but I will try to do something to adapt to this change for CIFtbx 4. I guess we are just going to have yet another few dialects of CIF.

Regards,
Herbert

On Wed, 25 Nov 2009, James Hester wrote:

To be precise, we are not 'referring all elides to the application', because no elides are recognised by the lexer under Nick's latest suggestion, so there are no elides to refer to the application. My understanding of CIF1.1 syntax suggests that the string you provide would produce a syntax error in CIF1.1, as the semicolon at the start of the second line would terminate the string, and so whitespace should then appear as the second character on the second line, rather than reverse solidus.

On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein <firstname.lastname@example.org> wrote:

The only problem with referring all elides to the application is that, with the removal of the requirement of a blank after a \n; for it to be effective, the line folding protocol develops a slight gap. The case is as follows:

;\
;\
;

This is a valid single text field in CIF 1.1, which when handled with the line folding protocol translates to the equivalent of ';', because the embedded ;\ is not a valid text terminator. If we require that a text field that begins with "\n;\\" must be terminated by "\n; " or "\n;\n" or "\n;\t", that problem would be fixed.

On Wed, 25 Nov 2009, James Hester wrote:

I wholeheartedly agree with Nick's suggestion.

On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini <firstname.lastname@example.org> wrote:

It appears to me that we have spent far too long on a syntactic issue which can be avoided 99.9999% of the time.
Quite simply, given the 5 ways to delimit strings, it is next to impossible to get a situation where you cannot choose one of those to make the problem go away. I think the RCSB systematically avoid it by choosing

"ab'cd"
'ab"cd'
;ab'"cd
;

But now we additionally have """ and ''' to choose from, making it even easier.

So I propose, in line with James' position, that there is NO eliding of terminator characters at the CIF2 syntax level. ALL elides in the string are assumed to be user-specific encoding (say TeX, IUCr \greek) which can be resolved at the dictionary level. This necessarily means NO terminator character can appear in a string delimited by the same terminator character; you will need to choose a different terminator character. That is:

No " in "strings"
No ' in 'strings'
No """ in """strings""" (but separable individual and doublet " are allowed)
No ''' in '''strings''' (but separable individual and doublet ' are allowed)

EVERYTHING in the string is returned as raw (except the initiating and terminating characters).

The only time you will not be able to encode anything in a delimited string is when you want to include ' " """ ''' and \n; in the one string. The likelihood of that is almost zero, unless you want to include a CIF within a CIF (a silly thing to do IMHO). In that case the contents can be encoded in a dictionary-driven way. I suggest it be declared as a BASE64 type, and then all the syntactic ambiguity disappears.

Problem solved! There is no need to elide because of CIF2 syntax rules; all elides are user-driven and contents are returned raw.

As for Herb's comment in a recent email, "what about line-folding?", the same holds. That is NOT a lexer issue and it has nothing to do with the parser: everything is read literally and returned raw, and what to do with it is promulgated to the downstream application.
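As a rough illustration of the delimiter-choice approach in the proposal above (and the RCSB practice it cites), here is a minimal Python sketch. It assumes the no-elide rules as stated, plus two assumptions not spelled out in the thread: '...' and "..." values must fit on one line, and a triple-quoted value must not end with its own quote character (otherwise it would merge with the closing delimiter). The names choose_delimiter, read_delimited and DELIMITERS are hypothetical, and the base64 fallback merely stands in for the dictionary-level BASE64 type suggested above; it is not a defined CIF2 mechanism.

```python
import base64
from typing import Tuple

# Openers paired with closers, longest quote first so ''' is not read as '.
DELIMITERS = (("'''", "'''"), ('"""', '"""'), ("'", "'"), ('"', '"'), (";", "\n;"))


def choose_delimiter(value: str) -> Tuple[str, bool]:
    """Pick a delimiter for value under the proposed no-elide rules.

    Returns (token, is_base64). Assumptions (not from the proposal text):
    '...' and "..." values must be single-line, and a triple-quoted value
    must not end with its own quote character. The ';' text-field token
    must be written starting at the beginning of a line.
    """
    one_line = "\n" not in value
    if one_line and "'" not in value:
        return f"'{value}'", False
    if one_line and '"' not in value:
        return f'"{value}"', False
    if "'''" not in value and not value.endswith("'"):
        return f"'''{value}'''", False
    if '"""' not in value and not value.endswith('"'):
        return f'"""{value}"""', False
    if "\n;" not in value:
        return f";{value}\n;", False
    # Every delimiter collides: fall back to base64, standing in for the
    # dictionary-level BASE64 type suggested in the thread (hypothetical).
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"'{encoded}'", True


def read_delimited(token: str) -> str:
    """Lex one delimited value with no elides: the content is returned raw,
    up to the first occurrence of the matching closing delimiter."""
    for opener, closer in DELIMITERS:
        if token.startswith(opener):
            end = token.find(closer, len(opener))
            if end < 0:
                raise ValueError("unterminated string")
            return token[len(opener):end]
    raise ValueError("not a delimited value")


for v in ["ab'cd", 'ab"cd', "ab'\"cd"]:
    print(choose_delimiter(v)[0])
```

The three sample values come out as "ab'cd", 'ab"cd' and '''ab'"cd''', the last showing how the triple-quoted forms cover the mixed case that previously needed a semicolon text field. Because read_delimited stops at the first closing delimiter and never interprets a backslash, every non-base64 token it accepts round-trips to the original value unchanged.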
Straw vote - No elides of terminator strings as described above

- Nick

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

_______________________________________________
ddlm-group mailing list
email@example.com
http://scripts.iucr.org/mailman/listinfo/ddlm-group
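The text-field termination question raised by Herbert, James and John (the ;\ example and \n;next_token) can be sketched in Python as below. This is a simplified illustration, not a conforming CIF lexer: read_text_field, unfold and the herbert_rule flag are hypothetical names, and unfold is a simplified reading of the line-folding protocol (a first line consisting of a lone backslash marks a folded field; a line ending in a backslash is joined to the next). With herbert_rule=True, a field opening with ;\ only ends at a \n; followed by whitespace or end of input, as Herbert proposes.

```python
from typing import Tuple

WHITESPACE = " \t\r\n"


def _find_terminator(src: str, start: int, require_ws: bool) -> int:
    """Return the index of the newline that begins the closing '\\n;'.

    With require_ws=True, a '\\n;' only counts if it is followed by
    whitespace or end of input (Herbert's proposed rule, hypothetical here).
    """
    i = start + 1
    while True:
        j = src.find("\n;", i)
        if j < 0:
            raise ValueError("unterminated text field")
        after = j + 2
        if not require_ws or after == len(src) or src[after] in WHITESPACE:
            return j
        i = j + 1  # embedded '\n;' that is not a terminator; keep scanning


def read_text_field(src: str, start: int, herbert_rule: bool = False) -> Tuple[str, int]:
    """Return (raw content, index just past the closing ';').

    src[start] must be the opening ';' at the beginning of a line.
    """
    if src[start] != ";":
        raise ValueError("text field must open with ';'")
    folded = src.startswith(";\\", start)
    end = _find_terminator(src, start, require_ws=herbert_rule and folded)
    nxt = end + 2
    # The closing ';' must be followed by whitespace (or EOF) to end the token.
    if nxt < len(src) and src[nxt] not in WHITESPACE:
        raise ValueError("no whitespace after text field terminator")
    return src[start + 1:end], nxt


def unfold(raw: str) -> str:
    """Simplified line unfolding: if the first line is a lone backslash,
    drop it and join every line ending in a backslash to the next line."""
    lines = raw.split("\n")
    if lines[0].rstrip(" \t") != "\\":
        return raw  # not a folded field; returned unchanged
    out, pending = [], None
    for line in lines[1:]:
        prefix = pending or ""
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            pending = prefix + stripped[:-1]  # continuation: drop the backslash
        else:
            out.append(prefix + line)
            pending = None
    if pending is not None:
        out.append(pending)
    return "\n".join(out)


# Herbert's example field, one line per element:  ;\   ;\   ;
field = ";\\\n;\\\n;\n"
raw, _ = read_text_field(field, 0, herbert_rule=True)
print(repr(unfold(raw)))  # ';'
```

Called without herbert_rule, read_text_field(field, 0) stops at the second line's semicolon and then raises ValueError because a backslash, not whitespace, follows it, which matches James's reading that the field is a syntax error under the current grammar. The same check rejects John's \n;next_token case under either rule.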
I. David Brown
Professor Emeritus
McMaster University, Brockhouse Institute for Materials Research
King St. W, Hamilton, Ontario, L8S 4M1, Canada
email: firstname.lastname@example.org
tel (work): +905 525 9140 x 24710
fax: +905 521 2773