[ddlm-group] [John.Bollinger@STJUDE.ORG: Re: Feedback on draft CIF2 specification from John Bollinger]
- To: ddlm-group@iucr.org
- Subject: [ddlm-group] [John.Bollinger@STJUDE.ORG: Re: Feedback on draft CIF2 specification from John Bollinger]
- From: Brian McMahon <bm@iucr.org>
- Date: Mon, 10 May 2010 09:28:29 +0100
Forwarded on behalf of John Bollinger.

John: Apologies for the delay - I think this came in just after I left the office for the weekend. To keep traffic flowing smoothly, I've added you to ddlm-group for now. Feel free to unsubscribe if and when you wish.

Regards
Brian

----- Forwarded message from "Bollinger, John C" <John.Bollinger@STJUDE.ORG> -----

From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: "'bm@iucr.org'" <bm@iucr.org>
Date: Fri, 7 May 2010 10:27:01 -0500
Subject: Re: [ddlm-group] Feedback on draft CIF2 specification from John Bollinger
Thread-Topic: Re: [ddlm-group] Feedback on draft CIF2 specification from John Bollinger

Dear Brian,

In view of the fact that the DDLm group appears to want to conduct discussion of CIF2 on their own mailing list, and pursuant to IUCr's instructions for non-members offering contributions to restricted lists, I am sending you these remarks as a proposed contribution to the ddlm-group list. (Note: I am entirely willing to be added to the DDLm list, but I don't think at the moment I'm prepared to contribute outside the scope of CIF syntax discussion.)

These are in response to some of Joe Krahn's comments:

>> Point 21: (requiring the labels in a CIF2 Table to be quote-delimited.)
>This requirement was useless in the previous revision, because semicolon
>was in the set of characters that must be quoted. That is no longer true
>in the last revision.

The difficulty with regard to semicolon could be resolved, I think, by allowing \n;-delimited strings as labels as well. I should be surprised to see real-world CIFs actually use such a feature, but this particular question revolves around parsing and avoiding language ambiguity. It would appeal to me from a consistency standpoint for *all* the delimited string representations to be usable as table index labels, though I don't think it's very important.

>I still prefer that index labels be treated as any
>other data element, and not make it a special case.
>To do this, the delimiting colon should have optional whitespace. The
>index label only needs to be quoted by normal quote/whitespace delimiter
>lexing rules. Unquoted index labels are then valid, as long as whitespace
>separates the colon.

If the index labels are required to be delimited strings, and no whitespace is allowed between the closing delimiter and the colon, then the language has some nice properties:

1) lexical analysis can be completely insensitive to context, while at the same time,
2) lexical analysis can reliably identify index labels.

The role of a lexer performing such an analysis is completely separated from the role of the grammar parser it feeds, while the lexer also eases the parser's job by handling index label identification. These properties arise because a delimited string that is not an index label must be separated from whatever follows it by whitespace, so a delimited string followed immediately by a colon can only be an index label or an error.

In short, I am not in favor of allowing whitespace between the index label and its colon. On the other hand, I see no particular advantage to forbidding whitespace between the colon and the following data value. Removing that requirement would allow tables to be written in a somewhat easier-to-read format, and would also allow slightly longer values to be used without line folding. For example, these would all be allowed and equivalent:

    "label":'value'
    "label": "value"
    'label':value
    'label': value

>Alternatively, the colon character could be made a mandatory-quoted
>character within tables. Then, only normal token quoting rules need
>apply.

True, but I prefer to avoid special rules for particular contexts. Another alternative, however, would be to _require_ the colon of the index label to be separated from the value by whitespace. That seems to me more consistent with the rest of CIF2, and could also provide for unambiguous use of unquoted index labels.
Thus all of the following could be allowed and equivalent:

    "label": 'part1:part2'
    'label': part1:part2
    label: part1:part2
    ;label
    ;: part1:part2

Doing that does make a bit more work for the parser, which must do the right thing with

    label: part1:

in a Table, and also with

    _tag value:

outside one, but that's doable.

>Either way, I think the optional whitespace around the colon allows nicer
>formatting.

I do like the option (at least) of whitespace after the colon, and I agree that it affords nicer formatting. I don't think the same is true for whitespace before the colon, however, and I am particularly unexcited by the prospect of

    label :value

being allowed.

Best Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer

----- End forwarded message -----

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
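The lexing property argued for in the message (a delimited string followed immediately by a colon can only be an index label or an error) can be illustrated with a short Python sketch. It is not taken from the CIF2 draft: the function name lex_table_body and the token kinds LABEL and VALUE are hypothetical, and it assumes simplified string rules (single- or double-quoted strings that end at the first matching quote, no triple-quoted strings, no text fields, no table or list brackets). It lexes only the entries inside one table, allowing whitespace after the colon but not before it.

```python
import re
from typing import List, Tuple

# (kind, text); the kinds "LABEL" and "VALUE" are illustrative names only.
Token = Tuple[str, str]

# Simplifying assumptions: a quoted string ends at the first matching quote,
# and there are no triple-quoted strings, text fields, or brackets.
_QUOTED = re.compile(r"'([^'\n]*)'|\"([^\"\n]*)\"")
_SPACE = re.compile(r"\s+")
_BARE = re.compile(r"[^\s'\"]+")


def lex_table_body(text: str) -> List[Token]:
    """Lex table entries without any parser state or lookback."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = _SPACE.match(text, pos)
        if m:
            # Whitespace only separates tokens.
            pos = m.end()
            continue
        m = _QUOTED.match(text, pos)
        if m:
            body = m.group(1) if m.group(1) is not None else m.group(2)
            end = m.end()
            if text.startswith(":", end):
                # Delimited string plus an immediate colon: always an index label.
                tokens.append(("LABEL", body))
                pos = end + 1
            elif end == len(text) or text[end].isspace():
                # Any other delimited string must be followed by whitespace.
                tokens.append(("VALUE", body))
                pos = end
            else:
                raise SyntaxError(
                    f"unexpected {text[end]!r} after delimited string at offset {m.start()}"
                )
            continue
        m = _BARE.match(text, pos)
        if m is None:
            # e.g. an unterminated quote
            raise SyntaxError(f"cannot lex input at offset {pos}")
        tokens.append(("VALUE", m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for text in ['"label":\'value\'', '"label": "value"', "'label':value", "'label': value"]:
        print(lex_table_body(text))
```

Under these assumptions the four spellings listed in the message all lex to the same LABEL/VALUE pair, a bare value such as part1:part2 passes through untouched, and input like "a"b is rejected. Recognizing a label needs only one character of lookahead past the closing quote, which is the context insensitivity the message describes.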