Re: [ddlm-group] Alternative proposal for eliding
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Alternative proposal for eliding
- From: Saulius Grazulis <grazulis@ibt.lt>
- Date: Tue, 28 Jun 2011 17:30:57 +0300
- Organization: Biotechnologijos institutas
Dear DDLm group members,

I would like to comment on some concerns regarding the backwards compatibility of my proposal, the "prefixed <eol><semicolon> text fields". I think that in most cases the problems can be easily circumvented.

On Wed, Jun 8, 2011 at 6:17 AM, David Brown <idbrown@mcmaster.ca> wrote:
> The only place I see a possible problem is with a heritage CIF with the
> following sequence
>
> _publ_section_experimental
> ; a,b,c,\a,\b,\c were
> determined from powder patterns
> ;
>
> /.../
> A CIF reader would expect to find:
>
> _publ_section_experimental
> ; a,b,c,\a,\b,\c
> a,b,c,were determined from powder patterns
> ;
>
> and strip off the a,b,c, /.../

No, that's actually NOT the way I intended things to work. Under my proposal, the above sequence would not be interpreted as a prefix, since the final backslash is not followed by a newline (or by white space and a newline). Thus the pattern would be interpreted literally, as is done now, and no problem would occur with such legacy archived files. To make "a,b,c," a prefix, one should write:

    _publ_section_experimental
    ;a,b,c,\
    a,b,c,\a,\b,\c
    a,b,c,were determined from powder patterns
    ;

This differs from the above and, after prefix removal, should be equivalent to the value '\a,\b,\c were determined from powder patterns' (with the newline changed to a space).

Note that the 'a,b,c,' string *may* still appear at the beginning of a line of the value, even when it is used as a prefix:

    _publ_section_experimental
    ;a,b,c,\
    a,b,c,a,b,c,\a,\b,\c
    a,b,c,were determined from powder patterns
    ;

would fold to the single-quoted string 'a,b,c,\a,\b,\c were determined from powder patterns' after changing newlines to spaces, since only one copy of the prefix is stripped from each line.

Actually, the Perl RE in my previous proposal was not accurate; a more appropriate determination of the prefix in Perl REs would be:

    if( $text =~ /^([^\\\n]+)\\[ \t]*\n/ ) {
        # a text without backslashes (on the first line),
        # then a backslash,
        # then maybe blanks, then a newline.
        my $prefix = $1;
        # drop the prefix declaration line itself
        $text =~ s/^\Q$prefix\E\\[ \t]*\n//;
        # strip one copy of the prefix from the start of every remaining line
        $text =~ s/^\Q$prefix\E//mg;
    }

> I agree that misreading of a legacy file without incurring a parsing error
> is practically impossible.

Legacy files would be misinterpreted only when the first line of a ';'-delimited text is *nonempty* and ends with a *trailing* backslash. Such files are arguably rare, perhaps non-existent. For example, there are only two such files out of the 140k+ in the COD CIF collection (which encompasses nearly all files from the IUCr journals and quite a few from other publishers):

    saulius@tasmanijos-velnias cif/ > find ? -iname '*.cif' \
    | xargs perl -ne 'print $ARGV, "\t", $_ if /^;([^\\]+)\\(\s+)?\n/'
    2/2213918.cif   ;{4,4'-Dibromo-2,2'-[1,2-phenylenebis(nitrilomethylidene)]diphenolato-\
    2/2224012.cif   ; \

Both are probably misrepresented folded long lines which should be corrected anyway; see the full files:

http://www.crystallography.net/2213918.cif
http://www.crystallography.net/2224012.cif

(The originals, at http://scripts.iucr.org/cgi-bin/sendcif?ng2268sup1 and http://scripts.iucr.org/cgi-bin/sendcif?sj2654sup1, have the same syntax.) I can run the same check on the PDB mmCIF collection if needed.

Even if such files are encountered, in most cases the harm will be limited: the parser will find no prefix to strip on the following lines and will leave the rest of the value as is. This could (should?) trigger a warning.

> We should, however, make it possible in CIF2 to present multiline values
> containing a backslash before the first <eol> without risking a parsing
> error on read when this <backslash> is misunderstood as a prefix flag.

I think discarding the newline at the end of the first ';' line is not necessary when that line is not a prefix declaration. The suggested prefix declarations are distinctive enough to be recognized without this rule.

# From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
# Date: Tue, 7 Jun 2011 05:48:34 -0400 (EDT):
> This would certainly be a worthy suggestion to consider in a
> CIF1 context.

Sure, prefixed ';'-texts can be used in CIF1 as well: they are mostly backwards compatible, and compatible with the CIF line-folding rule.

> For CIF2, my own preference would be to solve this problem by adopting
> the full Python syntax and semantics for treble-quoted strings

My understanding is that, unless escape sequences like those in C, Python or Perl are mandated in CIF strings ("The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character" [1]), the triple-quoted syntax does not solve the CIF-in-CIF problem. As I have read in the recent CIF2 draft [2], 'Clearly, the string within cannot contain an ASCII """'. Thus we will again have non-representable values in CIF: those that contain triple single quotes followed by a space, triple double quotes followed by a space, and a semicolon at the beginning of a line.

[1] http://docs.python.org/reference/lexical_analysis.html
[2] http://www.iucr.org/__data/assets/pdf_file/0017/41426/cif2_syntax_changes_jrh20100705.pdf

We do not need to go far to find such values: the text of the cif2_syntax_changes_jrh20100705.pdf draft itself *is* an example of a non-representable value :). Prefixes could easily save the situation without adding much extra work for parsers.

Sincerely,
Saulius

--
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
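The prefix rule discussed in this message can be sketched in Python. This is a minimal, hypothetical illustration, not a normative CIF algorithm. The helper names (strip_prefix, fold, flagged_lines) are invented for the example. It assumes line endings are normalized to "\n" and that the text passed in is the field body between the opening ';' and the closing '<eol>;'. It follows the Perl REs in the message: a nonempty first line without backslashes, ending in a backslash plus optional blanks, declares the prefix. One copy of that prefix is then removed from each following line.

```python
import re

# Assumed prefix declaration: the first line of the field body, made of one or
# more characters that are neither backslash nor newline, followed by a
# backslash, optional blanks (spaces/tabs) and a newline.
PREFIX_DECL = re.compile(r"\A([^\\\n]+)\\[ \t]*\n")


def strip_prefix(body: str) -> str:
    """Return the value of a ';'-delimited text field body (hypothetical helper).

    If the first line is not a prefix declaration, the body is returned
    unchanged, i.e. it is read literally as in legacy CIF1 files.
    """
    match = PREFIX_DECL.match(body)
    if match is None:
        return body
    prefix = match.group(1)
    rest = body[match.end():]
    # Remove exactly one copy of the prefix from the start of each line;
    # re.escape plays the role of Perl's \Q...\E. Lines that do not start
    # with the prefix are left as they are.
    return re.sub("^" + re.escape(prefix), "", rest, flags=re.MULTILINE)


def fold(value: str) -> str:
    """Change newlines to spaces, as done in the examples of the message."""
    return value.replace("\n", " ")


def flagged_lines(cif_text: str) -> list[int]:
    """1-based numbers of ';' lines whose remainder would be read as a
    prefix declaration (roughly what the find/perl scan looks for)."""
    hits = []
    for number, line in enumerate(cif_text.split("\n"), start=1):
        if line.startswith(";") and PREFIX_DECL.match(line[1:] + "\n"):
            hits.append(number)
    return hits


if __name__ == "__main__":
    legacy = "a,b,c,\\a,\\b,\\c were\ndetermined from powder patterns"
    prefixed = "a,b,c,\\\na,b,c,\\a,\\b,\\c\na,b,c,were determined from powder patterns"
    repeated = "a,b,c,\\\na,b,c,a,b,c,\\a,\\b,\\c\na,b,c,were determined from powder patterns"
    for body in (legacy, prefixed, repeated):
        print(fold(strip_prefix(body)))
```

Run as a script, it prints the three folded values. The first is the legacy text, read literally. The second and third give '\a,\b,\c were determined from powder patterns' and 'a,b,c,\a,\b,\c were determined from powder patterns', the results stated in the message. Because the prefix is escaped before it is used as a pattern, prefixes containing regex metacharacters such as '.' or '(' are matched literally.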