Re: [ddlm-group] Triple-quoted strings
- To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Triple-quoted strings
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Mon, 9 Nov 2009 07:55:51 -0500 (EST)
- In-Reply-To: <C71DA0FD.12369%nick@csse.uwa.edu.au>
- References: <C71DA0FD.12369%nick@csse.uwa.edu.au>
Dear Colleagues,

There is no CIF requirement for a leading space. With the CIF
line-folding protocol, the following is a perfectly valid alternate
representation of the string "abcdefgh" in CIF 1.1 with line folding:

;\
abc\
def\
gh\
;

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Mon, 9 Nov 2009, Nick Spadaccini wrote:

> On 30/10/09 3:14 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
>
>> Triple-quotes have the advantage that simple strings can be inline,
>> instead of block format. Maybe it is reasonable to also update semicolon
>> quoting as I suggested below, so that quoted lines drop the initial
>> space for all intervening lines, allowing for "pretty" block text
>> formatting?
>
> Not sure I appreciate what is being requested here. STAR has no requirement
> of a leading space, this must be a CIF thing. I am guessing it is yet
> another throwback to the Fortran-esque view of the world. A space to
> separate tokens at the END of the previous line will be lost if you apply
> GOFFIO (Good Old Fashion Fortran IO). Hence it has to be the leading
> character of the next line.
>
>> Herbert J. Bernstein wrote:
>>> The current proposal to deal with nested quotes, for all the quoted
>>> strings in CIF2 is to work python-style:
>>>
>>> 1. The reverse-solidus (\) is used to escape all quote marks and
>>> the reverse solidus itself
>> Is the escape required for all instances of reverse-solidus, or only
>> when writing a literal \" ? If it is mandatory, shouldn't it be
>> mandatory for quotes as well?
>
> Parsed this several times in my head (I am getting old), and can't decipher
> what you are asking. Clarify please.
>
>>> 2. All quoted strings will terminate on the first unescaped
>>> closing quote, independent of whether it is followed by a blank.
>> If the old quoting rules are to be dropped, why use triple quotes at
>> all? Plain old quotes, with all embedded quotes escaped, is sufficient.
>> Python has triple quotes because single-quoted strings can have variable
>> substitutions within the string, which is not relevant for CIF.
>
> True for Python. The new CIF2 string states that a reverse solidus before AN
> ALLOWED character protects that character from interpretation as a token
> terminator. (ALL ALLOWED characters are protected in this way). The triple
> quote is a CIF multiline comment. A newline is not an ALLOWED character
> within a single or double quote string - hence you cannot achieve the
> equivalent of a triple quote by protecting a newline character.
>
>>> 3. All CIF2 writers are required to follow any closing quote with
>>> a separator appropriate to the context, i.e. on the top level with
>>> whitespace, or in a list by a comma or a close brace, etc.
>>> 4. The reverse-solidus itself would always be passed to the
>>> application to decide what to do with it.
>>>
>>> Thus the following would be used to nest treble quote
>>>
>>> """ here we are inside \""" treble quotes \""" """
>>>
>>> and the application would receive
>>>
>>> here we are inside \""" treble quotes \"""
>> What is the rationale for #4? IMHO, it will be much less problematic to
>> hide data-formatting issues from applications calling the CIF I/O routines.
>>
>> It would be much safer to avoid "corrupting" the data with details of
>> the format. Is it also required for the application to escape its own
>> strings when writing data? For example, if writing a literal """, does
>> the CIF library escape it for me, and I later read back a different
>> string, \""", and have to remove the reverse-solidus myself? Or, does it
>> return an error saying it is an invalid string?
>>
>> Normally, escape sequences specific to the format are encoded and
>> decoded by the low-level routines, and essentially hidden from the user.
>> Otherwise, it adds unnecessary complications to applications using
>> the I/O library. A well-designed system should just accept raw strings
>> from the caller, insert any quotes and/or escape sequences, and do the
>> reverse when the data is read back in. The CIF2 library would
>> automatically know that strings with brackets must be quoted, even
>> though that was not required for CIF1.
>>
>> Just my 2 cents.
>
> I can understand that the applications need to know how to interpret the
> format. The problem is, there are existing multiple formats encoded in CIFs.
> If we could just assume one, say the C formatting style that \n=newline,
> \t=tab etc it would be easy.
>
> However your statement that the en/de-coding is done by low level routines
> needs clarification. If you read from a file "hello\nworld" in Python and
> print it, it is printed as a raw string. None of the decoding is done for
> you. It is not until you print the eval() of the string that the decoding is
> done - that is it is left to the invocation of an application. That is all
> we are saying here. Trouble is there will be several eval() routines for
> different encodings. Which eval() routine will depend on another data item
> to tell you which encoding, eg
>
> _abstract.encoding "Latex"
>
> _abstract """This abstract will be in \latex,
> and in {\bf this} is in bold typeface"""
>
> The encoding could come from the dictionary, but that would require ALL
> _abstract values be encoded in Latex.
>
> But I agree the set of eval() routines needs to be provided (or at least
> their specification).
>
>> Joe Krahn
>>
>>>
>>> -- HJB
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>>
>>> On Tue, 27 Oct 2009, Joe Krahn wrote:
>>>
>>>> I just joined the DDL list. Here is my view on triple-quoted strings. I
>>>> hope I'm not missing something already covered on the list.
>>>>
>>>> IMHO, triple-quoting is not a very good solution for multi-line text.
>>>> You still have to define a way to escape a literal triple-quote. Why not
>>>> just stick with the existing newline-semicolon method, and only have to
>>>> define a special escape code for a semicolon at the beginning of a line?
>>>> It is backward-compatible because older CIFs simply do not have
>>>> embedded newline-semicolon. With triple-quotes, you have to deal with
>>>> the possibility of """ being a quoted quote character.
>>>>
>>>> Another possibility is to remove a leading space from all lines within a
>>>> semicolon-quoted block. Most CIF text blocks are already formatted that way,
>>>> so that the first line of text with the semicolon is at the same
>>>> indentation level as the remaining lines. For example:
>>>>
>>>> ;11111
>>>>  22222
>>>>  33333
>>>> ;
>>>>
>>>> is a quoted block representing:
>>>>
>>>> 11111
>>>> 22222
>>>> 33333
>>>>
>>>>
>>>> Even though some people might like triple-quoting due to being Python
>>>> users, I think that the alternatives are a better fit into the existing
>>>> syntax.
>>>>
>>>> Joe Krahn
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia t: +61 (0)8 6488 3452
> 35 Stirling Highway f: +61 (0)8 6488 1089
> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
> MBDP M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
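
For readers following the thread, two of the rules discussed above can be sketched in Python. This is only an illustration under stated assumptions, not code from any CIF library: the function names unfold_text_field and read_triple_quoted are hypothetical, the first handles only the simple folded form shown in the "abcdefgh" example and ignores the finer points of the CIF 1.1 specification, and the second follows the proposal as worded in this thread (a string ends at the first unescaped closing quote, and the reverse solidus is passed through to the application unchanged).

```python
def unfold_text_field(raw):
    # Assumption: raw begins with a line holding only ';' plus a reverse
    # solidus, and ends with a line holding only ';'.
    lines = raw.split("\n")
    if len(lines) < 2 or lines[0].rstrip(" \t") != ";\\" or lines[-1] != ";":
        raise ValueError("not a line-folded text field")
    pieces = []
    for line in lines[1:-1]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            # Folded line: drop the reverse solidus and join to the next line.
            pieces.append(stripped[:-1])
        else:
            # Unfolded line: the line break belongs to the value.
            pieces.append(line + "\n")
    value = "".join(pieces)
    # The newline before the closing ';' is part of the delimiter.
    return value[:-1] if value.endswith("\n") else value


def read_triple_quoted(text, start=0):
    # Returns (value, index just past the closing quotes).
    delim = '"""'
    if not text.startswith(delim, start):
        raise ValueError("no opening triple quote")
    begin = i = start + len(delim)
    while i < len(text):
        if text[i] == "\\":
            # Skip the reverse solidus and the character it protects;
            # both stay in the returned value (point 4 of the proposal).
            i += 2
            continue
        if text.startswith(delim, i):
            return text[begin:i], i + len(delim)
        i += 1
    raise ValueError("unterminated triple-quoted string")


if __name__ == "__main__":
    folded = ";\\\nabc\\\ndef\\\ngh\\\n;"
    print(unfold_text_field(folded))  # abcdefgh

    quoted = '""" here we are inside \\""" treble quotes \\""" """'
    value, end = read_triple_quoted(quoted)
    print(value)
```

Under these assumptions the second call hands back the inner text with both reverse solidi still present, which is the behaviour Joe Krahn questions and Nick Spadaccini defends: any further decoding is left to the application.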