Re: [ddlm-group] Use of elides in strings

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Use of elides in strings
From: Joe Krahn <[email protected]>
Date: Mon, 23 Nov 2009 17:03:14 -0500
In-Reply-To: <[email protected]>
References: <C7306520.1258E%[email protected]> <[email protected]> <[email protected]> <[email protected]><[email protected]>

Simon,

In general, I agree with you completely. In fact, my main reason for 
joining this list is to promote a CIF 2 format that avoids ambiguities. 
However, some people have taken the approach that current CIF 
implementations make the dictionary context mandatory.

I would also drop rule #4, but it is better to have a default than to 
leave it completely ambiguous. Do the dictionary-based developers want 
to allow for an alternate conversion, or is the conflict only about 
where in the software the conversion is done?
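
As a rough sketch of what the defaults in #1 and #2 (quoted below) would
mean for a parser, here is a small Python example. The function name
read_quoted and the convert flag are hypothetical names chosen for
illustration; they are not part of the proposal or of any CIF library.
convert=False stands in for an implementation that, under #3, defers the
conversion to a later stage.

```python
def read_quoted(text, start=0, convert=True):
    """Read one quoted string that begins at text[start].

    Returns (value, index just past the closing quote).
    Hypothetical helper: illustrates rules #1-#3 only.
    """
    quote = text[start]
    if quote not in ('"', "'"):
        raise ValueError("expected an opening quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ("\\", quote):
            # Rule 2: \\ stands for \, and \<close quote> for <close quote>.
            # With convert=False the two characters are kept verbatim, so a
            # later stage can apply the conversion instead (rule 3).
            out.append(text[i + 1] if convert else text[i:i + 2])
            i += 2
        elif ch == quote:
            # Rule 1: because pairs are consumed left to right, we only get
            # here when the quote follows an even number of reverse solidi.
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted string")


# Simon's two spellings of the same site label:
double_quoted, _ = read_quoted(r'"A\"BC"')   # value is A"BC
single_quoted, _ = read_quoted("'A\"BC'")    # value is A"BC
deferred, _ = read_quoted(r'"A\"BC"', convert=False)  # value is A\"BC
```

With the default conversion the two spellings compare equal without any
dictionary lookup. With the conversion deferred they do not, until some
later stage applies #2.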

Joe


SIMON WESTRIP wrote:
> Hi Joe
> 
> In reply to this and your subsequent comments (the subject line of the 
> emails seems to have overlapped):
> 
> I'm not sure that rule 4) "An implementation may override the default 
> conversions in #2, but
> should be avoided in most cases to maintain compatibility"
>  is appropriate in defining a base syntax for CIF (in any flavour, CIF1 
> or 2, or ....).
> As I understand it and encounter it, one of the purposes of CIF is as an 
> archive format.
> As such, the way data are stored in the CIF should _not_ be 
> implementation dependent,
> i.e. anything that suggests that a data item could be interpreted in 
> more than one way
> depending on the particular software that the CIF is passed to is in 
> danger of making
> the standard 'non-standard'. Though it can be argued that 
> context-sensitivity could be extended to
> the dictionary (some sort of interpretation of whether an elide used in 
> a string type was intended or
> not, or whether for a particular item it will always be interpreted in a 
> certain way),
> the fact is there will be applications that are only interested in 
> obtaining the value of a data item
> without any consideration of the CIF dictionary, and they will need to 
> know the rules for identifying that value.
> For example, a molecular graphics program may just want the site data -
> it may encounter _site_label "A\"BC" in one loop, but (for whatever 
> reason), the associated site data may be given as
> _site_label 'A"BC' in another loop. The application needs to know 
> whether it is looking at the same key value,
> but cannot do this if the rules say that "A\"BC" might be A\"BC or it 
> might be A"BC depending on
> the interpretation described in an associated dictionary or by a 
> particular discipline or organization.
> So when it comes down to defining a base syntax that all CIFs should 
> adhere to, I don't think there is any scope for
> offering various interpretations of what the value of a data item 
> actually is.
> 
> Forgive me if I've missed the point here or misunderstood your comments, 
> but it seems to me that establishing strict rules
>  about the use of elides is quite important, whether they produce A\"BC 
> or A"BC by default (some time back I interpreted
>  them as A"BC, but that was rejected, but subsequently A"BC is on the 
> table!). So whichever way it goes, I look forward to
> the results of any straw vote on this.
> 
> Cheers
> 
> Simon
> 
> 
> 
> ------------------------------------------------------------------------
> *From:* Joe Krahn <[email protected]>
> *To:* Group finalising DDLm and associated dictionaries 
> <[email protected]>
> *Sent:* Monday, 23 November, 2009 16:53:07
> *Subject:* Re: [ddlm-group] Use of elides in strings
> 
> I think the solution is to define the CIF2 syntax in a way that allows
> more flexibility in the software implementation. IMHO, if you are going
> to leave the reverse-solidus intact, you should leave the quotes intact
> as well, because the elides are dependent on the quoting context.
> Obviously, RCSB software is designed in a way that they prefer all
> character conversions at the dictionary level. Other developers want the
> conversion done at the same time quotes are removed, so it can be done
> in the correct quoting context.
> 
> It should be possible to allow both approaches, with syntax definitions
> something like this:
> 
> Within quoted strings, the following rules apply:
> 
> 1) all close-quote definitions include the look-behind assertion that
> they are not preceded by an odd number of ASCII reverse solidus characters.
> 
> 2) By default, <REVERSE SOLIDUS><REVERSE SOLIDUS> represents <REVERSE
> SOLIDUS>, and <REVERSE SOLIDUS><CLOSE QUOTE> represents <CLOSE QUOTE>.
> 
> 3) It is implementation dependent whether the conversions defined in #2
> are applied at the file I/O formatting level (i.e. parser on input).
> 
> 4) An implementation may override the default conversions in #2, but
> should be avoided in most cases to maintain compatibility.
> 
> Joe
> 
> James Hester wrote:
>  > The outstanding issue seems to be around where in the process these
>  > elides get stripped; Herb and John argue that it should be possible to
>  > do this in an optional way at the dictionary stage.  As I've already
>  > indicated, I don't think that it is that straightforward.
>  >
>  > On Mon, Nov 23, 2009 at 9:35 PM, SIMON WESTRIP
>  > <[email protected] <mailto:[email protected]>> wrote:
>  >> So at the risk of repeating myself, at this stage there seems to be 
> majority
>  >> acceptance of
 >> what I've been referring to as context-sensitive treatment of elides:
>  >>
>  >> Using the trivial example of _label "A\"BC"
>  >>
>  >> James and Nick would return A"BC
>  >>
>  >> Herb and John would return A\"BC
>  >>
>  >> I would return A"BC
>  >>
 >> I won't address Herb's examples as I performed a similar exercise back in
>  >> THREAD3
>  >> which was then received with a different opinion :-)
>  >>
>  >
> 
> _______________________________________________
> ddlm-group mailing list
> [email protected] <mailto:[email protected]>
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 

