Discussion List Archives

Re: [ddlm-group] Use of elides in strings

Simon,

In general, I agree with you completely. In fact, my main reason for 
joining this list is to promote a CIF 2 format that avoids ambiguities. 
However, some people have taken the approach that current CIF 
implementations make the dictionary context mandatory.

I would also drop rule #4, but having a default is better than leaving 
the conversion completely ambiguous. Do the dictionary-based developers 
want to allow for an alternate conversion, or is the conflict only about 
where in the software the conversion is done?

Joe


SIMON WESTRIP wrote:
> Hi Joe
> 
> In reply to this and your subsequent comments (the subject line of the 
> emails seems to have overlapped):
> 
> I'm not sure that rule 4) "An implementation may override the default 
> conversions in #2, but
> should be avoided in most cases to maintain compatibility"
>  is appropriate in defining a base syntax for CIF (in any flavour, CIF1 
> or 2, or ....).
> As I understand it and encounter it, one of the purposes of CIF is as an 
> archive format.
> As such, the way data are stored in the CIF should _not_ be 
> implementation dependent,
> i.e. anything that suggests that a data item could be interpreted in 
> more than one way
> depending on the particular software that the CIF is passed to is in 
> danger of making
> the standard 'non-standard'. Though it can be argued that 
> context-sensitivity could be extended to
> the dictionary (some sort of interpretation of whether an elide used in 
> a string type was intended or
> not, or whether for a particular item it will always be interpreted in a 
> certain way),
> the fact is there will be applications that are only interested in 
> obtaining the value of a data item
> without any consideration of the CIF dictionary, and they will need to 
> know the rules for identifying that value.
> For example, a molecular graphics program may just want the site data -
> it may encounter _site_label "A\"BC" in one loop, but (for whatever 
> reason), the associated site data may be given as
> _site_label 'A"BC' in another loop. The application needs to know 
> whether it is looking at the same key value,
> but cannot do this if the rules say that "A\"BC" might be A\"BC or it 
> might be A"BC depending on
> the interpretation described in an associated dictionary or by a 
> particular discipline or organization.
> So when it comes down to defining a base syntax that all CIFs should 
> adhere to, I don't think there is any scope for
> offering various interpretations of what the value of a data item 
> actually is.
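
The key-matching problem described above can be illustrated with a minimal Python sketch. The helper `value_of` and its `strip_elides` flag are hypothetical names introduced only for this illustration; they come from no CIF library, and the helper assumes it is handed a well-formed quoted token.

```python
def value_of(token: str, strip_elides: bool) -> str:
    """Return the value of a quoted token (hypothetical helper).

    Assumes `token` starts and ends with the same quote character.
    If `strip_elides` is true, a reverse solidus followed by that quote
    character is reduced to the bare quote character.
    """
    quote = token[0]
    body = token[1:-1]
    if strip_elides:
        body = body.replace("\\" + quote, quote)
    return body


first = '"A\\"BC"'   # the file text "A\"BC"
second = "'A\"BC'"   # the file text 'A"BC'

for strip_elides in (True, False):
    same_key = value_of(first, strip_elides) == value_of(second, strip_elides)
    print(f"strip_elides={strip_elides}: same key value? {same_key}")
```

Under one interpretation the two labels match and under the other they do not, which is why a dictionary-independent application needs a single rule.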
> 
> Forgive me if I've missed the point here or misunderstood your comments, 
> but it seems to me that establishing strict rules
>  about the use of elides is quite important, whether they produce A\"BC 
> or A"BC by default (some time back I interpreted
>  them as A"BC, which was rejected, but A"BC is now back on the 
> table!). So whichever way it goes, I look forward to
> the results of any straw vote on this.
> 
> Cheers
> 
> Simon
> 
> 
> 
> ------------------------------------------------------------------------
> *From:* Joe Krahn <krahn@niehs.nih.gov>
> *To:* Group finalising DDLm and associated dictionaries 
> <ddlm-group@iucr.org>
> *Sent:* Monday, 23 November, 2009 16:53:07
> *Subject:* Re: [ddlm-group] Use of elides in strings
> 
> I think the solution is to define the CIF2 syntax in a way that allows
> more flexibility in the software implementation. IMHO, if you are going
> to leave the reverse-solidus intact, you should leave the quotes intact
> as well, because the elides are dependent on the quoting context.
> Obviously, RCSB software is designed in a way that they prefer all
> character conversions at the dictionary level. Other developers want the
> conversion done at the same time quotes are removed, so it can be done
> in the correct quoting context.
> 
> It should be possible to allow both approaches, with syntax definitions
> something like this:
> 
> Within quoted strings, the following rules apply:
> 
> 1) all close-quote definitions include the look-behind assertion that
> they are not preceded by an odd number of ASCII reverse solidus characters.
> 
> 2) By default, <REVERSE SOLIDUS><REVERSE SOLIDUS> represents <REVERSE
> SOLIDUS>, and <REVERSE SOLIDUS><CLOSE QUOTE> represents <CLOSE QUOTE>.
> 
> 3) It is implementation dependent whether the conversions defined in #2
> are applied at the file I/O formatting level (i.e. parser on input).
> 
> 4) An implementation may override the default conversions in #2, but
> should be avoided in most cases to maintain compatibility.
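
As a rough illustration of rules 1 to 3, here is a minimal Python sketch. The function names and the `convert_at_parse` flag are assumptions made for this example and are not part of any CIF2 draft or existing parser. Rule 4 (overriding the defaults) is deliberately not modelled.

```python
def find_close_quote(text: str, start: int, quote: str) -> int:
    """Rule 1: find the first `quote` at or after `start` that is not
    preceded by an odd number of reverse solidus characters."""
    i = start
    while i < len(text):
        if text[i] == quote:
            # Count the run of backslashes immediately before position i.
            run = 0
            j = i - 1
            while j >= start and text[j] == "\\":
                run += 1
                j -= 1
            if run % 2 == 0:
                return i
        i += 1
    raise ValueError("unterminated quoted string")


def apply_default_elides(raw: str, quote: str) -> str:
    """Rule 2: backslash-backslash becomes one backslash, and
    backslash-quote becomes the quote; other backslashes are kept."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in ("\\", quote):
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def read_quoted(text: str, convert_at_parse: bool = True) -> str:
    """Rule 3: the caller chooses whether the rule 2 conversions happen
    in the parser (True) or are deferred to a later stage (False)."""
    quote = text[0]
    end = find_close_quote(text, 1, quote)
    raw = text[1:end]
    return apply_default_elides(raw, quote) if convert_at_parse else raw


print(read_quoted('"A\\"BC"'))                          # conversion in the parser
print(read_quoted('"A\\"BC"', convert_at_parse=False))  # conversion deferred
```

With `convert_at_parse=False` the raw text is passed on unchanged, so a later dictionary-level stage would also need to know which quote character delimited the string in order to apply rule 2 correctly.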
> 
> Joe
> 
> James Hester wrote:
>  > The outstanding issue seems to be around where in the process these
>  > elides get stripped; Herb and John argue that it should be possible to
>  > do this in an optional way at the dictionary stage.  As I've already
>  > indicated, I don't think that it is that straightforward.
>  >
>  > On Mon, Nov 23, 2009 at 9:35 PM, SIMON WESTRIP
>  > <simonwestrip@btinternet.com <mailto:simonwestrip@btinternet.com>> wrote:
 >> So at the risk of repeating myself, at this stage there seems to be
 >> majority acceptance of what I've been referring to as
 >> context-sensitive treatment of elides:
>  >>
>  >> Using the trivial example of _label "A\"BC"
>  >>
>  >> James and Nick would return A"BC
>  >>
>  >> Herb and John would return A\"BC
>  >>
>  >> I would return A"BC
>  >>
 >> I won't address Herb's examples as I performed a similar exercise
 >> back in THREAD3, which was then received with a different opinion :-)
>  >>
>  >
> 
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org <mailto:ddlm-group@iucr.org>
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 

