Discussion List Archives


Re: [ddlm-group] Searching for a compromise on eliding

If there is acceptance of P', logic suggests that the same
approach should be taken towards the single-quoted strings,
e.g. a user might question:

I can do this """C\"""", so why can't I do this: "C\"" ?

This would then just leave the semicolon-delimited fields as
the means to store 'raw' strings.

This may be 'maximally disruptive', but CIF2 is already distinct
from CIF1 and will require conversion of e.g. "C\"" in any case
(the latter is valid CIF1).
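
For reference, the user's question above can be checked against Python itself, which is the model P' follows for triple-quoted strings. This is only an illustration of Python's behaviour, not of any CIF parser; the variable names are made up for the example.

```python
# In Python, backslash-quote elides the quote in both string forms,
# so the two literals from the question denote the same value.
triple_quoted = """C\""""
single_quoted = "C\""

print(triple_quoted)                   # C"
print(triple_quoted == single_quoted)  # True
```

Under P' only the first form would behave this way in CIF2, which is the inconsistency a user might notice.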

Basically, I worry that the compromise is starting to make CIF look
a bit 'messy'; perhaps all the changes should be reviewed...



From: James Hester <jamesrhester@gmail.com>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Friday, 25 February, 2011 2:50:59
Subject: [ddlm-group] Searching for a compromise on eliding

Dear DDLm-group,

I think we have all had a decent chance to argue our case for
Proposals P, F and F'.  I have also been in small side discussions
with Ralf and John W.  Their points of view can be summarised as
(i) Behaviour of triple-quoted strings will be too confusing unless
Python behaviour is followed (Ralf)
(ii) There is considerable criticism of CIF in the macromolecular
community because of idiosyncratic behaviour, particularly concerning
quoting.  We should therefore stick to accepted standards as much as
possible (John W)

For John W and Ralf these points outweigh any of the disadvantages of
Proposal P, and so Proposal P remains their first choice.  Proposal P
is therefore the first choice of 3 out of 5 COMCIFS voters, and the
last choice of the other two (I would rank it worse than doing
nothing, actually).  I note that non-voting members are uniformly
opposed to Proposal P.

I therefore want to try to seek some common middle ground in the hope
that I can find a proposal that could be at least as acceptable as
Proposal P to Ralf and/or Herbert and/or John W.

Consider the following four new proposals - P-prime, Q, G and null:

* Proposal P-prime: triple-quoted strings are treated as for Python
2.7.  No Unicode or raw strings are defined (i.e. no strings starting
with u""" or r""").

I interpret John W and Ralf's position to be that they would be able
to support this proposal as the preferred choice, as our syntax would
still be entirely consistent with Python.  This proposal is a
considerable improvement on Proposal P, because the dangers of raw
strings are taken out of the equation, and the Unicode database is no
longer a dependency.  We are still left with a whole bunch of (frankly
pointless) elides, leading to Proposal Q:

* Proposal Q: As for Proposal P-prime, with the following changes:
(1) Only <backslash><delimiter> and <backslash><backslash> when it
precedes <backslash><delimiter> are recognised escape sequences at the
syntactical level
(2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
which the remaining escape sequences have the meaning assigned to them
by the Python 2.7 standard.  mmCIF and related domains can standardise
their definitions on this string type and derivatives, making the
above division between syntax and dictionary invisible to users and
programmers in their domain.
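
One possible reading of point (1) can be sketched in Python as below. This is only an illustration under stated assumptions, not agreed behaviour: it assumes the "delimiter" being elided is a single quote character (as in Python), and that a doubled backslash collapses to one backslash only when it immediately precedes a backslash-quote pair. The function name apply_proposal_q_elides is hypothetical.

```python
import re

def apply_proposal_q_elides(body, quote='"'):
    """Resolve only the two syntax-level elides of Proposal Q (as read here).

    Assumptions (not from the proposal text): `quote` is a single
    character, and `body` is the string content between the delimiters.
    """
    # Optional doubled backslash, then backslash + quote character.
    pattern = re.compile(r'(\\\\)?\\' + re.escape(quote))

    def _replace(match):
        # A preceding doubled backslash becomes one literal backslash.
        prefix = '\\' if match.group(1) else ''
        return prefix + quote

    return pattern.sub(_replace, body)

print(apply_proposal_q_elides(r'C\"'))   # C"
print(apply_proposal_q_elides(r'a\nb'))  # a\nb  (left for the dictionary level)
```

Every other backslash sequence passes through unchanged, which is where a dictionary-level type such as "CText" from point (2) would take over.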

* Proposal G: Proposal F', but with a different delimiter

Ralf has indicated that he actually thinks Proposal F' is best, but
only if the delimiters are not going to be confused with Python
delimiters.  I interpret John W's position to be that he would not
support such a change in delimiters as that would simply make CIF even
more idiosyncratic.  Anyway, any such replacement delimiter would need
to be multi-character, easy to type and unlikely to occur as the first
characters in CIF1 datavalues.  We would also need to reduce the
character set of non-delimited CIF2 strings to exclude any such
delimiters.  Ideas?

* Null proposal: do nothing as we can't agree

I think I could support Proposal Q as an acceptable fallback from F',
and if somebody can find sensible delimiters for Proposal G that works
for me as well.  The preferred treatment for backslash-rich text for
Proposals P, P' and Q will necessarily be semicolon-delimited strings.

ddlm-group mailing list
International Union of Crystallography