Re: [ddlm-group] Searching for a compromise on eliding
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Searching for a compromise on eliding
- From: SIMON WESTRIP <simonwestrip@btinternet.com>
- Date: Fri, 25 Feb 2011 12:39:05 +0000 (GMT)
- In-Reply-To: <AANLkTi=bEDjCpJgyuB07q1FBFZjA_jbG=4jgLsXEvw4g@mail.gmail.com>
- References: <AANLkTi=bEDjCpJgyuB07q1FBFZjA_jbG=4jgLsXEvw4g@mail.gmail.com>
If there is acceptance of P', logic suggests that the same
approach should be taken towards the single-quoted strings,
e.g. a user might question:
I can do this """C\"""", so why can't I do this: "C\"" ?
This would then just leave the semicolon-delimited fields as
the means to store 'raw' strings.
This may be 'maximally disruptive', but CIF2 is already distinct
from CIF1 and will require conversion of e.g. "C\"" in any case
(the latter is valid CIF1).
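
To make the comparison concrete, here is a minimal sketch of how Python itself reads the two spellings in the question above. It only shows Python's behaviour, which is what Proposals P and P' borrow for triple-quoted strings; it is not a CIF parser, and it is run under Python 3 on the assumption that this particular escape behaves the same as in Python 2.7.

```python
import ast

# The literal text as it would appear in a file: """C\""""
triple_quoted_source = '"""C\\""""'
# The literal text as it would appear in a file: "C\""
single_quoted_source = '"C\\""'

# ast.literal_eval applies Python's own string-literal rules.
triple_value = ast.literal_eval(triple_quoted_source)
single_value = ast.literal_eval(single_quoted_source)

# Under Python rules, both spellings yield the same two-character value.
print(repr(triple_value), repr(single_value))
```

In Python the backslash elides the quote in both forms, which is the consistency the user in the example would expect.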
Basically, I worry that the compromise is starting to make CIF look
a bit 'messy'; perhaps all the changes should be reviewed...
Cheers
Simon
From: James Hester <jamesrhester@gmail.com>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Friday, 25 February, 2011 2:50:59
Subject: [ddlm-group] Searching for a compromise on eliding
Dear DDLm-group,
I think we have all had a decent chance to argue our case for
Proposals P, F and F'. I have also been in small side discussions
with Ralf and John W. Their points of view can be summarised as
follows:
(i) Behaviour of triple-quoted strings will be too confusing unless
Python behaviour is followed (Ralf)
(ii) There is considerable criticism of CIF in the macromolecular
community because of idiosyncratic behaviour, particularly concerning
quoting. We should therefore stick to accepted standards as much as
possible (John W)
For John W and Ralf these points outweigh any of the disadvantages of
Proposal P, and so Proposal P remains their first choice. Proposal P
is therefore the first choice of 3 out of 5 COMCIFS voters, and the
last choice of the other two (I would rank it worse than doing
nothing, actually). I note that non-voting members are uniformly
opposed to Proposal P.
I therefore want to try to seek some common middle ground in the hope
that I can find a proposal that could be at least as acceptable as
Proposal P to Ralf and/or Herbert and/or John W.
Consider the following four new proposals - P-prime, Q, G and null:
* Proposal P-prime: triple-quoted strings are treated as for Python
2.7. No Unicode or raw strings are defined (i.e. no strings starting
u""" or r""").
I interpret John W and Ralf's position to be that they would be able
to support this proposal as the preferred choice, as our syntax would
still be entirely consistent with Python. This proposal is a
considerable improvement on Proposal P, because the dangers of raw
strings are taken out of the equation, and the Unicode database is no
longer a dependency. We are still left with a whole bunch of (frankly
pointless) elides, leading to Proposal Q:
* Proposal Q: As for Proposal P-prime, with the following changes:
(1) Only <backslash><delimiter> and <backslash><backslash> when it
precedes <backslash><delimiter> are recognised escape sequences at the
syntactical level
(2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
which the remaining escape sequences have the meaning assigned to them
by the Python 2.7 standard. mmCIF and related domains can standardise
their definitions on this string type and derivatives, making the
above division between syntax and dictionary invisible to users and
programmers in their domain.
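
As an illustration of rule (1) only, the following is a rough sketch of a syntax-level unescape step under Proposal Q. The function name `unescape_q` is hypothetical, and the sketch assumes that "delimiter" means the single quote character rather than the full triple-quote sequence, and that input is scanned left to right; the proposal text does not settle either point.

```python
BACKSLASH = "\\"


def unescape_q(body: str, delim: str = '"') -> str:
    """Apply only the two syntax-level elides of Proposal Q (assumed reading).

    <backslash><delimiter>                      -> <delimiter>
    <backslash><backslash> before
        <backslash><delimiter>                  -> <backslash>
    Every other backslash sequence is passed through unchanged, to be
    interpreted (or not) at the dictionary level.
    """
    escaped_delim = BACKSLASH + delim
    out = []
    i = 0
    while i < len(body):
        if body.startswith(BACKSLASH * 2 + escaped_delim, i):
            # Doubled backslash followed by an elided delimiter:
            # emit one literal backslash, then the literal delimiter.
            out.append(BACKSLASH + delim)
            i += 2 + len(escaped_delim)
        elif body.startswith(escaped_delim, i):
            # Elided delimiter: emit the delimiter alone.
            out.append(delim)
            i += len(escaped_delim)
        else:
            # Anything else, including other backslash sequences, is literal.
            out.append(body[i])
            i += 1
    return "".join(out)


# A sequence such as backslash-n is left untouched at the syntax level.
print(unescape_q("C" + BACKSLASH + '"'))
print(unescape_q("a" + BACKSLASH + "nb"))
```

Under this reading, sequences such as backslash-n survive the parser unchanged, and only a dictionary-level type like the suggested "CText" would give them their Python 2.7 meaning.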
* Proposal G: Proposal F', but with a different delimiter
Ralf has indicated that he actually thinks Proposal F' is best, but
only if the delimiters are not going to be confused with Python
delimiters. I interpret John W's position to be that he would not
support such a change in delimiters as that would simply make CIF even
more idiosyncratic. Anyway, any such replacement delimiter would need
to be multi-character, easy to type and unlikely to occur as the first
characters in CIF1 datavalues. We would also need to reduce the
character set of non-delimited CIF2 strings to exclude any such
delimiters. Ideas?
* Null proposal: do nothing as we can't agree
I think I could support Proposal Q as an acceptable fallback from F',
and if somebody can find sensible delimiters for Proposal G that works
for me as well. The preferred treatment for backslash-rich text for
Proposals P, P' and Q will necessarily be semicolon-delimited strings.
James.
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- Follow-Ups:
- Re: [ddlm-group] Searching for a compromise on eliding (Herbert J. Bernstein)
- References:
- [ddlm-group] Searching for a compromise on eliding (James Hester)