Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

On the whole I think there is a decreasing need for human readability of CIFs, which
I used as an argument for not ruling out UTF-8. However, a UTF-8 character that does not render in
a plain text editor is less of a problem than not being able to 'look over' a raw CIF
because it has lost its whitespace.

This probably sounds like a contradiction because I had accepted dropping the
whitespace condition, so I'll try to clarify my position:

Though the new specification would not require whitespace, I would almost certainly keep using it as we do now when
writing CIFs, and would probably even put it back in if the CIF is to be presented to a user.
For example, publCIF presents the user with the CIF in its raw form alongside a more user-friendly representation of
the data. If the CIF were to contain minimal whitespace, there would be very little
point in presenting it to the user and much of the benefit of publCIF would be lost.
So, as the design of publCIF was partly intended to give users confidence in editing CIFs
(especially those new to CIF), I would have to modify it to make the raw CIF as human readable as possible
within the confines of the spec. Granted, publCIF is only one application among many, but
it provides an example of the sort of impact some of these changes may have on current
software that provides direct user interaction with the CIF. (enCIFer would have to do the same, though
it would probably be easier for them as they already render the CIF for syntax highlighting, i.e. they don't just drop the raw CIF into an editing widget.)

As for what benefits removing the whitespace condition brings beyond providing a clean
specification where the same delimiters apply throughout (simple data item constructs as well as
the compound constructs), I do not know. I was under the impression that with the
whitespace condition, all the recursive stuff would get a bit messy, to the extent that it violated the
specification, but not having worked at this level I really don't know.

Thus, despite how I voted, in terms of the extra work involved the loss of whitespace isn't that welcome, but I'll accept it if it's
absolutely necessary and do my best to maintain human readability where possible.



From: James Hester <jamesrhester@gmail.com>
To: Nick.Spadaccini@uwa.edu.au; Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Thursday, 15 October, 2009 9:10:41
Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

Regarding whitespace:

1. Nick detects a contradiction with Simon on the one hand saying that
CIF files aren't directly read by humans much at all, and me insisting
on them remaining readable.  I agree that there is not much point
trying to read and/or edit a 500K mmCIF file directly.  But let us not
forget the small molecule people.  For example, a CIF delivered by the
Powder Data Base can be only 10 lines long, and eminently
readable/cut-and-pastable in any text editor.  In addition, at least
one and probably more of my instrument scientist colleagues routinely
look over raw CIF files in the course of preparing publications and
checking other people's work.  I believe the differing perspectives
here are more to do with the different areas in which Simon and I
encounter CIF files.

2. If all we are concerned about is simplifying the formal syntax,
then that has been done already when we agreed on removing delimiters
from within delimited strings.  The present discussion is exactly
equivalent to deciding on using either "<whitespace>?" or
"<whitespace>+" in the grammar description.  After a review of the
formal 1.1 spec, I see no other opportunities for simplification
arising from making whitespace optional.  So I ask once again, what
other benefits are claimed for making whitespace optional, beyond
changing a plus sign to a question mark in the specification?
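
To make the "<whitespace>?" versus "<whitespace>+" distinction in point 2 concrete, here is a minimal Python sketch. It is not the CIF grammar: the value token is a toy single-quoted string with no embedded quote, and the names VALUE, sequence_pattern, REQUIRED, OPTIONAL, accepts and tokens are all invented for this illustration. It only shows that the two choices differ in the separator between adjacent values.

```python
import re

# Toy value token (an assumption, not the CIF definition): a single-quoted
# string that contains no embedded quote and no newline.
VALUE = r"'[^'\n]*'"


def sequence_pattern(separator):
    """Build a pattern for one or more values joined by the given separator."""
    return re.compile(rf"\s*{VALUE}(?:{separator}{VALUE})*\s*")


# "<whitespace>+": at least one whitespace character between values.
REQUIRED = sequence_pattern(r"\s+")
# "<whitespace>?": whitespace between values may be omitted entirely.
OPTIONAL = sequence_pattern(r"\s*")


def accepts(pattern, text):
    """Return True if the whole text is a valid sequence under the pattern."""
    return pattern.fullmatch(text) is not None


def tokens(text):
    """Split text into toy value tokens, ignoring any whitespace between them."""
    return re.findall(VALUE, text)


if __name__ == "__main__":
    for sample in ("'a' 'b'", "'a''b'"):
        print(repr(sample), accepts(REQUIRED, sample), accepts(OPTIONAL, sample))
```

Under these assumptions both variants accept `'a' 'b'`, and only the optional variant accepts `'a''b'`. Once delimiters cannot appear inside delimited strings, a tokenizer can split either form the same way, so the remaining difference is which inputs count as valid.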

On 10/13/09, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> There is a difference between insisting in a formal grammar that a value
> token is treated differently at one level than it is at another level, as
> opposed to requiring CIF writers to pad whitespace between value tokens at
> one level, but not at another level.
> My reading of the previous mail was that the balance of opinion was to
> formally terminate with the single token (irrespective of whitespace) and
> then requiring/asking/pleading/whatever-verb writers to pad tokens, which
> they and we all do anyway. I repeat again the formal specification of the
> language needs to be strict and consistent (Brian's maximally disruptive),
> and the parsers can be more loosely (deprecatingly?) implemented.
> However I detect a certain level of inconsistency in arguments here. What
> does "human readability" have to do with it? We just had a discussion on
> UTF-8 where it was argued in the near future no-one is going to be
> vim-img/emacs-ing/grep-ping these files and it will all be driven by
> applications. What happened to human readability then?
> On 13/10/09 8:36 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>> I, for one, do not agree with dropping the requirement for whitespace
>> between tokens outside compound structures.  Is the only justification
>> avoiding a second production rule in the formal grammar?  I would like
>> to think we are getting more than this in return for sacrificing human
>> readability: see previous email somewhere long ago in this thread.
> cheers
> Nick
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA  w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
> CRICOS Provider Code: 00126G
> e: Nick.Spadaccini@uwa.edu.au
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

