Re: [ddlm-group] Relationship of CIF2 to legacy platforms

Dear Colleagues

> Perhaps Brian can cast his vote on this one?

As you may have guessed, I'm very pressed for time at the moment. I've been
casting an eye over the ongoing discussions as they have developed, and have
refrained from making any intervention either because I haven't mastered all
the ramifications of a particular line of argument, or because I understand
both contesting viewpoints and see no overwhelming technical reason to
insist on one side as opposed to the other.

In general, I think the view of "the IUCr" is what is being determined
by this lengthy but careful discussion process. Within that world view,
for advocacy of the editorial office's requirements I rely heavily
on Simon's interventions, since he will bear the brunt of actual
programming in the short term.

A couple of quick messages will follow on specific points. I'll try to
find time over the weekend to review again the detailed arguments, but
I can't guarantee to be able to do so.

So: I vote for maximum line length of 2048 bytes to accommodate
record-oriented i/o, which remains an important requirement for
certain classes of crystallographic programming.

With UTF-8 Unicode encoding, that doesn't equate to 2048 characters
so the editorial support desk may have to learn to explain that to confused
authors :-)  [I make the statement tongue in cheek, since it's the job
of publCIF to hide such things from authors. But some will doubtless insist
on doing it "by hand".]
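
To make the byte-versus-character distinction concrete, here is a minimal Python sketch. It assumes the 2048 limit is counted in UTF-8 bytes and excludes the line terminator; the function name `overlong_lines` is hypothetical and is not part of publCIF or any CIF library.

```python
MAX_LINE_BYTES = 2048  # limit voted for in this thread; counting rule is an assumption


def overlong_lines(text, limit=MAX_LINE_BYTES):
    """Return (line_number, char_count, byte_count) for each line whose
    UTF-8 encoding is longer than `limit` bytes (terminator excluded)."""
    offenders = []
    for number, line in enumerate(text.split("\n"), start=1):
        byte_count = len(line.encode("utf-8"))
        if byte_count > limit:
            offenders.append((number, len(line), byte_count))
    return offenders


# 2048 ASCII characters occupy exactly 2048 bytes: within the limit.
ascii_line = "a" * 2048
# U+00E9 (e with acute accent) takes 2 bytes in UTF-8, so 1025 characters need 2050 bytes.
accented_line = "\u00e9" * 1025

print(overlong_lines(ascii_line + "\n" + accented_line))
# prints [(2, 1025, 2050)]
```

The second line is well under 2048 characters yet still exceeds the byte limit, which is the case likely to confuse authors editing by hand.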

The line folding conventions allow for the construction of longer logical
lines if and as required by downstream processing applications.
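
As a rough illustration of how a downstream application might rebuild a longer logical line, the sketch below joins physical lines that end in a backslash. This is a simplified assumption about the folding convention, not the full CIF protocol (it ignores the text-field delimiters and the marker that switches folding on), and the function name `unfold` is hypothetical.

```python
def unfold(physical_lines):
    """Join physical lines that end in a backslash (ignoring trailing
    blanks) with the line that follows, yielding logical lines."""
    logical = []
    pending = ""
    for line in physical_lines:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            # Continuation: drop the backslash and keep accumulating.
            pending += stripped[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # Input ended on a continuation line; keep what was gathered.
        logical.append(pending)
    return logical


print(unfold(["a long value \\", "continued here", "next line"]))
# prints ['a long value continued here', 'next line']
```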

I make the parenthetical comment here (I'm sure I've made it before)
that record-oriented perception of "lines" is not all evil: verbatim
display of program code (especially for languages where newline has
semantic significance), ad-hoc tabulation by visual alignment, acrostic
puzzles, scriptures and poetry are all examples of valid presentations
that can be awkward to implement in some XML systems.


On Fri, Nov 20, 2009 at 10:56:27AM +0000, SIMON WESTRIP wrote:
> I don't think the IUCr will have any issues with line length restrictions -
> they will just have to preprocess a CIF if it's to be passed to any
> legacy software that requires shorter lines.
> Perhaps Brian can cast his vote on this one?
> Cheers
> Simon
> ________________________________
> From: Nick Spadaccini <nick@csse.uwa.edu.au>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Friday, 20 November, 2009 4:07:36
> Subject: Re: [ddlm-group] Relationship of CIF2 to legacy platforms
> Just as James changes his vote to 2048 :) I suggest we leave it at 2048
> bytes, because
> (a) it's long enough
> (b) we will have at least one thing we WON'T change from the CIF1
> specification.
> Nick
> On 19/11/09 11:19 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>> Just to simplify the menu of choice, I'll change my vote to 4096.  --
>> Herbert
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>                   +1-631-244-3035
>>                  yaya@dowling.edu
>> =====================================================
>> On Thu, 19 Nov 2009, David Brown wrote:
>>> I have no strong views on line length, but the arguments for keeping them
>>> seem a little stronger than those for abolishing them.  I have no views at
>>> all on how long the lines should be other than to note that Acta Cryst.
>>> programs get upset if there are more than 80 characters in a line.
>>> David
>>> James Hester wrote:
>>> We should resolve the Fortran line length issue as I think we've got
>>> enough information on the table - could those who haven't indicated
>>> their preference please vote either
>>> (1) CIF2 should have a maximum line length specified or
>>> (2) no line length should be specified.
>>> For bonus points, you can indicate what this length should be.
>>> So (including Nick's recent email) I count the votes as:
>>> (1) Herbert (>=2048), Nick (2048), James (4096)
>>> (2) Joe
>>> I've added my vote to the fixed line length simply because I accept
>>> Herbert's argument that legacy Fortran programs are actually important
>>> in the crystallographic world, and a restriction on line length does
>>> not impose a burden on CIF readers.  It also imposes a bit of
>>> discipline on CIF writers and helps to produce a readable file.
>>> On Fri, Nov 13, 2009 at 3:47 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>>> Nick Spadaccini wrote:
>>> On 3/11/09 12:53 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
>>> Herbert,
>>> I am only suggesting that maintained Fortran code ought to be able to
>>> utilize F2003 STREAM I/O, supported by current versions of GFortran,
>>> Intel Fortran and Sun Fortran.
>>> Of course, I probably am not considering all of the issues. STREAM I/O
>>> avoids the need for a fixed maximum record length, but even the newest
>>> Fortran compilers have very limited UTF-8 support. Even with STREAM I/O,
>>> it is not trivial to count trailing blanks as significant.
>>> Maybe the biggest problem is UTF-8. IMHO, it makes sense for UTF-8 to be
>>> an optional encoding, rather than just declaring CIF2 is all UTF-8. This
>>> Not sure what you gain by doing this. If it is pure ASCII only then the
>>> declaration of UTF-8 inhibits nothing, since ASCII is a subset. If it is not
>>> pure ASCII, then it needs to be UTF-8. I can't see how knowing in advance
>>> that it is a subset of UTF-8 or possibly the full set of UTF-8 gives you
>>> anything.
>>> cheers
>>> Nick
>>> A compiler/language not aware of UTF-8 could avoid errors by rejecting
>>> CIF files that contain UTF-8. However, I think the approach being taken
>>> is just to allow implementations to restrict usage, rather than put it
>>> in the specifications. For example, the plan seems to be that
>>> DDL/dictionary definitions will be used to avoid UTF-8 in data names,
>>> where it is most likely to be a problem. So, you are right: there is no
>>> reason for the CIF2 syntax to make UTF-8 optional when the dictionaries
>>> can restrict characters to the ASCII subset.
>>> The other potential legacy issues I know of are fixed maximum line
>>> lengths, and significant trailing blanks. Dictionary definitions cannot
>>> avoid these. It might be possible to take a similar approach, by
>>> avoiding them by implementation conventions rather than making it part
>>> of the spec. If these are only going to be an issue for a few more
>>> years, it would avoid having to make another syntax change in the near
>>> future.
>>> My main interest here is to avoid incompatible implementations. I also
>>> think that Fortran, and any other line-oriented I/O software, should be
>>> able to do stream-oriented I/O in the near future.
>>> Joe