Discussion List Archives

(19) DDL types, filename handles and other matters

Dear Colleagues

Please forgive the delay since the last circular. This may be seen as
something of a blessing in disguise, but there are a couple of points I should
have brought out sooner.

Peter Murray-Rust sent the following inquiry:

PMR> 	How's things?  I've just got the included message from Chris 
PMR> Sander and I thought I'd better check with you before agreeing - as I'm a 
PMR> new boy on COMCIF.  How does this fit in with other efforts?  And what is 
PMR> the view of the committee?  I think it arises as I agreed in general 
PMR> terms to publicise the mmCIF in the biological newsgroups when it became 
PMR> announceable.  I haven't seen any mail recently so I'm not sure what 
PMR> timescale we are at.
PMR> 
PMR>> ---------- Forwarded message ----------
PMR>> Subject: COMCIF
PMR>> 
PMR>> Dear Peter, just saw your message to Rob and thought of saying hello.
PMR>> 
PMR>> We've been thinking off and on about summarizing the discussion
PMR>> of possible CIF extensions and hope that between you, Phil and
PMR>> us (Michael Scharf here with me) a discussion document can
PMR>> be drafted within the not too distant future. What do you think?
PMR>> 
PMR>> Best regards, Chris Sander

I should suppose that the thought police of COMCIFS would not go so far as to
block anyone from free and open discussion! It would probably be useful for
groups who set up discussion lists or e-mail conferences to invite someone
from COMCIFS to listen in, so that feedback could be supplied if there was any
likelihood of people putting forward proposals that might run counter to the
standard; but it would surely be beneficial to have a wider community of
people involved in software development working together. The York and
Tarrytown meetings sparked off a lot of useful ideas, but in the longer term
there will likely be input from other groups with different interests and areas
of expertise. Phil Bourne is already doing an excellent job of coordinating
feedback from the mmCIF meetings, which is why I think it useful that he also
has a place in our discussion forum.

Any other comments?

I hope to comment on the timescale for development of the draft dictionaries
later this week.

=============================== Agreements

A10.6  Any printable ASCII character other than white space will be permitted
       in a CIF dataname. Only the leading underscore character '_' is
       syntactically important.

This is a direct application of the STAR principle, as outlined by Syd:

S> The latest STAR specs paper in JCICS is in press and no character
S> restrictions other than <white space> and underscore exist
S> for data names. Ipso facto, no such restrictions exist for CIF.

For the record, we need to have an official definition of 'white space'. The
ASCII characters recognised as white space in the C Standard are: space,
form feed, newline, carriage return, horizontal tab and vertical tab.
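
Purely as an illustration of A10.6 combined with this definition of white
space, a dataname check might be sketched as below. The function name
is_valid_dataname is hypothetical, and the requirement of at least one
character after the underscore is my assumption, not part of the agreement.

```python
# White space as recognised by the C Standard: space, form feed, newline,
# carriage return, horizontal tab and vertical tab.
C_WHITE_SPACE = " \f\n\r\t\v"


def is_valid_dataname(name):
    """Return True if name looks like a dataname under A10.6.

    Assumption (not in A10.6): at least one character must follow the
    leading underscore.
    """
    if len(name) < 2 or name[0] != "_":
        return False
    for ch in name:
        if ch in C_WHITE_SPACE:
            return False
        # Printable ASCII other than space runs from '!' (0x21) to '~' (0x7E).
        if not (0x21 <= ord(ch) <= 0x7E):
            return False
    return True
```

Under this sketch a name such as _a$b[1] is accepted, while "_a b" (embedded
space) and "pd_x" (no leading underscore) are rejected.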

We may also need an official definition of the convention for terminating a
line (the idea of lines in CIF arises from the rule that "Lines may not exceed
80 characters"). A CIF on a Unix system has a newline character at the end of
every line; when transferred to a PC by ftp this newline will be maintained
if the transfer is in binary mode, but translated into a carriage
return/newline pair otherwise. I propose that the convention be that lines
in a CIF contain no more than 80 printable ASCII characters excluding the
terminating whitespace characters used by the local computer system. In
practice, it would be well to stop a character or two short of this. A line
written on a PC is terminated by ^M^J as the single end-of-line flag. If
transferred to a Unix system by binary ftp, the ^J maps to a newline, but
the ^M is incorporated as an extra white-space character on the line.
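
A rough sketch of how a checking program might apply the proposed convention
follows. It treats the file as bytes, splits on newline, and discounts one
stray ^M so that a PC-written file moved by binary ftp is not penalised. The
name overlong_lines is hypothetical and the handling shown is only one
possible reading of the proposal.

```python
MAX_LINE_LENGTH = 80  # limit from the rule "Lines may not exceed 80 characters"


def overlong_lines(data, limit=MAX_LINE_LENGTH):
    """Return the 1-based numbers of lines longer than limit.

    data is the file content as bytes. The terminating newline is not
    counted, nor is a single trailing carriage return (^M).
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if line.endswith(b"\r"):
            line = line[:-1]  # drop the ^M half of a ^M^J terminator
        if len(line) > limit:
            bad.append(number)
    return bad
```

For example, a file whose first line holds 80 characters followed by ^M^J
and whose second line holds 81 characters would be reported as having only
line 2 too long.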

(Syd: May one presume that the formal BNF syntax of STAR is not affected by
these considerations, since the STAR format is considered as a byte stream
and not a sequence of records?)

A15.1 Standard prefixes
-----------------------
S> My only small contribution is that _xtal_* data names have existed in our
S> files for three years. This may also be true for _shelx_* items as well. It 
S> probably would be sensible to tell COMCIFS that one has done this, but I
S> think any restrictions beyond informal notification is overkill.

===================================== Current discussion topics

D15.1 New types
---------------
S> D15.1  At the time Tony Cook and I formulated _type_conditions we realised 
S> that it was opening Pandora's box! We just wondered how long before the 
S> enumeration list exceeded two score and ten! Judging from the mail it
S> won't be too long.
S> 
S> OK, that's the intention of this attribute, but I must remind everyone that 
S> core DDL definitions become part of the STAR restrictions and we are going
S> to look at these very carefully before including them in the initial DDL
S> core specifications. For example, the _enumeration_constraint (_construct)
S> item is as yet half-baked (as you point out) and will need much more work
S> and testing before it could be included in the core DDL definitions.

I hastened to point out to Syd that the _enumeration_constr*t was even less
than half-baked, but was intended to illustrate purely schematically how such
details might be encoded in a machine-parseable form. People will be working
on these ideas, but externally to the core DDL: they will become definite
proposals only when they have been demonstrated to work.

S> The 'date' and 'bool' conditions seem logical enough but do we really want 
S> to freeze a date construction into STAR? We have accepted this construct for
S> CIF and it seems pretty logical but are there better ones? I personally feel
S> a bit nervous about acting as God on such celestial matters. Also 'bool' is
S> interesting and logical but what about all of the existing definitions
S> involving (yes|no) -- presumably they remain as char (no-bool)!? 
S> 
S> So may this debate flourish but please take into account the global nature 
S> of these items. If the proposers of additional _type_conditions enumeration 
S> values convince Tony and me that something is a must for the core DDL, we
S> will include it in the initial publication (if they wish it). Otherwise the
S> extra enumeration values (and perhaps su/esd is in this category also) will
S> have to await extensions to the DDL for specific applications such as CIF.

A16.1 - Reopened by (18)D15.1
-----------------------------
B> David's comment on creating a date/time type is excellent. I would 
B> be happy to adopt the "yyyy.mm.dd-hh:mm:ss" form. (I will be less 
B> happy to make all the changes in the pd dictionary, but it would 
B> clearly be an improvement.)
B> 
B> I would appreciate comments on this in a rapid time frame as I would
B> like to see a decision before completing the next draft of the dictionary.
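
To show that the "yyyy.mm.dd-hh:mm:ss" form is easily machine-parseable, here
is a minimal sketch using the Python standard library. The names
CIF_DATETIME_FORMAT and parse_cif_datetime are hypothetical, and the sample
value in the note below is invented for illustration.

```python
from datetime import datetime

# strptime pattern corresponding to "yyyy.mm.dd-hh:mm:ss"
CIF_DATETIME_FORMAT = "%Y.%m.%d-%H:%M:%S"


def parse_cif_datetime(text):
    """Parse a date/time string; raises ValueError if it does not match."""
    return datetime.strptime(text, CIF_DATETIME_FORMAT)
```

A value such as 1994.03.07-14:05:09 parses to the corresponding datetime,
whereas a string with the wrong separators raises ValueError. Note that
strptime also tolerates months or days written without a leading zero, so a
stricter validator would need an additional check.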

D16.1 e.s.d./s.u.
-----------------
S> Well, David put over my point succinctly. I am not too sure how to
S> interpret Howard's reply....these recommendations are in the pipeline. If
S> these recommendations are made then it will take a considerable time for 
S> them to be adopted by the journals and the community. My inclination on seeing
S> these comments is to leave su/esd out of the core DDL entirely, or at least
S> until the matter is resolved.

D17.2 Revised DDL
-----------------
B> For the record I am quite comfortable with Brian's <iucr/mm/restr.lst> 
B> syntax as is. But I am still concerned about namespace uniqueness, unless
B> we set rules for naming. My reason for including a date/time in the file 
B> name was not for version tracking but to better ensure uniqueness. The
B> name <private/xplor/defaults> can probably be counted on as being unique 
B> but how about <private/smith/defaults> or <private/brown/defaults>?
B> (Interesting question: what is the most common surname for
B> crystallographers, stay tuned for the electronic world directory).

(Currently "Wang", with "Tanaka" in second place - but we haven't yet got
the UK and USA entries.)

B> Perhaps we can let the uniqueness issue go for include files, at least
B> for now, but I do need to resolve it very soon for inter-file references 
B> in cifdic.P94 (e.g. _pd_dataset_id). It would be nice to have a single 
B> method for addressing uniqueness for both applications.
B> 
B> At a minimum, I would like to see two concepts implemented inside
B> include files:
B>   (1) a beginning of file marker
B>   (2) a file name label
B> The reason for (1) is that if CIFs will be concatenated, it will be 
B> necessary to have a standardized (non-comment) mechanism for determining
B> where files begin. (2) allows the file name to be exchanged. Note that 
B> both of these ideas could be implemented by requiring a line such as 
B>    _file_name  iucr/mm/restr.lst
B> as the first non-comment line in the file to be included.
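
The suggestion above could be checked mechanically along the following
lines. This is only a sketch: the function name include_file_name is
hypothetical, and it assumes that comments begin with '#', that blank lines
may precede the label, and that the file name is a single unquoted token.

```python
def include_file_name(text):
    """Return the _file_name value if it is the first non-comment line.

    Returns None when the first non-comment, non-blank line is anything
    other than a "_file_name <value>" pair.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and '#' comments
        parts = stripped.split()
        if len(parts) == 2 and parts[0] == "_file_name":
            return parts[1]
        return None  # some other item came first
    return None  # file held only comments or blank lines
```

Given a file that opens with a comment and then the line
"_file_name  iucr/mm/restr.lst", the sketch returns iucr/mm/restr.lst; if any
other dataname comes first it returns None.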

S> In short I am unconvinced by any of the arguments for complex filename
S> specifications to appear in the core DDL. Everyone has different views on 
S> how this should be done -- and this is usually the telltale sign that it
S> must be kept simple in the general definition. If specific applications wish
S> to introduce their own special machine-specific site-specific constructs, 
S> so be it. The _include_file value will be a character string -- no special 
S> constructs whatsoever in the core DDL specifications.
S> 
S> Unless there are convincing arguments to the contrary, _file_version_id will
S> be included in the DDL. Brian T may not be pleased with my lack of
S> imagination with the filenames, but he must poutingly accept that I did
S> pick up on his plea for tighter identification of the included file! 

Regards
Brian