Discussion List Archives


Re: [ddlm-group] CIF 1.5

Before we close off the discussion on CIF1.5, I just want to put in my final two cents' worth.

As I mentioned before, I see no point in introducing CIF1.5, which will only muddy the waters and lead to total confusion.  CIF1.5 is quite unnecessary.

1. Legacy software will not be able to read CIF1.5 any more than it will be able to read CIF2.0 files, so we might as well go directly to CIF2.0.  And how quickly will the legacy software be converted to output CIF1.5 files anyway?  You need a 10 year lead time for changes in packages such as SHELX and another decade before people upload the latest version.  Even then they will produce CIF1.1 output that they can load into their other programs.

2. For the foreseeable future, DDLm applications will have to have a CIF1.1 lexer and a preparser to convert legacy files into CIF2.0 mode.

3. What dictionaries will be used for files written in CIF1.5?  It will be difficult enough to find volunteers to convert DDL1 to DDLm, and I have no idea if there are any plans to convert DDL2 to DDLm dictionaries.  If CIF1.5 will use DDLm then why not just go straight to CIF2.0?

4. CIF2.0 datafiles can look almost exactly like CIF1.1 datafiles except for a few datanames and some undelimited data values that include forbidden characters.  Most people will not notice the difference between CIF2.0 files and regular old-fashioned CIFs.  Indeed, many CIF1.1 data files could probably be read in with CIF2.0 parsers without a problem.  The biggest problem is the DDL2 datanames that contain 'U[1][2]', but these are not found in DDL1.  Since the whole DDL2 data archive is centrally held, I assume it could be easily converted (if it was thought worthwhile).  If there are problems in DDL1 they are confined to one or two datanames.  Undelimited data values containing illegal characters could be a problem.  The CIF1.1 lexer and preparser mentioned in 2 above will deal with all of these.

5. DDLm does not require that its lists, vectors and matrices be entered as arrays.  dREL allows all of these new CIF2.0 constructs to be reconstituted from their primitives as required.
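
Point 5 can be illustrated with a minimal sketch, written in Python rather than in dREL itself. It assumes the six scalar items are named as in the core CIF dictionary (`_atom_site_aniso_U_11` and so on) and that values carry no standard uncertainties; the function name `u_matrix_from_primitives` and the sample numbers are hypothetical.

```python
def u_matrix_from_primitives(row):
    """Assemble the symmetric 3x3 U matrix from six scalar data items.

    `row` maps data names to their string values, as a CIF1.1-style
    reader might deliver them (illustrative layout, not a real API).
    """
    u = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            value = float(row[f"_atom_site_aniso_U_{i + 1}{j + 1}"])
            u[i][j] = value
            u[j][i] = value  # the tensor is symmetric
    return u


# Made-up sample values, for illustration only.
row = {
    "_atom_site_aniso_U_11": "0.031",
    "_atom_site_aniso_U_22": "0.027",
    "_atom_site_aniso_U_33": "0.040",
    "_atom_site_aniso_U_12": "0.004",
    "_atom_site_aniso_U_13": "-0.002",
    "_atom_site_aniso_U_23": "0.001",
}
U = u_matrix_from_primitives(row)
```

The data file itself stays in plain scalar form; the matrix exists only inside the application that needs it.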

The future as I foresee it will see everyone carrying on with current software and CIF1.1 datafiles as long as they want.  CIF2.0 software will be developed to take advantage of the new features, but with a CIF1.1 front end to carry out the minimal required conversion to CIF2.0, such applications will be able to read all existing and future CIFs of every stripe.  Eventually CIF1.1 legacy software will die or be converted to CIF2.0 and the rest of the world will painlessly convert to CIF2.0 data files, probably without the user even noticing.
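
One step of such a CIF1.1 front end might be sketched as follows, in Python. This is only an illustration under assumptions: it takes the characters forbidden in undelimited CIF2.0 values to be the square brackets and braces (the exact set was still being settled in this group), and it handles a single already-lexed token. The name `requote_for_cif2` is hypothetical.

```python
# Assumed set of characters not allowed in undelimited CIF2.0 values.
CIF2_RESTRICTED = frozenset("[]{}")


def requote_for_cif2(token, restricted=CIF2_RESTRICTED):
    """Return `token` unchanged if it is safe, otherwise wrap it in quotes.

    `token` is one undelimited data value as produced by a CIF1.1 lexer.
    """
    if not any(ch in restricted for ch in token):
        return token
    # Pick a quote character that does not occur in the value itself.
    for quote in ("'", '"'):
        if quote not in token:
            return f"{quote}{token}{quote}"
    raise ValueError(f"cannot quote {token!r} on a single line")


converted = [requote_for_cif2(t) for t in ["1.234(5)", "C[12]H", "a'[b]"]]
```

Values that contain no restricted characters pass through untouched, which is why most existing files would need little or no change.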

I think we are imagining monsters lurking behind trees even in a treeless desert.

CIF1.5 should be dropped and not resurrected, and I am prepared to debate this with Herbert privately (so as not to waste everyone else's time) if he is not convinced.

David



Brian McMahon wrote:
Dear Colleagues

I agree with James. The remit of this group was to finalise DDLm. An
early conclusion was that this necessarily involved syntax changes at
the STAR level, and the consequent discussions have revolved around
the idea of providing a specification for CIF (essentially at the
syntax level) that took advantage of these syntactic changes and
allowed uniform handling of CIF data files and DDLm dictionaries. For
me, the immediate benefit of these discussions has been a much more
complete account of what needs to be done upstream, at the STAR level,
to accommodate the changes that are desirable in downstream (CIF and
DDLm applications) at some point.

So, for example, the STAR spec needs formally to be revised to allow
Unicode character sets (certainly UTF-8, which is what we settled on
for CIF; as far as I recall, it's still possible that the STAR
revision could allow other Unicode encodings that Herbert needs
for imgCIF, and I'd be interested in knowing whether the new spec
could also allow the inclusion of full binary data streams so that
CBF could properly become one of the STAR family of formats). There
must also be the new delimiter characters and formal rules for
handling list items.

We've developed these conclusions by using various use cases and
Gedankenexperimente, but we've not, in the main, been driven by the
need to meet real problems currently difficult of solution in the
community. Indeed, recent work with embedded visualisation scripts and
incorporation of TeX mathematical fragments into CIFs destined for
publication in Acta show that there's much more that can still be
achieved within the existing syntactic framework.

So let us complete the job of finalising the specifications (STAR++,
DDLm, CIF2.0), and then involve the wider community in discussing
how, when and if they are to be implemented.

Brian

On Tue, Dec 01, 2009 at 02:30:09PM +1100, James Hester wrote:
Dear Herbert and colleagues,

Little quibble: I wrote 'one more type' rather than 'more than one type'.

Anyway, I suggest that we concentrate on finalising CIF2.0 syntax, then put
a draft out for discussion in the broader community, and if there is
sufficient feedback to the effect of 'we need an intermediate format', then
we can address the issue of CIF1.5.  Addressing it now distracts us from the
task of putting CIF2.0 to bed, which we will still need to do in any case.

On Tue, Dec 1, 2009 at 11:17 AM, Herbert J. Bernstein <
yaya@bernstein-plus-sons.com> wrote:

Dear James,

 Please look at the following part of your first paragraph:


"with a commitment to support CIF1.1 for the long term and a guaranteed way
to distinguish the two types of data files."

and please look at the following part of your second paragraph


"Furthermore, they now have to support one more type of file going into the
future."

I seem to be missing something.  If we are going to support CIF 1.1 for
the long term and we are going to have CIF 2 be a very different file type,
then it is not CIF 1.5 that will cause software developers to have
to support one more file type going into the future, but the fundamental
decisions made by this group.

If you support CIF 1.1 and a very different CIF 2, then you are going to
end up with mixed files, i.e. multiple ad hoc CIF 1.5 (or actually CIF
1.55) files.  All I am doing is proposing to formalize what is going to
happen anyway.

 I've had my say.


 Regards,
   Herbert


=====================================================
 Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
       Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Tue, 1 Dec 2009, James Hester wrote:

(Note to those reading this later: this continues a thread started within
the 'space as list item separator' thread.  I recommend reading those
messages before continuing on here).

(For those who came in late:
We flirted with the idea of a minimally disruptive path from CIF1.1 to
CIF2.0 back in the beginning of this group (late September/early October,
I believe), and ended up choosing to define one maximally disruptive CIF2.0
standard together with a commitment to support CIF1.1 for the long term and
a guaranteed way to distinguish the two types of data files.)
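
A minimal sketch of how software might distinguish the two types of data files, in Python. It assumes the distinguishing mark is a version comment `#\#CIF_2.0` on the first line (the convention used by the published CIF2.0 specification) and treats anything without that mark as CIF1.1; the function name `detect_cif_version` is hypothetical.

```python
CIF2_MAGIC = "#\\#CIF_2.0"  # literal text: #\#CIF_2.0


def detect_cif_version(text):
    """Return "2.0" if the file starts with the CIF2.0 mark, else "1.1".

    Files with no version comment, or with the optional CIF1.1 one,
    are treated as legacy CIF1.1 (an assumption of this sketch).
    """
    lines = text.lstrip("\ufeff").splitlines()  # tolerate a byte-order mark
    first_line = lines[0] if lines else ""
    return "2.0" if first_line.startswith(CIF2_MAGIC) else "1.1"


new_style = detect_cif_version("#\\#CIF_2.0\ndata_example\n_cell_length_a 5.4\n")
legacy = detect_cif_version("data_example\n_cell_length_a 5.4\n")
```

A reader could use the result to choose between a CIF1.1 lexer and a CIF2.0 lexer before parsing anything else.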

Picking up the CIF1.5 discussion...
Introducing CIF1.5 is a further source of confusion.  Apart from this, it
produces extra workload for software authors.  Herb has essentially defined
CIF1.5 as CIF1.1 plus new syntactical elements (or in other words CIF2.0
minus character set limitations and UTF8).  So in order to support CIF1.5,
authors of both CIF reading and CIF writing software have to add this new
syntax.  Then when they decide to support CIF2.0, they have to once again
revisit their software.  I would have thought it far more sensible to ask
them to update and distribute their software only once.  Furthermore, they
now have to support one more type of file going into the future.

I see absolutely no benefit in this idea.

On Tue, Dec 1, 2009 at 9:40 AM, Herbert J. Bernstein <
yaya@bernstein-plus-sons.com> wrote:
     Dear James,

      The point is that we will need to make it easy for people working with
     CIF 1 and CIF 1.1 based tools to cobble together valid CIF 2 data.  The
     most important bit will be a way to include vectors and matrices in
     their data.  This will allow them to do it.

      Please note that it has taken several years to just get to the point
     where we are beginning to rigorously define CIF 2.  If we are lucky, it
     will only take a few years to have a full set of tools to allow users
     and software writers to reliably produce true CIF 2 data.

      Regards,
        Herbert

     =====================================================
      Herbert J. Bernstein, Professor of Computer Science
       Dowling College, Kramer Science Center, KSC 121
            Idle Hour Blvd, Oakdale, NY, 11769

                     +1-631-244-3035
                     yaya@dowling.edu
     =====================================================

On Tue, 1 Dec 2009, James Hester wrote:

     Dear Herbert: as CIF 1.1 doesn't define lists, I'm not sure why you
     suggest that the example below is a valid tag.

     On Tue, Dec 1, 2009 at 12:36 AM, Herbert J. Bernstein
     <yaya@bernstein-plus-sons.com> wrote:
          Sorry something got lost in the prior message.  It should have
          read:

                Dear Colleagues,

                 Back to the question of commas.  If you accept the
                 desirability of having a CIF 1.5, commas in lists
                 become very useful. Someone with
                 a CIF 1.1 editor will be able to prepare a CIF 1.5 file
                 for many useful cases by doing all lists with commas
                 and no embedded blanks as long as they can make their
                 lists fit on single lines.

                 In CIF 1.1

                [[1,2,3],[4,5,6],[7,8,9]]

                is a valid value for a tag, but

                [[1 2 3] [4 5 6] [7 8 9]]
                is not.

     No, neither example is a valid CIF 1.1 tag.  CIF 1.1 explicitly
     excludes brackets as the first character of a non-delimited string.


                Having the option of commas in lists will help to smooth
                the transition for at least some people.
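
The disagreement above over the two bracketed examples comes down to the CIF 1.1 rule for non-delimited strings. A minimal sketch of that check, in Python, assuming the excluded first characters are `"`, `#`, `$`, `'`, `_`, `[`, `]` and `;` as I read the CIF 1.1 syntax. It simplifies in two labeled ways: `;` is rejected everywhere rather than only at the start of a line, and reserved words such as `data_` and `loop_` are not checked. The function name is hypothetical.

```python
# Characters assumed not to be allowed as the first character of a
# non-delimited CIF 1.1 string (simplified: ';' is rejected everywhere).
CIF11_FORBIDDEN_FIRST = frozenset("\"#$'_[];")


def is_valid_cif11_unquoted(value):
    """Check whether `value` could stand as a non-delimited CIF 1.1 value."""
    if not value or any(ch.isspace() for ch in value):
        return False  # empty, or contains blanks, so it needs delimiters
    return value[0] not in CIF11_FORBIDDEN_FIRST


with_commas = is_valid_cif11_unquoted("[[1,2,3],[4,5,6],[7,8,9]]")
with_spaces = is_valid_cif11_unquoted("[[1 2 3] [4 5 6] [7 8 9]]")
```

Under these assumptions both examples are rejected: the comma form fails on its leading bracket, and the space form fails on both the leading bracket and the embedded blanks. A value such as `abc[1]` passes, since the bracket is not its first character.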
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard


International Union of Crystallography
