Discussion List Archives


Re: [ddlm-group] Summary of proposed CIF syntax changes

Thank you for this David; it clarifies (for me at least) the relationship between CIF2 syntax and DDLm.
Furthermore, it highlights the real issues involved in realizing its potential (far better than my unqualified comments about
how people might react to reading a document that announces a new CIF2 specification).

Cheers

Simon




From: David Brown <idbrown@mcmaster.ca>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Tuesday, 8 December, 2009 17:01:09
Subject: Re: [ddlm-group] Summary of proposed CIF syntax changes


The only thing that will not be possible when DDLm dictionaries are available is for legacy software to read CIF2 data files that contain advanced features.  It should be possible to read some CIF2 data files with legacy programs if the files are properly written.  Specifically, the underscores and periods in the data names would have to be written according to the DDL1 or the DDL2 conventions, a few data names would have to be avoided, and of course the advanced features such as arrays would not be available.  In any case the legacy software would not be designed to read arrays and would have to be modified, after which it would no longer be legacy software.  However, in DDL1 and DDL2 dictionaries arrays are defined by their individual elements and these are also available in DDLm, so any DDLm application should be able to write a CIF2 data file that could be read by legacy software.  The file would have to be carefully constructed, and would differ depending on whether the software was written for DDL1 or DDL2.  But this CIF2 data file, like the software reading it, would not use any of the advanced DDLm features.  So if one is interested in advanced features one should stay with DDLm.

Going the other way, if it is to be compliant with the basic requirements, _any_ CIF2 application will be able to read transparently any version of CIF data file.  Thus if you have a CIF1 data file (either DDL1 or DDL2 compliant) there should be no difficulty in reading it with a properly constructed DDLm dictionary application, and all the advanced features would be available.  If you insist on using advanced features of CIF2 data files, then write CIF2 data files, but recognise that any legacy software will not be able to use the advanced features.  If necessary, update the legacy software to be fully CIF2 compliant.

The conversion of all the CIF dictionaries to DDLm will happen slowly.  Syd has written a program to convert DDL1 and DDL2 dictionaries into DDLm dictionaries, but that is only a first step in the conversion.  Clearly a dictionary so converted will contain none of the desirable DDLm features, so these all have to be put in by hand.  This is not a trivial exercise and one might as well do the job properly rather than mess about with a number of home-made partial conversions.  It is possible that imgCIF will convert to CIF2 early, and there may be some problems with using these imgCIF2 data files in a DDL2 environment, so that something like a local CIF1.5 may be useful until the rest of the world catches up.  However, I do not see the small cell community rushing to convert their software to DDLm, so they will continue to submit structures to Acta Cryst in CIF1 format for some time to come.  This does not stop the editorial office from converting their software to CIF2 in order to take advantage of the new features of DDLm.  I am not familiar with the situation in the macromolecular world, and I don't sense any urgency to convert the DDL2 dictionaries to DDLm.  When the time comes to do so, it will be _much_ more efficient to do a clean conversion than to muddy around with partially converted dictionaries that may not be able to read a particular DDLm compatible data file even when it is correctly constructed.

I am currently converting the core dictionary to DDLm and I am making some progress, but the current discussion has put that project on hold, and even when I get back to the job, there are lots of questions as to what should go in the definitive approved dictionary.  How many methods should we have for any given item and how do we make sure that the methods are compatible?  How do we allow the user to signal which method to use if more than one is allowed, and if it is not, which method do we include?  There are many other similar questions that need to be aired before we can produce the definitive dictionary.  Of course anyone can write their own dictionary, but I would argue against trying to write hybrid dictionaries.  The aliases in DDLm provide all the contact one needs between the old and the new data files.

      -----------MORE BELOW, KEEP ON SCROLLIN'--------------



Joe Krahn wrote:
Obviously, the right thing to do for implementing CIF2 features is to 
require a full dictionary update/conversion/replacement with a DDLm
dictionary.

However, it is useful to begin implementing some CIF2 features, partly
to demonstrate to the CIF user community that it is useful, and worth
the effort to switch to CIF2. In that case, some sort of intermediate
code can be used, which might be called CIF1.5, CIF2-alpha, or anything
else you want, but does not really need to be defined in the standards.

Using a hybrid CIF1.5 as a way of demonstrating the virtues of CIF2 would be a disaster.  The first time a user tried to read a hybrid data file with an incompatible pseudo-DDLm dictionary you would have lost your first convert.  In any case why try to convert people?  If you have a good product that offers advantages over the old, people will discover it and convert in their own good time.  It is easy to do more harm than good by trying to thrust this down unwilling throats.  The important thing is to have a good clean product that we know works consistently according to some clear philosophy which remains to be developed.

It might be useful to have a small utility (in Perl or Python) to 
auto-generate a DDLm dictionary with default CIF2 name aliases, such as:

"-" -> "_"
"/" -> "_over_"
"[]" -> "_array_"

Then you can derive a quick-and-dirty CIF2 dictionary, rather than
quick-and-dirty CIF2 code.
The items that contain these symbols are aliased and have perfectly acceptable DDLm names and these are all listed in the DDLm dictionaries.  However, if you want to write such a utility, you might as well code it to replace the offending CIF1 names by the correct DDLm name and have a fully compliant CIF2 data file rather than a quick and dirty one.  There are probably not more than a dozen names that need conversion.  And why do you want to write a quick and dirty dictionary when we will have a clean and permanent one available?
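
For reference, the substitution table Joe suggests above can be sketched in a few lines of Python. This is only an illustration of the literal rules as quoted; the function names are made up for the example, and the generated names are not the official DDLm names, which are the ones listed in the DDLm dictionaries.

```python
# Joe's suggested substitutions, applied literally and in this order.
SUBSTITUTIONS = [("[]", "_array_"), ("/", "_over_"), ("-", "_")]


def cif2_safe_name(cif1_name):
    """Apply each substitution to a CIF1 data name (hypothetical helper)."""
    name = cif1_name
    for old, new in SUBSTITUTIONS:
        name = name.replace(old, new)
    return name


def build_aliases(cif1_names):
    """Map each CIF1 name that needs changing to its generated name."""
    aliases = {}
    for name in cif1_names:
        safe = cif2_safe_name(name)
        if safe != name:  # names without offending characters need no alias
            aliases[name] = safe
    return aliases


names = [
    "_symmetry_space_group_name_H-M",
    "_refine_ls_shift/esd_max",
    "_refine_ls_class_[]",
    "_cell_length_a",
]
for old, new in build_aliases(names).items():
    print(old, "->", new)
```

Applied literally, the "[]" rule leaves a doubled and a trailing underscore in the last name, which is one reason to prefer the names already assigned in the DDLm dictionary over generated ones.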

   -------------STILL MORE BELOW -------------------

Joe


SIMON WESTRIP wrote:
When I said that the 'dictionaries will need to be re-written' I meant that
if anyone wanted to start using e.g. the list structures, they would
most likely
not be able to within the confines of the dictionaries that they
currently use.
For example, if I wanted to write a dictionary extension to any of the
DDL1 dictionaries currently used by Acta C, I would not be able to write
it in CIF2
because the DDL1 dictionaries violate CIF2?
That is, as far as I can see, we cannot make any use of the new useful
features of
CIF2 when working with CIF1-based CIFs and dictionaries until there are CIF2
versions of those dictionaries?
That is exactly why I am trying to convert the core dictionary to CIF2 as fast as is reasonably possible given that we can't even agree on the syntax.  One cannot even write an extension without knowing the syntax.  And as I indicate above, unless you have some very limited purpose in mind, as soon as you start writing the extension, many more questions suddenly appear, and what looked like a quick shortcut leaves you mired in unresolved questions.  You may eventually achieve a working dictionary, but it may not work well and may be impossible to extend.  And it will have wasted many (wo)man-hours of labour that could have been profitably spent in doing the job right the first time.

Let us not lose sight of the eventual goal in our enthusiasm for quick fixes that are really no fixes.

David

   -----------END OF DAVID'S CONTRIBUTION--------------------


Cheers

Simon

------------------------------------------------------------------------
*From:* David Brown <idbrown@mcmaster.ca>
*To:* Group finalising DDLm and associated dictionaries 
<ddlm-group@iucr.org>
*Sent:* Monday, 7 December, 2009 19:06:03
*Subject:* Re: [ddlm-group] Summary of proposed CIF syntax changes



Joe Krahn wrote:
Aren't the following in the core dictionary, and need changing?

_symmetry_space_group_name_H-M
_refine_ls_shift/esd_max
_refine_ls_class_[]

As I suggested below, CIF2 code should allow CIF1 names, possibly with
warnings, and just exclude them from dREL, unless they can be mapped
to dictionary aliases.

It is true that these names, particularly the last one, contain 
characters that are not permitted in CIF2. However, a CIF1 data file
will always require a CIF1 lexer for which these data names present no
problem. If it is to be backwardly compatible, any application designed
to make use of DDLm dictionaries will require a CIF1 as well as a CIF2
lexer. Which lexer is used depends on the presence or absence of the
magic code at the beginning of the file. The lexer then passes these
names and values (as undifferentiated strings) to the DDLm dictionary
where the first task on receiving any file from the CIF1 lexer is to
check the data names (which can be recognized from their leading
underscore) against the DDLm data names and their aliases. As soon as a
match is found, the official DDLm name is substituted for the original
CIF1 data name (in the few cases where this is necessary), and from here
on the data file is a conforming CIF2 data file to which methods can be
applied without any concern for the original format of the data file.
Note that DDL1 and DDL2 dictionaries are never used or consulted in this
process. The only dictionary used is one written in DDLm, so no changes
are needed to DDL1 and DDL2 dictionaries.
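
The flow described above (pick a lexer from the magic code, then replace aliased CIF1 names with the official DDLm names) might look roughly like the following Python sketch. It rests on several assumptions: the magic code is taken to be `#\#CIF_2.0` on the first line, the alias table is a hand-written stand-in for what a real application would load from the DDLm dictionary, and the DDLm names on the right-hand side are placeholders rather than names taken from any approved dictionary.

```python
# Assumption: a CIF2 file announces itself with this code at the very start.
CIF2_MAGIC = "#\\#CIF_2.0"

# Hypothetical alias table: CIF1 data name (lower-cased) -> DDLm name.
# The DDLm names here are placeholders for illustration only.
ALIASES = {
    "_symmetry_space_group_name_h-m": "_space_group.name_h_m_alt",
    "_refine_ls_shift/esd_max": "_refine_ls.shift_over_esd_max",
}


def choose_lexer(text):
    """Return 'CIF2' if the file starts with the magic code, else 'CIF1'."""
    return "CIF2" if text.startswith(CIF2_MAGIC) else "CIF1"


def resolve_names(items, aliases):
    """Substitute the official DDLm name wherever an alias matches.

    `items` maps data names to their values as undifferentiated strings,
    as handed over by the lexer. Names with no alias pass through unchanged.
    """
    resolved = {}
    for name, value in items.items():
        if not name.startswith("_"):
            raise ValueError("not a data name: " + name)
        resolved[aliases.get(name.lower(), name)] = value
    return resolved


lexer = choose_lexer("data_example\n_cell_length_a 5.43\n")
items = {"_refine_ls_shift/esd_max": "0.001", "_cell_length_a": "5.43"}
print(lexer, resolve_names(items, ALIASES))
```

Note that only the DDLm-side table is consulted; nothing in the sketch reads a DDL1 or DDL2 dictionary.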

Of course if someone decides to use a hybrid CIF1.5 format, a whole new
set of problems arises. But I suggest that we first of all agree on
CIF2, and then, if necessary, consider what CIF1.5 might look like, but
my view is that CIF1.5, if used at all, should be considered as a
temporary non-conforming standard that should not be used for archival
purposes. Its main use, as I gather from Herbert, is to allow
hand-entering of vectors and arrays, a use that is probably restricted
to one or two specialized situations. In most cases this information
will be generated by computer, and DDLm dictionaries already have
methods to convert the array elements defined in DDL1 and DDL2
dictionaries into the arrays that would normally be used by DDLm.
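
As a rough illustration of that last point, assembling an array from individually defined elements could be sketched in Python as below. The helper name is hypothetical, the values are assumed to be plain numbers with no standard uncertainties attached, and real DDLm dictionaries express such conversions as dictionary methods rather than as Python code.

```python
def assemble_vector(row, element_names):
    """Collect separately named elements of one row into a single list."""
    return [float(row[name]) for name in element_names]


# One atom site with its coordinates stored as three separate items.
row = {
    "_atom_site_fract_x": "0.25",
    "_atom_site_fract_y": "0.5",
    "_atom_site_fract_z": "0.75",
}
xyz = assemble_vector(
    row,
    ["_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z"],
)
print(xyz)
```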

David


Joe


David Brown wrote:
Simon,

I am not sure what changes are needed in CIF1 dictionaries. I would be
interested to know since any changes have to be passed through the
coreCIF Dictionary Maintenance Group that I chair. It is my
understanding that no changes are needed, and if they are they must be
changes that do not invalidate the reading of any of the archive.

David

SIMON WESTRIP wrote:
I understand the name alias approach - what I was trying to highlight is
the fact that current dictionaries will need to be re-written and this
in itself might be more of an issue when selling CIF2 than the fact
that commas
as list separators could be on the table.

Cheers

Simon

------------------------------------------------------------------------
*From:* Joe Krahn <krahn@niehs.nih.gov>
*To:* Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
*Sent:* Friday, 4 December, 2009 20:23:09
*Subject:* Re: [ddlm-group] Summary of proposed CIF syntax changes

SIMON WESTRIP wrote:
I agree that a "rationale for all of the quotation rule
changes" might be welcome - I can imagine that at first glance many
people
will wonder what the """ and ''' are for.

I'm not sure that hinting that comma-separated lists
are also a possibility is going to help matters?
My willingness to support commas is partly because Herbert finds it
useful, and has already implemented it. Maybe the comma-delimited
variant can be useful as a CIF 1.5 transitional form?

After all, when it comes down to it, until there are
dictionaries that comply to CIF2, many disciplines
that already make use of CIF will find it difficult to
adopt CIF2 because their current dictionaries will be invalidated by
the restrictions on the dataname character set?
Name changes are not uncommon, at least for mmCIF. Hopefully, dictionary
aliases will ease the conversion. It would also help if early CIF2
software allowed CIF1 names within the CIF2 syntax, with
warnings, and just excluded them from dREL.

Joe
Cheers

Simon



------------------------------------------------------------------------
*From:* Joe Krahn <krahn@niehs.nih.gov>
*To:* Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
*Sent:* Friday, 4 December, 2009 17:49:01
*Subject:* Re: [ddlm-group] Summary of proposed CIF syntax changes

The summary did not include a rationale for all of the quotation rule
changes, which is the area that makes the least sense to me.

The section defining the rationale for not allowing lexical characters
outside the 7-bit range (the first Reasoning paragraph) might mention
that it affords faster parsing by deferring any UTF-8 conversions.
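
The parsing point can be illustrated with a small Python sketch. It relies on one property of UTF-8: every byte of a multi-byte character is 0x80 or above, so an ASCII delimiter byte can never occur inside one. The sketch splits on ASCII whitespace only and ignores quoting and comments, so it is not a CIF lexer; the function names are made up for the example.

```python
def split_tokens(raw):
    """Split raw file bytes on runs of ASCII whitespace, without decoding."""
    # bytes.split() with no argument treats only ASCII whitespace as a separator.
    return raw.split()


def decode_value(token):
    """Decode a single token only when its text is actually needed."""
    return token.decode("utf-8")


raw = "_chemical_name_common café".encode("utf-8")
tokens = split_tokens(raw)
print(tokens[0], decode_value(tokens[1]))
```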

I see that the commas were left out of the list syntax. It may be good
to put a short paragraph about the alternative comma-delimited syntax,
so that other people reviewing the proposal have a chance to comment.

Thanks,
Joe Krahn
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

