
Re: [ddlm-group] Summary of proposed CIF syntax changes

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Summary of proposed CIF syntax changes
From: David Brown <idbrown@mcmaster.ca>
Date: Tue, 08 Dec 2009 12:01:09 -0500
In-Reply-To: <4B1D99F5.8090304@niehs.nih.gov>
References: <20091204093823.GA10999@emerald.iucr.org> <4B194B8D.5050806@niehs.nih.gov> <188546.55653.qm@web87008.mail.ird.yahoo.com> <4B196FAD.2040706@niehs.nih.gov> <77447.47704.qm@web87009.mail.ird.yahoo.com> <4B1D2158.8040402@mcmaster.ca> <4B1D3FC7.1000203@niehs.nih.gov> <4B1D521B.1090901@mcmaster.ca> <561185.57703.qm@web87012.mail.ird.yahoo.com><4B1D99F5.8090304@niehs.nih.gov>

The only thing that will not be possible when DDLm dictionaries are available is for legacy software to read CIF2 data files that contain advanced features. It should be possible to read some CIF2 data files with legacy programs if the files are properly written. Specifically, the underscores and periods in the data names would have to be written according to the DDL1 or the DDL2 conventions, a few data names would have to be avoided, and of course the advanced features such as arrays would not be available. In any case the legacy software would not be designed to read arrays and would have to be modified, after which it would no longer be legacy software.

However, in DDL1 and DDL2 dictionaries arrays are defined by their individual elements and these are also available in DDLm, so any DDLm application should be able to write a CIF2 data file that could be read by legacy software. Such a file would have to be carefully constructed, and would be different depending on whether the software was written for DDL1 or DDL2. But this CIF2 data file, as well as the software that was reading it, would not contain any of the advanced DDLm features. So if one is interested in advanced features one should stay with DDLm.

Going the other way, if it is to be compliant with the basic requirements, _any_ CIF2 application will be able to read transparently any version of CIF data file. Thus if you have a CIF1 data file (either DDL1 or DDL2 compliant) there should be no difficulty in reading it with a properly constructed DDLm dictionary application, and all the advanced features would be available.

If you insist on using advanced features of CIF2 data files, then write CIF2 data files, but recognise that any legacy software will not be able to use the advanced features. If necessary, update the legacy software to be fully CIF2 compliant.

The conversion of all the CIF dictionaries to DDLm will happen slowly.
Syd has written a program to convert DDL1 and DDL2 dictionaries into DDLm dictionaries, but that is only a first step in the conversion. Clearly a dictionary so converted will contain none of the desirable DDLm features, so these all have to be put in by hand. This is not a trivial exercise and one might as well do the job properly rather than mess about with a number of home-made partial conversions.

It is possible that imgCIF will convert to CIF2 early and there may be some problems with using these imgCIF2 data files in a DDL2 environment, so that something like a local CIF1.5 may be useful until the rest of the world catches up. However, I do not see the small cell community rushing to convert their software to DDLm, so they will continue to submit structures to Acta Cryst in CIF1 format for some time to come. This does not stop the editorial office from converting their software to CIF2 in order to take advantage of the new features of DDLm.

I am not familiar with the situation in the macromolecular world but I don't sense any urgency to convert the DDL2 dictionaries to DDLm. When the time comes to do so, it will be _much_ more efficient to do a clean conversion than to muddle around with partially converted dictionaries that may not be able to read a particular DDLm compatible data file even when it is correctly constructed.

I am currently converting the core dictionary to DDLm and I am making some progress, but the current discussion has put that project on hold, and even when I get back to the job, there are lots of questions as to what should go in the definitive approved dictionary. How many methods should we have for any given item and how do we make sure that the methods are compatible? How do we allow the user to signal which method to use if more than one is allowed, and if it is not, which method do we include? There are many other similar questions that need to be aired before we can produce the definitive dictionary.
Of course anyone can write their own dictionary, but I would argue against trying to write hybrid dictionaries. The aliases in DDLm provide all the contact one needs between the old and the new data files.

Joe Krahn wrote:

> Obviously, the right thing to do for implementing CIF2 features is to require a full dictionary update/conversion/replacement with a DDLm dictionary. However, it is useful to begin implementing some CIF2 features, partly to demonstrate to the CIF user community that it is useful, and worth the effort to switch to CIF2. In that case, some sort of intermediate code can be used, which might be called CIF1.5, CIF2-alpha, or anything else you want, but does not really need to be defined in the standards.

Using a hybrid CIF1.5 as a way of demonstrating the virtues of CIF2 would be a disaster. The first time a user tried to read a hybrid data file with an incompatible pseudo-DDLm dictionary you will have lost your first convert. In any case why try to convert people? If you have a good product that offers advantages over the old, people will discover and convert in their own good time. It is easy to do more harm than good by trying to thrust this down unwilling throats. The important thing is to have a good clean product that we know works consistently according to some clear philosophy which remains to be developed.

> It might be useful to have a small utility (in Perl or Python) to auto-generate a DDLm dictionary with default CIF2 name aliases, such as:
>
>   "-"  -> "_"
>   "/"  -> "_over_"
>   "[]" -> "_array_"
>
> Then you can derive a quick-and-dirty CIF2 dictionary, rather than quick-and-dirty CIF2 code.

The items that contain these symbols are aliased and have perfectly acceptable DDLm names and these are all listed in the DDLm dictionaries.
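For readers who want to see what the three substitution rules quoted above would produce, here is a minimal Python sketch. The function names `default_alias` and `build_alias_table` are hypothetical and not part of any CIF tool, and the derived names are purely mechanical: they are not the official DDLm names, which are the ones listed in the DDLm dictionaries.

```python
# Hypothetical sketch of the quoted substitution rules; not an existing utility.
# "[]" is handled first so that it is replaced as a unit.
SUBSTITUTIONS = [
    ("[]", "_array_"),
    ("/", "_over_"),
    ("-", "_"),
]


def default_alias(cif1_name: str) -> str:
    """Apply the three quoted substitution rules to a CIF1 data name."""
    name = cif1_name
    for old, new in SUBSTITUTIONS:
        name = name.replace(old, new)
    return name


def build_alias_table(cif1_names):
    """Map each derived name to the CIF1 name it would alias.

    Names that contain none of the offending characters are left out,
    since they need no alias.
    """
    table = {}
    for original in cif1_names:
        derived = default_alias(original)
        if derived != original:
            table[derived] = original
    return table


if __name__ == "__main__":
    names = ["_refine_ls_shift/esd_max", "_cell_length_a"]
    for derived, original in build_alias_table(names).items():
        print(derived, "<-", original)
```

Applied blindly, the rules can give awkward results such as doubled or trailing underscores, which is one practical reason to prefer the names already defined in the dictionaries.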
However, if you want to write such a utility, you might as well code it to replace the offending CIF1 names by the correct DDLm name and have a fully compliant CIF2 data file rather than a quick and dirty one. There are probably not more than a dozen names that need conversion. And why do you want to write a quick and dirty dictionary when we will have a clean and permanent one available?

> Joe

SIMON WESTRIP wrote:

> When I said that the 'dictionaries will need to be re-written' I meant that if anyone wanted to start using e.g. the list structures, they would most likely not be able to within the confines of the dictionaries that they currently use. For example, if I wanted to write a dictionary extension to any of the DDL1 dictionaries currently used by Acta C, I would not be able to write it in CIF2 because the DDL1 dictionaries violate CIF2? That is, as far as I can see, we cannot make any use of the new useful features of CIF2 when working with CIF1-based CIFs and dictionaries until there are CIF2 versions of those dictionaries?

That is exactly why I am trying to convert the core dictionary to CIF2 as fast as is reasonably possible given that we can't even agree on the syntax. One cannot even write an extension without knowing the syntax. And as I indicate above, unless you have some very limited purpose in mind, as soon as you start writing the extension, many more questions suddenly appear, and what looked like a quick shortcut leaves you mired in unresolved questions. You may eventually achieve a working dictionary, but it may not work well and may be impossible to extend. And it will have wasted many (wo)man-hours of labour that could have been profitably spent in doing the job right the first time.

Let us not lose sight of the eventual goal in our enthusiasm for quick fixes that are really no fixes.
David

> Cheers
> Simon

------------------------------------------------------------------------
*From:* David Brown <idbrown@mcmaster.ca>
*To:* Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
*Sent:* Monday, 7 December, 2009 19:06:03
*Subject:* Re: [ddlm-group] Summary of proposed CIF syntax changes

Joe Krahn wrote:

> Aren't the following in the core dictionary, and need changing?
>
>   _symmetry_space_group_name_H-M
>   _refine_ls_shift/esd_max
>   _refine_ls_class_[]
>
> As I suggested below, CIF2 code should allow CIF1 names, possibly with warnings, and just exclude them from dREL, unless they can be mapped to dictionary aliases.

It is true that these names, particularly the last one, contain characters that are not permitted in CIF2. However, a CIF1 data file will always require a CIF1 lexer for which these data names present no problem. If it is to be backwardly compatible, any application designed to make use of DDLm dictionaries will require a CIF1 as well as a CIF2 lexer. Which lexer is used depends on the presence or absence of the magic code at the beginning of the file.

The lexer then passes these names and values (as undifferentiated strings) to the DDLm dictionary, where the first task on receiving any file from the CIF1 lexer is to check the data names (which can be recognized from their leading underscore) against the DDLm data names and their aliases. As soon as a match is found, the official DDLm name is substituted for the original CIF1 data name (in the few cases where this is necessary), and from here on the data file is a conforming CIF2 data file to which methods can be applied without any concern for the original format of the data file.

Note that DDL1 and DDL2 dictionaries are never used or consulted in this process. The only dictionary used is one written in DDLm, so no changes are needed to DDL1 and DDL2 dictionaries.
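The two steps described above (choosing a lexer from the magic code, then substituting DDLm names for aliased CIF1 names) can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the magic code is taken to be the string `#\#CIF_2.0`, the function names are hypothetical, and the DDLm names on the right-hand side of `ALIASES` are placeholders rather than entries from any released dictionary.

```python
# Assumed magic code; the message above does not spell it out.
CIF2_MAGIC = "#\\#CIF_2.0"

# Hypothetical alias table: CIF1 data name -> DDLm data name.
# The right-hand names are placeholders, NOT real dictionary entries.
ALIASES = {
    "_symmetry_space_group_name_H-M": "_placeholder.ddlm_name_a",
    "_refine_ls_shift/esd_max": "_placeholder.ddlm_name_b",
}


def choose_lexer(text: str) -> str:
    """Pick a lexer from the presence or absence of the magic code."""
    return "cif2" if text.startswith(CIF2_MAGIC) else "cif1"


def resolve_names(tokens, aliases):
    """Substitute DDLm names for aliased CIF1 data names.

    `tokens` stands for the flat list of strings a lexer would hand over.
    Data names are recognised by their leading underscore; every other
    token is treated as a value and passed through unchanged.
    """
    resolved = []
    for token in tokens:
        if token.startswith("_"):
            token = aliases.get(token, token)
        resolved.append(token)
    return resolved


if __name__ == "__main__":
    tokens = ["_symmetry_space_group_name_H-M", "P 21/c", "_cell_length_a", "5.43"]
    print(choose_lexer("data_example\n"))
    print(resolve_names(tokens, ALIASES))
```

Only the DDLm alias table is consulted here, which mirrors the point that the DDL1 and DDL2 dictionaries play no part in the process. A real implementation would also need to handle the case-insensitivity of data names, which this sketch ignores.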
Of course if someone decides to use a hybrid CIF1.5 format, a whole new set of problems arises. I suggest that we first of all agree on CIF2, and then, if necessary, consider what CIF1.5 might look like. My view is that CIF1.5, if used at all, should be considered as a temporary non-conforming standard that should not be used for archival purposes. Its main use, as I gather from Herbert, is to allow hand-entering of vectors and arrays, a use that is probably restricted to one or two specialized situations. In most cases this information will be generated by computer, and DDLm dictionaries already have methods to convert the array elements defined in DDL1 and DDL2 dictionaries into the arrays that would normally be used by DDLm.

David

> Joe

David Brown wrote:

> Simon,
>
> I am not sure what changes are needed in CIF1 dictionaries. I would be interested to know since any changes have to be passed through the coreCIF Dictionary Maintenance Group that I chair. It is my understanding that no changes are needed, and if they are they must be changes that do not invalidate the reading of any of the archive.
>
> David

SIMON WESTRIP wrote:

> I understand the name alias approach - what I was trying to highlight is the fact that current dictionaries will need to be re-written and this in itself might be more of an issue when selling CIF2 than the fact that commas as list separators could be on the table.
>
> Cheers
> Simon

------------------------------------------------------------------------
*From:* Joe Krahn <krahn@niehs.nih.gov>
*To:* Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
*Sent:* Friday, 4 December, 2009 20:23:09
*Subject:* Re: [ddlm-group] Summary of proposed CIF syntax changes

SIMON WESTRIP wrote:

> I agree that a "rationale for all of the quotation rule changes" might be welcome - I can imagine that at first glance many people will wonder what the """ and ''' are for.
> I'm not sure that hinting that comma-separated lists are also a possibility is going to help matters?

My willingness to support commas is partly because Herbert finds it useful, and has already implemented it. Maybe the comma-delimited variant can be useful as a CIF 1.5 transitional form?

> After all, when it comes down to it, until there are dictionaries that comply with CIF2, many disciplines that already make use of CIF will find it difficult to adopt CIF2 because their current dictionaries will be invalidated by the restrictions on the dataname character set?

Name changes are not uncommon, at least for mmCIF. Hopefully, dictionary aliases will ease the conversion. It would also help if early CIF2 software allowed CIF1 names within the CIF2 syntax, with warnings, and just excluded them from dREL.

Joe

> Cheers
> Simon

------------------------------------------------------------------------
*From:* Joe Krahn <krahn@niehs.nih.gov>
*To:* Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
*Sent:* Friday, 4 December, 2009 17:49:01
*Subject:* Re: [ddlm-group] Summary of proposed CIF syntax changes

The summary did not include a rationale for all of the quotation rule changes, which is the area that makes the least sense to me.

The section defining the rationale for not allowing lexical characters outside the 7-bit range (the first Reasoning paragraph) might mention that it affords faster parsing by deferring any UTF-8 conversions.

I see that the commas were left out of the list syntax. It may be good to put a short paragraph about the alternative comma-delimited syntax, so that other people reviewing the proposal have a chance to comment.
Thanks,
Joe Krahn

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard

