Discussion List Archives

(71) CIF Core 2.0 and SHELXL; Review of COMCIFS

  • To: COMCIFS@iucr.ac.uk
  • Subject: (71) CIF Core 2.0 and SHELXL; Review of COMCIFS
  • From: bm
  • Date: Mon, 23 Jun 1997 14:40:56 +0100

Dear Colleagues

I still have a number of past submissions on various items to roll out, and
plan to do so over the next few weeks so as not to swamp us with too much at
once. If anyone feels there is something important that I'm overlooking,
please let me know. This time round, I have comments on the implementation
of the new core dictionary, and a first raft of remarks on the review of our
future directions called by David. I would recommend that all our listeners
contribute to this discussion if they have anything to offer, even if - or
especially if! - they don't think this is at all the right way to do things.


D71.1 Implementation in SHELXL-97 of core dictionary version 2
--------------------------------------------------------------
My apologies to George Sheldrick for the delay in posting his comments
below, which arrived as we were trying to tie up the powder and mmCIF
dictionaries. They serve as a useful summary of the issues that arise in
translating our formal definitions to a set of working data identifiers in
production code.
 
G> I have now released a new version of my programs - I hope for the last time
G> this millennium - and thought that you and COMCIFS might be amused by my
G> experiences in implementing the January 1997 Acta requirements for CIF
G> authors.
G> 
G> Firstly, I should say that checkcif@iucr.ac.uk is an excellent facility that
G> made my task much easier.  
G> 
G> SHELXL-97 should generate all 'essential' CIF items except those that begin
G> with _publ, using the new synonyms etc. You will know that the new program
G> has been used by _audit_creation_method 'SHELXL-97'.  I prefer this to
G> referring to a dictionary that I have no control over (the user is also at
G> liberty to use a dictionary of his choice when he adds the _publ items
G> etc.).
G> 
G> The clarification of which reflections were used for refinement and which
G> were used for calculating an "observed" R is a welcome improvement (I
G> suggested something like this to Syd when the original core was
G> crystallizing out). A minor side-effect is that I have changed
G> _refine_ls_wR_factor_obs to _refine_ls_wR_factor_gt (for consistency)
G> although the former but not the latter is in your current dictionary,
G> because we need this value for our own purposes. I suggest that you
G> include it so that checkcif does not confuse users with a warning.

I'm not sure at the time of writing whether the lack of any reference to
_refine_ls_wR_factor_gt is merely an oversight; it is now time to gather
together and implement the extensions to core version 2.0 that were implied
by the new data names introduced in the Acta C Notes for Authors, and I
shall look carefully at whether this should also go in.
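
For software that has to read files written both before and after such a
renaming, one option is an alias table that maps the new name onto its
older form. The following is a minimal sketch in Python, not a description
of how checkcif or SHELXL-97 behaves. The table ALIASES, the function
lookup and the numerical values are hypothetical; only the two data names
come from George's message. CIF data names are case-insensitive, so the
sketch compares them in lower case.

```python
# Hypothetical alias table: preferred name -> older names to fall back on.
# Only this one pair is taken from the discussion above.
ALIASES = {
    "_refine_ls_wR_factor_gt": ["_refine_ls_wR_factor_obs"],
}


def lookup(cif_items, name):
    """Return (value, name_found) for `name`, trying older aliases in turn.

    `cif_items` is assumed to be a plain dict of data name -> value that
    some CIF parser has already produced. Returns (None, None) if neither
    the name nor any of its aliases is present.
    """
    # Data names are case-insensitive, so normalise the keys once.
    items = {key.lower(): value for key, value in cif_items.items()}
    for candidate in [name] + ALIASES.get(name, []):
        if candidate.lower() in items:
            return items[candidate.lower()], candidate
    return None, None


# Illustrative values only; they are not taken from any real refinement.
old_style = {"_refine_ls_wR_factor_obs": "0.0912"}
new_style = {"_refine_ls_wR_factor_gt": "0.0850"}
```

A reader built this way can report which form of the name it actually
found, so that a warning can still be raised for the older form without
rejecting the file.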

G> The inclusion of completeness is also highly desirable. I was surprised that
G> it is defined in terms of theta instead of resolution. What does one do with
G> Laue data or data collected at two different wavelengths? In any case, for
G> synchrotron data (which we regularly employ for papers published in Acta)
G> resolution (in Angstroms) is more instantly meaningful than theta.
G> SHELXL-97 calculates the completeness exactly by counting reflections, using
G> the same cell for the refinement data and theoretically possible data,
G> leaving out systematic absences (and 0,0,0) in both cases. I have
G> assumed Friedel's law for the purposes of this calculation although there
G> are cases (e.g. absolute structure determination) where the completeness
G> without assuming it would also be of interest.
G> 
G> _atom_site_disorder_assembly is also generated automatically, and includes
G> in such an assembly an atom directly bonded to atoms in the various
G> disordered components even if it itself has full occupancy. 
G> 
G> The new hydrogen bond CIF items can be generated by SHELXL-97 if the author
G> requests them. 
G> 
G> The comments produced by checkcif are now a little out of phase; you need
G> to add '_type' to '_diffrn_measurement_device' and change 'obs' to 'ref'
G> in '_refine_ls_wR_factor_obs'.  Incidentally the correct value of 
G> '_diffrn_detector_area_resol_mean' is 8.192 for the standard 1K Siemens'
G> SMART detector; for most image plates (that employ a spiral scan) this
G> item is virtually irrelevant since the data have been interpolated anyway.

Thank you for the indications of ways in which checkcif needs to be brought
up to date.
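
The completeness calculation George describes (counting reflections
against the theoretically possible set, omitting 0,0,0 and systematic
absences, and assuming Friedel's law) can be illustrated with a short
Python sketch. This is not SHELXL-97's code. It assumes a cell with
orthogonal axes, merges Friedel pairs only (no further Laue symmetry), and
takes systematic absences from a caller-supplied predicate; all function
names are hypothetical. The first function converts a theta limit to a
resolution limit through Bragg's law, d = lambda / (2 sin theta), which is
the link between the two ways of quoting the limit.

```python
import math
from itertools import product


def d_min_from_theta(theta_deg, wavelength):
    """Bragg's law: resolution d (same units as wavelength) at angle theta."""
    return wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def friedel_key(hkl):
    """Map hkl and its Friedel mate -h,-k,-l onto one representative."""
    inverse = tuple(-index for index in hkl)
    return max(hkl, inverse)


def possible_reflections(cell, d_min, is_absent=lambda hkl: False):
    """Unique reflections with d >= d_min for an orthogonal cell (a, b, c).

    0,0,0 and anything flagged by `is_absent` are left out, and Friedel
    pairs are counted once.
    """
    a, b, c = cell
    limits = [int(edge / d_min) for edge in cell]
    unique = set()
    for hkl in product(*(range(-n, n + 1) for n in limits)):
        if hkl == (0, 0, 0) or is_absent(hkl):
            continue
        h, k, l = hkl
        inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
        # Small tolerance so reflections exactly on the limit are kept.
        if inv_d_squared <= 1.0 / d_min ** 2 + 1e-12:
            unique.add(friedel_key(hkl))
    return unique


def completeness(measured, cell, d_min, is_absent=lambda hkl: False):
    """Fraction of the possible unique reflections present in `measured`."""
    possible = possible_reflections(cell, d_min, is_absent)
    observed = {friedel_key(tuple(hkl)) for hkl in measured} & possible
    return len(observed) / len(possible)
```

Because the same cell and the same absence rule are applied to both the
measured and the possible sets, the ratio is an exact count rather than an
estimate. Dropping the call to friedel_key would give the completeness
without assuming Friedel's law, which George notes is also of interest.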

G> One matter which did not cause me any problems with SHELXL-97 but did screw
G> up my own very primitive program for making tables from CIF files called
G> CIFTAB was the surprise increase in the allowed length of CIF identifiers
G> from the previous length of 32 bytes
G> ('_diffrn_measured_fraction_theta_full')!
G> Maybe you need to change the value quoted on the IUCr CIF www page! 

Again, thanks for drawing attention to the fact that this is not
sufficiently well documented (it is discussed on the CIF home page under the
heading "Current CIF Dictionaries", but obviously this wasn't sufficient).
An important step that I shall look into shortly is the most efficient way
of establishing an alerting service and discussion forum for software
developers; suggestions on this are welcome. It's just possible I may have
time to put something together before the St Louis meeting.
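
Until such a service exists, a developer can at least check a file for
data names longer than an older program expects. The Python sketch below
is a deliberately simple illustration: it assumes that each data name is
the first token on its line and makes no attempt to skip text fields or
quoted strings. The function name long_data_names and the sample value are
hypothetical; the 32-character figure and the example data name are the
ones George quotes.

```python
def long_data_names(cif_text, limit=32):
    """Return (name, length) for each data name longer than `limit`.

    Simplification: only the first whitespace-delimited token of each
    line is examined, and it is treated as a data name if it starts
    with an underscore.
    """
    found = []
    for line in cif_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].startswith("_") and len(tokens[0]) > limit:
            found.append((tokens[0], len(tokens[0])))
    return found


# Illustrative fragment; the numerical values are made up.
sample = """\
_cell_length_a                        10.000
_diffrn_measured_fraction_theta_full  0.98
"""

for name, length in long_data_names(sample):
    print(f"{name}: {length} characters")
```

Run on the fragment above, the check flags
_diffrn_measured_fraction_theta_full, which is 36 characters long.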

G> I overlooked one of the CIF format changes, and so many people have
G> now installed SHELX-97 that it is too late for me to change it. 
G> I used the old form of '_refine_ls_weighting_scheme' and 
G> had overlooked that (for no reason sufficiently fundamental to compensate
G> for the confusion and effort by everyone who has to change their programs)
G> you have replaced it by two separate items.  I'm afraid that you will have
G> to allow your software to continue to accept the old format.

Again I feel that our documentation of changes to the dictionaries is not as
good as it should be, and this is another issue I wish to address in the
course of renovating the CIF WWW pages. (Any volunteers for extending the
web documentation will of course be greeted with open arms.)

G> Details of the new SHELX release are available on
G> http://linux.uni-ac.gwdg.de/SHELX/

========================

Current discussions
-------------------

D70.1 Review of COMCIFS
-----------------------
 From the Chairman of the EPC (Howard Flack) - an observer.

H>   The observer status for COMCIFS has proved useful to the chairman of
H> the EPC. This has enabled me to see clearly in what direction COMCIFS
H> work was proceeding in its relation to electronic publication and to aid
H> directly in the public announcement of new "products". I would not like
H> ex-officio to be promoted into a status where I had voting rights. The
H> responsibility for a complete study of all COMCIFS documents which might
H> be of only cursory concern to the interests of the EPC would be excessive. 


 From Phil Bourne, one of our consultants and a long-time promoter of
CIF across the field of structural biology:

P> Dear Brian: I have read David's position paper and rather than take issue
P> at this time I would like to raise an additional item to be
P> included in the discussion. 

Fine, though now is precisely the time to take issue with any other
suggestions or proposals that might determine the future shape of the
committee...

P> STAR and DDL2 provide a mechanism for maintaining data which could have
P> little or nothing to do with crystallography. Specifically, Mike Gribskov
P> and I have developed dictionaries covering protein and DNA sequences and
P> Chris Smith and I have developed a dictionary covering some aspects of
P> enzymology. Together we have developed a further small features dictionary
P> to describe any comparative feature of interest, for example, features of
P> a multiple sequence alignment, structural features associated with a
P> common catalytic core found in multiple proteins etc. Taken together we
P> are trying to use these to describe data we have been collecting on the
P> protein kinase family of enzymes - 30+ structures, 2000 sequences etc. 
P> 
P> Issues that come to mind: 
P> 
P> 1. Should COMCIFS have any role in these dictionaries?  2. Is COMCIFS or
P> the IUCr ever likely to curtail our use of STAR/DDL2 for non-standard
P> dictionaries?  3. If for-profit organizations were to access a database
P> based on STAR and were charged for the privilege, or paid to have such a
P> database locally and defined their own additions to the dictionary, what
P> are the implications re the copyright and patent held by the IUCr? 
P>   
P> The dilemma is on one hand there is a need to maintain a high standard,
P> and on the other, the more disciplines that use the standard, the more
P> software will be developed and the more interoperable and comparable will
P> be data from diverse sources. 
P> 
P> In short, should the role of COMCIFS be to recommend to the IUCr how
P> STAR/CIF should be used and thus lead to a clear policy statement made by
P> the IUCr about the general use of STAR/CIF by the scientific community at
P> large? Comments?

This from Syd Hall, full member of COMCIFS, progenitor of STAR and coauthor
of the original core dictionary:

S> I would prefer to address sections 7 and 8 with a brief summary of 
S> my views on how these should operate in the future.
S> 
S>                      Future Membership
S> 
S> I believe that COMCIFS should be composed of voting MEMBERS (max 6)
S> and non-voting OBSERVERS (unlimited).
S> 
S> COMCIFS is an "expert" cross-disciplinary committee which acts as
S> a watchdog for the IUCr on definitions and standards for the 
S> electronic exchange and archiving of data.
S> 
S> I believe that it should be structured around an executive appointed
S> by the IUCr EC composed of a Chair, Vice-chair and Secretary. This
S> executive triumvirate would nominate and recommend the three additional
S> members based on their expertise and experience in CIF data handling.
S> These nominations would need to be approved by the EC, as per the
S> membership of JComm. 
S> 
S> The COMCIFS membership should be reviewed every 3 years at the IUCr
S> congress EC meetings (perhaps allowing for adequate overlap).
S> 
S> Observers would be appointed by COMCIFS because of specific tasks 
S> (e.g. dictionary development) or through nomination by Commissions etc.
S> 
S> The Chair of the IUCr Data Base Committee should be, at the very least,
S> automatically appointed as an Observer (perhaps even the Vice-Chair
S> of COMCIFS).
S> 
S> 
S>                  Future Mode of Operation
S> 
S> I would like to see a more structured approach to future dictionary
S> approvals. I suggest that a three tier system would be far more 
S> efficient than the present arrangement.
S> 
S>     COMCIFS
S> 
S>       project sub-committees
S> 
S>          project working parties
S> 
S> 
S> As an expert committee one of COMCIFS's prime functions must be to 
S> recommend policy to the EC and keep it informed of ongoing
S> developments in the field. It must accept, by a majority vote, all issues
S> (new dictionaries, revisions, etc.) before these are sent to the EC
S> for final approval.
S> 
S> Below COMCIFS there should be sub-committees appointed to handle the
S> detail of specific projects. This should be a limited number of people
S> (say 3-4) composed of Members and Observers of COMCIFS with specific
S> expertise and knowledge in the project topic. The function of the 
S> sub-committee will be to interact closely with the chair of the 
S> working party to ensure that any submission meets all of the formal
S> requirements. It will also offer expert guidance to working parties
S> who have, for instance, limited knowledge of how the DDL is used.
S> Most importantly, material will not be submitted to COMCIFS until 
S> the sub-committee agrees that it is ready to be voted on.
S> 
S> Members of project working parties are specialists in the assigned
S> task (e.g. new dictionaries, software). Their formation may be at the 
S> instigation of COMCIFS, or a Commission (after consultation with
S> COMCIFS). The chair of a working party will coordinate the project
S> and be responsible for involving as many workers as he/she deems
S> necessary. The chair would be appointed by the initiating body.
S> The COMCIFS sub-committee interacting with the working party would
S> be appointed by COMCIFS after the formation of the working party,
S> and normally the chair of the working party would be appointed as
S> an Observer on COMCIFS.
S> 
S> The intention of this structure is that there be a clearer delineation
S> of responsibilities and expertise. The majority of detailed communications
S> (at the COMCIFS level) would involve the sub-committees only. This will
S> reduce the very heavy workload currently being borne by the Secretary
S> and Chair of COMCIFS, and only involve those who have specific expertise
S> in the project. The current modes of communication are wasteful at 
S> several levels, and this tends to discourage rather than encourage 
S> detailed interaction.
S> 
S> Hope that these suggestions will be of some use.

Regards
Brian