Discussion List Archives

Re: Draft and analysis of proposed change to DDL1.4 to fix _atom_site_aniso_label

I think this would be a wonderful topic to discuss in Florence.
One issue to consider is the very important need to be able
to reliably load CIF data into a database, be able to update
the information in the database, and then be able to extract
views of the data from the database that serve the needs of
different user communities.  To this end, I would like to
suggest that we embed within the DDL structure a clear
clean path from any CIF to a fully normalized relational
database structure.  DDL2 is already very close to that ideal,
as is STARDDL.  As changes are made in DDL1, or in merging DDL1
with STARDDL to make a new-DDL, it would be helpful if we did
the same there as well.  This does not mean that every CIF has to be
decomposed into its lowest level tables, but it should mean that,
given any CIF and two new-DDL-compliant applications loading
the same CIF into a fully normalized internal database, they
will always end up with the same table structure internally,
because the relevant dictionary has made all the category
and parent-child relationships clear.
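
As a rough illustration of that point, here is a minimal Python sketch
in which the table structure is derived only from the dictionary, so
the order in which tags appear in a CIF cannot change the result.  The
DICTIONARY mapping and the table_structure function are hypothetical
names invented for this example, and the category assignments shown
are the ones proposed in this message, not those of any released
dictionary.

```python
# Hypothetical dictionary excerpt: tag -> (category, parent tag or None).
# The category assignments are illustrative assumptions, not taken from
# a released dictionary.
DICTIONARY = {
    "_atom_site_label": ("atom_site", None),
    "_atom_site_fract_x": ("atom_site", None),
    "_atom_site_aniso_label": ("atom_site_aniso", "_atom_site_label"),
    "_atom_site_aniso_U_11": ("atom_site_aniso", None),
}


def table_structure(tags):
    """Return ({category: sorted tags}, sorted (child, parent) links)."""
    tables = {}
    links = set()
    for tag in tags:
        category, parent = DICTIONARY[tag]
        tables.setdefault(category, set()).add(tag)
        if parent is not None:
            links.add((tag, parent))
    # Sort everything so the result does not depend on input order.
    ordered = {c: sorted(t) for c, t in sorted(tables.items())}
    return ordered, sorted(links)
```

Two applications that both consult the same dictionary in this way
would arrive at the same tables and the same parent-child links,
whatever order the tags were written in.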

This means that the core-cif dictionary really should have
an atom_site_aniso category and that all the atom_site_aniso
tags should be moved into that category, and we should
accept Nick's approach of allowing CIFs to be presented
with tables made up from two or more categories by joins.
There is a good example of this in the STARDDL documentation.
As long as we know the full category structure this
does no harm in the database load context and gives great
flexibility to users and application programmers in working
with denormalized versions of the data structure for the
sake of efficiency.  Yes, this does increase the burden
on CIF software library developers.  Nothing would need
to change for software writing CIFs.  Whether they write
one atom_site loop with atom_site_aniso tags mixed in
or write two separate loops, their CIFs would be valid.
But processors for CIF reading would have to accept the
possibility that they might have to normalize a combined
table by distributing its contents into multiple tables,
or, if they wanted to work with a denormalized table, that
they might have to do a join internally.  I think it is time
to take at least that much from the approach suggested
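
The two reader-side operations described above, normalizing a combined
loop and joining separate tables, might be sketched in Python roughly
as follows.  This is only an illustration under stated assumptions: the
function names are invented here, the category is guessed from the tag
prefix for brevity (a real reader would look it up in the dictionary),
and rows whose aniso values are all "." or "?" are assumed to have no
atom_site_aniso entry.

```python
def category_of(tag):
    # Assumption: category read off the tag prefix for brevity only;
    # a real reader would take it from the dictionary.
    if tag.startswith("_atom_site_aniso_"):
        return "atom_site_aniso"
    return "atom_site"


def normalize(tags, rows):
    """Split one combined loop into per-category tables keyed by label."""
    tables = {"atom_site": [], "atom_site_aniso": []}
    for row in rows:
        record = dict(zip(tags, row))
        label = record["_atom_site_label"]
        site = {t: v for t, v in record.items()
                if category_of(t) == "atom_site"}
        aniso = {t: v for t, v in record.items()
                 if category_of(t) == "atom_site_aniso"}
        tables["atom_site"].append(site)
        # Keep an aniso row only if it carries at least one real value.
        if any(v not in (".", "?") for v in aniso.values()):
            aniso["_atom_site_aniso_label"] = label
            tables["atom_site_aniso"].append(aniso)
    return tables


def join(tables):
    """Left join atom_site with atom_site_aniso on the atom label."""
    by_label = {r["_atom_site_aniso_label"]: r
                for r in tables["atom_site_aniso"]}
    joined = []
    for site in tables["atom_site"]:
        row = dict(site)
        extra = by_label.get(site["_atom_site_label"], {})
        row.update({t: v for t, v in extra.items()
                    if t != "_atom_site_aniso_label"})
        joined.append(row)
    return joined
```

A reader that wants normalized tables would call normalize on a
combined loop, while one that prefers the denormalized view would call
join on the separate tables; either way the content is the same.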

At 12:03 PM -0400 6/22/05, David Brown wrote:
>I have been reading this correspondence with interest (as have other 
>members of the group), but I did not feel that I had much to offer 
>as James and Nick seemed to be sorting things out on their own. 
>James' resolution of the problem, suggesting additions to the DDL1 
>dictionary, sounds like a good fix which certainly encapsulates the 
>essence of what we were thinking when we developed the _atom_site 
>section of the core dictionary.  The separate aniso loop was put in 
>because many people seemed to like keeping the positional and 
>displacement parameters in separate tables, and this was the 
>convention adopted by SHELX and other software, presumably in order 
>to keep each row of the table on one 80 character line.  The 
>_atom_site_aniso_label was added only because STAR did not allow 
>_atom_site_label to be repeated in the second loop.  I should point 
>out, however, that COMCIFS does not have authority to change or 
>approve DDLs, it can only approve CIF dictionaries.  I am not sure 
>who is in charge of DDLs, probably Nick and Syd.
>As Nick pointed out we developed CIF by the seat of our pants.  The 
>first CIF dictionary was conceived as a typeset printed document and 
>it was only later that it was realized that it could be typeset by 
>storing the dictionary on a computer as a STAR document.  Still 
>later it was realized that a STAR dictionary could be used to 
>validate CIFs, and even later that CIFs could 
>have a relational structure.  Thus DDL was developed on the fly to 
>accommodate CIF dictionaries that were already well developed. 
>During this period Acta Cryst. was tooling up to accept structure 
>reports in CIF and decisions had to be made quickly at a time when 
>it was impossible to foresee all the implications of what we were 
>doing.  There were also compromises that were thought necessary to 
>make CIF acceptable to the community, and it was in this spirit that 
>Acta Cryst. accepted many CIFs into its archives that were not 
>strictly CIF conformant.
>Software has taken a long time to catch up with the potential of 
>what was designed into CIF and its DDLs.  Browser-editors that 
>validate coreCIFs against the dictionary have only appeared in the 
>last couple of years, more than a decade after the release of the 
>core dictionary, and even these do not validate the relational 
>structure.  With hindsight (i.e., with ten years' experience as well as 
>the appearance of XML) it is clear that we should have done some 
>things differently, and at Florence we need to review the whole 
>question of where CIF goes from here.  We may decide that we need to 
>adopt starDDL which has been more carefully thought out, but there 
>will be a cost.  All the dictionaries will need to be revised, the 
>changes will have to be sold to the community and the trauma of 
>transition will have to be minimized.  It would, however, give us a 
>chance to get it right the second time.
>Apparently there has been a systems failure in Chester, which is why 
>there has been such a stunning silence from that quarter.
>David Brown
>cif-developers mailing list

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

               Office:  +1-631-244-3035
            Lab (KSC 020): +1-631-244-3451