Discussion List Archives

[Cif2-encoding] Splitting of imgCIF and other sub-topics

Dear Colleagues,

   James' and John's last interchange is so voluminous that I doubt any
of us has been able to fully appreciate the rich complexity of ideas
contained therein.  For example, one of the suggestions far down in
the text is:

(James now)  Indeed.  My intent with this specification was to ensure
that third parties would be able to recover the encoding. If imgCIF is
going to cause us to make such an open-ended specification, it is
probably a sign that imgCIF needs to be addressed separately.  For
example, should we think about redefining it as a container format,
with a CIF header and UTF16 body (but still part of the
"Crystallographic Information Framework")?

The idea of an imgCIF "header" in CIF format and an image in another
format is an old, well-established, thoroughly discussed, and mistaken
idea, rejected in 1998.  The handling of multiple images in a single
file (e.g. a JPEG thumbnail and crystal image and a full-size
diffraction image) requires the ability to switch among encodings
within the file -- something handled by the current DDL2 and
MIME-based imgCIF format and which would be a serious problem in CIF2
as currently proposed, increasing the chances that we will have to
move imgCIF entirely into HDF5 and abandon the CIF representation,
sharing only the dictionary and not the framework.

If you look carefully, you will see a similar trend with mmCIF, in which
an XML representation sharing the dictionary plays a much more
important role than the CIF format.

Is it really desirable to make the new CIF format so rigid and 
unadaptable that major portions of macromolecular crystallography
end up migrating to very different formats, as they already are
doing?  Yes, there is great value in having a common dictionary,
but would there not be additional value in having a sufficiently
flexible common format to allow for more software sharing than
we now have?  Is it really desirable for us to continue in the
direction of a single macromolecular experiment having to
deal with HDF5 and CIF/DDL2/MIME representations of the image data
during collection, CCP4-style CIF representations during processing
and deposition and legacy PDB and PDBML representations in subsequent 
community use?  If we could be a little bit more flexible, we might be
able to reduce the data interchange software burdens a little.
Right now, this discussion seems headed in the direction of simply
adding yet another data representation (DDLm/CIF2) to the mix,
increasing the chances of mistranslation and confusion, rather
than reducing them.

Please, step back a bit from the detailed discussion of UTF8 and
look at the work-flow of doing and publishing crystallographic
experiments and let us try to make a contribution that simplifies
it, not one that makes it more complex than it needs to be.

I suggest we need to meet and talk, either face-to-face or by Skype.


  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769


cif2-encoding mailing list
