Discussion List Archives

(20) New dictionaries. Date/time, multimedia

Dear Colleagues

The good news is that I shall shortly be sending you updated versions of the
Core and mm dictionaries for review. More details of that below.

First, a few items of mail to distribute.

A16.1  Date/time format
-----------------------
D> 	I have just got hold of a copy of ISO 8601:1988(E) which explains 
D> the conventions for dates and times.  There are several possible 
D> conventions including year and day of year (which is not likely to 
D> interest us) and two ways of representing the date-time in the way we are 
D> suggesting.  These are:
D> 
D> 		YYYYMMDDThhmmss
D> 		   or
D> 		YYYY-MM-DDThh:mm:ss
D> 
D> I would recommend that we adopt the latter as being easier to read and 
D> just as easy to parse if that is needed.  The time given in this form is 
D> Coordinated Universal Time (UTC) sometimes referred to incorrectly as 
D> GMT.  If local time is used, the correction factor that needs to be 
D> applied to get local time from UTC is given afterwards.  As I write this, 
D> the time can therefore be expressed as:
D> 
D> 		1994-02-21T13:55:58-05:00
D> 
D> i.e. just after lunch in Hamilton but supper time in Greenwich (UK)
D> 
D> The final :00 is optional, but some places are not at integral hour
D> differences from UTC.  For places east of Greenwich, the last few
D> characters will be +02:00 etc.  We may not think it necessary to specify
D> the time zone since we are not interested in absolute synchronicity, only
D> a way of generating a 'fairly unique' number. 
D> 
D> 	I recommend that we adopt this form of the date time with the 
D> possibility of truncating at the right hand side as appropriate e.g.
D> 
D> 		1994-02-21
D> 		1994-02-21T13
D> etc.

I shall consider this an agreed extension to A16.1 unless I hear otherwise.
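
As a rough illustration of the form recommended above, the following Python
sketch parses the extended notation (YYYY-MM-DDThh:mm:ss) with optional
truncation on the right-hand side and an optional UTC offset. The function
name parse_cif_datetime and the regular expression are assumptions made for
this example and are not part of any CIF software; it also assumes that at
least the full date must be present and that truncated fields default to zero.

```python
import re
from datetime import datetime, timedelta, timezone

# Extended ISO 8601 form, truncatable from the right.
# Assumption: the full date is the shortest acceptable value.
_PATTERN = re.compile(
    r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2})(?::(?P<minute>\d{2})(?::(?P<second>\d{2}))?)?"
    r"(?P<offset>[+-]\d{2}(?::\d{2})?)?)?$"
)


def parse_cif_datetime(text):
    """Parse a (possibly truncated) date-time; hypothetical helper."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DDThh:mm:ss form: {text!r}")
    parts = match.groupdict()

    tz = None  # no offset given: leave the value without a time zone
    if parts["offset"]:
        sign = -1 if parts["offset"][0] == "-" else 1
        hours = int(parts["offset"][1:3])
        minutes = int(parts["offset"][4:6] or 0)  # the final :00 is optional
        tz = timezone(sign * timedelta(hours=hours, minutes=minutes))

    return datetime(
        int(parts["year"]), int(parts["month"]), int(parts["day"]),
        int(parts["hour"] or 0), int(parts["minute"] or 0),
        int(parts["second"] or 0), tzinfo=tz,
    )
```

For the example value 1994-02-21T13:55:58-05:00, converting the result with
.astimezone(timezone.utc) gives 18:55:58 UTC. The compact form
YYYYMMDDThhmmss is deliberately rejected by this sketch.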

D20.1 Public information resources for CIF
------------------------------------------
Peter has made the following suggestion for a summary of responses to
frequently-asked questions on CIF:

PMR> Could I suggest a FAQ for CIF?  In that way some of the bandwidth
PMR> might be reduced.  

Such a summary (a "FAQ-list") is often posted at regular intervals to
newsgroups devoted to a particular topic, such as sci.techniques.xtallography.
This seems to me a good idea. I'm not sure whether Peter is volunteering to
maintain this (if not, it's something we could put up at Chester), but it's
something COMCIFS should be involved with at an early stage, to ensure that
the community has access to the official answer to substantive questions
(such as "what is the file syntax?", "can I invent my own data names?",
"where can I get the latest CIF dictionary?"). I am equally happy to work
with Peter on this, if the Committee sees no objection to the idea.

While it may be appropriate to post such a document to designated newsgroups,
it would seem sensible also to make this available over the network, through
ftp, gopher or world-wide web services. As an example of how we might do
this, you are invited to use gopher to port 70 on diamond.iucr.ac.uk, or
mosaic (or other WWW client) to URL http://diamond.iucr.ac.uk/welcome.html.
These are trial info servers that I plan to show the Electronic Publishing
Committee a fortnight hence. Note that they are not (at present) to be
advertised to a wider public.

As another example of this approach, you are invited to look also at Howard's
WWW setup (this is also available through a link from the Chester web):

H> New World-Wide-Web URL for crystallography
H>    http://www.unige.ch/crystal/crystal_index.html
H> takes you straight to the crystallography section.

D20.2 Foreign data files embedded in CIF
----------------------------------------
PMR> 	I have persuaded my colleagues that CIF will play an important 
PMR> role in our future bioinformatics strategy and act as an abstract manager 
PMR> of simple objects.  Many of these objects are file formats which are 
PMR> already accepted in other disciplines, at least as de facto standards.  
PMR> As an example, I would like to create a loop_ of sequence files in
PMR> SWISS-PROT format (ASCII text).  This format is formally described in a
PMR> document. My construct would be something like:
PMR> loop_
PMR> _foreign_swiss_prot_sequence
PMR> ;
PMR> <first entry>
PMR> ;
PMR> ;
PMR> <second entry>
PMR> ;
PMR> 
PMR> The point at issue is whether COMCIFS would be prepared to consider a 
PMR> category (say 'foreign') which allows users to embed ascii files in a 
PMR> CIF.  A list of allowable foreign files could be maintained.  The virtue 
PMR> of this is that it reduces the namespace resolution required when many 
PMR> different institutions have embedded this type of information.

What do other people think of this? I'm a bit concerned about the generality
of such a '_foreign_' category, but I can see some point to it. But it's
really only worth embedding foreign data in a CIF if a CIF parser can do
something with the data. For instance, we've been thinking of embedding
graphics files in CIFs, something like:

loop_
  _publ_figure_graphic
  _publ_figure_include
  _publ_figure_type
  _publ_figure_caption
; %!-PS-Adobe-blahhh
  :::::
;
     .            image/postscript    'Fig. 1. Interesting figure.'

.    fig2.mpeg    image/mpeg          'Fig. 2. Interesting movie.'

Note that in this example the PostScript file is embedded as
'_publ_figure_graphic', which is OK because it's in ASCII format. The movie
is in MPEG format, and pointed to by a '_publ_figure_include' data name.
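
To make the distinction concrete, here is a minimal Python sketch of how a
parser might decide, row by row, whether a figure is embedded or merely
referenced. It assumes each loop row has already been read into a dictionary
keyed by data name, and that '.' marks an inapplicable value as in the
example; the function resolve_figure is hypothetical and does not belong to
any existing CIF library.

```python
def resolve_figure(row):
    """Return (where, mime_type, payload) for one publ_figure row.

    Hypothetical helper: 'row' maps data names to string values, with '.'
    standing for an inapplicable (absent) value.
    """
    graphic = row.get("_publ_figure_graphic", ".")
    include = row.get("_publ_figure_include", ".")
    mime_type = row.get("_publ_figure_type", ".")

    if graphic != ".":
        # ASCII data (e.g. PostScript) carried inside the CIF itself
        return ("embedded", mime_type, graphic)
    if include != ".":
        # Binary data (e.g. MPEG) kept in a separate file
        return ("external", mime_type, include)
    raise ValueError("figure row has neither embedded data nor a file name")


rows = [
    {"_publ_figure_graphic": "%!-PS-Adobe-blahhh",
     "_publ_figure_include": ".",
     "_publ_figure_type": "image/postscript",
     "_publ_figure_caption": "Fig. 1. Interesting figure."},
    {"_publ_figure_graphic": ".",
     "_publ_figure_include": "fig2.mpeg",
     "_publ_figure_type": "image/mpeg",
     "_publ_figure_caption": "Fig. 2. Interesting movie."},
]
figures = [resolve_figure(row) for row in rows]
```

Giving the embedded data priority over the file name is a choice made for
this sketch only; nothing in the proposal says which should win if both are
present.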

I would prefer these to be handled as the category 'publ_figure' rather than
parts of 'foreign'; but I guess they could be subsumed as part of the foreign
scheme.

What of the values I gave above in '_publ_figure_type'? This leads us on
neatly to another topic which has arisen independently:

D20.3 CIF as a standard MIME format
-----------------------------------
Syd was good enough to send me a copy of some correspondence he has had
regarding data exchange standards under the general umbrella of Internet
multimedia types. Although not directed specifically at COMCIFS, I think
it's of interest.

S> I have just heard again from a Henry Rzepa about data exchange. 
S> 
S> Date: Wed, 23 Feb 1994 17:58:50 +0000
S> To: Syd
S> From: h.rzepa@ic.ac.uk (Henry Rzepa)
S> Subject: CIF, MIF and Chemistry MIME types
S> 
S> I would like to propose to the international MIME committee a chemistry
S> MIME type. ... [MIME] is now forming the basis of the more
S> general world-wide-web internet browser, as well as being supported
S> by new e-mail programs. By asking the MIME committee to ratify
S> chemistry as an (8th) basic type, we would formalise the support of
S> molecular sciences in these new types of information delivery, and
S> enable software developers to make use of this system. The MIME
S> coordinator is highly enthusiastic about this concept.
S> 
S> In detail, I propose a primary MIME type called chemistry, with
S> perhaps 10-20 secondary types. These would include CIF and MIF,
S> but also de facto standards such as the MDL molfile types, as well as
S> pdb files, and a number of formats for say quantum chemistry etc. We
S> need to agree precisely those formats that would be proposed in the
S> first tranche.
S> 
S> In operation, the way the system currently works is that a MIME
S> type (say chemistry/cif) is defined as mapping to a given program on
S> the user's machine (do any CIF or MIF programs yet exist?) via
S> a preferences or configuration file on the user's machine. Default
S> mappings would be defined as well. For example, on a Macintosh,
S> the type image/gif maps to a viewer called JPEGView by default. We
S> would need to define default viewers (or in the jargon helpers) for
S> each secondary MIME type we propose. For example, XMol,
S> or RasMol might be suitable viewers for a wide range of chemistry
S> file formats on Unix machines.
S> 
S> Once the type is defined, one can then associate a file suffix with each
S> type. For example, the suffix .cif would map to the MIME type
S> chemistry/cif.  The e-mail or www MIME compliant program would
S> then recognise the suffix and launch the appropriate viewer.
S> 
S> Actually, viewers can either be external, or "in-line", and one of
S> our objectives is to produce a www browser which can draw
S> "in-line" structures from the basic chemistry file types.
S> 
S> With this infra-structure in place, the field of "chemically" cognisant
S> e-mail programs or www browsers opens up. If you happen to be
S> unfamiliar with this area, connect to the "URL" shown below for
S> a taste of what we have done.
S> 
S> Dr Henry Rzepa, Dept. Chemistry, Imperial College, LONDON SW7 2AY;
S> rzepa@ic.ac.uk  http://www.ch.ic.ac.uk/rzepa.html

I can see no objection to this. Apart from the specific Internet applications, 
this seems to provide a suitable compendium of file types for use by other
applications software (such as the semi-intelligent parser that would be needed
to handle my embedded graphics files in the example above). Can I take it that
we are all happy to see a chemistry/cif file type defined in the MIME scheme?
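
The suffix-to-type-to-helper chain described in Henry's message can be
sketched in Python with the standard mimetypes module. The type name
chemistry/cif is simply the one proposed in the message and is used here for
illustration only; the helper table and the program name in it are invented
for this example.

```python
import mimetypes

# A private table, so the system-wide mappings are left untouched.
table = mimetypes.MimeTypes()
table.add_type("chemistry/cif", ".cif")  # proposed name, not a registered type

# Hypothetical per-user preferences: MIME type -> helper program.
helpers = {
    "chemistry/cif": "my-cif-viewer",  # assumed program name
}


def helper_for(filename):
    """Return (mime_type, helper) for a file name, or (None, None)."""
    mime_type, _encoding = table.guess_type(filename)
    return mime_type, helpers.get(mime_type)


mime_type, helper = helper_for("structure.cif")
```

A mail or WWW client following this scheme would look up the suffix, find the
MIME type, and then launch whichever helper the user's configuration names
for that type.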


============================= New versions of the dictionaries

Now to the nitty-gritty. Paula is very anxious to let us see the latest
version of the mmCIF dictionary before it gets posted to the wider
macromolecular community. I am sending out her latest version, together with
a new core dictionary that seeks to reflect the enhancements we agreed at
Beijing. This is still not a final draft (indeed, several changes will
undoubtedly result from public scrutiny), but it's being made available so
that we can scan this version rapidly for evidence of any major problems.
Paula is especially keen that we should consider the chemical and
crystallographic sense of the terms defined, and not just the syntactic
aspects of the file (though it would help to consider also the consistency of
the DDL terms employed). Note that the DDL is not the most recent version
(_esd yes still appears instead of _type_conditions esd, for example).

The Core Dictionary is much expanded over cifdic.C91. It contains an
additional 65 data names, thus:
     '_atom_site_B_iso_or_equiv'
     '_atom_site_aniso_B_11' 
     '_atom_site_aniso_B_12' 
     '_atom_site_aniso_B_13' 
     '_atom_site_aniso_B_22' 
     '_atom_site_aniso_B_23' 
     '_atom_site_aniso_B_33' 
     '_atom_site_aniso_ratio'
     '_atom_site_disorder_assembly'
     '_atom_sites_fract_tran_matrix_11' 
     '_atom_sites_fract_tran_matrix_12' 
     '_atom_sites_fract_tran_matrix_13' 
     '_atom_sites_fract_tran_matrix_21' 
     '_atom_sites_fract_tran_matrix_22' 
     '_atom_sites_fract_tran_matrix_23' 
     '_atom_sites_fract_tran_matrix_31' 
     '_atom_sites_fract_tran_matrix_32' 
     '_atom_sites_fract_tran_matrix_33' 
     '_audit_author_address'
     '_audit_author_name'
     '_audit_contact_author_address'
     '_audit_contact_author_email'
     '_audit_contact_author_fax'
     '_audit_contact_author_name'
     '_audit_contact_author_phone'
     '_citation_Medline_AN'
     '_citation_abstract'
     '_citation_author_citation_id'
     '_citation_author_name'
     '_citation_book_coden_ISBN'
     '_citation_book_publisher'
     '_citation_book_title'
     '_citation_coordinate_linkage'
     '_citation_country'
     '_citation_editor_citation_id'
     '_citation_editor_name'
     '_citation_id'
     '_citation_journal_abbrev'
     '_citation_journal_coden_ASTM'
     '_citation_journal_coden_ISSN'
     '_citation_journal_coden_PDB'
     '_citation_journal_full'
     '_citation_journal_issue'
     '_citation_journal_volume'
     '_citation_language'
     '_citation_page_first' 
     '_citation_page_last' 
     '_citation_special_details'
     '_citation_title'
     '_citation_year'
     '_computing_phasing_MAD' 
     '_computing_phasing_MIR' 
     '_computing_phasing_MR' 
     '_computing_phasing_averaging' 
     '_database_code_NDB' 
     '_database_code_PDB' 
     '_diffrn_ambient_environment'
     '_diffrn_crystal_support'
     '_diffrn_crystal_treatment'
     '_diffrn_measurement_details'
     '_diffrn_radiation_collimation'
     '_publ_section_exptl_prep' 
     '_publ_section_exptl_refinement' 
     '_refine_diff_density_rms' 
     '_refine_ls_weighting_details'

In addition to this, 47 categories have been introduced, each headed by an
entry of the (infamous) style data_atom_site_[]. A review of the minutes of
the Beijing meeting will give a good indication of the principles we were
following in transferring material from the core to the mm dictionaries. I
think there are only a few extra points I need to single out here.

I have added '_atom_site_aniso_ratio' at the request of Mario Nardelli. It is
(like several other items) an easily derivable quantity, but he argues that it
gives a good diagnostic indication of the eccentricity of the displacement
ellipsoids where there is no immediate access to ellipsoid diagrams, and we
have recognised this data name (from his PARSTCIF program) in Chester for
some considerable time.

The names '_publ_section_exptl_prep' and '*_refinement' are designed to allow
the experimental section of Acta C to be split into subsections describing
the chemical preparation of the sample and the refinement separately. This
will facilitate formatting of this portion of an Acta C paper to follow Syd's
wish to emphasise the sample preparation and characterisation aspects. In
the true spirit of CIF, '_publ_section_experimental' will still be
recognised, of course.

I have also added 'undef' to the enumeration list for
_refine_ls_hydrogen_treatment, and made it the default (this in response to a
letter from George of November '92. The wheels of justice grind exceeding slow,
but...).

The major recent enhancement to the mm dictionary has been the addition of a
rich set of '_entity_' definitions; but there have been a very large number
of modifications since Beijing.

I have not on this occasion ciftex'd these files to give typeset versions,
because I am very busy at present; but if you think that you would find such a
representation helpful, please let me know and I'll do one as soon as I can
find time.

Please send your detailed comments on the core dictionary to me, on the mm
dictionary to Paula. And please try to give the new dictionaries as much time
as you can afford at this point.

Good reading!
Brian