Discussion List Archives

Re: [ddlm-group] Relationship among CIF2, STAR, CIF1 and Python. . . .

Herbert,

Herbert J. Bernstein wrote:
  While a flag on which tags to use in DDL1 and DDL2 would
be a very helpful addition, we also need a mechanism to
ensure that that processing works with whichever tag
was actually used in the data file, especially when
populating missing values.
This should be no problem.  When an alias is located for an input item, the value would be tagged within the program to indicate which standard the value originally appeared in.  This could then be checked at output time to ensure that the output was in the same format (if so desired).  Presumably the output items would all be written to the same standard, so one needs an 'output flag' within the program (not needed in the dictionary), which might default to the standard of the input file as revealed by the format of the datanames used (or the magic number in the case of CIFm files) or, failing that, to CIFm.  However, the user of the program may wish to output in a different format and so could choose which of the three standards to use for output, regardless of the standard used in the input.  This would be useful in converting a CIF1 datafile to CIF2, for example, which was one of the problems DDLm was supposed to overcome.
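
The tagging and output-flag logic described above can be sketched in Python as follows.  This is only an illustration, not code from any existing CIF package: the Item class, the function names and the use of `#\#CIF_2.0` as the magic number identifying a CIFm file are assumptions, and the alias table holds just the two DDL1 names that appear in the _alias loops quoted elsewhere in this thread.

```python
from dataclasses import dataclass

# Assumed magic number on the first line of a CIFm datafile.
CIF2_MAGIC = "#\\#CIF_2.0"

# DDL1 dataname -> DDLm dataname, taken from the _alias loops in this thread.
DDL1_ALIASES = {
    "_cell_volume": "_cell.volume",
    "_exptl_crystal_density_diffrn": "_exptl_crystal.density_diffrn",
}


@dataclass
class Item:
    name: str    # DDLm dataname used inside the program
    value: str   # value exactly as read from the file
    origin: str  # standard the value appeared in: "CIF1", "CIF2" or "CIFm"


def detect_standard(first_line, names):
    """Guess the input standard from the magic number or the dataname style."""
    if first_line.startswith(CIF2_MAGIC):
        return "CIFm"
    if any("." in name for name in names):
        return "CIF2"  # dotted category.object names, as in DDL2
    return "CIF1"


def read_items(first_line, pairs):
    """Map (dataname, value) pairs onto DDLm names, tagging each with its origin."""
    standard = detect_standard(first_line, [name for name, _ in pairs])
    items = []
    for name, value in pairs:
        key = name.lower()  # CIF datanames are case-insensitive
        if key in DDL1_ALIASES:
            items.append(Item(DDL1_ALIASES[key], value, "CIF1"))
        else:
            items.append(Item(key, value, standard))
    return standard, items


def choose_output(input_standard, user_choice=None):
    """The 'output flag': user choice, else the input standard, else CIFm."""
    return user_choice or input_standard or "CIFm"
```

With this arrangement a CIF1 file is written back as CIF1 unless the user asks otherwise, for example choose_output("CIF1", "CIF2") to convert it.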

Using DDLm CIF dictionaries to populate empty fields will likely generate values for items that are not defined in the CIF1 or CIF2 dictionaries.  If these are desired, the only possibility is to output in CIFm format.  The user can specify the names of items to be calculated and can use any of the aliases for this purpose, but it would be pointless trying to include an item found only in the DDLm dictionaries in a CIF1 file, since this might prevent the file being used with legacy software.  I am not sure whether there is any advantage to be gained from mixing items from different standards.  Clearly a CIFm datafile is not expected to be read by legacy software, but could, by this mechanism, be converted to a CIF1 or CIF2 datafile.
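
A minimal Python sketch of the corresponding output step is given below.  It assumes the program holds its values under DDLm names and has a table giving the name of each item in the older standards; the table layout and the function name write_items are hypothetical, and _cell.vector is assumed here to have no alias in the CIF1 or CIF2 dictionaries.  Items with no name in the target standard are reported rather than written.

```python
# DDLm dataname -> its name in each older standard (from the _alias loops
# quoted in this thread).  An empty entry means no alias is assumed to exist.
OUTPUT_NAMES = {
    "_cell.volume": {"CIF1": "_cell_volume", "CIF2": "_cell.volume"},
    "_exptl_crystal.density_diffrn": {
        "CIF1": "_exptl_crystal_density_diffrn",
        "CIF2": "_exptl_crystal.density_diffrn",
    },
    "_cell.vector": {},  # assumed to be defined only in the DDLm dictionary
}


def write_items(items, target):
    """Rename DDLm items for the target standard.

    Returns (written, skipped): the (name, value) pairs to output and the
    DDLm names that cannot be expressed in the target standard.
    """
    written, skipped = [], []
    for name, value in items.items():
        if target == "CIFm":
            written.append((name, value))  # every DDLm item is allowed
            continue
        out_name = OUTPUT_NAMES.get(name, {}).get(target)
        if out_name is None:
            skipped.append(name)  # DDLm-only item: leave it out
        else:
            written.append((out_name, value))
    return written, skipped
```

Calling write_items with target "CIF1" on a data block that contains _cell.vector would put that name in the skipped list, so the program can tell the user that CIFm output is needed to keep it.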

I cannot see that there is any problem with the processing once the datafile has been read, since the datavalues of equivalent items in the different standards are always the same.  DDLm allows for vectors and matrices while DDL1 and 2 only allow the components to be stored, but the components are also defined (and aliased) in DDLm-based dictionaries and any method that calls on a matrix will be instructed by a method how to populate the matrix from available component information (assuming that it exists in the CIF1 or 2 datafile in the first place).


  More importantly, it appears that you are trying
to ensure that your dictionary will work with CIF1
(DDL1 and DDL2) data files.  Why can we not agree
that such interoperability is, as promised on the
IUCr web site, a firm goal of this exercise.

The way I read the promise made to our users, we agreed only to make sure that the DDLm-based dictionaries could be programmed to read the CIF1 and CIF2 archive datafiles, but we did not promise they will be able to output files in the older standards.  However, it appears that outputting any of the CIF formats is no problem and I am happy to go along with the objective of ensuring that DDLm allows both the reading and writing of CIF1 and CIF2 as well as CIFm datafiles.

David

  Regards,
    Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Tue, 18 Jan 2011, David Brown wrote:

In response to John B's request I have copied below two dictionary items
from my current working version of the DDLm core dictionary to show what a
DDLm CIF dictionary looks like.

The first save frame gives, as requested, the entry for
_exptl_crystal.density_diffrn which includes a method for calculating this
quantity.  This calls upon, inter alia, _cell.volume whose definition is
given in the second save frame.  Note the alias that allows _cell_volume
defined in the DDL1 core dictionary or _cell.volume defined in the DDL2 core
dictionary, to be recognized by an input routine designed to read CIF1 or
CIF2 datafiles (not to be confused with CIF1 or CIF2 syntax.  CIF1 and CIF2
datafiles are both written using CIF1 syntax).  This input routine will
accept the occasional appearance of () at the end of a dataname even though
this is not allowed by CIF2 syntax.  The value found for _cell.volume is
then stored as DDLm _cell.volume where it can be used directly in the method
for _exptl_crystal.density_diffrn without any further processing.  If the
input asks that the value of _exptl_crystal_density_diffrn be calculated,
the list of aliases would identify this as being the same as
_exptl_crystal.density_diffrn (though in this case, as in most others, the
two names are interchangeable under DDLm, though not under DDL1 or DDL2). 
By whatever means the density calculation method is invoked, the program
uses only DDLm machinery and DDLm data values to calculate the density.  If
you require CIF1 or CIF2 output (which is not implied by the 'promise' as I
read it), this can be done by referring to the _aliases for
_exptl_crystal.density_diffrn.  If it helps, it would be easy to add a flag
to identify which alias is used in CIF1 and which in CIF2 datafiles,
although this information is already implicit in the _alias loop.  The
program would likely need a special CIF1 or CIF2 datafile output routine to
match the corresponding input routine.  In this way the archived files are
converted on reading to DDLm CIF and, if desired, can be output in any of the
approved formats as CIF1, CIF2 or CIFm datafiles, subject to the restriction
that CIFm has a richer list of available dataitems not all of which are
available in CIF1 or CIF2.

I cannot see why this will not work.

David

P.S. Just to avoid further questions, _cell.volume can be calculated from
_cell.vector which is calculated by a method from the cell constants and so
could be calculated from basic information supplied in the CIF1 or CIF2
datafile.
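
The chain described in this P.S. (cell constants to cell vectors to cell volume) can be illustrated with a short Python sketch.  It is not the dictionary's own method: the function names are hypothetical, and the Cartesian setting (a along x, b in the xy plane) is an assumed convention.  The volume is the scalar triple product a . (b x c), which is what the dREL expression v.a * ( v.b ^ v.c ) in the second save frame below expresses.

```python
import math


def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian cell vectors from cell lengths (angstroms) and angles (degrees).

    Assumed convention: a lies along x and b lies in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    va = (a, 0.0, 0.0)
    vb = (b * cg, b * sg, 0.0)
    cy = (ca - cb * cg) / sg
    cz = math.sqrt(1.0 - cb * cb - cy * cy)
    vc = (c * cb, c * cy, c * cz)
    return va, vb, vc


def dot(u, v):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    """Vector product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def cell_volume(a, b, c, alpha, beta, gamma):
    """Cell volume in cubic angstroms as the triple product a . (b x c)."""
    va, vb, vc = cell_vectors(a, b, c, alpha, beta, gamma)
    return dot(va, cross(vb, vc))
```

For a cubic cell the result reduces to the cube of the cell edge, which is a convenient check on the vector construction.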

-------------------------------------------------------------------
Two save frames extracted from the developing DDLm core dictionary
-------------------------------------------------------------------


save_exptl_crystal.density_diffrn
    _definition.id             '_exptl_crystal.density_diffrn'
    _definition.update           2008-02-20
    _description.text
;
     Crystal density calculated from crystal unit cell and atomic mass of the
     contents.
;
    _description.common         'CrystalDensityDiffrn'
    _name.category_id            exptl_crystal
    _name.object_id              density_diffrn
    _type.purpose                Measured
    _type.container              Single
    _type.contents               Real
    _enumeration.range           0.0:
    _units.code                  megagrams_per_metre_cubed
     loop_
    _method.description
    _method.purpose  
    _method.expression
    'calculation of the density from the cell volume and cell mass'
     Evaluation
;
    _exptl_crystal.density_diffrn = 1.6605 * _cell.atomic_mass / _cell.volume
;
    loop_
       _alias.definition_id
       _alias.dictionary_uri
 '_exptl_crystal_density_diffrn'   cifdic.C91
 '_exptl_crystal.density_diffrn'   cif_mm_1.0.dic
     save_
 


save_cell.volume
    _definition.id             '_cell.volume'
    _definition.update           2008-02-13
    _description.text
;
     Volume of the crystal unit cell.
;
    _description.common         'CellVolume'
    _name.category_id            cell
    _name.object_id              volume
    loop_
       _alias.definition_id
       _alias.dictionary_uri
          '_cell_volume'   cifdic.C91
          '_cell.volume'   cif_mm_1.0.dic
    _type.purpose                Measured
    _type.container              Single
    _type.contents               Real
    _enumeration.range           0.0:
    _units.code                  angstroms_cubed
     loop_
    _method.description
    _method.purpose  
    _method.expression
    'calculation of the cell volume from unit cell vectors'
     Evaluation
;
      With v  as  cell_vector
 
      _cell.volume =  v.a * ( v.b ^ v.c )
;
     save_
 


Bollinger, John C wrote:

On Tuesday, January 18, 2011 7:20 AM, Herbert J. Bernstein wrote:

  Now I am very confused.  You say we have not broken the promise on the
IUCr web site, but at the same time we seem to be defining a CIF2 that
will not accept CIF1 documents.

  Please bear with me, and, even if you think it has already been
explained, please explain precisely how to use CIF1 documents in the
currently proposed CIF2 environment.

  If we have a sound way in which a CIF1 document has use of a DDLm
dictionary, then we do not need to bother most of the community with CIF2
for data files at this time.  All they need right now is what I called
DDLm-2011, a CIF2ish DDLm dictionary format.

I agree with that assessment of need, but I don't see what would be gained
by limiting CIF2 release like that.  If CIF2 is not ready or appropriate for
data files, then I think a CIF2-like DDLm-2011 language leads users and
especially developers in the wrong direction.  If we wish to release DDLm
without unleashing CIF2 on the world then let the initial DDLm and
dictionary releases be crafted in an altogether different format, such as
XML.  In the unlikely event that there were genuine interest in such a
course, it would be worth mentioning that I have a suitable XML schema at
hand, as well as supporting software that could easily be adapted to
translating existing DDL and dictionary documents.

 If we don't have a sound way
in which a CIF1 document has use of a DDLm dictionary, then I think we are
breaking the promise on the IUCr web page.  Please recall that DDL2
dictionaries are not valid CIF1 documents -- they have save frames, so it
is not unprecedented to have a different spec for dictionaries as opposed
to data files.

I accept that, but it's a different matter for the data format to be a
subset of the dictionary format than for the data format to be a related but
subtly incompatible format.  We will have that anyway when DDLm dictionaries
are used to validate CIF1 files, but let's please not set it as the
direction for the indefinite future.

 It makes a big difference to most of the user community if
we are simply telling them we have a new dictionary format rather than
telling them we are changing the data file format.

Agreed, in that much of the user community doesn't care about dictionaries.
On the other hand, members of the user community who care about some of the
new CIF2 features -- Unicode support, as a prime example -- would not
necessarily take the distinction as a positive or even a neutral proposition.

  On David's description, I think I really did explain why I think we will
have trouble populating missing values involving CIF1 tags that are not
valid CIF2 tags.  Doing that using the alias mechanism would seem to
require defining the CIF1 tag in the DDLm dictionary as a primary
definition and then aliasing a CIF2 tag to that primary CIF1 tag, so that
a method working with the CIF2 tag would effectively populate instances of
the CIF1 tag, but, and this is the part I can't seem to get past, defining
the CIF1 tag in a new CIF2-style DDLm dictionary would seem to require
that the CIF1 tag be a valid CIF2 tag.

I think we will not easily get past this dispute without an example.  For
that purpose, then, perhaps James, David, or another participant with
practical DDLm and dREL experience would be kind enough to present a
solution to this exercise:
Provide DDLm definitions and a dREL method that support computing a missing
value for the Core item _exptl_crystal_density_diffrn, based on Core items
_chemical_formula_weight, _cell_formula_units_Z, and _cell_volume.  The
definitions presented should use DDLm formalism for the defined data names,
and should be compatible also with validating the corresponding mmCIF data
names.

James's and David's comments have given me every reason to believe that this
would be straightforward, though the definitions together with their
required context might be bulky.  I am hoping that the requested definitions
are in fact already written.
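
To make the arithmetic of the exercise concrete, here is a hedged Python sketch; it is not the DDLm definitions or dREL method being requested.  It assumes that the cell mass used in David's method (_cell.atomic_mass) equals Z times the formula weight, reuses the factor 1.6605 from that method, and treats only a value of ? or an absent item as missing.  The internal names given for the formula weight and for Z are illustrative guesses, not names confirmed in this thread, and values carrying a standard uncertainty such as 179.43(2) are not handled.

```python
# DDL1 (Core) dataname -> internal name.  The last two names come from the
# _alias loops in this thread; the first two are illustrative guesses.
ALIASES = {
    "_chemical_formula_weight": "_chemical_formula.weight",
    "_cell_formula_units_z": "_cell.formula_units_z",
    "_cell_volume": "_cell.volume",
    "_exptl_crystal_density_diffrn": "_exptl_crystal.density_diffrn",
}

DENSITY = "_exptl_crystal.density_diffrn"


def to_internal(cif1_items):
    """Rename CIF1 items to internal names (datanames are case-insensitive)."""
    return {ALIASES.get(k.lower(), k.lower()): v for k, v in cif1_items.items()}


def populate_density(items):
    """Fill in a missing density (Mg m^-3) from Z, formula weight and volume."""
    if items.get(DENSITY) in (None, "?"):
        z = float(items["_cell.formula_units_z"])
        weight = float(items["_chemical_formula.weight"])  # daltons
        volume = float(items["_cell.volume"])              # cubic angstroms
        # Assumption: mass of the cell contents = Z * formula weight.
        items[DENSITY] = 1.6605 * z * weight / volume
    return items


# Example input: approximate values for sodium chloride.
block = to_internal({
    "_chemical_formula_weight": "58.44",
    "_cell_formula_units_Z": "4",
    "_cell_volume": "179.43",
    "_exptl_crystal_density_diffrn": "?",
})
populate_density(block)
```

The calculation itself never sees the CIF1 names; they matter only in to_internal, and would matter again in a matching output routine.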

 I suspect we will get into trouble
in other areas of using existing CIF1 tags in CIF2 DDLm dictionaries.

One of the key promises of DDLm, as I see it, is that the distinctions
between various syntax versions and between DDL1 and DDL2 formalisms are
relevant to only two program activities:
1) On input, reading a file correctly and associating data items with the
correct DDLm definitions.
2) On output, producing well-formed files for the target syntax version that
are valid with respect to the DDL1 (or DDL2) dictionaries with which the
DDLm dictionary provides compatibility.

As long as those two features work correctly, details of syntax version and
original target dictionary can be completely abstracted away from validation
and dREL operations, leaving no room for other areas of trouble.  Success in
those areas will be a function of program, DDL, and dictionary details.
CIF2 syntax need only be sufficient to support the required DDLm features;
it does not otherwise bear on the problem.

How important each of those troubles may be depends on our goals, so I
respectfully urge that we make certain that we are working from common
goals, so that we can then focus on whether we are meeting those goals,
rather than have debates that seem to be based on different goals for
different speakers.

That is a reasonable criticism of our process to date.  I am willing to
participate in the proposed goal re-evaluation process, and I hope it will
help resolve some of our current disputes.  Of late, however, we have also
seen significant differences in technical analyses that should be
independent of participants' goals.  Therefore, I do not anticipate that the
goal re-evaluation exercise will provide clear resolutions to *all* our
current disputes.


Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


Email Disclaimer:  www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard
