Re: Please advise regarding a design of CIF dictionaries for material properties

Dear Nick,

many thanks for your detailed answer, and for your comprehensive example
of a DDLm dictionary!

Here, for brevity, I will only highlight the key questions, one question
at a time, that might require a very definite and formal answer.

On 10/02/2011 01:18 PM, Nick Spadaccini wrote:

>> data_... block name in the dictionary no longer matches tag 
>> name. I guess this should not be a problem... Is it?
> 
> It is a convenience to have the data block name match the _name of 
> the item, it is NOT a requirement of the DDLs (well certainly was not
> at its inception, but I am not sure if interpretations have since
> changed).

This is exactly the question which bothers me: is it a must that the
data_.. block prefix in a DDL1 dictionary matches the declared data name,
is it a formal recommendation, or is it just common practice and
tradition? In other words, is it a MUST match, SHOULD match or MAY match
according to the RFC 2119 (http://www.ietf.org/rfc/rfc2119.txt)?

May I explain why I insist on that precise wording. When we write a CIF
processing program, we want it to be correct, in the sense that it MUST
process every correct CIF and produce defined results, and it MUST
report an error for every incorrect CIF (provided the sets of correct
and incorrect CIFs are computable, which I guess they are according to
the current definitions).

Now, if the data block<->declared name correspondence is a MUST, then I
infer that:

a) correct software MAY use data block names to search for name
declarations (do we need this?);
b) correct software MUST report an error when the data block name is not
a prefix of a declared data name;
c) if a dictionary where b) is the case is ever encountered, then the
dictionary is incorrect and it is the responsibility of the dictionary
maintainer to fix the error;
d) if a validator program validates a dictionary against DDL and does
not report an error when the non-conforming dictionary is processed,
then the validator is buggy and needs a fix.

If, however, the block<->declared name correspondence is a MAY or a
SHOULD, then:

a) correct software MUST NOT use data block names to search for name
declarations (programmers, beware!);
b) correct software MAY/SHOULD report a (suppressible, non-fatal)
warning when the data block name is not a prefix of a declared data name;
c) if dictionaries where b) is the case are encountered, and a program
does not accept them, then the program is buggy and it is the
responsibility of the program maintainer to fix it.

As you see, the expected course of events when a program accepts or
rejects a dictionary differs radically depending on whether the
data<->name correspondence is a MUST, SHOULD or MAY item.
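
To make the difference concrete, the two regimes could be sketched in
Python roughly as follows. This is only an illustration under stated
assumptions, not the behaviour of any existing validator: the names
Level, Diagnostic, block_matches and check are hypothetical, and the
input is assumed to be an already-parsed data block name (without the
"data_" prefix) together with the list of its declared _name values.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """RFC 2119 requirement level chosen for the name correspondence."""
    MUST = "MUST"
    SHOULD = "SHOULD"
    MAY = "MAY"


@dataclass
class Diagnostic:
    severity: str  # "error" or "warning"
    message: str


def block_matches(block_name, declared_names):
    """True if the block name is a prefix of the first declared name.

    The leading underscore of the tag is dropped before the comparison;
    a block without declared names is treated as matching.
    """
    if not declared_names:
        return True
    return declared_names[0][1:].startswith(block_name)


def check(block_name, declared_names, level, suppress_warnings=False):
    """Return the diagnostics a validator would emit for one definition."""
    if block_matches(block_name, declared_names):
        return []
    message = "data block '%s' is not a prefix of '%s'" % (
        block_name, declared_names[0])
    if level is Level.MUST:
        # Under MUST the dictionary is incorrect: a fatal error.
        return [Diagnostic("error", message)]
    if suppress_warnings:
        return []
    # Under SHOULD/MAY the dictionary is still accepted: a warning only.
    return [Diagnostic("warning", message)]


if __name__ == "__main__":
    # One of the non-matching cases from the attached list.
    block = "restr_equal_angle_details"
    names = ["_restr_equal_angle_detail"]
    for level in Level:
        print(level.value, check(block, names, level))
```

The point of the sketch is that the same dictionary yields an error, a
warning or nothing at all depending solely on which requirement level
the implementer assumed.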

From what you say in the quote above ("It is a convenience to have the
data block name match the _name of the item"), the correspondence MAY be
present (and it MAY be absent). According to what David Brown wrote (Wed,
28 Sep 2011 12:07:30 -0400, "It is not a problem in DDLm, I am not sure
about DDL1, but it could be confusing.  Best avoided."), I get the
impression that it SHOULD. But according to what John Bollinger wrote
(Wed, 28 Sep 2011 11:05:54 -0500, "It also specifies (ITG 2.5.5) that
item names be used as definition datablock names."), it sounds more like
a MUST.

So, for me to know how to write a correct CIF validator and a correct
CIF dictionary, I need to know how to interpret the definition of a
correct DDL1 dictionary -- whether:

a) "item names MUST be used as definition datablock names"
b) "item names SHOULD be used as definition datablock names"
c) "item names MAY be used as definition datablock names"

Which of the situations a)-c) is the actual case?

Any choice among a)-c) is actually possible; I am sure that every
developer has made this choice silently and maybe even implicitly, but
it would probably be beneficial for CIF users to make the choice
explicit, especially given the variety of possible interpretations.

BTW, I have scanned the existing IUCr dictionaries
(ftp://ftp.iucr.org/cifdics/) for the correspondence. In the mmCIF
dictionary, all save block names are prefixes of the corresponding
declared tag names (data not shown ;). However, there are 4 dictionaries
that have several cases of data block names differing slightly (I attach
a file with the non-matching tag list; the first line is a Perl command
that produced it; warning -- long lines!). Thus, picking a "MUST" clause
(the case "a)" above) would probably be too restrictive and invalidate
too many existing dictionaries...
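
For readers who do not use Perl, the test applied by the attached
command can be restated roughly as follows in Python. This is a sketch,
not a replacement for the command: it assumes the dictionary has
already been parsed into (block name, declared _name values) pairs by
some CIF parser, and the function name mismatches and the "example_item"
entry are hypothetical. Like the Perl command, it compares only the
first declared name, case-sensitively, and skips blocks that lack a
block name or a _name value.

```python
def mismatches(dictionary_file, blocks):
    """Yield one report row per block whose name is not a prefix of its
    first declared tag name.

    `blocks` is an iterable of (block_name, declared_names) pairs, where
    block_name is the data block name without "data_" and declared_names
    is the list of _name values found in that block.
    """
    for block_name, declared_names in blocks:
        if not block_name or not declared_names:
            continue  # nothing to compare
        # Drop the leading underscore of the tag, then test the prefix.
        if declared_names[0][1:].startswith(block_name):
            continue
        # Same column layout as the attached report.
        yield "%-32s %-32s %s" % (
            dictionary_file, "_" + block_name, ",".join(declared_names))


if __name__ == "__main__":
    blocks = [
        # A non-matching case taken from the attached list.
        ("atom_site_label_rho", ["_atom_site_label"]),
        # A hypothetical matching case, which produces no row.
        ("example_item", ["_example_item_value"]),
    ]
    for row in mismatches("cif_rho.dic", blocks):
        print(row)
```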

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
+ perl -MCIFParser -le 'for $file(@ARGV) {$p = new CIFParser; $d = $p->Run($file); for(@{$d}) { printf "%-32s %-32s %s\n", $file, "_".$_->{name}, join(",", @{$_->{values}{"_name"}}) unless !$_->{values}{"_name"} || !$_->{name} || substr($_->{values}{"_name"}[0],1) =~ /^\Q$_->{name}\E/ }}' cif_compat_1.0.dic cif_compat.dic cif_core_2.0.1.dic cif_core_2.0.dic cif_core_2.1.dic cif_core_2.2.dic cif_core_2.3.1.dic cif_core_2.3.2.dic cif_core_2.3.dic cif_core_2.4.1.dic cif_core_2.4.2.dic cif_core_2.4.dic cif_core.dic cif_core_restraints_1.0.dic cif_core_restraints.dic cif_img_1.0.dic cif_img_1.3.1.dic cif_img_1.3.2.dic cif_img.dic cif_iucr_1.0.dic cif_iucr.dic cif_mm_1.0.00.dic cif_mm_1.0.dic cif_mm_2.0.03.dic cif_mm_2.0.09.dic cif_mm.dic cif_ms_1.0.1.dic cif_ms_1.0.dic cif_ms.dic cif_pd_1.0.1.dic cif_pd_1.0.dic cif_pd.dic cif_register_1.0.dic cif_register.dic cif_rho_1.0.1.dic cif_rho_1.0.dic cif_rho.dic cif_sym_1.0.1.dic cif_sym_1.0.dic cif_sym.dic ddl2_core_2.1.3.dic ddl2_core.dic ddl_core_1.4.1.dic ddl_core_2.1.3.dic ddl_core.dic draft_cif_core_2.4.dic mmcif_ddl_2.1.6.dic mmcif_ddl.dic mmcif_std_2.0.09.dic mmcif_std.dic
#
# Dictionary name                 Data block name                   Declared tag names (comma-separated)
#
cif_compat_1.0.dic               _atom_site_aniso_B_*_nm          _atom_site_aniso_B_11_nm,_atom_site_aniso_B_12_nm,_atom_site_aniso_B_13_nm,_atom_site_aniso_B_22_nm,_atom_site_aniso_B_23_nm,_atom_site_aniso_B_33_nm
cif_compat_1.0.dic               _atom_site_aniso_B_*_pm          _atom_site_aniso_B_11_pm,_atom_site_aniso_B_12_pm,_atom_site_aniso_B_13_pm,_atom_site_aniso_B_22_pm,_atom_site_aniso_B_23_pm,_atom_site_aniso_B_33_pm
cif_compat_1.0.dic               _atom_site_aniso_U_*_nm          _atom_site_aniso_U_11_nm,_atom_site_aniso_U_12_nm,_atom_site_aniso_U_13_nm,_atom_site_aniso_U_22_nm,_atom_site_aniso_U_23_nm,_atom_site_aniso_U_33_nm
cif_compat_1.0.dic               _atom_site_aniso_U_*_pm          _atom_site_aniso_U_11_pm,_atom_site_aniso_U_12_pm,_atom_site_aniso_U_13_pm,_atom_site_aniso_U_22_pm,_atom_site_aniso_U_23_pm,_atom_site_aniso_U_33_pm
cif_compat_1.0.dic               _atom_site_Cartn_*_nm            _atom_site_Cartn_x_nm,_atom_site_Cartn_y_nm,_atom_site_Cartn_z_nm
cif_compat_1.0.dic               _atom_site_Cartn_*_pm            _atom_site_Cartn_x_pm,_atom_site_Cartn_y_pm,_atom_site_Cartn_z_pm
cif_compat_1.0.dic               _atom_type_radius_*_nm           _atom_type_radius_bond_nm,_atom_type_radius_contact_nm
cif_compat_1.0.dic               _atom_type_radius_*_pm           _atom_type_radius_bond_pm,_atom_type_radius_contact_pm
cif_compat_1.0.dic               _cell_length_*_nm                _cell_length_a_nm,_cell_length_b_nm,_cell_length_c_nm
cif_compat_1.0.dic               _cell_length_*_pm                _cell_length_a_pm,_cell_length_b_pm,_cell_length_c_pm
cif_compat_1.0.dic               _exptl_crystal_size_*_cm         _exptl_crystal_size_max_cm,_exptl_crystal_size_mid_cm,_exptl_crystal_size_min_cm,_exptl_crystal_size_rad_cm
cif_compat_1.0.dic               _refine_diff_density_*_nm        _refine_diff_density_max_nm,_refine_diff_density_min_nm,_refine_diff_density_rms_nm
cif_compat_1.0.dic               _refine_diff_density_*_pm        _refine_diff_density_max_pm,_refine_diff_density_min_pm,_refine_diff_density_rms_pm
cif_compat_1.0.dic               _reflns_d_resolution_*_nm        _reflns_d_resolution_high_nm,_reflns_d_resolution_low_nm
cif_compat_1.0.dic               _reflns_d_resolution_*_pm        _reflns_d_resolution_high_pm,_reflns_d_resolution_low_pm
cif_compat.dic                   _atom_site_aniso_B_*_nm          _atom_site_aniso_B_11_nm,_atom_site_aniso_B_12_nm,_atom_site_aniso_B_13_nm,_atom_site_aniso_B_22_nm,_atom_site_aniso_B_23_nm,_atom_site_aniso_B_33_nm
cif_compat.dic                   _atom_site_aniso_B_*_pm          _atom_site_aniso_B_11_pm,_atom_site_aniso_B_12_pm,_atom_site_aniso_B_13_pm,_atom_site_aniso_B_22_pm,_atom_site_aniso_B_23_pm,_atom_site_aniso_B_33_pm
cif_compat.dic                   _atom_site_aniso_U_*_nm          _atom_site_aniso_U_11_nm,_atom_site_aniso_U_12_nm,_atom_site_aniso_U_13_nm,_atom_site_aniso_U_22_nm,_atom_site_aniso_U_23_nm,_atom_site_aniso_U_33_nm
cif_compat.dic                   _atom_site_aniso_U_*_pm          _atom_site_aniso_U_11_pm,_atom_site_aniso_U_12_pm,_atom_site_aniso_U_13_pm,_atom_site_aniso_U_22_pm,_atom_site_aniso_U_23_pm,_atom_site_aniso_U_33_pm
cif_compat.dic                   _atom_site_Cartn_*_nm            _atom_site_Cartn_x_nm,_atom_site_Cartn_y_nm,_atom_site_Cartn_z_nm
cif_compat.dic                   _atom_site_Cartn_*_pm            _atom_site_Cartn_x_pm,_atom_site_Cartn_y_pm,_atom_site_Cartn_z_pm
cif_compat.dic                   _atom_type_radius_*_nm           _atom_type_radius_bond_nm,_atom_type_radius_contact_nm
cif_compat.dic                   _atom_type_radius_*_pm           _atom_type_radius_bond_pm,_atom_type_radius_contact_pm
cif_compat.dic                   _cell_length_*_nm                _cell_length_a_nm,_cell_length_b_nm,_cell_length_c_nm
cif_compat.dic                   _cell_length_*_pm                _cell_length_a_pm,_cell_length_b_pm,_cell_length_c_pm
cif_compat.dic                   _exptl_crystal_size_*_cm         _exptl_crystal_size_max_cm,_exptl_crystal_size_mid_cm,_exptl_crystal_size_min_cm,_exptl_crystal_size_rad_cm
cif_compat.dic                   _refine_diff_density_*_nm        _refine_diff_density_max_nm,_refine_diff_density_min_nm,_refine_diff_density_rms_nm
cif_compat.dic                   _refine_diff_density_*_pm        _refine_diff_density_max_pm,_refine_diff_density_min_pm,_refine_diff_density_rms_pm
cif_compat.dic                   _reflns_d_resolution_*_nm        _reflns_d_resolution_high_nm,_reflns_d_resolution_low_nm
cif_compat.dic                   _reflns_d_resolution_*_pm        _reflns_d_resolution_high_pm,_reflns_d_resolution_low_pm
cif_core_restraints_1.0.dic      _restr_equal_angle_details       _restr_equal_angle_detail
cif_core_restraints_1.0.dic      _restr_rigid_body_site_symmetry_ _restr_rigid_body_site_symmetry
cif_core_restraints.dic          _restr_equal_angle_details       _restr_equal_angle_detail
cif_core_restraints.dic          _restr_rigid_body_site_symmetry_ _restr_rigid_body_site_symmetry
cif_ms_1.0.1.dic                 _atom_site[ms]                   _atom_site_[ms]
cif_ms_1.0.1.dic                 _cell[ms]                        _cell_[ms]
cif_ms_1.0.1.dic                 _diffrn_refln[ms]                _diffrn_refln_[ms]
cif_ms_1.0.1.dic                 _diffrn_reflns[ms]               _diffrn_reflns_[ms]
cif_ms_1.0.1.dic                 _diffrn_standard_refln[ms]       _diffrn_standard_refln_[ms]
cif_ms_1.0.1.dic                 _exptl_crystal_face[ms]          _exptl_crystal_face_[ms]
cif_ms_1.0.1.dic                 _exptl_crystal[ms]               _exptl_crystal_[ms]
cif_ms_1.0.1.dic                 _geom_angle[ms]                  _geom_angle_[ms]
cif_ms_1.0.1.dic                 _geom_bond[ms]                   _geom_bond_[ms]
cif_ms_1.0.1.dic                 _geom_contact[ms]                _geom_contact_[ms]
cif_ms_1.0.1.dic                 _geom_torsion[ms]                _geom_torsion_[ms]
cif_ms_1.0.1.dic                 _refine[ms]                      _refine_[ms]
cif_ms_1.0.1.dic                 _refln[ms]                       _refln_[ms]
cif_ms_1.0.1.dic                 _reflns[ms]                      _reflns_[ms]
cif_ms.dic                       _atom_site[ms]                   _atom_site_[ms]
cif_ms.dic                       _cell[ms]                        _cell_[ms]
cif_ms.dic                       _diffrn_refln[ms]                _diffrn_refln_[ms]
cif_ms.dic                       _diffrn_reflns[ms]               _diffrn_reflns_[ms]
cif_ms.dic                       _diffrn_standard_refln[ms]       _diffrn_standard_refln_[ms]
cif_ms.dic                       _exptl_crystal_face[ms]          _exptl_crystal_face_[ms]
cif_ms.dic                       _exptl_crystal[ms]               _exptl_crystal_[ms]
cif_ms.dic                       _geom_angle[ms]                  _geom_angle_[ms]
cif_ms.dic                       _geom_bond[ms]                   _geom_bond_[ms]
cif_ms.dic                       _geom_contact[ms]                _geom_contact_[ms]
cif_ms.dic                       _geom_torsion[ms]                _geom_torsion_[ms]
cif_ms.dic                       _refine[ms]                      _refine_[ms]
cif_ms.dic                       _refln[ms]                       _refln_[ms]
cif_ms.dic                       _reflns[ms]                      _reflns_[ms]
cif_rho_1.0.1.dic                _atom_site_label_rho             _atom_site_label
cif_rho_1.0.1.dic                _atom_rho_multipole_kappa_       _atom_rho_multipole_kappa,_atom_rho_multipole_kappa_prime0,_atom_rho_multipole_kappa_prime1,_atom_rho_multipole_kappa_prime2,_atom_rho_multipole_kappa_prime3,_atom_rho_multipole_kappa_prime4
cif_rho_1.0.dic                  _atom_site_label_rho             _atom_site_label
cif_rho_1.0.dic                  _atom_rho_multipole_kappa_       _atom_rho_multipole_kappa,_atom_rho_multipole_kappa_prime0,_atom_rho_multipole_kappa_prime1,_atom_rho_multipole_kappa_prime2,_atom_rho_multipole_kappa_prime3,_atom_rho_multipole_kappa_prime4
cif_rho.dic                      _atom_site_label_rho             _atom_site_label
cif_rho.dic                      _atom_rho_multipole_kappa_       _atom_rho_multipole_kappa,_atom_rho_multipole_kappa_prime0,_atom_rho_multipole_kappa_prime1,_atom_rho_multipole_kappa_prime2,_atom_rho_multipole_kappa_prime3,_atom_rho_multipole_kappa_prime4