Re: Proposal to define data names for atomic analysis information

To: Distribution list of the IUCr COMCIFS Core Dictionary Maintenance Group<[email protected]>
Subject: Re: Proposal to define data names for atomic analysis information
From: James H via coreDMG <[email protected]>
Date: Fri, 7 Oct 2022 15:12:43 +1100
Cc: James H <[email protected]>
In-Reply-To: <[email protected]>
References: <CAM+dB2f_pNY1sMEZ+FVFdwUOCkbadqSz3n1SuPiddO6Y44+tuw@mail.gmail.com><[email protected]>

On Wed, 5 Oct 2022 at 23:11, Brian McMahon via coreDMG <[email protected]> wrote:

[edit]
_atom_analytical_XRF.mass_loss_percent 12.4

because of that shared occurrence of the string "XRF". But strictly,
category and data names have no intrinsic semantic meaning. Is there
a formal mechanism in the DDL to allow the value of a data item to be
understood as a pointer to a specific category, and if not, does
there need to be?

There is no such mechanism in DDL, and I don't think there should be, as mechanisms exist to make that link explicit. See below.

It's possible that there doesn't need to be, since in the full example
as worked out on GitHub, the links between the categories are followed
through the *.id and *.meas_id codes:

loop_
_atom_analytical.atom_type
_atom_analytical.meas_id # <- point to atom_analytical_source.id
_atom_analytical.chemical_species
_atom_analytical.analytical_mass_percent
_atom_analytical.chemical_species_mass_percent
Si a SiO2 ? 22.7
Al a Al2O3 ? 27.4
Ti b TiO2 ? 2.7
Si c . 10.5 .
Si d Si 11.7 11.7

loop_
_atom_analytical_source.id
_atom_analytical_source.technique
a XRF
b XRF
c ICP
d EDS

loop_
_atom_analytical_XRF.meas_id # <- point to _atom_analytical_source.id
_atom_analytical_XRF.mass_loss_percent
_atom_analytical_XRF.ignition_temperature
a 12.4 1000
b 13.5 1100

Arguably the middle loop is redundant, and the use of (say) XRF is
deduced from the fact that you have ended up in a category that will
be defined as applicable only for XRF measurements; but in practice
I assume that having a synoptic overview of the different techniques
used would be welcomed by most users.

The middle loop was introduced to express a real-world restriction: '_atom_analytical.meas_id' identifies an analytical measurement, so the identifiers used in each of the atom_analytical_XXX categories must not be duplicated between those categories. This restriction is expressed by requiring all of the 'meas_id' values to be drawn from the key data name of the category atom_analytical_source, that is, '_atom_analytical_source.id'.
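
As a rough illustration of the restriction just described, the following Python sketch checks that every 'meas_id' in a technique-specific category appears among the '_atom_analytical_source.id' values and that no identifier is used twice. It is only a sketch: the dictionaries are a hand-typed stand-in for the example loops above, not the output of a CIF parser, and the function name check_meas_ids is hypothetical.

```python
# Hypothetical in-memory form of the example loops above (not a CIF parser).
# Maps _atom_analytical_source.id -> _atom_analytical_source.technique
source = {"a": "XRF", "b": "XRF", "c": "ICP", "d": "EDS"}

# Maps technique-specific category name -> its meas_id values
technique_rows = {
    "atom_analytical_XRF": ["a", "b"],
}


def check_meas_ids(source, technique_rows):
    """Return a list of problems; an empty list means the restriction holds."""
    problems = []
    seen = {}  # meas_id -> category in which it was first used
    for category, meas_ids in technique_rows.items():
        for meas_id in meas_ids:
            # Every meas_id must be one of the atom_analytical_source.id values.
            if meas_id not in source:
                problems.append(
                    f"{category}: {meas_id!r} is not an atom_analytical_source.id"
                )
            # A measurement identifier must not be reused.
            if meas_id in seen:
                problems.append(
                    f"{category}: {meas_id!r} already used in {seen[meas_id]}"
                )
            else:
                seen[meas_id] = category
    return problems


print(check_meas_ids(source, technique_rows))
```

In practice a dictionary-aware CIF validator would be expected to perform this kind of check from the dictionary definitions rather than from hand-written code.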

Similarly, as Brian notes, the data names themselves should never contain actionable information. By linking the measurement identifier in atom_analytical_source with a value that identifies the technique used, it becomes possible to state in a machine-readable way what type of measurement a particular atomic content was derived from. Because data name and value are decoupled in this way, should we find two analytical techniques that require the same data names, we would not need to create two separate categories, which seems like a Good Thing.
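
To make the machine-readable lookup concrete, here is a minimal Python sketch that follows 'meas_id' from each atom_analytical row to atom_analytical_source and reads the technique from the value rather than from any data name. The tuples and the function name technique_for are assumptions made for illustration; the values are copied from the example loops above.

```python
# Hypothetical in-memory form of the first two example loops (not a CIF parser).
atom_analytical = [
    # (atom_type, meas_id, chemical_species)
    ("Si", "a", "SiO2"),
    ("Al", "a", "Al2O3"),
    ("Ti", "b", "TiO2"),
    ("Si", "c", "."),
    ("Si", "d", "Si"),
]

# Maps _atom_analytical_source.id -> _atom_analytical_source.technique
source_technique = {"a": "XRF", "b": "XRF", "c": "ICP", "d": "EDS"}


def technique_for(row):
    """Follow meas_id to atom_analytical_source and return the technique value."""
    _atom_type, meas_id, _species = row
    return source_technique[meas_id]


for row in atom_analytical:
    # The technique comes from a data value, never from a category name.
    print(row[0], row[2], technique_for(row))
```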

all the best,

James.

Brian

On 04/10/2022 02:43, James H via coreDMG wrote:
> Dear Core CIF dictionary maintenance group:
>
> A proposal has been developed by some powder people to include results
> of sample analysis for atomic content in CIF files. It seems to me that
> this is more general than just powder, so I have transferred the proposal
> to the CIF core dictionary issues list. Please read through
> https://github.com/COMCIFS/cif_core/issues/304 and add your comments
> there or here. For reference, the final comment on that issue, which
> gives the flavour of the proposal, is reproduced below.
>
> best wishes,
> James.
> =================================
>
> Given the desire to allow general details of techniques to be included,
> we will need one or more separate categories. One alternative is to
> bundle all details of techniques into a general-purpose category; in the
> example below it is ATOM_ANALYTICAL_TECHNIQUE.
>
> Here is an example of how this would appear in the CIF file (adapted
> from above):
>
> data_block1
> loop_
> _atom_analytical.id
> _atom_analytical.atom_type
> _atom_analytical.chemical_species
> _atom_analytical.analytical_mass_percent
> _atom_analytical.chemical_species_mass_percent
> _atom_analytical.technique_id #<- refers to an _atom_analytical_technique.id
> 1 Si SiO2 ? 22.7 XRF
> 2 Al Al2O3 ? 27.4 XRF
> 3 Ti TiO2 ? 2.7 XRF
> 4 Si . 10.5 . ICP
> 5 Si Si 11.7 11.7 EDS
>
> loop_
> _atom_analytical_technique.id #<- this could be a run id if more than one performed
> _atom_analytical_technique.type #XRF/ICP/EDS etc.
> _atom_analytical_technique.special_details
> XRF XRF
> ;
> The loss on ignition at 1000 °C was 12.4 wt%
> ;
> #...
>
> The other alternative, and the one most conducive to expansion in the
> future, is to define categories for each analytical technique. These can
> remain largely empty until others want to fill them in. So the above
> would become:
>
> loop_
> _atom_analytical.id
> _atom_analytical.atom_type
> _atom_analytical.chemical_species
> _atom_analytical.analytical_mass_percent
> _atom_analytical.chemical_species_mass_percent
> _atom_analytical.technique_type
> 1 Si SiO2 ? 22.7 XRF
> 2 Al Al2O3 ? 27.4 XRF
> 3 Ti TiO2 ? 2.7 XRF
> 4 Si . 10.5 . ICP
> 5 Si Si 11.7 11.7 EDS
>
> _atom_analytical_XRF.mass_loss_percent 12.4
> _atom_analytical_XRF.ignition_temperature 1000
>
> Of these I strongly prefer the second as it ensures that the minimum
> amount of information is hidden in machine-opaque text.
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
> _______________________________________________
> coreDMG mailing list
> [email protected]
> http://mailman.iucr.org/cgi-bin/mailman/listinfo/coredmg
