Discussion List Archives


Re: Updating list of _audit.schema

OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can "loop freely" the categories containing an _entry.id child key data name within a single data block given the definition. I will leave John W to further comment on how _entry.id is supposed to be used if he wishes. Meanwhile, in order to make progress I suggest simply
(i) removing the "Macromolecular" option, noting the previous "Experiments" option covers multi-wavelength, multi-crystal setups. 
(ii) removing the "imgCIF" option
(iii) returning in the future to add _audit.schema options corresponding to mmCIF and imgCIF if necessary

By the way, I think I share your enthusiasm for managing information using the relational model, but data containers (data blocks, files, directories, etc.) are unavoidable. I think your characterisation of them as projections over particular values of one or more key data names is the precise way of defining the relationship between a data block and a relational schema, and it is vital for a proper understanding of how to build datasets from constituent pieces.
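
The projection view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `project` function, the list-of-dicts table layout, and the toy `diffrn` / `diffrn_detector` rows are all hypothetical and are not taken from any dictionary or CIF library.

```python
from typing import Dict, List

Row = Dict[str, str]
Tables = Dict[str, List[Row]]


def project(tables: Tables, key_columns: Dict[str, str], value: str) -> Tables:
    """Keep, for each category, only the rows whose key column equals `value`.

    Categories with no entry in `key_columns` are treated as unrelated to
    the chosen key and are left out of the resulting block.
    """
    block: Tables = {}
    for category, rows in tables.items():
        column = key_columns.get(category)
        if column is None:
            continue
        block[category] = [row for row in rows if row.get(column) == value]
    return block


# Toy relational data (hypothetical): two measurements, three detector rows.
tables = {
    "diffrn": [{"id": "d1"}, {"id": "d2"}],
    "diffrn_detector": [
        {"diffrn_id": "d1", "id": "det_a"},
        {"diffrn_id": "d1", "id": "det_b"},
        {"diffrn_id": "d2", "id": "det_a"},
    ],
}
# Which column in each category refers to the chosen key (an assumption).
key_columns = {"diffrn": "id", "diffrn_detector": "diffrn_id"}

# A "data block" as the projection of the tables onto diffrn id "d1".
block_d1 = project(tables, key_columns, "d1")
```

In this sketch the `diffrn` category collapses to a single row inside the block, while `diffrn_detector` keeps two rows because it carries a second key.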


On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
Dear James,
  John Westbrook will have to speak to the question of why his dictionary says that, but the reality is that he also runs a database that
in fact supports a lot more than one entry id, and it is certainly the case that imgCIF data can have a very complex and tangled relationship
with mmCIF entry ids.  Further, it is not unusual to present one entry as multiple datablocks pulled together by a common entry id when
they are not in the same data file, e.g. for structure factors and coordinates.
  All of which is beside the point.  Both imgCIF and mmCIF are database schema and stick keys on everything and loop them quite freely.
Sure you can always pick any key and extract only the tuples that contain that key at a single value and such other tuples from related
child categories as make some kind of sense and call it a datablock, but that is a very narrow and minimally useful view of how to manage information.
  Regards,
    Herbert
  

On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
Herbert - are you arguing that imgCIF and mmCIF should not be assigned different schema names? If your comments are not about that, feel free to ignore the following.

If you scrutinize the definition in mmCIF of _entry.id (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html), you will see that it "identifies the data block" and is therefore restricted to a single value in a single data block. It follows that all the child data items of _entry.id are restricted to single values, so where these child items are the sole keys of their categories, those categories become single-row categories. Such categories are functionally equivalent to DDLm Set categories, and so it would be possible to list which Set categories in core CIF are multi-row in mmCIF, satisfying the criteria for a schema. Frankly I was a bit too lazy to write the code to determine this, but from memory it is only diffrn and exptl_crystal. If there are objections to the label "macromolecular" we can change it to "multi-crystal multi-wavelength" to avoid any implications or restrictions on mmCIF.
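
The determination described above might look roughly like the following sketch. It assumes the category keys and the list of _entry.id children have already been extracted from the dictionary into plain Python structures; the function name, the data layout, and the three hand-typed example categories are illustrative assumptions, not output from the real mmCIF dictionary.

```python
from typing import Dict, List, Set


def single_row_categories(
    category_keys: Dict[str, List[str]],
    entry_id_children: Set[str],
) -> List[str]:
    """Return categories whose sole key data name is a child of _entry.id.

    Because _entry.id takes one value per data block, such categories can
    hold only one row per data block.
    """
    result = []
    for category, keys in category_keys.items():
        if len(keys) == 1 and keys[0] in entry_id_children:
            result.append(category)
    return sorted(result)


# Hand-typed illustration only; a real run would read these from the dictionary.
category_keys = {
    "cell": ["_cell.entry_id"],
    "diffrn": ["_diffrn.id"],
    "exptl_crystal": ["_exptl_crystal.id"],
}
entry_id_children = {"_cell.entry_id"}

single_row = single_row_categories(category_keys, entry_id_children)
multi_row = sorted(set(category_keys) - set(single_row))
```

Comparing `multi_row` against the Set categories of core CIF would then give the list of categories that define the schema.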

Although imgCIF does not have any categories that have child data names of _entry.id (so every imgCIF category can have multiple rows), it does add new key data names to a few mmCIF categories, thereby creating a distinct "_audit.schema". I don't think that is a controversial statement. For example, the Diffrn_Detector category in imgCIF has key data names "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas mmCIF has only the former (as per the text at bottom of p203 of Vol G). 

all the best,
James.


On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
I believe both imgCIF and mmCIF only use loop categories and any set categories picked up for inclusion with their datasets will need to have
keys added and be mapped into loop categories.  That is certainly the case for imgCIF -- Herbert

On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
Apologies for the lax terminology. By "looped" I mean "able to have more than one row in a loop". Perhaps the explanations should be rewritten to use 'Loop category' and 'Set category' rigorously?

On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
 In imgCIF (as with mmCIF) any and all categories may be looped -- it's how you put information into database tables.  - Herbert

On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
Dear COMCIFS,

First of all, Happy New Year to you all. I hope you've all been keeping well.

I am writing to propose updating the list of _audit.schema in the core dictionary. Normally this would be core DMG business, but as it concerns most dictionaries covered by COMCIFS I believe this is the more appropriate forum. This has been prompted by reviewing the DDLm dictionary chapters for the next edition of Volume G. Please examine the list below and discuss any changes you would like to see.  The formal changes to the dictionary can be viewed as a diff at this link: https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e

As a reminder, the _audit.schema dataname indicates that one or more categories have become looped relative to the core CIF dictionary. For example, where multiple crystals are used in a measurement, the exptl_crystal category becomes looped. Ideally software will check this dataname and exit if the dataname has an incompatible value.
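
A minimal sketch of the check described above, assuming the data block has already been parsed into a mapping from data names to values. The function names and the mapping representation are hypothetical, and the exact, case-sensitive comparison is a simplifying assumption. The fallback to Base follows the _enumeration.default given in the list below.

```python
from typing import Mapping, Set

# Default taken from _enumeration.default in the proposed enumeration.
DEFAULT_SCHEMA = "Base"


def audit_schema(block: Mapping[str, str]) -> str:
    """Return the block's _audit.schema value, or the default if absent."""
    return block.get("_audit.schema", DEFAULT_SCHEMA)


def schema_is_supported(block: Mapping[str, str], supported: Set[str]) -> bool:
    """True if the software declares support for the block's schema."""
    return audit_schema(block) in supported


def ensure_supported(block: Mapping[str, str], supported: Set[str]) -> str:
    """Exit with a message if the schema is not one the software handles."""
    schema = audit_schema(block)
    if schema not in supported:
        raise SystemExit(f"Unsupported _audit.schema value: {schema!r}")
    return schema


# Hypothetical software that assumes exptl_crystal and diffrn are single-row.
SUPPORTED = {"Base", "Space group tables"}
```

Software written this way refuses a block tagged, for example, `Experiments` rather than silently reading only the first row of a category it assumed was single-row.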

best wishes,
James.

=====================================================
loop_
_enumeration_set.state
_enumeration_set.detail
    Base                'Original Core CIF schema'
   'Space group tables' 'space_group category is looped'
    Entry              
;
    entry category is defined and looped: multiple experiments
    with results may be present
;
    Powder              'Multiple compounds (phases) may be present'
    Modulated           'Multiple subsystems may be present'
    Experiments        
;
    diffrn and exptl_crystal categories are looped: multiple
    diffraction measurements on multiple samples may be present
;
    Macromolecular      
;
    mmCIF equivalent. Only single-key mmCIF categories containing children
    of _entry.id are Set categories
;
    Raw                
;
    imgCIF equivalent. As for Macromolecular, with the addition of
    multiple detectors.
;
    Laue                
;
    diffrn_radiation is looped: Multiple wavelengths are used.
;
    Custom              'Examine dictionaries provided in _audit_conform'
    Local               'Locally modified dictionaries. Datafile not for distribution'
_enumeration.default    Base
=======================
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs

