Discussion List Archives

Re: Updating list of _audit.schema

Hi James,

This relates to the loop_ presentation issue: nothing prohibits creating a loop_ with a single row, even if
the category logically has unit cardinality.
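
To illustrate the point, here is a minimal sketch (not a real CIF parser). `parse_simple_cif` is a hypothetical helper that handles only unquoted, whitespace-separated values, and the example values are made up. Under those assumptions, the key-value presentation and the single-row loop_ presentation of a unit-cardinality category reduce to the same single row:

```python
def parse_simple_cif(text):
    """Parse a tiny CIF subset into {category: [row_dict, ...]}.

    Hypothetical helper: supports only unquoted, whitespace-separated
    tokens, plain key-value pairs, and loop_ constructs.
    """
    tokens = text.split()
    tables = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.lower() == "loop_":
            i += 1
            # Collect the looped data names.
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            # Collect values up to the next data name or loop_.
            values = []
            while (i < len(tokens) and not tokens[i].startswith("_")
                   and tokens[i].lower() != "loop_"):
                values.append(tokens[i])
                i += 1
            if not names or not values or len(values) % len(names):
                raise ValueError("malformed loop_")
            category = names[0][1:].split(".")[0]
            rows = tables.setdefault(category, [])
            for start in range(0, len(values), len(names)):
                rows.append(dict(zip(names, values[start:start + len(names)])))
        elif token.startswith("_") and i + 1 < len(tokens):
            # A key-value pair contributes to the category's single row.
            category = token[1:].split(".")[0]
            rows = tables.setdefault(category, [{}])
            rows[0][token] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return tables


# Example values are invented for illustration.
as_pairs = """
_exptl_crystal.id      1
_exptl_crystal.colour  red
"""

as_loop = """
loop_
_exptl_crystal.id
_exptl_crystal.colour
1 red
"""

same = parse_simple_cif(as_pairs) == parse_simple_cif(as_loop)
```

Here `same` is true: both inputs yield one `exptl_crystal` row, so software that works on rows need not care which presentation was used.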



On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
>    This reminds me very much of the bitter fight in the early 1970s between the proponents of hierarchical databases
> and relational databases.  At the time I was on the wrong side and thought that there was something very neat and
> organized about always forcing your information into a tree that allowed the use of highly efficient pointers.   Just putting
> information into tables of unsorted tuples and eschewing pointers seemed horribly inefficient.  Those of us who
> liked hierarchies and pointers were wrong.  Codd was right.  My enthusiasm is the enthusiasm of a convert.  Relations
> rule!!!
>    Regards,
>      Herbert
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester@gmail.com> wrote:
>     OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can "loop freely" the categories containing
>     an _entry.id child key data name within a single data block given the definition. I will leave John W to
>     further comment on how _entry.id is supposed to be used if he wishes. Meanwhile, in order to make progress I
>     suggest simply
>     (i) removing the "Macromolecular" option, noting the previous "Experiments" option covers multi-wavelength, multi-crystal setups.
>     (ii) removing the "imgCIF" option
>     (iii) returning in the future to add _audit.schema options corresponding to mmCIF and imgCIF if necessary
>     By the way, I think I share your enthusiasm for managing information using the relational model, but data containers (data
>     blocks/files/directories etc.) are unavoidable, and your characterisation of them as projections over particular values of one
>     or more key data names I think is the precise way of defining the relationship between a data block and a relational schema and
>     is vital for proper understanding of how to build datasets from constituent pieces.
>     On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>         Dear James,
>            John Westbrook will have to speak to the question of why his dictionary says that, but the reality is that he also runs a
>         database that in fact supports a lot more than one entry id, and it is certainly the case that imgCIF data can have a very
>         complex and tangled relationship with mmCIF entry ids.  Further, it is not unusual to present one entry as multiple datablocks
>         pulled together by a common entry id when they are not in the same data file, e.g. for structure factors and coordinates.
>            All of which is beside the point.  Both imgCIF and mmCIF are database schemas and stick keys on everything and loop them
>         quite freely.  Sure, you can always pick any key and extract only the tuples that contain that key at a single value, and such
>         other tuples from related child categories as make some kind of sense, and call it a datablock, but that is a very narrow and
>         minimally useful view of how to manage information.
>            Regards,
>              Herbert
>         On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
>             Herbert - are you arguing that imgCIF and mmCIF should not be assigned different schema names? If your comments are not
>             about that, feel free to ignore the following.
>             If you scrutinize the definition in mmCIF of _entry.id
>             (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html), you will see that it "identifies the
>             data block" so is therefore restricted to a single value in a single data block. It follows that all the child data
>             items of _entry.id are restricted to single values, so where these child items are the sole keys of
>             their categories those categories become single-row categories. Such categories are entirely functionally equivalent to
>             DDLm Set categories and so it would be possible to list which Set categories in core CIF are multi-row in mmCIF,
>             satisfying the criteria for a schema. Frankly I was a bit too lazy to write the code to determine this but from memory
>             it is only diffrn and exptl_crystal. If there are objections to the label "macromolecular" we can change it to
>             "multi-crystal multi-wavelength" to avoid any implications or restrictions on mmCIF.
>             Although imgCIF does not have any categories that have child data names of _entry.id (so every imgCIF
>             category can have multiple rows), it does add new key data names to a few mmCIF categories, thereby creating a distinct
>             "_audit.schema". I don't think that is a controversial statement. For example, the Diffrn_Detector category in imgCIF
>             has key data names "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas
>             mmCIF has only the former (as per the text at bottom of p203 of Vol G).
>             all the best,
>             James.
>             On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>                 I believe both imgCIF and mmCIF only use loop categories and any set categories picked up for inclusion with their
>                 datasets will need to have keys added and be mapped into loop categories.  That is certainly the case for imgCIF -- Herbert
>                 On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
>                     Apologies for the lax terminology. By "looped" I mean "able to have more than one row in a loop". Perhaps the
>                     explanations should be rewritten to use 'Loop category' and 'Set category' rigorously?
>                     On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>                           In imgCIF (as with mmCIF) any and all categories may be looped -- it's how you put information into
>                         database tables.  - Herbert
>                         On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
>                             Dear COMCIFS,
>                             First of all, Happy New Year to you all. I hope you've all been keeping well.
>                             I am writing to propose updating the list of _audit.schema in the core dictionary. Normally this would
>                             be core DMG business, but as it concerns most dictionaries covered by COMCIFS I believe this is the more
>                             appropriate forum. This has been prompted by reviewing the DDLm dictionary chapters for the next edition
>                             of Volume G. Please examine the list below and discuss any changes you would like to see.  The formal
>                             changes to the dictionary can be viewed as a diff at this link:
>                             https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>                             As a reminder, the _audit.schema dataname indicates that one or more categories have become looped
>                             relative to the core CIF dictionary. For example, where multiple crystals are used in a measurement, the
>                             exptl_crystal category becomes looped. Ideally software will check this dataname and exit if the
>                             dataname has an incompatible value.
>                             best wishes,
>                             James.
>                             =====================================================
>                             loop_
>                             _enumeration_set.state
>                             _enumeration_set.detail
>                                  Base                'Original Core CIF schema'
>                                 'Space group tables' 'space_group category is looped'
>                                  Entry
>                             ;
>                                  entry category is defined and looped: multiple experiments
>                                  with results may be present
>                             ;
>                                  Powder              'Multiple compounds (phases) may be present'
>                                  Modulated           'Multiple subsystems may be present'
>                                  Experiments
>                             ;
>                                  diffrn and exptl_crystal categories are looped: multiple
>                                  diffraction measurements on multiple samples may be present
>                             ;
>                                  Macromolecular
>                             ;
>                                  mmCIF equivalent. Only single-key mmCIF categories containing children
>                                  of _entry.id are Set categories
>                             ;
>                                  Raw
>                             ;
>                                  imgCIF equivalent. As for Macromolecular, with the addition of
>                                  multiple detectors.
>                             ;
>                                  Laue
>                             ;
>                                  diffrn_radiation is looped: Multiple wavelengths are used.
>                             ;
>                                  Custom              'Examine dictionaries provided in _audit_conform'
>                                  Local               'Locally modified dictionaries. Datafile not for distribution'
>                             _enumeration.default    Base
>                             =======================
>                             -- 
>                             T +61 (02) 9717 9907
>                             F +61 (02) 9717 3145
>                             M +61 (04) 0249 4148
>                             _______________________________________________
>                             comcifs mailing list
>                             comcifs@iucr.org
>                             http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
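
James's original message, quoted above, suggests that software check `_audit.schema` and exit if the value is incompatible. A minimal sketch of such a check follows, assuming the data block has already been read into a plain dict mapping data names to values. The function names, the `SUPPORTED_SCHEMAS` set, and the dict representation are all hypothetical; only the `Base` default is taken from the `_enumeration.default` line in the proposal.

```python
import sys

# Hypothetical: the schema values this particular program can handle.
SUPPORTED_SCHEMAS = frozenset({"Base"})


def check_audit_schema(block, supported=SUPPORTED_SCHEMAS):
    """Return the block's _audit.schema value, or raise if unsupported.

    `block` is assumed to be a dict of data name -> value. A missing
    _audit.schema falls back to 'Base', the proposed enumeration default.
    The comparison here is an exact string match (an assumption).
    """
    schema = block.get("_audit.schema", "Base")
    if schema not in supported:
        raise ValueError(f"unsupported _audit.schema value: {schema!r}")
    return schema


def load_or_exit(block):
    """Exit with a message rather than misread looped categories."""
    try:
        check_audit_schema(block)
    except ValueError as err:
        sys.exit(str(err))
    return block
```

A program that only understands single-row `exptl_crystal` data would call `load_or_exit` before touching any other category, so that a block declaring, say, `Experiments` is rejected up front.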

John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
