Discussion List Archives

Re: Updating list of _audit.schema

Dear All,

It seems to me that we are not all debating from the same starting point.  Perhaps we do not share a common understanding of the significance of the _audit.schema item, or perhaps we have lost sight of the fact that that's what we're talking about.

As I understand it, the purpose of the item is to allow CIF data blocks to declare important details about how their contents are meant to be interpreted.  This encompasses not only which dictionaries apply, but in some cases also how the data within a block may or must be logically structured.
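
As a rough illustration of how software might act on such a declaration, here is a minimal Python sketch.  The function name, the dict representation of a data block, and the set of supported schema names are all assumptions made for the example; only the _audit.schema data name and the 'Base' default come from the proposal quoted later in this thread.

```python
# Hypothetical: the schemas this imaginary program knows how to interpret.
SUPPORTED_SCHEMAS = {"Base", "Powder"}


def check_audit_schema(block, supported=SUPPORTED_SCHEMAS):
    """Return the schema a data block declares, or raise if it is unsupported.

    `block` is assumed to be a plain dict mapping data names to values;
    a real CIF parser would hand over something richer.
    """
    # A block that does not declare _audit.schema is treated as 'Base',
    # the default given in the proposed enumeration.
    schema = block.get("_audit.schema", "Base")
    if schema not in supported:
        raise ValueError(f"unsupported _audit.schema value: {schema!r}")
    return schema
```

A caller would run this check before interpreting any other item in the block, and decline to process the block when the check fails.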

The question is not about whether or when any given CIF serialization presents single-packet categories using loop_ syntax.  Neither DDLm nor CIF 2.0 makes distinctions about that.  The question also is not about whether categories afford multiple packets in any broad, ontological sense.

The question can, however, be construed partly as speaking to which columns of some categories' compound natural keys appear in specific CIF data blocks.  That is, to the extent that one construes data blocks as containing projections of data from a more complex data space, _audit.schema speaks to the details of the specific projection upon which a particular block is based.  The characteristics of the broader space are tangential to the topic.  So is the related question of aggregating data from multiple blocks into a larger collection.
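
To make the projection idea concrete, here is a small Python sketch under stated assumptions: a category is modelled as a list of dicts, the two key data names are those mentioned for imgCIF's diffrn_detector category later in this thread, and the row values and the `project` helper are invented for illustration.

```python
def project(rows, key, value):
    """Keep the rows where `key` equals `value`, dropping the `key` column."""
    return [
        {name: v for name, v in row.items() if name != key}
        for row in rows
        if row[key] == value
    ]


# Made-up rows of a category with a compound key (diffrn_id, id).
rows = [
    {"_diffrn_detector.diffrn_id": "d1", "_diffrn_detector.id": "det1", "_diffrn_detector.type": "A"},
    {"_diffrn_detector.diffrn_id": "d1", "_diffrn_detector.id": "det2", "_diffrn_detector.type": "B"},
    {"_diffrn_detector.diffrn_id": "d2", "_diffrn_detector.id": "det1", "_diffrn_detector.type": "A"},
]

# A data block restricted to diffrn_id "d1" need only carry the remaining key column.
block_rows = project(rows, "_diffrn_detector.diffrn_id", "d1")
```

In this picture, which key columns have been projected away, and which remain in the block, is the kind of detail that _audit.schema would declare.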


Best Regards,

John

--
John C. Bollinger, Ph.D., RHCSA
Computing and X-ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital


From: comcifs <comcifs-bounces@iucr.org> on behalf of john.westbrook--- via comcifs <comcifs@iucr.org>
Sent: Thursday, January 7, 2021 7:32 AM
To: comcifs@iucr.org <comcifs@iucr.org>
Cc: john.westbrook@rcsb.org <john.westbrook@rcsb.org>
Subject: Re: Updating list of _audit.schema
 

Hi James,

Regarding the loop_ presentation issue: nothing prohibits creating a loop_ with a single row, even if
the category logically has unit cardinality.
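
A small Python sketch of the two presentations follows.  The helper names and the cell values are invented for illustration, and the sketch assumes values that need no CIF quoting.

```python
def as_pairs(packet):
    """Serialize one packet as plain name/value pairs."""
    return "\n".join(f"{name} {value}" for name, value in packet.items())


def as_loop(packets):
    """Serialize packets with loop_ syntax; a single packet is allowed."""
    names = list(packets[0])
    lines = ["loop_"] + names
    lines += [" ".join(str(p[n]) for n in names) for p in packets]
    return "\n".join(lines)


# Made-up single packet of a category with unit cardinality.
packet = {"_cell.length_a": "10.5", "_cell.length_b": "12.0"}

pairs_text = as_pairs(packet)   # two name/value lines
loop_text = as_loop([packet])   # loop_ header, two names, one row of values
```

Both strings carry the same single packet; only the presentation differs.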

Regards,

John

On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
>    This reminds me very much of the bitter fight in the early 1970s between the proponents of hierarchical databases
> and relational databases.  At the time I was on the wrong side and thought that there was something very neat and
> organized about always forcing your information into a tree that allowed the use of highly efficient pointers.   Just putting
> information into tables of unsorted tuples and eschewing pointers seemed horribly inefficient.  Those of us who
> liked hierarchies and pointers were wrong.  Codd was right.  My enthusiasm is the enthusiasm of a convert.  Relations
> rule!!!
>    Regards,
>      Herbert
>
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester@gmail.com> wrote:
>
>     OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can "loop freely" the categories containing
>     an _entry.id child key data name within a single data block given the definition. I will leave John W to
>     further comment on how _entry.id is supposed to be used if he wishes. Meanwhile, in order to make progress I
>     suggest simply
>     (i) removing the "Macromolecular" option, noting the previous "Experiments" option covers multi-wavelength, multi-crystal setups.
>     (ii) removing the "imgCIF" option
>     (iii) returning in the future to add _audit.schema options corresponding to mmCIF and imgCIF if necessary
>
>     By the way, I think I share your enthusiasm for managing information using the relational model, but data containers (data
>     blocks/files/directories etc.) are unavoidable, and your characterisation of them as projections over particular values of one
>     or more key data names I think is the precise way of defining the relationship between a data block and a relational schema and
>     is vital for proper understanding of how to build datasets from constituent pieces.
>
>
>     On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>         Dear James,
>            John Westbrook will have to speak to the question of why his dictionary says that, but the reality is that he also runs a
>         database that in fact supports a lot more than one entry id, and it is certainly the case that imgCIF data can have a very
>         complex and tangled relationship with mmCIF entry ids.  Further, it is not unusual to present one entry as multiple datablocks
>         pulled together by a common entry id when they are not in the same data file, e.g. for structure factors and coordinates.
>            All of which is beside the point.  Both imgCIF and mmCIF are database schemas and stick keys on everything and loop them
>         quite freely.  Sure, you can always pick any key and extract only the tuples that contain that key at a single value and such
>         other tuples from related child categories as make some kind of sense and call it a datablock, but that is a very narrow and
>         minimally useful view of how to manage information.
>            Regards,
>              Herbert
>
>         On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
>
>             Herbert - are you arguing that imgCIF and mmCIF should not be assigned different schema names? If your comments are not
>             about that, feel free to ignore the following.
>
>             If you scrutinize the definition in mmCIF of _entry.id
>             (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html), you will see that it "identifies the
>             data block" so is therefore restricted to a single value in a single data block. It follows that all the child data
>             items of _entry.id are restricted to single values, so where these child items are the sole keys of
>             their categories those categories become single-row categories. Such categories are entirely functionally equivalent to
>             DDLm Set categories and so it would be possible to list which Set categories in core CIF are multi-row in mmCIF,
>             satisfying the criteria for a schema. Frankly I was a bit too lazy to write the code to determine this but from memory
>             it is only diffrn and exptl_crystal. If there are objections to the label "macromolecular" we can change it to
>             "multi-crystal multi-wavelength" to avoid any implications or restrictions on mmCIF.
>
>             Although imgCIF does not have any categories that have child data names of _entry.id (so every imgCIF
>             category can have multiple rows), it does add new key data names to a few mmCIF categories, thereby creating a distinct
>             "_audit.schema". I don't think that is a controversial statement. For example, the Diffrn_Detector category in imgCIF
>             has key data names "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas
>             mmCIF has only the former (as per the text at bottom of p203 of Vol G).
>
>             all the best,
>             James.
>
>
>             On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>                 I believe both imgCIF and mmCIF only use loop categories and any set categories picked up for inclusion with their
>                 datasets will need to have keys added and be mapped into loop categories.  That is certainly the case for imgCIF -- Herbert
>
>                 On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
>
>                     Apologies for the lax terminology. By "looped" I mean "able to have more than one row in a loop". Perhaps the
>                     explanations should be rewritten to use 'Loop category' and 'Set category' rigorously?
>
>                     On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>                           In imgCIF (as with mmCIF) any and all categories may be looped -- it's how you put information into
>                         database tables.  - Herbert
>
>                         On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
>
>                             Dear COMCIFS,
>
>                             First of all, Happy New Year to you all; I hope you've all been keeping well.
>
>                             I am writing to propose updating the list of _audit.schema in the core dictionary. Normally this would
>                             be core DMG business, but as it concerns most dictionaries covered by COMCIFS I believe this is the more
>                             appropriate forum. This has been prompted by reviewing the DDLm dictionary chapters for the next edition
>                             of Volume G. Please examine the list below and discuss any changes you would like to see.  The formal
>                             changes to the dictionary can be viewed as a diff at this link:
>                             https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>
>                             As a reminder, the _audit.schema dataname indicates that one or more categories have become looped
>                             relative to the core CIF dictionary. For example, where multiple crystals are used in a measurement, the
>                             exptl_crystal category becomes looped. Ideally software will check this dataname and exit if the
>                             dataname has an incompatible value.
>
>                             best wishes,
>                             James.
>
>                             =====================================================
>                             loop_
>                             _enumeration_set.state
>                             _enumeration_set.detail
>                                  Base                'Original Core CIF schema'
>                                 'Space group tables' 'space_group category is looped'
>                                  Entry
>                             ;
>                                  entry category is defined and looped: multiple experiments
>                                  with results may be present
>                             ;
>                                  Powder              'Multiple compounds (phases) may be present'
>                                  Modulated           'Multiple subsystems may be present'
>                                  Experiments
>                             ;
>                                  diffrn and exptl_crystal categories are looped: multiple
>                                  diffraction measurements on multiple samples may be present
>                             ;
>                                  Macromolecular
>                             ;
>                                  mmCIF equivalent. Only single-key mmCIF categories containing children
>                                  of _entry.id are Set categories
>                             ;
>                                  Raw
>                             ;
>                                  imgCIF equivalent. As for Macromolecular, with the addition of
>                                  multiple detectors.
>                             ;
>                                  Laue
>                             ;
>                                  diffrn_radiation is looped: Multiple wavelengths are used.
>                             ;
>                                  Custom              'Examine dictionaries provided in _audit_conform'
>                                  Local               'Locally modified dictionaries. Datafile not for distribution'
>                             _enumeration.default    Base
>                             =======================
>                             --
>                             T +61 (02) 9717 9907
>                             F +61 (02) 9717 3145
>                             M +61 (04) 0249 4148
>                             _______________________________________________
>                             comcifs mailing list
>                             comcifs@iucr.org
>                             http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs

--
John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs