Discussion List Archives


Re: Updating list of _audit.schema

Hi James,
Related to the loop_ presentation issue: nothing prohibits creating a loop_ with a single row, even if the category logically has unit cardinality.
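As a rough illustration (a sketch only, not taken from any CIF library), the following Python fragment parses both presentations into the same one-row table. The `parse_category` helper is hypothetical and handles only whitespace-separated tokens with simple quoting; the `_exptl_crystal` items are used purely as example data.

```python
import shlex


def parse_category(text):
    """Parse a tiny CIF fragment into a list of row dicts.

    Hypothetical helper for illustration: it understands only
    whitespace-separated values, simple quotes, and at most one loop_.
    Real CIF files need a real parser.
    """
    tokens = shlex.split(text)
    if tokens and tokens[0] == "loop_":
        # Collect the data names that follow loop_.
        names = []
        i = 1
        while i < len(tokens) and tokens[i].startswith("_"):
            names.append(tokens[i])
            i += 1
        # Remaining tokens are values, filled row by row.
        values = tokens[i:]
        width = len(names)
        return [dict(zip(names, values[r:r + width]))
                for r in range(0, len(values), width)]
    # Unlooped form: alternating name/value pairs make exactly one row.
    return [dict(zip(tokens[0::2], tokens[1::2]))]


unlooped = """
_exptl_crystal.id      1
_exptl_crystal.colour  'pale yellow'
"""

single_row_loop = """
loop_
_exptl_crystal.id
_exptl_crystal.colour
1 'pale yellow'
"""

# Both presentations yield the same single-row table.
same = parse_category(unlooped) == parse_category(single_row_loop)
```

Treating every category as a list of rows, whether or not it was written with loop_, means downstream code does not need to care which presentation the writer chose.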
On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
>
> This reminds me very much of the bitter fight in the early 1970s between the proponents of
> hierarchical databases and relational databases. At the time I was on the wrong side and thought
> that there was something very neat and organized about always forcing your information into a tree
> that allowed the use of highly efficient pointers. Just putting information into tables of unsorted
> tuples and eschewing pointers seemed horribly inefficient. Those of us who liked hierarchies and
> pointers were wrong. Codd was right. My enthusiasm is the enthusiasm of a convert. Relations rule!!!
>
> Regards,
>   Herbert
>
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester@gmail.com> wrote:
>
>     OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can
>     "loop freely" the categories containing an _entry.id child key data name within a single
>     data block given the definition. I will leave John W to further comment on how _entry.id
>     is supposed to be used if he wishes. Meanwhile, in order to make progress I suggest simply
>     (i) removing the "Macromolecular" option, noting the previous "Experiments" option covers
>         multi-wavelength, multi-crystal setups.
>     (ii) removing the "imgCIF" option
>     (iii) returning in the future to add corresponding _audit.schema options corresponding to
>         mmCIF and imgCIF if necessary
>
>     By the way, I think I share your enthusiasm for managing information using the relational
>     model, but data containers (data blocks/files/directories etc.) are unavoidable, and your
>     characterisation of them as projections over particular values of one or more key data
>     names I think is the precise way of defining the relationship between a data block and a
>     relational schema and is vital for proper understanding of how to build datasets from
>     constituent pieces.
>
>     On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>         Dear James,
>
>         John Westbrook will have to speak to the question of why his dictionary says that,
>         but the reality is that he also runs a database that in fact supports a lot more than
>         one entry id, and it is certainly the case that imgCIF data can have a very complex
>         and tangled relationship with mmCIF entry ids. Further it is not unusual to present
>         one entry as multiple datablocks pulled together by a common entry id when they are
>         not in the same data file, e.g. for structure factors and coordinates.
>
>         All of which is beside the point. Both imgCIF and mmCIF are database schema and stick
>         keys on everything and loop them quite freely. Sure you can always pick any key and
>         extract only the tuples that contain that key at a single value and such other tuples
>         from related child categories as make some kind of sense and call it a datablock, but
>         that is a very narrow and minimally useful view of how to manage information.
>
>         Regards,
>           Herbert
>
>         On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
>
>             Herbert - are you arguing that imgCIF and mmCIF should not be assigned different
>             schema names? If your comments are not about that, feel free to ignore the
>             following.
>
>             If you scrutinize the definition in mmCIF of _entry.id
>             (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html),
>             you will see that it "identifies the data block" so is therefore restricted to a
>             single value in a single data block. It follows that all the child data items of
>             _entry.id are restricted to single values, so where these child items are the
>             sole keys of their categories those categories become single-row categories.
>             Such categories are entirely functionally equivalent to DDLm Set categories and
>             so it would be possible to list which Set categories in core CIF are multi-row
>             in mmCIF, satisfying the criteria for a schema. Frankly I was a bit too lazy to
>             write the code to determine this but from memory it is only diffrn and
>             exptl_crystal. If there are objections to the label "macromolecular" we can
>             change it to "multi-crystal multi-wavelength" to avoid any implications or
>             restrictions on mmCIF.
>
>             Although imgCIF does not have any categories that have child data names of
>             _entry.id (so every imgCIF category can have multiple rows), it does add new key
>             data names to a few mmCIF categories, thereby creating a distinct
>             "_audit.schema". I don't think that is a controversial statement. For example,
>             the Diffrn_Detector category in imgCIF has key data names
>             "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas mmCIF has
>             only the former (as per the text at bottom of p203 of Vol G).
>
>             all the best,
>             James.
>
>             On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>                 I believe both imgCIF and mmCIF only use loop categories and any set
>                 categories picked up for inclusion with their datasets will need to have
>                 keys added and be mapped into loop categories. That is certainly the case
>                 for imgCIF -- Herbert
>
>                 On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
>
>                     Apologies for the lax terminology. By "looped" I mean "able to have more
>                     than one row in a loop". Perhaps the explanations should be rewritten to
>                     use 'Loop category' and 'Set category' rigorously?
>
>                     On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>                         In imgCIF (as with mmCIF) any and all categories may be looped --
>                         it's how you put information into database tables.  - Herbert
>
>                         On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
>
>                             Dear COMCIFS,
>
>                             First of all, Happy New Year to you all, I hope you've all been
>                             keeping well.
>
>                             I am writing to propose updating the list of _audit.schema in the
>                             core dictionary. Normally this would be core DMG business, but as
>                             it concerns most dictionaries covered by COMCIFS I believe this
>                             is the more appropriate forum. This has been prompted by
>                             reviewing the DDLm dictionary chapters for the next edition of
>                             Volume G. Please examine the list below and discuss any changes
>                             you would like to see. The formal changes to the dictionary can
>                             be viewed as a diff at this link:
>                             https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>
>                             As a reminder, the _audit.schema dataname indicates that one or
>                             more categories have become looped relative to the core CIF
>                             dictionary. For example, where multiple crystals are used in a
>                             measurement, the exptl_crystal category becomes looped. Ideally
>                             software will check this dataname and exit if the dataname has an
>                             incompatible value.
>
>                             best wishes,
>                             James.
>
>                             =====================================================
>                             loop_
>                             _enumeration_set.state
>                             _enumeration_set.detail
>                                  Base                'Original Core CIF schema'
>                                 'Space group tables' 'space_group category is looped'
>                                  Entry
>                             ;
>                                  entry category is defined and looped: multiple experiments
>                                  with results may be present
>                             ;
>                                  Powder              'Multiple compounds (phases) may be present'
>                                  Modulated           'Multiple subsystems may be present'
>                                  Experiments
>                             ;
>                                  diffrn and exptl_crystal categories are looped: multiple
>                                  diffraction measurements on multiple samples may be present
>                             ;
>                                  Macromolecular
>                             ;
>                                  mmCIF equivalent. Only single-key mmCIF categories containing children
>                                  of _entry.id are Set categories
>                             ;
>                                  Raw
>                             ;
>                                  imgCIF equivalent. As for Macromolecular, with the addition of
>                                  multiple detectors.
>                             ;
>                                  Laue
>                             ;
>                                  diffrn_radiation is looped: Multiple wavelengths are used.
>                             ;
>                                  Custom              'Examine dictionaries provided in _audit_conform'
>                                  Local               'Locally modified dictionaries. Datafile not for distribution'
>                             _enumeration.default    Base
>                             =======================
>                             -- 
>                             T +61 (02) 9717 9907
>                             F +61 (02) 9717 3145
>                             M +61 (04) 0249 4148
>                             _______________________________________________
>                             comcifs mailing list
>                             comcifs@iucr.org
>                             http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
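
A minimal sketch of the check described in the original proposal above (software reads _audit.schema and stops when it finds a value it cannot handle) might look like the following. The dict-based `block`, the `SUPPORTED_SCHEMAS` set and the `check_schema` helper are hypothetical names chosen for illustration, not part of any existing CIF library. Falling back to Base when the data name is absent follows the _enumeration.default in the proposed list.

```python
# Schemas this hypothetical program knows how to interpret.
SUPPORTED_SCHEMAS = {"Base", "Space group tables"}

# Value assumed when _audit.schema is absent, per _enumeration.default
# in the proposed enumeration list.
DEFAULT_SCHEMA = "Base"


def check_schema(block, supported=SUPPORTED_SCHEMAS):
    """Return the schema declared by a data block, or raise ValueError.

    `block` is assumed to be a plain dict mapping data names to values,
    standing in for whatever object a real CIF parser would return.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    if schema not in supported:
        raise ValueError(f"unsupported _audit.schema value: {schema!r}")
    return schema


# A block that declares nothing is treated as the Base schema.
declared = check_schema({})

# A block with looped exptl_crystal/diffrn categories is rejected here.
try:
    check_schema({"_audit.schema": "Experiments"})
    rejected = False
except ValueError:
    rejected = True
```

Raising an exception rather than exiting directly leaves it to the caller to decide whether to stop, skip the block, or hand it to different code.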
-- 
John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs

International Union of Crystallography
