Re: [ddlm-group] On schema, syntax and semantics, was Preparing CIF for multi-block datasets
- To: "Herbert J. Bernstein" <yayahjb@gmail.com>, "james.r.hester@gmail.com" <james.r.hester@gmail.com>, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] On schema, syntax and semantics, was Preparing CIF for multi-block datasets
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Thu, 2 Apr 2020 15:50:55 +0000
- In-Reply-To: <CAM+dB2dsQd3wU69bmeRZxbV5v+bK=Q841=Wgr53aJ3nHudhr6Q@mail.gmail.com>
- References: <CAM+dB2eFZ+-yUVWfNBVnKUaNNr9bUC9S3B8QJ9pYHNYk4ETnfA@mail.gmail.com><CABcsX26hg1KG+1P08W=GbjjV-upjKtbgyzbH4WW+qDhwZQR4zA@mail.gmail.com><CAM+dB2dBTdoXj_VegOibsFaKowy-+kXT6OQ2MxaVA=wOcD1akg@mail.gmail.com><1ffdd7d8-f29f-4c7b-e6b7-0bff08358484@rcsb.org><CAM+dB2fOodbuyMFhRnY5EZebYtPP3+RWh9pRLbAQvYmxvHYBrw@mail.gmail.com><CABcsX27tt801DdX8cmFwuBFY5JmMcm2T3od-VgnNMygP29TfLQ@mail.gmail.com>,<CAM+dB2dsQd3wU69bmeRZxbV5v+bK=Q841=Wgr53aJ3nHudhr6Q@mail.gmail.com>
Herbert wrote:
From the use of the DDL1-associated term "looped", I interpret Herbert to be at least partially referring here to the significance DDL1 purports to attribute to the form in which the CIF representation of a category is presented -- that is, a single-packet
loop_ construct vs. one or more scalar items. (By "DDL1" I mean first and foremost the dictionary definition language itself.) I agree that such significance is artificial, and indeed out of place. In practice, it served to simplify software development,
but probably also misled some people.
As far as I am aware, however, DDLm (the dictionary definition language) does not provide for making such a distinction, and therefore none of the dictionaries expressed in that language do so. What DDLm does have is the concept of Set categories, which are
a natural fit for categories whose DDL1 definitions specify that they be unlooped-only. The validity of a CIF representation of a DDLm Set category does not depend on whether a loop_ construct is used. When talking about DDLm or dictionaries expressed in
that language, then, the term "looped" can be understood only from an historical perspective. The term remains significant for CIF documents, of course, but it is unrelated to their interpretation with respect to dictionaries expressed in DDLm.
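To illustrate the point about form, here is a minimal Python sketch, assuming a parser that reports scalar items as a mapping and a loop_ construct as a list of names plus packets. Those in-memory shapes, the helper names, and the example values are hypothetical and are not part of any CIF or DDLm specification.

```python
# Hypothetical parsed forms of the same category, presented two ways.
scalar_form = {"_cell.length_a": "5.43", "_cell.length_b": "5.43"}
looped_form = {
    "names": ["_cell.length_a", "_cell.length_b"],
    "packets": [["5.43", "5.43"]],  # a single-packet loop_
}


def rows_from_scalars(items):
    """Treat a group of scalar items as a single row."""
    return [dict(items)]


def rows_from_loop(loop):
    """Turn each loop packet into a row keyed by data name."""
    return [dict(zip(loop["names"], packet)) for packet in loop["packets"]]


def satisfies_set_category(rows):
    """A Set category holds exactly one row, however it was presented."""
    return len(rows) == 1


# Both presentations reduce to the same one-row relation.
same_rows = rows_from_scalars(scalar_form) == rows_from_loop(looped_form)
```

Under these assumptions the validity check looks only at the rows, never at which syntactic form produced them.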
James wrote:
Indeed, Set categories are not problematic from a relational perspective. Such a category simply corresponds to a relation having a candidate key drawn from a single-element domain. The key's domain having only one element, its specific value has no semantic significance, and we need not and do not define or represent the key explicitly.
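As a rough sketch of that relational reading, the Python below models a Set category as a mapping whose key domain contains exactly one element. The class name, the `IMPLICIT_KEY` stand-in, and the attribute names are all hypothetical; nothing here is taken from DDLm itself.

```python
# The single element of the key's domain. Its value is arbitrary,
# which is why it never needs to appear in a data file.
IMPLICIT_KEY = ()


class SetCategory:
    """A relation keyed on a single-element domain: at most one row."""

    def __init__(self):
        self._rows = {}

    def put(self, **attributes):
        # Every row has the same key, so a second put replaces the first.
        self._rows[IMPLICIT_KEY] = dict(attributes)

    def get(self):
        return self._rows[IMPLICIT_KEY]

    def __len__(self):
        return len(self._rows)


cell = SetCategory()
cell.put(length_a="5.43")
cell.put(length_a="5.431")  # same implicit key, so still one row
```

The one-row restriction falls out of ordinary key uniqueness; no special rule is needed.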
That model is of course focused on data blocks (and save frames) serving as self-contained databases. If we want to consider its embedding into a broader relational model encompassing a flat representation of multiple data blocks, then those erstwhile unspecified Set category keys must each have a 1:1 relationship with
their host block, and thus with its unique identifier. A natural choice in this broader model is therefore to _equate_ those keys with their host blocks' unique identifiers, which, in combination with expanding the keys' domains appropriately, brings us to
exactly the same place that mmCIF reached (via a similar route, I suspect).
On the other hand, although a flat relational representation affords advantages that I'm sure motivated its choice for mmCIF, it is not semantically different from the trivial embedding we already have: the one with data block identifiers as keys, and _data_blocks_ as values (to the extent that data block identifiers
are unique).
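The claimed equivalence can be sketched in Python as a round trip between the two embeddings. The `block_id` column name and the nested-dict layout are assumptions made for illustration, as are the block and category names; the sketch also assumes unique block identifiers and no existing attribute called `block_id`.

```python
def flatten(blocks):
    """{block_id: {category: [row, ...]}} -> {category: [row + block_id, ...]}"""
    flat = {}
    for block_id, categories in blocks.items():
        for category, rows in categories.items():
            for row in rows:
                # The host block's identifier becomes an explicit key column.
                flat.setdefault(category, []).append({"block_id": block_id, **row})
    return flat


def regroup(flat):
    """Inverse of flatten: recover the per-block databases."""
    blocks = {}
    for category, rows in flat.items():
        for row in rows:
            rest = {k: v for k, v in row.items() if k != "block_id"}
            blocks.setdefault(row["block_id"], {}).setdefault(category, []).append(rest)
    return blocks


blocks = {
    "sample_a": {
        "cell": [{"length_a": "5.43"}],
        "atom_site": [{"label": "Si1"}, {"label": "Si2"}],
    },
    "sample_b": {"cell": [{"length_a": "4.05"}]},
}
flat = flatten(blocks)
round_trip_ok = regroup(flat) == blocks
```

In the flat form each `cell` row carries its block identifier as the key, which is the role the implicit Set category key played inside a single block.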
Perhaps it would be worthwhile presenting something along the lines of the above discussion in an appropriate place (i.e. some place more official than this list), but it's unclear to me what more than that would be needful.
John
--
John C. Bollinger, Ph.D.
Computing and X-ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
From: ddlm-group <ddlm-group-bounces@iucr.org> on behalf of James Hester <jamesrhester@gmail.com>
Sent: Wednesday, April 1, 2020 7:36 PM
To: Herbert J. Bernstein <yayahjb@gmail.com>
Cc: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: [ddlm-group] On schema, syntax and semantics, was Preparing CIF for multi-block datasets
Dear all,
See my comments inline below.
On Thu, 2 Apr 2020 at 10:23, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
I agree up until Herbert's final sentence. Is DDLm muddled because of the lack of decent documentation, or because the concepts are imperfect? As far as I can tell, DDLm in its current version provides a mechanism that is about as simple as it can be and
still handle the enormous diversity of the powder/magnetism/modulated/Laue world and combinations thereof in a machine-actionable manner - machine-actionable means that dREL methods can be written that will work with whatever combination of dictionaries you
have come up with (think combined neutron / X-ray powder diffraction on a mixture expressed using a dictionary). Doing this has relied on relational data structures.
I believe Herbert is thinking here of links within a CIF data block pointing to items that are not straightforward DDLm-conforming CIF data blocks, thus necessitating a mapping between the pointed-to contents and the DDLm schema. Absolutely true that
such a mapping is necessary. So perhaps Herbert is suggesting a further '_audit_link' data name that would identify the particular mapping to use? I agree. The lack of such mappings doesn't mean we can't define the data name. I would also add that, while
one scenario might put such links into a 'global' block (like a Nexus master file) making a sort of container for other data blocks, another scenario might simply link one block with the next one along.
I don't see why Herbert thinks that specifying the relationship between DDLm (I assume he means core CIF) and DDL2 (I assume he means mmCIF) is difficult. If the DIFFRN category is a set category in default core CIF, then it corresponds to a single-row DIFFRN
category in mmCIF. I thoroughly agree that the fundamental underlying structure of any scientific data is relational; some data presentations simply require more untangling than others.
By saying that a category is unlooped you are specifying the scope of a single data block (e.g. *one* compound, *one* sample); that is the significance of unlooped categories. DDL2 does exactly the same thing by specifying that the value of _entry.id is the data block identifier. So all children of _entry.id are single-row, i.e. Set, categories. And, contrary to what Herbert seems to be implying, there is no abandonment of relational integrity if you restrict some loops to having a single row.
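A minimal sketch of that single-row restriction, assuming a flat table in which a hypothetical `entry_id` column stands in for the child item that points at _entry.id (the column name, function name, and sample rows are illustrative only):

```python
from collections import Counter


def single_row_per_entry(rows, key="entry_id"):
    """True if every entry identifier keys exactly one row of the category."""
    counts = Counter(row[key] for row in rows)
    return all(n == 1 for n in counts.values())


diffrn_rows = [
    {"entry_id": "block_1", "ambient_temperature": "100"},
    {"entry_id": "block_2", "ambient_temperature": "295"},
]
diffrn_is_set_like = single_row_per_entry(diffrn_rows)
```

The check is an ordinary uniqueness constraint on the key column, so restricting a category to one row per block stays within the relational model.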
We already have 'looped sets' as a result of the _audit.schema discussions several years ago. The documentation might still be a bit sparse.
all the best,
James.
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________ ddlm-group mailing list ddlm-group@iucr.org http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group