Re: [ddlm-group] Restricting identifiers to integers: a good idea?
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Restricting identifiers to integers: a good idea?
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Mon, 13 Sep 2021 15:14:43 +0000
- In-Reply-To: <CAM+dB2eyrh0zf_CNOha5Y3oR9RrDqLQ7KKxkO=kLBmnGWD5ENw@mail.gmail.com>
- References: <CAM+dB2eyrh0zf_CNOha5Y3oR9RrDqLQ7KKxkO=kLBmnGWD5ENw@mail.gmail.com>
There is a popular school of thought in DB design that every entity’s primary key should be a single-column surrogate key. Such a key is typically assigned a numeric data type for various practical reasons, such as storage size and key generation strategy. The proposal seems to align with that practice, but I am not convinced that the practice is necessarily a good strategy for data dictionaries.

Although we do have some surrogate keys in our dictionaries (mmCIF’s _symmetry_equiv.id, for example), we tend to use natural keys instead. Where a suitable natural key exists, this both simplifies our dictionaries and makes CIF instance documents simpler and easier for humans to read. Also, continuing with _symmetry_equiv.id as an example, I observe that its definition explicitly states that it is NOT numeric. I think that is typical of our historic practice for such keys.

I am also skeptical of the proposal’s stated objective of simplifying input and storage, as this seems to fall outside the purpose of a CIF dictionary. A CIF dictionary’s primary purpose is to define data semantics, and any consideration of its suitability as a storage model is secondary, at best, notwithstanding the relational characteristics of DDL2 and DDLm.

Moreover, it is not evident to me how input or storage would in fact be simplified without attributing some kind of additional significance to the key values, which would turn them into natural keys whose specific significance should be defined. The dictionary authors might want to consider, for example, whether their input and storage plans would be foiled by key sets that were not contiguous, did not start at 0 or 1, included large gaps between keys, or included very large numbers.

I’m also suspicious that any perceived input advantages assume restrictions on the lexical form in which keys would be expressed. For example, the dictionary authors should consider whether integers expressed in scientific notation (e.g. 0.10e+01 or 1e+00) or in other forms besides straight decimal digit sequences would defeat their goals.

Best regards,

John

--
John C. Bollinger, Ph.D., RHCSA
Computing and X-ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital

From: ddlm-group <ddlm-group-bounces@iucr.org> On Behalf Of James H

Hello DDLm experts,

This time I have a relational model question. One of our dictionary author groups would like to restrict the key data name of a category (an opaque identifier) to positive integers (instead of arbitrary text), to simplify input and storage. I have commented that this risks the integer acquiring some sort of meaning, such as specifying that the items in the category are arranged in a particular sequence. However, I think some of you have more experience in why integer identifiers may or may not be a good idea. Can any of you comment on the value of restricting/not restricting the form of an identifier? Note this is a new dictionary so I'm not talking about changing an existing data name.

thanks,
James.

T +61 (02) 9717 9907
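
The lexical-form concern raised in the reply can be made concrete with a small sketch. It is only an illustration: the function names are hypothetical, and Python's Decimal parser stands in for whatever numeric grammar a CIF reader would actually apply, which is an assumption rather than the CIF rules themselves.

```python
from decimal import Decimal, InvalidOperation


def as_integer_key(text):
    """Return the integer a key string denotes, or None if it is not a whole number.

    Assumption: Decimal's string parsing is used as a stand-in for a
    CIF-style numeric grammar; a real reader may accept different forms.
    """
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    # Reject NaN/Infinity and anything with a nonzero fractional part.
    if not value.is_finite() or value != value.to_integral_value():
        return None
    return int(value)


def is_plain_digits(text):
    """True only for a straight sequence of ASCII decimal digits."""
    return text.isascii() and text.isdigit()


def is_contiguous_from_one(keys):
    """True if the integer keys are exactly 1..n with no gaps or repeats."""
    return sorted(keys) == list(range(1, len(keys) + 1))


# Five different spellings that all denote the integer one.
forms = ["1", "01", "+1", "1e+00", "0.10e+01"]
for form in forms:
    print(form, as_integer_key(form), is_plain_digits(form))
```

Compared as text, the five spellings are five distinct keys; compared as numbers, they collapse to one. A dictionary that restricts keys to integers would presumably need to say which comparison applies, and whether only plain digit sequences are allowed. The contiguity check is a reminder that storage schemes relying on keys running 1..n are attaching extra meaning to the values.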
_______________________________________________ ddlm-group mailing list ddlm-group@iucr.org http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group