RE: Proposal to regulate markup in CIF files
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIFStandard (COMCIFS)" <comcifs@iucr.org>
- Subject: RE: Proposal to regulate markup in CIF files
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Wed, 13 Sep 2017 15:08:42 +0000
- In-Reply-To: <CAM+dB2cSRx=68QpdH5nQMuwWYU_pCwDmf2YVM7caU4vcw3fDsQ@mail.gmail.com>
- References: <CAM+dB2cSRx=68QpdH5nQMuwWYU_pCwDmf2YVM7caU4vcw3fDsQ@mail.gmail.com>
Dear Colleagues,

I think the proposed approach of allowing the markup convention to be specified in data files could be useful. Moreover, the proposal would provide a mechanism for formalizing the specification of which items are subject to markup in the first place. Supposedly, that is determined by items’ definitions, but in practice few, if any, current dictionaries or definitions actually address the topic explicitly.

Considerations:

1. To the extent that the proposal envisions data files being enabled to self-specify a particular markup convention from among several choices, it seems to violate the principle that the meaning of an item should not depend on the value of a different item.

2. Since the codes for the various supported markup conventions would be defined in domain dictionaries, it seems we might be setting ourselves up for future issues in the event that dictionary maintainers want to support new markup conventions, for then we will need to change existing definitions (which, granted, we have lately afforded ourselves some freedom to do).

3. Additionally, the proposal seems to imply that each domain dictionary would need either to specify its own analogue(s) of `_publ.markup_convention` or else to depend explicitly on the Core item. I am uncertain whether this is a good or bad thing.

4. The proposal does not preclude individual items from being defined to have whatever content type is desired for them, as long as the definitions are not flagged as carrying `Marked-up` data. That’s good, but I can imagine hypothetical cases in which it would be confusing to dictionary authors. For example, if an item were defined to contain HTML markup, it would be necessary for its definition to specify that its values were NOT `Marked-up`.

5. Although we call the existing CIF conventions “markup” (and that’s fitting), for the most part they don’t serve the same purpose as Markdown, reStructuredText, etc. Almost all of our markup is aimed at encoding specific characters, whereas the other markup conventions mentioned are focused on document structure and styling. These are largely orthogonal considerations, so perhaps we should approach them separately.

6. The discussion accompanying the proposal distinguishes between items that are intended to be machine-actionable and those intended purely for human consumption, with the assertion that only the latter kind should be subject to markup. I would accept that if it referred to structural markup only, but it is not so clear that such a rule is appropriate for markup that serves to encode characters. Consider, for example, `_atom_site.label`. As a key data name, it certainly has machine significance, but its values are also meant to identify atoms to humans. Should CIFs, then, be forbidden from using `Cα` (i.e. `C\a`) as or in atom labels? Forbidding use of markup would prevent literal `Cα` from being expressed in an atom label in a CIF 1.1 document, but not in a CIF 2.0 document, so that sets up a pathway wherein markup gets introduced into data values through format transliteration from CIF2 to CIF1. If only the original document were considered valid in such cases, then that would constitute a rather nasty trap to set for ourselves.

Initial analysis:

As presented, the proposal’s largest impact would probably be to provide for DDLm dictionaries to specify which items are primarily (or exclusively) intended for human consumption: those that are defined to have `Marked-up` content, regardless of whether any markup is actually present in their values. Initially, at least, this would establish in which values the standard CIF markup conventions are to be interpreted. That would be worthwhile.

I am less certain about the prospects for or usefulness of enabling data files to select alternative markup conventions. Perhaps that could be used to good effect to support revisions to the markup conventions. Perhaps it would be more broadly applicable. But perhaps we should avoid making any items’ meanings depend on other items’ values.

John

From: comcifs [mailto:comcifs-bounces@iucr.org] On Behalf Of James Hester

Dear COMCIFS,

Please see below a draft proposal for dealing with markup in CIF files. Let me know if you agree with the general approach, or suggest better alternatives. Once our general direction is agreed, our COMCIFS subcommittees will go into a huddle and sort out the definitions.

James.
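
Illustrative note on the `_atom_site.label` consideration raised in the reply: the Python sketch below is not part of either message and does not implement any CIF library. It assumes a small subset of the CIF backslash codes for Greek letters (`\a`, `\b`, `\g`, `\d`), and the helper names `to_cif1` and `interpret` are hypothetical. It only shows how a label transliterated from CIF2 to CIF1 would stop matching the original if the item were treated as not subject to markup.

```python
# Assumed subset of the CIF character markup codes for Greek letters.
GREEK_CODES = {"α": "\\a", "β": "\\b", "γ": "\\g", "δ": "\\d"}
DECODE = {code: char for char, code in GREEK_CODES.items()}


def to_cif1(value: str) -> str:
    """Hypothetical CIF2 -> CIF1 transliteration: encode Greek letters as markup."""
    return "".join(GREEK_CODES.get(ch, ch) for ch in value)


def interpret(value: str, marked_up: bool) -> str:
    """Decode the markup codes only if the item is treated as carrying markup."""
    if not marked_up:
        return value
    for code, char in DECODE.items():
        value = value.replace(code, char)
    return value


label_cif2 = "Cα"                  # literal character, expressible in CIF 2.0
label_cif1 = to_cif1(label_cif2)   # becomes C\a after transliteration

# With markup interpreted, the label round-trips to the original value.
print(interpret(label_cif1, marked_up=True) == label_cif2)
# With markup forbidden for this item, the two documents disagree on the key.
print(interpret(label_cif1, marked_up=False) == label_cif2)
```

The sketch ignores literal backslashes in source values, which a real transliteration would also have to handle.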
- Follow-Ups:
- Re: Proposal to regulate markup in CIF files (James Hester)
- References:
- Proposal to regulate markup in CIF files (James Hester)