Re: [ddlm-group] Finalizing DDLm

To: [email protected], Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Finalizing DDLm
From: James Hester <[email protected]>
Date: Fri, 16 Apr 2010 09:51:31 +1000
In-Reply-To: <[email protected]>
References: <C7EB96DD.131EC%[email protected]><[email protected]><[email protected]><[email protected]><[email protected]>

No, I didn't say that dramatic changes to existing DDL2 entries would be necessary. I said that, compared to having to roll up DDL2 loops, an order of magnitude more work would be necessary to convert legacy files. If writing the code to roll up unrolled loops takes about 2 hours, all I am saying is that around 20 hours would need to be spent on other aspects of conversion. My point is that concerns about ease of conversion are not a basis for insisting that one-packet loops be allowed to be unrolled.

The DDLm and DDL2 data models *are* closely aligned. The overall category tree defined in mmCIF could remain largely untouched. However, DDLm gives more explicit semantic meaning to the category-subcategory relationship, and provides options to fine-tune that relationship, e.g. automatic joining. Therefore, I anticipate that the mmCIF dictionaries would need some tweaking if they are to behave properly given the extra semantic relationships. As a result, a few datanames might, for example, find themselves moved into different categories, so aliases would need to be defined. I think at least 20 hours could be spent on such work.
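
To make the two conversion steps concrete, here is a minimal sketch of alias renaming followed by loop roll-up. It assumes a data block already parsed into a plain dict of key-value items, and a dictionary reduced to three hand-written lookup tables. The tables, the function names and the sample datanames are all hypothetical illustrations; they are not taken from DDLm, from mmCIF, or from any existing CIF library.

```python
from collections import defaultdict

# Hypothetical stand-ins for information a real dictionary would supply.
CATEGORY_OF = {
    "_cell.length_a": "cell",
    "_atom_site.label": "atom_site",
    "_atom_site.x": "atom_site",
}
LOOPED_CATEGORIES = {"atom_site"}                 # categories that must be looped
ALIASES = {"_atom_site.fract_x": "_atom_site.x"}  # illustrative legacy -> new name


def apply_aliases(items):
    """Rename legacy datanames to their current equivalents."""
    return {ALIASES.get(name, name): value for name, value in items.items()}


def roll_up(items):
    """Split key-value items into (remaining items, one-row loops).

    Items whose category must be looped are gathered into one single-packet
    loop per category. Datanames unknown to the dictionary are left alone,
    because without a dictionary entry there is no way to tell that they
    belong in a loop.
    """
    remaining = {}
    grouped = defaultdict(dict)
    for name, value in items.items():
        category = CATEGORY_OF.get(name)
        if category in LOOPED_CATEGORIES:
            grouped[category][name] = [value]  # a column holding one packet
        else:
            remaining[name] = value
    return remaining, dict(grouped)


legacy = {
    "_cell.length_a": "5.43",
    "_atom_site.label": "Si1",
    "_atom_site.fract_x": "0.125",
    "_local.unknown": "?",
}
items, loops = roll_up(apply_aliases(legacy))
```

In this sketch the roll-up itself is a few lines; the tables it depends on (category membership and aliases) are where the dictionary work would go.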

On Fri, Apr 16, 2010 at 12:58 AM, John Westbrook <[email protected]> wrote:

This thread appears to be spiraling in a direction that I find both
confusing and disturbing. The suggestion that supporting
DDLm would require dramatic changes in existing DDL2 entries implies that
DDLm will require changes in the logical model underlying the existing
dictionaries. This hardly seems reasonable, particularly for DDL2 entries,
and appears to me like a very large departure from prior discussions
on maintaining support for legacy data instances.

I very strongly discourage the group from moving in any direction that
does not provide support for existing archives of data files.

John

On 4/15/10 6:38 AM, Herbert J. Bernstein wrote:
> I would appreciate a clarification of intent for DDL1 and DDL2 data
> files in the transition to DDLm:
>
> 1. Please assume somebody has an existing data file conformant to the
> current COMCIFS-approved DDL1 dictionaries, esp. the core, what are the
> specific changes that will be required to those data files for them to
> be acceptable under the proposed new DDLm conformant dictionaries?
>
> 2. Please assume somebody has an existing data file conformant to the
> current COMCIFS-approved DDL2 dictionaries, esp. mmCIF and imgCIF, what
> are the specific changes that will be required to those data files for
> them to be acceptable under the proposed new DDLm conformant dictionaries?
>
> Answers to these two questions would help to quantify the "order of
> magnitude more work" we will have to do as per James' remark:
>
>> PDB mmCIF files are not an issue for DDLm *at all*, as the mmCIF data
>> files are written with respect to the DDL2 specification (not DDLm). If
>> and when a DDLm version of mmCIF appears, conversion of legacy files will
>> involve an order of magnitude more work than just rolling up unrolled
>> loops, so the outcome of the present discussion will be by comparison
>> background noise.
>
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> [email protected]
> =====================================================
>
> On Thu, 15 Apr 2010, James Hester wrote:
>
>> Herbert, it seems to me that neither of your issues is relevant to this
>> discussion, in that they refer to situations for which DDLm is not used.
>> First, a clarification. When I talk about a dictionary being 'available'
>> to a program, I have in mind that it could be available at program
>> writing time (i.e. available to the programmer) and/or at program running
>> time. I hope this corresponds with other people's usage.
>>
>> On Wed, Apr 14, 2010 at 9:23 PM, Herbert J. Bernstein
>> <[email protected]> wrote:
>> Inasmuch as we appear to be discussing, rather than voting,
>> please allow me to clarify my position:
>>
>> I am _not_ concerned about whether a DDLm-conformant dictionary
>> does or does not have rules to say that a particular category is
>> or is not allowed to be presented "unrolled". I am concerned
>> with how to handle two important cases:
>>
>> 1. Existing legacy data files that have "rolled" or "unrolled"
>> loops that do not conform to the new dictionary rules; and
>>
>>
>> Those legacy data files were written with a particular dictionary in
>> mind. If that dictionary DDL allows loop unrolling (i.e. DDL2) then any
>> application that presumes to read datafiles based on that dictionary will
>> need to support it. But what we are discussing is how to specify the
>> construction of data files written with respect to a DDLm (not DDL2)
>> based dictionary. So I don't see how your case (1) is relevant.
>>
>> 2. Applications that are confronted with a data file, portions
>> of which are not in dictionaries to which that application has
>> access.
>>
>>
>> If an application has no access to the dictionary relevant to a given
>> dataname, it cannot be compelled to issue an error or warning when
>> confronted with an unrolled loop, because it has no way of knowing that
>> the loop is unrolled. In such a situation it would be bizarre to specify
>> any dictionary-derived behaviour, and I am not proposing to do so.
>> Likewise, if a CIF-writing application has no dictionary information
>> about a dataname that it is writing, we are unable to impose any
>> dictionary-based behaviour. The latter is a fairly 'Alice in Wonderland'
>> situation: a program writing a dataname that neither it nor the
>> programmer knows anything about...
>>
>> If an application has a dictionary handy and that dictionary
>> says something relevant about the rolledness or unrolledness of
>> a loop, then I am reluctantly willing to accept the DDLm
>> specification requiring the issuance of a warning or an error
>> message. Some application writers may decide not to do that,
>> but that is a different discussion.
>>
>> What I am concerned about are the very practical issues above --
>> of doing something useful with user data that either does not
>> conform to this stricture as presented in a dictionary or on
>> which the dictionaries available to the application are silent.
>> I am proposing that, rather than requiring an application to
>> throw up its hands and die, we try to maximize the useful work
>> to be accomplished and try to do something sensible with the
>> data, i.e. roll that which is unrolled or unroll that which is
>> rolled, if it allows the work of the application to get done.
>>
>>
>> If an application has no access to a dictionary for the datanames, it
>> will not be able to roll up an unrolled loop, as it won't know what
>> datanames should be in the loop. So I would make a counter-suggestion
>> that (in order to get useful work done), we can help this
>> dictionary-challenged program by making sure all datafiles that it is
>> presented with have their loop structures left intact.
>>
>>
>> I have yet to hear of a reason not to adopt that approach for
>> the cases listed above. Once we have those two cases settled, I
>> would be happy to discuss the subtleties of whether the List
>> attribute itself should be modified or not, but first, please,
>> let us deal with this practical issue.
>>
>> Regards,
>> Herbert
>>
>> P.S. to Nick: It is the current DDLm specification that would
>> require every application writer to read the dictionary in order
>> to process a CIF, else we would have no way to tell whether
>> rolled or unrolled presentation was in conformance with the
>> dictionary. The list attribute is in the dictionary, not the
>> data file. The discussion we are having is orthogonal to the
>> question of whether the DDLm specification requires the reading
>> of the dictionary.
>>
>>
>> I think this is the wrong way around. *If* an application writer wants
>> to see if a key-value dataitem should instead be in a loop, *then* they
>> will need to read the dictionary. If they can do useful work without
>> knowing this information, then I'm not standing in their way. A program
>> which claims to validate a data file *cannot* do the work it was designed
>> for unless it reads the dictionary, and must flag unrolled loops as a
>> violation of the standard. It may then offer to roll up the loops, to
>> create a conformant file. What is the problem here?
>>
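
The validation behaviour described in the preceding paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary is reduced to two hypothetical lookup structures (`category_of` and `looped`), the function name `find_unrolled` is invented here, and the sample datanames are illustrative. A dataname with no dictionary entry is skipped, since nothing can be checked for it.

```python
def find_unrolled(items, category_of, looped_categories):
    """Return, sorted, the datanames given as key-value pairs that the
    dictionary says belong in a loop."""
    violations = []
    for name in items:
        category = category_of.get(name)
        if category is None:
            continue  # no dictionary entry: nothing can be checked
        if category in looped_categories:
            violations.append(name)
    return sorted(violations)


# Illustrative inputs only; a real validator would read these from a dictionary.
category_of = {"_atom_site.label": "atom_site", "_cell.length_a": "cell"}
looped = {"atom_site"}
block_items = {
    "_atom_site.label": "Si1",
    "_cell.length_a": "5.43",
    "_local.note": "x",
}

violations = find_unrolled(block_items, category_of, looped)
for name in violations:
    print(f"warning: {name} belongs to a looped category but is not in a loop")
```

With an empty `category_of` the function returns an empty list, which mirrors the point that a program without the dictionary cannot know that a loop has been unrolled.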
>>
>> P.S. to James: I have read Nick's argument on the DDLm specification
>> issue and stick to voting for 2. If we change the specification then
>> strict adherence will no longer require List categories to be presented
>> as looped data, and no more or less dictionary reading will be required
>> than is required by the current specification, but users will be
>> annoyed by one less warning/error message they are not likely to
>> understand or be able to do anything about. However, no matter how
>> that vote comes out, we really do need to deal with the practical issue
>> above -- there are an awful lot of PDB mmCIF data files.
>>
>>
>> PDB mmCIF files are not an issue for DDLm *at all*, as the mmCIF data
>> files are written with respect to the DDL2 specification (not DDLm). If
>> and when a DDLm version of mmCIF appears, conversion of legacy files will
>> involve an order of magnitude more work than just rolling up unrolled
>> loops, so the outcome of the present discussion will be by comparison
>> background noise.
>>
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> [email protected]
>> =====================================================
>>
>> On Wed, 14 Apr 2010, Nick Spadaccini wrote:
>>
>>
>> This doesn't actually make things clearer or easier, James.
>> I will repeat it again.
>>
>> Strict adherence to the formal specification of DDLm
>> REQUIRES List categories to be presented as looped data,
>> even if there is only one row.
>>
>> If the IUCr wishes to universally adopt the case where
>> instances of List categories that contain only one row may
>> be presented as a Set category then it can do so as an
>> accepted extension. This of course requires every
>> application writer to necessarily read the dictionary to
>> establish if the data is really a Set category or
>> possibly a List category. Once the IUCr adopts this
>> extension to DDLm within its implementation, I would
>> assume every application writer would be required to
>> adhere to it.
>>
>> On 14/04/10 3:29 PM, "James Hester"
>> <[email protected]> wrote:
>>
>>      Dear all,
>>
>>      Both John and Herb have come out in favour of allowing one-row
>>      loops to be unrolled. Nick and I are both sceptical about the
>>      value of this idea. We have a few options:
>>
>>      1. Disallow loop unrolling altogether (as in DDL1).
>>      2. Allow loop unrolling for all DDLm dictionaries.
>>      3. Add a category-scope DDLm attribute stating that one-row loops
>>         in this category and child categories may be unrolled. If it
>>         appears in the 'Head' category of the dictionary, it would mean
>>         that all categories in the dictionary could be unrolled.
>>
>>      We have not discussed option 3: it basically means deferring the
>>      decision on loop unrolling to the dictionary writers. It also
>>      means that programmers of generic CIF software will need to be
>>      prepared for either behaviour, so in that sense it is slightly
>>      more burdensome than option 2.
>>
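
As a rough illustration of what option 3 would ask of generic software, the sketch below resolves whether a category may be unrolled by walking up the category tree and checking for the proposed attribute on the category or any of its ancestors. No such attribute exists in DDLm; the `flagged` set, the `parent_of` mapping, the function name and the example tree are all assumptions made for the example.

```python
def may_unroll(category, parent_of, flagged):
    """True if `category`, or any ancestor up to the root, carries the
    (hypothetical) "one-row loops may be unrolled" attribute."""
    seen = set()
    while category is not None and category not in seen:
        if category in flagged:
            return True
        seen.add(category)  # guards against a malformed, cyclic tree
        category = parent_of.get(category)
    return False


# Illustrative category tree: child -> parent, with 'head' as the root.
parent_of = {
    "head": None,
    "cell": "head",
    "atom_site": "head",
    "atom_site_aniso": "atom_site",
}

# Attribute set on one category: it and its children may be unrolled.
only_atom_site = {"atom_site"}
child_allowed = may_unroll("atom_site_aniso", parent_of, only_atom_site)
sibling_allowed = may_unroll("cell", parent_of, only_atom_site)

# Attribute set on the Head category: every category may be unrolled.
everything_allowed = may_unroll("cell", parent_of, {"head"})
```

A reader written this way has to handle both answers for every category, which is the extra burden on generic software mentioned in the paragraph above.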
>>      Unless the silent majority would like to contribute further
>>      thoughts on this matter, I suggest that we vote and move on. I
>>      discern that the voting so far would be:
>>
>>      Option 1: James, Nick
>>      Option 2: John, Herb
>>      Option 3: ?
>>
>>      (Some comments on John's post are inserted below).
>>
>>      James.
>>      On Thu, Apr 1, 2010 at 10:41 AM, John Westbrook
>>      <[email protected]> wrote:
>>            Hi all,
>>
>>            Coming in late on this in support of Herb's position.
>>
>>            I have never understood the necessity of marking a category
>>            as a 'list' type in the dictionary in the early CIF DDL,
>>            and in DDLm I find this even more confusing. Given that
>>            DDLm supports a category key which provides a well defined
>>            basis for each category, this alone would seem to provide
>>            the appropriate expression of cardinality.
>>
>>
>>      Absolutely agree: my objection is not to the loss of some packet
>>      ordering information, which is explicitly excluded from the
>>      infoset produced by the parser in any case.
>>
>>
>>            The choice of exporting a category with a single row as
>>            a collection of keyword-value pairs or using a table
>>            format via a loop_ seems like a presentation style
>>            matter rather than a dictionary issue.
>>
>>
>>      It is more than a presentation issue, as you have lost the
>>      information that those key-value pairs belong together, and so
>>      you need to refer to your dictionary to reconstitute them as a
>>      group. And if you allow the possibility of unrolling single-row
>>      loops for all categories, then significant extra work is done to
>>      check, and if necessary, transform the internal representation
>>      back to a canonical looped form. This reconstruction of the
>>      canonical form is highly desirable in a DDLm context, where we
>>      often wish to apply dREL operations to all packets in a loop.
>>
>>
>>            As Herb has observed, the vast majority of DDL2 files opt
>>            for key-value output for any category with a single row.
>>            I do not see what additional semantics are conveyed by
>>            regulating the manner of data presentation in these cases.
>>
>>
>>      See above - some semantic information is lost.
>>
>>
>>            John
>>
>>
>>
>>
>> _______________________________________________
>> ddlm-group mailing list
>> [email protected]
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>>
>>
>
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
