Re: [ddlm-group] Finalizing DDLm

No, I didn't say that dramatic changes to existing DDL2 entries would be necessary.  I said that, compared to having to roll up DDL2 loops, an order of magnitude more work would be necessary to convert legacy files.  If writing the code to roll up unrolled loops takes about 2 hours, all I am saying is that around 20 hours would need to be spent on other aspects of conversion.  My point is that concerns about ease of conversion are not a basis for insisting that one-packet loops be allowed to be unrolled.

The DDLm and DDL2 data models *are* closely aligned.  The overall category tree defined in mmCIF could remain largely untouched.  However, DDLm gives more explicit semantic meaning to the category-subcategory relationship, and provides options to fine-tune that relationship, e.g. automatic joining.  Therefore, I anticipate that the mmCIF dictionaries would need some tweaking if they are to behave properly given the extra semantic relationships.  As a result, a few datanames might, for example, find themselves moved into different categories, so aliases would need to be defined.  I think at least 20 hours could be spent on such work.
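
For illustration only, the "rolling up" of unrolled one-packet loops mentioned above might be sketched as follows.  This is a minimal sketch under stated assumptions, not code from any existing CIF library: the function name `roll_up`, the plain-dict representation of a parsed data block, and the example datanames are all hypothetical, and the dictionary information (which category a dataname belongs to, and which categories must be looped) is assumed to have been extracted already.

```python
from collections import defaultdict


def roll_up(items, category_of, list_categories):
    """Group unrolled key-value items into one-packet loops.

    items:           dict mapping dataname -> value (the unrolled pairs)
    category_of:     dict mapping dataname -> category name (hypothetical
                     lookup derived from the dictionary)
    list_categories: set of category names the dictionary says must be looped
    Returns (remaining, loops), where loops maps
    category -> (list of datanames, list holding one packet of values).
    """
    grouped = defaultdict(dict)
    remaining = {}
    for name, value in items.items():
        category = category_of.get(name)
        if category in list_categories:
            grouped[category][name] = value
        else:
            # Unknown dataname, or a category that need not be looped:
            # without dictionary information nothing can be rolled up.
            remaining[name] = value
    loops = {
        category: (list(pairs), [list(pairs.values())])
        for category, pairs in grouped.items()
    }
    return remaining, loops


# Hypothetical datanames, for illustration only.
items = {"_sample.id": "s1", "_sample.colour": "red", "_title.text": "demo"}
category_of = {
    "_sample.id": "sample",
    "_sample.colour": "sample",
    "_title.text": "title",
}
remaining, loops = roll_up(items, category_of, {"sample"})
```

Note that datanames missing from `category_of` are left as key-value pairs, which reflects the point made elsewhere in this thread: a program with no access to the dictionary cannot know which items belong together in a loop.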

On Fri, Apr 16, 2010 at 12:58 AM, John Westbrook <jwest@rcsb.rutgers.edu> wrote:
This thread appears to be spiraling in a direction that I find both
confusing and disturbing.  The suggestion that supporting
DDLm would require dramatic changes in existing DDL2 entries implies that
DDLm will require changes in the logical model underlying the existing
dictionaries.  This hardly seems reasonable particularly for DDL2 entries
and appears to me like a very large departure from prior discussions
on maintaining support for legacy data instances.

I very strongly discourage the group from moving in any direction that
does not provide support for existing archives of data files.

John

On 4/15/10 6:38 AM, Herbert J. Bernstein wrote:
> I would appreciate a clarification of intent for DDL1 and DDL2 data
> files in the transition to DDLm:
>
> 1. Please assume somebody has an existing data file conformant to the
> current COMCIFS-approved DDL1 dictionaries, esp. the core, what are the
> specific changes that will be required to those data files for them to
> be acceptable under the proposed new DDLm conformant dictionaries?
>
> 2. Please assume somebody has an existing data file conformant to the
> current COMCIFS-approved DDL2 dictionaries, esp. mmCIF and imgCIF, what
> are the specific changes that will be required to those data files for
> them to be acceptable under the proposed new DDLm conformant dictionaries?
>
> Answers to these two questions would help to quantify the "order of
> magnitude more work" we will have to do as per James' remark:
>
>> PDB mmCIF files are not an issue for DDLm *at all*, as the mmCIF data
>> files
>> are written with respect to the DDL2 specification (not DDLm).  If and
>> when
>> a DDLm version of mmCIF appears, conversion of legacy files will
>> involve an
>> order of magnitude more work than just rolling up unrolled loops, so the
>> outcome of the present discussion will be by comparison background noise.
>
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Thu, 15 Apr 2010, James Hester wrote:
>
>> Herbert, it seems to me that both of your issues are not relevant to this
>> discussion, in that they refer to situations for which DDLm is not used.
>> First, a clarification.  When I talk about a dictionary being
>> 'available' to
>> a program, I have in mind that it could be available at program
>> writing time
>> (i.e. available to the programmer) and/or at program running time.  I
>> hope
>> this corresponds with other people's usage.
>>
>> On Wed, Apr 14, 2010 at 9:23 PM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>> Inasmuch as we appear to be discussing, rather than voting,
>> please allow me to clarify my position:
>>
>> I am _not_ concerned about whether a DDLm-conformant dictionary
>> does or does not have rules to say that a particular category is
>> or is not allowed to be presented "unrolled".  I am concerned
>> with how to handle two important cases:
>>
>>  1.  Existing legacy data files that have "rolled" or "unrolled"
>> loops that do not conform to the new dictionary rules; and
>>
>>
>> Those legacy data files were written with a particular dictionary in
>> mind.
>> If that dictionary DDL allows loop unrolling (i.e. DDL2) then any
>> application that presumes to read datafiles based on that dictionary will
>> need to support it.  But what we are discussing is how to specify the
>> construction of data files written with respect to a DDLm (not DDL2)
>> based
>> dictionary.  So I don't see how your case (1) is relevant.
>>
>>  2.  Applications that are confronted with a data file, portions
>> of which are not in dictionaries to which that application has
>> access.
>>
>>
>> If an application has no access to the dictionary relevant to a given
>> dataname, it cannot be compelled to issue an error or warning when
>> confronted with an unrolled loop, because it has no way of knowing
>> that the
>> loop is unrolled.   In such a situation it would be bizarre to specify
>> any
>> dictionary-derived behaviour, and I am not proposing to do so.
>> Likewise, if
>> a CIF-writing application has no dictionary information about a dataname
>> that it is writing, we are unable to impose any dictionary-based
>> behaviour.   The latter is a fairly 'Alice in Wonderland' situation: a
>> program writing a dataname that neither it nor the programmer knows
>> anything about...
>>
>> If an application has a dictionary handy and that dictionary
>> says something relevant about the rolledness or unrolledness of
>> a loop, then I am reluctantly willing to accept the DDLm
>> specification requiring the issuance of a warning or an error
>> message.  Some application writers may decide not to do that,
>> but that is a different discussion.
>>
>> What I am concerned about is the very practical issues above --
>> of doing something useful with user data that either does not
>> conform to this stricture as presented in a dictionary or on
>> which the dictionaries available to the application are silent.
>>  I am proposing that, rather than requiring an application to
>> throw up its hands and die, we try to maximize the useful work
>> to be accomplished and try to do something sensible with the
>> data, i.e. roll that which is unrolled or unroll that which is
>> rolled, if it allows the work of the application to get done.
>>
>>
>> If an application has no access to a dictionary for the datanames, it
>> will
>> not be able to roll up an unrolled loop, as it won't know what datanames
>> should be in the loop.  So I would make a counter-suggestion that (in
>> order
>> to get useful work done), we can help this dictionary-challenged
>> program by
>> making sure all datafiles that it is presented with have their loop
>> structures left intact.
>>
>>
>> I have yet to hear of a reason not to adopt that approach for
>> the cases listed above.  Once we have those two cases settled, I
>> would be happy to discuss the subtleties of whether the List
>> attribute itself should be modified or not, but first, please,
>> let us deal with this practical issue.
>>
>> Regards,
>>    Herbert
>>
>> P.S. to Nick:  It is the current DDLm specification that would
>> require every application writer to read the dictionary in order
>> to process a CIF, else we would have no way to tell whether
>> rolled or unrolled presentation was in conformance with the
>> dictionary.  The list attribute is in the dictionary, not the
>> data file.  The discussion we are having is orthogonal to the
>> question of whether the DDLm specification requires the reading
>> of the dictionary.
>>
>>
>> I think this is the wrong way around.  *If* an application writer
>> wants to
>> see if a key-value dataitem should be instead in a loop, *then* they will
>> need to read the dictionary.  If they can do useful work without knowing
>> this information, then I'm not standing in their way.  A program which
>> claims to validate a data file *cannot* do the work it was designed for
>> unless it reads the dictionary, and must flag unrolled loops as a
>> violation
>> of the standard.  It may then offer to roll up the loops, to create a
>> conformant file.  What is the problem here?
>>
>>
>> P.S. to James:  I have read Nick's argument on the DDLm
>> specification
>> issue and stick to voting for 2.  If we change the specification
>> then
>> strict adherence will no longer require List categories to be
>> presented
>> as looped data, and no more or less dictionary reading will be
>> required
>> than is required by the current specification, but users will be
>> annoyed by one less warning/error message they are not likely to
>> understand or be able to do anything about.  However, no matter
>> how
>> that vote comes out, we really do need to deal with the
>> practical issue
>> above -- there are an awful lot of PDB mmCIF data files.
>>
>>
>> PDB mmCIF files are not an issue for DDLm *at all*, as the mmCIF data
>> files
>> are written with respect to the DDL2 specification (not DDLm).  If and
>> when
>> a DDLm version of mmCIF appears, conversion of legacy files will
>> involve an
>> order of magnitude more work than just rolling up unrolled loops, so the
>> outcome of the present discussion will be by comparison background noise.
>>
>> =====================================================
>>  Herbert J. Bernstein, Professor of Computer Science
>>   Dowling College, Kramer Science Center, KSC 121
>>        Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                 +1-631-244-3035
>>                 yaya@dowling.edu
>> =====================================================
>>
>> On Wed, 14 Apr 2010, Nick Spadaccini wrote:
>>
>>
>> This doesn't actually make things clearer or easier James.
>> I will repeat it again.
>>
>> Strict adherence to the formal specification of DDLm
>> REQUIRES List categories to be presented as looped data,
>> even
>> if there is only one row.
>>
>> If the IUCr wishes to universally adopt the case where
>> instances of List categories that contain only one row may
>> be presented as a Set category then it can do so as an
>> accepted extension. This of course requires every
>> application writer to necessarily read the dictionary to
>> establish if the data is really a Set category or
>> possibly a List category. Once the IUCr adopts this
>> extension to DDLm within its implementation, I would
>> assume
>> every application writer would be required to adhere to
>> it.
>>
>> On 14/04/10 3:29 PM, "James Hester"
>> <jamesrhester@gmail.com> wrote:
>>
>>      Dear all,
>>
>>      Both John and Herb have come out in favour of
>> allowing one-row loops to be unrolled.  Nick and I are
>>      both sceptical about the value of this idea.  We have
>> a few options:
>>
>>      1.  Disallow loop unrolling altogether (as in DDL1).
>>      2.  Allow loop unrolling for all DDLm dictionaries
>>      3.  Add a category-scope DDLm attribute stating that
>> one-row loops in this category and child
>>      categories may be unrolled.  If it appears in the
>> 'Head' category of the dictionary, it would mean
>>      that all categories in the dictionary could be
>> unrolled.
>>
>>      We have not discussed option 3: it basically means
>> deferring the decision on loop unrolling to the
>>      dictionary writers.  It also means that programmers
>> of generic CIF software will need to be prepared
>>      for either behaviour, so in that sense it is slightly
>> more burdensome than option 2.
>>
>>      Unless the silent majority would like to contribute
>> further thoughts on this matter, I suggest that we
>>      vote and move on.  I discern that the voting so far
>> would be:
>>
>>      Option 1: James, Nick
>>      Option 2: John, Herb
>>      Option 3: ?
>>
>>      (Some comments on John's post are inserted below).
>>
>>      James.
>>      On Thu, Apr 1, 2010 at 10:41 AM, John Westbrook
>> <jwest@rcsb.rutgers.edu> wrote:
>>            Hi all,
>>
>>            Coming in late on this in support of Herb's
>> position.
>>
>>            I  have never understood the necessity of
>> marking a category as
>>            a 'list' type in the dictionary in the early
>> CIF DDL,
>>            and in DDLm I find this even more confusing.
>> Given
>>            that DDLm supports a category key which
>> provides a
>>            well defined basis for each category, this
>> alone
>>            would seem to provide the appropriate
>> expression of
>>            cardinality.
>>
>>
>>      Absolutely agree, my objection is not to the loss of
>> some packet ordering information, this is
>>      explicitly excluded from the infoset produced by the
>> parser in any case.
>>
>>
>>            The choice of exporting a category with a
>> single row as
>>            a collection of keyword-value pairs or  using a
>> table
>>            format via a loop_  seems like a presentation
>> style
>>            matter rather than dictionary issue.
>>
>>
>>      It is more than a presentation issue, as you have
>> lost the information that those key-value pairs
>>      belong together, and so you need to refer to your
>> dictionary to reconstitute them as a group.  And if
>>      you allow the possibility of unrolling single-row
>> loops for all categories, then significant extra
>>      work is done to check, and if necessary, transform
>> the internal representation back to a canonical
>>      looped form.  This reconstruction of the canonical
>> form is highly desirable in a DDLm context, where
>>      we often wish to apply dREL operations to all packets
>> in a loop.
>>
>>
>>            As Herb has observed, the vast majority of DDL2
>> files opt
>>            for key-value output for any category with a
>> single row.
>>            I do not see what additional semantics are
>> conveyed by
>>            regulating the manner of data presentation in
>> these cases.
>>
>>
>>      See above - some semantic information is lost.
>>
>>
>>            John
>>
>>
>>
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>>
>>
>
>



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148