Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

Dear Colleagues,

When we went from CIF 1.0 to CIF 1.1, we all tried very hard to make as 
many CIF 1.0 files as possible remain valid CIF 1.1 files without the need 
for any changes.  When DDLm was introduced a promise was made to the 
community that is still on the IUCr web site in bold face:

"No changes are required in existing archival data files in order to apply 
domain dictionaries written in DDLm."

If we are now breaking that promise, which it appears we are about to do 
if we are not very, very careful, then I believe we have an ethical 
obligation to make that clear to the community and invite them into the 
discussion.

I have to run to get a proposal ready for submission now. I will respond 
more directly later today to James Hester's questions about the details of 
how this change impacts existing CIFs; in the meantime, please do take a 
look at what we said on

   http://www.iucr.org/resources/cif/ddl/ddlm

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Fri, 2 Oct 2009, James Hester wrote:

> Herbert writes:
>
> 	" Bottom line -- what is proposed is a very different language
> 	that will use a significantly different lexer and parser from
> 	the one used for DDL1 and DDL2 CIFs, guaranteeing to leave us
> 	with multiple dialects for a very long time.  I think that is
> 	a shame -- rather than DDLm consolidating DDL1 and DDL2 and
> 	adding useful new features, we are simply going to end up with
> 	DDL1, DDL2 and DDL3 as three distinct dialects.
>
> 	  I think this is unwise."
>
> In order not to confuse matters, let us restrict the use of the terms
> DDL1, DDL2 and DDL3 to dictionary definition languages, not the syntax
> variations we are currently discussing.  I believe Herbert has in mind
> CIF 1.0, 1.1 and 1.2. I would like to explore his concern about the
> difference in the proposed CIF 1.2 parser.  Some difference is
> inevitable in that we have added two new constructs, the triple quote
> delimited string and the bracketed list.  Because of this, a CIF 1.1
> parser will break on a CIF 1.2 file regardless of any changes to
> string content rules, so that is presumably not the main
> concern. Perhaps the concern is that a CIF 1.2 parser will not be able
> to parse all files built according to previous CIF syntax versions?
> But this is always going to be the case due to the (theoretical)
> possibility of a triple quote appearing as a value in a CIF 1.1 file,
> which would be read as a single quote character under CIF 1.1 rules,
> but as the opening of a triple-quoted string under CIF 1.2. Perhaps
> Herbert could expand on why this
> inability of a CIF 1.2 parser to parse a CIF 1.1 file is a problem.
>
> To take the DDL1/DDL2/DDL3 comment at face value, these are by design
> three distinct dictionary languages, with DDL3 taking the best of DDL1
> and 2.  I don't see why this is a shame.
>
> Herbert goes on to say:
>
> 	  Just to be clear, I do think the restriction on the character
> 	  set of non-delimited strings is unwise -- of all the changes
> 	  proposed, I believe that it is the one that invalidates the
> 	  largest number of existing CIFs, and serves no useful
> 	  purpose that could not be achieved by the simple exclusion
> 	  of specific cases, as we have already done.
>
> In what sense are existing CIFs 'invalidated'?  They are all still
> valid CIF 1.1 files, and CIF 1.1 is a published standard.  Perhaps
> Herbert or somebody could expand on what real-world problems the
> proposed change might cause?
>
> Finally, Herbert writes:
>
>  "I would also consider all the printable UTF-8 characters as valid."
>
> Herbert, could you please explain this proposal in more detail.  Do
> you mean that only the one-byte printable UTF-8 characters (= ASCII)
> are included?  Or do you mean that all of UTF-8 is included,
> i.e. characters may need up to 4 bytes to be represented?  If the
> latter, then are we proposing to accept all legal UTF-8 byte values,
> without using an intermediate representation?  Is this use of UTF-8
> restricted to delimited strings?
>
>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
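To make the triple-quote case in the quoted message concrete, here is a minimal Python sketch. It assumes the CIF 1.1 rule that a quote-delimited value ends only at a matching quote followed by whitespace or the end of the line, and it assumes that under the proposed CIF 1.2 syntax three quotes in a row open a triple-quoted string. The functions `cif11_quoted_value` and `starts_triple_quoted` are hypothetical illustrations, not part of any existing CIF parser, and neither is a complete lexer.

```python
def cif11_quoted_value(line: str, start: int) -> tuple[str, int]:
    """Read a quote-delimited value beginning at line[start].

    Assumed CIF 1.1 rule: the value ends at the first matching quote that
    is followed by whitespace or the end of the line; any other embedded
    quote is part of the value. Returns the value and the index just past
    the closing quote.
    """
    quote = line[start]
    if quote not in ("'", '"'):
        raise ValueError("value does not start with a quote")
    i = start + 1
    while i < len(line):
        at_end = i + 1 == len(line)
        if line[i] == quote and (at_end or line[i + 1].isspace()):
            return line[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted value")


def starts_triple_quoted(line: str, start: int) -> bool:
    """Assumed CIF 1.2 rule: three identical quotes open a triple-quoted string."""
    return line.startswith("'''", start) or line.startswith('"""', start)


line = "_example.value '''"
start = line.index("'")
value, _ = cif11_quoted_value(line, start)
print("CIF 1.1 reads:", repr(value))
print("CIF 1.2 opens a triple-quoted string:", starts_triple_quoted(line, start))

# The same CIF 1.1 rule is why 'it's' is a legal value: the inner quote
# is followed by 's', not whitespace, so it does not end the string.
print(cif11_quoted_value("_example.value 'it's'", 15)[0])
```

Under these assumptions the same three characters yield a one-character value under CIF 1.1 and the start of a longer string under CIF 1.2, which is the incompatibility described above.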

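On the UTF-8 question: the one-byte UTF-8 characters are exactly ASCII, and every other character takes two to four bytes. The short Python sketch below uses only the built-in `str.encode` to make that visible. The helper names `utf8_length` and `is_printable_ascii` are hypothetical, and the sample characters (including the Ångström sign, U+212B) are arbitrary choices for illustration, not taken from any CIF.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))


def is_printable_ascii(ch: str) -> bool:
    """True for printable one-byte characters, U+0020 (space) to U+007E."""
    return 0x20 <= ord(ch) <= 0x7E


# One sample per UTF-8 length class: ASCII, Latin-1, other BMP, outside BMP.
samples = ["A", "\u00c5", "\u212b", "\U0001d49c"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), "
          f"printable ASCII: {is_printable_ascii(ch)}")
```

Accepting "all printable UTF-8 characters" therefore means a parser can no longer assume one byte per character, whereas restricting to the one-byte range keeps the existing ASCII assumption.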