Discussion List Archives


Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

Herbert writes:

	" Bottom line -- what is proposed is a very different language
	that will use a significantly different lexer and parser from
	the one used for DDL1 and DDL2 CIFS, guaranteeing to leave us
	with multiple dialects for a very long time.  I think that is
	a shame -- rather than DDLm consolidating DDL1 and DDL2 and
	adding useful new features, we are simply going to end up with
	DDL1, DDL2 and DDL3 as three distinct dialects.

	  I think this is unwise."

In order not to confuse matters, let us restrict the use of the terms
DDL1, DDL2 and DDL3 to dictionary definition languages, not the syntax
variations we are currently discussing.  I believe Herbert has in mind
CIF 1.0, 1.1 and 1.2. I would like to explore his concern about the
difference in the proposed CIF 1.2 parser.  Some difference is
inevitable in that we have added two new constructs, the triple quote
delimited string and the bracketed list.  Because of this, a CIF 1.1
parser will break on a CIF 1.2 file regardless of any changes to
string content rules, so that is presumably not the main
concern.  Perhaps the concern is that a CIF 1.2 parser will not be able
to parse all files built according to previous CIF syntax versions?
But this is always going to be the case, owing to the (theoretical)
possibility of three consecutive quote characters appearing as a value
in a CIF 1.1 file.  Under CIF 1.1 rules that is a delimited string
containing a single quote character, but under CIF 1.2 it is the
beginning of a triple-quoted string.  Perhaps Herbert could expand on
why this inability of a CIF 1.2 parser to parse every CIF 1.1 file is
a problem.
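
To make the triple quote case concrete, here is a minimal Python
sketch.  It assumes the CIF 1.1 rule that a quote-delimited value ends
at the first matching quote followed by whitespace or end of input,
and it assumes the proposed CIF 1.2 rule is simply "three identical
quote characters at the start of a token open a triple-quoted string".
The function names are my own and are not taken from any existing
parser.

```python
def read_cif11_quoted(text):
    """Content of a quote-delimited value under (assumed) CIF 1.1 rules.

    The closing delimiter is the first matching quote character that is
    followed by whitespace or by the end of the input.
    """
    quote = text[0]
    if quote not in "'\"":
        raise ValueError("not a quote-delimited value")
    for i in range(1, len(text)):
        at_end = i + 1 == len(text)
        if text[i] == quote and (at_end or text[i + 1].isspace()):
            return text[1:i]
    raise ValueError("unterminated quoted string")


def opens_triple_quoted(text):
    """Assumed CIF 1.2 rule: three identical quotes open a long string."""
    return text[:3] in ("'''", '"""')


token = "'''"
# CIF 1.1 reading: a delimited string whose content is one quote character.
cif11_value = read_cif11_quoted(token)
# CIF 1.2 reading: the start of a triple-quoted string, not a complete value.
cif12_opens_string = opens_triple_quoted(token)
```

With these assumptions the same three characters give a complete
value under the first reading and an unterminated string under the
second, which is the incompatibility described above.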

To take the DDL1/DDL2/DDL3 comment at face value, these are by design
three distinct dictionary languages, with DDL3 taking the best of DDL1
and DDL2.  I don't see why this is a shame.

Herbert goes on to say:

	" Just to be clear, I do think the restriction on character
	  set of non-delimited strings is unwise -- of all the changes
	  proposed, I believe that it is the one that invalidates the
	  largest number of existing CIFS, and serves no useful
	  purpose that could not be achieved by the simple exclusion
	  of specific cases, as we have already done."

In what sense are existing CIFs 'invalidated'?  They all remain valid
CIF 1.1 files, and CIF 1.1 is a published standard.  Perhaps Herbert
or somebody else could expand on what real-world issues the proposed
change might cause?

Finally, Herbert writes:

  "I would also consider all the printable UTF-8 characters as valid."

Herbert, could you please explain this proposal in more detail?  Do
you mean that only the one-byte printable UTF-8 characters (= ASCII)
are included?  Or do you mean that all of UTF-8 is included,
i.e. characters that may need up to 4 bytes to be represented?  If the
latter, are we proposing to accept all legal UTF-8 byte sequences,
without using an intermediate representation?  Is this use of UTF-8
restricted to delimited strings?
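
To illustrate the distinction I am asking about, a short Python sketch
follows.  It only measures how many bytes a character occupies in
UTF-8 and checks whether it falls in the one-byte printable (ASCII)
subset.  The helper names are hypothetical, and nothing here is meant
as a proposed rule.

```python
def utf8_length(ch):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def is_one_byte_printable(ch):
    """True for printable ASCII, i.e. the one-byte printable subset."""
    return ch.isascii() and ch.isprintable()


samples = {
    "A": "A",                     # U+0041, ASCII letter
    "e-acute": "\u00e9",          # U+00E9
    "euro": "\u20ac",             # U+20AC
    "emoji": "\U0001F600",        # U+1F600, outside the BMP
}

# Byte length of each sample character in UTF-8.
lengths = {name: utf8_length(ch) for name, ch in samples.items()}
# Which samples a "one-byte printable only" reading would accept.
ascii_only = {name: is_one_byte_printable(ch) for name, ch in samples.items()}
```

Under the first reading only "A" would be accepted.  Under the second
reading all four would be, and a lexer would have to handle sequences
of two, three and four bytes.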


-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148