Re: [ddlm-group] CIF-2 changes
- To: [email protected], Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] CIF-2 changes
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Tue, 17 Nov 2009 08:11:52 -0500 (EST)
- In-Reply-To: <C72863BD.12491%[email protected]>
- References: <C72863BD.12491%[email protected]>
Dear Colleagues,
We have now come full circle with no agreement on anything. I for one,
for the reasons outlined in many prior messages, do not think this
latest (=oldest) approach to be a good idea.
Clearly, if we are ever to resolve this, we need to get all the players
into a meeting at one time and work things out. I suspect we will not be
able to arrange a timely physical meeting. Perhaps some sort of an
emeeting (Ajax, Skype or somesuch) would work.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Tue, 17 Nov 2009, Nick Spadaccini wrote:
> David's Option 3 is the simplest way forward, and actually revisits much of
> what was discussed back in 2007-08. Somehow those discussions were locked
> far back in my brain, only to be awakened by David's summary. Thanks for
> that.
>
> So now I return to the STAR syntax. DDLm is part of STAR and hence
> restrictions on data names (so they can be parsed etc.) are a STAR issue. I am
> brought around to Joe's idea that STAR accepts any 8 bit character sequence
> since that is the most complete set, and that this will be restricted to
> UTF-8 within the CIF specification. Any other adoptee of STAR can choose
> whatever restricted encoding they wish.
>
> I still need to treat data names as programming identifiers within dREL so
> accordingly I propose we restrict the data names in STAR (and all variants)
> to be ASCII [A-Za-z0-9_.] as we have used in the sample dictionaries, DDLm
> and dREL.
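
[Editorial sketch] The proposed restriction on data names can be illustrated with a short check. This is a minimal sketch, not part of any agreed specification: it assumes a data name is a leading underscore followed by one or more characters from the set [A-Za-z0-9_.], and the function name is_valid_data_name is hypothetical.

```python
import re

# Assumption: a data name is "_" followed by one or more characters
# drawn from the proposed ASCII set [A-Za-z0-9_.].
DATA_NAME = re.compile(r"_[A-Za-z0-9_.]+")


def is_valid_data_name(token: str) -> bool:
    """Return True if the whole token fits the proposed restricted form."""
    return DATA_NAME.fullmatch(token) is not None
```

Under these assumptions a DDLm-style name such as _atom_site.label passes, while a CIF1-style name such as _name[1] does not.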
>
> The data values will be represented as discussed in previous threads and
> that the reverse solidus and the token delimiters discussed will be ASCII
> characters. We can now return to [] as the list delimiters, and {} as the
> associative array delimiters.
>
> Backward compatibility to CIF1 names is handled by exploiting the _alias
> attributes in the definition. A CIF2 parser with dictionary can handle
> everything. Any CIF1 parser can handle CIF1 data files (also CIF2 data files
> up to a point, but won't know what the data names mean, unless they have
> hardcoded it).
>
> A CIF2 parser would like a leading comment to tell it what sort of file it
> is parsing. In the absence of that comment, a pre-scan will need to be done.
> The telltale indicators it is a CIF1 data file are multiple occurrences of,
>
> (1) data names that potentially contain [] or /
> (2) unquoted strings with illegal characters
> (3) quoted strings that result in parse failure (typically because they must
> have an embedded [but not elided] quote character as allowed in CIF1).
>
> It needs to be a pre-scan because all 3 of the above in an identified CIF2
> data file would result in something quite different since there are coercion
> rules for when the whitespace separator is missing.
>
> For instance IF I KNOW it is a CIF2 file and I read
>
> _name[1]
>
> Then this can only be an error and I coerce into
>
> _name [1]
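
[Editorial sketch] The coercion described above, for a file already known to be CIF2, might look like the following. It is only an illustration under stated assumptions: the name pattern reuses the proposed set [A-Za-z0-9_.], only the case of a name directly followed by "[" is handled, and coerce_missing_whitespace is a hypothetical helper, not a function of any existing parser.

```python
import re

# Assumption: a data name directly followed by "[" (no whitespace) is a
# missing-separator error in a known CIF2 file.
_NAME_THEN_LIST = re.compile(r"^(_[A-Za-z0-9_.]+)(\[.*)$")


def coerce_missing_whitespace(token: str) -> str:
    """Insert the missing separator between a data name and a list."""
    match = _NAME_THEN_LIST.match(token)
    if match is None:
        return token  # nothing to coerce
    return f"{match.group(1)} {match.group(2)}"
```

Tokens that do not match the pattern are returned unchanged, so the helper is safe to apply to every token.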
>
> IF I DON'T KNOW the file type, the occurrence of _name[1] flags it as
> potentially a CIF1 file. If _name[1] is in an alias list, this reinforces
> the likelihood of CIF1. Multiple instances of these "errors" (or any others
> in the above list) indicate it is a CIF1 file (my only other conclusion
> would be it is a VERY BADLY written CIF2).
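
[Editorial sketch] The pre-scan idea can be sketched for indicator (1) only, data names containing [, ] or /. This is a deliberately simplified illustration: it treats any whitespace-delimited token starting with "_" as a data name, ignores quoted strings and multi-line text fields, and does not consult alias lists. The function names and the threshold of 2 (standing in for "multiple occurrences") are assumptions.

```python
import re

# Characters that the proposal would no longer allow in data names.
CIF1_ONLY_CHARS = re.compile(r"[\[\]/]")


def count_cif1_name_indicators(text: str) -> int:
    """Count data-name tokens that contain [, ] or /."""
    count = 0
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip whole-line comments
        for token in line.split():
            if token.startswith("_") and CIF1_ONLY_CHARS.search(token):
                count += 1
    return count


def looks_like_cif1(text: str, threshold: int = 2) -> bool:
    """Guess CIF1 when the indicator occurs at least `threshold` times."""
    return count_cif1_name_indicators(text) >= threshold
```

A fuller pre-scan would also weigh indicators (2) and (3), which need a real tokenizer for quoted and unquoted strings.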
>
> I think this takes us back to a very simple rule set, and I don't think the
> restriction in the character set for data names will cause problems. For all
> the excitement of UTF-8 etc. I know of programming languages that support
> reading and writing data in such encodings but I haven't seen one that
> allows/encourages one to write programmes declaring identifiers in UTF-8
> character sets. (They may well exist, I just haven't seen them.)
>
>
> On 17/11/09 12:04 AM, "David Brown" <[email protected]> wrote:
>
>> James,
>>
>> There seems to be a lull in the discussions on CIF2 syntax so this would be a
>> good time for you, or someone appointed by you, to summarize where we are at
>> and propose a set of rules that we can work with as we move forward. I
>> realize that much of the work I have already done on dictionaries will need to
>> be revisited, and Herbert also seems anxious to have some decisions on the
>> various topics that have been discussed.
>>
>> I believe we have a consensus on a number of points, but these need to be
>> written down clearly and need our formal agreement so we can move ahead.
>>
>> David
>>
>>
>> _______________________________________________
>> ddlm-group mailing list
>> [email protected]
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia t: +61 (0)8 6488 3452
> 35 Stirling Highway f: +61 (0)8 6488 1089
> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
> MBDP M002
>
> CRICOS Provider Code: 00126G
>
> e: [email protected]
>
>
>
>