
Re: [ddlm-group] Data-name character restrictions - one last time

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Data-name character restrictions - one last time
From: Joe Krahn <krahn@niehs.nih.gov>
Date: Wed, 09 Dec 2009 14:00:11 -0500
In-Reply-To: <a06240801c74578ec8b59@[192.168.2.104]>
References: <20091209144035.GB29341@emerald.iucr.org><a06240801c74578ec8b59@[192.168.2.104]>

In practice, CIF2 parsers should accept CIF1 data names within a
CIF2-formatted file. The question is whether such files should count as
valid CIF2, or merely be tolerated for convenience as non-standard CIF2.

When CIF files are used as working data files, the restrictions should 
be relaxed. For long-term archival files, it makes sense to be more 
restrictive. I would just make the CIF1 names inaccessible to dREL. 
Alternatively, an implementation could allow CIF1 names only on reading, 
and require dictionary alias mappings to CIF2 names.
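A minimal Python sketch of the read-only alias approach might look like
the following. The alias table, the CIF2 target name in it and the
restricted character pattern are all assumptions for illustration. The
group has not fixed the final character set, and a real implementation
would harvest the aliases from the DDLm dictionary rather than hard-code
them.

```python
import re

# ASSUMPTION: the restricted CIF2 data-name character set is taken to be
# letters, digits, underscore and period after the leading underscore.
# The final rule is still under discussion; adjust this pattern to match it.
CIF2_NAME = re.compile(r"_[A-Za-z0-9_.]+")

# Hypothetical alias table (lower-cased CIF1 name -> CIF2 name). The
# target name here is illustrative only.
ALIASES = {
    "_refine_ls_shift/esd_max": "_refine_ls.shift_over_su_max",
}


def read_name(name: str, aliases: dict[str, str]) -> str:
    """Map a data name read from a file to the name used internally.

    CIF data names are case-insensitive, so aliases are looked up in
    lower case.
    """
    key = name.lower()
    if key in aliases:
        return aliases[key]   # CIF1 name with a dictionary alias
    if CIF2_NAME.fullmatch(name):
        return name           # already satisfies the restricted set
    # Accepted on reading only if an alias exists; otherwise reject.
    raise ValueError(f"no CIF2 alias for data name {name!r}")
```

With this table, `_refine_ls_shift/esd_max` is read as its CIF2 alias,
an already-valid name passes through unchanged, and an unmapped name
containing `/` is rejected.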

One argument in favor of allowing them is that someone may want to
convert all their data files to CIF2 format while preserving the
original data as-is, without alias mapping.
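Keeping such names as-is need not block safe processing. The round trip
Brian describes in the quoted message below (ingest a name with
dangerous characters, process it under a safe identifier, restore the
original on output) can be done with a two-way table. The following is a
rough Python sketch. It assumes a hypothetical `NameRegistry` class and
treats any character other than letters, digits and underscore as
unsafe. That choice is illustrative, not an agreed rule.

```python
import re

# ASSUMPTION: "safe" identifiers use only letters, digits and underscore,
# so "/" (as in _refine_ls_shift/esd_max) and "." are both replaced.
UNSAFE = re.compile(r"[^A-Za-z0-9_]")


class NameRegistry:
    """Hypothetical round-trip map between data names and safe identifiers."""

    def __init__(self) -> None:
        self._to_safe: dict[str, str] = {}
        self._to_orig: dict[str, str] = {}

    def safe(self, name: str) -> str:
        """Return a stable safe identifier for `name`, creating one if needed."""
        if name in self._to_safe:
            return self._to_safe[name]
        base = UNSAFE.sub("_", name)
        candidate, n = base, 1
        # Two names can collapse to the same base (e.g. _a/b and _a_b),
        # so append a counter until the identifier is unused.
        while candidate in self._to_orig:
            n += 1
            candidate = f"{base}_{n}"
        self._to_safe[name] = candidate
        self._to_orig[candidate] = name
        return candidate

    def original(self, safe_id: str) -> str:
        """Restore the original data name on output."""
        return self._to_orig[safe_id]
```

Because every safe identifier is recorded in both directions, output can
always restore the exact original spelling, even when two names would
otherwise map to the same identifier.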

I think the current CIF2 syntax makes it possible to use CIF1 names
without ambiguity. The question is whether they should be considered
valid CIF2, or just a non-standard variant that will be useful during
the transitional period.

Joe


Herbert J. Bernstein wrote:
> Personally, I would greatly prefer to allow all data names that do not
> create a major lexer/parser conflict to appear in a data CIF and
> only apply the strong restrictions to data names that appear in CIF2
> dictionaries as defined data names (not as aliases).  -- Herbert
> 
> 
> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>> I have one remaining niggle that I'd like to revisit before we put
>> this finally to bed. As has been mentioned a couple of times
>> recently, restricting the data-name character set does invalidate
>> syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max ).
>> We have discussed strategies for handling this, and I think these
>> are workable strategies, but will involve investment and hence expense
>> in workflow management in CIF archives.
>>
>> I understand the rationale behind this restriction is to simplify
>> future processing of data names in areas such as dREL
>> applications. The question really is whether we're choosing the right
>> trade-off in making things cleaner at that end of the processing
>> chain. I would suppose that a dREL or other application could ingest a
>> data name with dangerous characters, convert it internally into a
>> "safe" identifier that's used for all processing, and then restore the
>> original form upon output; but writing that intermediate layer of
>> processing is of course expensive (especially if there aren't readily
>> available libraries that will do this transparently).
>>
>> I suspect that some of the original proposed syntactic changes also
>> had the effect (whether by design or collaterally) of simplifying i/o,
>> data structure management, symbol table processing etc., but those may
>> have suffered in the subsequent revision exercise we've just been
>> practising. Given the consensus we are now approaching, would the code
>> builders now be prepared to incur the additional expense of handling
>> "dangerous" data names?
>>
>> I really don't want to spark off a long discussion on this - if a
>> quick round of response shows that there's no appetite to allow
>> the additional punctuation characters in data names, I'll accept that
>> gracefully.
>>
>> ***
>>
>> One last comment while I have the floor, though it is related in part
>> to the above question. A concern raised in the editorial office was
>> that there would be circumstances where users didn't know if they were
>> dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>> to the vi editor - and we're imagining most of them are dealing with
>> small-molecule/inorganic CIFs). My supposition is that the IUCr
>> editorial offices would only want to use CIF2 seriously in association
>> with DDLm dictionaries, and that we would expect the revised core
>> dictionaries to use the dot component in data names to signal this
>> further evolution. So even a superficial glimpse of the middle of a
>> CIF would make it clear whether it was CIF1 or CIF2.
>>
>> Does that fit in with how others see this progressing?
>>
>> Cheers
>> Brian
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 

