Discussion List Archives


Re: [ddlm-group] Updated draft from subgroup discussing encodings

Dear all

The following is something that ought to be left until CIF3, but
perhaps there is no harm in introducing the concept here, in support of a
string concatenation mechanism (and partly because any feedback would be useful
to me in exploring ways to include 'richer' content in CIFs for publication purposes).

With the introduction of a string-concatenation mechanism, something I have been thinking about
for some time has become a real possibility (to my mind at least).

Basically, I've been looking for a mechanism to reference another data value in a data value without having to
make the reference using some sort of application-specific escape sequence within the value.

With a concatenation mechanism, something like the following becomes possible:


_publ_text
;
...Most of this text is written according to traditional cif text markup conventions...
However, here is an equation written in TeX...

; _ $_publ_object_value{"data_":"I" "_publ_object_id":"1"} _
;
...If it were included in TeX directly in this text field there would be no
way of knowing that the syntax was intended to be interpreted and processed as TeX...
;


In this example the publ_object is defined by a loop:

loop_
_publ_object_id
_publ_object_type
_publ_object_value
1 'tex' 'W^{1} = \matrix{1 & 0 & 0 & 0 \cr 0 & 1 & 0 & 0 \cr 0 & 0 & 1 & 0 \cr 0 & 0 & 0 & 1 \cr}'
...

The structure of the 'reference' could be seen as a 'query language' in this context -

$_publ_object_value reads 'VALUE OF _publ_object_value'

{"data_":"I" "_publ_object_id":"1"} reads "WHERE the data block id is 'I' and _publ_object_id is '1'"

Such queries could be nested when e.g. loops are related by key values.
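
A minimal Python sketch of how such a reference might be resolved follows. It is illustrative only: the names CIF, parse_reference and resolve are hypothetical, the in-memory layout (block id mapped to a list of loop rows) is an assumption, and returning every match as a list is just one possible answer to the open question of multiple matches.

```python
import re

# Hypothetical in-memory model: block id -> list of loop rows (assumption).
CIF = {
    "I": [
        {
            "_publ_object_id": "1",
            "_publ_object_type": "tex",
            "_publ_object_value": r"W^{1} = \matrix{1 & 0 & 0 & 0 \cr 0 & 1 & 0 & 0 "
                                  r"\cr 0 & 0 & 1 & 0 \cr 0 & 0 & 0 & 1 \cr}",
        },
    ]
}

# $<data name>{ "key":"value" "key":"value" ... }
REF = re.compile(r'\$(_[^\s{]+)\{([^}]*)\}')
PAIR = re.compile(r'"([^"]*)"\s*:\s*"([^"]*)"')


def parse_reference(text):
    """Split a reference into (target data name, dict of WHERE conditions)."""
    m = REF.fullmatch(text.strip())
    if m is None:
        raise ValueError("not a $-reference: %r" % text)
    return m.group(1), dict(PAIR.findall(m.group(2)))


def resolve(text, cif):
    """Return all values of the target item whose row satisfies the conditions."""
    target, conditions = parse_reference(text)
    conditions = dict(conditions)
    block_id = conditions.pop("data_")  # "data_" selects the data block
    return [
        row[target]
        for row in cif.get(block_id, [])
        if target in row and all(row.get(k) == v for k, v in conditions.items())
    ]


ref = '$_publ_object_value{"data_":"I" "_publ_object_id":"1"}'
print(resolve(ref, CIF))
```

Type checking or casting of the returned value, and nested queries, are deliberately left out of this sketch.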

This concept is based on using the $ and assumes that $ has no other syntactic use in this context
at this level (I suspect it may have use in dREL but haven't found reference to it in the docs I've seen - nor the DDLm docs)?

In addition, although it utilizes syntax 'structures' that will already be in CIF2 and builds on the concepts of dREL/DDLm,
it would still require its own 'chapter' in the specification and close scrutiny to ensure that it is a robust specification
(there are a number of issues to address, e.g. the value returned by the query must obviously be of the appropriate type
for the item that invoked the query, or castable; ... if more than one value matches the query, should it be returned using a list structure...etc.).
Furthermore, it is not the sort of thing that you would expect to be implemented 'by hand'.

However, without a concatenation mechanism, there would be no point in me exploring this at all
(part of my work for the IUCr involves such exploration).

Cheers

Simon


From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Saturday, 9 October, 2010 12:24:27
Subject: Re: [ddlm-group] Updated draft from subgroup discussing encodings

The change in the handling of an underscore in a tag name (requiring
at least one more character) is a good idea in any case (whether
or not the use for concatenation is adopted).  I suggest we put that
change to a vote promptly and separately.

As a matter of clean style, the use of whitespace around the underscore
is certainly a good idea for a compliant CIF2 writer.
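
As a rough illustration of the two points above, here is a small Python sketch. It assumes the whitespace-separated form of the proposal only ("string1" _ "string2"), treats a lone underscore as the operator, and treats an underscore followed by at least one character as a data name. The names TOKEN, is_data_name and evaluate are hypothetical, only the simple single and double quote marks are handled, and applying the 2048 limit to the whole name including the underscore is an assumption.

```python
import re

MAX_NAME_LENGTH = 2048  # limit quoted from the draft; scope is an assumption

TOKEN = re.compile(r"""
    \s*(?:
        (?P<dq>"[^"]*")   # double-quoted string
      | (?P<sq>'[^']*')   # single-quoted string
      | (?P<name>_\S+)    # data name: underscore plus one or more characters
      | (?P<op>_)         # lone underscore: concatenation operator
    )""", re.VERBOSE)


def is_data_name(token):
    """True if token is an underscore followed by at least one character."""
    return token.startswith("_") and 2 <= len(token) <= MAX_NAME_LENGTH


def evaluate(text):
    """Concatenate quoted strings joined by whitespace-separated underscores."""
    text = text.strip()
    pos, parts, want_string = 0, [], True
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unrecognised input at offset %d" % pos)
        pos = m.end()
        quoted = m.group("dq") or m.group("sq")
        if want_string and quoted:
            parts.append(quoted[1:-1])  # drop the surrounding quote marks
        elif not want_string and m.group("op"):
            pass  # operator between two strings
        else:
            raise ValueError("unexpected token %r" % m.group(0).strip())
        want_string = not want_string
    if want_string:
        raise ValueError("expression is empty or ends with an operator")
    return "".join(parts)


print(evaluate('"string1" _ "string2"'))
```

The unspaced form "string1"_"string2" is rejected by this sketch, because the text after the first string is read as a data name; that is the ambiguity the whitespace requirement avoids.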

=====================================================
Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Sat, 9 Oct 2010, SIMON WESTRIP wrote:

> I would prefer the first form only - i.e. treat the lonely underscore as a 'keyword'
> and thus require separation by whitespace. But this preference has more to do with
> fitting this operator element in the current 'classes' of cif elements.
>
> On a related point, the draft spec states:
>
> "A data name begins with an ASCII _ and may be followed by any number of
> characters within the 2048 character restriction."
>
> I think this should read:
>
> "A data name begins with an ASCII _  and is followed by one or more
> characters within the 2048 character restriction."
>
> Or words to that effect - especially if the underscore is adopted as an operator.
>
> Cheers
>
> Simon
>
>
> ____________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Friday, 8 October, 2010 23:38:10
> Subject: Re: [ddlm-group] Updated draft from subgroup discussing encodings
>
> I think it is.
>
> The current form of the proposal, as per your suggestion, is:
>
>
> "string1" _ "string2" or
> "string1"_"string2"
> etc.
>
> will represent the concatenation of string1 and string2
> for any quoted strings, string1 and string2 using any
> of the valid quote marks.
>
> The first form does not conflict with any valid cif2
> or cif1 construct unless we accept underscore by itself as
> a tag.  The second form does conflict with a cif1
> quoted string and therefore should not be used if there
> is any ambiguity as to whether the file in question
> is a cif1 or a cif2 file.
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Fri, 8 Oct 2010, SIMON WESTRIP wrote:
>
> > Dear all
> >
> > "Once we resolve the string concatenation operator issue..."
> >
> > Is this issue still on the table?
> >
> > Cheers
> >
> > Simon
> >
> > ____________________________________________________________________________
> > From: James Hester <jamesrhester@gmail.com>
> > To: ddlm-group <ddlm-group@iucr.org>
> > Sent: Tuesday, 5 October, 2010 23:52:08
> > Subject: [ddlm-group] Updated draft from subgroup discussing encodings
> >
> > Dear DDLm group,
> >
> > The encoding group that was split off this group and tasked with
> > developing a mutually satisfactory approach to encodings in CIF2 has
> > now produced an updated draft of the CIF2 'changes' document.  Brian
> > has posted this on the IUCr website at
> > http://www.iucr.org/__data/assets/pdf_file/0016/41911/cif2_syntax_changes_jrh20101005.pdf
> > The changes relative to the July draft are in section 2 describing the
> > character set, and some additional text in section 1.
> >
> > Once we resolve the string concatenation operator issue, I think we
> > are in good shape to take CIF2 to COMCIFS for approval.  I would once
> > again urge anybody with any outstanding issues regarding DDLm or dREL
> > to bring those issues up as soon as possible.
> >
> > James.
> > --
> > T +61 (02) 9717 9907
> > F +61 (02) 9717 3145
> > M +61 (04) 0249 4148
> > _______________________________________________
> > ddlm-group mailing list
> > ddlm-group@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >
> >
>
>
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
