Re: Powder CIF Proposals

  • Subject: Re: Powder CIF Proposals
  • From: "ROBIN SHIRLEY (USER)" <R.Shirley@xxxxxxxxxxxx>
  • Date: Fri, 20 Oct 2000 13:48:50 +0100 (BST)
Apologies for these delayed responses to Nick's comments of 29 Sept:

1) 00-2-11.2) _pd_index_appendix

> > The sort of indexing history envisaged in my original proposal can
> > now be captured and updated automatically in the form of the
> > Crysfire logfile for that dataset - an example is attached.

> My main concern with these proposals is that I would like to see
> the dictionary definitions for these data items. Until then I
> accept that all of these proposed items are reasonable but I would
> like to know how I would parse their contents. For instance how
> will an indexed history be represented and parsed?

The intention is that this should essentially provide an opportunity
to record a human-readable summary of the indexing history of the
current dataset.  Thus, even if a Crysfire logfile now forms a good
basis for this section, it should remain undefined free text,
between lines containing semi-colons in col 1, e.g.

_pd_index_appendix
;
Any human- or program-generated text that summarises the indexing 
history of this dataset.
;
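
Since the question was how such a field would be parsed, here is a
minimal Python sketch of pulling the free text out again. It assumes
only the convention described above (the tag on its own line, then
text between lines with a semi-colon in col 1). The function name
read_text_field is hypothetical, and this is not a full CIF parser.

```python
def read_text_field(cif_text, tag):
    """Return the free text stored under `tag`, or None if it is absent."""
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        # Look for the tag alone on a line, followed by an opening ';' in col 1.
        if line.strip() == tag and i + 1 < len(lines) and lines[i + 1].startswith(";"):
            body = [lines[i + 1][1:]]  # text may follow the opening ';'
            for text_line in lines[i + 2:]:
                if text_line.startswith(";"):
                    # Closing delimiter reached: drop an empty first line.
                    if body[0].strip() == "":
                        body = body[1:]
                    return "\n".join(body)
                body.append(text_line)
    return None


example = """_pd_index_appendix
;
Any human- or program-generated text that summarises the indexing
history of this dataset.
;
"""

appendix = read_text_field(example, "_pd_index_appendix")
```

The content comes back as an uninterpreted string, which is all that
an undefined free-text item can promise.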

2) _pd_index_merit

> > Thus this would become:
> >    _pd_index_merit M FOM program
> > 
> >    (e.g. _pd_index_merit 21.7 M20 ITO12,
> >    or _pd_index_merit 54.215 M1 CRYS934h).

> This is syntactically incorrect, and it begs the question are
> these 3 components of one object (a list) or 3 separate objects?

Thanks for pointing out the incorrect syntax, which slipped in while
the concept was developing in response to people's suggestions. 

My original intention was that this should simply be a piece of 
quoted text:

e.g. _pd_index_merit 'M20 = 21.7 (ITO12)'

But in response to people's feedback this became elaborated into a 
sequence of three separate items (which could be looped if 
necessary):

  M (the numerical value)

  FOM (the generic type, left as quoted text)

  "program", or as I now prefer, "source" (another piece of quoted  
  text, which for example summarises the program version or other  
  source of the specific algorithm used)

e.g.

_pd_index_merit_M          21.7
_pd_index_merit_FOM        'M20'
_pd_index_merit_source     'ITO12'
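
As a rough illustration of how the three-item form could be read, the
Python sketch below collects the items from simple one-line tag/value
pairs. It is an assumption-laden sketch rather than a CIF parser: the
function name read_merit is hypothetical, shlex is used only as a
convenient stand-in for CIF quoting rules (which differ in detail),
and looped items are not handled.

```python
import shlex


def read_merit(cif_text):
    """Collect _pd_index_merit_* items from simple one-line tag/value pairs."""
    prefix = "_pd_index_merit_"
    merit = {}
    for line in cif_text.splitlines():
        parts = shlex.split(line)  # strips the quotes around 'M20', 'ITO12'
        if len(parts) == 2 and parts[0].startswith(prefix):
            merit[parts[0][len(prefix):]] = parts[1]
    if "M" in merit:
        merit["M"] = float(merit["M"])  # M is the only numerical item
    return merit


example = """
_pd_index_merit_M          21.7
_pd_index_merit_FOM        'M20'
_pd_index_merit_source     'ITO12'
"""

merit = read_merit(example)
```

Only M is converted to a number; FOM and source stay as text.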

The reason for leaving the FOM and source items as quoted text is
that I see no early prospect of standardising them, and have doubts
whether such a restriction would actually be desirable.

Some possible FOM terms are still rather fluid (especially the most
popular, M20, which was originally defined in a way that left to the
judgement of the implementor what was meant by an "indexed" line,
and has since been subject to various reinterpretations and
extensions). There are also newer FOM contenders (e.g. FN, M1, PM)
which outperform M20 for particular purposes.  Thus I'm not sure that
we are ready to try to compile a list of standardised FOM
definitions.  A way round this is to leave them as text.

This argument applies more strongly in the case of "source", which
could then be left open to whatever elaboration might seem helpful, 
such as the addition of a reference or a URL.

> I also have a general comment concerning worries of the potential
> size of data files and looped items. I think many are increasingly
> coming around to the idea that it is important to retain
> "primitive" (read non-derivable) data as much as possible. If these
> trial cells are "relevant" to the discipline then there should be a
> mechanism for retaining such information.

I have no particular position on this issue, except to point out
that one should perhaps not rely too boldly on the increasing 
storage capacity of modern computers, since such lists could easily 
become very large, and in the case of indexing most of their bulk 
would refer to low-probability hypotheses.

This is why I tend to favour keeping relatively concise summary logs
in a section such as _pd_index_appendix rather than retaining more
bulky looped lists.

Best wishes

Robin Shirley

-------------------------------------------------------

Date:          Fri, 29 Sep 2000 08:25:33 +0100 (BST)
From:          Nick Spadaccini <nick@cs.uwa.edu.au>
Subject:       Re: Powder CIF Proposals

On Thu, 28 Sep 2000, ROBIN SHIRLEY (USER) wrote:

> 00-2-11.1) _pd_proc_quadr_Q  (or _pd_index_quad_Q - see discussion
> below)
> 
> I accept that if this could be derived directly from
> _pd_peak_d_spacing, then the case for including it would be weak,

The fact that a quantity may be directly derivable from another is NOT an
argument for its exclusion. Such an argument would (strictly) see
structure coordinates (as an extreme example) not defined since these are
derivable from intensity measurements.

The STAR developers have spent the last three years working on the
definition and semantics of the method attributes supported by STAR and by
inference CIF. This is the mechanism by which the exact relationships
between data items may be specified (algorithmically). Hence in our
prototype the dictionary (which is the MOST important component of the
STAR and CIF systems) is literally compiled into a suite of Java and
Python objects. A request for a data item results in the value if stored
in the data file or an invocation of the objects which will eventually
result in a value by evaluation. 

My point here is that the fact that some quantity is derivable from
another is an important INCLUSION to be made into the dictionary rather
than a reason to exclude it.
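
The stored-or-derived lookup described above can be sketched in a few
lines of Python. This is only an illustration of the idea, not the
STAR prototype: the names get_item and methods are hypothetical, and
the relation Q = 10^4 / d^2 is assumed here as the usual powder
indexing convention rather than taken from any dictionary definition.

```python
def get_item(data, tag, methods):
    """Return the stored value for `tag`, else evaluate its derivation method."""
    if tag in data:
        return data[tag]  # value is present in the data file
    if tag in methods:
        return methods[tag](data, methods)  # derive it from other items
    raise KeyError(tag)


# Assumed relation (not from a dictionary definition): Q = 10^4 / d^2.
methods = {
    "_pd_proc_quadr_Q": lambda data, methods: (
        1.0e4 / get_item(data, "_pd_peak_d_spacing", methods) ** 2
    ),
}

data = {"_pd_peak_d_spacing": 5.0}  # d in angstroms, made-up value
q = get_item(data, "_pd_proc_quadr_Q", methods)
```

A value stored in the data file takes precedence, and the method is
evaluated only when the item is missing.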

> 00-2-11.2) _pd_index_appendix

> The sort of indexing history envisaged in my original proposal can
> now be captured and updated automatically in the form of the Crysfire
> logfile for that dataset - an example is attached.

My main concern with these proposals is that I would like to see the
dictionary definitions for these data items. Until then I accept that all
of these proposed items are reasonable but I would like to know how I
would parse their contents. For instance how will an indexed history be
represented and parsed?

> Thus this would become:
>    _pd_index_merit M FOM program
> 
>    (e.g. _pd_index_merit 21.7 M20 ITO12,
>    or _pd_index_merit 54.215 M1 CRYS934h).

This is syntactically incorrect, and it begs the question are these 3
components of one object (a list) or 3 separate objects?

I also have a general comment concerning worries of the potential size of
data files and looped items. I think many are increasingly coming around
to the idea that it is important to retain "primitive" (read
non-derivable) data as much as possible. If these trial cells are
"relevant" to the discipline then there should be a mechanism for
retaining such information.

cheers

Nick

--------------------------------
Dr Nick Spadaccini
Department of Computer Science              voice: +(61 8) 9380 3452
University of Western Australia               fax: +(61 8) 9380 1089
Nedlands, Perth,  WA  6907                 email: nick@cs.uwa.edu.au
AUSTRALIA                        web: http://www.cs.uwa.edu.au/~nick




