Re: [ddlm-group] Adding a DDLm attribute for uniqueness
- To: ddlm-group@iucr.org
- Subject: Re: [ddlm-group] Adding a DDLm attribute for uniqueness
- From: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
- Date: Mon, 10 Feb 2020 17:29:53 +0200
Dear DDLm maintainers,
thank you for allowing me to join the discussion. I will combine my answers to the two
previous posts in a single e-mail.
> The troubling part of this is "unique within a loop". The handling of
> relational keys is complex but clear, because categories are well-defined.
> The content of a loop beyond the relational model is not clear without much
> more information, especially for numeric data and unicode data, both of which
> come with major ambiguities in terms of uniqueness.
The proposed uniqueness constraint does not introduce any new ambiguities in
terms of value uniqueness. The '_category_key.name' data item already allows
data items of any type to be used and, as a result, already requires a validating
program to handle composite unique keys. In addition to that, in some cases even the
'_category.key_id' data item references items that allow Unicode values
(e.g. '_atom_site.label' in the ATOM_SITE category).
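To illustrate, a composite unique key check is no harder to implement than a single-item one. The sketch below is a minimal, hypothetical validator fragment (the loop representation and data values are illustrative, not a real CIF library API): a single-item key is simply the one-column case of the general composite check.

```python
# Minimal sketch: checking a composite unique key over the rows of a CIF loop.
# The loop is represented here as a dict mapping data names to value columns;
# both the representation and the example values are purely illustrative.

def find_duplicate_keys(loop, key_names):
    """Return the composite key tuples that occur more than once in the loop."""
    seen = set()
    duplicates = []
    columns = [loop[name] for name in key_names]
    for row in zip(*columns):  # one tuple of key values per loop row
        if row in seen:
            duplicates.append(row)
        else:
            seen.add(row)
    return duplicates

loop = {
    "_atom_site.label": ["C1", "C2", "C1"],
    "_atom_site.type_symbol": ["C", "C", "C"],
}
# Single-item key: the one-column case of the composite check.
print(find_duplicate_keys(loop, ["_atom_site.label"]))        # [('C1',)]
# Composite key over two columns:
print(find_duplicate_keys(loop, ["_atom_site.label",
                                 "_atom_site.type_symbol"]))  # [('C1', 'C')]
```

Unicode values pose no extra difficulty here either, as the comparison is plain value equality.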
> The situation gets even more confusing when trying to make a database from
> multiple entries. We add keys precisely to allow for duplication of existing
> keys. How will we handle these new pseudo-keys? I would suggest that any
> proposal be presented with a clear view of how we will handle databases
> without breaking the new proposed constraints
Each CIF data block can be viewed as a small relational database. In order
to store data from several such data blocks in a single database, one would
still need a column which maps values to their original data blocks. For
example, in order to store atom information from multiple data blocks,
the table would need a column that references the original data block,
i.e. an integer key which acts as a foreign key to the file/entry table.
If such a key indeed exists in the table, it can be combined with the unique
column(s) (the "pseudo-key") to produce a new unique key. This new unique key should
be used instead of the one defined in the dictionary when dealing with databases:
atom labels may not be unique across several data blocks, but the combination
of an atom label and a data block identifier still retains uniqueness.
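The scheme above can be sketched with an in-memory SQLite database (the table and column names below are illustrative, not drawn from any existing deposition schema): the same atom label may recur across data blocks, while the combined (entry, label) key still rejects a duplicate within one block.

```python
# Sketch (illustrative schema): storing atoms from several data blocks in one
# table. The atom label alone is not unique across blocks, but the pair
# (entry_id, label) is, so the database-side unique key combines the
# data-block foreign key with the dictionary-defined unique column.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, block_code TEXT);
    CREATE TABLE atom_site (
        entry_id INTEGER REFERENCES entry(id),
        label    TEXT,
        UNIQUE (entry_id, label)   -- the combined unique key
    );
""")
con.executemany("INSERT INTO entry VALUES (?, ?)",
                [(1, "block_a"), (2, "block_b")])

# The same atom label may appear in different data blocks...
con.executemany("INSERT INTO atom_site VALUES (?, ?)",
                [(1, "C1"), (2, "C1")])

# ...but repeating it within a single block violates the combined key:
try:
    con.execute("INSERT INTO atom_site VALUES (?, ?)", (1, "C1"))
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```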
> The proponent is aware of the currently available attributes for category keys.
> I believe this proposal is aimed at providing further checks in software for
> data names that are not category keys but are also supposed to be unique,
> the canonical example being symmetry operators. My objection is that expansion
> dictionaries can remove this uniqueness, e.g. listing magnetic symmetry operations
> as spatial symmetry operations + magnetic symmetry operations might involve
> repeating symmetry operations. We have developed an approach in DDLm to
> handle this for expanding category keys (the _audit.schema data name) but
> dealing with this for an independent uniqueness attribute seems to be a bit
> messy and I don't really see the benefit of that extra definitional work.
In general, the uniqueness constraint seems like a useful feature to have
when curating data or constructing ontologies. Most relational databases,
XML Schema and even the more recently defined JSON Schema all have equivalent
constraints. I fully understand the fear that people will not track the removal
of such constraints across dictionaries. However, there is also no guarantee
that people will honour the '_audit.schema' data item. Hopefully, as long as
there are well-behaved open implementations of a DDLm validator, they can be
used as a reference by other programmers dabbling in DDL/CIF.
> The other thing I've pointed out is that ad-hoc uniqueness checks can be
> coded in dREL and placed in a dictionary of data names to be used for
> validation.
dREL is a powerful tool, but in this case it introduces additional complexity and
does not really solve the underlying problem. The dREL methods can still
(probably?) be overridden in other dictionaries, and although a dREL method
delivers the desired final result, it does so in a less standardised
manner. Reading a fixed tag/keyword is much simpler than automatically
analysing actual code.
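To make the contrast concrete, a validator driven by a declarative attribute reduces to a table lookup plus one generic check. The attribute name '_type.unique' below is hypothetical (no such attribute currently exists in DDLm), and the parsed-dictionary representation is an assumed simplification; the point is only that no code analysis is involved.

```python
# Sketch: a declarative uniqueness attribute makes validation a fixed-attribute
# lookup plus a generic check. The attribute name '_type.unique' is
# hypothetical, used only to contrast with parsing arbitrary dREL method code.

# Parsed dictionary definitions: data name -> attribute map (illustrative).
definitions = {
    "_space_group_symop.operation_xyz": {"_type.unique": "Yes"},
    "_atom_site.occupancy": {},
}

def must_be_unique(data_name):
    """Answering the question requires only reading a fixed tag."""
    return definitions.get(data_name, {}).get("_type.unique") == "Yes"

def validate_column(data_name, values):
    """Return an error message on duplicate values, or None if valid."""
    if must_be_unique(data_name) and len(set(values)) != len(values):
        return f"{data_name}: duplicate values found"
    return None

print(validate_column("_space_group_symop.operation_xyz",
                      ["x,y,z", "-x,-y,-z", "x,y,z"]))
# -> _space_group_symop.operation_xyz: duplicate values found
```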
I understand that there are probably not that many IUCr-curated data items
that would actually benefit from an additional uniqueness constraint, so the
whole proposal may indeed seem somewhat excessive. However, my proposal
was more in the spirit of bringing the constraint set supported by DDLm
dictionaries closer to that of other popular schema/ontology formats and,
in doing so, making it more applicable in situations outside of the IUCr-curated
dictionaries.
Sincerely,
Antanas Vaitkus
--
Antanas Vaitkus,
PhD student at Vilnius University Institute of Biotechnology,
room V325, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group