Discussion List Archives


Re: [ddlm-group] Minor change to DateTime DDLm type

Looking back through the revision history on the mmcif side, it appears that the flex date was added in 2007 - but had only been used for _diffrn_detector.pdbx_collection_date.

If I am reading the flex regex correctly:

'[0-9][0-9][0-9][0-9](-[0-9]?[0-9])?(-[0-9][0-9])?(:[0-9]?[0-9]:[0-9][0-9])?'

Yes - the timestamp is optional -- but so are the month and day.
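A quick way to check that reading is to run the pattern against a few values. The following is only a sketch: the pattern is copied from the message above, while the helper name `matches_flex` and the sample values are made up for illustration.

```python
import re

# The flex pattern exactly as quoted in the message above.
FLEX = re.compile(
    r"[0-9][0-9][0-9][0-9](-[0-9]?[0-9])?(-[0-9][0-9])?(:[0-9]?[0-9]:[0-9][0-9])?"
)


def matches_flex(value: str) -> bool:
    """Return True if the whole string matches the quoted flex pattern."""
    return FLEX.fullmatch(value) is not None


# Illustrative values only; none of these come from a real deposition.
samples = [
    "2007",             # year only
    "2007-5",           # year and one-digit month
    "2007-05-14",       # full date
    "2007-05-14:9:30",  # full date with a one-digit hour
    "2007:09:30",       # year followed directly by a time
    "07-05-14",         # two-digit year
]
for sample in samples:
    print(sample, matches_flex(sample))
```

As quoted, the pattern accepts a bare year and even a year followed directly by a time, because the month, day and time groups are each independently optional. It does require a four-digit year.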

Why is this?  My theory is that when the time from data collection to structure solution was long, depositors might not remember precisely when they collected their data.  Maybe the month, but not the day. I suspect that this was added to provide flexibility. The legacy PDB format requires the date of data collection in DD-MMM-YY - so some translation is involved.

By the time I started working at RCSB PDB in 2010, the processing procedures indicated that if depositors were unsure, they were advised to "guess" - and fill in the middle of the month. Besides, when you were collecting data on a crystal for a week - what is meant by collection date - the start or the end date?

I will need to see if this was a short-lived collection approach.  By the time the OneDep deposition system was implemented in 2014, we were restricting input to require the month and day -- but no timestamp.

So - I would suggest that you should not put too much stock in the flex date.  It appears to have been introduced for a very specific purpose.  The yyyy-mm-dd:hh:mm and  yyyy-mm-dd formats are used throughout.  


On 5/8/25 10:55 PM, James H wrote:
Thanks, Herbert, for alerting us to the fact that time types may be diverging between mmcif and core cif. We do indeed aim to keep meanings identical between DDL2 and DDLm.

There is only one dataname shared between mmcif and cif_core that specifies a time - that is "_audit_creation.date", which is a plain date in mmcif and a DateTime in cif_core. To be strictly compatible, the DDLm definition should specify a Date type. Apart from this we have no divergence in data names as far as dates/times are concerned (fortunately).

However, thinking about the DDLm and DDL2 approaches more generally:

mmcif/pdbx version 5 (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Data/index.html) defines two time types, "yyyy-mm-dd:hh:mm" and "yyyy-mm-dd:hh:mm-flex". The regexes for these differ from the text extract that Herbert posted:

(1) No 'T' character between the date and time is allowed in mmcif
(2) No seconds or timezone are allowed in mmcif
(3) Years may be two-digit, months may be one digit in mmcif

So we have four timestamp types in the broader CIF universe:
(1) ISO8601   (DDLm DateTime)
(2) "Flexible" ISO8601 (corresponding to the IUCr text Herbert posted)
(3) mmcif hh:mm
(4) mmcif hh:mm-flex

As all of these have been around for a while and so are essentially impossible to remove, I suggest we simply define a new DateTime type in DDLm, called for example "DateTimeFlex", which matches the IUCr text given above. The translation between DDLm DateTimeFlex values and mmcif/pdbx time values is then one-to-one so that round-trip translation is possible, and the original issue is also solved. I'd also be keen to deprecate DateTime in favour of DateTimeFlex.
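The one-to-one translation can be sketched for the simplest case. Everything here is an assumption made for illustration: the function names are hypothetical, and both patterns are restricted to four-digit years and zero-padded fields, which is narrower than what the mmcif dictionary accepts.

```python
import re

# Assumed strict form of an mmcif "yyyy-mm-dd:hh:mm" value (time optional).
MMCIF_STRICT = re.compile(r"(\d{4}-\d{2}-\d{2})(?::(\d{2}:\d{2}))?")
# Assumed IUCr-style counterpart, truncated to minutes (time optional).
IUCR_MINUTES = re.compile(r"(\d{4}-\d{2}-\d{2})(?:T(\d{2}:\d{2}))?")


def mmcif_to_iucr(value: str) -> str:
    """Swap the ':' date/time separator for 'T'."""
    m = MMCIF_STRICT.fullmatch(value)
    if m is None:
        raise ValueError(f"not handled by this sketch: {value!r}")
    date, time = m.groups()
    return date if time is None else f"{date}T{time}"


def iucr_to_mmcif(value: str) -> str:
    """Swap 'T' for ':'; values with seconds or a timezone are rejected."""
    m = IUCR_MINUTES.fullmatch(value)
    if m is None:
        raise ValueError(f"no mmcif equivalent in this sketch: {value!r}")
    date, time = m.groups()
    return date if time is None else f"{date}:{time}"


print(mmcif_to_iucr("2014-03-09:13:55"))
print(iucr_to_mmcif(mmcif_to_iucr("2014-03-09:13:55")))
```

In this sketch the round trip is lossless only within that restricted subset. A value carrying seconds or a timezone offset has no mmcif counterpart here and raises `ValueError`, which is differences (1) and (2) above in practice.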

Thoughts?

On Fri, 9 May 2025 at 03:50, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
The DDL2 definition in the imgCIF dictionary is directly tied to that link.  If the DDLm type definition
is not tied to it, it should be.  The intention was for the DDLm and DDL2 versions of CBF to be
interoperable, and having a different meaning for datetime would seriously prevent such interoperability.
If we are going to have a meaningful standard, all the regex specifications used for the DDL2 CBF
specification should take precedence over whatever minor variants DDLm tried to create in general,
and absolutely in this case, since it has a direct impact on radiation damage calculations.

I recommend formal adoption of the 2007 definition as the current definition of datetime for all versions
of CIF.  It is a good, workable definition for crystallography.

On Thu, May 8, 2025 at 12:45 PM Antanas Vaitkus <antanas.vaitkus90@gmail.com> wrote:
Dear Herbert,

I was indeed not talking about the crystallographic data collection.
That is why I suggested not to change the existing datetime format,
but rather introduce a new one.

Thank you for the reference (https://www.iucr.org/resources/cif/spec/ancillary/datetime)
it is similar to what I would need except that:
- This description is not formally tied to a specific DDLm content type therefore
  there is no reliable automatic way to determine to which items these restrictions
  should apply. Currently, DDLm defines two similar time related content types:
  `date` which is an ISO date of the form YYYY-MM-DD, and `DateTime` which is either
  a YYYY-MM-DD date or a timestamp in the form defined by the full-date productions
  of RFC 3339 ABNF (which must always include the seconds, e.g. 2025-05-08T17:38:12+03:00).
  What I would like is to have a third date/time related DDLm content type (or relax one of
  the existing ones) with the semantics most similar to the ones described in the IUCr page
  that you linked that allow partial time. Having it formalised as a distinct content type would
  allow CIF validators that dynamically interact with dictionaries to check such values automatically
  based on the data item dictionary definitions.
- The date/time format convention that you referenced does not allow omitting
   the seconds while still retaining the timezone offset, e.g. 2025-05-08T17:38+03:00
   is not valid.
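The kind of dictionary-driven check described in the first point might look like the sketch below. The patterns are simplified assumptions and not the actual DDLm definitions, and `PartialDateTime` is a hypothetical name for the requested third content type.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"
OFFSET = r"(?:Z|[+-]\d{2}:\d{2})"

# Simplified stand-ins for the content types discussed above.
# "PartialDateTime" is a hypothetical name, not an existing DDLm type.
CONTENT_TYPES = {
    "Date": re.compile(DATE),
    # A plain date, or a timestamp that must include seconds and an offset.
    "DateTime": re.compile(
        rf"{DATE}(?:T\d{{2}}:\d{{2}}:\d{{2}}(?:\.\d+)?{OFFSET})?"
    ),
    # Seconds and offset are each optional, independently of one another.
    "PartialDateTime": re.compile(
        rf"{DATE}(?:T\d{{2}}:\d{{2}}(?::\d{{2}}(?:\.\d+)?)?(?:{OFFSET})?)?"
    ),
}


def is_valid(content_type: str, value: str) -> bool:
    """Check a value against the pattern registered for its content type."""
    return CONTENT_TYPES[content_type].fullmatch(value) is not None


for ctype in CONTENT_TYPES:
    print(ctype, is_valid(ctype, "2025-05-08T17:38+03:00"))
```

A validator that reads the content type from the dictionary definition of each data item could then pick the pattern automatically, without per-item special cases.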

Sincerely,
Antanas

On Thu, 8 May 2025 at 18:59, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
I am lost.  I thought we were talking about the specific CIF data items in a crystallographic data collection
marking the date/time at which each particular data frame was collected, which when we last defined it
was

CIF Date and Time

Many CIF data items take as value a date or a date and time (e.g. _audit_creation_date) or may include a date/time string as part of their expected content (e.g. _audit_update_record). The convention for expressing a date/time string is as follows, and is consistent with the ISO standard ISO 8601:1988(E). A unique instant in time may be defined by concatenating
  • a date string in the format YYYY-MM-DD, where YYYY represents the year number in the Occidental Gregorian calendar, MM is the (zero-padded) month number, and DD is the (zero-padded) day number
and optionally
  • the character "T" followed by a time in the 24-hour clock format hh:mm:ss, where hh, mm and ss are respectively the hour, minute and second, zero-padded as necessary
  • a plus or minus character, corresponding to time zone offsets respectively east and west of Greenwich, followed by the offset value in the format hh:mm (representing hours and minutes difference from Coordinated Universal Time)
Depending on the required precision of the date/time, the full string may be truncated from the right as appropriate.

Examples

1997-08-12T13:55:58-05:00
Four minutes and two seconds before two o'clock on the afternoon of 12 August 1997, at the latitude of Hamilton, Ontario (corresponding to supper time at Greenwich).
1997-08-12T13:55:58+05:45
Four minutes and two seconds before two o'clock on the afternoon of 12 August 1997, at the latitude of Kathmandu, Nepal
1997-08-12T13:55:58
Four minutes and two seconds before two o'clock on the afternoon of 12 August 1997, local time
1997-08-12T13:55
Five minutes to two, afternoon of 12 August 1997
1997-08-12
12 August 1997
Updated 12 August 1997

Copyright © 1997 International Union of Crystallography

IUCr Webmaster
========================================================================================================
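One possible reading of the "truncated from the right" rule in the text above can be written as a nested pattern. This is a sketch under assumptions: it takes the truncation points to be the ones shown in the examples (timezone, then seconds, then the whole time), and it does not model truncation of the date itself.

```python
import re

# Each optional group nests inside the previous one, so a component can only
# be present if everything to its left is present too.
IUCR_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}"      # YYYY-MM-DD
    r"(?:T\d{2}:\d{2}"        # Thh:mm
    r"(?::\d{2}"              # :ss
    r"(?:[+-]\d{2}:\d{2})?"   # timezone offset, only after seconds
    r")?)?"
)

# The five example strings from the quoted page.
examples = [
    "1997-08-12T13:55:58-05:00",
    "1997-08-12T13:55:58+05:45",
    "1997-08-12T13:55:58",
    "1997-08-12T13:55",
    "1997-08-12",
]
for value in examples:
    print(value, IUCR_DATETIME.fullmatch(value) is not None)

# Minutes plus offset but no seconds: not reachable by right-truncation.
print(IUCR_DATETIME.fullmatch("2025-05-08T17:38+03:00") is not None)
```

Under this reading all five examples match, while a value that drops the seconds but keeps the offset does not.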

Certainly if you are going to define the time for a cup of coffee you need much less precision than you need
to keep thousands of frames that were collected in a modest number of seconds in the correct order.  In
that case I believe it is both bad science and a very bad idea to record times to a low precision that may
jumble the order in which the frames were collected and mess up, say, radiation damage studies.  I believe
the current DDL2 definition is in this case, clear, accurate, and appropriate to the use proposed.  If you are
proposing something else, please state precisely what you are proposing to change to what with examples
that are appropriate to the use being discussed.

Notice the wording "Depending on the required precision of the date/time, the full string may be 
truncated from the right as appropriate."  Isn't that good enough to fit your coffee use case?  
For actual data collection, we may choose to specify the su or ESD, but with frame rates now headed higher 
and higher, I think encouraging imprecision in datetime stamps is a big mistake and we should use
whatever precision is available to us at each beamline.  The question is not one of elegance, but of
practical utility.
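The ordering concern can be made concrete with a small sketch. The frame rate, start time and function name are invented for illustration and do not describe any particular beamline.

```python
from datetime import datetime, timedelta

# Hypothetical run: 5 frames collected 10 ms apart (100 frames per second).
start = datetime(2025, 5, 8, 17, 38, 12)
frames = [(n, start + timedelta(milliseconds=10 * n)) for n in range(5)]


def distinct_stamps(frames, timespec: str) -> int:
    """Count distinct timestamp strings at the given precision."""
    return len({t.isoformat(timespec=timespec) for _, t in frames})


print(distinct_stamps(frames, "milliseconds"))  # 5: order is recoverable
print(distinct_stamps(frames, "seconds"))       # 1: all frames collide
print(distinct_stamps(frames, "minutes"))       # 1: all frames collide
```

Once the stamps are truncated to whole seconds or minutes, every frame in this run carries the same string, so the collection order can no longer be recovered from the timestamps alone.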
_______________________________________________
ddlm-group mailing list
ddlm-group@mailman.iucr.org
https://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group


--
Antanas Vaitkus,
Vilnius University,
Life Sciences Center,
Institute of Biotechnology,
room C521, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania




--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

