From: Stuart Anderson <sba@srl.caltech.edu>
Date: Wed, 20 Aug 1997 10:46:12 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: bug

I am confused when people refer to Unix time or xntpd as being
UTC. On my Solaris box at least, the number of seconds returned
by gettimeofday() is not the actual number of seconds since
0h UT 1 Jan 1970.

Stated another way, if I subtract two timestamps taken a long time apart
(with a leap second in between), the answer is wrong.

Is this sort of behavior to be considered:

1) A bug in Solaris

2) A feature of the definition of Unix time as, "well sort of UTC"

--
Stuart Anderson   sba@srl.caltech.edu   PGP 2AA64B7D


From: Tom Lane <tgl@netcom.com>
Date: Thu, 21 Aug 1997 04:35:26 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: bug, configuration, update

Stuart Anderson <sba@srl.caltech.edu> writes:
> Is this sort of behavior to be considered:
> 1) A bug in Solaris
> 2) A feature of the definition of Unix time as, "well sort of UTC"

Well, Solaris is not alone --- AFAIK no version of Unix has ever
accounted for leap seconds.

Even if the standard C library routines did know about leap seconds,
plenty of application programs do their own timekeeping calculations,
and most of 'em assume there are exactly 86400 seconds per day, every
day.  (I've written some myself :-(.  A common reason for doing so is
the standard library's lack of support for dealing with timezones
other than your own and GMT.)
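
As a rough illustration of the 86400-seconds-per-day assumption described above, here is a minimal Python sketch of the kind of arithmetic such programs do. The function name is hypothetical and is not taken from any program mentioned in this thread.

```python
SECONDS_PER_DAY = 86400  # the assumption: every day has exactly this many seconds


def split_timestamp(t):
    """Split a seconds-since-epoch count into (days, hours, minutes, seconds).

    This can never produce a 61st second (23:59:60), so a day that
    contains a leap second cannot be represented.
    """
    days, rem = divmod(t, SECONDS_PER_DAY)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds
```

Code like this agrees with a time_t that ignores leap seconds, which is one reason the assumption is so widespread.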

And even if all the software were fully leap-sec-cognizant, you'd
still be dependent on the local sysadmin to update a configuration
table every time a new leap second is declared.

Bottom line is that properly handling leap second timekeeping is still
well beyond the average state of practice.  I don't think that Unix
is any worse off than any other OS in this area...

                        regards, tom lane


From: "Marc Brett" <Marc.Brett@waii.com>
Date: 21 Aug 1997 09:59:52 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: bug

Stuart Anderson <sba@srl.caltech.edu> wrote:
> I am confused when people refer to Unix time or xntpd as being
> UTC. On my Solaris box at least, the number of seconds returned
> by gettimeofday() is not the actual number of seconds since
> 0h UT 1 Jan 1970

> Stated another way, if I subtract the timestamp over a long period
> of time (including a leap second) the answer is wrong.

> Is this sort of behavior to be considered:

> 1) A bug in Solaris

> 2) A feature of the definition of Unix time as, "well sort of UTC"

The correct answer is "2".

The classic definition was only ever valid for the short interval from
1970-01-01 00:00:00 UT to 1971-12-31 23:59:69 (sic) UT.

With the introduction of leap seconds on 1972-01-01 00:00:00 UTC,
the (new and improved!) Unix time scale actually started (with
time_t=0) at 1970-01-01 00:00:10 UTc, fully 10 seconds adrift.  Six
months later, the Unix time scale again changed and time_t=0 was set to
1970-01-01 00:00:11 UTc.  This pattern kept going, and so today
(1997-08-21) time_t=0 is 1970-01-01 00:00:31 UTc.

[I use the term UTc because I can't figure out if it should be UT or
UTC, or indeed the even more ambiguous GMT.  UTC didn't exist before
1972, so it can't be used.  However, Unix time is an atomic (or at
least quartz oscillator) time scale, so UT, an astronomical time scale
defined by the earth's rotation, can't really be used.  Hence my
compromise, UTc].

Since 1970, there have been 22 different Unix time scales, each one
completely valid only for the interval between leap second events.
Most people just use the latest scale, which aligns perfectly with UTC,
and accept that timing events before the last leap second or after the
next one is either a) erroneous or b) complicated.
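
The arithmetic above can be sketched in Python. This is a minimal sketch, assuming a hard-coded table of the 21 leap seconds inserted from 1972 through July 1997 and timestamps no earlier than 1972; the function names are hypothetical.

```python
import calendar
from bisect import bisect_right

# UTC dates whose 00:00:00 immediately follows an inserted leap second,
# from 1972 through the date of this thread (August 1997).
LEAP_DATES = [
    (1972, 7, 1), (1973, 1, 1), (1974, 1, 1), (1975, 1, 1), (1976, 1, 1),
    (1977, 1, 1), (1978, 1, 1), (1979, 1, 1), (1980, 1, 1), (1981, 7, 1),
    (1982, 7, 1), (1983, 7, 1), (1985, 7, 1), (1988, 1, 1), (1990, 1, 1),
    (1991, 1, 1), (1992, 7, 1), (1993, 7, 1), (1994, 7, 1), (1996, 1, 1),
    (1997, 7, 1),
]

# The same instants as leap-second-ignoring (POSIX-style) timestamps.
LEAP_POSIX = [calendar.timegm((y, m, d, 0, 0, 0)) for (y, m, d) in LEAP_DATES]


def leaps_through(t):
    """Number of leap seconds inserted at or before timestamp t."""
    return bisect_right(LEAP_POSIX, t)


def tai_minus_utc(t):
    """TAI - UTC in seconds at timestamp t (valid from 1972-01-01 on)."""
    return 10 + leaps_through(t)


def elapsed_seconds(t0, t1):
    """Real elapsed seconds between two timestamps, counting leap seconds."""
    return (t1 - t0) + leaps_through(t1) - leaps_through(t0)
```

With this table, `tai_minus_utc` gives 10 at the start of 1972 and 31 on 1997-08-21, matching the offsets quoted above, and a plain subtraction across 1997-06-30 23:59:60 comes up one second short of `elapsed_seconds`.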

I hasten to add that all this is not necessarily a Bad Thing.  The
logistical problems of trying to stick narrowly to the original
definition of Unix time far outweigh the current confusion caused by
the fuzziness of what, exactly, is meant by "Unix time".

--
Marc Brett  +44 181 560 3160            Western Atlas
Marc.Brett@waii.com                     455 London Road, Isleworth
FAX: +44 181 847 5711                   Middlesex TW7 5AB    England


From: eggert@twinsun.com (Paul Eggert)
Date: 22 Aug 1997 18:03:38 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

> The Olson code uses 1970-01-01 00:00:10 TAI, which is
> approximately 1.9 seconds different from 1970-01-01 00:00:00 UT.

To some extent we're talking angels-and-pinheads here, since in 1970
the Olson code didn't exist, nor did any POSIX.1 or Unix platforms
exist to run it on.  But the argument is pretty straightforward:
the traditional Unix origin was ``midnight GMT'' on that date, and
POSIX.1 clarified the ``GMT'' to mean UTC.  And it is somewhat useful
to have these definitions nailed down, so that POSIX.1 applications can
reliably interchange timestamps back to 1970.

>(You've already made clear that by ``UTC'' you mean UT for dates before
>1972. In any case, there are no standard time scales for which your
>statement is correct.)

No, actually, by ``UTC'' I meant what the time authorities usually mean
when they say ``UTC''.  The International Earth Rotation Service (the
body that decides when leap seconds occur) uses ``UTC'' to denote the
Coordinated Universal Time regime that was in place before 1972.  E.g. see:
http://hpiers.obspm.fr/webiers/general/earthor/utc/table1.html

Granted, today's UTC method differs somewhat from the pre-1972 method.
And I'll also grant you that timekeeping before 1972 was messier than
it is today.  But today's UTC method is not defined for times before
1972, and the only plausible way to interpret the POSIX.1 origin of
1970 is to use the UTC that was in effect in 1970.


From: eggert@twinsun.com (Paul Eggert)
Date: 22 Aug 1997 18:38:38 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

>In practice, differences between time_t values are within +-0.1% of real
>time differences, even across leap seconds. Most hosts don't run NTP.

I disagree.  I think hosts that use POSIX.1-like time rules
fall into three categories:

1.  Hosts that run NTP and attempt to follow the POSIX.1 rules exactly.
    On these hosts, most time_t ticks are 1 second long, but
    when a leap second is introduced, the time_t tick is 2 seconds long.

2.  Hosts that use adjtime() to adjust their clock so that there are not
    big jumps in the clock.

3.  Hosts where people change the time by hand when it gets too far off.

I think the vast majority of such hosts fall into category (3).
On these hosts, differences between time_t values are not within +-0.1%
of real time differences when the clock jumps.  The +-0.1% property
does hold for category (2).  I agree that very few hosts are in
category (1) but in some sense the category-(1) hosts are doing the best job.
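
The category-(1) behavior can be illustrated with the "Seconds Since the Epoch" expression as it appears in current POSIX (the century terms were added after this thread and make no difference for 1997). This is a minimal Python sketch; the function name and argument layout are assumptions made for the example.

```python
def posix_seconds(year, yday, hour, minute, second):
    """POSIX 'Seconds Since the Epoch' for a UTC broken-down time.

    yday is the 0-based day of the year (tm_yday); year is the full year.
    """
    y = year - 1900  # tm_year
    return (second + minute * 60 + hour * 3600 + yday * 86400
            + (y - 70) * 31536000 + ((y - 69) // 4) * 86400
            - ((y - 1) // 100) * 86400 + ((y + 299) // 400) * 86400)


# 1997-06-30 is day 180 (0-based) of 1997; 1997-07-01 is day 181.
during_leap = posix_seconds(1997, 180, 23, 59, 60)
after_leap = posix_seconds(1997, 181, 0, 0, 0)
```

Here `during_leap` and `after_leap` are equal: the inserted second and the second that follows it share one time_t value, so that value lasts two real seconds.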

>you continue to talk about the POSIX time rules as if they were
>something more than a description of fundamentally flawed time-date code
>created many years ago by a few ignorant programmers.

I agree that there are many flaws in the POSIX time rules.  I would like
to have a better system in place.  This will take a lot of work, and
will require the cooperation of a lot of people.  One way to get this
cooperation would be to have a better system implemented, and to have
it used in a few popular applications.


From: eggert@twinsun.com (Paul Eggert)
Date: 22 Aug 1997 23:03:47 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: broadcast, TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

>Which makes more sense: adding leap-second support to a few routines
>like localtime() and mktime(), or adding it to thousands of routines
>that add and subtract time_t values?

Neither alternative is appealing; that's why few people do either,
even though the source code for leap-second localtime is freely available.
I've contributed a good deal of time to that effort, but it just
isn't catching on.  If we want leap second support to be widespread, we
have to come up with a better way.

Robin O'Leary's proposal of augmenting NTP to broadcast both TAI and
UTC is a promising way to attack this problem.  But we shouldn't link
this proposal to the idea of cramming leap seconds down localtime's
throat: that's a sure way to scare potential users.  We need a separate
interface to the leap-second-aware mechanism.

>In the real world, most UNIX machines _never_ have their software clocks
>readjusted.

Really?  I don't know about your neck of the woods, but that isn't true
around here.  It may take a few weeks or months, but eventually someone
gets annoyed with their box's clock being off (it breaks
`make'-over-the-network among other things) and they fix it with rdate
or call-the-phone-company-and-set-the-time.  Around here the only Unix
boxes that have _never_ had their clock readjusted are the boxes that
were recently unpacked.


From: vjs@calcite.rhyolite.com (Vernon Schryver)
Date: 23 Aug 1997 10:16:59 -0600
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

In article <5tlug3$1v6$1@shade.twinsun.com>,
Paul Eggert <eggert@twinsun.com> wrote:

> ...
>>In the real world, most UNIX machines _never_ have their software clocks
>>readjusted.
>
>Really?  I don't know about your neck of the woods, but that isn't true
>around here.  It may take a few weeks or months, but eventually someone
>gets annoyed with their box's clock being off (it breaks
>`make'-over-the-network among other things) and they fix it with rdate
>or call-the-phone-company-and-set-the-time.  Around here the only Unix
>boxes that have _never_ had their clock readjusted are the boxes that
>were recently unpacked.

There are also commercial UNIX systems that are shipped with `timed`
configured on by default precisely to deal with hassles such as `make`.

They may not amount to the majority.  Causing all of the systems on the
net to have the same time (+/- 10 ms) is not the same thing as causing
them all to be within 10 ms of what the Naval Observatory says, and so
might not qualify as whatever was meant by "readjusted."

Vernon Schryver    vjs@rhyolite.com


From: Achim Gratz <gratz@ite.inf.tu-dresden.de>
Date: 23 Aug 1997 11:22:11 +0200
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

> In the real world, most UNIX machines _never_ have their software clocks
> readjusted. The software clock is set from a hardware clock at boot time
> and then runs without outside intervention.

That is not true.  After all, why would tickadj need the -s option if
it were?  Anyone doing network makes would care about the time also.
That's why I got interested in NTP, anyway.

Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW:    http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gratz@ite.inf.tu-dresden.de
Phone:  +49 351 463 - 8325


From: "L. F. Sheldon, Jr." <lsheldon@creighton.edu>
Date: Sat, 23 Aug 1997 20:45:36 -0500
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: SNTP

On 22 Aug 1997, Paul Eggert wrote:

> djb@koobera.math.uic.edu (D. J. Bernstein) writes:
[snip]
> >In the real world, most UNIX machines _never_ have their software clocks
> >readjusted.
>
> Really?  I don't know about your neck of the woods, but that isn't true
> around here.  It may take a few weeks or months, but eventually someone
> gets annoyed with their box's clock being off (it breaks
> `make'-over-the-network among other things) and they fix it with rdate
> or call-the-phone-company-and-set-the-time.  Around here the only Unix
> boxes that have _never_ had their clock readjusted are the boxes that
> were recently unpacked.

Just a note from a lurker--there is a chance that y'all underestimate the
penetration of NTP and SNTP in the world.  I would guess that a word
like "most" is still strictly speaking correct--but the trend around
here seems to be to use some form of time-setting.  All of the unix
boxes I administer and a non-trivial number of the ones I don't, plus
a very large number of the PC's and "minor" servers in our world use
NTP or SNTP time-setting-and-keeping.

As interconnections, distributed computing, and such continue to proliferate
I think it will become ever more common to see some sort of synchronization
in use (and I didn't think to count the number of "isolated" machines I
know of that use some sort of dial-up-for-the-time gadget).

I will agree that it will probably be one or two more days before many of
us care about, much less understand, the TAI-UT-UTC-GMT issues.
--
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
.                                                                       .
- L. F. (Larry) Sheldon, Jr.                                            -
. Unix Systems Administration                                           .
- Creighton University Computer Center-Old Gym                          -
. 2500 California Plaza                                                 .
- Omaha, Nebraska, U.S.A.  68178       We are all faced with            -
. lsheldon@creighton.edu                  great opportunities           .
- 402 280-2254 (work)                  brilliantly disguised as         -
. 402 681-4726 (cellular)                 impossible situations.        .
- 402 332-4622 (residence)                                              -
.                                           Bits and Pieces             .
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-


From: djb@koobera.math.uic.edu (D. J. Bernstein)
Date: 23 Aug 1997 16:40:10 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

Paul Eggert <eggert@twinsun.com> wrote:
> Neither alternative is appealing;

Leap-second support has to go somewhere.

The first alternative---putting it into localtime() and mktime()---is
already available in millions of computers.

The second alternative---putting it into every routine that handles
time_t numerically---is, by all reports, available in zero computers.

> that's why few people do either,

No. Very few people have ever considered the issue. The truth is that
one person, Olson, made the decision, under pressure from some zealots.
Don't blame the users for a decision that they never made.

---Dan
Set up a new mailing list in a single command. http://pobox.com/~djb/ezmlm.html


From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: 23 Aug 1997 19:33:17 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

In article <1997Aug2317.32.26.15261@koobera.math.uic.edu>,
D. J. Bernstein <djb@koobera.math.uic.edu> wrote:
>The reason that people like time_t=TAI is that it actually works. It
>offers simple yet accurate relative timings, no painful clock jumps, and
>code already written for accurate local-time display.

Yes, but it is also 10-20 seconds different from what virtually every
Unix system currently puts in time_t timestamps.

>In contrast, time_t=POSIX-function-of-UTC will _never_ offer accurate
>relative timings. A few NTP proponents talk about the semantics that
>they claim they're guaranteeing, but nobody is willing to start the
>massive conversion project required for time_t manipulation code to
>actually support those semantics.

Your first statement is misleading.  There is no difficulty in using them
to provide accurate relative timings, and it needs only a minor definition
of how leap seconds are included.  What it does not provide is the ability
to have consistent relative INTERVALS.

But your last statement applies to any proposal to convert time_t values to
TAI, redoubled in spades.  Replacing NTP by a TAI+leap_count value is easy,
and could be done semi-transparently, but the next stage is a nightmare.

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


From: cbbrowne@news.brownes.org (Christopher B. Browne)
Date: 24 Aug 1997 01:46:18 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

On 19 Aug 1997 01:27:44 GMT, D. J. Bernstein <djb@koobera.math.uic.edu> posted:
>Marc Brett <Marc.Brett@waii.com> wrote:
>> UTC is legal civil time in most of the world,

>False. Civil time is several hours away from UTC in most of the world.

>> Computers are tools
>> which interface with peoples lives, and should, at least externally,
>> keep the same time as they do: UTC.

>False. Computers should indeed display human-comprehensible times---but
>that doesn't mean UTC. It means local time.

And what if I'm in Dallas, and the computer is in California?

Which time should the computer display?

This is not an academic question by any means.  I live in Texas, and
work with servers that are all sited in Oklahoma.  Users are scattered
across the planet, and *will* be connecting to the servers in Oklahoma.
It's quite probable that such information will be presented in terms
of CDT/CST.  To *most* users, that indeed is "local" time (because most
of the users will indeed be in Fort Worth, Texas, because head office
is there, and thus the bulk of the administrative users...).
Nonetheless, for those that aren't local, what's the valid time?

However, there are other servers in the same complex for which the vast
majority of users are *NOT* in the same time zone.  The STIN servers are
*probably* accessed by folks in every single time zone that there is.
Internally, the system has to use a single time, and that will by
necessity not be the same time that users think the time is in their
locale.

In effect, there *does* need to be some internal representation.
So long as libraries can handle it nicely, it doesn't much matter
if we're using UTC, GMT, TAI, or something else folks might think
up.

But it *is* preferable if the internal representation has a fairly
high degree of "universality."

>> Translating from an internal representation of TAI to an external
>> representation of UTC adds an additional layer of complexity

>False. The layer already exists; it is called ``localtime()'' and
>``mktime()''. Converting from TAI to UTC is merely the first step in
>converting from TAI to local time.

But if localtime() and mktime() think that the "true" time is
represented in UTC, then this *does* complicate things.

- Existing file systems and many utilities use UTC
- Existing programs treat the "base" representation as being UTC,
and use functions like localtime() and mktime() to manipulate UTC
values.

I agree that the *intent* of localtime()/mktime() is as you suggest;
making usage of a newer representation (such as TAI) universal would be
rather disruptive.  Perhaps "necessarily disruptive," but nonetheless
disruptive.
--
Christopher B. Browne, cbbrowne@hex.net, chris_browne@sdt.com
PGP Fingerprint: 10 5A 20 3C 39 5A D3 12  D9 54 26 22 FF 1F E9 16
URL: <http://www.hex.net/~cbbrowne/>
Bill Gates to his broker: "You idiot, I said $150 million on **SNAPPLE**!!!"


From: djb@koobera.math.uic.edu (D. J. Bernstein)
Date: 24 Aug 1997 03:29:12 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

Christopher B. Browne <cbbrowne@hex.net> wrote:
> - Existing file systems

Existing file systems use 4-byte times and will soon have to be fixed.
The switch to 8-byte times is a good moment to specify TAI.

> - Existing programs treat the "base" representation as being UTC,

Nonsense. Thousands of programs treat time_t differences as real-time
differences. Leap-second support in time_t manipulations is an absurd
fantasy.

In contrast, conversion from time_t to struct tm is rather well
modularized.
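
To make the modularity point concrete, here is a minimal Python sketch of the single conversion step in question: turning a count that includes leap seconds into a UTC broken-down time. It assumes a count of all seconds since 1970-01-01 00:00:00 (the leap-second-counting convention attributed to the Olson code earlier in the thread), a hard-coded table through July 1997, and hypothetical function names; the time-zone step that localtime() would add afterwards is omitted.

```python
import calendar
import time
from bisect import bisect_right

# UTC dates whose 00:00:00 immediately follows an inserted leap second.
LEAP_DATES = [
    (1972, 7, 1), (1973, 1, 1), (1974, 1, 1), (1975, 1, 1), (1976, 1, 1),
    (1977, 1, 1), (1978, 1, 1), (1979, 1, 1), (1980, 1, 1), (1981, 7, 1),
    (1982, 7, 1), (1983, 7, 1), (1985, 7, 1), (1988, 1, 1), (1990, 1, 1),
    (1991, 1, 1), (1992, 7, 1), (1993, 7, 1), (1994, 7, 1), (1996, 1, 1),
    (1997, 7, 1),
]
LEAP_POSIX = [calendar.timegm((y, m, d, 0, 0, 0)) for (y, m, d) in LEAP_DATES]

# Value of the leap-second-counting clock during each inserted second
# (23:59:60): k earlier leap seconds have already been counted.
LEAP_COUNTS = [p + k for k, p in enumerate(LEAP_POSIX)]


def utc_from_leap_count(t):
    """Convert a leap-second-counting value to (Y, M, D, h, m, s) in UTC."""
    n = bisect_right(LEAP_COUNTS, t)          # leap seconds at or before t
    in_leap = n > 0 and LEAP_COUNTS[n - 1] == t
    tm = time.gmtime(t - n)                   # leap-free calendar arithmetic
    second = tm.tm_sec + 1 if in_leap else tm.tm_sec  # 59 -> 60 in the leap
    return (tm.tm_year, tm.tm_mon, tm.tm_mday, tm.tm_hour, tm.tm_min, second)
```

Callers that only subtract such counts never touch the table; only this one routine needs it.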

> And what if I'm in Dallas, and the computer is in California?

Set your TZ environment variable to whichever time zone you want to see.
What's the problem?

---Dan
Set up a new mailing list in a single command. http://pobox.com/~djb/ezmlm.html


From: vjs@calcite.rhyolite.com (Vernon Schryver)
Date: 24 Aug 1997 10:41:35 -0600
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: configuration, SCO, SNTP

In article <Pine.HPP.3.95.970823203721.20839D-100000@bluejay.creighton.edu>,
L. F. Sheldon, Jr. <lsheldon@creighton.edu> wrote:

> ...
>> >In the real world, most UNIX machines _never_ have their software clocks
>> >readjusted.

> ...
>Just a note from a lurker--there is a chance that y'all underestimate the
>penetration of NTP and SNTP in the world.  I would guess that a word
>like "most" is still strictly speaking correct--but the trend around
>here seems to be to use some form of time-setting.  All of the unix
>boxes I administer and a non-trivial number of the ones I don't, plus
>a very large number of the PC's and "minor" servers in our world use
>NTP or SNTP time-setting-and-keeping.
> ...

Anything that requires manual intervention and is not absolutely positively
required for day-to-day operation is not done on the vast majority of
boxes, whether PC's or UNIX boxes.  If you have only a few dozen or a few
hundred systems, you might have the provincial notion that most systems
are reasonably, responsibly, or competently administered.  Once you see
real networks with 10,000 or more systems, you learn better.  Even in
outfits with large staffs of people who do nothing but fiddle with the
configuration of stuff (i.e. system or network administrators), anything
that is not absolutely required is either not done or done wrong (and
never mind about a lot of stuff that is absolutely required).  To put it
another way, DHCP and BOOTP and the Microsoft equivalents are very popular
for good reasons.

>As interconnections, distributed computing, and such continue to proliferate
>I think it will become ever more common to see some sort of synchronization
>in use (and I didn't think to count the number of "isolated" machines I
>know of that use some sort of dial-up-for-the-time gadget).

How many applications really care whether the times on all of the involved
systems are closer than you can get by checking Mickey's hands and typing
a little every few months?  There are a few, such as `make` over NFS, but
they are very rare compared to the vast numbers of others.  (Never mind
that people who frequent this newsgroup are likely to care about `make`
and NFS.)  Most distributed applications have some notion of "server,"
and the only ticks that matter are those that are counted by the "server."

Given those facts, ask yourself:
  1. how many systems come configured to use NTP or SNTP 'out of the box'?
  2. how many applications care whether they keep time to better than
      10 minutes of UTC?
  3. how many boxes are running such applications?

Because of the answers to those questions, to all intents and purposes,
0.0% of all systems are running NTP or SNTP.  For that matter, less than
1% of all UNIX systems are running NTP or SNTP.

My daytime employer, one of the major UNIX box vendors, and SCO have long
shipped `timed` (TSP) turned on by default.  That means that perhaps as
many as 20% of all UNIX boxes are running `timed`.  (I continue to fight
the good fight to keep my employer's system shipping a clock protocol on by
default.)  TSP is a somewhat lame protocol (i.e. a Bezerkley master's thesis
of ~10 years ago).  It keeps boxes on a LAN to better than 50 ms of the
consensus tick, but worries such as TAI vs. UTC vs. Mickey's hands are
far out of its range.

>I will agree that it will probably be one or two more days before many of
>us care about, much less understand, the TAI-UT-UTC-GMT issues.

If you sat down to write a distributed application, would you rely upon
the computers counting ticks in unison?  Or would you try things, such as
distributing the clock yourself, or arranging to not need to distribute
the clock?  If you require a good distributed clock, then your application
will not work in as many places as it would otherwise (e.g. the at least
80% running neither NTP nor TSP), and it will have failure modes it would
not otherwise suffer (e.g. when some systems stop agreeing on the number
of relevant leap seconds since the Tuesday before last).

In other words, all of this talk about TAI, UTC, and so forth is of vital
importance to us clock watchers, but an appropriate subject for ridicule
outside this asylum.

Vernon Schryver    vjs@rhyolite.com


From: Tom.Horsley@worldnet.att.net (Thomas A. Horsley)
Date: 24 Aug 1997 08:45:10 -0400
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

>Clearly you have never been involved with organising such a changeover.
>I have, both on a single-site academic and single-vendor commercial scale.
>Nightmare is a mild word to describe the horrors involved.

Sure I have - it works great if you aren't in a hurry. It goes like this:

* The folks who want the kernel upgrade jump up and down and hold their
  breath until they turn blue.
* The folks who are afraid of instability resist, and since there are
  more of them (and some of them are the ones with money), they win.
* In 10 or 15 years, even the cheapest and most paranoid organization
  finally junks its old obsolete machines and buys new ones.
* When they get new machines, they come with the new kernel that keeps
  internal time based on TAI.

See - it only takes a couple of decades :-).

>And changing every damn Unix and Internet protocol, system and utility is
>an order of magnitude more ghastly than anything I have ever got involved
>with.  I know what to do, and the first step is to become world dictator.
>The second step is harder.

But all the protocols and utilities work the exact same way they always did.
That's the point of having the kernel keep TAI internally but convert it to
POSIX consistently on every existing kernel call which talks about time.
New utilities which use the new system service calls and talk about TAI can
come later and gradually on a per-machine basis as any machine (say the one
controlling the automated observatory for instance) needs that much
fanaticism.
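
A toy model of that arrangement, as a minimal Python sketch: the class and method names are hypothetical, the internal counter starts at zero rather than at any real TAI epoch, and the table of leap seconds is supplied by the caller. Making the old-style value repeat during the inserted second is one possible design choice among several.

```python
from bisect import bisect_right


class TaiKernelClock:
    """Toy kernel clock: counts every second internally and derives the
    leap-second-ignoring value that existing calls would return."""

    def __init__(self, leap_ticks):
        # Internal counter values at which an inserted leap second occurs.
        self._leap_ticks = sorted(leap_ticks)
        self._ticks = 0

    def tick(self):
        """Advance the internal clock by one real second."""
        self._ticks += 1

    def tai_seconds(self):
        """What a *new* system call would return: the uniform count."""
        return self._ticks

    def posix_seconds(self):
        """What the *existing* calls would return: the uniform count minus
        the leap seconds seen so far, so the value repeats during a leap."""
        return self._ticks - bisect_right(self._leap_ticks, self._ticks)


# One leap second at internal tick 5, observed over eight real seconds.
clock = TaiKernelClock(leap_ticks=[5])
legacy = []
for _ in range(8):
    legacy.append(clock.posix_seconds())
    clock.tick()
```

In this run `legacy` stalls once (the value 4 appears twice) while `tai_seconds()` advances uniformly, so old callers see the hiccup they have always seen and new callers get exact intervals.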

Of course you *do* need to become world dictator in order to force the
operating system vendors to change their kernels to keep TAI internally :-).

--
>>==>> The *Best* political site <URL:http://www.vote-smart.org/> >>==+
      email: Tom.Horsley@worldnet.att.net icbm: Delray Beach, FL      |
<URL:http://home.att.net/~Tom.Horsley> Free Software and Politics <<==+


From: vjs@calcite.rhyolite.com (Vernon Schryver)
Date: 24 Aug 1997 16:32:07 -0600
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: delay, resolution, TAI

In article <1997Aug2420.38.00.26956@koobera.math.uic.edu>,
D. J. Bernstein <djb@koobera.math.uic.edu> wrote:

>> First, most people don't like time_t=TAI.
>
>Most people have never heard of TAI or UTC.

Yes, and most people have no reason to care.

>                                            They want the displayed time
>to reliably match local time.

Yes, and they define "reliably" as "within a dozen seconds," and could
not care less about irregular 1-second hiccups at midnight once in many
months.

>                              They also want a ``1-second'' pause to
>always mean 1 second.
>
>How do you plan to meet these requirements?
>
>There are tens of thousands of programs that subtract time_t values to
>time themselves. Some packages use these timings to select the fastest
>code for each machine at installation time.
>
>Would you like these programs to fail, perhaps disastrously, if they
>happen to be run during a leap second?

In any real operating system, you can only delay for at least one second,
one tick or one something else.  You can not delay for "no more than
one second."  Any program that might "fail, perhaps disastrously" if it
happened to be delayed an extra second either requires a real time
operating system or is silly junk.   (Think about contention for the CPU.)

The probability of a bad result from hardware timing code that gets the
wrong answer due to a leap second hiccup and that works otherwise is zero
to a lot of significant digits.  The probability that the midnight July or
January second is involved is around 1 in 15552000, assuming that people
are as likely to be installing stuff at midnight as at any other time.
Never mind that one of the midnights is a widely observed holiday.

Who would use gettimeofday() to measure hardware speed when times() is
also available?  If you need to measure a duration, and you know a little
of how time keeping works, you do not use gettimeofday().  In real systems,
consecutive calls to gettimeofday() do not give true hardware time
because:

  - adjtime() (not to mention settimeofday()) is continually jiggering
    the gettimeofday() answer.  The goal of the protocol for which this
    newsgroup is named is to make gettimeofday() yield an answer close
    to what the atomic clocks say, and toward that goal we have abandoned
    the notion that the difference between two answers produced by
    gettimeofday() separated by 1.0 seconds is necessarily more than 0.97
    seconds and less than 1.03 seconds.  (Or whatever your system's maximum
    adjtime() slew rate is.)

  - many systems fiddle with the microseconds from gettimeofday() to give
    the false but desirable illusion that the hardware clock has
    resolution finer than milliseconds.

  - times() yields hardware ticks regardless of leap seconds, settimeofday(),
    and adjtime() (at least in the systems I know a little about).
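
The same advice carries over to other environments. As a minimal sketch in Python (a swapped-in analogue, not the times() call itself): time.time() plays the role of gettimeofday(), while time.monotonic() is documented as unaffected by system clock updates. The helper name is hypothetical.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds).

    The interval is taken from the monotonic clock, so a clock step,
    slew, or leap second applied to the wall clock during the call
    does not distort it.
    """
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


total, seconds = timed(sum, range(1000))
```

Subtracting two time.time() readings instead would measure whatever the wall clock did in between, including any adjustment.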

> ...
>>  A. It's incompatible with all the existing hosts out there, so one can't
>>     interchange data (e.g. `tar' files) reliably; and
>
>Silly argument. The timestamps in tar files available around the net are
>wildly unreliable right now. Anyway, the tar format uses 4-byte times,
>so it has to be upgraded soon.

While worrying about whether tar timestamps are good to within a dozen
seconds is silly, at least as silly is talk about 4-byte times that must
"be upgraded soon," unless you have an unusual notion of "soon."  We all
plan to be around in the 22nd century, but few of us will be.

Could this angels-on-a-pin stuff be taken away?

The facts, unchangeable by wishing, flaming, or arguing, are that:

  1. it would have been nice if UNIX/POSIX had dealt with leap seconds.

  2. it/they did not and do not.

  3. Anyone who thinks that leap seconds might be a noticeable POSIX
      hassle has a serious shortage of experience when it comes to dealing
      with the POSIX compliance song and dance.  Any system has far bigger
      problems with POSIX and hassles with the POSIX test suites than leap
      seconds, including concerning the answers that gettimeofday() yields.

  4. there are some, but only a few applications that care about leap seconds.
      (and software or hardware installation is not among them)

  5. no application that is otherwise usable is going to "fail disastrously"
      regardless of leap seconds.  Leap seconds are simply one of many
      causes for clocks to jerk around, and any usable application must
      be prepared to deal with strange hiccups, including time that seems
      to go backwards because a time daemon realized that the operator
      set things wrong last week.

  6. teaching POSIX about leap seconds is not going to happen this
      year nor probably this century.

  7. practically no one cares about #6.

About #5--several times this year, many of the clocks at my day job have
jumped 3600 seconds into the future for about 40 minutes, and then jumped
back.  It seems that the old WWV receiver used to sync much of the net
has occasionally hiccuped.  The chaos a single such hiccup produces dwarfs
the sum of all of the leap seconds hassles there have ever been or ever
will be.

Vernon Schryver    vjs@rhyolite.com


From: eggert@twinsun.com (Paul Eggert)
Date: 24 Aug 1997 17:46:05 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: Linux, TAI

Tom.Horsley@worldnet.att.net (Thomas A. Horsley) writes:

>It would be entirely feasible for kernel code to keep TAI internally and
>convert it to POSIX time when folks use the existing system services. Then
>you just need to provide *new* system service calls for TAI, and library
>routines which use them.

An approach like this would take 20 years or so, but I think it'd work.
Vendors would probably resist it, because they don't want the hassle
of maintaining the leap second info.  However, the BSD and/or Linux
crowds could start the ball rolling.  We'd also need a consensus on how
to handle timestamps in the future, and timestamps before 1972.


From: eggert@twinsun.com (Paul Eggert)
Date: 24 Aug 1997 23:39:56 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:
>Paul Eggert <eggert@twinsun.com> wrote:
>> Some commercial vendors have even removed support
>> for time_t=TAI -- it's not shipped with SunOS 5, for example,
>It was Guy Harris who took the code out.
>He later admitted that he should have left it in.

Both items are news to me.  But regardless of who took out the code,
such decisions are not made in a vacuum.  It's plain that Sun's
customers didn't use time_t=TAI; otherwise it would not have been
removed, or at least someone would have complained and the code would
have been put back in.  So my point that time_t=TAI is rarely used is
still valid.

>>  A. [time_t=TAI is] incompatible with all the existing hosts out there,
>>     so one can't interchange data (e.g. `tar' files) reliably; and
>The timestamps in tar files available around the net are
>wildly unreliable right now.

Yes, but that's irrelevant.  My point was that time stamps should be
interpreted consistently, independently of how inaccurate they are.
For example, programs' output can contain human-readable forms of
`tar' time stamps, and if such output differs from host to
host this will make it harder to do regression tests.

>>  B. One can't convert future times reliably;
>Leap seconds are no different from time zones in this
>respect. Have you not noticed that the authorities in some countries
>change their time zone laws more often than you change your watch?

Yes, I have noticed it.  But the problem you mention is not limited to
future times; it also applies to past times.  For example, what was the
local time in Brussels when the Armistice ended World War I?  Different
sources report different values, and the source that should be
authoritative (namely, the Annuaire de L'Observatoire Royal de Belgique)
is ambiguous.  So it's quite possible for different implementations to
report different answers to questions about times in the past.
(This problem is not restricted to times far in the past --
I mention the Brussels example merely because it is
the one that crossed my desk most recently.)

So I agree that local times cannot be converted reliably in general.
However, there are many important special cases where it's quite
practical to convert local times accurately.  For example, even though
I'm not 100% sure, I have a very, very high degree of confidence as to
when 2001-01-01 00:00:00 (local time) will occur in Tokyo.  Frankly, if
my time primitives insisted on TAI and refused to convert this future
time stamp, I would chuck them and find a better set of time primitives.

Let's put it this way: the time wizards have changed our timekeeping
basis several times in the past 100 years, but Tokyo's New-Year's-Day
offset from Greenwich hasn't changed even once.  I'm more confident of
that offset not changing than I am of our time basis not changing!

>> Second, time_t=TAI does not ``actually work'' in general, because it
>> is undefined for old timestamps.
>This is already solved by my TAI64 format, which uses the TAI second and
>a particular TAI epoch but then defines its own timestamps

How does your code handle time stamps in, say, 1918?
TAI isn't defined for time stamps that old.  It sounds like you've
invented your own TAI-like time scale for back then, but you haven't
explained how it actually works.

>for a span of a few hundred billion years.

Hmmmm.  How do you define TAI that far back?
TAI is defined with respect to sea level,
and sea level hasn't existed for that long.
There are also some interesting theoretical problems once you
go back before the Big Bang....

>[People] want a ``1-second'' pause to always mean 1 second.

True, but that's irrelevant to the current discussion.
The traditional Unix primitives have never given you a reliable
way to sleep exactly 1 second, so leap second glitches
are not an undue burden here.

>There are tens of thousands of programs that subtract time_t values to
>time themselves. Some packages use these timings to select the fastest
>code for each machine at installation time.

This is a weak, weak argument for a time_t=TAI.  Any such program, if
written naively, will choose suboptimally for many reasons other than
leap seconds.  Leap seconds are well down in the noise for this
particular problem.  Besides, any program that wants to do a reasonable
job of this ought to be taking the median of several timed runs, and in
that case any leap second glitches should wash out.
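
The median-of-several-runs idea can be sketched in Python as follows. This is
only an illustration under stated assumptions: `median_runtime` and `fake_clock`
are made-up names, and the fake clock simulates a wall clock that is stepped by
one second during the third run.

```python
import statistics
import time

def median_runtime(func, runs=5, clock=time.time):
    """Time func() several times by subtracting clock readings; return the median."""
    durations = []
    for _ in range(runs):
        start = clock()
        func()
        durations.append(clock() - start)
    return statistics.median(durations)

# Simulated wall-clock readings (start, stop) for five runs; the third
# run's stop reading includes a one-second step in the clock.
readings = iter([0.0, 0.10, 1.0, 1.11, 2.0, 3.10, 4.0, 4.10, 5.0, 5.12])

def fake_clock():
    return next(readings)

print(round(median_runtime(lambda: None, runs=5, clock=fake_clock), 2))  # 0.11
```

The single 1.10-second outlier does not move the median, whereas a mean of the
same five runs would be pulled up to about 0.31 seconds.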

>> Most practical applications deal with UTC, not TAI,
>... Almost every time display is local time.

The _display_ may be local time, but the _applications_
deal with UTC internally.  Many applications do things like add 3600
seconds to get to the next hour; this breaks with time_t=TAI.

But we're straying from my point, which was that UTC is much more
commonly used than TAI, even if only input/output purposes are considered;
so even if one really wants a TAI time_t for modern time stamps, it
makes sense to cater to UTC-using applications when deciding the
historical dividing line between a UTC time_t and a TAI time_t.

>> most people don't like time_t=TAI.
>Most people have never heard of TAI or UTC.

True; I should have written ``most people who have considered the issue
don't like time_t=TAI''.  People have voted with their feet.


From: bwb@etl.noaa.gov (Bruce Bartram 303-497-6217)
Date: 19 Aug 1997 22:20:41 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Request for help setting up a stratum-3 machine
X-Keywords: TAI, timezone

Terrence Brannon (brannon@quake.usc.edu) wrote:
: Thanks for the help. However, the stratum-3 machine seems to be
: exactly 3 hours behind the stratum-2 machine... the timezone is
: different.

Howdy,

Each process on a un*x machine has its own timezone in the environment
variable TZ.  If there is no TZ, a host-wide default should apply.
The host clock runs in UTC, as does NTP and xntpd (ignoring the
important "future" thread -- I like TAI and UTC).

Try this, Terrence, for the sh (or bash) shell:
  sh   (with default $ prompt)
      $ echo $TZ
      $ TZ=US/Pacific date
      $ TZ=US/Hawaii date
      $ TZ=GMT date
If you use the csh shell, type "sh" or "bash" to get a Bourne shell.

You should see the different timezones and offsets.

If you don't get sensible answers, you may have problems with the
/usr/share/lib/zoneinfo directory and its files.  The little files
are binary output from the "zic" program ("man zic" is interesting,
if a bit arcane).  The files with names like "northamerica" are long
text files with the timezone rules.  Feeding these into zic makes
the binary files.  There are a few strangenesses with the POSIX sign...
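
One well-known quirk of POSIX-style TZ strings, which may or may not be the one
meant here, is that the number counts hours west of Greenwich, so "PST8" means
UTC minus 8 hours. A minimal Python sketch of the per-process TZ behaviour,
assuming a Unix system where `time.tzset()` is available (`local_hour_at_epoch`
is a made-up helper name):

```python
import os
import time

def local_hour_at_epoch(tz):
    """Return the local hour of day at Unix time 0 under the given TZ value."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: make this process re-read TZ
    try:
        return time.localtime(0).tm_hour
    finally:
        # Restore the caller's TZ so the change stays local to this call.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()

for tz in ("UTC0", "PST8", "HST10"):
    print(tz, local_hour_at_epoch(tz))
# UTC0 0
# PST8 16
# HST10 14
```

These TZ values are parsed by the C library itself and do not depend on the
zoneinfo files, which makes them handy for checking whether a problem lies in
the zoneinfo directory or elsewhere.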

If your answers make sense and you need to change your personal
TZ, add a line like "setenv TZ US/Pacific" to your .cshrc or .login,
or "TZ=US/Pacific" to your .profile file.

To change the system default timezone, I think the magic is in the
/etc/TIMEZONE, /etc/default/init or /usr/share/lib/zoneinfo/localtime
files.  These are system specific.  On my Solaris 2.5 system, the
magic is in the last line of /etc/TIMEZONE, and needs to be
"TZ=US/Pacific" like the .profile line.  I think a reboot is needed
to make this take effect.  On a SunOS 4.1 system, I think the
technique is to make /usr/share/lib/zoneinfo/localtime
be the desired zic output file.  I'd suggest (as root of course,
and with all the warnings about how this might be VERY BAD ! and
you must understand what all this will do before executing it):
      # cd /usr/share/lib/zoneinfo
      # ln -s US/Pacific localtime
but I've never tried this.  I suggest that the default timezone
should be set to the local wall clock time, so a casual "date"
is wall clock.  I think this makes logfiles and sendmail headers
easier to read.

I've seen a host with badly mangled junk in /usr/share/lib/zoneinfo.
On that system, I made my own timezone files and have
  setenv TZ $HOME/timezone/US/Mountain
to ignore the mangled stuff.

Feel free to email me directly if you want more help.

Bruce Bartram     bbartram@etl.noaa.gov    just another chimehead


From: "Doug Hogarth" <DougHo@niceties.com>
Date: 23 Aug 1997 15:17:33 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Timeserv on NT is using up CPU time
X-Keywords: bug, USNO

See the flash at the top of my http://home1.gte.net/dougho/TimeServ.html
In short, you were using Type=Internet instead of Type=NTP, and when the
USNO discontinued their non-NTP services around the end of 13 August, a bug
in my program occurred.
Sorry for any inconvenience.

John C. Binder <jbinder@s-vision.com> wrote in article
<01bcad0c$fa2ae820$78ad6ccf@jbinder2>...
> We have been running Timeserv on an NT 4.0 Server for almost a year. Last
> week the timeserv started to go haywire. It started using up CPU cycles to
> the point that the processor was running at 100%. I have had to shut it
> down.
>
> We were pointing to the USNO. Does anyone have an idea what is happening?
> Is anyone else having a similar problem? I have used other machines and got
> the same result.


From: "Marc Brett" <Marc.Brett@waii.com>
Date: 21 Aug 1997 10:16:52 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: NTP version 2
X-Keywords: documentation, fudge, precision, prefer

C.S.Transport <hk3705@usa.net> wrote:
> Howdy,

> I use some Digital AlphaServers with Digital Unix 3.2 C. This is
> delivered with NTP version 2 (according to ntpq: ntpversion).
> I would like to use the equivalent of the following ntp.conf file
> (version NTP 3) for the version 2:

> server 127.127.1.1 prefer
> fudge 127.127.1.1 stratum 1

> Any idea on how to "translate" that?
> (This would be to prevent NTP from adjusting the local clock of
> the server).

It's important to distinguish between NTP versions 2 & 3 -- the
protocols, and xntp versions 2.x & 3.x -- the implementations.
Configuration files have nothing to do with the NTP protocol,
only the software.

The best place to look would be the Digital documentation.

That said, I know that early versions of xntp 3.x had a different syntax
to the current releases, namely:

        server 127.127.1.10

                where 10 would be the stratum.

Perhaps this might work with your software?

Incidentally, it is a Bad Idea to set the local clock to such a low
stratum as 1.  It implies more precision than is actually being
delivered.  If you ever get a radio clock or some other low-stratum
clock on the Internet, you will want it to be used in preference to the
less accurate local clock.  Stratum numbers will allow clients to
discriminate.

Also, the "prefer" keyword mucks up the clock selection algorithm, and
should be used only in exceptional circumstances.

--
Marc Brett  +44 181 560 3160            Western Atlas
Marc.Brett@waii.com                     455 London Road, Isleworth
FAX: +44 181 847 5711                   Middlesex TW7 5AB    England


From: robin@acm.nospam.org (Robin O'Leary)
Date: 22 Aug 1997 22:02:25 +0100
Newsgroups: comp.protocols.time.ntp
Subject: TAI, UTC and POSIX-time
X-Keywords: TAI

Thanks to the many people who have emailed and posted comments that
helped me get this clear (I hope):

International Atomic Time (TAI) counts standard seconds since a
well-defined epoch (in 1958) and names them with successive integers,
0, 1, 2, 3, etc.; there are no minutes, days, months or years in TAI.
A sufficiently stable oscillator can keep TAI time indefinitely without
outside involvement.

Co-ordinated Universal Time (UTC) names the same seconds in the
conventional YYYY-MM-DD HH:MM:SS manner and it has the occasional hiccup
in numbering (23:59:60) that we call positive leap-seconds.  It isn't
meaningful to talk of ``UTC seconds'' in contrast to ``TAI seconds''
since there is a perfect 1:1 correspondence between seconds in both
systems; the only difference is how they are labelled.  Mapping between
historic TAI and UTC requires a table of leap-seconds (this table is being
lengthened continually by the International Earth Rotation Service).

POSIX-time, or ``seconds since the epoch'' as they define it, is TAI
minus the number of complete positive leap-seconds in UTC up to that point
(plus the number of negative leap-seconds, but there haven't been any).
Unfortunately, this means that some seconds (the positive leap-seconds)
can't be given unique POSIX-time labels; such a second gets the same
label as its immediate successor.

For example:
        TAI     598 599 600 601 602
        UTC     :58 :59 :60 :00 :01
        POSIX   598 599 600 600 601
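
The labelling rule can be sketched in Python. This is a toy illustration, not
anyone's implementation: `posix_label` is a made-up name, and the one-entry
leap table simply reproduces the example above, where TAI 600 is the second
that UTC calls :60.

```python
from bisect import bisect_left

def posix_label(tai, leap_seconds):
    """Map a TAI second count to its POSIX-time label.

    leap_seconds is a sorted list of TAI counts at which a positive
    leap second (UTC :60) occurs.
    """
    # Number of leap seconds that have fully elapsed before this second.
    completed = bisect_left(leap_seconds, tai)
    return tai - completed

LEAPS = [600]  # toy table for the example: TAI 600 is UTC :60
for tai in range(598, 603):
    print(tai, posix_label(tai, LEAPS))
# 598 598
# 599 599
# 600 600
# 601 600
# 602 601
```

TAI 600 and TAI 601 both come out as POSIX 600, which is the duplicated label
shown in the table.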

What we are having trouble with isn't really UTC at all---UTC ticks in
synchronisation with TAI quite happily---but with POSIX-time.  POSIX-time
is what you get if you convert UTC to an integer using the POSIX function:
        tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 +
            (tm_year-70)*31536000 + ((tm_year-69)/4)*86400
which leaves out all leap-seconds (and overlooks that 2100 is not a leap year).
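
As a check, the expression quoted above can be transcribed into Python and
compared with the standard library. This is a sketch under stated assumptions:
`posix_seconds` and `from_utc` are made-up names, tm_year counts years since
1900 and tm_yday is the zero-based day of the year (as in C's struct tm), and
only years from 1970 on are considered.

```python
import calendar
from datetime import date

def posix_seconds(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """Seconds since the epoch, transcribed from the quoted expression.

    For tm_year >= 70 Python's // gives the same result as C's integer
    division, so the transcription is faithful in that range.
    """
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000 + ((tm_year - 69) // 4) * 86400)

def from_utc(year, month, day, hour, minute, second):
    """Hypothetical convenience wrapper taking calendar fields."""
    tm_yday = (date(year, month, day) - date(year, 1, 1)).days
    return posix_seconds(year - 1900, tm_yday, hour, minute, second)

# Agrees with calendar.timegm() for an ordinary instant.
print(from_utc(1997, 8, 22, 21, 2, 25)
      == calendar.timegm((1997, 8, 22, 21, 2, 25)))   # True
# The leap second 1997-06-30 23:59:60 gets the same label as 00:00:00.
print(from_utc(1997, 6, 30, 23, 59, 60)
      == from_utc(1997, 7, 1, 0, 0, 0))               # True
```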

NTP currently distributes POSIX-time together with a leap-second warning
flag.  This is better than POSIX-time alone, since it is possible to
keep the warning flag and use it to disambiguate the POSIX-times around
leap seconds.  But what would be even better would be for NTP also to
convey the number of leap-seconds since the epoch; that value could
then be added to POSIX-time to get TAI.  What would be equivalent, but
philosophically best of all, would be for NTP to transmit TAI together
with the number of leap-seconds.

I am proposing that by one means or another, NTP is augmented so that
it provides enough information to enable an NTP client to report current
TAI, UTC and POSIX-times.

Robin O'Leary.
--
<robin@nospam.acm.org>  +44 973 310035  P.O. Box 20, Swansea SA2 8YB, U.K.


From: bpenrod@nbn.com
Date: 24 Aug 1997 02:01:00 -0400
Newsgroups: comp.os.os2.announce,comp.protocols.time.ntp,comp.os.os2.misc,comp.os.os2.networking.tcp-ip,comp.os.os2.scitech
Subject: WARNING: To Users of OS2_NTPD, Network Time Protocol Client for OS/2
X-Keywords: poll, release, TrueTime

~Reply-to:     bpenrod@truetime.com
[Followups directed to comp.protocols.time.ntp]
---------------------------------------------------------------------
WARNING:  To Users of OS2_NTPD, Network Time Protocol Client for OS/2

This announcement has been made necessary due to changes in the operation of
the base clock drivers for OS/2:  clock01.sys and clock02.sys.  Coincidental
to the final release of Netscape for OS/2 back in October of last year, these
new drivers first appeared in the multimedia plug-in pak for Netscape which
was available for download along with the final version of Netscape.

Since then, these new drivers have been incorporated into the Fixpaks for both
Warp V3 and V4, starting with FP26 for Warp V3 and FP1 for Warp V4.  These new
drivers have seriously affected the operation of the timekeeping functions
used by OS2_NTPD for timetagging NTP packet requests and replies as well as
for performing sub-second level adjustments to the Real Time Clock.  While
running with these drivers, OS2_NTPD is able to maintain the system clock
accuracy only to the one-half second level.  For users who are unsure about
the Corrective Service Level of their systems, the symptom of the problem
caused by the new clock drivers is the repeated display of this message in the
status area of the OS2_NTPD window:

"Unable to perform Clock Adjustment now, interrupted--Sleep Error = xx"

where 'xx' is some number of milliseconds.

The purpose of this announcement is to warn users who are operating with the
new, problematic clock drivers that a side effect of the decreased performance
is a dramatic INCREASE in the normal polling rate of the NTP servers.  This is
a result of the inability of the program to pull the system clock close enough
to allow the polling interval to be extended.  As a result, the program
continues to poll at the initial interval (default is 16 seconds)
indefinitely. This has caused some concern at some of the more well known
public NTP servers like those at the US Naval Observatory, tick and tock.
These servers must process thousands of NTP packet requests per day.  I have
been informed by the operator of those servers that "NTP hogs" will be
selectively filtered from access to these servers.

I recommend either of two strategies for fixing this problem:

1.  Replace the clock01.sys and clock02.sys files in your os2\boot directory
    with the originals from your distribution disks or CD.  Unless you are
    using multimedia, I believe there are no benefits to the new drivers.

2.  If you must operate with the new drivers, edit the cfgdata file in the
    os2_ntpd directory so that the initial polling interval is greater than or
    equal to 64 seconds.  The default in the distribution zipfile is set to 16
    seconds.
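
For a sense of scale, the request load from one client stuck at a fixed polling
interval is simple arithmetic, sketched here in Python. The 16 and 64 second
figures come from the announcement; the 1024 second figure is the usual upper
poll interval in RFC 1305 and is included only for comparison, since the
announcement does not say how far OS2_NTPD extends its interval.

```python
SECONDS_PER_DAY = 86400

def polls_per_day(poll_interval_s):
    """Requests per day sent by one client polling at a fixed interval."""
    return SECONDS_PER_DAY // poll_interval_s

for interval in (16, 64, 1024):
    print(interval, polls_per_day(interval))
# 16 5400
# 64 1350
# 1024 84
```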

In the meantime, work is being done to modify OS2_NTPD so that it will work
with the new clock drivers; however, the project is still in its infancy.
Users should not procrastinate on implementing my suggested courses of action
in the belief that a new version release is imminent.

Bruce M. Penrod
TrueTime Inc
2835 Duke Court
Santa Rosa, CA  95407
_____________________________________________________________________
| NOTE: Please send submissions by EMAIL mailto:os2_ann_req@bix.com
| Correspondence to the COOA Moderator: mailto:lfirrantello@bix.com .
| Please see: http://www.bix.com/pub/os2ann/pindex.htm for posting guidelines


From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: 18 Sep 1997 20:58:24 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: NTP server for Netware
X-Keywords: SNTP

In article <3421818D.2CB4F90C@nhdayton.com>,
Randy Hardin  <rhardin@nhdayton.com> wrote:
>I remember seeing some software a while back that would allow a Netware
>server to act as an NTP time server. I think it was actually tied in
>with the RDATE software that allows it to be an NTP client. Of course,
>now that I need it, I can't find it. Anybody know where it can be found?

For Heaven's sake, do NOT mix rdate and NTP!  If you do that, you are
likely to cause the whole NTP net that you serve to go unstable.  My
SNTP client goes to great trouble to avoid corrupting NTP networks,
and it is a hundred times less likely to cause chaos than anything
based on rdate's design.

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: 22 Sep 1997 21:46:07 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: NTP over variable-latency network routes
X-Keywords: compatible, delay, dispersion, glitch, SNTP, stability

In article <Jeff-2209971214490001@mac.pn.wagsky.com>,
Jeff Kletsky <Jeff@Wagsky.com> wrote:
>
>However, if the network is heavily loaded, the dispersion estimate goes up
>(as I would expect) with very high delay and offset (e.g., the delay in
>the range of 3000-5000 ms and an offset of -1300 to -2200). xntpd thinks
>that the whole world has gone haywire, and throws in a "big" timestep
>(e.g., -2.24 s).  Once the traffic slows, xntpd realizes the rest of the
>world has a different idea of time and again throws in a big timestep
>(e.g., 1.89 s) to get back to reality.
>
>So much for smooth, reliable timekeeping ;-(
>
>Under heavily loaded conditions, ping to the gateway shows a round-trip
>time of 500-6000 ms, rather than the usual 130-140 ms.  This is likely
>consistent with a transmission time of about 500 ms per 1500-byte packet
>(with some queueing of packets at the host and router).

Nasty.  In the part of your posting that I omitted, you cover most of
the obvious 'solutions', but I am afraid that the only effective one is
to junk NTP and start over with a new design.  In order to handle wildly
erratic synchronisation packets, you need some long-term statistical
averaging in there, and NTP is a horribly deterministic design.

My SNTP client would be partially immune to those problems because (a) it
rejects bad packets (by default > 5 seconds dispersion) and (b) it weights
packets appropriately.  It would, however, fall over if the glitch covered
too many consecutive packets, as its error recovery is conservative rather
than comprehensive.

It would be possible to combine the approaches, but it wouldn't be simple,
and it wouldn't be compatible.  In particular, the responsiveness of the
NTP algorithm is incompatible with the stability of a long-term averaging
method that you would need to resolve your problem.  This is fundamental
in the way the universe works :-(

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: 23 Sep 1997 07:41:37 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: NTP over variable-latency network routes
X-Keywords: compatible, dialup, dispersion, glitch, Mills, RFC, SNTP, specification, stability

In article <60762a$qad@cebaf4.cebaf.gov>,
Larry Doolittle <ldoolitt@recycle.jlab.org> wrote:
>Nick Maclaren (nmm1@cus.cam.ac.uk) wrote:
>: In article <Jeff-2209971214490001@mac.pn.wagsky.com>,
>: Jeff Kletsky <Jeff@Wagsky.com> wrote:
>: >
>: >However, if the network is heavily loaded, the dispersion [ chop ]
>: >So much for smooth, reliable timekeeping ;-(
>: >
>: Nasty.  In the part of your posting that I omitted, you cover most of
>: the obvious 'solutions', but I am afraid that the only effective one is
>: to junk NTP and start over with a new design.  In order to handle wildly
>: erratic synchronisation packets, you need some long-term statistical
>: averaging in there, and NTP is a horribly deterministic design.
>
>NTP protocol itself is OK.  Averaging can be thrown on later, at least
>in the client/server mode.  I found xntpd to be similarly erratic, and
>noticed that it (at least appears to) throw away one key bit of information:
>the round trip latency for a query/response.  When that shoots up,
>the client can tell the response is (nearly) meaningless.

Part of the trouble is that there IS no NTP protocol as such.  I had hell
trying to find out what were legal packets, and didn't really succeed.
The protocol's specification has to be deduced from the algorithm's one,
which is itself embedded in the background description.  I failed to
persuade David Mills that RFC 1305 needs a protocol specification
that can be used on its own.

>: My SNTP client would be partially immune to those problems because (a) it
>: rejects bad packets (by default > 5 seconds dispersion) and (b) it weights
>: packets appropriately.  It would, however, fall over if the glitch covered
>: too many consecutive packets, as its error recovery is conservative rather
>: than comprehensive.
>
>Even long-round-trip communications set bounds on the time, merely
>lousy ones compared to the usual (for me) 1 ms round trip.

That is true, but not the point.  With standard least-squares regression
(which my code uses), the optimal weighting of a data point is inversely
proportional to the square of its estimated error.  This automatically
balances packets by their reliability, WITHOUT having the instability of
simple acceptance/rejection.
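
A minimal Python sketch of inverse-square-error weighting follows. It is not
the code referred to above, which is not shown in this thread. The
offset and delay formulas are the standard NTP ones; using half the round-trip
delay as a sample's estimated error is an assumption made for illustration, and
`offset_and_delay` and `weighted_line_fit` are made-up names.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP sample from four timestamps in seconds:
    t1 client send, t2 server receive, t3 server send, t4 client receive."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def weighted_line_fit(times, offsets, errors):
    """Weighted least-squares fit of offset = offset0 + drift * t,
    with each point weighted by 1 / error**2."""
    weights = [1.0 / (e * e) for e in errors]
    sw = sum(weights)
    t_mean = sum(w * t for w, t in zip(weights, times)) / sw
    o_mean = sum(w * o for w, o in zip(weights, offsets)) / sw
    stt = sum(w * (t - t_mean) ** 2 for w, t in zip(weights, times))
    sto = sum(w * (t - t_mean) * (o - o_mean)
              for w, t, o in zip(weights, times, offsets))
    drift = sto / stt
    return o_mean - drift * t_mean, drift

# Four clean samples on a line (10 ms offset, 100 ppm drift), plus one
# sample taken over a congested link: 6 s round trip, offset off by 2 s.
times = [0.0, 64.0, 128.0, 192.0, 256.0]
offsets = [0.0100, 0.0164, 0.0228, 0.0292, -2.0]
delays = [0.002, 0.002, 0.002, 0.002, 6.0]
errors = [d / 2.0 for d in delays]  # assumption: error is half the delay
offset0, drift = weighted_line_fit(times, offsets, errors)
print(round(offset0, 4), round(drift, 6))  # 0.01 0.0001
```

The congested sample is kept rather than rejected, but its weight is about
nine million times smaller than that of a clean sample, so it barely moves
the fit.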

>: It would be possible to combine the approaches, but it wouldn't be simple,
>: and it wouldn't be compatible.  In particular, the responsiveness of the
>: NTP algorithm is incompatible with the stability of a long-term averaging
>: method that you would need to resolve your problem.  This is fundamental
>: in the way the universe works :-(
>
>You can dump the NTP algorithm without dumping NTP servers and protocol!
>Feel free to beat me to a decent implementation :-) .

I have, partially, and could do so wholly for a few weeks' work.  As the
README in my source says, this would be a DISASTER!  The stability of the
NTP algorithm relies on all nodes using the same algorithm; mixing nodes
with completely different properties could lead to partial meltdown.

We are moving towards a world where there are a large number of primary
servers, and few systems are more than a few hops from one (say <= 5).
Furthermore, modern clocks are almost all quartz-based (i.e. not mains)
and so have excellent stability properties, though they may drift very
badly.  The combination of these assumptions would enable us to design
an NTP replacement with the following properties:

    1) Very low transaction rates (1-10 per diem), optionally at times
selected from outside (i.e. dialup).
    2) Very good stability, even with networks that have long periods
(many hours) of near-inaccessibility.
    3) Global and local synchronisations that are as good as people are
currently getting out of xntpd in most environments.

But it would NOT be compatible with RFC 1305 and would NOT attempt to
deliver microsecond accuracies over LANs.  The trouble with NTP as a
general time framework is that it is specialised precisely for the very
high accuracy requirement, which few people need.  It thus gives LOW
accuracy (unnecessarily low) in many, more common, circumstances.

That is a standard, unavoidable dilemma that is well-known to all
control theorists.

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

