NTP "FAQ" #2 (6/96 - 10/97) - Articles (part 20)


From: Stuart Anderson <sba@srl.caltech.edu>
Date: Wed, 20 Aug 1997 10:46:12 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: bug

I am confused when people refer to Unix time or xntpd as being UTC. On my Solaris box at least, the number of seconds returned by gettimeofday() is not the actual number of seconds since 0h UT 1 Jan 1970

Stated another way, if I subtract the timestamp over a long period of time (including a leap second) the answer is wrong.

Is this sort of behavior to be considered:

1) A bug in Solaris

2) A feature of the definition of Unix time as, "well sort of UTC"

-- Stuart Anderson sba@srl.caltech.edu PGP 2AA64B7D
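The subtraction described above can be reproduced with a short Python sketch. It assumes the POSIX convention that every day is exactly 86400 seconds long (which is what `calendar.timegm` computes) and uses the leap second inserted at 1997-06-30 23:59:60 UTC. The helper name `posix_seconds` is made up for this illustration.

```python
import calendar

def posix_seconds(year, month, day, hour, minute, second):
    """UTC broken-down time -> POSIX time_t, counting every day as 86400 s."""
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))

# One second before and one second after the leap second 1997-06-30 23:59:60 UTC.
before = posix_seconds(1997, 6, 30, 23, 59, 59)
after = posix_seconds(1997, 7, 1, 0, 0, 0)

posix_difference = after - before   # what subtracting the two timestamps reports
leap_seconds_between = 1            # 23:59:60 lies between the two instants
true_elapsed = posix_difference + leap_seconds_between

print(posix_difference, true_elapsed)   # prints "1 2"
```

Two real seconds pass between the two instants, but the timestamps differ by one.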

From: Tom Lane <tgl@netcom.com>
Date: Thu, 21 Aug 1997 04:35:26 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: bug, configuration, update

Stuart Anderson <sba@srl.caltech.edu> writes:
> Is this sort of behavior to be considered:
> 1) A bug in Solaris
> 2) A feature of the definition of Unix time as, "well sort of UTC"

Well, Solaris is not alone --- AFAIK no version of Unix has ever accounted for leap seconds.

Even if the standard C library routines did know about leap seconds, plenty of application programs do their own timekeeping calculations, and most of 'em assume there are exactly 86400 seconds per day, every day. (I've written some myself :-(. A common reason for doing so is the standard library's lack of support for dealing with timezones other than your own and GMT.)

And even if all the software were fully leap-sec-cognizant, you'd still be dependent on the local sysadmin to update a configuration table every time a new leap second is declared.

Bottom line is that properly handling leap second timekeeping is still well beyond the average state of practice. I don't think that Unix is any worse off than any other OS in this area...

regards, tom lane

From: "Marc Brett" <Marc.Brett@waii.com>
Date: 21 Aug 1997 09:59:52 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: bug

Stuart Anderson <sba@srl.caltech.edu> wrote:
> I am confused when people refer to Unix time or xntpd as being
> UTC. On my Solaris box at least, the number of seconds returned
> by gettimeofday() is not the actual number of seconds since
> 0h UT 1 Jan 1970

> Stated another way, if I subtract the timestamp over a long period
> of time (including a leap second) the answer is wrong.

> Is this sort of behavior to be considered:

> 1) A bug in Solaris

> 2) A feature of the definition of Unix time as, "well sort of UTC"

The correct answer is "2".

The classic definition was only ever valid for the short interval from 1970-01-01 00:00:00 UT to 1971-12-31 23:59:69 (sic) UT.

With the introduction of leap seconds on 1972-01-01 00:00:00 UTC, the (new and improved!) Unix time scale actually started (with time_t=0) at 1970-01-01 00:00:10 UTc, fully 10 seconds adrift. Six months later, the Unix time scale again changed and time_t=0 was set to 1970-01-01 00:00:11 UTc. This pattern kept going, and so today (1997-08-21) time_t=0 is 1970-01-01 00:00:31 UTc.

[I use the term UTc because I can't figure out if it should be UT or UTC, or indeed the even more ambiguous GMT. UTC didn't exist before 1972, so it can't be used. However, Unix time is an atomic (or at least quartz oscillator) time scale, so UT, an astronomical time scale defined by the earth's rotation, can't really be used. Hence my compromise, UTc].

Since 1970, there have been 22 different Unix time scales, each one completely valid only for the interval between leap second events. Most people just use the latest scale, which aligns perfectly with UTC, and accept that timing events before the last leap second or after the next one is either a) erroneous or b) complicated.

I hasten to add that all this is not necessarily a Bad Thing. The logistical problems of trying to stick narrowly to the original definition of Unix time far outweigh the current confusion caused by the fuzziness of what, exactly, is meant by "Unix time".

--
Marc Brett  +44 181 560 3160        Western Atlas
Marc.Brett@waii.com                 455 London Road, Isleworth
FAX: +44 181 847 5711               Middlesex TW7 5AB England
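The moving origin described in this post can be tabulated with a small Python sketch. The leap second dates are taken from the published IERS record rather than from the thread, and the function name `epoch_offset` is hypothetical; the 10-second starting value and the "one more second per leap second" rule follow the post.

```python
from datetime import date

# Days that ended with an inserted leap second (23:59:60 UTC), 1972 through
# mid-1997. Assumption: this list comes from the IERS record, not the thread.
LEAP_SECOND_DAYS = [
    date(1972, 6, 30), date(1972, 12, 31), date(1973, 12, 31),
    date(1974, 12, 31), date(1975, 12, 31), date(1976, 12, 31),
    date(1977, 12, 31), date(1978, 12, 31), date(1979, 12, 31),
    date(1981, 6, 30), date(1982, 6, 30), date(1983, 6, 30),
    date(1985, 6, 30), date(1987, 12, 31), date(1989, 12, 31),
    date(1990, 12, 31), date(1992, 6, 30), date(1993, 6, 30),
    date(1994, 6, 30), date(1995, 12, 31), date(1997, 6, 30),
]

INITIAL_OFFSET = 10  # seconds adrift when leap seconds began on 1972-01-01

def epoch_offset(on):
    """Seconds past 1970-01-01 00:00:00 at which time_t=0 falls, per the post."""
    if on < date(1972, 1, 1):
        return 0  # the "classic" definition, before leap seconds
    return INITIAL_OFFSET + sum(1 for d in LEAP_SECOND_DAYS if d < on)

print(epoch_offset(date(1972, 1, 1)))    # 10
print(epoch_offset(date(1972, 7, 1)))    # 11
print(epoch_offset(date(1997, 8, 21)))   # 31
```

The offsets run from 10 through 31, which is 22 distinct values, in line with the 22 time scales counted in the post.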

From: eggert@twinsun.com (Paul Eggert)
Date: 22 Aug 1997 18:03:38 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

> The Olson code uses 1970-01-01 00:00:10 TAI, which is
> approximately 1.9 seconds different from 1970-01-01 00:00:00 UT.

To some extent we're talking angels-and-pinheads here, since in 1970 the Olson code didn't exist, nor did any POSIX.1 or Unix platforms exist to run it on. But the argument is pretty straightforward: the traditional Unix origin was ``midnight GMT'' on that date, and POSIX.1 clarified the ``GMT'' to mean UTC. And it is somewhat useful to have these definitions nailed down, so that POSIX.1 applications can reliably interchange timestamps back to 1970.

>(You've already made clear that by ``UTC'' you mean UT for dates before
>1972. In any case, there are no standard time scales for which your
>statement is correct.)

No, actually, by ``UTC'' I meant what the time authorities usually mean when they say ``UTC''. The International Earth Rotation Service (the body that decides when leap seconds occur) uses ``UTC'' to denote the Coordinated Universal Time regime that was in place before 1972. E.g. see: http://hpiers.obspm.fr/webiers/general/earthor/utc/table1.html

Granted, today's UTC method differs somewhat from the pre-1972 method. And I'll also grant you that timekeeping before 1972 was messier than it is today. But today's UTC method is not defined for times before 1972, and the only plausible way to interpret the POSIX.1 origin of 1970 is to use the UTC that was in effect in 1970.

From: eggert@twinsun.com (Paul Eggert)
Date: 22 Aug 1997 18:38:38 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

>In practice, differences between time_t values are within +-0.1% of real
>time differences, even across leap seconds. Most hosts don't run NTP.

I disagree. I think hosts that use POSIX.1-like time rules fall into three categories:

1. Hosts that run NTP and attempt to follow the POSIX.1 rules exactly. On these hosts, most time_t ticks are 1 second long, but when a leap second is introduced, the time_t tick is 2 seconds long.

2. Hosts that use adjtime() to adjust their clock so that there are not big jumps in the clock.

3. Hosts where people change the time by hand when it gets too far off.
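Category 1 can be illustrated with a minimal Python sketch. It assumes the plain POSIX field arithmetic (as done by `calendar.timegm`, which accepts a seconds field of 60 and carries it into the next minute) and the leap second of 1997-06-30; `posix_time` is a name invented for the example.

```python
import calendar

def posix_time(year, month, day, hour, minute, second):
    # calendar.timegm does plain arithmetic on the fields, so second=60
    # is not rejected; it simply lands on the first second of the next minute.
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))

# Three consecutive UTC seconds around the leap second of 1997-06-30.
labels = [
    (1997, 6, 30, 23, 59, 59),
    (1997, 6, 30, 23, 59, 60),   # the inserted leap second
    (1997, 7, 1, 0, 0, 0),
]
values = [posix_time(*label) for label in labels]
print(values)
```

Three real seconds map to only two distinct time_t values, which is the "2 seconds long" tick. Whether a given host holds the earlier or the later value during the leap second is implementation-specific; the sketch only follows the arithmetic.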

I think the vast majority of such hosts fall into category (3). On these hosts, differences between time_t values are not within +-0.1% of real time differences when the clock jumps. The +-0.1% property does hold for category (2). I agree that very few hosts are in category (1), but in some sense the category-(1) hosts are doing the best job.

>you continue to talk about the POSIX time rules as if they were
>something more than a description of fundamentally flawed time-date code
>created many years ago by a few ignorant programmers.

I agree that there are many flaws in the POSIX time rules. I would like to have a better system in place. This will take a lot of work, and will require the cooperation of a lot of people. One way to get this cooperation would be to have a better system implemented, and to have it used in a few popular applications.

From: eggert@twinsun.com (Paul Eggert)
Date: 22 Aug 1997 23:03:47 -0700
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: broadcast, TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

>Which makes more sense: adding leap-second support to a few routines
>like localtime() and mktime(), or adding it to thousands of routines
>that add and subtract time_t values?

Neither alternative is appealing; that's why few people do either, even though the source code for leap-second localtime is freely available. I've contributed a good deal of time to that effort, but it just isn't catching on. If we want leap second support to be widespread, we have to come up with a better way.

Robin O'Leary's proposal of augmenting NTP to broadcast both TAI and UTC is a promising way to attack this problem. But we shouldn't link this proposal to the idea of cramming leap seconds down localtime's throat: that's a sure way to scare potential users. We need a separate interface to the leap-second-aware mechanism.

>In the real world, most UNIX machines _never_ have their software clocks
>readjusted.

Really? I don't know about your neck of the woods, but that isn't true around here. It may take a few weeks or months, but eventually someone gets annoyed with their box's clock being off (it breaks `make'-over-the-network among other things) and they fix it with rdate or call-the-phone-company-and-set-the-time. Around here the only Unix boxes that have _never_ had their clock readjusted are the boxes that were recently unpacked.

From: vjs@calcite.rhyolite.com (Vernon Schryver)
Date: 23 Aug 1997 10:16:59 -0600
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

In article <5tlug3$1v6$1@shade.twinsun.com>,
Paul Eggert <eggert@twinsun.com> wrote:

> ...
>>In the real world, most UNIX machines _never_ have their software clocks
>>readjusted.
>
>Really? I don't know about your neck of the woods, but that isn't true
>around here. It may take a few weeks or months, but eventually someone
>gets annoyed with their box's clock being off (it breaks
>`make'-over-the-network among other things) and they fix it with rdate
>or call-the-phone-company-and-set-the-time. Around here the only Unix
>boxes that have _never_ had their clock readjusted are the boxes that
>were recently unpacked.

There are also commercial UNIX systems that are shipped with `timed` configured on by default precisely to deal with hassles such as `make`.

They may not amount to the majority. Causing all of the systems on the net to have the same time (+/- 10 ms) is not the same thing as causing them all to be within 10 ms of what the Naval Observatory says, and so might not qualify as whatever was meant by "readjusted."

Vernon Schryver vjs@rhyolite.com

From: Achim Gratz <gratz@ite.inf.tu-dresden.de>
Date: 23 Aug 1997 11:22:11 +0200
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:

> In the real world, most UNIX machines _never_ have their software clocks
> readjusted. The software clock is set from a hardware clock at boot time
> and then runs without outside intervention.

That is not true. After all, why would tickadj need the -s option if it were? Anyone doing network makes would care about the time also. That's why I got interested in NTP, anyway.

Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW:    http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gratz@ite.inf.tu-dresden.de
Phone:  +49 351 463 - 8325

From: "L. F. Sheldon, Jr." <lsheldon@creighton.edu>
Date: Sat, 23 Aug 1997 20:45:36 -0500
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: SNTP

On 22 Aug 1997, Paul Eggert wrote:

> djb@koobera.math.uic.edu (D. J. Bernstein) writes:
[snip]
> >In the real world, most UNIX machines _never_ have their software clocks
> >readjusted.
>
> Really? I don't know about your neck of the woods, but that isn't true
> around here. It may take a few weeks or months, but eventually someone
> gets annoyed with their box's clock being off (it breaks
> `make'-over-the-network among other things) and they fix it with rdate
> or call-the-phone-company-and-set-the-time. Around here the only Unix
> boxes that have _never_ had their clock readjusted are the boxes that
> were recently unpacked.

Just a note from a lurker--there is a chance that y'all underestimate the penetration of NTP and SNTP in the world. I would guess that a word like "most" is still strictly speaking correct--but the trend around here seems to be to use some form of time-setting. All of the unix boxes I administer and a non-trivial number of the ones I don't, plus a very large number of the PC's and "minor" servers in our world use NTP or SNTP time-setting-and-keeping.

As interconnections, distributed computing, and such continue to proliferate I think it will become ever more common to see some sort of synchronization in use (and I didn't think to count the number of "isolated" machines I know of that use some sort of dial-up-for-the-time gadget).

I will agree that it will probably be one or two more days before many of us care about, much less understand, the TAI-UT-UTC-GMT issues.

--
L. F. (Larry) Sheldon, Jr.
Unix Systems Administration
Creighton University Computer Center-Old Gym
2500 California Plaza
Omaha, Nebraska, U.S.A. 68178
lsheldon@creighton.edu
402 280-2254 (work)
402 681-4726 (cellular)
402 332-4622 (residence)

We are all faced with great opportunities brilliantly disguised as
impossible situations.
    Bits and Pieces

From: djb@koobera.math.uic.edu (D. J. Bernstein)
Date: 23 Aug 1997 16:40:10 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI

Paul Eggert <eggert@twinsun.com> wrote:
> Neither alternative is appealing;

Leap-second support has to go somewhere.

The first alternative---putting it into localtime() and mktime()---is already available in millions of computers.

The second alternative---putting it into every routine that handles time_t numerically---is, by all reports, available in zero computers.

> that's why few people do either,

No. Very few people have ever considered the issue. The truth is that one person, Olson, made the decision, under pressure from some zealots. Don't blame the users for a decision that they never made.

---Dan
Set up a new mailing list in a single command. http://pobox.com/~djb/ezmlm.html

From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: 23 Aug 1997 19:33:17 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

In article <1997Aug2317.32.26.15261@koobera.math.uic.edu>,
D. J. Bernstein <djb@koobera.math.uic.edu> wrote:
>The reason that people like time_t=TAI is that it actually works. It
>offers simple yet accurate relative timings, no painful clock jumps, and
>code already written for accurate local-time display.

Yes, but it is also 10-20 seconds different from what virtually every Unix system currently puts in time_t timestamps.

>In contrast, time_t=POSIX-function-of-UTC will _never_ offer accurate
>relative timings. A few NTP proponents talk about the semantics that
>they claim they're guaranteeing, but nobody is willing to start the
>massive conversion project required for time_t manipulation code to
>actually support those semantics.

Your first statement is misleading. There is no difficulty in using them to provide accurate relative timings, and it needs only a minor definition of how leap seconds are included. What it does not provide is the ability to have consistent relative INTERVALS.

But your last statement applies to any proposal to convert time_t values to TAI, redoubled in spades. Replacing NTP by a TAI+leap_count value is easy, and could be done semi-transparently, but the next stage is a nightmare.

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nmm1@cam.ac.uk
Tel.: +44 1223 334761    Fax: +44 1223 334679

From: cbbrowne@news.brownes.org (Christopher B. Browne)
Date: 24 Aug 1997 01:46:18 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

On 19 Aug 1997 01:27:44 GMT, D. J. Bernstein <djb@koobera.math.uic.edu> posted:
>Marc Brett <Marc.Brett@waii.com> wrote:
>> UTC is legal civil time in most of the world,

>False. Civil time is several hours away from UTC in most of the world.

>> Computers are tools
>> which interface with peoples lives, and should, at least externally,
>> keep the same time as they do: UTC.

>False. Computers should indeed display human-comprehensible times---but
>that doesn't mean UTC. It means local time.

And what if I'm in Dallas, and the computer is in California?

Which time should the computer display?

This is not an academic question by any means. I live in Texas, and work with servers that are all sited in Oklahoma. Users are scattered across the planet, and *will* be connecting to the servers in Oklahoma. It's quite probable that such information will be presented in terms of CDT/CST. To *most* users, that indeed is "local" time (because most of the users will indeed be in Fort Worth, Texas, because head office is there, and thus the bulk of the administrative users...). Nonetheless, for those that aren't local, what's the valid time?

However, there are other servers in the same complex for which the vast majority of users are *NOT* in the same time zone. The STIN servers are *probably* accessed by folks in every single time zone that there is. Internally, the system has to use a single time, and that will by necessity not be the same time that users think the time is in their locale.

In effect, there *does* need to be some internal representation. So long as libraries can handle it nicely, it doesn't much matter if we're using UTC, GMT, TAI, or something else folks might think up.

But it *is* preferable if the internal representation has a fairly high degree of "universality."

>> Translating from an internal representation of TAI to an external
>> representation of UTC adds an additional layer of complexity

>False. The layer already exists; it is called ``localtime()'' and
>``mktime()''. Converting from TAI to UTC is merely the first step in
>converting from TAI to local time.

But if localtime() and mktime() think that the "true" time is represented in UTC, then this *does* complicate things.

- Existing file systems and many utilities use UTC
- Existing programs treat the "base" representation as being UTC, and
  use functions like localtime() and mktime() to manipulate UTC values.

I agree that the *intent* of localtime()/mktime() is as you suggest; making usage of a newer representation (such as TAI) universal would be rather disruptive. Perhaps "necessarily disruptive," but nonetheless disruptive.

--
Christopher B. Browne, cbbrowne@hex.net, chris_browne@sdt.com
PGP Fingerprint: 10 5A 20 3C 39 5A D3 12 D9 54 26 22 FF 1F E9 16
URL: <http://www.hex.net/~cbbrowne/>
Bill Gates to his broker: "You idiot, I said $150 million on **SNAPPLE**!!!"

From: djb@koobera.math.uic.edu (D. J. Bernstein)
Date: 24 Aug 1997 03:29:12 GMT
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

Christopher B. Browne <cbbrowne@hex.net> wrote:
> - Existing file systems

Existing file systems use 4-byte times and will soon have to be fixed. The switch to 8-byte times is a good moment to specify TAI.

> - Existing programs treat the "base" representation as being UTC,

Nonsense. Thousands of programs treat time_t differences as real-time differences. Leap-second support in time_t manipulations is an absurd fantasy.

In contrast, conversion from time_t to struct tm is rather well modularized.

> And what if I'm in Dallas, and the computer is in California?

Set your TZ environment variable to whichever time zone you want to see. What's the problem?

---Dan
Set up a new mailing list in a single command. http://pobox.com/~djb/ezmlm.html
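The TZ suggestion can be tried with a small Python sketch. It assumes a Unix host (`time.tzset` is not available elsewhere) and uses POSIX-style TZ strings that spell out the 1997 US daylight-saving rule so that no zone database is needed; the helper name `local_time_in` is hypothetical.

```python
import calendar
import os
import time

def local_time_in(tz, t):
    """Format time_t value t as local time under the given TZ setting."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()                      # Unix only: make the C library re-read TZ
    try:
        return time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(t))
    finally:
        # Put the caller's TZ back the way it was.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()

# The Date: header of the post above, 24 Aug 1997 03:29:12 GMT, as a time_t.
t = calendar.timegm((1997, 8, 24, 3, 29, 12, 0, 0, 0))

# First Sunday in April to last Sunday in October, as in 1997.
print(local_time_in("CST6CDT,M4.1.0,M10.5.0", t))   # a user in Dallas
print(local_time_in("PST8PDT,M4.1.0,M10.5.0", t))   # a machine in California
```

The same stored time_t is shown five hours behind GMT for the first setting and seven hours behind for the second; only the display step changes.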

From: vjs@calcite.rhyolite.com (Vernon Schryver)
Date: 24 Aug 1997 10:41:35 -0600
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: configuration, SCO, SNTP

In article <Pine.HPP.3.95.970823203721.20839D-100000@bluejay.creighton.edu>,
L. F. Sheldon, Jr. <lsheldon@creighton.edu> wrote:

> ...
>> >In the real world, most UNIX machines _never_ have their software clocks
>> >readjusted.

> ...
>Just a note from a lurker--there is a chance that y'all underestimate
>the penetration of NTP and SNTP in the world. I would guess that a word
>like "most" is still strictly speaking correct--but the trend around
>here seems to be to use some form of time-setting. All of the unix
>boxes I administer and a non-trivial number of the ones I don't, plus
>a very large number of the PC's and "minor" servers in our world use
>NTP or SNTP time-setting-and-keeping.
> ...

Anything that requires manual intervention and is not absolutely positively required for day-to-day operation is not done on the vast majority of boxes, whether PC's or UNIX boxes. If you have only a few dozen or a few hundred systems, you might have the provincial notion that most systems are reasonably, responsibly, or competently administered. Once you see real networks with 10,000 or more systems, you learn better. Even in outfits with large staffs of people who do nothing but fiddle with the configuration of stuff (i.e. system or network administrators), anything that is not absolutely required is either not done or done wrong (and never mind about a lot of stuff that is absolutely required). To put it another way, DHCP and BOOTP and the Microsoft equivalents are very popular for good reasons.

>As interconnections, distributed computing, and such continue to proliferate
>I think it will become ever more common to see some sort of synchronization
>in use (and I didn't think to count the number of "isolated" machines I
>know of that use some sort of dial-up-for-the-time gadget).

How many applications really care whether the times on all of the involved systems are closer than you can get by checking Mickey's hands and typing a little every few months? There are a few, such as `make` over NFS, but they are very rare compared to the vast numbers of others. (Never mind that people who frequent this newsgroup are likely to care about `make` and NFS.) Most distributed applications have some notion of "server," and the only ticks that matter are those that are counted by the "server."

Given those facts, ask yourself:
1. how many systems come configured to use NTP or SNTP 'out of the box'?
2. how many applications care whether they keep time to better than 10 minutes of UTC?
3. how many boxes are running such applications?

Because of the answers to those questions, to all intents and purposes, 0.0% of all systems are running NTP or SNTP. For that matter, less than 1% of all UNIX systems are running NTP or SNTP.

My daytime employer, one of the major UNIX box vendors, and SCO have long shipped `timed` (TSP) turned on by default. That means that perhaps as many as 20% of all UNIX boxes are running `timed`. (I continue to fight the good fight to keep my employer's system shipping a clock protocol on by default.) TSP is a somewhat lame protocol (i.e. a Bezerkley master's thesis of ~10 years ago). It keeps boxes on a LAN to better than 50 ms of the consensus tick, but worries such as TAI vs. UTC vs. Mickey's hands are far out of its range.

>I will agree that it will probably be one or two more days before many of
>us care about, much less understand the TAI-UT-UTC-GMT issues.

If you sat down to write a distributed application, would you rely upon the computers counting ticks in unison? Or would you try things such as distributing the clock yourself, or arranging to not need to distribute the clock? If you require a good distributed clock, then your application will not work in as many places as it would otherwise (e.g. the at least 80% running neither NTP nor TSP), and it will have failure modes it would not otherwise suffer (e.g. when some systems stop agreeing on the number of relevant leap seconds since the Tuesday before last).

In other words, all of this talk about TAI, UTC, and so forth is of vital importance to us clock watchers, but an appropriate subject for ridicule outside this asylum.

Vernon Schryver vjs@rhyolite.com

From: Tom.Horsley@worldnet.att.net (Thomas A. Horsley)
Date: 24 Aug 1997 08:45:10 -0400
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: TAI

>Clearly you have never been involved with organising such a changeover.
>I have, both on a single-site academic and single-vendor commercial scale.
>Nightmare is a mild word to describe the horrors involved.

Sure I have - it works great if you aren't in a hurry. It goes like this:

* The folks who want the kernel upgrade jump up and down and hold their
  breath until they turn blue.
* The folks who are afraid of instability resist, and since there are more
  of them (and some of them are the ones with money), they win.
* In 10 or 15 years, even the cheapest and most paranoid organization
  finally junks its old obsolete machines and buys new ones.
* When they get new machines, they come with the new kernel that keeps
  internal time based on TAI.

See - it only takes a couple of decades :-).

>And changing every damn Unix and Internet protocol, system and utility is
>an order of magnitude more ghastly than anything I have ever got involved
>with. I know what to do, and the first step is to become world dictator.
>The second step is harder.

But all the protocols and utilities work the exact same way they always did. That's the point of having the kernel keep TAI internally but convert it to POSIX consistently on every existing kernel call which talks about time. New utilities which use the new system service calls and talk about TAI can come later and gradually on a per-machine basis as any machine (say the one controlling the automated observatory for instance) needs that much fanaticism.

Of course you *do* need to become world dictator in order to force the operating system vendors to change their kernels to keep TAI internally :-).

--
>>==>> The *Best* political site <URL:http://www.vote-smart.org/> >>==+
      email: Tom.Horsley@worldnet.att.net icbm: Delray Beach, FL      |
<URL:http://home.att.net/~Tom.Horsley> Free Software and Politics <<==+

From: vjs@calcite.rhyolite.com (Vernon Schryver)
Date: 24 Aug 1997 16:32:07 -0600
Newsgroups: comp.protocols.time.ntp
Subject: Re: Building a better time protocol using TAI
X-Keywords: delay, resolution, TAI

In article <1997Aug2420.38.00.26956@koobera.math.uic.edu>,
D. J. Bernstein <djb@koobera.math.uic.edu> wrote:

>> First, most people don't like time_t=TAI.
>
>Most people have never heard of TAI or UTC.

Yes, and most people have no reason to care.

> They want the displayed time
>to reliably match local time.

Yes, and they define "reliably" as "within a dozen seconds," and could not care less about irregular 1-second hiccups at midnight once in many months.

> They also want a ``1-second'' pause to
>always mean 1 second.
>
>How do you plan to meet these requirements?
>
>There are tens of thousands of programs that subtract time_t values to
>time themselves. Some packages use these timings to select the fastest
>code for each machine at installation time.
>
>Would you like these programs to fail, perhaps disastrously, if they
>happen to be run during a leap second?

In any real operating system, you can only delay for at least one second, one tick or one something else. You can not delay for "no more than one second." Any program that might "fail, perhaps disastrously" if it happened to be delayed an extra second either requires a real time operating system or is silly junk. (Think about contention for the CPU.)

The probability of a bad result from hardware timing code that gets the wrong answer due to a leap second hiccup, and that works otherwise, is zero to a lot of significant digits. The probability that the midnight July or January second is involved is around 1 in 15552000, assuming that people are equally likely to be installing stuff at midnight as at any other time. Never mind that one of the midnights is a widely observed holiday.
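The 1-in-15552000 figure appears to correspond to one candidate second per 180 days. That reading is an assumption, since the post does not show the arithmetic, but the numbers agree:

```python
SECONDS_PER_DAY = 86400
# Assumed: about half a year between the end-of-June and end-of-December
# slots in which a leap second can be inserted.
DAYS_BETWEEN_CANDIDATES = 180

seconds_per_candidate = SECONDS_PER_DAY * DAYS_BETWEEN_CANDIDATES
print(seconds_per_candidate)   # 15552000
```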

Who would use gettimeofday() to measure hardware speed when times() is also available? If you need to measure a duration, and you know a little of how time keeping works, you do not use gettimeofday(). In real systems, consecutive calls to gettimeofday() do not give true hardware time because:

- adjtime() (not to mention settimeofday()) is continually jiggering the gettimeofday() answer. The goal of the protocol for which this newsgroup is named is to make gettimeofday() yield an answer close to what the atomic clocks say, and toward that goal we have abandoned the notion that the difference between two answers produced by gettimeofday() separated by 1.0 seconds is necessarily more than 0.97 seconds and less than 1.03 seconds. (Or whatever your system's maximum adjtime() slew rate allows.)

- many systems fiddle with the microseconds from gettimeofday() to give the false but desirable illusion that the hardware clock has resolution finer than milliseconds.

- times() yields hardware ticks regardless of leap seconds, settimeofday(), and adjtime() (at least in the systems I know a little about).
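The distinction drawn in the list above can be sketched in Python, assuming a Unix host. `time.time()` stands in for gettimeofday() and `os.times().elapsed` for the return value of times(); `time.monotonic()` is a later interface not mentioned in the post, added here only for comparison. The function name `measure` is made up.

```python
import os
import time

def measure(work):
    """Time work() three ways and return the elapsed seconds for each."""
    wall_start = time.time()            # wall clock: can be stepped or slewed
    ticks_start = os.times().elapsed    # times(): elapsed real time in clock ticks
    mono_start = time.monotonic()       # monotonic clock: never stepped backwards

    work()

    return {
        "wall": time.time() - wall_start,
        "times": os.times().elapsed - ticks_start,
        "monotonic": time.monotonic() - mono_start,
    }

result = measure(lambda: time.sleep(0.2))
print(result)
```

On a quiet machine the three numbers agree closely. They diverge only if something sets the wall clock while `work()` is running, which is the case the post warns about.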

> ... >> A. It's incompatible with all the existing hosts out there, so one can't >> interchange data (e.g. `tar' files) reliably; and > >Silly argument. The timestamps in tar files available around the net are >wildly unreliable right now. Anyway, the tar format uses 4-byte times, >so it has to be upgraded soon.

While worrying about whether tar timestamps are good to within a dozen seconds is silly, at least as silly is talk about 4-byte times that must "be upgraded soon," unless you have an unusual notion of "soon." We all plan to be around in the 22nd century, but few of us will be.
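How soon "soon" is depends on whether the 4-byte value is treated as signed or unsigned, which neither post states. A quick Python check of both limits, counting from 1970-01-01 00:00:00 and ignoring leap seconds:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Largest value of a signed and of an unsigned 32-bit seconds counter.
signed_limit = EPOCH + timedelta(seconds=2**31 - 1)
unsigned_limit = EPOCH + timedelta(seconds=2**32 - 1)

print(signed_limit)     # 2038-01-19 03:14:07
print(unsigned_limit)   # 2106-02-07 06:28:15
```

The signed limit falls in 2038, while the unsigned limit is the one that reaches the 22nd century.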

Could this angels-on-a-pin stuff be taken away?

The facts, unchangeable by wishing, flaming, or arguing, are that:

1. it would have been nice if UNIX/POSIX had dealt with leap seconds.

2. it/they did not and do not.

3. Anyone who thinks that leap seconds might be a noticeable POSIX hassle has a serious shortage of experience when it comes to dealing with the POSIX compliance song and dance. Any system has far bigger problems with POSIX and hassles with the POSIX test suites than leap seconds, including concerning the answers that gettimeofday() yields.

4. there are some, but only a few applications that care about leap seconds. (and software or hardware installation are not among them)

5. no application that is otherwise usable is going to "fail disastrously" regardless of leap seconds. Leap seconds are simply one of many causes for clocks to jerk around, and any usable application must be prepared to deal with strange hiccups, including time that seems to go backwards because a time daemon realized that the operator set things wrong last week.

6. teaching POSIX about leap seconds is not going to happen this year nor probably this century.

7. practically no one cares about #6.

About #5--several times this year, many of the clocks at my day job have jumped 3600 seconds into the future for about 40 minutes, and then jumped back. It seems that the old WWV receiver used to sync much of the net has occasionally hiccuped. The chaos a single such hiccup produces dwarfs the sum of all of the leap second hassles there have ever been or ever will be.

Vernon Schryver vjs@rhyolite.com

From: eggert@twinsun.com (Paul Eggert) Date: 24 Aug 1997 17:46:05 -0700 Newsgroups: comp.protocols.time.ntp Subject: Re: Building a better time protocol using TAI X-Keywords: Linux TAI

Tom.Horsley@worldnet.att.net (Thomas A. Horsley) writes:

>It would be entirely feasible for kernel code to keep TAI internally and >convert it to POSIX time when folks use the existing system services. Then >you just need to provide *new* system service calls for TAI, and library >routines which use them.

An approach like this would take 20 years or so, but I think it'd work. Vendors would probably resist it, because they don't want the hassle of maintaining the leap second info. However, the BSD and/or Linux crowds could start the ball rolling. We'd also need a consensus on how to handle timestamps in the future, and timestamps before 1972.

From: eggert@twinsun.com (Paul Eggert) Date: 24 Aug 1997 23:39:56 -0700 Newsgroups: comp.protocols.time.ntp Subject: Re: Building a better time protocol using TAI X-Keywords: TAI

djb@koobera.math.uic.edu (D. J. Bernstein) writes:
>Paul Eggert <eggert@twinsun.com> wrote:
>> Some commercial vendors have even removed support
>> for time_t=TAI -- it's not shipped with SunOS 5, for example,
>It was Guy Harris who took the code out.
>He later admitted that he should have left it in.

Both items are news to me. But regardless of who took out the code, such decisions are not made in a vacuum. It's plain that Sun's customers didn't use time_t=TAI; otherwise it would not have been removed, or at least someone would have complained and the code would have been put back in. So my point that time_t=TAI is rarely used is still valid.

>> A. [time_t=TAI is] incompatible with all the existing hosts out there,
>> so one can't interchange data (e.g. `tar' files) reliably; and
>The timestamps in tar files available around the net are
>wildly unreliable right now.

Yes, but that's irrelevant. My point was that time stamps should be interpreted consistently, independently of how inaccurate they are. For example, programs' output can contain human-readable forms of `tar' time stamps, and if such output differs from host to host this will make it harder to do regression tests.

>> B. One can't convert future times reliably;
>Leap seconds are no different from time zones in this
>respect. Have you not noticed that the authorities in some countries
>change their time zone laws more often than you change your watch?

Yes, I have noticed it. But the problem you mention is not limited to future times; it also applies to past times. For example, what was the local time in Brussels when the Armistice ended World War I? Different sources report different values, and the source that should be authoritative (namely, the Annuaire de L'Observatoire Royal de Belgique) is ambiguous. So it's quite possible for different implementations to report different answers to questions about times in the past. (This problem is not restricted to times far in the past -- I mention the Brussels example merely because it is the one that crossed my desk most recently.)

So I agree that local times cannot be converted reliably in general. However, there are many important special cases where it's quite practical to convert local times accurately. For example, even though I'm not 100% sure, I have a very, very high degree of confidence as to when 2001-01-01 00:00:00 (local time) will occur in Tokyo. Frankly, if my time primitives insisted on TAI and refused to convert this future time stamp, I would chuck them and find a better set of time primitives.

Let's put it this way: the time wizards have changed our timekeeping basis several times in the past 100 years, but Tokyo's New-Year's-Day offset from Greenwich hasn't changed even once. I'm more confident of that offset not changing than I am of our time basis not changing!

>> Second, time_t=TAI does not ``actually work'' in general, because it
>> is undefined for old timestamps.
>This is already solved by my TAI64 format, which uses the TAI second and
>a particular TAI epoch but then defines its own timestamps

How does your code handle time stamps in, say, 1918? TAI isn't defined for time stamps that old. It sounds like you've invented your own TAI-like time scale for back then, but you haven't explained how it actually works.

>for a span of a few hundred billion years.

Hmmmm. How do you define TAI that far back? TAI is defined with respect to sea level, and sea level hasn't existed for that long. There are also some interesting theoretical problems once you go back before the Big Bang....

>[People] want a ``1-second'' pause to always mean 1 second.

True, but that's irrelevant to the current discussion. The traditional Unix primitives have never given you a reliable way to sleep exactly 1 second, so leap second glitches are not an undue burden here.

>There are tens of thousands of programs that subtract time_t values to >time themselves. Some packages use these timings to select the fastest >code for each machine at installation time.

This is a weak, weak argument for a time_t=TAI. Any such program, if written naively, will choose suboptimally for many reasons other than leap seconds. Leap seconds are well down in the noise for this particular problem. Besides, any program that wants to do a reasonable job of this ought to be taking the median of several timed runs, and in that case any leap second glitches should wash out.
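A rough sketch of the "median of several timed runs" idea, under the assumption that the program times itself by subtracting wall-clock readings. The function names and the default of five runs are illustrative choices, not taken from any package mentioned in the thread.

```python
import statistics
import time

def median_runtime(func, runs=5, clock=time.time):
    """Time func() several times and return the median duration.

    A single run disturbed by a leap second (or any other clock step)
    produces one outlying sample, which the median ignores.
    """
    samples = []
    for _ in range(runs):
        start = clock()
        func()
        samples.append(clock() - start)
    return statistics.median(samples)

def pick_fastest(candidates, runs=5):
    """candidates maps a name to a callable; return the fastest name."""
    return min(candidates, key=lambda name: median_runtime(candidates[name], runs))

# One sample inflated by a 1-second glitch does not move the median:
glitched = [0.10, 0.11, 1.10, 0.09, 0.10]
print(statistics.median(glitched))  # 0.1
```

Taking the minimum instead of the median would also discard a positive glitch, but the median additionally copes with a run that appears too short.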

>> Most practical applications deal with UTC, not TAI, >... Almost every time display is local time.

The _display_ may be local time, but the _applications_ deal with UTC internally. Many applications do things like add 3600 seconds to get to the next hour; this breaks with time_t=TAI.
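To make the "add 3600 seconds" point concrete, here is a small sketch. The offset of 21 seconds is only an illustrative stand-in for the accumulated leap seconds, and the sample timestamp is arbitrary; the helper name is hypothetical.

```python
def next_hour(seconds):
    """Round a seconds count up to the next multiple of 3600.

    This is the usual idiom, and it lands on an hour boundary only if
    the count has exactly 86400 labels per day, as POSIX time does.
    """
    return seconds - seconds % 3600 + 3600

ILLUSTRATIVE_LEAP_OFFSET = 21          # assumed value, for demonstration

posix_now = 1000 * 3600 + 1800         # half past some hour, POSIX style
tai_like_now = posix_now + ILLUSTRATIVE_LEAP_OFFSET

# With a POSIX-style count the idiom hits the hour exactly.
print(next_hour(posix_now) % 3600)     # 0

# Applied to the TAI-style count and mapped back to UTC labels, the
# result is 21 seconds short of the hour.
naive = next_hour(tai_like_now) - ILLUSTRATIVE_LEAP_OFFSET
print(next_hour(posix_now) - naive)    # 21
```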

But we're straying from my point, which was that UTC is much more commonly used than TAI, even if only input/output purposes are considered; so even if one really wants a TAI time_t for modern time stamps, it makes sense to cater to UTC-using applications when deciding the historical dividing line between a UTC time_t and a TAI time_t.

>> most people don't like time_t=TAI. >Most people have never heard of TAI or UTC.

True; I should have written ``most people who have considered the issue don't like time_t=TAI''. People have voted with their feet.

From: bwb@etl.noaa.gov (Bruce Bartram 303-497-6217) Date: 19 Aug 1997 22:20:41 GMT Newsgroups: comp.protocols.time.ntp Subject: Re: Request for help setting up a stratum-3 machine X-Keywords: TAI timezone

Terrence Brannon (brannon@quake.usc.edu) wrote: : Thanks for the help. However, the stratum-3 machine seems to be : exactly 3 hours behind the stratum-2 machine... the timezone is : different.

Howdy,

Each process on a un*x machine has its own timezone in the environment variable TZ. If there is no TZ, a host-wide default should apply. The host clock runs in UTC, as do NTP and xntpd (ignoring the important "future" thread -- I like TAI and UTC).

Try this Terrence, for the sh (or bash) shell, sh (with default $ prompt)

    $ echo $TZ
    $ TZ=US/Pacific date
    $ TZ=US/Hawaii date
    $ TZ=GMT date

If you use the csh shell, type "sh" or "bash" to get a Bourne shell.

You should see the different timezones and offsets.

If you don't get sensible answers, you may have problems with the /usr/share/lib/zoneinfo directory and its files. The little files are binary output from the "zic" program ("man zic" is interesting, if a bit arcane). The files with names like "northamerica" are long text files with the timezone rules. Feeding these into zic makes the binary files. There are a few strangenesses with the POSIX sign...
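The per-process nature of TZ, and the POSIX sign oddity, can be seen without any zoneinfo files by using plain POSIX TZ strings. This is a sketch that assumes a Unix-like system (time.tzset() is not available elsewhere); the zone name "XYZ" is made up and only the offset matters.

```python
import os
import time

def local_hour_at_epoch(tz):
    """Hour of day that POSIX time 0 displays as when TZ is set to tz."""
    saved = os.environ.get("TZ")
    try:
        os.environ["TZ"] = tz
        time.tzset()                   # re-read TZ for this process only
        return time.localtime(0).tm_hour
    finally:
        # Restore the previous setting so the caller is not affected.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()

print(local_hour_at_epoch("UTC0"))     # 0
print(local_hour_at_epoch("XYZ8"))     # 16: a positive offset means WEST of Greenwich
print(local_hour_at_epoch("XYZ-9"))    # 9: a negative offset means EAST of Greenwich
```

Nothing here touches the system clock; only the conversion to local time changes, which is why two hosts can agree on the time and still print "date" output hours apart.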

If your answers make sense and you need to change your personal TZ, add a line like "setenv TZ US/Pacific" to your .cshrc or .login, or "TZ=US/Pacific" to your .profile file.

To change the system default timezone, I think the magic is in the /etc/TIMEZONE, /etc/default/init or /usr/share/lib/zoneinfo/localtime files. These are system specific. On my Solaris 2.5 system, the magic is in the last line of /etc/TIMEZONE, and needs to be "TZ=US/Pacific" like the .profile line. I think a reboot is needed to make this take effect. On a SunOS 4.1 system, I think the technique is to make /usr/share/lib/zoneinfo/localtime be the desired zic output file. I'd suggest (as root of course, and with all the warnings about how this might be VERY BAD ! and you must understand what all this will do before executing it):

    # cd /usr/share/lib/zoneinfo
    # ln -s US/Pacific localtime

but I've never tried this. I suggest that the default timezone should be set to the local wall clock time, so a casual "date" is wall clock. I think this makes logfiles and sendmail headers easier to read.

I've seen a host with badly mangled junk in /usr/share/lib/zoneinfo. On that system, I made my own timezone files and use "setenv TZ $HOME/timezone/US/Mountain" to ignore the mangled stuff.

Feel free to email me directly if you want more help.

Bruce Bartram bbartram@etl.noaa.gov just another chimehead

From: "Doug Hogarth" <DougHo@niceties.com> Date: 23 Aug 1997 15:17:33 GMT Newsgroups: comp.protocols.time.ntp Subject: Re: Timeserv on NT is using up CPU time X-Keywords: bug USNO

See the flash at the top of my page at http://home1.gte.net/dougho/TimeServ.html

In short, you were using Type=Internet instead of Type=NTP, and when the USNO discontinued their non-NTP services around the end of 13 August, a bug in my program occurred. Sorry for any inconvenience.

John C. Binder <jbinder@s-vision.com> wrote in article <01bcad0c$fa2ae820$78ad6ccf@jbinder2>... > We have been running Timeserv on an NT 4.0 Server for almost a year. Last > week the timeserv started to go haywire. It started using up CPU cycles to > the point that the processor was running at 100%. I have had to shut it > down. > > We were pointing to the USNO. Does anyone have an idea what is happening? > Is anyone else having a similar problem. I have used other machines and got > the same result.

From: "Marc Brett" <Marc.Brett@waii.com> Date: 21 Aug 1997 10:16:52 GMT Newsgroups: comp.protocols.time.ntp Subject: Re: NTP version 2 X-Keywords: documentation fudge precision prefer

C.S.Transport <hk3705@usa.net> wrote: > Howdy,

> I use some Digital AlphaServers with Digital Unix 3.2 C. This is > delivered with NTP version 2 (according to ntpq: ntpversion). > I would like to use the equivalent of the following ntp.conf file > (version NTP 3) for the version 2:

> server 127.127.1.1 prefer > fudge 127.127.1.1 stratum 1

> Any idea on how to "translate" that? > (This would be to prevent NTP from adjusting the local clock of > the server).

It's important to distinguish between NTP versions 2 & 3 -- the protocols, and xntp versions 2.x & 3.x -- the implementations. Configuration files have nothing to do with the NTP protocol, only the software.

The best place to look would be the Digital documentation.

That said, I know that early versions of xntp 3.x had a different syntax to the current releases, namely:

server 127.127.1.10

where 10 would be the stratum.

Perhaps this might work with your software?

Incidentally, it is a Bad Idea to set the local clock to such a low stratum as 1. It implies more precision than is actually being delivered. If you ever get a radio clock or some other low-stratum clock on the Internet, you will want it to be used in preference to the less accurate local clock. Stratum numbers will allow clients to discriminate.

Also, the "prefer" keyword mucks up the clock selection algorithm, and should be used only in exceptional circumstances.

-- Marc Brett +44 181 560 3160 Western Atlas Marc.Brett@waii.com 455 London Road, Isleworth FAX: +44 181 847 5711 Middlesex TW7 5AB England

From: robin@acm.nospam.org (Robin O'Leary) Date: 22 Aug 1997 22:02:25 +0100 Newsgroups: comp.protocols.time.ntp Subject: TAI, UTC and POSIX-time X-Keywords: TAI

Thanks to the many people who have emailed and posted comments that helped me get this clear (I hope):

International Atomic Time (TAI) counts standard seconds since a well-defined epoch (in 1972) and names them with successive integers, 0, 1, 2, 3, etc.; there are no minutes, days, months or years in TAI. A sufficiently stable oscillator can keep TAI time indefinitely without outside involvement.

Universal Co-ordinated Time (UTC) names the same seconds in the conventional YYYY-MM-DD HH:MM:SS manner and it has the occasional hiccup in numbering (23:59:60) that we call positive leap-seconds. It isn't meaningful to talk of ``UTC seconds'' in contrast to ``TAI seconds'' since there is a perfect 1:1 correspondence between seconds in both systems; the only difference is how they are labelled. To map between historic TAI and UTC requires the use of a table of leap-seconds (this table is being lengthened continually by the International Earth Rotation Service).

POSIX-time, or ``seconds since the epoch'' as they define it, is TAI minus the number of complete positive leap-seconds in UTC up to that point (plus the number of negative leap-seconds, but there haven't been any). Unfortunately, this means that some seconds (the positive leap-seconds) can't be given unique POSIX-time labels; such a second gets the same label as its immediate successor.

For example:

    TAI    598  599  600  601  602
    UTC    :58  :59  :60  :00  :01
    POSIX  598  599  600  600  601
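The labelling in this example can be reproduced with a few lines of code. This is only a sketch of the definition given above; the function name is hypothetical, and the single leap second at TAI count 600 is the made-up one from the example, not a real leap-second table.

```python
def tai_to_posix(tai, leap_second_tai_counts):
    """Map a TAI count to its POSIX-time label.

    leap_second_tai_counts lists the TAI counts of the positive leap
    seconds (the seconds labelled 23:59:60 in UTC). POSIX-time is TAI
    minus the number of leap seconds completed before this second.
    """
    completed = sum(1 for leap in leap_second_tai_counts if leap < tai)
    return tai - completed

leaps = [600]                          # hypothetical table: one leap second
print([tai_to_posix(t, leaps) for t in range(598, 603)])
# [598, 599, 600, 600, 601]
```

The repeated 600 is the ambiguity described above: the leap second and its successor share a label, so the mapping cannot be inverted without extra information.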

What we are having trouble with isn't really UTC at all---UTC ticks in synchronisation with TAI quite happily---but with POSIX-time. POSIX-time is what you get if you convert UTC to an integer using the POSIX function:

    tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 +
        (tm_year-70)*31536000 + ((tm_year-69)/4)*86400

which leaves out all leap-seconds (and wrongly treats the year 2100 as a leap year).
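As a check on this formula, the sketch below evaluates it with C-style fields (tm_year counted from 1900, tm_yday counted from 0). Note that Python's own struct_time uses the full year and a 1-based day of year, so the wrapper converts; both function names are hypothetical.

```python
import time

def posix_seconds(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """Evaluate the POSIX 'seconds since the epoch' expression.

    Arguments follow the C struct tm conventions: tm_year is years
    since 1900 and tm_yday is days since 1 January (0-based).
    """
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000 + ((tm_year - 69) // 4) * 86400)

def posix_seconds_from_struct(st):
    """Same calculation starting from a Python time.struct_time."""
    return posix_seconds(st.tm_year - 1900, st.tm_yday - 1,
                         st.tm_hour, st.tm_min, st.tm_sec)

# Round trip through gmtime() for a moment in August 1997.
print(posix_seconds_from_struct(time.gmtime(872000000)))  # 872000000

# A leap second (23:59:60 on 30 June 1997, day 180 counting from 0)
# gets the same label as the following 00:00:00.
print(posix_seconds(97, 180, 23, 59, 60) == posix_seconds(97, 181, 0, 0, 0))  # True
```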

NTP currently distributes POSIX-time together with a leap-second warning flag. This is better than POSIX-time alone, since it is possible to keep the warning flag and use it to disambiguate the POSIX-times around leap seconds. But what would be even better would be for NTP also to convey the number of leap-seconds since the epoch; that value could then be added to POSIX-time to get TAI. What would be equivalent, but philosophically best of all, would be for NTP to transmit TAI together with the number of leap-seconds.
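For illustration, the leap-second warning flag is the two-bit Leap Indicator at the top of the first octet of an NTP packet, next to the version and mode fields. The sketch below decodes that octet and then shows the proposed arithmetic; the leap_count argument stands for the extra value proposed here, which NTP does not carry, so it is purely hypothetical.

```python
LEAP_INDICATOR_MEANING = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "alarm condition (clock not synchronized)",
}

def parse_first_octet(octet):
    """Split the first octet of an NTP packet into (LI, version, mode)."""
    leap_indicator = (octet >> 6) & 0x3   # top two bits
    version = (octet >> 3) & 0x7          # next three bits
    mode = octet & 0x7                    # low three bits
    return leap_indicator, version, mode

def tai_from_posix(posix_seconds, leap_count):
    """POSIX-time plus the leap seconds so far gives the TAI count.

    leap_count is the hypothetical extra field discussed in the post.
    """
    return posix_seconds + leap_count

li, version, mode = parse_first_octet(0x5C)
print(li, version, mode)                  # 1 3 4
print(LEAP_INDICATOR_MEANING[li])         # last minute of the day has 61 seconds
```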

I am proposing that, by one means or another, NTP be augmented so that it provides enough information to enable an NTP client to report current TAI, UTC and POSIX-times.

Robin O'Leary. -- <robin@nospam.acm.org> +44 973 310035 P.O. Box 20, Swansea SA2 8YB, U.K.

From: bpenrod@nbn.com Date: 24 Aug 1997 02:01:00 -0400 Newsgroups: comp.os.os2.announce,comp.protocols.time.ntp,comp.os.os2.misc,comp.os.os2.networking.tcp-ip,comp.os.os2.scitech Subject: WARNING: To Users of OS2_NTPD, Network Time Protocol Client for OS/2 X-Keywords: poll release TrueTime

Reply-to: bpenrod@truetime.com
[Followups directed to comp.protocols.time.ntp]
---------------------------------------------------------------------
WARNING: To Users of OS2_NTPD, Network Time Protocol Client for OS/2

This announcement has been made necessary due to changes in the operation of the base clock drivers for OS/2: clock01.sys and clock02.sys. Coincidental to the final release of Netscape for OS/2 back in October of last year, these new drivers first appeared in the multimedia plug-in pak for Netscape which was available for download along with the final version of Netscape.

Since then, these new drivers have been incorporated into the Fixpaks for both Warp V3 and V4, starting with FP26 for Warp V3 and FP1 for Warp V4. These new drivers have seriously affected the operation of the timekeeping functions used by OS2_NTPD for timetagging NTP packet requests and replies as well as for performing sub-second level adjustments to the Real Time Clock. While running with these drivers, OS2_NTPD is able to maintain the system clock accuracy only to the one-half second level. For users who are unsure about the Corrective Service Level of their systems, the symptom of the problem caused by the new clock drivers is the repeated display of this message in the status area of the OS2_NTPD window:

"Unable to perform Clock Adjustment now, interrupted--Sleep Error = xx"

where 'xx' is some number of milliseconds.

The purpose of this announcement is to warn users who are operating with the new, problematic clock drivers that a side effect of the decreased performance is a dramatic INCREASE in the normal polling rate of the NTP servers. This is a result of the inability of the program to pull the system clock close enough to allow the polling interval to be extended. As a result, the program continues to poll at the initial interval (default is 16 seconds) indefinitely. This has caused some concern at some of the more well known public NTP servers like those at the US Naval Observatory, tick and tock. These servers must process thousands of NTP packet requests per day. I have been informed by the operator of those servers that "NTP hogs" will be selectively filtered from access to these servers.

I recommend either of two strategies for fixing this problem:

1. Replace the clock01.sys and clock02.sys files in your os2\boot directory with the originals from your distribution disks or CD. Unless you are using multimedia, I believe there are no benefits to the new drivers.

2. If you must operate with the new drivers, edit the cfgdata file in the os2_ntpd directory so that the initial polling interval is greater than or equal to 64 seconds. The default in the distribution zipfile is set to 16 seconds.
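A quick back-of-the-envelope check of what the polling interval means for a server, as a sketch (the helper name is made up for illustration):

```python
SECONDS_PER_DAY = 86400

def polls_per_day(interval_seconds):
    """Number of requests one client sends per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds

print(polls_per_day(16))   # 5400 requests per day from a single stuck client
print(polls_per_day(64))   # 1350 requests per day after the suggested change
```

So a single client stuck at the 16-second default generates four times the load it would at 64 seconds.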

In the meantime, work is being done to modify OS2_NTPD so that it will work with the new clock drivers, but the project is still in its infancy. Users should not procrastinate on implementing my suggested courses of action in the belief that a new version release is imminent.

Bruce M. Penrod TrueTime Inc 2835 Duke Court Santa Rosa, CA 95407

From: nmm1@cus.cam.ac.uk (Nick Maclaren) Date: 18 Sep 1997 20:58:24 GMT Newsgroups: comp.protocols.time.ntp Subject: Re: NTP server for Netware X-Keywords: SNTP

In article <3421818D.2CB4F90C@nhdayton.com>, Randy Hardin <rhardin@nhdayton.com> wrote: >I remember seeing some software a while back that would allow a Netware >server to act as an NTP time server. I think it was actually tied in >with the RDATE software that allows it to be an NTP client. Of course, >now that I need it, I can't find it. Anybody know where it can be found?

For Heaven's sake, do NOT mix rdate and NTP! If you do that, you are likely to cause the whole NTP net that you serve to go unstable. My SNTP client goes to great trouble to avoid corrupting NTP networks, and it is a hundred times less likely to cause chaos than anything based on rdate's design.

Nick Maclaren, University of Cambridge Computer Laboratory, New Museums Site, Pembroke Street, Cambridge CB2 3QG, England. Email: nmm1@cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679

From: nmm1@cus.cam.ac.uk (Nick Maclaren) Date: 22 Sep 1997 21:46:07 GMT Newsgroups: comp.protocols.time.ntp Subject: Re: NTP over variable-latency network routes X-Keywords: compatible delay dispersion glitch SNTP stability

In article <Jeff-2209971214490001@mac.pn.wagsky.com>, Jeff Kletsky <Jeff@Wagsky.com> wrote: > >However, if the network is heavily loaded, the dispersion estimate goes up >(as I would expect) with very high delay and offset (e.g., the delay in >the range of 3000-5000 ms and an offset of -1300 to -2200). xntpd thinks >that the whole world has gone haywire, and throws in a "big" timestep >(e.g., -2.24 s). Once the traffic slows, xntpd realizes the rest of the >world has a different idea of time and again throws in a big timestep >(e.g., 1.89 s) to get back to reality. > >So much for smooth, reliable timekeeping ;-( > >Under heavily loaded conditions, ping to the gateway shows a round-trip >time of 500-6000 ms, rather than the usual 130-140 ms. This is likely >consistent with a transmission time of about 500 ms per 1500-byte packet >(with some queueing of packets at the host and router).

Nasty. In the part of your posting that I omitted, you cover most of the obvious 'solutions', but I am afraid that the only effective one is to junk NTP and start over with a new design. In order to handle wildly erratic synchronisation packets, you need some long-term statistical averaging in there, and NTP is a horribly deterministic design.

My SNTP client would be partially immune to those problems because (a) it rejects bad packets (by default > 5 seconds dispersion) and (b) it weights packets appropriately. It would, however, fall over if the glitch covered too many consecutive packets, as its error recovery is conservative rather than comprehensive.
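For readers unfamiliar with the quantities involved, the sketch below computes the clock offset and round-trip delay from the four timestamps of one client/server exchange, then discards samples whose delay is too large. It is not the SNTP client described above: the rejection test here uses round-trip delay rather than dispersion, and the 5-second threshold and function names are assumptions made for illustration.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation for one exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)

    The true offset lies within delay/2 of the returned estimate, so a
    long round trip still bounds the time, only loosely.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def usable_samples(samples, max_delay=5.0):
    """Keep only (offset, delay) pairs whose delay is at most max_delay."""
    return [(offset, delay) for offset, delay in samples if delay <= max_delay]

# A loaded link: 3 seconds of round trip, server clock about 1 second ahead.
offset, delay = offset_and_delay(100.0, 102.5, 102.6, 103.1)
print(round(offset, 3), round(delay, 3))   # 1.0 3.0
```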

It would be possible to combine the approaches, but it wouldn't be simple, and it wouldn't be compatible. In particular, the responsiveness of the NTP algorithm is incompatible with the stability of a long-term averaging method that you would need to resolve your problem. This is fundamental in the way the universe works :-(

Nick Maclaren, University of Cambridge Computer Laboratory, New Museums Site, Pembroke Street, Cambridge CB2 3QG, England. Email: nmm1@cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679

From: nmm1@cus.cam.ac.uk (Nick Maclaren) Date: 23 Sep 1997 07:41:37 GMT Newsgroups: comp.protocols.time.ntp Subject: Re: NTP over variable-latency network routes X-Keywords: compatible dialup dispersion glitch Mills RFC SNTP specification stability

In article <60762a$qad@cebaf4.cebaf.gov>, Larry Doolittle <ldoolitt@recycle.jlab.org> wrote:
>Nick Maclaren (nmm1@cus.cam.ac.uk) wrote:
>: In article <Jeff-2209971214490001@mac.pn.wagsky.com>,
>: Jeff Kletsky <Jeff@Wagsky.com> wrote:
>: >
>: >However, if the network is heavily loaded, the dispersion [ chop ]
>: >So much for smooth, reliable timekeeping ;-(
>: >
>: Nasty. In the part of your posting that I omitted, you cover most of
>: the obvious 'solutions', but I am afraid that the only effective one is
>: to junk NTP and start over with a new design. In order to handle wildly
>: erratic synchronisation packets, you need some long-term statistical
>: averaging in there, and NTP is a horribly deterministic design.
>
>NTP protocol itself is OK. Averaging can be thrown on later, at least
>in the client/server mode. I found xntpd to be similarly erratic, and
>noticed that it (at least appears to) throw away one key bit information:
>the round trip latency for a query/response. When that shoots up,
>the client can tell the response is (nearly) meaningless.

Part of the trouble is that there IS no NTP protocol as such. I had hell trying to find out what were legal packets, and didn't really succeed. The protocol's specification has to be deduced from the algorithm's one, which is itself embedded in the background description. I failed to persuade David Mills that RFC 1305 needs a protocol specification that can be used on its own.

>: My SNTP client would be partially immune to those problems because (a) it
>: rejects bad packets (by default > 5 seconds dispersion) and (b) it weights
>: packets appropriately. It would, however, fall over if the glitch covered
>: too many consecutive packets, as its error recovery is conservative rather
>: than comprehensive.
>
>Even long-round-trip communications set bounds on the time, merely
>lousy ones compared to the usual (for me) 1 ms round trip.

That is true, but not the point. With standard least-squares regression (which my code uses), the optimal weighting of a data point is inversely proportional to the square of its estimated error. This automatically balances packets by their reliability, WITHOUT having the instability of simple acceptance/rejection.
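The weighting rule can be sketched as a weighted least-squares fit of clock offset against time, where each sample's weight is 1/sigma^2 for its estimated error sigma. This is a generic textbook fit written for illustration, not the code from the SNTP client mentioned above; the function name and sample data are made up.

```python
def weighted_line_fit(times, offsets, sigmas):
    """Fit offset = a + b * time by weighted least squares.

    Each point is weighted by 1 / sigma**2, so a sample with a large
    estimated error (for example a long round trip) still contributes,
    but only slightly, and no hard accept/reject decision is needed.
    Returns (a, b): the offset at time 0 and the drift rate.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    sw = sum(weights)
    sx = sum(w * x for w, x in zip(weights, times))
    sy = sum(w * y for w, y in zip(weights, offsets))
    sxx = sum(w * x * x for w, x in zip(weights, times))
    sxy = sum(w * x * y for w, x, y in zip(weights, times, offsets))
    denom = sw * sxx - sx * sx
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a, b

# Three good samples on the line offset = 1 + 2t, plus one wild sample
# taken during congestion and given a correspondingly huge error estimate.
a, b = weighted_line_fit([0.0, 1.0, 2.0, 3.0],
                         [1.0, 3.0, 5.0, 100.0],
                         [0.01, 0.01, 0.01, 1000.0])
print(round(a, 3), round(b, 3))
```

The wild sample is neither accepted at face value nor thrown away; its weight is simply so small that the fitted line stays with the three good points.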

>: It would be possible to combine the approaches, but it wouldn't be simple,
>: and it wouldn't be compatible. In particular, the responsiveness of the
>: NTP algorithm is incompatible with the stability of a long-term averaging
>: method that you would need to resolve your problem. This is fundamental
>: in the way the universe works :-(
>
>You can dump the NTP algorithm without dumping NTP servers and protocol!
>Feel free to beat me to a decent implemenation :-) .

I have, partially, and could do so wholly for a few weeks' work. As the README in my source says, this would be a DISASTER! The stability of the NTP algorithm relies on all nodes using the same algorithm; mixing nodes with completely different properties could lead to partial meltdown.

We are moving towards a world where there are a large number of primary servers, and few systems are more than a few hops from one (say <= 5). Furthermore, modern clocks are almost all quartz-based (i.e. not mains) and so have excellent stability properties, though they may drift very badly. The combination of these assumptions would enable us to design an NTP replacement with the following properties:

1) Very low transaction rates (1-10 per diem), optionally at times selected from outside (i.e. dialup).
2) Very good stability, even with networks that have long periods (many hours) of near-inaccessibility.
3) Global and local synchronisations that are as good as people are currently getting out of xntpd in most environments.

But it would NOT be compatible with RFC 1305 and would NOT attempt to deliver microsecond accuracies over LANs. The trouble with NTP as a general time framework is that it is specialised precisely for the very high accuracy requirement, which few people need. It thus gives LOW accuracy (unnecessarily low) in many, more common, circumstances.

That is a standard, unavoidable dilemma that is well-known to all control theorists.

Nick Maclaren, University of Cambridge Computer Laboratory, New Museums Site, Pembroke Street, Cambridge CB2 3QG, England. Email: nmm1@cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679
