Discussion:
Are there different programming languages that are compiled to the same intermediate language?
Roger L Costello
2023-01-29 15:49:14 UTC
Hi Folks,

Programming_Language_1 has a compiler which generates Intermediate_Code_1.
Many backends are created for Intermediate_Code_1.

Time passes ....

Sally Smith creates Programming_Language_2. Sally wants to leverage the
compiler work done on Programming_Language_1. She considers two approaches:

1. Create a compiler front-end for Programming_Language_2 which compiles
instances to Intermediate_Code_1.
2. Create a translator which converts instances of Programming_Language_2 into
Programming_Language_1 instances.

Sally is concerned. She asks herself:

- With either approach, how do I prove that my mapping is correct?
- For approach 1, how do I prove that the Intermediate_Code_1 that I generate
is correct for my programming language?
- For approach 2, how do I prove that the instance of Programming_Language_1
that I generate is semantically equivalent to my Programming_Language_2
instance?"

What is the answer to Sally's questions?

/Roger
[I think the answer either way is that you don't, and you try to have a test
suite that is as comprehensive as possible. -John]
Thomas Koenig
2023-01-29 22:14:51 UTC
Post by Roger L Costello
Hi Folks,
Programming_Language_1 has a compiler which generates Intermediate_Code_1.
Many backends are created for Intermediate_Code_1.
Time passes ....
Sally Smith creates Programming_Language_2. Sally wants to leverage the
1. Create a compiler front-end for Programming_Language_2 which compiles
instances to Intermediate_Code_1.
Compilers have been doing that for a very long time. I remember
reading that the IBM PL.8 compiler used a target-independent
intermediate representation, and gcc and LLVM both adopt this.
Makes sense - most of the optimizations are target-independent.
Typically, SSA (static single assignment) form is used these days
because it offers many optimization opportunities.
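For readers who haven't seen it, here is a hand-written sketch of what
SSA form looks like (illustrative only, not actual gcc or LLVM output):

    /* Source function, with its SSA form sketched in the comments:
       every name is assigned exactly once, and a phi function merges
       the values arriving from different control-flow edges. */
    #include <stdio.h>

    int clamp(int x)                /* x1 = parameter          */
    {
        int y;
        if (x < 0)                  /* if (x1 < 0)             */
            y = 0;                  /*   y1 = 0                */
        else
            y = x;                  /*   y2 = x1               */
        return y;                   /* y3 = phi(y1, y2)        */
    }                               /* return y3               */

    int main(void)
    {
        printf("%d %d\n", clamp(-3), clamp(7));   /* prints "0 7" */
        return 0;
    }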
Post by Roger L Costello
2. Create a translator which converts instances of Programming_Language_2 into
Programming_Language_1 instances.
That is also an approach, which works if the Programming_Language_2
is sufficiently low-level, and yet sufficiently flexible, to allow
a reasonable representation. In practice, this usually means C,
where things like pointer arithmetic can be used to implement
multi-dimensional arrays.
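As a sketch of the kind of lowering such a translator might do (my own
illustration, not the output of any real translator), a Fortran
REAL A(3,4) can become a flat block addressed with explicit index
arithmetic:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int n = 3, m = 4;
        /* column-major storage, as in Fortran: element A(I,J) lives at
           offset (I-1) + (J-1)*n in the flat block */
        double *a = malloc((size_t)n * m * sizeof *a);

        for (int j = 1; j <= m; j++)
            for (int i = 1; i <= n; i++)
                a[(i - 1) + (size_t)(j - 1) * n] = 10.0 * i + j;  /* A(I,J) */

        printf("%g\n", a[(2 - 1) + (3 - 1) * n]);   /* A(2,3), prints 23 */
        free(a);
        return 0;
    }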

Fortran is a prime example. Both the well-known f2c translator, based
on the Bell Labs Fortran 77 compiler, and the NAG Fortran compiler use
C as an intermediate language.
Post by Roger L Costello
- With either approach, how do I prove that my mapping is correct?
Usually, she can't.

Compiler front ends are well-known to have bugs, and it would be
very hard to prove their correctness, especially if the bugs are
in semantics, which are usually not described in a formal way.
Post by Roger L Costello
- For approach 1, how do I prove that the Intermediate_Code_1 that I generate
is correct for my programming language?
See above.
Post by Roger L Costello
- For approach 2, how do I prove that the instance of Programming_Language_1
that I generate is semantically equivalent to my Programming_Language_2
instance?"
Same problem.

[Target-independent intermediate forms work OK when the source
languages are semantically similar, e.g. C and Fortran, and the range of
target architectures is limited, e.g., two's complement machines with
8-bit byte flat addressing. Every few years someone comes up with the
bright idea to do an all-purpose intermediate language, which never
works. See UNCOL around 1960 for the first time that idea failed.

By the way, f2c is based on f77 which compiled Fortran into the intermediate
language the PDP-11 C compiler used. -John]
Aharon Robbins
2023-01-30 08:03:12 UTC
Post by Thomas Koenig
By the way, f2c is based on f77 which compiled Fortran into the intermediate
language the PDP-11 C compiler used. -John]
Just to be clear, it was the intermediate language of Johnson's PCC,
which was first targeted at the Interdata, then the PDP-11 and Vax,
not Dennis Ritchie's original PDP-11 compiler.

Arnold
[Good point. I wrote a competing F77 compiler INfort which did use the
Ritchie intermediate language. Someone else later modified it to use
pcc. -John]
William Fahle
2023-01-30 10:00:42 UTC
Post by Roger L Costello
- With either approach, how do I prove that my mapping is correct?
- For approach 1, how do I prove that the Intermediate_Code_1 that I generate
is correct for my programming language?
- For approach 2, how do I prove that the instance of Programming_Language_1
that I generate is semantically equivalent to my Programming_Language_2
instance?"
What is the answer to Sally's questions?
/Roger
[I think the answer either way is that you don't, and you try to have a test
suite that is as comprehensive as possible. -John]
This is Post's Correspondence Problem, which has been proven to be
equivalent to the Halting Problem, thus uncomputable.
[Remember that while the halting problem is insoluble in general, it's
often soluble in specific cases, e.g. "halt" or "foo: goto foo".
Nevertheless, I would be quite surprised if anyone could prove a
non-toy translator correct. -John]
Roger L Costello
2023-01-30 14:36:25 UTC
Post by Thomas Koenig
Post by Roger L Costello
2. Create a translator which converts instances of
Programming_Language_2 into
Programming_Language_1 instances.
That is also an approach, which works if the
Programming_Language_2 is sufficiently low-level
Is that a typo? Do you mean Programming_Language_1 is sufficiently low level?
Recall that the approach is to translate instances of Programming_Language_2
to instances of Programming_Language_1.

I will assume you meant Programming_Language_1. So you are saying that the
target language (Programming_Language_1) must be a lower level language than
the source language (Programming_Language_2), right?
Post by Thomas Koenig
Target-independent intermediate forms work OK
when the source languages are semantically similar
How to determine if two source languages are semantically similar?

/Roger
[Not to snark too much, but you know it when you see it. C and Fortran
are similar, COBOL and Lisp are not. -John]
Kaz Kylheku
2023-01-30 17:36:30 UTC
Post by Roger L Costello
Hi Folks,
Programming_Language_1 has a compiler which generates Intermediate_Code_1.
Many backends are created for Intermediate_Code_1.
Time passes ....
Sally Smith creates Programming_Language_2. Sally wants to leverage the
1. Create a compiler front-end for Programming_Language_2 which compiles
instances to Intermediate_Code_1.
2. Create a translator which converts instances of Programming_Language_2 into
Programming_Language_1 instances.
- With either approach, how do I prove that my mapping is correct?
- For approach 1, how do I prove that the Intermediate_Code_1 that I generate
is correct for my programming language?
- For approach 2, how do I prove that the instance of Programming_Language_1
that I generate is semantically equivalent to my Programming_Language_2
instance?"
What is the answer to Sally's questions?
The answer is that proving a compiler correct is the same as proving
any other software correct. What can make it easier than some
software is that there is no real-time, nondeterministic behavior. A
compiler should produce exactly the same result every time it is
invoked with the same inputs.

Also, compilers can be written using pure functional techniques, and
this is actually "easy" to do, compared to some other kinds of
tasks. Recursion is natural for compiling because the input has a
nested, recursive structure, which the compiler follows.

Functional programs are easier to prove correct than imperative or
object-oriented programs which work by mutating the state in variables
and objects.

The compiler is expressed as a recursive function which handles all the
cases that occur (syntax shapes being translated). You can then argue
that it is correct by induction: inspect that each case is performing a
correct transformation from source language to the intermediate language
for the construct which it is handling, and that it's correctly invoking
itself recursively for the sub-expressions that the input contains.
Also included must be an argument showing that the recursion terminates.

E.g. If you know that your compiler handles, say, an if/else statement
correctly and some kind of for loop also correctly, and it's all
clean recursion, you can confidently argue that if statements nested
in loops, and vice versa, or in each other, are also handled correctly.
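To make the shape of that argument concrete, here is a toy sketch (in C,
for brevity; a real front end would of course have many more cases):

    #include <stdio.h>

    /* One case per syntax shape; compile() recurses on sub-expressions
       and emits a stack-machine intermediate code.  The correctness
       argument is per case, plus termination of the recursion. */
    enum kind { NUM, ADD, MUL };

    struct expr {
        enum kind kind;
        int value;                      /* NUM */
        const struct expr *lhs, *rhs;   /* ADD, MUL */
    };

    static void compile(const struct expr *e)
    {
        switch (e->kind) {
        case NUM:                        /* base case */
            printf("push %d\n", e->value);
            break;
        case ADD:                        /* inductive case: compile the  */
            compile(e->lhs);             /* sub-expressions, then the op */
            compile(e->rhs);
            printf("add\n");
            break;
        case MUL:
            compile(e->lhs);
            compile(e->rhs);
            printf("mul\n");
            break;
        }
    }

    int main(void)
    {
        struct expr two   = { NUM, 2, NULL, NULL };
        struct expr three = { NUM, 3, NULL, NULL };
        struct expr four  = { NUM, 4, NULL, NULL };
        struct expr sum   = { ADD, 0, &three, &four };
        struct expr prod  = { MUL, 0, &two, &sum };   /* 2 * (3 + 4) */
        compile(&prod);
        return 0;
    }

If the NUM, ADD and MUL cases are each correct, and the recursion follows
the tree structure so it terminates, then nesting them in any combination
is handled correctly too.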

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
[I have bad news for you -- there are real compilers that produce different
results on different runs, due to things like it being loaded at different
memory addresses so that hash tables of pointers end up in different orders. -John]
Martin Ward
2023-01-31 14:04:39 UTC
Post by William Fahle
[Remember that while the halting problem is insoluble in general, it's
often soluble in specific cases, e.g. "halt" or "foo: goto foo".
More usefully, there are methods for transforming a formal
specification into an executable implementation which
preserve semantic equivalence, and therefore are guaranteed
to produce a terminating program (for input states for which
the specification is defined). For example:

"Provably Correct Derivation of Algorithms Using FermaT"
Martin Ward and Hussein Zedan
Formal Aspects of Computing, September 2014.
Volume 26, Issue 5, Pages 993--1031, 2014-09-01, ISSN: 0934-5043
doi:dx.doi.org/10.1007/s00165-013-0287-2
http://www.gkc.org.uk/martin/papers/trans-prog-t.pdf
Post by William Fahle
Nevertheless, I would be quite surprised if anyone could prove a
non-toy translator correct. -John]
According to this paper: "The state of the art [as of 2014] is
the CompCert compiler, which compiles almost all of the C language
to PowerPC, ARM and x86 assembly and has been mechanically verified
using the Coq proof assistant."

Xavier Leroy. "Formally verifying a compiler: Why? How? How far?".
CGO 2011 - 9th Annual IEEE/ACM International Symposium on Code
Generation and Optimization, Apr 2011, Chamonix, France.
⟨10.1109/CGO.2011.5764668⟩. ⟨hal-01091800⟩

https://hal.inria.fr/hal-01091800

--
Martin

Dr Martin Ward | Email: ***@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
Aharon Robbins
2023-02-01 08:07:24 UTC
I've never understood this. Isn't there a chicken and egg problem?
How do we know that the theorem prover is correct and bug free?

I ask in all seriousness.

Thanks,

Arnold
Post by Martin Ward
Post by William Fahle
[Remember that while the halting problem is insoluble in general, it's
often soluble in specific cases, e.g. "halt" or "foo: goto foo".
More usefully, there are methods for transforming a formal
specification into an executable implementation which
preserve semantic equivalence, and therefore are guaranteed
to produce a terminating program (for input states for which
the specification is defined). ...
[It's a perfectly reasonable question. Alan Perlis, who was my thesis
advisor, never saw any reason to believe that a thousand line proof
was any more likely to be bug-free than a thousand line program.
-John]
Spiros Bousbouras
2023-02-05 15:14:02 UTC
On 01 Feb 2023 08:07:24 GMT
Post by Aharon Robbins
I've never understood this. Isn't there a chicken and egg problem?
How do we know that the theorem prover is correct and bug free?
We prove it by hand? I don't know if anyone has done this for the
provers mentioned, but it feels like a feasible task, and when it's done,
it's done; i.e. a theorem prover is not the kind of software which
continually gets updated.
Post by Aharon Robbins
[It's a perfectly reasonable question. Alan Perlis, who was my thesis
advisor, never saw any reason to believe that a thousand line proof
was any more likely to be bug-free than a thousand line program.
-John]
It depends on what we mean by "bug"; in mathematics we wouldn't say "bug"
but "error".

First you have *typos*, situations where one meant to write, for example,
i and wrote j instead. It may be that mathematical proofs have more of
those than computer programmes, because in programmes some will be caught
by the compiler, for example if j has not been declared. But the essential
difference is that a typo is harmless in mathematics: the reader of the
proof will either notice it and understand what was intended, or even
unconsciously replace what is written with what was intended. In computer
programmes this kind of typo can be catastrophic. Hence in mathematics we
wouldn't say that a proof has errors just because it has a typo; we would
say it has a typo.

For other errors, logic errors, I would guess that computer programmes
will have more errors/bugs than mathematical proofs. This assertion is
based on personal experience and instinct to some extent, having written
both many programmes and many proofs. But I will attempt to give some of
what I think are the reasons:

1:
In mathematics you work with a formalism right from the start, but in
computer programming one often has only an intuitive understanding of what
the program is meant to achieve, and a working programme may be considered
to have a bug when its behaviour turns out not to match that intuitive
understanding closely enough. The difference in the level of formalism is
not just about the programmes but about the programming languages too. In
the standards of programming languages I've read, the language is not
described in precise mathematical detail. The grammar usually is, but not
the other parts of the language.

For example, from the C standard:
The result of the binary + operator is the sum of the operands.

Even just for integer arguments, this is nowhere near a complete
definition; to piece one together, one must combine information scattered
in faraway parts of the standard. This would never occur in a mathematics
book or paper: the complete definition of a function or operator
(including the precise set on which it is defined) would appear in one
place. This sort of thing, where one has to piece together information
from many parts of the document to arrive at a complete definition, seems
to be common in computer language standards, and I have no idea why it
happens; I don't think anything about specifying a programming language
forces things to be done like this. But the result is that programmers
often work with a mix of precise specification and intuitive understanding
of the language they're using.
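As a small concrete illustration of the "+" example above: the quoted
clause does not by itself tell you that the two additions below behave
completely differently (the rules live elsewhere in the standard):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        unsigned u = UINT_MAX;
        /* unsigned arithmetic is defined to wrap modulo UINT_MAX + 1 */
        printf("%u\n", u + 1u);              /* prints 0 */

        /* the analogous signed addition, INT_MAX + 1, would be undefined
           behaviour, so it is only mentioned in this comment */
        return 0;
    }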

2:
In mathematics a correct proof *is* the goal, but in computer programming
some set of behaviours is the goal; and the set of behaviours may not even
be the ultimate goal; the desire to make a profit is. For example

en.wikipedia.org/wiki/Pac-Man:
Due to an integer overflow, the 256th level loads improperly, rendering
it impossible to complete.

But this didn't prevent the game from being highly successful. "Donkey Kong"
also had a bug which prevented progress from some point onwards.

What the goal is also influences the kind of people who specialise in
computer programming versus those who write mathematical proofs. My
impression is that there exist computer programmers of the "trial and
error" type who write some code which seems to approximate what they have
in mind, test it, write some more code, and so forth. Eventually they
arrive at something which seems to match their mental model, and they
consider themselves done. They would have no interest in, and possibly no
ability for, writing a proof of correctness. As a practical matter, they
may be perfectly productive and write code which works for the desired
cases.

3:
There is a kind of bug/error for which there's no mathematical analogue:

From "Expert C Programming" by Peter van der Linden:

One release of an ANSI C compiler had an interesting bug. The symbol
table was accessed by a hash function that computed a likely place from
which to start a serial search. The computation was copiously commented,
even describing the book the algorithm came from. Unfortunately, the
programmer omitted to close the comment! The entire hash initial value
calculation thus fell inside the continuing comment,


[...]

The entire calculation of the initial hash value was omitted, so the
table was always searched serially from the zeroth element! As a result,
symbol table lookup (a very frequent operation in a compiler) was much
slower than it should have been. This was never found during testing
because it only affected the speed of a lookup, not the result. This is
why some compilers complain if they notice an opening comment in a
comment string. The error was eventually found in the course of looking
for a different bug. Inserting the closing comment resulted in an
immediate compilation speedup of 15%!
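A minimal reconstruction of that failure mode (my own illustration, not
the book's or the compiler's actual code): the hash computation below is
silently swallowed by the unterminated comment, yet the program still
compiles and "works", only worse.

    #include <stdio.h>

    static unsigned hash_start(const char *name)
    {
        unsigned h = 0;
        /* initial value, algorithm taken from the book -- comment never closed
        h = 31u * h + (unsigned char)name[0];
        h = h * 2654435761u;                    */
        return h;    /* always 0: the two lines above became comment text */
    }

    int main(void)
    {
        printf("%u\n", hash_start("symbol"));   /* prints 0 */
        return 0;
    }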

[The history of formal description of programming languages is not
encouraging. Back in the 1970s the ANSI PL/I standard was defined in
terms of formal operations on trees, but even so it had bugs/errors.
Also, per your comment on addition, in a lot of cases the practical
definition turns into whatever the underlying machine's instruction
does. Until recently the C standard was deliberately unclear about
whether arithmetic was ones- or twos-complement. -John]
Keith Thompson
2023-02-06 00:14:57 UTC
[...]
Post by Spiros Bousbouras
[The history of formal description of programming languages is not
encouraging. Back in the 1970s the ANSI PL/I standard was defined in
terms of formal operations on trees, but even so it had bugs/errors.
Also, per your comment on addition, in a lot of cases the practical
definition turns into whatever the underlying machine's instruction
does. Until recently the C standard was deliberately unclear about
whether arithmetic was ones- or twos-complement. -John]
The C standard still doesn't mandate two's-complement.

The 1990 edition of the C standard was vague about integer
representations, though it did impose some requirements. It required
a "pure binary" representation, and it required, for example,
a value that is within the range of both int and unsigned int to
have the same representation in both. That excluded some of the
more exotic possibilities. (It mentioned 2's complement a few times,
but only as an example.)

The 1999 standard explicitly required one of three representations:
sign and magnitude, two's complement, or one's complement. It also
explicitly permitted padding bits (bits that don't contribute
to the value, and that can either be ignored or create trap
representations).

The 2011 standard corrected the spelling of "ones' complement",
but otherwise didn't change anything significant. (The 2017 edition
was a minor update.)

The upcoming 2023 standard mandates two's complement. And it
requires INT_MIN to be exactly -INT_MAX-1; previously INT_MIN could
be equal to -INT_MAX, and -INT_MAX-1 could be a trap representation.
It still permits padding bits and trap representations.

Note that two's complement *representation* doesn't imply two's
complement *behavior* on overflow. Signed integer overflow still
has undefined behavior.
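A quick way to see where a given implementation stands (output shown for
a typical two's-complement system; an exotic one could differ):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        printf("INT_MAX = %d, INT_MIN = %d\n", INT_MAX, INT_MIN);
        printf("INT_MIN == -INT_MAX - 1? %s\n",
               INT_MIN == -INT_MAX - 1 ? "yes" : "no");
        /* conversion to unsigned is well defined and wraps: */
        printf("(unsigned)-1 = %u\n", (unsigned)-1);
        /* note: evaluating INT_MAX + 1 here would be undefined behavior,
           even on a two's-complement machine */
        return 0;
    }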

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
gah4
2023-02-06 21:26:00 UTC
On Sunday, February 5, 2023 at 6:01:04 PM UTC-8, Keith Thompson wrote:

(snip)
The upcoming 2023 standard mandates two's complement. And it
requires INT_MIN to be exactly -INT_MAX-1; previously INT_MIN could
be equal to -INT_MAX, and -INT_MAX-1 could be a trap representation.
It still permits padding bits and trap representations.
Too bad for those CDC computers, and Unisys computers.
The last sign-magnitude machines I know of are the IBM 7090 and 7094.
Note that two's complement *representation* doesn't imply two's
complement *behavior* on overflow. Signed integer overflow still
has undefined behavior.
Overflow behavior mostly depends on the hardware.

Many computers, such as the IBM S/360 and successors, have a bit that
enables or disables the fixed point overflow exception. (For IBM, it
also applies to decimal overflow.)

For IA32, that is x86, there is the INTO instruction, which is put
after any instruction that could generate overflow, and causes the
interrupt if the overflow bit is set. Compilers would have to generate
that instruction.

It seems, though, that has gone away in x86-64 mode.

I don't know that there is any replacement, other than a conditional
branch based on the overflow flag.
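In C, one way to get at that conditional branch is through the GCC/Clang
checked-arithmetic builtins (a sketch; the generated code of course
depends on the compiler and target):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        int a = INT_MAX, b = 1, sum;
        /* __builtin_add_overflow does the add and reports whether it
           wrapped, typically as an add followed by a jump on overflow */
        if (__builtin_add_overflow(a, b, &sum))
            printf("overflow detected\n");
        else
            printf("sum = %d\n", sum);
        return 0;
    }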

Fortran programmers rely on fixed point overflow, as they have been
doing for 60 years, and don't have unsigned.

(There are two routines in TeX that Knuth suggests replacing with
assembly code. Those do multiply and divide, with 32 bit fixed point
values, 16 bits after the binary point. Pascal has no option for
avoiding the overflow trap, and it takes a lot of Pascal code to do
the multiply and divide!)
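In C the corresponding fixed-point multiply is a one-liner using a 64-bit
intermediate, so no overflow trap is involved (a rough sketch, not Knuth's
actual routine, which handles rounding and signs more carefully):

    #include <stdio.h>
    #include <stdint.h>

    typedef int32_t scaled;                 /* 32-bit value, 16 bits after
                                               the binary point */
    static scaled mult_scaled(scaled a, scaled b)
    {
        /* widen to 64 bits, multiply, then drop the extra 16 fraction bits;
           exact for the non-negative values shown here */
        return (scaled)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        scaled half  = 1 << 15;             /* 0.5 */
        scaled three = 3 << 16;             /* 3.0 */
        printf("%g\n", mult_scaled(half, three) / 65536.0);   /* 1.5 */
        return 0;
    }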
Hans-Peter Diettrich
2023-02-07 13:31:26 UTC
Post by gah4
Too bad for those CDC computers, and Unisys computers.
The last sign-magnitude machines I know of are the IBM 7090 and 7094.
AFAIK, IEEE-754 floating point numbers still use a sign-magnitude
representation.
Then the same representation of integral numbers may have advantages in
computations.

DoDi
[I presume the sign-magnitude is to enable the hidden bit trick,
which doesn't apply in unscaled integers. -John]
gah4
2023-02-08 09:10:49 UTC
Post by Hans-Peter Diettrich
Post by gah4
Too bad for those CDC computers, and Unisys computers.
The last sign-magnitude machines I know of are the IBM 7090 and 7094.
AFAIK, IEEE-754 floating point numbers still use a sign-magnitude
representation.
Then the same representation of integral numbers may have advantages in
computations.
[I presume the sign-magnitude is to enable the hidden bit trick,
which doesn't apply in unscaled integers. -John]
Yes, I meant integer representation.

Well, I have been wondering for years when we get a C compiler
for the 7090 so we can test out sign-magnitude integers.

I think the 7090 does 16 bit integers, at least that is what
its Fortran compilers did, stored in 36 bit words.

As for floating point, the PDP-10 uses a two's complement
floating point format. It does two's complement on the
whole 36 bit word. The result is that fixed point compare
instructions work on floating point values.
[The 704x/709x series did 36 bit sign-magnitude arithmetic. Fortran
integers were limited to 15 bits plus a sign, probably because that
was the size of addresses, and they expected integer arithmetic to
be used only for counting and subscripts. In 709 Fortran II they
expanded them to 17 bits, in 7090 Fortran IV they were finally a
full word. -John]
gah4
2023-02-09 08:26:11 UTC
On Wednesday, February 8, 2023 at 8:48:23 AM UTC-8, gah4 wrote:

(snip)
Post by gah4
Well, I have been wondering for years when we get a C compiler
for the 7090 so we can test out sign-magnitude integers.
I think the 7090 does 16 bit integers, at least that is what
its Fortran compilers did, stored in 36 bit words.
(snip)
Post by gah4
[The 704x/709x series did 36 bit sign-magnitude arithmetic. Fortran
integers were limited to 15 bits plus a sign, probably because that
was the size of addresses, and they expected integer arithmetic to
be used only for counting and subscripts. In 709 Fortran II they
expanded them to 17 bits, in 7090 Fortran IV they were finally a
full word. -John]
OK, so 7090 C can use all 36 bits. When we get one.

I just remembered that the S/360 emulation to develop
OS/360 was done on the 7090. 36 bits would help!

It was the 15 bit integers on the 704 that gave us five digit
statement numbers in Fortran, originally 1 to 32767, and
(not much) later extended to 99999.

And over 60 years later, we still have 99999.

But also, the 704 Fortran, and I believe still the 7090,
indexes arrays from the end of memory toward the beginning.
[It did because for reasons I have never been able to figure out,
the 70x series subtracted rather than added the contents of
an index register to get the effective address. -John]
gah4
2023-02-11 08:01:33 UTC
(snip, I wrote)
Post by gah4
But also, the 704 Fortran, and I believe still the 7090,
indexes arrays from the end of memory toward the beginning.
[It did because for reasons I have never been able to figure out,
the 70x series subtracted rather than added the contents of
an index register to get the effective address. -John]
I suppose so, but it was also convenient. Compilers (or linkers)
generate code starting from the bottom of memory, and allocate
variables starting from the top. (Different for different sized
machines.) Since Fortran COMMON can be different length, or at least
blank COMMON in later versions, allocating it from the end of memory
works well.

You can extend common from one subroutine to the next, or with
chaining, to another whole program.

So, was Fortran designed around the hardware,
to allocate memory from the end?

Or maybe just lucky.
[Fortran was definitely designed around the 704. I have done a lot of
looking to try and find out where the subtracted index registers came
from, and while there are a lot of guesses, there is nothing written
down. The three index registers were 1,2, and 4, and if you specified
more than one, it OR'ed them together and subtracted the result, which
was really strange. There were a lot of other machines with index
registers but none I know of that subtracted or OR'ed. I have also
been unable to tell whether the OR'ing was deliberate or just a result
of minimizing the number of tubes that they then documented. It was
likely useless since the 7094 had 7 index registers and a flag in
case you wanted the old behavior. -John]
Dennis Boone
2023-02-12 04:37:07 UTC
The three index registers were 1,2, and 4, and if you specified more
than one, it OR'ed them together and subtracted the result, which was
really strange. There were a lot of other machines with index registers
but none I know of that subtracted or OR'ed. I have also been unable to
tell whether the OR'ing was deliberate or just a result of minimizing
the number of tubes that they then documented.
Not having to have the circuitry to make sure multiple registers
were never gated onto the internal bus seems like a likely reason.

But one _could_ also conceivably use multiple index registers for
multiple dimensions, with careful allocation.

De
[As I said, I've found lots of guesses but no documents. Multiple
dimensions only work if each dimension is a power of two which,
considering the 704's tiny memory, seems unlikely. My guess is the
same as yours, fewer tubes, but it's just a guess. -John]
Anton Ertl
2023-02-08 10:19:54 UTC
Post by Hans-Peter Diettrich
AFAIK, IEEE-754 floating point numbers still use a sign-magnitude
representation.
Then the same representation of integral numbers may have advantages in
computations.
Such as? Anyway, whatever these advantages may be, they have not been
enough to prevent the extermination of sign-magnitude integers.
Post by Hans-Peter Diettrich
[I presume the sign-magnitude is to enable the hidden bit trick,
which doesn't apply in unscaled integers. -John]
With a ones-complement or two's-complement mantissa the hidden bit
would just have the same sign as the sign bit, so this trick is not
tied to sign-magnitude representation.

Some years ago someone sketched a two's-complement representation for
FP (that also includes the exponent), but I did not quite get the
details. Anyway, I expect that whatever the advantages of that
representation are, they are not enough to justify the transition
cost.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[PDP-6/10 floating point was two's complement. As someone else recently noted, that
meant they could use fixed point compare instructions. -John]
Hans-Peter Diettrich
2023-02-10 18:11:31 UTC
Post by Anton Ertl
With a ones-complement or two's-complement mantissa the hidden bit
would just have the same sign as the sign bit, so this trick is not
tied to sign-magnitude representation.
Please explain the provenience or purpose of that hidden bit with
integral numbers. How can integral values be *normalized* so that a
previously required bit can be hidden? Sign extension to a higher number
of bits does not increase the value or accuracy of an integral number.

DoDi
[He said "mantissa", so it's floating point. I've certainly seen scaled
integer arithmetic, but normalized integers other than +/- zero in systems
with signed zeros seems unlikely. -John]
gah4
2023-02-11 07:47:40 UTC
On Friday, February 10, 2023 at 10:18:49 AM UTC-8, Hans-Peter Diettrich wrote:

(snip)
Post by Hans-Peter Diettrich
Please explain the provenience or purpose of that hidden bit with
integral numbers. How can integral values be *normalized* so that a
previously required bit can be hidden? Sign extension to a higher number
of bits does not increase the value or accuracy of an integral number.
[He said "mantissa", so it's floating point. I've certainly seen scaled
integer arithmetic, but normalized integers other than +/- zero in systems
with signed zeros seems unlikely. -John]
Normalized binary floating point, with hidden one, is pretty common.
I knew IBM S/360 floating point for some years before learning about
those, and it seemed surprising at the time.
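For anyone who wants to see the hidden bit concretely, here is a small
program that unpacks an IEEE-754 binary64 value (it assumes double is
IEEE binary64, which is true on essentially all current systems):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        double d = 6.5;                  /* 1.625 * 2^2 */
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);  /* reinterpret the bit pattern */

        uint64_t frac = bits & ((1ULL << 52) - 1);
        int exp  = (int)((bits >> 52) & 0x7FF) - 1023;   /* remove bias */
        int sign = (int)(bits >> 63);

        /* for normal numbers the leading 1 of the significand is not
           stored; it is the "hidden" bit, reinstated here */
        uint64_t significand = (1ULL << 52) | frac;

        printf("sign=%d exponent=%d significand=0x%llx\n",
               sign, exp, (unsigned long long)significand);
        return 0;
    }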

As for integers, though, there are some processors with a floating
point format that does not left normalize values.

Some CDC processors, if the value can be held as an integer without
losing bits on either end, keep it in that unnormalized form. Even more,
the exponent is zero in that case. I think some Burroughs processors
also do that.

The result of doing that is that, for values in the appropriate range,
the floating point instructions work for integer values. No instructions
are needed to convert (in range) integers to floating point.

There is so much fun history to the different floating point
formats used over the years. Now almost forgotten.
[I am not aware of any hidden bit formats before IEEE but the 704
manual noted that normalized mantissas always have a 1 in the high
bit so it wasn't a big leap. -John]
Anton Ertl
2023-02-11 22:34:25 UTC
Post by gah4
[I am not aware of any hidden bit formats before IEEE
The VAX FP formats have a hidden bit
<https://nssdc.gsfc.nasa.gov/nssdc/formats/VAXFloatingPoint.htm>, and
AFAIK VAX F and G are pretty close to IEEE binary32 and binary64.

According to
<https://home.kpn.nl/jhm.bonten/computers/bitsandbytes/wordsizes/hidbit.htm>
already Zuse used a hidden bit in FP numbers on his machines (1940s
and 1950s).

--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Oh, right, I forgot about Zuse. He invented a lot of stuff that other
people reinvented later. Squinting at my VAX architecture handbook, the
formats have the same layout as IEEE but the exponents are excess 128 and
1024 rather than 127 and 1023 and there's no infinities or denormals. -John]
gah4
2023-02-12 06:48:41 UTC
(our moderator wrote)
[Oh, right, I forgot about Zuse. He invented a lot of stuff that other
people reinvented later. Squinting at my VAX architecture handbook, the
formats have the same layout as IEEE but the exponents are excess 128 and
1024 rather than 127 and 1023 and there's no infinities or denormals. -John]
I always forget the difference between them.

VAX has the binary point to the left of the hidden one, and IEEE to the right.
So that makes up for the one difference in exponent bias.

But no infinity and NaN means that the actual exponent can be one higher.

I think that is the way it works.

I never had problems reading S/360 format from hex dumps, where
the exponent is in whole hexadecimal digits and there is no hidden one.

Hans-Peter Diettrich
2023-02-08 14:24:55 UTC
Post by Hans-Peter Diettrich
Post by gah4
Too bad for those CDC computers, and Unisys computers.
The last sign-magnitude machines I know of are the IBM 7090 and 7094.
AFAIK, IEEE-754 floating point numbers still use a sign-magnitude
representation.
Then the same representation of integral numbers may have advantages in
computations.
DoDi
[I presume the sign-magnitude is to enable the hidden bit trick,
which doesn't apply in unscaled integers. -John]
That's correct; the imprecise representation of FP numbers allows for
such tricks. The hidden bit trick can be used again with the FP
exponents, as I outlined in my Dynamic Floating Point Exponents proposal
<https://figshare.com/articles/preprint/Dynamic_Exponents_for_Floating_Point_Numbers-2022-04-07_pdf/19548187>.

DoDi
gah4
2023-02-09 08:37:09 UTC
On Wednesday, February 8, 2023 at 8:50:19 AM UTC-8, Hans-Peter Diettrich wrote:

(snip)
Post by Hans-Peter Diettrich
That's correct; the imprecise representation of FP numbers allows for
such tricks. The hidden bit trick can be used again with the FP
exponents, as I outlined in my Dynamic Floating Point Exponents proposal
To continue discussion about OoO and the 360/91, S/360 specifies
floating point divide with a truncated quotient. The 91 uses a
Newton-Raphson divide algorithm, using its high-speed multiplier,
to generate a rounded quotient.

Along with imprecise interrupts, that is on the list of incompatibilities
with other S/360 models.

As for hidden bit, S/360 uses base 16 floating point, so no hidden bit.
[I actually programmed a /91 in Fortran, so, yeah. -John]
gah4
2023-02-01 06:59:17 UTC
On Sunday, January 29, 2023 at 8:58:58 AM UTC-8, Roger L Costello wrote:

(snip)
Post by Roger L Costello
Sally Smith creates Programming_Language_2. Sally wants to leverage the
1. Create a compiler front-end for Programming_Language_2 which compiles
instances to Intermediate_Code_1.
2. Create a translator which converts instances of Programming_Language_2 into
Programming_Language_1 instances.
The complication with this question is the definition of intermediate language.

Some compilers have a front end, middle end, and back end, with some form of
intermediate code between each. They might be more or less general.

Depending on the compiler writer (or writers), there may be thought and time
to make them more general, allowing for other languages.

But okay, choice 2, especially if the language 1 has a standard, can be useful.
(I was almost going to shorten "Programming_Language_1" to PL-1, but
then realized that some might read that wrong.)

That allows it to be used with other compilers, even ones not written yet.
On the other hand, it is restricted by the features of that language, which
sometimes make things complicated.

In the case of choice 1, you might be able to add new features to the
intermediate language, especially if you can join with its project.

Usually, though, there are some features that don't translate to
language 1. Or translate in an extremely complicated way.

Usually the translated code is unreadable to most people who
actually know the target language.

Some examples. TeX was written in Pascal. Well, not really.
It was written in Web, a language and preprocessor that generates
Pascal code to be compiled. At some point, it was desired to have
it in C instead. A program was written to translate much of
Pascal into C, but some things were not so easy. So instead,
the features of the Web preprocessor were used, to convert to
a Pascal-like language, easier to convert to C. This was made
possible by the macro facilities of Web, and with its change
file feature.

Another one is f2c, which translates Fortran to C, but uses
special library routines and otherwise produces not very readable output.

Could we have a standard intermediate language, with all
features needed, or that ever will be needed?
[The answer to that last question is no. It's a very old bad idea,
with the first failed attempt UNCOL in 1958. -John]
gah4
2023-02-03 00:24:11 UTC
(snip, I wrote)
Post by gah4
Could we have a standard intermediate language, with all
features needed, or that ever will be needed?
[The answer to that last question is no. It's a very old bad idea,
with the first failed attempt UNCOL in 1958. -John]
I have wondered, though, about a standardized intermediate
for a processor family. One could write compilers to generate it,
and then updated processors come along with updated code
generators. Or even distribute intermediate code, and the
installer generates the right version for the processor.

This would have been especially useful for Itanium, which
(mostly) failed due to problems with code generation.
Since the whole idea is that the processor depends on the
code generator doing things in the right order. That is, out
of order execution, but determined at compile time. Failure
to do that meant failure for the whole idea.

[Someone comes up with an intermediate language that works for a few
source languages and a few targets, and usually publishes a paper
about his breakthrough. Then people try to add more front ends and
back ends, the intermediate language gets impossibly complex and
buggy, and the project is quietly forgotten. I'd think the back end
problem is a lot easier now than it was in 1958 since everything is
twos complement arithmetic in 8-bit bytes, but the front end is still
daunting.
If you don't push it too far it's certainly possible to do a lot of
work in a largely machine-independent way as LLVM does. -John]
Anton Ertl
2023-02-03 11:44:03 UTC
Post by gah4
I have wondered, though, about a standardized intermediate
for a processor family. One could write compilers to generate it,
and then updated processors come along with updated code
generators. Or even distribute intermediate code, and the
installer generates the right version for the processor.
This would have been especially useful for Itanium, which
(mostly) failed due to problems with code generation.
I dispute the latter claim. My take is that IA-64 failed because the
original assumption that in-order performance would exceed OoO
performance was wrong. OoO processors surpassed in-order CPUs; they
managed to get higher clock rates (my guess is that this is due to
them having smaller feedback loops) and they benefit from better
branch prediction, which extends to 512-instruction reorder buffers on
recent Intel CPUs, far beyond what compilers can achieve on IA-64.
The death knell for IA-64 competitiveness was the introduction of SIMD
instruction set extensions, which made OoO CPUs surpass IA-64 even in
those vectorizable codes where IA-64 had been competitive.
Post by gah4
Since the whole idea is that the processor depends on the
code generator doing things in the right order. That is, out
of order execution, but determined at compile time. Failure
to do that meant failure for the whole idea.
But essentially all sold IA-64 CPUs were variations of the McKinley
microarchitecture as far as performance characteristics were
concerned, especially during the time when IA-64 was still perceived
as relevant. The next microarchitecture Poulson was only released in
2012 when IA-64 had already lost.
Post by gah4
[Someone comes up with an intermediate language that works for a few
source languages and a few targets, and usually publishes a paper
about his breakthrough. Then people try to add more front ends and
back ends, the intermediate language gets impossibly complex and
buggy, and the project is quietly forgotten. I'd think the back end
problem is a lot easier now than it was in 1958 since everything is
twos complement arithmetic in 8-bit bytes
Yes. Computer architecture converges. 20 years ago we still had to
worry about alignment and byte ordering, nowadays alignment has been
settled in general-purpose CPUs (no alignment restrictions), and byte
ordering is mostly settled to little-endian (exception s390/s390x).
Post by gah4
If you don't push it too far it's certainly possible to do a lot of
work in a largely machine-independent way as LLVM does. -John]
LLVM mostly supports what the hardware supports, but there is at least
one exception: LLVM IR divides the code into functions, that you can
only call, with LLVM handling the calling convention. When someone
needs to deviate from the calling convention, like the Haskell/C--
people, LLVM provides some possibilities, but from what I read, they
had a hard time pulling it through.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[I'm with you on IA-64 and VLIW. I knew the Multiflow people pretty
well and it is evident in retrospect that while it was a good fit
with what you could build in the 1980s, hardware soon reached the
point where it could do better what VLIW compilers had done. -John]
gah4
2023-02-04 03:13:17 UTC
On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:

(snip, I wrote)
Post by gah4
This would have been especially useful for Itanium, which
(mostly) failed due to problems with code generation.
I dispute the latter claim. My take is that IA-64 failed because the
original assumption that in-order performance would exceed OoO
performance was wrong. OoO processors surpassed in-order CPUs; they
managed to get higher clock rates (my guess is that this is due to
them having smaller feedback loops) and they benefit from better
branch prediction, which extends to 512-instruction reorder buffers on
recent Intel CPUs, far beyond what compilers can achieve on IA-64.
The death knell for IA-64 competitiveness was the introduction of SIMD
instruction set extensions, which made OoO CPUs surpass IA-64 even in
those vectorizable codes where IA-64 had been competitive.
I got interested in OoO in the days of the IBM 360/91, which is early
in the line of OoO processors. Among others, the 91 has imprecise
interrupts, where the stored address is not the instruction after the
interrupt cause.

But okay, the biggest failure of Itanium is that it was two years or
so behind schedule when it came out. And part of the problem, as far
as I remember, was the need to implement x86 instructions, too.
Post by gah4
Since the whole idea is that the processor depends on the
code generator doing things in the right order. That is, out
of order execution, but determined at compile time. Failure
to do that meant failure for the whole idea.
But essentially all sold IA-64 CPUs were variations of the McKinley
microarchitecture as far as performance characteristics were
concerned, especially during the time when IA-64 was still perceived
as relevant. The next microarchitecture Poulson was only released in
2012 when IA-64 had already lost.
But is it the whole idea of compile-time instruction scheduling that is
the cause of the failure, or just the way they did it?

The 360/91 had some interesting problems. One is that it had 16 way
interleaved memory with a cycle time of 13 processor cycles, and the
goal of one instruction per clock cycle. That means it is dependent on
memory address ordering, which is hard to know at compile time.

The slightly later 360/85, without the fancy OoO processor, but with
cache memory (the first machine with cache!) was about as fast
on real problems.

Otherwise, the 91 has a limited number of reservation stations,
limiting how far OoO it can go. All OoO processors have a limit
to how far they can go. But the compiler does not have that limit.

Now, since transistors are cheap now, and one can throw a large
number into reorder buffers and such, one can build really deep
pipelines.

But the reason for bringing this up, is that if Intel had a defined
intermediate code, and supplied the back end that used it,
and even more, could update that back end later, that would have
been very convenient for compiler writers.

Even more, design for it could have been done in parallel with the
processor, making both work well together.

Reminds me of the 8087 virtual register stack. The 8087 has
eight registers, but they were supposed to be a virtual stack.
They would be copied to/from memory on stack overflow
or underflow. But no-one tried to write the interrupt routine
until after the hardware was made, and it turns out not to
be possible. I never knew why that wasn't fixed for the 287
or 387, though.
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns. I doubt that has
changed. And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]
gah4
2023-02-04 10:55:01 UTC
On Friday, February 3, 2023 at 7:51:41 PM UTC-8, gah4 wrote:

(snip on Itanium and OoO execution)
The 360/91 had some interesting problems. One is that it had 16 way
interleaved memory with a cycle time of 13 processor cycles, and the
goal of one instruction per clock cycle. That means it is dependent on
memory address ordering, which is hard to know at compile time.
(snip)
But the reason for bringing this up, is that if Intel had a defined
intermediate code, and supplied the back end that used it,
and even more, could update that back end later, that would have
been very convenient for compiler writers.
(snip, our moderator wrote)
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns. I doubt that has
changed. And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]
So, 50 years ago we had the 360/91, and some CDC competition.

About 45 years ago, we had the Cray-1 and successor vector processors.

So, yes, SIMD processors.

Then 30 years ago, Fortran added the FORALL statement, to make writing
vector code easier. Except that it doesn't.

FORALL, and array assignment statements, always operate such that
the effect is as if the whole right-hand side is evaluated before the left
side is changed. Unless the whole thing fits in a vector register,
the compiler is still stuck.
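A C sketch of what that rule forces when the compiler cannot prove the
assignment is safe to do in place (my own illustration):

    #include <stdio.h>
    #include <string.h>

    #define N 6

    int main(void)
    {
        double a[N] = { 1, 2, 3, 4, 5, 6 }, tmp[N];

        /* Fortran: A(2:N) = A(1:N-1) + A(2:N)
           The whole right-hand side is evaluated before any element of
           the left-hand side changes, so a naive in-place forward loop
           would be wrong; the safe lowering goes through a temporary. */
        for (int i = 1; i < N; i++)
            tmp[i] = a[i - 1] + a[i];
        memcpy(&a[1], &tmp[1], (N - 1) * sizeof a[0]);

        for (int i = 0; i < N; i++)
            printf("%g ", a[i]);        /* prints 1 3 5 7 9 11 */
        printf("\n");
        return 0;
    }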

And just about that time, vector processors went out of style.

(There is now DO CONCURRENT, which might be better.)

Much matrix code goes sequentially through memory, and programmers
know that. (Or they are supposed to know that.)

Early Fortran had the FREQUENCY statement, such that one could
help the compiler know the likely iteration count for DO loops, and
branch probability for branch instructions. Those would help with
static branch predictors, but are long gone now.

It would seem, though, that a statement telling the compiler the
expected memory access pattern would help.

And the Cray-1, like the 360/91, has 16-way interleaved
memory, 64 bits wide.
[Legend says that in one of the early Fortran compilers, the FREQUENCY
statement was accidentally implemented backward and nobody noticed. We
have found over and over that programmers' intuitions about the way
their programs perform are quite poor and we're much better off saying
what we mean and letting the compiler figure out how to do low level
optimizations. -John]
Anton Ertl
2023-02-07 08:35:00 UTC
Post by gah4
(snip, I wrote)
Post by gah4
This would have been especially useful for Itanium, which
(mostly) failed due to problems with code generation.
I dispute the latter claim. My take is that IA-64 failed because the
original assumption that in-order performance would exceed OoO
performance was wrong. OoO processors surpassed in-order CPUs; they
managed to get higher clock rates (my guess is that this is due to
them having smaller feedback loops) and they benefit from better
branch prediction, which extends to 512-instruction reorder buffers on
recent Intel CPUs, far beyond what compilers can achieve on IA-64.
The death knell for IA-64 competitiveness was the introduction of SIMD
instruction set extensions, which made OoO CPUs surpass IA-64 even in
those vectorizable codes where IA-64 had been competitive.
...
Post by gah4
But okay, the biggest failure of Itanium is that it was two years or
so behind schedule when it came out.
If it had had superior performance when McKinley came out in 2002,
that would have been just a minor speed bump on the way to success.
Post by gah4
And partly, as well as I
remember, is the need to implement x86 instructions, too.
Certainly, backwards compatibility is paramount in the markets that it
was intended for. And the 386 compatibility also had to be close to
competetive. But it wasn't.
Post by gah4
But is it the whole idea of compile-time instruction scheduling that is
the cause of the failure, or just the way they did it?
It seems to me that in-order execution combined with
architectural features for reducing dependencies and with compiler
scheduling turned out to be inferior to out-of-order execution.
Moreover, in those areas where the compiler works well (numerical
code), it also works ok for using SIMD instructions
(auto-vectorization) which could be added relatively cheaply to the
hardware.
Post by gah4
All OoO processors have a limit
to how far they can go. But the compiler does not have that limit.
And the compiler can make more sophisticated scheduling decisions
based on the critical path, while the hardware scheduler picks the
oldest ready instruction. These were the ideas that seduced Intel,
HP, and Transmeta to invest huge amounts of money into EPIC ideas.

But the compiler has other limits. It cannot schedule across indirect
calls (used in object-oriented dispatch), across compilation unit
boundaries (in particular, calls to and returns from shared
libraries). Another important limit is the predictability of
branches. Static branch prediction using profiles has ~10%
mispredictions, while (hardware) dynamic branch prediction has a much
lower misprediction rate (I remember numbers like 3% (for the same
benchmarks that have 10% mispredictions with static branch prediction)
in papers from the last century); I expect that this has improved even
more in the meantime. If the compiler mispredicts, it will schedule
instructions from the wrong path, instructions that will be useless
for execution.

In the end a compiler typically can schedule across a few dozen
instructions, while hardware can schedule across a few hundred.
Compiler scheduling works well for simple loops, and that's where
IA-64 shone, but only doing loops well is not good enough for
general-purpose software.
Post by gah4
Now, since transistors are cheap now, and one can throw a large
number into reorder buffers and such, one can build really deep
pipelines.
It's interesting that Intel managed to produce their first OoO CPU
(the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
first Itanium) at 25.4M transistors was too large for the 250nm and
they had to switch to 180nm (contributing to the delays). So, while
the theory was that the EPIC principle would reduce the hardware
complexity (to allow adding more functional units for increased
performance), in Itanium practice the hardware was more complex, and
the performance advantages did not appear.
Post by gah4
But the reason for bringing this up, is that if Intel had a defined
intermediate code, and supplied the back end that used it,
and even more, could update that back end later, that would have
been very convenient for compiler writers.
Something like this happened roughly at the same time with LLVM.
There were other initiatives, but LLVM was the one that succeeded.

There was the Open Research Compiler for IA-64 from Intel and the
Chinese Academy of Sciences.

SGI released their compiler targeted at IA-64 as Open64.
Post by gah4
Even more, design for it could have been done in parallel with the
processor, making both work well together.
Intel, HP, and others worked on compilers in parallel to the hardware
work. It's just that the result did not perform as well for
general-purpose code as OoO processors with conventional compilers.
Post by gah4
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns.
Multiflow and Cydrome built computers for numerical computations (aka
HPC aka (mini-)supercomputing). These tend to spend a lot of time in
simple loops with high iteration counts and have statically
predictable branches. They found compiler techniques like trace
scheduling (Multiflow) that work well for code with predictable
branches, and modulo scheduling (Cydrome) that work well for simple
loops with high iteration counts. The IA-64 and Transmeta architects
wanted to extend these successes to general-purpose computing, but it
did not work out.

Concerning memory access patterns: While they are not particularly
predictable in general-purpose code, most general-purpose code
benefits quite a lot from caches (more than numerical code), so I
don't think that this was a big problem for IA-64.

Some people mention varying latencies due to caches as a problem, but
the choice of a small L1 cache (16KB compared to 64KB for the earlier
21264 and K7 CPUs) for McKinley indicates that average latency was
more of a problem for IA-64 than varying latencies.
Post by gah4
And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]
Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
programs where IA-64 outdid the mainstream.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
gah4
2023-02-08 09:04:38 UTC
On Tuesday, February 7, 2023 at 6:24:44 PM UTC-8, Anton Ertl wrote:

(snip)
Post by Anton Ertl
Post by gah4
All OoO processors have a limit
to how far they can go. But the compiler does not have that limit.
And the compiler can make more sophisticated scheduling decisions
based on the critical path, while the hardware scheduler picks the
oldest ready instruction. These were the ideas that seduced Intel,
HP, and Transmeta to invest huge amounts of money into EPIC ideas.
But the compiler has other limits. It cannot schedule across indirect
calls (used in object-oriented dispatch), across compilation unit
boundaries (in particular, calls to and returns from shared
libraries).
OK, maybe we are getting closer.
Many processors now have speculative execution.
If you don't know what else to do, execute some instructions
just in case, but don't change actual memory or registers yet.

And Itanium doesn't do that. I don't see, though, that it isn't
possible to have EPIC and also speculative execution.
Post by Anton Ertl
Another important limit is the predictability of
branches. Static branch prediction using profiles has ~10%
mispredictions, while (hardware) dynamic branch prediction has a much
lower misprediction rate (I remember numbers like 3% (for the same
benchmarks that have 10% mispredictions with static branch prediction)
in papers from the last century); I expect that this has improved even
more in the meantime. If the compiler mispredicts, it will schedule
instructions from the wrong path, instructions that will be useless
for execution.
OK, but we just discussed speculative execution. So you can execute
two paths or maybe three.
Post by Anton Ertl
In the end a compiler typically can schedule across a few dozen
instructions, while hardware can schedule across a few hundred.
Compiler scheduling works well for simple loops, and that's where
IA-64 shone, but only doing loops well is not good enough for
general-purpose software.
OK, loops were important for the 360/91, meant for floating point
number crunching. (Only floating point is OoO.)
And small inner loops are usual in those programs. One related feature
is loop mode, where the 91 keeps the instructions in a buffer and
doesn't have to keep fetching them.

But also, the 360/91 was designed to run code written for any
S/360 model. So, no special ordering and even self-modifying
code. Even more, S/360 only has four floating point registers,
so register renaming was important. Instructions had to be
ordered around the reuse of registers, which out-of-order execution
and register renaming could fix.

Okay, so I am not saying that EPIC has to get all the ordering,
but only close enough. So current processors can keep hundreds of
instructions in execution at the same time?
Post by Anton Ertl
Post by gah4
Now, since transistors are cheap now, and one can throw a large
number into reorder buffers and such, one can build really deep
pipelines.
It's interesting that Intel managed to produce their first OoO CPU
(the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
first Itanium) at 25.4M transistors was too large for the 250nm process, and
they had to switch to 180nm (contributing to the delays). So, while
the theory was that the EPIC principle would reduce the hardware
complexity (to allow adding more functional units for increased
performance), in Itanium practice the hardware was more complex, and
the performance advantages did not appear.
I remember lots of stories about how the PPro didn't do well with the
16-bit code still in much of DOS and Windows, enough that it was slower
than the Pentium. I am no longer sure how much OoO the PPro does.
Post by Anton Ertl
Post by gah4
But the reason for bringing this up is that if Intel had a defined
intermediate code, and supplied the back end that used it,
and even more, could update that back end later, that would have
been very convenient for compiler writers.
Something like this happened roughly at the same time with LLVM.
There were other initiatives, but LLVM was the one that succeeded.
There was the Open Research Compiler for IA-64 from Intel and the
Chinese Academy of Sciences.
SGI released their compiler targeted at IA-64 as Open64.
Post by gah4
Even more, design for it could have been done in parallel with the
processor, making both work well together.
Intel, HP, and others worked on compilers in parallel to the hardware
work. It's just that the result did not perform as well for
general-purpose code as OoO processors with conventional compilers.
Post by gah4
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns.
Multiflow and Cydrome built computers for numerical computations (aka
HPC aka (mini-)supercomputing). These tend to spend a lot of time in
simple loops with high iteration counts and have statically
predictable branches. They found compiler techniques like trace
scheduling (Multiflow) that work well for code with predictable
branches, and modulo scheduling (Cydrome) that work well for simple
loops with high iteration counts. The IA-64 and Transmeta architects
wanted to extend these successes to general-purpose computing, but it
did not work out.
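A schematic contrast, with both functions invented for the example: the
first loop is the kind that trace and modulo scheduling handle well;
the second, with a data-dependent access pattern, gives a compile-time
scheduler little to work with.

    #include <stddef.h>

    /* Counted loop, unit stride, no data-dependent branches: easy to
       software-pipeline (modulo-schedule) at compile time. */
    double dot(const double *x, const double *y, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    /* Pointer chasing: each load address depends on the previous load,
       so the compiler cannot schedule far ahead. */
    struct node { struct node *next; double val; };

    double sum_list(const struct node *p)
    {
        double s = 0.0;
        for (; p != NULL; p = p->next)
            s += p->val;
        return s;
    }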
I sometimes work on computational physics problems, the ones with
tight loops of floating-point operations. I believe that some Itanium
systems went into a large cluster somewhere. I got an RX2600
for $100 some years ago, when there were many on eBay.
(You could even buy three at a time.)

But yes, I suspect that for web servers they aren't especially good.
Post by Anton Ertl
Concerning memory access patterns: While they are not particularly
predictable in general-purpose code, most general-purpose code
benefits quite a lot from caches (more than numerical code), so I
don't think that this was a big problem for IA-64.
Some people mention varying latencies due to caches as a problem, but
the choice of a small L1 cache (16KB compared to 64KB for the earlier
21264 and K7 CPUs) for McKinley indicates that average latency was
more of a problem for IA-64 than varying latencies.
Post by gah4
And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]
Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
programs where IA-64 outdid the mainstream.
The Cray series of vector processors seems long gone by now.
Martin Ward
2023-02-02 15:44:17 UTC
Permalink
On 01/02/2023 08:07, Aharon Robbins wrote:
> I've never understood this. Isn't there a chicken and egg problem?
Post by Aharon Robbins
How do we know that the theorem prover is correct and bug free?
A theorem prover generates a proof of the theorem (if it succeeds).
Checking the correctness of a proof is a much simpler task
than finding the proof in the first place and can be carried
out independently by different teams using different methods.
Appel and Haken's proof of the four colour theorem, for example,
involved a significant element of computer checking which was
independently double checked with different programs and computers.
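As a toy illustration (Lean 4, chosen only for the example): finding
the proof terms below may take human or tactic effort, but a small
trusted kernel checks them mechanically, which is why independent
checking is so much cheaper than proof finding.

    -- Finding these proof terms is the hard part; the kernel only has
    -- to check that each term has the stated type.
    theorem two_plus_two : 2 + 2 = 4 := rfl

    theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
      fun h => ⟨h.2, h.1⟩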
Post by Aharon Robbins
[It's a perfectly reasonable question. Alan Perlis, who was my thesis
advisor, never saw any reason to believe that a thousand line proof
was any more likely to be bug-free than a thousand line program.
-John]
Mathematicians publish proofs all the time and only a tiny percentage
of published proofs turn out to have errors. Programmers release
programs all the time and a vanishingly small percentage of these
turn out to be free from all bugs. Alan Perlis may not have been able
to think of a reason why this should be the case, but it is,
nevertheless, the case.

--
Martin

Dr Martin Ward | Email: ***@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
[Computer programs tend to be a lot longer than mathematical proofs. I
realize there are some 500 page proofs, but there are a whole lot of
500 page programs. It is my impression that in proofs, as in programs,
the longer and more complicated they are, the more likely they are to
have bugs. -John]
Aharon Robbins
2023-02-03 08:26:45 UTC
Permalink
Dr. Ward replied to me privately also, but since this went to the group,
I'll say the same thing here that I replied to him.
Post by Martin Ward
Post by Aharon Robbins
I've never understood this. Isn't there a chicken and egg problem?
How do we know that the theorem prover is correct and bug free?
A theorem prover generates a proof of the theorem (if it succeeds).
Checking the correctness of a proof is a much simpler task
than finding the proof in the first place and can be carried
out independently by different teams using different methods.
Appel and Haken's proof of the four colour theorem, for example,
involved a significant element of computer checking which was
independently double checked with different programs and computers.
This tells me what a theorem prover does, and why it's useful. It
does not answer my question: How do we know that the theorem prover
itself is correct and bug free?
Post by Martin Ward
Post by Aharon Robbins
[It's a perfectly reasonable question. Alan Perlis, who was my thesis
advisor, never saw any reason to believe that a thousand line proof
was any more likely to be bug-free than a thousand line program.
-John]
And this is a different point from my question.
Post by Martin Ward
Mathematicians publish proofs all the time and only a tiny percentage
of published proofs turn out to have errors. Programmers release
programs all the time and a vanishingly small percentage of these
turn out to be free from all bugs. Alan Perlis may not have been able
to think of a reason why this should be the case, but it is,
nevertheless, the case.
I don't argue this. But I think mathematical theorems and their
proofs are qualitatively different from large real-world programs.
I do think there's room for mathematical techniques to help
improve software development, but I don't see that done down
in the trenches, and I've been down in the trenches for close
to 40 years.

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
Anton Ertl
2023-02-05 17:59:03 UTC
Permalink
Post by Martin Ward
On 01/02/2023 08:07, Aharon Robbins wrote:
> I've never understood this.
Post by Aharon Robbins
[It's a perfectly reasonable question. Alan Perlis, who was my thesis
advisor, never saw any reason to believe that a thousand line proof
was any more likely to be bug-free than a thousand line program.
-John]
Mathematicians publish proofs all the time and only a tiny percentage
of published proofs turn out to have errors.
Even at face value I would like to see some evidence for this claim.

A comparable statement would be "Computer scientists publish papers
about programs all the time, and only a tiny percentage of these
papers turn out to have errors".

There is a difference between how a mathematical proof is used and how
a program is used.

1) A program is usually fed to a machine for execution. A published
proof is read by a number (>=0) of mathematicians, who fill in a lot
of what the author has left out. If you feed the published proof to
an automatic proof checker (of your choice), how many of the published
proofs need no additions and no changes (bug fixes) before the proof
checker verifies the proof? I guess there is a reason why Wikipedia
has no page on "proof checker", but suggests "proof assistant": "a
software tool to assist with the development of formal proofs by
human-machine collaboration."

2) A program has to satisfy the requirements of its users, while a
published proof is limited to proving a published theorem. One
example of the difference is that undefinedness is totally acceptable
in mathematics, while it is a bug in programs (interestingly, there is
a significant number of compiler writers who take the mathematical
view in what they provide to programmers, but consider a program in
their programming language that exercises that undefined behaviour
to be buggy).
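A standard illustration of the kind of undefinedness at issue (a
schematic sketch; the function name is invented): signed overflow is
undefined in C, so a compiler may assume it never happens, even though
the same source "works" as wrap-around when compiled naively for
two's-complement hardware.

    /* Signed overflow is undefined behaviour in C, so a compiler may
       assume x + 1 > x always holds for signed x and return 1
       unconditionally (the "mathematical" view).  A programmer who
       relied on wrap-around at INT_MAX has, by that view, written a
       buggy program. */
    int always_true(int x)
    {
        return x + 1 > x;
    }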

The size argument that our esteemed moderator provided also has to be
considered.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Spiros Bousbouras
2023-02-05 19:23:36 UTC
Permalink
On Sun, 05 Feb 2023 17:59:03 GMT
Post by Anton Ertl
2) A program has to satisfy the requirements of its users, while a
published proof is limited to proving a published theorem. One
example of the difference is that undefinedness is totally acceptable
in mathematics, while it is a bug in programs
All programmes have de facto limitations, imposed by the hardware,
operating system, etc., on the input they can process correctly. If they
advertise that they will detect some kind of invalid input and fail to
do so, that's a bug. If they make no such claims, then it boils down to
whether one can "reasonably" expect the programme to detect the invalid
input and report it, but what counts as reasonable will generally be a
matter of dispute. Note also that there can be grey areas. For example,
a numerical analysis programme may produce, for some range of inputs,
an output which is not 100% correct but "close enough". But what counts
as close enough depends on many factors.
Post by Anton Ertl
(interestingly, there is
a significant number of compiler writers who take the mathematical
view in what they provide to programmers, but consider that a program
in their programming language that exercises the undefined behaviour
that they provide to programmers to be buggy).
This is a cryptic comment. To the extent that I can guess what you're
talking about from knowing your pet peeves, I don't think what you say
describes the views of compiler writers.