Discussion:
fledgling assembler programmer
Alan Beck
2023-03-21 21:40:18 UTC
Permalink
Hello all,

Hi,

I have started to learn Assembler out of an old book.

It is ancient (2003) but I don't think 8086 programming has changed
much. But the tools have.

I took assembly language in school but dropped out. Now I want another
go at it.

Would someone be my Mentor and answer a ton of questions that would
dwindle out as time went on?

If it's OK, we could do it here. Or netmail

Books are from a bookstore.


Book 1
Assembly Language for the PC 3rd edition, John Socha and Peter Norton.

Book 2
Assembly Language (step by step) Jeff Duntemann. Too Chatty.

I cannot afford a modern book at this time.

That's what I picked up from the thrift store.

These books are dated around the time I was taking machine code in
school and I find it interesting now.

I hope someone picks me up.

I am running linux and using DOSemu

Also a modern DEBUG and a modern Vi

I am also a Ham Radio Operator (45 years)

1:229/426.36

Regards,
Alan Beck
VY2XU
[Please reply directly unless the response is related to compilers. -John]
gah4
2023-03-22 00:23:25 UTC
Permalink
Post by Alan Beck
I have started to learn Assembler out of an old book.
(Hopefully enough related to compilers.)

Not so long after I started learning OS/360 Fortran and PL/I, I found
the compiler option for printing out the generated code in sort-of
assembly language. (Not actually assembleable, though.)

About that time, I also had source listings on microfilm of
the OS/360 Fortran library, and some other Fortran callable
assembly programs. And also, the IBM S/370 Principles
of Operation.

With those, and no actual book meant to teach assembly
programming, I figured it out, and started writing my own
programs, though mostly callable from Fortran or PL/I.

Compilers today don't write out the generated code in the same way,
and there aren't so many libraries around to read. And, personally,
8086 is my least favorite to write assembly code in.

Learning C, and thinking about pointers and addresses, is a good start
toward assembly programming.

In any case, I don't think I have any idea how others learn
programming for any language, and especially not for assembly
programming. I used to read IBM reference manuals, cover to cover.
That was mostly high school years. After that, I figured out how to
use them as reference manuals.

Most of my 80x86 assembly programming in the last
20 years is (re)writing this one program:

rdtsc: rdtsc
ret

When called from C, and returning a 64 bit integer, it returns the Time
Stamp Counter. (Works for 32 bit code, returning in EDX:EAX. 64 bit is
different.)
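
For illustration, the same counter can also be read from C with
GCC-style inline assembly; this is just a sketch, assuming an x86 or
x86-64 target and a GCC-compatible compiler, not necessarily how the
separate assembly file is packaged:

/* Sketch: reading the Time Stamp Counter via GCC-style inline asm.
   RDTSC leaves the low 32 bits in EAX and the high 32 bits in EDX,
   which is also where a 32-bit ABI returns a 64-bit integer. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    uint64_t start = read_tsc();
    /* ... code being timed ... */
    uint64_t end = read_tsc();
    printf("elapsed cycles: %llu\n", (unsigned long long)(end - start));
    return 0;
}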

C programming works so well, that there are only a few
things you can't do in C, and so need assembly programs.
Thomas Koenig
2023-03-22 06:49:31 UTC
Permalink
Post by gah4
Post by Alan Beck
I have started to learn Assembler out of an old book.
At the risk of stating the blindingly obvious: There is more
than one assembler language; each computer architecture has its
own (with extensions over time, too). There are also sometimes
different syntax variants, for example AT&T vs. Intel.

[...]
Post by gah4
Compilers today don't write out the generated code in the same way,
Quite the opposite.

The standard on UNIXy systems is to write out assembly language to
a file, which is then further processed with the actual assembler.
Use "-S" to just generate the foo.s file from foo.c.

Plus, you can disassemble object files and programs with "objdump -d".
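
For example, with a gcc toolchain (the file and function names here
are just placeholders):

/* sum.c -- a tiny function to inspect in generated assembly.
 *
 *   gcc -O2 -S sum.c      # writes human-readable assembly to sum.s
 *   gcc -O2 -c sum.c      # writes object code to sum.o
 *   objdump -d sum.o      # disassembles the object file
 */
long sum(const long *a, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}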
Post by gah4
and there aren't so many libraries around to read.
Not ones written in assembler. But it is possible to download
the source code to many libraries, for example glibc, and then
examine what it is compiled to.

Another possibility would be to use http://godbolt.org, which shows
you the assembler generated for different systems with different options.
(Making sense of it for architectures you are not familiar with may be
difficult, though.) Or build clang/LLVM
yourself and set different options for the architecture.
Post by gah4
And, personally,
8086 is my least favorite to write assembly code in.
I like 6502 even less :-)
Post by gah4
Learning C, and thinking about pointers and addresses, is a good start
toward assembly programming.
That, I agree with. And it helps a lot to also look at the
generated code.

[...]
Post by gah4
C programming works so well, that there are only a few
things you can't do in C, and so need assembly programs.
Bringing it back a bit towards compilers: Reading assembler code is
a good way to see where they generate inefficient or (more rarely)
incorrect code. In some special cases, writing in assembler can
bring benefits of a factor of 2 or even 4 over compiler-generated
code, usually when SIMD is involved.
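
As an illustration of the kind of loop where that shows up (this is
not from the post above; it uses SSE intrinsics rather than raw
assembler and assumes an x86 target with SSE, with n a multiple of 4):

/* Scalar loop vs. a hand-vectorized SSE version of the same loop.
   Comparing the compiler's assembly for add_scalar() with add_sse()
   shows whether the compiler vectorized the scalar code itself. */
#include <xmmintrin.h>

void add_scalar(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

void add_sse(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {          /* 4 floats per step */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
}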

Assembler is a bit like Latin: For most people, there is no need
to speak or write, but one should be able to read it.
gah4
2023-03-22 20:31:41 UTC
Permalink
(snip)
Post by Thomas Koenig
Post by gah4
Compilers today don't write out the generated code in the same way,
Quite the opposite.
The standard on UNIXy systems is to write out assembly language to
a file, which is then further processed with the actual assembler.
Yes, not the same way.

Well, to be sure that this is about compilers, my favorite complaint
is the lost art of small memory compilers. That is, ones that can
run in kilobytes instead of megabytes.

In any case, the OS/360 compilers don't write out assembly code
that an assembler would recognize. It is meant for people.

Some write it out in two columns to save paper.
Labels don't have to agree with what assemblers recognize.
They don't have to be in the same order that they would be for
an assembler, though OS/360 object programs don't have to be
in order, either.

Having not thought about this for a while, I believe they put
in some comments that help human readers, though likely not
what an assembly programmer would say.

Unixy systems might put in some comments, but mostly don't.
Thomas Koenig
2023-03-23 11:26:50 UTC
Permalink
gah4 <***@u.washington.edu> wrote:

[...]
Post by gah4
Well, to be sure that this is about compilers, my favorite complaint
is the lost art of small memory compilers. That is, ones that can
run in kilobytes instead of megabytes.
On the Internet, there is a project for almost everything - in this
case Tiny C, which still seems to be under active development. Or
at least there are still commits at https://repo.or.cz/w/tinycc.git .

However, there is a reason why compilers got so big - there is
always a balance to be struck between compilation speed, compiler
size and optimization.

An extreme example: According to "Abstracting Away the Machine", the
very first FORTRAN compiler was so slow that the size of programs
it could compile was limited by the MTBF of the IBM 704 of around
eight hours.

The balance has shifted over time, because of increasing computing
power and available memory that can be applied to compilation,
and because the ratio of people who use programs to people who use
compilers is higher than ever before. So, in today's environment, there is little
incentive for writing small compilers.

Also, languages have become bigger, more expressive, more powerful,
more bloated (take your pick), which also increases the size
of compilers.
gah4
2023-03-24 21:17:44 UTC
Permalink
On Friday, March 24, 2023 at 7:10:00 AM UTC-7, Thomas Koenig wrote:

(snip about the lost art of small memory compilers.)
Post by Thomas Koenig
On the Internet, there is a project for almost everything - in this
case Tiny C, which still seems to be under active development. Or
at least there are sill commits at https://repo.or.cz/w/tinycc.git .
However, there is a reason why compilers got so big - there is
always a balance to be struck between compilation speed, compiler
size and optimization.
When I was writing the above, I was looking at the Program Logic
Manual for the OS/360 Fortran G compiler.
(G means it is supposed to run in 128K.)

Fortran G was not written by IBM, but contracted out. And is not
(mostly) in assembler, but in something called POP. That is, it
is interpreted by the POP interpreter, with POPcode written using
assembler macros. Doing that, for one, allows reusing the code
for other machines, though you still need to rewrite the code
generator. But also, at least likely, it decreases the size of
the compiler. POP instructions are optimized for things that
compilers need to do.

I also had the source to that so many years ago, but not the
manual describing it.
Post by Thomas Koenig
An extreme example: According to "Abstracting Away the Machine", the
very first FORTRAN compiler was so slow that the size of programs
it could compile was limited by the MTBF of the IBM 704 of around
eight hours.
I remember stories about how well its optimizer worked, when
it was believed that they had to compete in code speed with
experienced assembly programmers. I don't remember anything
about how fast it was.
Post by Thomas Koenig
The balance has shifted over time, because of increasing computing
power and available memory that can be applied to compilation,
and because relatively more people use programs than use compilers
than ever before. So, in today's environment, there is little
incentive for writing small compilers.
I first thought about this when reading about the Hercules project,
an IBM S/370 emulator, and finding that gcc couldn't run in 16MB.
(Well, subtract some for the OS, but it still wouldn't fit.)
Post by Thomas Koenig
Also, languages have become bigger, more expressive, more powerful,
more bloated (take your pick), which also increases the size
of compilers.
OK, the IBM PL/I (F) compiler, for what many consider a bloated
language, is designed to run (maybe not well) in 64K.
At the end of every compilation it tells how much memory was
used, how much available, and how much to keep the symbol table
in memory.
Dennis Boone
2023-03-24 22:51:32 UTC
Permalink
Post by gah4
OK, the IBM PL/I (F) compiler, for what many consider a bloated
language, is designed to run (maybe not well) in 64K.
At the end of every compilation it tells how much memory was
used, how much available, and how much to keep the symbol table
in memory.
It's... 30-some passes, iirc?

De
[Well, phases or overlays but yes, IBM was really good at slicing compilers
into pieces they could overlay. -John]
gah4
2023-03-25 05:44:49 UTC
Permalink
On Friday, March 24, 2023 at 9:13:05 PM UTC-7, Dennis Boone wrote:

(after I wrote)
Post by Dennis Boone
Post by gah4
OK, the IBM PL/I (F) compiler, for what many consider a bloated
language, is designed to run (maybe not well) in 64K.
At the end of every compilation it tells how much memory was
used, how much available, and how much to keep the symbol table
in memory.
It's... 30-some passes, iirc?
[Well, phases or overlays but yes, IBM was really good at slicing compilers
into pieces they could overlay. -John]
It is what IBM calls, I believe, dynamic overlay. Each module specifically
requests others to be loaded into memory. If there is enough memory,
they can stay, otherwise they are removed.

And there are a few disk files used when there is actually
a separate pass. The only one I actually know of: if the preprocessor
is used, it writes a disk file with the preprocessor output.

And as noted, if it is really short on memory, the symbol table
goes out to disk.

Fortran H, on the other hand, uses the overlay system generated
by the linkage editor. When running on a virtual storage system, it is
usual to run the compiler through the linkage editor to remove
the overlay structure. (One of the few linkers that knows how
to read its own output.) Normally it is about 300K, without
overlay closer to 450K.
[Never heard of dynamic overlays on S/360. -John]
gah4
2023-03-25 08:27:18 UTC
Permalink
On Saturday, March 25, 2023 at 12:09:30 AM UTC-7, gah4 wrote:

(snip)
It is what IBM calls, I believe, dynamic overlay. Each module specifically
requests others to be loaded into memory. If there is enough memory,
they can stay, otherwise they are removed.
Traditional overlays are generated by the linkage editor, and have
static offsets determined at link time.

PL/I (F) uses OS/360 LINK, LOAD, and DELETE macros to dynamically
load and unload modules. The addresses are not static. IBM says:

"The compiler consists of a number of phases
under the supervision of compiler control
routines. The compiler communicates with
the control program of the operating
system, for input/output and other
services, through the control routines."

All described in:

http://bitsavers.trailing-edge.com/pdf/ibm/360/pli/GY28-6800-5_PL1_F_Program_Logic_Manual_197112.pdf

They do seem to be called phases, but there are both physical and
logical phases, where physical phases are what are more commonly
called phases. There are way more than 100 modules, but I stopped
counting.

(snip)
[Never heard of dynamic overlays on S/360. -John]
It seems not to actually have a name.
Hans-Peter Diettrich
2023-03-25 12:07:57 UTC
Permalink
Post by gah4
Fortran G was not written by IBM, but contracted out. And is not
(mostly) in assembler, but in something called POP. That is, it
is interpreted by the POP interpreter, with POPcode written using
assembler macros. Doing that, for one, allows reusing the code
for other machines, though you still need to rewrite the code
generator. But also, at least likely, it decreases the size of
the compiler. POP instructions are optimized for things that
compiler need to do.
After a look at "open software" I was astonished by the number of
languages and steps involved in writing portable C code. Also updates of
popular programs (Firefox...) are delayed by months on some platforms,
IMO due to missing manpower on the target systems for checks and the
adaptation of "configure". Now I understand why many people prefer
interpreted languages (Java, JavaScript, Python, .NET...) to simplify
building and distributing their software products.

What's the actual ranking of programming languages? A JetBrains study
for 2022 does not list any compiled language in its first 7 ranks; C++
follows at rank 8.

What does that trend mean to a compiler group? Interpreted languages
still need a front-end (parser) and a back-end (interpreter), but don't
these tasks differ between languages compiled to hardware and languages
that are interpreted?

DoDi
George Neuner
2023-03-26 00:54:26 UTC
Permalink
On Sat, 25 Mar 2023 13:07:57 +0100, Hans-Peter Diettrich
Post by Hans-Peter Diettrich
After a look at "open software" I was astonished by the number of
languages and steps involved in writing portable C code. Also updates of
popular programs (Firefox...) are delayed by months on some platforms,
IMO due to missing manpower on the target systems for checks and the
adaptation of "configure". Now I understand why many people prefer
interpreted languages (Java, JavaScript, Python, .NET...) for a
simplification of their software products and spreading.
Actually Python is the /only/ one of those that normally is
interpreted. And the interpreter is so slow the language would be
unusable were it not for the fact that all of its standard library
functions and most of its useful extensions are written in C.

In practice Java and Javascript almost always are JIT compiled to
native code rather than interpreted. There also exist offline (AOT)
compilers for both.

Many JIT runtimes do let you choose to have programs interpreted
rather than compiled, but running interpreted reduces performance so
much that it is rarely done unless memory is very tight.


.NET is not a language itself but rather a runtime system like the
Java Platform. .NET consists of a virtual machine: the Common
Language Runtime (CLR); and a set of standard libraries. Similarly
the Java Platform consists of a virtual machine: the Java Virtual
Machine (JVM); and a set of standard libraries. Compilers target
these runtime systems.

The .NET CLR does not include an interpreter ... I'm not aware that
there even is one for .NET. There is an offline (AOT) compiler that
can be used instead of the JIT.
Post by Hans-Peter Diettrich
What's the actual ranking of programming languages? A JetBrains study
does not list any compiled language in their first 7 ranks in 2022. C++
follows on rank 8.
What does that trend mean to a compiler group? Interpreted languages
still need a front-end (parser) and back-end (interpreter), but don't
these tasks differ between languages compiled to hardware or interpretation?
The trend is toward "managed" environments which offer niceties like
GC, objects with automagic serialized access, etc., all to help
protect average programmers from themselves ... err, um, from being
unable to produce working software.
Post by Hans-Peter Diettrich
DoDi
George
Hans-Peter Diettrich
2023-03-28 07:21:50 UTC
Permalink
Post by George Neuner
On Sat, 25 Mar 2023 13:07:57 +0100, Hans-Peter Diettrich
Post by Hans-Peter Diettrich
After a look at "open software" I was astonished by the number of
languages and steps involved in writing portable C code. Also updates of
popular programs (Firefox...) are delayed by months on some platforms,
IMO due to missing manpower on the target systems for checks and the
adaptation of "configure". Now I understand why many people prefer
interpreted languages (Java, JavaScript, Python, .NET...) for a
simplification of their software products and spreading.
Actually Python is the /only/ one of those that normally is
interpreted. And the interpreter is so slow the language would be
unusable were it not for the fact that all of its standard library
functions and most of its useful extensions are written in C.
My impression of "interpretation" was aimed at the back-end, where
tokenized (virtual machine...) code has to be brought to a physical
machine, with a specific firmware (OS). Then the real back-end has to
reside on the target machine and OS, fully detached from the preceding
compiler stages.

Then, from the compiler writer's viewpoint, it's not sufficient to define
a new language and a compiler for it; instead it must be placed on top of
some popular "firmware" like Java VM, CLR or C/C++ standard libraries,
or else a dedicated back-end and libraries have to be implemented on
each supported platform.

My impression was that the FSF favors C and ./configure for "portable"
code. That's why I understand that any other way is easier for
implementing really portable software - software that needs no extra
tweaks for each supported target platform, for every single program. Can
somebody shed some light on the current practice of writing portable
C/C++ software, or any other compiled language, that (hopefully) does
not require additional human work before or after compilation for a
specific target platform?

DoDi
Aharon Robbins
2023-03-28 14:42:18 UTC
Permalink
Post by Hans-Peter Diettrich
My impression was that the FSF favors C and ./configure for "portable"
code.
Like many things, this is the result of evolution. Autoconf is well
over 20 years old, and when it was created the ISO C and POSIX standards
had not yet spread throughout the Unix/Windows/macOS world. It and the
rest of the autotools solved a real problem.

Today, the C and C++ worlds are easier to program in, but it's still
not perfect and I don't think I'd want to do without the autotools.
Particularly for the less POSIX-y systems, like MinGW and OpenVMS.
Post by Hans-Peter Diettrich
Can somebody shed some light on the current practice of writing portable
C/C++ software, or any other compiled language, that (hopefully) does
not require additional human work before or after compilation for a
specific target platform?
Well, take a look at Go. The trend there (as in the Python, Java and
C# worlds) is to significantly beef up the standard libraries. Go
has regular expressions, networking, file system, process and all kinds
of other stuff in its libraries, all things that regular old C and C++ code
often has to (or had to) hand-roll. That makes it a lot easier for
someone to just write the code to get their job done, as well as
providing for uniformity across both operating systems and applications
written in Go.

Go goes one step further, even. Following the Plan 9 example, the
golang.org Go compilers are also cross compilers. I can build a Linux
x86_64 executable on my macOS system just by setting some environment
variables when running 'go build'. Really nice.

The "go" tool itself also takes over a lot of the manual labor, such
as downloading libraries from the internet, managing build dependencies
(no need for "make") and much more. I suspect that that is also a
trend.

Does that answer your question?

Arnold
Kaz Kylheku
2023-03-29 18:33:12 UTC
Permalink
Post by Aharon Robbins
Post by Hans-Peter Diettrich
My impression was that the FSF favors C and ./configure for "portable"
code.
Like many things, this is the result of evolution. Autoconf is well
over 20 years old, and when it was created the ISO C and POSIX standards
had not yet spread throughout the Unix/Windows/macOS world. It and the
rest of the autotools solved a real problem.
Today, the C and C++ worlds are easier to program in, but it's still
not perfect and I don't think I'd want to do without the autotools.
Particularly for the less POSIX-y systems, like MinGW and OpenVMS.
Counterpoint: Autotools are a real detriment to GNU project programs.

When a release of a typical GNU program is cut, special steps
are executed to prepare a tarball which has a compiled configure
script.

You cannot just do a "git clone" of a GNU program, and then run
./configure and build. You must run some "make bootstrap" nonsense, and
that requires you to have various Autotools installed, and in specific
versions!

In the past, what I have done to build a GNU program from version
control, as a quick and dirty shortcut, was to find the tarball
which is the closest match to the baseline that I'm trying to build
(e.g. of GNU Make or GNU Awk or whatever). Unpack the tarball over the
repository and run ./configure. Then "git reset --hard" the changes and
rebuild.

Most Autotools programs will not cleanly cross-compile. Autotools is the
main reason why distro build systems use QEMU to create a virtual target
environment with native tools and libraries, and then build the
"cross-compiled" program as if it were native.

Some of the problems are in Autoconf itself. If it knows the program
is being cross-compiled, any test which requires a test program to be
compiled and executed is disabled. Since the output of that configure
test is needed, bad defaults are substituted.
For instance, about a decade and a half ago I helped a company
replace Windriver cruft with an in-house distribution. Windriver's
cross-compiled Bash didn't have job control! Ctrl-Z, fg, bg stuff no
workie. The reason was that it was just cross-compiled straight, on an
x86 build box. It couldn't run the test to detect job control support,
and so it defaulted it off, even though the target machine had
"gnu-linux" in its string. In the in-house distro, my build steps for
bash exported numerous ac_cv_... internal variables to override the bad
defaults.

My TXR language project has a hand-written, not generated, ./configure
script. What you get in a txr-285.tar.gz tarball is exactly what you
get if you do a "git clone" and "git checkout txr-285", modulo
the presence of a .git directory and differing timestamps.

You just ./configure and make.

I have a "./configure --maintainer" mode which will require flex and bison
instead of using the shipped parser stuff, and that's about it.
You don't have to use that to do development work.

There is no incomprehensible nonsense in the build system at all.

None of my configure-time tests require the execution of a program;
for some situations, I have developed clever tricks to avoid that.
For instance, here is a fragment that finds the size of a data type:

printf "Checking what C integer type can hold a pointer ... "

if [ -z "$intptr" ] ; then
  cat > conftest.c <<!
#include <stddef.h>
#include <limits.h>
#include "config.h"

#define D(N, Z) ((N) ? (N) + '0' : Z)
#define UD(S) D((S) / 10, ' ')
#define LD(S) D((S) % 10, '0')
#define DEC(S) { UD(S), LD(S) }

struct sizes {
  char h_BYTE[32], s_BYTE[2];
#if HAVE_SUPERLONG_T
  char h_SUPERLONG[32], s_SUPERLONG[2];
#endif
#if HAVE_LONGLONG_T
  char h_LONGLONG[32], s_LONGLONG[2];
#endif
  char h_PTR[32], s_PTR[2];
  char h_LONG[32], s_LONG[2];
  char h_INT[32], s_INT[2];
  char h_SHORT[32], s_SHORT[2];
  char h_WCHAR[32], s_WCHAR[2];
  char nl[2];
} foo = {
  "\nSIZEOF_BYTE=", DEC(CHAR_BIT),
#if HAVE_SUPERLONG_T
  "\nSIZEOF_SUPERLONG_T=", DEC(sizeof (superlong_t)),
#endif
#if HAVE_LONGLONG_T
  "\nSIZEOF_LONGLONG_T=", DEC(sizeof (longlong_t)),
#endif
  "\nSIZEOF_PTR=", DEC(sizeof (char *)),
  "\nSIZEOF_LONG=", DEC(sizeof (long)),
  "\nSIZEOF_INT=", DEC(sizeof (int)),
  "\nSIZEOF_SHORT=", DEC(sizeof (short)),
  "\nSIZEOF_WCHAR_T=", DEC(sizeof (wchar_t)),
  "\n"
};
!

In this generated program the sizes are encoded as two-digit decimal
strings, at compile time. So the compiled object file will contain
something like "SIZEOF_PTR= 8" surrounded by newlines. The configure
script can look for these strings and get the values out:

if ! conftest_o ; then   # conftest_o is a function to build the .o
  printf "failed\n\n"

  printf "Errors from compilation: \n\n"
  cat conftest.err
  exit 1
fi

The script gets the SIZEOF lines out and evals them as shell
assignments. That's why we avoided SIZEOF_PTR=08; that would become
octal in the shell:

eval $(tr '\0' ' ' < conftest.o | grep SIZEOF | sed -e 's/ *//')

It also massages these SIZEOFs into header file material:

tr '\0' ' ' < conftest.o | grep SIZEOF | sed -e 's/= */ /' -e 's/^/#define /' >> config.h

if [ $SIZEOF_PTR -eq 0 -o $SIZEOF_BYTE -eq 0 ] ; then
  printf "failed\n"
  exit 1
fi

Here is how it then looks in config.h:

#define SIZEOF_BYTE 8
#define SIZEOF_LONGLONG_T 8
#define SIZEOF_PTR 4
#define SIZEOF_LONG 4
#define SIZEOF_INT 4
#define SIZEOF_SHORT 2
#define SIZEOF_WCHAR_T 4

There is a minor cross-compiling complication in txr in that you need
txr to compile the standard library. So you must build a native txr
first and then specify TXR=/path/to/native/txr to use that one for
building the standard lib. Downstream distro people have figured this
out on their own.


--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Aharon Robbins
2023-03-31 07:10:46 UTC
Permalink
Post by Kaz Kylheku
Post by Aharon Robbins
Today, the C and C++ worlds are easier to program in, but it's still
not perfect and I don't think I'd want to do without the autotools.
Particularly for the less POSIX-y systems, like MinGW and OpenVMS.
Counterpoint: Autotools are a real detriment to GNU project programs.
When a release is cut of a typical GNU program, special steps
are executed to prepare a tarball which has a compiled configure
script.
You cannot just do a "git clone" of a GNU program, and then run
./configure and build. You must run some "make bootstrap" nonsense, and
that requires you to have various Autotools installed, and in specific
versions!
This is not inherent in the autotools; it's laziness on the part of the
maintainers. For exactly this reason gawk has a very simple bootstrap.sh
program that simply does a touch on various files so that configure will
run without wanting to run the autotools.
Post by Kaz Kylheku
Most Autotools programs will not cleanly cross-compile. Autotools is the
main reason why distro build systems use QEMU to create a virtual target
environment with native tools and libraries, and then build the
"cross-compiled" program as if it were native.
QEMU wasn't around when the Autotools were first designed and
implemented. Most end users don't need to cross compile either, and it
is for them that I (and other GNU maintainers, I suppose) build my
configure scripts.

Yes, the world is different today than when the autotools were
designed. No, the autotools are not perfect. I don't know of a better
alternative though. And don't tell me CMake. CMake is an abomination,
interweaving configuration with building instead of cleanly separating
the jobs. Not to mention its stupid caching which keeps you from
running a simple "make" after you've changed a single file.
Post by Kaz Kylheku
My TXR language project has a hand-written, not generated, ./configure
script. What you get in a txr-285.tar.gz tarball is exactly what you
get if you do a "git clone" and "git checkout txr-285", modulo
the presence of a .git directory and differing timestamps.
You just ./configure and make.
And for gawk it's ./bootstrap.sh && ./configure && make
where bootstrap.sh only takes a few seconds.
Post by Kaz Kylheku
None of my configure-time tests require the execution of a program;
For some situations, I have developed clever tricks to avoid it.
And why should you, or anyone, be forced to develop such clever tricks?

All of this simply further justifies the approach taken by newer languages,
which is to move all the hard crap into the libraries. The language
developers do all the hard work, instead of the application developers
having to do it. This is great for people who want to just get their
job done, which includes me most of the time. However, and this is a
different discussion, it does lead to a generation of programmers who
have *no clue* as to how to do the hard stuff should they ever need to.

My opinion, of course.

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
Anton Ertl
2023-04-02 08:56:48 UTC
Permalink
Post by Kaz Kylheku
When a release is cut of a typical GNU program, special steps
are executed to prepare a tarball which has a compiled configure
script.
You cannot just do a "git clone" of a GNU program, and then run
./configure and build. You must run some "make bootstrap" nonsense, and
that requires you to have various Autotools installed, and in specific
versions!
And the problem is?

The git repo contains only the source code, useful for developers.
The developers have stuff installed that someone who just wants to
install the program does not necessarily want to install. E.g., in
the case of Gforth, you need an older Gforth to build the kernel
images that contain Forth code compiled to an intermediate
representation. Therefore the tarballs contain a number of generated
(or, as you say, "compiled") files, e.g., the configure script, the
kernel images in case of Gforth, or the C files generated by Bison in
case of some other compilers.

If you go for the "git clone" route rather than building from the
tarball, you don't get these amenities, but have to install all the
tools that the developers use, and have to perform an additional step
(usually ./autogen.sh) to produce the configure file. "make
bootstrap" is unlikely to work, because at that stage you don't have a
Makefile.

I remember "make bootstrap" from gcc, where IIRC it compiles gcc first
(stage1) with the pre-installed C compiler, then (stage2) with the
result of stage1, and finally (stage3) again with the result of
stage2; if there is a difference between stage2 and stage3, something
is amiss.

Anyway, tl;dr: If you just want to do "./configure; make", use the
tarball.
Post by Kaz Kylheku
Most Autotools programs will not cleanly cross-compile. Autotools is the
main reason why distro build systems use QEMU to create a virtual target
environment with native tools and libraries, and then build the
"cross-compiled" program as if it were native.
Clever! Let the machine do the work, rather than having to do manual
work for each package.
Post by Kaz Kylheku
For instance, about a decade and a half ago I helped a company
replace Windriver cruft with an in-house distribution. Windriver's
cross-compiled Bash didn't have job control! Ctrl-Z, fg, bg stuff no
workie. The reason was that it was just cross-compiled straight, on an
x86 build box. It couldn't run the test to detect job control support,
and so it defaulted it off, even though the target machine had
"gnu-linux" in its string. In the in-house distro, my build steps for
bash exported numerous ac_cv_... internal variables to override the bad
defaults.
That's the way to do it.

Your idea seems to be that, when the value is not supplied, instead of
a safe default (typically resulting in not using a feature), one
should base the values on the configuration name of the system. I
think the main problem with that is that for those systems most in
need of cross-compiling the authors of the tests don't know good
values for the configuration variables; for linux-gnu systems I
usually configure and compile on the system.
Post by Kaz Kylheku
For some situations, I have developed clever tricks to avoid it. For
instance, if you want to know the size of a data type:. Here
Great! Now we need someone who has enough time to replace the
AC_CHECK_SIZEOF autoconf macro with your technique, and then a
significant part of the configuration variables that have to be supplied
manually when cross-configuring Gforth will become fully automatic.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Hans-Peter Diettrich
2023-03-31 05:49:46 UTC
Permalink
Post by Aharon Robbins
Post by Hans-Peter Diettrich
My impression was that the FSF favors C and ./configure for "portable"
code.
Like many things, this is the result of evolution. Autoconf is well
over 20 years old, and when it was created the ISO C and POSIX standards
had not yet spread throughout the Unix/Windows/macOS world. It and the
rest of the autotools solved a real problem.
About 20 years ago I could not build any open source program on Windows.
Messages like "Compiler can not build executables" popped up when using
MinGW or Cygwin. I ended up in ./configure in a Linux VM and fixing the
resulting compiler errors manually on Windows. Without that trick I had
no chance to load the "portable" source code into any development
environment for inspection in readable (compilable) form. Often I had
the impression that the author wanted the program not for use on Windows
machines. Kind of "source open for specific OS only" :-(

DoDi
Anton Ertl
2023-04-02 10:04:31 UTC
Permalink
Post by Hans-Peter Diettrich
Often I had
the impression that the author wanted the program not for use on Windows
machines. Kind of "source open for specific OS only" :-(
Whatever we want, it's also a question of what the OS vendor wants.

For a Unix, there were a few hoops we had to jump through to make
Gforth work: e.g., IRIX 6.5 had a bug in sigaltstack, so we put in a
workaround for that; HP/UX's make dealt with files with the same mtime
differently from other makes, so we put in a workaround for that.
Windows, even with Cygwin, puts up many more hoops to jump through;
Bernd Paysan actually jumped through them for Gforth, but a Windows
build is still quite a bit of work, so he does that only occasionally.

It's no surprise to me that other developers don't jump through these
hoops; maybe if someone paid them for it, but why should they do it
on their own time?

As a recent example of another OS, Apple has intentionally reduced the
functionality of mmap() on MacOS on Apple silicon compared to MacOS on
Intel. As a result, the development version of Gforth does not work
on MacOS on Apple Silicon (it works fine on Linux on Apple Silicon).
I spent a day last summer on the MacOS laptop of a friend (an
extremely unpleasant experience) trying to find the problem and fix
it, and I found the problem, but time ran out before I had a working
fix (it did not help that I had to spend a lot of time on working
around things that I missed in MacOS). Since then this problem has
not reached the top of my ToDo list; and when it does, I will go for
the minimal fix, with the result that Gforth on MacOS will run without
dynamic native-code generation, i.e., slower than on Linux.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Hans-Peter Diettrich
2023-04-05 09:23:39 UTC
Permalink
Post by Anton Ertl
For a Unix, there were a few hoops we had to jump through to make
Gforth work: e.g., IRIX 6.5 had a bug in sigaltstack, so we put in a
workaround for that; HP/UX's make dealt with files with the same mtime
differently from other makes, so we put in a workaround for that.
Windows, even with Cygwin, puts up many more hoops to jump through;
Bernd Paysan actually jumped through them for Gforth, but a Windows
build is still quite a bit of work, so he does that only occasionally.
Too bad that not all existing OS are POSIX compatible? ;-)

So my impression still is: have a language (plus library) and an
interpreter (VM, browser, compiler...) on each target system. Then
adaptations to a target system have to be made only once, for each
target, not for every single program.

Even for programs with extreme speed requirements, development can start
from the general implementation, for tests etc., with a version tweaked
for a very specific target system following later, instead of writing a
single-target version in the first place and then doing problematic ports
to many other platforms.

Of course it's up to the software developer or principal to order or
build software for a (more or less) specific target system only, or
software not primarily bound to any platform.

(G)FORTH IMO is a special case because it's (also) a development system.
Building (bootstrapping) a new FORTH system written in FORTH is quite
complicated, in contrast to languages with stand alone tools like
compiler, linker etc. Some newer (umbilical?) FORTH versions also
compile to native code.

DoDi
Anton Ertl
2023-04-05 16:30:31 UTC
Permalink
Post by Hans-Peter Diettrich
Post by Anton Ertl
For a Unix, there were a few hoops we had to jump through to make
Gforth work: e.g., IRIX 6.5 had a bug in sigaltstack, so we put in a
workaround for that; HP/UX's make dealt with files with the same mtime
differently from other makes, so we put in a workaround for that.
Windows, even with Cygwin, puts up many more hoops to jump through;
Bernd Paysan actually jumped through them for Gforth, but a Windows
build is still quite a bit of work, so he does that only occasionally.
Too bad that not all existing OS are POSIX compatible? ;-)
Like many standards, POSIX is a subset of the functionality that
programs use. Windows NT used to have a POSIX subsystem in order to
comply with FIPS 151-2, which was needed to make WNT eligible for
certain USA government purchases. From what I read, it was useful for
that, but not much else.
Post by Hans-Peter Diettrich
So my impression still is: have a language (plus library) and an
interpreter (VM, browser, compiler...) on each target system. Then
adaptations to a target system have to be made only once, for each
target, not for every single program.
You mean: Write your program in Java, Python, Gforth, or the like?
Sure, they deal with compatibility problems for you, but you may want
to do things (or have performance) that they do not offer, or only
offer through a C interface (and in the latter case you run into the
C-level compatibility again).
Post by Hans-Peter Diettrich
Even for programs with extreme speed requirements the development can be
done from the general implementation, for tests etc., and a version
tweaked for a very specific target system, instead of the single target
version in the first place and problematic ports to many other platforms.
Well, if you go that route, the result can easily be that your program
does not run on Windows. Especially for GNU programs: The primary
goal is that they run on GNU. Any effort spent on a Windows port is
extra effort that not everybody has time for.
Post by Hans-Peter Diettrich
(G)FORTH IMO is a special case because it's (also) a development system.
Building (bootstrapping) a new FORTH system written in FORTH is quite
complicated, in contrast to languages with stand alone tools like
compiler, linker etc.
Not really. Most self-respecting languages have their compiler(s)
implemented in the language itself, resulting in having to bootstrap.
AFAIK the problem Gforth has with Windows is not the bootstrapping;
packaging and installation are different than for Unix.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Kaz Kylheku
2023-04-06 08:35:12 UTC
Permalink
Post by Anton Ertl
Post by Hans-Peter Diettrich
Post by Anton Ertl
For a Unix, there were a few hoops we had to jump through to make
Gforth work: e.g., IRIX 6.5 had a bug in sigaltstack, so we put in a
workaround for that; HP/UX's make dealt with files with the same mtime
differently from other makes, so we put in a workaround for that.
Windows, even with Cygwin, puts up many more hoops to jump through;
Bernd Paysan actually jumped through them for Gforth, but a Windows
build is still quite a bit of work, so he does that only occasionally.
Too bad that not all existing OS are POSIX compatible? ;-)
Like many standards, POSIX is a subset of the functionality that
programs use. Windows NT used to have a POSIX subsystem in order to
make WNT comply with FIPS 151-2 needed to make WNT eligible for
certain USA government purchases. From what I read, it was useful for
that, but not much else.
The best POSIX subsystem for Windows is arguably Cygwin. It has
quite rich POSIX functionality. Not only that, but ANSI terminal
functionality: its I/O system contains a layer which translates
ANSI escape sequences into Windows Console API calls.

You can take a program written on Linux which uses termios to put the
TTY in raw mode, and ANSI escapes to control the screen, and it will
work on Cygwin.

One downside is that Cygwin has poor performance
(mainly in the area of file access).

The other downside of Cygwin is that it implements certain conventions
that are at odds with "native" Windows.

In 2016 I started a small project called Cygnal (Cygwin Native
Application Library) to fix problems in this second category,
creating a fork of the Cygwin DLL.

https://www.kylheku.com/cygnal
Post by Anton Ertl
Post by Hans-Peter Diettrich
(G)FORTH IMO is a special case because it's (also) a development system.
Building (bootstrapping) a new FORTH system written in FORTH is quite
complicated, in contrast to languages with stand alone tools like
compiler, linker etc.
Not really. Most self-respecting languages have their compiler(s)
implemented in the language itself, resulting in having to bootstrap.
You can avoid the chicken-and-egg problem that requires bootstrapping by
using a host language to implement an interpreter for the target
language. That interpreter can then directly execute the compiler, which
can compile itself and other parts of the run-time, as needed.

It's still a kind of bootstrapping, but at no point do you need a
pre-built binary of the target language compiler to build that compiler;
you just need an implementation of a host language.

This works quite well when the host language is good for writing
interpreters, and the target one for compiler work, and also when it's
useful to have an interpreter even when compilation is available.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Hans-Peter Diettrich
2023-04-07 13:35:32 UTC
Permalink
Post by Anton Ertl
You mean: Write your program in Java, Python, Gforth, or the like?
Sure, they deal with compatibility problems for you, but you may want
to do things (or have performance) that they do not offer, or only
offer through a C interface (and in the latter case you run into the
C-level compatibility again).
Except the library also is portable ;-)

Else you end up with:
Program runs only on systems with libraries X, Y, Z installed.
Post by Anton Ertl
Post by Hans-Peter Diettrich
(G)FORTH IMO is a special case because it's (also) a development system.
Building (bootstrapping) a new FORTH system written in FORTH is quite
complicated, in contrast to languages with stand alone tools like
compiler, linker etc.
Not really. Most self-respecting languages have their compiler(s)
implemented in the language itself, resulting in having to bootstrap.
The FORTH compiler also is part of the current monolithic framework.
Replacing a WORD has an immediate impact on the currently running
compiler and everything else. A bug can make the current system crash
immediately, without diagnostics. Otherwise the current WORDs cannot be
replaced immediately, only after a full compilation and only by code
that depends on neither the old nor the new framework.
Post by Anton Ertl
AFAIK the problem Gforth has with Windows is not the bootstrapping;
packaging and installation are different than for Unix.
Isn't that the same problem with every language?

DoDi
Thomas Koenig
2023-04-08 18:25:06 UTC
Permalink
Post by Anton Ertl
Most self-respecting languages have their compiler(s)
implemented in the language itself, resulting in having to bootstrap.
This is a bit complicated for GCC and LLVM.

For both, the middle end (and back end) is implemented in C++,
so a C++ interface at class level is required, and that is a
bit daunting.

Examples: Gnat (GCC's Ada front end) is written in Ada, and its
Modula-2 front end is written in Modula-2. On the other hand,
the Fortran front end is written in C++ (well, mostly C with
C++ features hidden behind macros).

The very first Fortran compiler, of course, was written in
assembler.
[It was, but Fortran H, the 1960s optimizing compiler for S/360 was
written in Fortran with a few data structure extensions. -John]

gah4
2023-03-28 21:21:05 UTC
Permalink
On Tuesday, March 28, 2023 at 1:14:29 AM UTC-7, Hans-Peter Diettrich wrote:

(snip)
Post by Hans-Peter Diettrich
Then, from the compiler writer viewpoint, it's not sufficient to define
a new language and a compiler for it, instead it must placed on top of
some popular "firmware" like Java VM, CLR or C/C++ standard libraries,
or else a dedicated back-end and libraries have to be implemented on
each supported platform.
From an announcement here today about an ACM-organized conference:


"We encourage authors to prepare their artifacts for submission
and make them more portable, reusable and customizable using
open-source frameworks including Docker, OCCAM, reprozip,
CodeOcean and CK."

I hadn't heard about those until I read that one, but it does sound
interesting.
Kaz Kylheku
2023-03-29 18:34:53 UTC
Permalink
Post by gah4
(snip)
Post by Hans-Peter Diettrich
Then, from the compiler writer viewpoint, it's not sufficient to define
a new language and a compiler for it, instead it must placed on top of
some popular "firmware" like Java VM, CLR or C/C++ standard libraries,
or else a dedicated back-end and libraries have to be implemented on
each supported platform.
"We encourage authors to prepare their artifacts for submission
and make them more portable, reusable and customizable using
open-source frameworks including Docker, OCCAM, reprozip,
CodeOcean and CK."
"We encourage authors to lock their software to third party boat
anchors, such as ..."


--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
[If you are telling people not to use Docker, that whale sailed a long time ago. -John]
George Neuner
2023-03-28 21:26:45 UTC
Permalink
On Tue, 28 Mar 2023 09:21:50 +0200, Hans-Peter Diettrich
Post by Hans-Peter Diettrich
Post by George Neuner
On Sat, 25 Mar 2023 13:07:57 +0100, Hans-Peter Diettrich
Post by Hans-Peter Diettrich
After a look at "open software" I was astonished by the number of
languages and steps involved in writing portable C code. Also updates of
popular programs (Firefox...) are delayed by months on some platforms,
IMO due to missing manpower on the target systems for checks and the
adaptation of "configure". Now I understand why many people prefer
interpreted languages (Java, JavaScript, Python, .NET...) for a
simplification of their software products and spreading.
Actually Python is the /only/ one of those that normally is
interpreted. And the interpreter is so slow the language would be
unusable were it not for the fact that all of its standard library
functions and most of its useful extensions are written in C.
My impression of "interpretation" was aimed at the back-end, where
tokenized (virtual machine...) code has to be brought to a physical
machine, with a specific firmware (OS). Then the real back-end has to
reside on the target machine and OS, fully detached from the preceding
compiler stages.
That is exactly as I meant it.

Python and Java both initially are compiled to bytecode. But at
runtime Python bytecode is interpreted: the Python VM examines each
bytecode instruction, one by one, and executes an associated native
code subroutine that implements that operation.
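
As a toy picture of that dispatch loop (this is not CPython's actual
VM; the opcodes and the stack handling are invented):

/* Minimal bytecode interpreter: fetch an opcode, dispatch to the
   native routine (here a switch case) that implements it, repeat. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *code)
{
    int stack[64], sp = 0;
    for (int pc = 0; ; ) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];         break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[--sp]);      break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(prog);   /* prints 5 */
    return 0;
}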

In contrast, at runtime Java bytecode is JIT compiled to equivalent
native code - which includes calls to native subroutines to implement
complex operations like "new", etc. The JVM JIT compiles function by
function as the program executes ... so it takes some time before the
whole program exists as native code ... but once a whole load module
has been JIT compiled, the JVM can completely ignore and even unload
the bytecode from memory.
Post by Hans-Peter Diettrich
Then, from the compiler writer viewpoint, it's not sufficient to define
a new language and a compiler for it, instead it must placed on top of
some popular "firmware" like Java VM, CLR or C/C++ standard libraries,
or else a dedicated back-end and libraries have to be implemented on
each supported platform.
Actually it simplifies the compiler writer's job because the
instruction set for the platform VM tends not to change much over
time. A compiler targeting the VM doesn't have to scramble to support
features of every new CPU - in many cases that can be left to the
platform's JIT compiler.
Post by Hans-Peter Diettrich
My impression was that the FSF favors C and ./configure for "portable"
code. That's why I understand that any other way is easier for the
implementation of really portable software, that deserves no extra
tweaks for each supported target platform, for every single program. Can
somebody shed some light on the current practice of writing portable
C/C++ software, or any other compiled language, that (hopefully) does
not require additional human work before or after compilation for a
specific target platform?
Right. When you work on a popular "managed" platform (e.g., JVM or
CLR), then its JIT compiler and CPU specific libraries gain you any
CPU specific optimizations that may be available, essentially for
free.

OTOH, when you work in C (or other independent language), to gain CPU
specific optimizations you have to write model specific code and/or
obtain model specific libraries, you have to maintain different
versions of your compiled executables (and maybe also your sources),
and you need to be able to identify the CPU so as to install or use
model specific code.


For most developers, targeting a managed platform tends to reduce the
effort needed to achieve an equivalent result.
Post by Hans-Peter Diettrich
DoDi
George
[The usual python implementation interprets bytecodes, but there are
also versions for .NET, the Java VM, and a JIT compiler. -John]
George Neuner
2023-03-29 17:50:45 UTC
Permalink
Post by George Neuner
[The usual python implementation interprets bytecodes, but there are
also versions for .NET, the Java VM, and a JIT compiler. -John]
Thanks John. I knew about the reference implementation, but I was not
aware of the others.
George
gah4
2023-03-29 18:27:49 UTC
Permalink
Post by George Neuner
Right. When you work on a popular "managed" platform (e.g., JVM or
CLR), then its JIT compiler and CPU specific libraries gain you any
CPU specific optimizations that may be available, essentially for
free.
For systems like Matlab and Octave, and I believe also for Python,
or one of many higher math languages, programs should spend
most of the time in the internal compiled library routines.

You could write a whole matrix inversion algorithm in Matlab
or Python, but there is no reason to do that. That is the convenience
of built-in matrix operations, and it grows as the matrices get bigger.

In earlier days, there were Linpack and Eispack, and other
Fortran callable math libraries. And one could write a
small Fortran program to call them.

But now we have so many different (more or less) interpreted
math oriented languages, that it is hard to keep track of them,
and hard to know which one to use.
Thomas Koenig
2023-03-31 05:19:14 UTC
Permalink
Post by gah4
For system like Matlab and Octave, and I believe also for Python,
or one of many higher math languages, programs should spend
most of the time in the internal compiled library routines.
They should, but sometimes they don't.

If you run into things not covered by compiled libraries, but which
are compute-intensive, then Matlab and (interpreted) Python run
as slow as molasses, orders of magnitude slower than compiled code.

As far as the projects to create compiled versions of Python
go, one of the problems is that Python is a constantly evolving
target, which can lead to real problems, especially in long-term
program maintenance. As Konrad Hinsen reported, results in
published science papers have changed due to changes in the Python
infrastructure:

http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/

At the company I work for, I'm told each Python project will only
use a certain specified version of Python, which will never be changed for
fear of incompatibilities - they treat each version as a new
programming language :-|

To bring this back a bit towards compilers - a language definition
is an integral part of compiler writing. If

- the specification to be implemented is unclear or "whatever
the reference implementation does"

- the compiler writers always reserve the right for a better,
incompatible idea

- the compiler writers do not pay careful attention to
existing specifications

then the resulting compiler will be of poor quality, regardless of
the cool parsing or code generation that go into it.

And I know very well that reading and understanding language
standards is no fun, but I'm told that writing them is even
less fun.
gah4
2023-03-31 19:41:32 UTC
Permalink
Post by Thomas Koenig
Post by gah4
For system like Matlab and Octave, and I believe also for Python,
or one of many higher math languages, programs should spend
most of the time in the internal compiled library routines.
They should, but sometimes they don't.
If you run into things not covered by compiled libraries, but which
are compute-intensive, then Matlab and (interpreted) Python run
as slow as molasses, orders of magnitude slower than compiled code.
But then there is dynamic linking.

I have done it in R, but I believe it also works for Matlab and
Python, and is the way many packages are implemented. You write a
small C or Fortran program that does the slow part, and call it from
interpreted code.
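
A sketch of such a small C routine, using R's .C() interface (the file
and function names are made up for illustration; the commands in the
comment are the usual way to build and load it from R):

/* dsquare.c -- square a vector in place, called from interpreted R code.
 *
 * From R (names invented for illustration):
 *   R CMD SHLIB dsquare.c            # builds dsquare.so (or .dll)
 *   dyn.load("dsquare.so")
 *   .C("dsquare", x = as.double(1:5), n = as.integer(5))
 *
 * .C() passes R vectors as pointers; the routine works on them directly.
 */
void dsquare(double *x, int *n)
{
    for (int i = 0; i < *n; i++)
        x[i] = x[i] * x[i];
}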

And back to my favorite x86 assembler program:

rdtsc: rdtsc
ret

which allows high resolution timing, to find where the program
is spending too much time. Some years ago, I did this on a program
written by someone else, so I mostly didn't know the structure.
Track down which subroutines used too much time, and fix
just those.

In that case, one big time sink is building up a large matrix one
row or one column at a time, which requires a new allocation and
copy for each time. Preallocating to the final (if known) size fixes that.
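
For illustration, a plain-C sketch of the difference (the behaviour
being mimicked is the interpreted-language one described above; the
names are invented):

/* Growing an array one element at a time, the way repeated
   concatenation does, allocates and copies everything so far at each
   step (about n^2/2 element copies in total); preallocating does one
   allocation and copies each element once. */
#include <stdlib.h>
#include <string.h>

double *grow_one_at_a_time(const double *src, size_t n)
{
    double *buf = NULL;
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        double *bigger = malloc((len + 1) * sizeof *bigger);
        if (!bigger) { free(buf); return NULL; }
        if (len)
            memcpy(bigger, buf, len * sizeof *bigger);  /* copy old data */
        bigger[len] = src[i];
        free(buf);
        buf = bigger;
        len++;
    }
    return buf;
}

double *preallocate(const double *src, size_t n)
{
    double *buf = malloc(n * sizeof *buf);   /* one allocation up front */
    if (buf)
        memcpy(buf, src, n * sizeof *buf);
    return buf;
}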

But then there were some very simple operations that, as you note,
are not included and slow. Small C programs fixed those.
There are complications for memory allocation, which I avoid
by writing mine to assume (require) that all is already allocated.

(snip)
Post by Thomas Koenig
At the company I work for, I'm told each Python project will only
use a certain specified version of Python, which will never be changed for
fear of incompatibilities - they treat each version as a new
programming language :-|
To bring this back a bit towards compilers - a language definition
is an integral part of compiler writing. If
I have heard about that one.

It seems that there are non-backward compatible changes
from Python 2.x to 3.x. That is, they pretty much are different
languages.

Tradition on updating a language standard is to maintain, as much
as possible, backward compatibility. It isn't always 100%, but often
close enough. You can run Fortran 66 programs on new Fortran 2018
compilers without all that much trouble. (Much of the actual problem
comes with extensions used by the old programs.)
[Python's rapid development cycle definitely has its drawbacks. Python 3
is not backward compatible with Python 2 (that's why they bumped the major
version number) and they ended support for Python 2 way too soon. -John]
Anton Ertl
2023-03-31 16:34:24 UTC
Permalink
Post by Hans-Peter Diettrich
My impression was that the FSF favors C and ./configure for "portable"
code. That's why I understand that any other way is easier for the
implementation of really portable software, that deserves no extra
tweaks for each supported target platform, for every single program.
I have not noticed that the FSF has any preference for C, apart from C
being the lingua franca in the late 1980s and the 1990s, and arguably
for certain requirements it still is.

Now C on Unix has to contend with certain portability issues. In early
times C programs contained a config.h that the sysadmin installing a
program had to edit by hand before running make. Then came autoconf,
which generates a configure script that runs certain checks on the
system and fills in config.h for you; and of course, once the mechanism
is there, stuff in other files gets filled in by configure, too.
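
As a small illustration of how such a generated config.h is typically
consumed on the C side: HAVE_MMAP is the kind of macro that a check
like AC_CHECK_FUNCS([mmap]) would define; the fallback shown here is
just made up, and the sketch assumes MAP_ANONYMOUS is available.

    #include "config.h"
    #include <stdlib.h>
    #ifdef HAVE_MMAP
    #include <sys/mman.h>
    #endif

    /* Returns a writable buffer of the requested size, using mmap()
       where configure found it and plain malloc() otherwise. */
    void *get_buffer(size_t size)
    {
    #ifdef HAVE_MMAP
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return (p == MAP_FAILED) ? NULL : p;
    #else
        return malloc(size);    /* degraded but portable fallback */
    #endif
    }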

It's unclear to me what you mean by "any other way is easier". The
way of manually editing config.h certainly was not easier for the
sysadmins. I am not sure whether it was easier for the maintainers
of the programs.
Post by Hans-Peter Diettrich
Can somebody shed some light on the current practice of writing portable
C/C++ software, or any other compiled language, that (hopefully) does
not require additional human work before or after compilation for a
specific target platform?
There are other tools like CMake that claim to make autoconf
unnecessary, but when I looked at it, I did not find it useful for my
needs (but I have forgotten why).

So I'll tell you here some of what autoconf does for Gforth: Gforth is
a Forth system mostly written in Forth, but using a C substrate. Many
system differences are dealt with in the C substrate, often with the
help of autoconf. The configure.ac file describes what autoconf
should do for Gforth; it has grown to 1886 lines.

* It determines the CPU architecture and OS on which the configure
script is running, and uses that to configure some architecture-specific
stuff for Gforth, in particular how to synchronize the data and
instruction caches; later gcc acquired __builtin___clear_cache() to
do that, but at least on some platforms that builtin is broken
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93811>.

* It checks the sizes of the C integer types in order to determine the
C type for Forth's cell and double-cell types (a sketch of this kind
of selection follows after this list).

* It uses the OS information to configure things like the newline
sequence, the directory and path separators.

* It deals with differences between OSs, such as large (>4GB) file
support, an issue relevant in the 1990s.

* It checks for the chcon program, and, if present, uses it to "work
around SELinux brain damage"; if not present, the brain is probably
undamaged.

* It tests which of several ways of skipping code space the assembler
accepts (needed for implementing Gforth's dynamic
superinstructions).

* It checks for the presence of various programs and library functions
needed for building Gforth, e.g. mmap() (yes, there used to be
systems that did not have mmap()). In some cases it works around the
absence, sometimes with degraded functionality; in other cases it
just reports the absence, so the sysadmin knows what to install.
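
Here is the kind of selection the integer-size check above enables,
sketched in C (SIZEOF_LONG, SIZEOF_LONG_LONG and SIZEOF_VOID_P are
the macros that AC_CHECK_SIZEOF would put into config.h; the type
names are illustrative, not Gforth's actual ones):

    #include "config.h"

    #if SIZEOF_LONG == SIZEOF_VOID_P
    typedef long cell_t;                 /* a cell is one machine word */
    typedef unsigned long ucell_t;
    #elif SIZEOF_LONG_LONG == SIZEOF_VOID_P
    typedef long long cell_t;
    typedef unsigned long long ucell_t;
    #else
    #error "no integer type matches the pointer size"
    #endif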

That's just some of the things I see in configure.ac; there are many
bits and pieces that are too involved and/or too minor to report here.

Our portability stuff does not catch everything. E.g., MacOS on Apple
Silicon has a broken mmap() (broken as far as Gforth is concerned;
looking at POSIX, it is compliant, but that does not justify the
breakage; MacOS on Intel works fine, as does Linux on Apple
Silicon), an issue that is new to us. I have not yet devised a
workaround for that, but when I do, part of the solution may use
autoconf.

Now when you write Forth code in Gforth, it tends to be quite portable
across platforms (despite Forth being a low-level language where, if
you go looking, it is easy to see the differences between 32-bit and
64-bit systems and between different byte orders). One reason for
that is that Gforth papers over system differences (with the help of
autoconf among other things); another reason is that Gforth does not
expose many of the things where the systems are different, at least
not at the Forth level. You can use the C interface and then access
all the things that C gives access to, many of which are
system-specific, and for which tools like autoconf exist.

The story is probably similar for other languages.

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Aharon Robbins
2023-03-23 13:56:23 UTC
Permalink
Post by Thomas Koenig
Not ones written in assembler. But it is possible to download
the source code to many libraries, for example glibc, and then
examine what it is compiled to.
Getting more and more off topic, but I can't let this go.

Glibc is a S W A M P. A newbie who wanders in will drown and never
come out. Even if you are a very experienced C programmer, you don't
want to go there.

Learning assembler in order to understand how machines work is valuable.
Long ago I learned PDP-11 assembler, which is still one of the cleanest
architectures ever designed. I was taking a data structures course at
the same time, and recursion didn't click with me until I saw how it
was done in assembler.

My two cents,

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
[I must admit that when I write C code I still imagine there's a
PDP-11 underneath. -John]
Anton Ertl
2023-03-22 10:02:15 UTC
Permalink
Post by gah4
Not so long after I started learning OS/360 Fortran and PL/I, I found
the compiler option for printing out the generated code in sort-of
assembly language. (Not actually assembleable, though.)
...
Post by gah4
Compilers today don't write out the generated code in the same way,
Unix (Linux) compilers like gcc usually write assembly-language code
if you use the option -S. This code can be assembled, because AFAIK
that's the way these compilers produce object code.
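
A tiny made-up example of that workflow (the file and function names
are invented; the exact assembly output depends on the target,
compiler version and options):

    /* square.c -- something small enough that the generated assembly
       is easy to read.

         gcc -S -O2 square.c     writes the assembly to square.s
         gcc -c square.s         assembles square.s into square.o
         objdump -d square.o     disassembles the object file again
    */
    int square(int x)
    {
        return x * x;
    }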

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
David Brown
2023-03-22 13:39:59 UTC
Permalink
Post by Alan Beck
//Hello all,//
Hi,
I have started to learn Assembler out of an old book.
It is ancient (2003) but I don't think 8086 programming has changed
much. But the tools have.
I took assembly language in school but dropped out. Now I want another
go at it.
Would someone be my Mentor and answer a ton of questions that would
dwindle out as time went on?
If it's OK, we could do it here. Or netmail
Books are from a bookstore.
I have both these books on my bookshelf - but it was a /long/ time ago
that I read them.

The big question here is /why/ you are doing this. The 8086 is ancient
history - it has been totally irrelevant for a couple of decades at
least. Modern PCs use x86-64, which is a very different thing. You
don't learn modern Spanish by reading an old Latin grammar book, even
though Spanish is a Latin language.

There are, perhaps, four main reasons for being interested in learning
to write assembly:

1. You need some very niche parts of a program or library to run as
fast as feasible. Then you want to study the details of your target
processor (it won't be an 8086) and its instruction set - typically
focusing on SIMD and caching. Done well, this can lead to an order-of-
magnitude improvement for very specific tasks - done badly, your
results will be a lot worse than you'd get from a good compiler with
the right options. The "comp.arch" newsgroup is your first port of
call on Usenet for this.

2. You need some very low-level code for things that can't be
expressed in a compiled language, such as task switching in an OS.
Again, you need to focus on the right target. "comp.arch" could be a
good starting point here too.

3. You are working on a compiler. This requires a deep understanding of
the target processor, but you've come to the right newsgroup.

4. You are doing this for fun (the best reason for doing anything) and
for learning. You can get a long way towards understanding (though not
writing) assembly by looking at the output of your favourite compilers
for your favourite targets and favourite programming languages on
<https://godbolt.org>. Here I would pick an assembly language that is
simple and pleasant - 8086 is neither.

I would recommend starting small, for example with the AVR
microcontroller family. The instruction set is limited, but fairly
consistent and easy to understand. There are vast amounts of learning
resources in the Arduino community (though most Arduino development is
in C or C++), and you can buy an Arduino kit cheaply. Here you can
write assembly code that actually does something, and the processor
ISA is small enough that you can learn it /all/.


If none of that covers your motivation, then give some more details of
what you want to achieve, and you can probably get better help.
(comp.arch might be better than comp.compilers if you are not interested
in compilers.)
George Neuner
2023-03-22 18:54:49 UTC
Permalink
... I don't think 8086 programming has changed
much. But the tools have. ...
Would someone be my Mentor and answer a ton of questions that would
dwindle out as time went on?
Assembler is mostly off-topic here in comp.compilers, but
comp.lang.asm.x86 will be open to pretty much any question regarding
80x86 assembler.
[Please reply directly unless the response is related to compilers. -John]