What should I prepare for a PL PhD program

Tianbo Hao

2022-10-18 06:52:51 UTC

Hi there,

I am self-studying compilers these days and find the subject really
interesting. I would love to apply for a PhD in PL. Can anyone kindly give
me some suggestions for how to prepare?

Best,
Tianbo

Nuno Lopes

2022-10-19 08:30:07 UTC

I think the best thing you can do to prepare is to search for a good
advisor. That's already a hard task in general, but in PL/compilers even
more so as there aren't that many out there.

PL is very broad, so I don't know exactly what you like, but I suggest you
start participating in an open-source project to learn how production code
is developed and to improve your own coding skills. Google Summer of Code
projects, at least, usually have mentors available to bootstrap
students.
Knowing really well how production code is developed is always a plus for
any PL researcher. Most students have no clue.

Good luck,
Nuno

Hans-Peter Diettrich

2022-10-20 09:30:40 UTC

Post by Nuno Lopes
PL is very broad, so I don't know exactly what you like, but I suggest you
start participating in an open-source project to learn how production code
is developed and to improve your own coding skills.

Open source projects often direct newbies to their to-do list. That's
the "hall of shame" with problems beyond the skills or knowledge of the
"core" developers.

DoDi

Spiros Bousbouras

2022-10-21 03:37:47 UTC

On Thu, 20 Oct 2022 11:30:40 +0200

Post by Hans-Peter Diettrich

Open source projects often direct newbies to their to-do list. That's
the "hall of shame" with problems beyond the skills or knowledge of the
"core" developers.

First, participating in an open source project doesn't mean that one waits
to be "directed". One can simply notice a bug and send a fix or add some new
functionality. Second, issues on the to-do list may be stuff that core
developers don't know how to do, or it may be stuff that they haven't got
around to doing. In my experience it is almost always the latter.

Thomas Koenig

2022-10-21 21:32:06 UTC

Post by Spiros Bousbouras
On Thu, 20 Oct 2022 11:30:40 +0200

Post by Hans-Peter Diettrich

Open source projects often direct newbies to their to-do list. That's
the "hall of shame" with problems beyond the skills or knowledge of the
"core" developers.

A quote from the "Quickstart Guide to Hacking GFortran":

What kind of PR to start with

[...]

Traditionally, internal compiler errors on invalid code have been
considered relatively easy. But you may always find a hard one...

gah4

2022-10-21 23:56:37 UTC

On Friday, October 21, 2022 at 4:46:28 PM UTC-7, Thomas Koenig wrote:

(snip)

Post by Thomas Koenig
Traditionally, internal compiler errors on invalid code have been
considered relatively easy. But you may always find a hard one...

Reminds me of the story (I never saw anyone do it) that in the early days
of compilers, people would feed cards from the card recycle bin to them.

That is, as you note, a test of (most likely) invalid code.

But now that we don't have card recycle bins, where do you get a good
selection of invalid code to test them with?
[I've seen reports where people fed strings of random bytes into various
kinds of programs to see what happened. Almost invariably they crashed.
On the other hand, I recall a story from the 1960s in which someone reported
that an IBM Fortran compiler failed on a source card that contained an
obscure, hard-to-punch invalid special character that the compiler happened
to use as an internal delimiter. The quite reasonable reply was "Don't do that."
-John]

Thomas Koenig

2022-10-22 08:49:51 UTC

Post by gah4
(snip)

Post by Thomas Koenig
Traditionally, internal compiler errors on invalid code have been
considered relatively easy. But you may always find a hard one...

There appears to be an art to it. At least one frequent submitter
of bug reports to gfortran has mastered it, but I don't know how he
does it (and haven't asked).

An automated code generator which generates valid programs according
to the syntax and semantic rules of a language and then systematically
violates the rules (especially those prescribed outside the formal
grammar) one by one might be possible. Alternatively, it might
also be feasible to parse an existing code base and systematically
insert violations there.

I am not sure that research has been done on that, but it would
be interesting.
[I believe I have seen both random program generators and "fuzzers"
that make random changes to programs. This isn't just for compilers,
it's for anything that is supposed to interpret its input. -John]
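
A minimal sketch of the second idea (take existing valid code and insert
violations one at a time), not taken from any tool mentioned in this thread.
It assumes Python's built-in compile() as a stand-in for the compiler under
test. The names pieces, single_deletions and classify are made up for
illustration, and deleting one token is only the crudest possible violation.

```python
import re

# Hypothetical stand-in for "an existing code base": one small valid program.
VALID_SOURCE = "def area(w, h):\n    return w * h\n"


def pieces(source):
    """Split source into whitespace runs, words and single punctuation marks."""
    return re.findall(r"\s+|\w+|[^\w\s]", source)


def single_deletions(source):
    """Yield every variant of source with exactly one non-blank piece removed."""
    parts = pieces(source)
    for i, part in enumerate(parts):
        if not part.isspace():
            yield "".join(parts[:i] + parts[i + 1:])


def classify(source):
    """Report how the compiler under test (here: compile()) handles source."""
    try:
        compile(source, "<mutant>", "exec")
    except SyntaxError:
        return "rejected"   # a clean diagnostic, the desired result for invalid input
    except Exception:
        return "crashed"    # anything else plays the role of an internal compiler error
    return "accepted"       # the mutant happened to remain valid


if __name__ == "__main__":
    for mutant in single_deletions(VALID_SOURCE):
        print(classify(mutant), repr(mutant))
```

Some mutants stay valid (dropping "return" leaves a legal expression
statement), so a real tool would need to know which verdict to expect for
each violation. That is where the rules outside the formal grammar come in.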

Hans-Peter Diettrich

2022-10-22 23:27:18 UTC

Post by Thomas Koenig
An automated code generator which generates valid programs according
to the syntax and semantic rules of a language and then systematically
violates the rules (especially those prescribed outside the formal
grammar) one by one might be possible. Alternatively, it might
also be feasible to parse an existing code base and systematically
insert violations there.

Isn't it good practice to maintain a test suite at least for compilers,
that contains both selected valid and invalid code snippets?

For error reports on obviously weird input I'd prepare an equally weird
answer ;-)

DoDi

gah4

2022-10-23 07:00:54 UTC

On Saturday, October 22, 2022 at 7:51:45 PM UTC-7, Hans-Peter Diettrich wrote:

(snip)

Post by Hans-Peter Diettrich
Isn't it good practice to maintain a test suite at least for compilers,
that contains both selected valid and invalid code snippets?
For error reports on obviously weird input I'd prepare an equally weird
answer ;-)

The only one I know that actually, really, did that was TeX.
There is a test program that is supposed to execute all code
except for fatal errors.

To do that, you need some type of flow analysis to figure out which
parts of the program are, and especially are not, being executed.
Then figure out what to add to the input to execute those that aren't.

I don't think I am quite as good now, but in my early programming days,
I had a tendency to try out features just to try them out, and often enough
found bugs that no one had thought about before.

The only one I can remember now is using ++ in C on a double variable.
There is no rule against using it on floating point types, but it seems that
compiler writers aren't so good at testing it.
[There are certainly code coverage tools that are supposed to let you exercise
all of the code in a program. Again, not just compilers. -John]
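
A toy sketch of the bookkeeping side of this: recording which lines of a
function a given input actually executes, so that the untested ones stand
out. It uses dynamic tracing through Python's sys.settrace hook rather than
flow analysis. The names executed_lines and sign are invented for the
example, and this is not a description of how the TeX test was built.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the set of line offsets (relative to the
    'def' line) that were executed inside func itself."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Only record 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return seen


def sign(x):
    if x < 0:
        return "negative"
    return "non-negative"


if __name__ == "__main__":
    only_positive = executed_lines(sign, 1)
    both = only_positive | executed_lines(sign, -1)
    print("missed without a negative input:", sorted(both - only_positive))
```

The offsets that show up only after adding the negative input are the part
of sign that the first test never reached. Working out which input to add is
the hard part and is not attempted here.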

Thomas Koenig

2022-10-23 09:28:43 UTC

Post by Hans-Peter Diettrich

Isn't it good practice to maintain a test suite at least for compilers,
that contains both selected valid and invalid code snippets?

Very much so.

For example, gcc requires two things for a patch: There needs to
be a test case to make sure that the bug is fixed, or the added
feature keeps on working, and the submitter needs to run the
testsuite to make sure that all other tests continue to work.

This is a good approach, but it has the same basic limitations as
all testsuites: they are never complete, and they also contain
compiler-specific code and some errors.
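
As a rough illustration of such a suite (not how the gcc testsuite is
actually organised), the sketch below pairs each snippet with the verdict
the compiler is expected to give. Python's built-in compile() stands in for
the compiler, and CASES, accepts and run_suite are invented names.

```python
# Each case: (description, source text, should the compiler accept it?)
CASES = [
    ("simple assignment", "x = 1 + 2\n", True),
    ("function definition", "def f(a):\n    return a\n", True),
    ("unbalanced parenthesis", "x = (1 + 2\n", False),
    ("return outside function", "return 1\n", False),
]


def accepts(source):
    """Return True if compile() accepts source, False on a syntax error."""
    try:
        compile(source, "<test>", "exec")
    except SyntaxError:
        return False
    return True


def run_suite(cases):
    """Return the descriptions of all cases whose verdict is not the expected one."""
    return [name for name, source, expected in cases if accepts(source) != expected]


if __name__ == "__main__":
    print("failures:", run_suite(CASES))
```

A patch that fixes a bug would add one more entry to CASES, and the whole
list would be rerun before the patch is submitted.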

Post by Hans-Peter Diettrich
For error reports on obviously weird input I'd prepare an equally weird
answer ;-)

If you have seen enough strange bug reports, it might be equally
challenging to come up with something new as to actually fix the
bug :-)

gah4

2022-10-19 08:33:19 UTC

Post by Tianbo Hao
I am self-studying compilers these days and find the subject really
interesting. I would love to apply for a PhD in PL. Can anyone kindly give
me some suggestions for how to prepare?

It seems that CMU usually has a compiler class, but not this year:

https://csd.cmu.edu/course-profiles/15-411_611-compiler-design

A compiler class used to be part of the normal CS coursework,
but it seems less common now.

PhD work is less obvious, but I suspect that CMU is one of the
more likely places to find it.

Anton Ertl

2022-10-20 07:33:52 UTC

I recommend reading proceedings of programming language and compiler
conferences, e.g., PLDI and OOPSLA. This gives you knowledge of
current research topics and of who works on what.

And of course you should read "The researcher's bible":

https://www.research.ed.ac.uk/en/publications/the-researchers-bible

- anton
--
M. Anton Ertl
***@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/