Log in

No account? Create an account
19 May 2009 @ 10:11 pm
Commit message formatting  
(or: GNU coding standards considered harmful)

I've recently been watching GCC and GDB development, and every time I have reason to go looking through their commits, the total lack of useful information in the commit messages really strikes me.

Take this commit message for example, reproduced below for your convenience:

Log message:
	* fork-child.c: Don't include frame.h.  Include terminal.h.
	(fork_inferior): Call new_tty_postfork after forking adn adding
	the child to the inferior list.
	* inferior.h (new_tty_prefork, gdb_has_a_terminal): Don't declare
	* inflow.c (struct terminal_info): Remove const qualifier from
	`run_terminal' field.
	(inferior_thisrun_terminal): Tweak comment.
	(inflow_inferior_exit): Release the `run_terminal' field.
	(copy_terminal_info): New function.
	(new_tty_postfork): New function.
	* terminal.h (new_tty_prefork, new_tty, new_tty_postfork,
	(copy_terminal_info, gdb_has_a_terminal, gdb_setpgid): Declare.
	* inf-ptrace.c: Include terminal.h.
	(inf_ptrace_follow_fork): Copy the parent's terminal info to the
	* linux-nat.c: Include terminal.h.
	(linux_child_follow_fork): Copy the parent's terminal info to the
	* inf-ttrace.c: Include terminal.h.
	(inf_ttrace_follow_fork): Copy the parent's terminal info to the

Okay, so quick quiz: what does this change do? Does it fix a bug? Add a feature? Can you tell from this message? If you get lost in the useless meaningless text, well, me too.

What if instead of showing you that, I showed you this email and this email about an earlier part of the same change. Isn't that much better? Wouldn't it be nice if I'd shown you that to begin with?

Before you think I'm just picking on the maintainers of gdb, let me now point you to the GNU coding standards changelog guidelines, which they're faithfully following.

There's no need to describe the full purpose of the changes or how they work together. If you think that a change calls for explanation, you're probably right. Please do explain it—but please put the explanation in comments in the code, where people will see it whenever they see the code. For example, “New function” is enough for the change log when you add a function, because there should be a comment before the function definition to explain what it does.

Let's think about this...gdb (and all the rest of the GNU projects) are using a version control system. Even one as poor as CVS can tell me that a function is new, when it was added, and who did it. So exactly what value is putting "(new_tty_postfork): New function." in the changelog? Absolutely none. Did I really need the author to tell me that a new .h file was being included now? No, of course not, even if that might be interesting for some reason, it's utterly obvious from the diff.

The GNU coding standard is being actively harmful here, by recommending that people put useless junk in the change message, instead of an explanation of the change. It's really rather unfortunate.

What do I want to see in a commit message? Well, it seems to me that basically everyone except the GNU project has this well figured out already, but:

At a high level: what does the change do (implementation), and what is it for (reason). Include both a one-line summary of the change, and then some prose describing it in more detail. Try to keep it high level: in most cases, the details about each individual thing you touched in your change isn't necessary. I should be able to get that from the diff directly.
fuhm on May 20th, 2009 10:41 pm (UTC)
What they said.
I agree with the other two replies to your comment.

I will note that I find a properly-written, informative, high-level change log file quite useful -- it's often nice to see what changed between the last version of software you had installed and the current one, without getting into the nitty-gritty of the code. It should be more concise than VC commit messages, and targeted towards someone who uses your software, rather than someone who knows the code inside-out. For example, the Twisted NEWS file

But the GNU ChangeLog file as it stands provides zero value, and its existence causes an overall negative by getting in the way of something more useful.
Brooksbrooksmoses on May 20th, 2009 11:14 pm (UTC)
Re: What they said.
A "high-level change log file" is a different thing from the GNU ChangeLog file, and meeting a different need, though. So, yes, useful -- but, IMO, irrelevant to this question.

I think we're arguing at the wrong level, though; we're working from different suppositions.

The GNU rules are based on postulating the use-case that (a) many people will be getting source distributions in the form of sets of files, rather than in the form of repository dumps, and (b) many of those people will want to have some ability to track the historical changes in the files. Oh, and (c) these files may have taken a circuitous path to arrive at the end user and there may be no single canonical repository that represents them.

This use case happens in actual practice. For instance, a commercial variant of GCC will typically start out with a set of files from the FSF, and then make various changes and so forth to those files to apply local patches (such as backporting things to older versions that the FSF considers "frozen"). Those local patches will very likely be made within the company's version-control system, and their internal commit logs may reference proprietary information about their customers or for other reasons may be something they don't wish to allow public access to. And so "use CVSweb" or "use git-blame" or whatever is not a workable answer there for anyone outside the company. However, if the company has maintained a GNU-like ChangeLog file, that will be usable by anyone who receives a copy of the source files.

Now, I think if you agree that that use case should be supported, then you can pretty much derive the existence of either a ChangeLog file such as it currently exists in the GNU coding guidelines, or something pretty much equivalent.

If you don't agree with that use case, then I think the place to have a profitable debate is about whether or not it's a reasonable use case to support, not about the ChangeLog files that do or do not come out of the decision to or not to support it.

Edited at 2009-05-20 11:16 pm (UTC)
fuhm on May 21st, 2009 12:33 am (UTC)
Re: What they said.
Yeah, I don't see that as a valuable use case. Certainly not enough value for developers to spend their precious time manually writing down what is just about as useful as an automated summary of the diff output would be. The time people spend writing the ChangeLog entry would be much more profitably used writing a useful commit message.

Furthermore, if I'm getting a source dump from a commercial vendor, I certainly will also want to see why changes were made. The "what" is almost entirely useless without the "why". How would I know whether I should merge that change back into upstream (or into my branch), without the reason it was done? Do their GNU-style ChangeLog entries really get me that much more value than a diff -r upstream-source commercial-vendor-source would?

On the other hand, they're probably happier just providing me the GNU changelog format, because it's less useful to me. (I have to suppose that releasing source in a useless form is their goal if they're only providing a full dump rather than patch sets. And since not providing reasoning behind changes is the right thing to do according to GNU, even better, they don't even have to look evil!)

That said, I certainly wouldn't mind if developers would provide a GNU-style ChangeLog in addition to useful commit messages. The problem right now is that people write GNU-style ChangeLogs instead of commit messages. If you can get both, fine with me; I can skip right over it and just read the prose!