24 September 2007 @ 05:14 pm
So I want to write a setuid program. And I want it to not eat the environment (namely, LD_LIBRARY_PATH). It seems that this is impossible to do on Linux.

Now, you might reasonably ask why I would want to do such a thing.

Here's an outline of what I want:

a) User invokes setuid program, giving as arguments another program they want to execute.
b) setuid program does some stuff as root (let's say, for illustrative purposes, setting niceness level to -10)
c) setuid program drops privileges
d) setuid program calls exec, passing the user-specified program.
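To make that concrete, here's a sketch of the wrapper logic in Python (the real tool would be a small C program, but the sequence of calls is the same; the -10 niceness is just the illustrative value from above):

```python
import os
import sys

def main(argv):
    # (b) Do the privileged work while we (notionally) still have euid 0.
    # Setting a negative niceness needs root; fall through if we don't
    # have it, so this sketch also runs unprivileged.
    try:
        os.setpriority(os.PRIO_PROCESS, 0, -10)
    except PermissionError:
        print("not root: leaving niceness alone", file=sys.stderr)
    # (c) Drop privileges back to the invoking user, permanently.
    os.setuid(os.getuid())
    # (d) Exec the user-specified program.  The current environment --
    # which is where LD_LIBRARY_PATH *should* still be -- is inherited.
    os.execvp(argv[0], argv)

# Usage would be: wrapper PROGRAM [ARG...]
```

Run as root with the setuid bit, step (b) succeeds; run unprivileged, it degrades to a plain exec wrapper.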

So, see, I'd really like LD_LIBRARY_PATH to pass through my setuid app to the process it execs as the originally invoking user.

Here's what I found:

ld.so eats all the linker environment variables before even starting my program. Okay, surely it'd be better to ignore them instead of removing them, but whatever, I'll just link my program statically, that ought to solve the problem, right?

NOPE. I lose. In the name of security, the statically-linked-program startup code also erases the environment variables. Apparently there was a security hole at one point with some statically linked suid program calling exec without passing an explicit sane environment. The program it exec'd, if dynamically linked, would of course use the LD_LIBRARY_PATH in the environment, since it wasn't suid. Oops, instant root vuln. So to fix this, even statically linked programs cleanse their environment.

But one silver lining: in the statically linked case, it's actually glibc startup code which is eating the environment, which theoretically I should be able to override in some fashion. All I have to do is take control of the startup sequence before glibc cleanses the environment, make a copy, and continue the normal startup sequence. I thought perhaps I could define _start myself, or something along those lines, but I can't manage to get it to work.
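For what it's worth, there may be a way to get at the original environment that sidesteps the startup code entirely: on Linux the kernel keeps its own snapshot of the environment passed to execve() in /proc/self/environ, and libc's scrubbing doesn't touch that file. A sketch (I haven't verified this inside an actual setuid binary, where access to /proc/self can be restricted):

```python
def original_environ():
    # /proc/self/environ is the kernel's snapshot of the environment as
    # it was at execve() time; libc scrubbing never touches it.  Entries
    # are NUL-separated "NAME=value" strings.
    with open("/proc/self/environ", "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        if entry:
            name, _, value = entry.partition(b"=")
            env[name.decode(errors="surrogateescape")] = \
                value.decode(errors="surrogateescape")
    return env
```

If that works for setuid binaries too, the wrapper could pull LD_LIBRARY_PATH back out of it just before the exec.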

Can anyone help me?
18 September 2007 @ 10:26 pm
I'd like to tell you a story today. (Okay, maybe that's a lie. It's really more of a rant.)

Perhaps you're familiar with the program "find". You can (ahem) find it on pretty much any unix system since the dawn of time. This story is about one particular "find" implementation: GNU find, and in particular, two of the predicates it supports: -path, and -ipath.

Sometime in 1993, GNU find 3.8 was released. This release had a few new interesting command line arguments, including "-path" and "-ipath".

On Dec 30, 1993, NetBSD got -path (with comment: "Merged our bugfixes with the 4.4BSD find from uunet")

On May 27, 1994, the history for find in FreeBSD's cvs repository starts. It had -path, but not -ipath.

On Feb 23, 2001, FreeBSD find added -ipath (amongst others)

Okay, now on to the fun part

On Aug 8, 2004, GNU find deprecates -path and -ipath at the request of Richard Stallman, and at the same time, adds two new names for the same functionality: -wholename and -iwholename. Why? Because he didn't like the names(!). The maintainer of find went along with the request. Furthermore, he states "Use of the predicate -ipath generates a warning about the deprecated status of -ipath. Use of the predicate -path does not, since -path is also implemented by the HP-UX operating system." The deprecation and new predicates make it into GNU find 4.2.0.

Okay... so, let me get this straight: it's considered important, in 2004, to keep compatibility with HP-UX, a dead commercial unix, and thus find doesn't print a warning for the now-deprecated "-path" predicate. But apparently nobody cares about keeping compatibility with 10 years' worth of GNU find syntax (and, not coincidentally, 3 years of history in FreeBSD find), so it's okay to print annoying warnings about -ipath? Wow, just wow. Nice priorities there. 'Cause, you know, there's nobody who'd ever want to write a script which works on GNU find 4.1 and GNU find 4.2 at the same time.

Oh, but it gets even better, now.

On Aug 17, 2007, a bug is reported against find:

The next revision of POSIX (at least, as of draft 3 of POSIX 200x, freely available to Austin group members), will mandate the addition of the -path expression.

Right now, GNU find has -path == -wholename, and -ipath == -iwholename. Of these four expressions, only -path is (going to be) POSIX-mandated, while only -ipath causes a deprecation warning. Perhaps it is time to reverse this, and make both -path and -ipath be warning-free, and instead make -wholename and -iwholename issue deprecation warnings (-wholename because the same thing can be achieved via a POSIX-mandated alternative, similar to how -d is deprecated in favor of POSIX-mandated -depth; and -iwholename for consistency with -wholename).

The patch was applied on 22 August 2007, to be released in GNU find 4.3.9.

On Sep 18, 2007, I installed Debian etch on a system, which has GNU find 4.2.28 rather than 4.1.20, and it started printing loads of warnings: "find: warning: the predicate -ipath is deprecated; please use -iwholename instead.". I investigated the reason the change was made, and got even more irritated. I then found that GNU find 4.3.9 reverses the change, and got more irritated still.

And finally, one last gem: GNU find 4.2 also introduced new extensions: "-warn" and "-nowarn" to "Turn warning messages on or off. The default behaviour corresponds to -warn if standard input is a tty, and to -nowarn otherwise." Wow, that almost sounds sensible. So all I have to do is use find < /dev/null to get rid of the warning in a compatible way. No, of course not, that would be way too easy. I'll give you one guess as to what warning message -nowarn doesn't suppress. Very funny, eh?

Please, if you write fundamental software that I depend on, try not to screw up like this. If you're going to make a change that has negative value to begin with, at least have the decency to do it right.
13 September 2007 @ 03:08 am
I'm not sure what to think about this email I received yesterday. Were the messages I wrote to the python-3000 mailing list sufficiently witty as to attract the attention of a book publisher? Or maybe it was my awesome expositional abilities?

Or maybe this was spam, just blindly sent out to random people who had written emails to the list. I'd like to think the first, but the second seems more likely given the content of the message...

I am writing you as an acquisitions editor with
[Omitted]. I came across your name on the
Python 3000 mailing list. I have recently heard about
Python 3, and was wondering if you are very familiar
with the program? Do you expect it to grow in
importance and/or?interest? Do you think it would be a
good book topic to pursue? If so, any potential
authors come to mind? Or perhaps this is a project
that interests you???Your feedback is greatly
appreciated. I look forward to hearing back from

(Note: the extraneous ?s are original, not an artifact on my side, at least.)
23 August 2007 @ 03:55 pm
About four years ago, I put together a small Linux box with 512MB of RAM, an Athlon XP 1800+, a Hauppauge TV tuner card, a 120GB hard drive, and MythTV. I connected this to my TV with an S-Video cable from the built-in motherboard video, built myself an IR receiver and had a pretty nice PVR.

Of course, over the years, I've upgraded various pieces of the system. I added a DVD drive, and threw out my set-top DVD player. It turns out, MythTV not only makes a better PVR than is available commercially, it also makes a better DVD player than many on the market. Partly because of two "features" it doesn't have: "user-operation prohibited" (aka: You Will Watch These Previews) and Region Coding. But also, it has a good output scaler, the drive doesn't often have trouble reading DVDs, and it doesn't crash as often as my old player did. (Yes, that's right, my hardware DVD player crashed more often.)

I've also done some upgrades in support of High Definition TV. I now have a PCHDTV 3000 digital tuner to record the MPEG-2 digital streams (some HD, some SD), a GeForce FX 5200 video card to be able to output the 1920x1080 resolution, and an Infrant ReadyNAS with 500GB (soon to become 750GB) of storage.

Cost is certainly not an advantage this system has over renting a PVR from your cable company. It's been a while, but as I recall, the original system cost me about $600 to put together. All told, the upgrades since then probably cost another $800.

I won't go through the many advantages it has over a cable company PVR or a TiVo, but two of the most obvious are automatic commercial skip and the ability to schedule programs via a web server rather than having to use a remote control. But the most important to me, really, is control. I can make it do whatever I want. I don't have to be limited by the functionality the cable company wants to let me have. I'm no "open source or die" zealot either. But when they're putting the majority of their work into features expressly designed to inconvenience me, it just makes sense to use an open platform.

However, there are two ways I'm quite dependent upon the whims of an external company for it to keep working. The first is the actual cable signal. I'm lucky enough to live in an area in which both RCN and Comcast provide cable service. I'm very happy about that, because RCN provides unencrypted MPEG-2 streams for all the "basic cable" channels, plus a few extras. And that's a pretty good list. I'm told Comcast does not; they encrypt everything except the channels available over-the-air. Now, RCN doesn't broadcast unencrypted by mistake, it is their explicit policy. However, there's always the concern that they might decide to change that policy once CableCards become more ubiquitous. If they did, I'd be in trouble, as it's currently impossible to use a CableCard with linux. This is of course by design (as I'm an evil hacker trying to pay for and watch their content, don't'cha know, can't have that).

Oh, and while in the middle of writing this, I just ran across an article discussing how users of the HD TiVo are also going to be screwed soon, because the HD TiVo doesn't support bidirectional CableCard communication. Why doesn't it? Because the cable company wants to control the entire user interface of any device connected to their network. So much for innovation...


Another important part of a PVR is the tv listings. If it doesn't know what programs are showing, it's not really usable. While you might think that obviously the TV stations want to disseminate the program listings as far and wide as possible, it turns out, the TV listings aren't actually provided by the stations. They often have no idea what they're playing, other than the absolute basic timeslot data. So, no episode info, descriptions, etc. In the USA, much of that is determined and distributed by one of two companies: Tribune Media Services or GemStar. Of course, my cable company pays one of them for a subscription to the listings, so they can show it to me on the TV Guide channel and the set-top-box TV guide. But do they make the data available to me? Of course not. Luckily TMS has been directly providing a data feed in XML format free for non-commercial use for a few years now. Unluckily, they decided to shut it down, effective September 1st.

Fortunately for me, a number of tv-related OSS developers got together and started a non-profit organization, Schedules Direct, to fill this void. Unfortunately for me, they now charge a subscription fee, to cover the fee that TMS is in turn charging them.

So, now, I'm going to have to pay a subscription fee to get the same guide data that I'm already paying for as part of my cable subscription. The cost is pretty small, but still, irritating.

Why do content companies try to make it so damn hard to pay them for content? I'd be in the market to upgrade to an HD-DVD or Blu-Ray drive to replace the DVD drive, except, of course, I couldn't use it, because it's currently impossible to play such a disc on an open platform. I'd be willing to pay the cable company for some premium movie channels, except, of course, I couldn't use them, because it's impossible to decrypt them on an open platform. It really boggles my mind.
16 August 2007 @ 12:27 am
My bzr adventures are unfortunately not going as well as they seemed to be initially.

Here's the scenario that more closely models the actual workflow I wish to have with bzr.

The players:
1) Official "trunk" SVN repository
2) Automatically updated BZR repository mirroring SVN repository. Not writable by anyone other than SVN server.
3) A bunch of users.

Here's the plan:
a) User 1 makes a branch off BZR repository.
b) User 1 works on some code.
c) User 2 makes a branch off BZR repository.
d) User 2 merges User 1's branch into his own
e) User 2 modifies some stuff User 1 was working on.
f) User 1 makes some final commits to his branch and then pushes his branch to SVN.
g) BZR mirror of SVN gets auto-updated.
h) User 2 pulls new revisions from BZR mirror of SVN. bzr should know that the commits he merged directly from User 1 have already been applied and not try to re-merge them.
i) User 2 commits to SVN.

And now, the problems...

Problem 1
bzr-svn has a time-consuming step of pulling down all the revision mapping info from the svn repository, and then analyzing the repository to figure out the branching scheme, before it can do anything. This data isn't stored with the branch; it's stored in ~/.bazaar instead. This means that every user has to repeat this the first time they want to get started with bzr. Unfortunate.

Problem 2
Remember bug 131692 I ran into before? Well, that is killing me again, but this time I don't know how to work around it easily. Step (h) works (bzr does indeed know which commits were merged, as it stores that info in file properties in the svn repository, yippie!). However, User 2 cannot commit back into SVN, because of that bug.

Problem 3
My actual repository is laid out like this:

When I first used bzr-svn, I told it to pull from $SVNREPOS/trunk/project1. Now, when I did this, it chose a branching scheme of "single-trunk/project1" (as shown in ~/.bazaar/subversion.conf). So, now it refuses to pull down any other projects or even other branches of the same project. It recommended that I type "bzr help svn-branching-schemes" and choose a different branching scheme, but this didn't help me, as there are only two options listed there: "trunk", for the /trunk/* layout, and "none" for no branches, just one branch at the root.

AND! This is a per-user configuration variable. So, even if I do find out that there is a branching-scheme I could manually change to in the configuration file, every user would also have to make this same modification to their configuration file. That's not really usable.

Problem 4
bzr is currently *damn* slow. My repository has about 60k revisions in it. This seems to be more than bzr is currently able to really handle. Making a branch from the shared repository to somewhere outside it (on a local disk) takes about 15m, and making one within the shared repository takes 1m30s. The first number is important because bzr will take at least that long for a remote user to get the repository. The second number is important because this is something that people are supposed to be doing often. And both are simply way too long.

The end
I think this'll probably conclude my experimentation with bzr for now. It's looking nice, but it's just not usable quite yet. There seems to be quite a lot of activity going on trying to fix issues (most especially I know the speed issues are being worked on), so I hope that when I try again another release or two from now, bzr will be blazing fast, and the bugs and usability problems in bzr-svn will have been fixed.
10 August 2007 @ 08:00 pm
So, after the great success I had converting my existing repository, I decided to try out some scenarios which gave git-svn a hard time.

Scenario 1: Branch from svn into bzr. Make some changes in bzr. Also make some changes in svn. Now, commit changes from bzr back to svn.

Initial test: FAILED. Here's the test case.

set -e
set -x

mkdir test-bzr-svn
cd test-bzr-svn
BASE=$(pwd)


# Create SVN repository
svnadmin create svnrepo
svn co file://$BASE/svnrepo/ svn
cd svn
echo "asdf" > foo
svn add foo
svn ci -m "first commit"

# Make a branch in BZR, and commit stuff to it:
cd $BASE
bzr branch file://$BASE/svnrepo bzr
cd bzr
echo "sdfg" >> foo
bzr add foo
bzr ci -m "bzr add to foo"
cd ..

# Make a commit in svn.
cd $BASE/svn
echo "newfile!" > newfile
svn add newfile
svn ci -m "added newfile with svn"

cd $BASE/bzr
bzr merge file://$BASE/svnrepo/
bzr ci -m "Merge from svn"
# THIS FAILS with error:
# bzr: ERROR: These branches have diverged.  Try using "merge" and then "push".
bzr push file://$BASE/svnrepo/

A helpful person on #bzr suggested that, while this was likely a bug in bzr-svn, I could probably work around it by using "bzr checkout" to keep a "clean" branch of svn in bzr, and doing all the actual work on a secondary branch.

Test result: PASSED (yay! bzr-svn works!)

set -e
set -x

mkdir test-bzr-svn
cd test-bzr-svn
BASE=$(pwd)


svnadmin create svnrepo
svn co file://$BASE/svnrepo svn
cd svn
echo "asdf" > foo
svn add foo
svn ci -m "first commit"

cd $BASE
# --dirstate-with-subtree is a secret option that makes bzr-svn
# work inside a shared repository
bzr init-repo --dirstate-with-subtree bzr
cd bzr
bzr checkout file://$BASE/svnrepo trunk
bzr branch trunk branch
cd branch
echo "sdfg" >> foo
bzr ci -m "bzr add to foo"
cd ..

cd $BASE/svn
echo "newfile!" > newfile
svn add newfile
svn ci -m "added newfile with svn"

cd $BASE/bzr/trunk
bzr update
bzr merge ../branch
bzr ci -m "Merge from svn"

Okay, so, we're looking good.

Next up: more complicated branching and merging, simulating multiple developers all using bzr-svn, merging between them, and committing back to svn.

Update: Bug was filed.
09 August 2007 @ 08:03 pm
A few months back, I decided to try out some of the fancy new version control systems. I'm currently using svn, and this is simply not going to change immediately, so the foremost requirement for another system is that it must be able to interoperate with svn. Given this, I found basically two choices: bzr (with bzr-svn plugin) and git (with git-svn plugin).

I first tried bzr, but back then (with bzr 0.16, I believe), it completely and utterly failed to work with my repository, spewing unintelligible tracebacks at me. I gave up and tried git. Git is very fast. Ultra Mega fast. After fixing a minor bug in the git-svn plugin, it managed to convert my repository (about 60k revs on the trunk) in about 9 hours. Pretty damn good. Unfortunately, I soon ran into serious roadblocks: it seems as if it's pretty much impossible to use branches in git, and still be able to commit to svn using git-svn. That seems to me to ruin the whole point of using git. So I left that alone.

Recently, I decided to give bzr another chance. I'm now using bzr trunk (0.19.0dev0).

So I ran a command: bzr branch svn+ssh://hostname/repository/trunk. Nifty that it's just like branching from a bzr repository, no different command or anything. So, good news: this time, I did end up converting my ~60k revision repository without crashing. I ran into a number of problems along the way, however:

There are three steps bzr goes through in the process of branching a svn repository. Each one had a problem:

  1. Initially loading in the metadata from svn.

    This leaked memory like a sieve, and ended up using 9GB by the time it was done. Luckily I have 12GB of memory in my machine.

  2. Analyzing the repository.

    This was going *realllllly* slow (it didn't finish in 10 minutes). It turns out this is because bzr stores a sqlite database in ~/.bazaar, which for me is on NFS. Bad news. I symlinked that to a local directory, and it finished in under 30 seconds. The sqlite database was only 350MB; it could've stored the entire thing in RAM. Maybe if bzr used fewer transactions, it wouldn't be so slow? (https://bugs.launchpad.net/bzr-svn/+bug/131008)

  3. Loading revision data. I thought this was going really well. It finished the first 30000 revisions in about an hour. However, by the end of the process, it was simply crawling, taking 25 minutes to finish the last 800 revisions. Additionally, this too leaked memory like a sieve. Partway through the process, I noticed that it was using up all of my memory, so I interrupted it and restarted.

    Interrupting a bzr branch operation is an interesting story in its own right. According to the official story, you can't resume. You can if you try hard enough, though, by manually adding a branch to your repository:
    python -c "import bzrlib.bzrdir; bzrlib.bzrdir.BzrDir.open('.').create_branch()"

    And then continuing the operation via "bzr pull svn+ssh://blahblah". YMMV, this isn't exactly a supported operation. :) (https://bugs.launchpad.net/bzr/+bug/125067).

Anyhow, yay! I've got a bzr branch of my svn repository.

Next up: trying to do something useful with it.