Today's post is a guest post from Nico Weber, who did so much good work on Chrome in his 20% time that I didn't realize he wasn't full-time on the project until it was announced that he had joined.

Browsers are racing to add "Hardware accelerated rendering" to their feature checklists. As usual, this means different things in different browsers. In Chrome, the near-term plan is to accelerate CSS 3d transforms and WebGL.

Chrome's renderer process is sandboxed, so it can't access the graphics card directly. To remedy this, a new process is introduced: the GPU process. The GPU process talks to the GPU and keeps all GPU-related state (modelview matrix, textures, what have you).

CSS 3d Transforms

When a page uses CSS 3d transforms, the renderer first renders the contents of each to-be-transformed layer into memory as usual, then sends the result to the GPU process (via shared memory), which keeps it in a texture. The GPU process then composites all the layer textures, using the current transform of every layer, and writes the result into an IOSurface the size of the tab contents (an IOSurface is an OS X primitive that represents a piece of GPU memory that can be shared across processes). Since the textures and the surface all live in GPU memory, this is a GPU->GPU copy, which is fast. The GPU process then notifies the browser process, which takes the same IOSurface and renders it to the screen.

When e.g. the rotation angle of a layer is changed, the renderer only needs to send the updated angle to the GPU process, and recompositing can happen completely on the graphics card.

CoreAnimation plugins

For historical reasons, plugins using the CoreAnimation drawing model also send their contents to the browser in an IOSurface. For correct compositing, plugins would instead need to send their contents to the GPU process, which would then composite them with the other layers; the browser shouldn't need to do anything special to draw such plugins. That's not implemented yet, and until it is, CSS transforms on plugins won't work.

Video playback

Longer term, as far as I understand, the GPU process will also do hardware-accelerated decoding of movie frames. Parsing of the movie container will stay in the renderer, since that code wants to be sandboxed. The flow will be: raw data from the network arrives in the browser -> sent to the renderer -> the renderer splits it into encoded video frames, which are sent to the GPU process -> the GPU process does hardware-accelerated decoding into a texture -> the GPU process composites that texture with its other textures into the IOSurface -> the browser draws the IOSurface. This might be somewhat inaccurate; I'm not too familiar with how video works.

WebGL

A WebGL texture upload looks similar: The encoded image arrives over the network in the browser -> sent to renderer for decompression -> uncompressed data sent to GPU process -> GPU process uploads image data into a texture. Since textures are usually uploaded only once and after that referenced by ID, that's not quite as terrible as it sounds.

When the renderer wants to execute a WebGL command, say glUniform4f(), it just calls that function. Through a shim library, the function call is serialized into something called a command buffer, which is then sent to the GPU process and executed.
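To make the command-buffer idea concrete, here is a minimal JavaScript sketch of serializing a glUniform4f() call into bytes on one side and replaying it on the other. The opcode value, the byte layout, and the names CommandBuffer and execute are all made up for illustration; Chrome's real command buffer is C++ and defines its own format.

```javascript
// Hypothetical opcode; Chrome's real command IDs and layout are different.
const OP_UNIFORM4F = 1;

// Renderer side: a fixed-size buffer that glUniform4f-style calls append to.
class CommandBuffer {
  constructor(capacityBytes = 1024) {
    this.view = new DataView(new ArrayBuffer(capacityBytes));
    this.offset = 0; // number of bytes written so far
  }
  writeUint32(v) {
    this.view.setUint32(this.offset, v, true); // little-endian
    this.offset += 4;
  }
  writeFloat32(v) {
    this.view.setFloat32(this.offset, v, true);
    this.offset += 4;
  }
  // Serialize one call as: opcode, location, x, y, z, w.
  uniform4f(location, x, y, z, w) {
    this.writeUint32(OP_UNIFORM4F);
    this.writeUint32(location);
    for (const v of [x, y, z, w]) this.writeFloat32(v);
  }
}

// GPU-process side: decode the bytes and replay each call on a GL-like object.
function execute(buffer, byteLength, gl) {
  const view = new DataView(buffer);
  let off = 0;
  while (off < byteLength) {
    const op = view.getUint32(off, true);
    off += 4;
    if (op !== OP_UNIFORM4F) throw new Error(`unknown opcode ${op}`);
    const location = view.getUint32(off, true);
    off += 4;
    const args = [];
    for (let i = 0; i < 4; i++, off += 4) args.push(view.getFloat32(off, true));
    gl.uniform4f(location, ...args);
  }
}
```

Calling `uniform4f(3, 1, 0.5, 0.25, 1)` writes 24 bytes (two 32-bit integers and four 32-bit floats), and `execute` hands the same five arguments to whatever object it is given as `gl`.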

Aside: Comparison with Safari

Safari currently does not have a separate sandboxed renderer process: HTML rendering and UI live in the same process, and this process can access the GPU directly. This means that Safari can upload the rendered HTML directly into a CALayer (roughly the same as an OpenGL texture) and let CoreAnimation composite all layers directly onto the screen. This saves a GPU->GPU copy.

For WebGL, Safari can call out to OpenGL directly, without having to ship all GL commands to another process.

2010/07/22 23:49

» wstring removal

Chrome, in one of the few truly Windows-specific places in its design, was originally written using C++ wstrings throughout. wstring is a string of wchar_t and generally represents a string of Unicode, but in Chrome originally they were used everywhere, even for plain ASCII values like the names of command line switches. wstrings are UCS-2* on Windows and UCS-4 on other platforms, which makes them very convenient on Windows where you can pass them directly to native APIs -- and much less sensible everywhere else.
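Since JavaScript strings are also sequences of UTF-16 code units, they make an easy place to see the UCS-2 versus UTF-16 distinction from the footnote: a character outside the Basic Multilingual Plane takes two code units, and code that assumes one unit per character miscounts it or cuts it in half. A small sketch:

```javascript
// "a" followed by U+1F600, an emoji outside the BMP.
const text = "a\u{1F600}";

// UCS-2 thinking: one code unit per character. This reports 3.
const unitCount = text.length;

// UTF-16-aware counting: the string iterator yields whole code points. This reports 2.
const codePointCount = [...text].length;

// U+1F600 is stored as the surrogate pair D83D DE00.
const surrogates = [text.charCodeAt(1), text.charCodeAt(2)].map((u) => u.toString(16));

// Cutting by code units can split the pair, leaving a lone high surrogate.
const truncated = text.slice(0, 2);
```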

I and others have slowly been removing and untangling them from the code, but it is slow going. See, for example, how we're up to comment 47 on this bug. Every time you call into a new module, that module frequently expects a wstring, so you need to do string conversions. Even worse, people continue to add more code that uses wstrings along with some TODO like "fix this for non-Windows platforms". It's a standard technical-debt sort of thing: you're trying to finish your feature, and worrying about the proper string type is at the bottom of your stack.

However, we still need to deal with Unicode text; if not wstrings, then what? One common approach is to use UTF-8 everywhere, which is what I had argued for, but there are two good arguments against it. One is that UTF-16 is the native string type of JavaScript, WebKit, and Windows, and the fewer encoding conversions the better. The more interesting argument is that programmers, even the rare sort who understand encoding issues, will always make mistakes and will mix up strings of ASCII, bytes, and UTF-8 without thinking about the consequences.** (This can perhaps be mitigated by a separate u8string type, but hey, I lost this argument.) The conclusion for Chrome was to migrate to UTF-16 strings where we need to store Unicode. Since, e.g., myutf16string.data() gives you back a pointer of the wrong type to pass to fopen(), it's really hard to screw up.

My cleanup approach of late has been one of trying to limit ongoing damage: I make it so if you add new code that does the wrong thing, your code is more painful to write. This is accomplished by removing functions that accept wstrings from the lowest-level libraries, which means when writing higher-level code you have to keep thunking in and out of wstrings. For example our path abstraction (UTF-16 on Windows, UTF-8 on Mac, bytes on Linux) has methods ::ToWStringHack() and ::FromWStringHack() which hopefully make my colleagues feel bad every time they have to use them.

I worry that my anti-wstring crusade is a losing battle -- wasting more time than it's saving -- and one that will never end, since it's not urgent to fix (I mostly work on it while waiting for other larger projects to progress; one of our open source contributors has also been helping***). Nor is it technically that important. Sure, we're wasting some memory, but the proper way to approach memory consumption is through measurement and tackling the big parts first (converting all the switch constants from four-byte wchars to single-byte ASCII shaved only a couple kB off our binary); sure, some users have file names that are bytes instead of Unicode, but they can't be that common. But for me at least, it's more about the principle of the thing: being able to tell that two different collections of bytes are different things that shouldn't be mixed is what separates us from the animals.
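The switch-constant savings are easy to estimate: a NUL-terminated constant costs (length + 1) × 4 bytes with a four-byte wchar_t versus (length + 1) bytes as ASCII. The switch names below are just a few picked for illustration:

```javascript
// A few switch names picked for illustration, not a complete list.
const switchNames = ["enable-logging", "user-data-dir", "disable-plugins", "single-process"];

// Bytes used by NUL-terminated string constants at a given character width.
function constantBytes(names, bytesPerChar) {
  return names.reduce((total, name) => total + (name.length + 1) * bytesPerChar, 0);
}

const wideBytes = constantBytes(switchNames, 4);  // four-byte wchar_t, as on Linux and Mac
const asciiBytes = constantBytes(switchNames, 1); // plain char
```

Here the four names take 240 bytes as wide strings and 60 as ASCII. Each character saves three bytes, so trimming a couple of kB corresponds to several hundred characters of switch names in total.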

* I know in theory they are UTF-16 but in reality few programmers ever get that right. For all practical purposes you're screwed if you're not in the BMP.

** "You should just write the code correctly" is never a good response to problems like these. Everyone always writes the code as bug-free as they can; our objective should be to make the compiler help with catching mistakes.

*** It turns out that even if you're not a very experienced programmer you can productively hack on large projects like the kernel or Chrome; you just need to tackle small tasks.

2010/07/09 12:18

» Antialiased clipping

Today's post is a guest post from Adam Langley, whose recent post about SSL extensions we're doing is also likely of interest to readers of this blog.

Clipping.

Most graphics libraries perform 'immediate' anti-aliased clipping. If you set a clipping path which involves a curve then you can draw shapes that intersect it and the edges will be anti-aliased. However, Skia (the graphics library which we use for Chrome on Linux and Windows) doesn't do this because anti-aliased clipping is just an approximation.

Consider the figure with the four squares, below.

At the top left is an anti-aliased clipping region. The darker the pixel, the more of it is covered by the path. If we were to fill the region with green, we would get the image at the bottom left. When drawing, we consider how much of each pixel the clipping region covers and convert that to an alpha value. For a pixel which was half covered by the clipping region we would calculate 50% × background_color + 50% × green.

However, consider what happens when we first fill with red (top right) and then with green (bottom right). We would expect the result to be the same as filling with green alone, since the second fill should cover the first. But for pixels which are fractionally covered by the clipping region, this isn't the case.

The first fill, with red, works correctly as detailed above. But when we come to do the second fill, the background_color isn't the original background color, but the slightly red color resulting from the first fill. Both CoreGraphics and Cairo have this bug.
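The arithmetic behind the double-fill problem, for a single edge pixel, sketched in JavaScript with colors as [r, g, b] in the range 0 to 1 and a white background assumed:

```javascript
// Immediate-mode anti-aliased clipping: every fill is blended into the pixel
// using the clip's coverage of that pixel as the alpha.
function blendWithCoverage(dst, src, coverage) {
  return dst.map((d, i) => coverage * src[i] + (1 - coverage) * d);
}

const WHITE = [1, 1, 1];
const RED = [1, 0, 0];
const GREEN = [0, 1, 0];
const edgeCoverage = 0.5; // an edge pixel half covered by the clip path

// Filling with green once gives [0.5, 1, 0.5].
const greenOnce = blendWithCoverage(WHITE, GREEN, edgeCoverage);

// Red then green gives [0.5, 0.75, 0.25]: 25% of the red survives, because the
// second blend uses the already-reddened pixel as its "background".
const redThenGreen = blendWithCoverage(
  blendWithCoverage(WHITE, RED, edgeCoverage), GREEN, edgeCoverage);
```

In general the second fill leaves coverage × (1 - coverage) of the first fill's color behind, which peaks at 25% for half-covered pixels, i.e. right on the edges.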

It might seem trivial, but if you end up covering anti-aliased clipping regions multiple times you end up with unsightly borders around the clip paths.

The second problem with anti-aliasing, even when done correctly, is that it makes it impossible to put polygons seamlessly next to each other. The anti-aliased edges end up with hairline seams because the detail of the edge (which part of the pixel each polygon covered) has been lost.
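The seam is the same kind of arithmetic. Assuming two black polygons that share an edge running through a pixel, each covering half of it, blending each with its own coverage leaves some background showing:

```javascript
// Grayscale blend using a polygon's coverage of the pixel as alpha (0 = black, 1 = white).
function blendGray(dst, src, coverage) {
  return coverage * src + (1 - coverage) * dst;
}

// A pixel straddling the shared edge of two abutting black polygons; each
// polygon covers exactly half of it, so together they cover all of it.
let seamPixel = 1;                        // white background
seamPixel = blendGray(seamPixel, 0, 0.5); // first polygon: 0.5
seamPixel = blendGray(seamPixel, 0, 0.5); // second polygon: 0.25
// The ideal result is 0 (solid black). The leftover 0.25 is the hairline seam:
// coverage alone can't tell that the two halves are complementary.
```

Blending treats the two 50% coverages as if they overlapped at random, giving 75% total instead of 100%.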

To ameliorate this we have a hack in place for Chrome. When a clipping path is requested, we create a layer on top of the painting bitmap. All paints while the clip is in effect write to the layer. When the clipping state is popped, we perform an anti-aliased clear outside the clipping path and merge the layer down. We have a few tricks up our sleeve: we only create a single layer for multiple clipping paths at the same level of the clipping stack (because we can clear outside all of them when popping the stack). Also, the 'clip outside' operation is still Skia-native, and thus aliased.
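The reason the layer approach avoids the double-fill problem is that the clip's coverage is applied once, when the layer is merged, rather than once per paint. A single-pixel sketch under simplifying assumptions (opaque fills only, colors as [r, g, b]; the function names are hypothetical):

```javascript
function blendWithAlpha(dst, src, alpha) {
  return dst.map((d, i) => alpha * src[i] + (1 - alpha) * d);
}

// Layer-based clipping, reduced to one pixel: paints inside the clip go into
// an offscreen layer unclipped; coverage is applied once, at merge time.
function paintWithClipLayer(background, opaqueFills, coverage) {
  const layer = { rgb: [0, 0, 0], alpha: 0 }; // starts fully transparent
  for (const color of opaqueFills) {
    layer.rgb = color; // an opaque fill simply replaces the layer pixel
    layer.alpha = 1;
  }
  layer.alpha *= coverage; // anti-aliased clear outside the clip path
  return blendWithAlpha(background, layer.rgb, layer.alpha); // merge the layer down
}
```

Painting red and then green into the layer with coverage 0.5 merges as [0.5, 1, 0.5], the same as painting green alone, which is exactly what the immediate-mode approach got wrong.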

The single layer trick isn't strictly correct, but it saves memory and worked at the time. It depends on the fact that WebKit applies all the clipping paths for a level of the clipping stack before drawing anything.

As people work on WebKit, this is falling apart. The assumption that the single layer trick depends on isn't always valid any more. Also, people are starting to use clipOut and having two different clipping methods in play ends badly.

Compounding that, the layer code never worked for <canvas> at all. Canvas calls don't manage the clipping stack as WebKit does. In fact, canvas code might not ever bother to pop the clipping stack. So, for canvas, we still use immediate, 1-bit clipping.

(Also, the anti-aliased seams made deanm cry and I couldn't break his beautiful demos.)

So, we might consider adding immediate mode anti-aliased clipping to Skia. Sure, it's an approximation, but it appears that it might be an approximation worth having.

When a clip is in effect, an SkRegion is iterated over and results in a series of rectangles into which the drawing is performed. These rectangles, in the case of a clipping path, are one row high and consist of the scanlines of the path.

These scan lines are cached in the SkRegion when the clipping path is set. It stores them in an array of ints, which appear to be formatted as <Y value> <Number of X spans> <X1 start> <X2 end> .... These spans are generated by creating a dummy SkBlitter and feeding that, and the clipping path, to an SkScan. The dummy blitter records the scan lines that that SkScan calls back with (rather than painting as an SkBlitter would typically do).
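A sketch of the recording-blitter trick, with a circle scanner standing in for SkScan. The flat array follows the layout described above (y, span count, then start/end pairs); treating each end as exclusive is my assumption, and none of these names are Skia's:

```javascript
// Stand-in for SkScan: walks a shape and reports each horizontal run of covered
// pixels by calling blitter.blitH(y, x, width). The shape here is a circle.
function scanCircle(cx, cy, r, blitter) {
  for (let y = Math.ceil(cy - r); y <= Math.floor(cy + r); y++) {
    const dx = Math.sqrt(r * r - (y - cy) ** 2);
    const left = Math.ceil(cx - dx);
    const right = Math.floor(cx + dx);
    if (right >= left) blitter.blitH(y, left, right - left + 1);
  }
}

// Recording blitter: paints nothing, just remembers the runs it is handed.
class RecordingBlitter {
  constructor() {
    this.rows = new Map(); // y -> flat list of [start, end) pairs
  }
  blitH(y, x, width) {
    if (!this.rows.has(y)) this.rows.set(y, []);
    this.rows.get(y).push(x, x + width);
  }
  // Flatten to: y, spanCount, start, end, start, end, ... for each row.
  toIntArray() {
    const out = [];
    const ys = [...this.rows.keys()].sort((a, b) => a - b);
    for (const y of ys) {
      const spans = this.rows.get(y);
      out.push(y, spans.length / 2, ...spans);
    }
    return out;
  }
}
```

For a unit circle at the origin this records three one-span rows: [-1, 1, 0, 1, 0, 1, -1, 2, 1, 1, 0, 1].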

So, as you can see, the 1-bit clipping concept is baked pretty deeply in here.

If we wanted the SkRegion iteration to result in an alpha array for each scanline, we'd have a problem. The drawing code asks the SkRegion for one scanline at a time, but the SkScan wants to drive the loop itself and call back for every scanline. Options:

  • Store the whole alpha channel in memory (a bitmap as large as the clipping region, or as large as the whole bitmap for a clipOut)
  • Rewrite the SkScan to invert the flow of control (rather you than me, sir)
  • Switch stacks inside of Skia (you think you have bugs now, wait till you see these ones!)

Also, the latter two involve rescanning the path for each drawing operation. I suspect that will seriously damage your speed.

The only saving grace is that the alpha channel consists of mostly 0xff. Only on the edges are there any other values. So the first option, with suitable compression smarts, might be reasonable.
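One simple candidate for those compression smarts is run-length encoding each alpha row, so that long interior runs of 0xff cost one entry and only the anti-aliased edge pixels cost an entry each. This is just an illustrative scheme, not something Skia does:

```javascript
// Run-length encode one row of 8-bit alpha values as [value, count] pairs.
function encodeAlphaRow(alpha) {
  const runs = [];
  for (const a of alpha) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last[0] === a) last[1]++;
    else runs.push([a, 1]);
  }
  return runs;
}

// Expand [value, count] pairs back into the full row.
function decodeAlphaRow(runs) {
  const row = [];
  for (const [value, count] of runs) {
    for (let i = 0; i < count; i++) row.push(value);
  }
  return row;
}
```

A row with a 100-pixel opaque interior, one partially covered pixel on each side, and two empty pixels at each end encodes to five runs instead of 105 bytes. A real scheme would also want fast access to a given x range within a row, which this ignores.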

However, I only allocated a Friday to this problem and that's probably several weeks of work to do.

In the meantime, we can continue to use layers with a couple of modifications. Firstly, the single layer trick doesn't work any more. The rest of WebKit adds clips after drawing and expects them to work. That means that each clip has to allocate a new layer.

Secondly, we can't mix clipping modes any more. That means that clipOut can't be immediate and, if we have a top-level clipOut, the layer is as big as the underlying bitmap. It also means that the rectangle clips have to be converted into paths, which is much slower.

So, in the interim, we will probably throw CPU and memory at the problem, although everyone is pretty maxed out for 6.x.

Bug 35516 was reported as a memory leak, but it's not the sort of leak you'd ever notice leaking: you run out of memory first. Rather than telling you to go read the bug, I can summarize it quickly:

  1. XSLT is part of the web
  2. XSLT allows scientific notation for input numbers
  3. XSLT allows roman numerals for stringifying numbers (why? it's likely useful when numbering an ordered list of items, for example)

You can helpfully combine the above by making a document containing the following, as the bug reporter did:

<xsl:number value="1e100" format="i"/>

and with that you'll discover that libxml's XSLT support consumes all available memory in computing the resulting string.
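To see why, consider how long the output is. The sketch assumes a naive greedy converter that, lacking a symbol above M, writes one "m" per thousand; I haven't checked that this is exactly what libxslt does. BigInt is used because 1e100 isn't exactly representable as a double:

```javascript
// Lower-case numerals, since the bug used format="i".
const ROMAN = [
  [1000, "m"], [900, "cm"], [500, "d"], [400, "cd"], [100, "c"], [90, "xc"],
  [50, "l"], [40, "xl"], [10, "x"], [9, "ix"], [5, "v"], [4, "iv"], [1, "i"],
];

// Greedy conversion; with no symbol above 1000 it emits one "m" per thousand.
function toRoman(n) {
  let out = "";
  for (const [value, symbol] of ROMAN) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Length of toRoman(n) for a BigInt n, computed without building the string.
function romanLength(n) {
  return n / 1000n + BigInt(toRoman(Number(n % 1000n)).length);
}
```

`toRoman(1994)` is "mcmxciv", but `romanLength(10n ** 100n)` is 10^97: that many bytes is far more memory than any machine has.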

On the one hand, this is just an amusing anecdote; there are plenty of other ways you can make a page consume all available memory. ("Doctor, it hurts when I write out 1e100 in unary.") On the other hand, every one of these sorts of bugs leads to a full browser crash in single-process browsers. Mozilla quickly fixed the equivalent bug, but now that I look, it seems they've only recently been cherry-picking the fix onto their branches; who knows how or whether other browser vendors are affected. My security friends tell me that OOMs are common in browsers, so this isn't much of an emergency; it's not like there's any shortage of ways to crash IE.

What this anecdote most goes to show is this: there is a ton of API exposed to the web, more than you would ever expect. As I like to say to people, we sandbox WebKit not because we think it's poorly written software -- its popularity proves it is quite good -- but rather because there is just too much code there to be confident it is all safe and correct.

What does it mean for a browser to be fast? It turns out to be a bit hard to pin down, really. I was asked about this and tried to give a lightning talk at the recent Ubuntu Developer Summit on how we think about application speed in general but I think I didn't make my points well, so I thought I'd try to expand a bit here on what you ought to consider when comparing browsers.

Benchmarks

Many start with benchmarks. Benchmarks are loved by the tech press because they give a score and you can make nice charts to show relative scores. But benchmarks by their nature only measure a very specific thing and can only attempt to simulate what users will experience. For browsers, the majority of benchmarks are JavaScript benchmarks; while nobody disputes JavaScript's importance, it also is not the dominant speed factor for simple web pages, and most web pages are simple. I think the importance of recent work in improving JavaScript engines has more to do with the sorts of sites we will create in the future, like this JavaScript NES emulator, though certainly there are plenty of existing sites like Gmail that hugely benefit from the current generation of fast JS engines.

The result of fixating on JavaScript benchmarks is frustrating comments like "Mozilla via Wine is faster than Linux-compiled Mozilla, therefore Mozilla doesn't care about Linux". This misinformed legend came from JavaScript benchmarks. But a browser's implementation of JavaScript is likely nearly identical code across platforms! I imagine the speed difference stemmed from differences in compiler quality, in which case any other cross-platform browser ought to see the same benchmarking difference Mozilla did. I find that comment frustrating on multiple levels. First, the conclusion doesn't follow from the premise; second, the premise isn't exactly true, because JS benchmarks don't matter as much as they're made out to; and on top of that, these benchmarks aren't even measuring the platform-specific code, so the conclusion wouldn't follow even if the premise were true.

Newer benchmarks are coming out that attempt to cover more than just JavaScript. A good example is Dromaeo, which includes a portion that benchmarks the DOM. But on that note, be wary of third-party benchmarks! John, the author of Dromaeo, knows more about web development than most browser developers do, so I am much less skeptical of his test than I am of others. It is very easy to write a performance test that looks good but doesn't measure something useful; see e.g. the SunSpider 0.9.1 announcement for a discussion of how a bug in the test framework interacts with power management, and that flaw existed in a test written by experienced browser developers rather than random web enthusiasts.

Cyclers

A perhaps better measure is the end-to-end performance of loading a real web page; that incorporates the JavaScript engine as well as the rest of the web stack: parsing HTML, measuring fonts, etc. We and Mozilla (and I assume the other browser vendors) have test suites that run a set of on-disk pages through their browser. It would be natural for third parties to use these tests to compare browser rendering speeds, except that the pages you'd like to test are real content, like Yahoo's homepage, which is copyrighted and therefore can't be republished. (I know our set of pages is in a private repo; I glanced through Mozilla's MXR and only found what looks like a placeholder.)

To make these tests repeatable, they load pages off the disk rather than fetching today's version of the web pages they test. (Same as the speed ads; look at the description there for technical notes.) None of the benchmarks discussed so far include network speed, and that's a shame: there are likely a ton of interesting things to be found in that area, like how different browsers use different per-host connection limits (a tradeoff: more requests in parallel, but you have to pay TCP slow start multiple times) or how Chrome will pre-fetch DNS of sites you typically visit on startup (this behavior in fact probably completely dominates any web-rendering or JS difference versus other browsers -- visit about:dns in Chrome for more info).

Even with network speed included, there are other parts of a browser that affect performance, like the networking stack and the cache. I remember that earlier in Chrome's development Mike discovered a network-level bug (my memory is vague, but it was something buffering improperly, probably Nagle) that was causing us to fetch pages later than IE. The above tests wouldn't have revealed the performance improvement he produced. Depending on how you determine when a page is done loading, they may not even cover the time spent putting the pixels up on the screen. And loading Gmail is a crazy multi-second process involving multiple redirects and progress bars on top of the expected JS and rendering bits; I don't think anyone's tests cover Gmail load time yet.

Stopwatch

I think that sort of observation, that all tests are by their nature synthetic and don't cover real browsing, is where Microsoft's browser benchmarks of last year came from, in which they claimed IE 8 was the fastest browser available. Unfortunately, a benchmark where you say "we paid some people to look really hard at it and they concluded we were the fastest" doesn't convince a lot of people, even if your intentions are pure. Though articles like this say it didn't pass the smell test, conceptually I think this sort of approach better captures what performance benchmarks are trying to measure. It's too bad this kind of test can't be reproduced reliably, or I'm sure browser developers would all be optimizing for it.

Perception and Jank

But there are still other factors that make people call a browser fast. Continuing our journey from measurable hard numbers to fuzzier stopwatch tests, I assert that what matters more than your measured performance is what the user perceives as your speed, and in that respect here are a few more interesting areas to consider that are much farther away from web pages.

One is UI latency (what we call "jank"). Does the browser respond quickly when you type in the URL bar? When you make a new tab? Peter did a talk on this which I haven't watched, but I'm sure it goes into a lot of detail. This was the area I had most hoped to impress on the Ubuntu developers as being important to consider in the software they develop: small hiccups make you feel the application is slow even if it can render a thousand pages in a second. (For example, I think the package updating tools in Ubuntu are particularly bad in this area.) I think this is the area where we outperform Mozilla the most, and why we've become increasingly popular on Linux, though it's difficult to quantify.

One good example of a jank-reducing tactic is how inline autocomplete works in Chrome. When you type a URL, we attempt to autocomplete it from your browsing history as well as show a drop-down of other things you may be looking for. To make it predictable what happens when you press enter, we autocomplete synchronously: there should never be a case where waiting some amount of time before pressing enter produces a different result from pressing enter immediately. But this means we can't autocomplete from data found on the disk, because waiting to load data from the disk would make the autocomplete laggy. The fix is to preload the entire completion set (it's small compared to your browsing history) into memory on startup. (But not exactly during startup -- there's actually a tiny window after you start up where typing in the URL bar doesn't autocomplete.)
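A minimal sketch of the synchronous lookup, assuming the preloaded completion set is just an in-memory array of URLs kept sorted so that each keystroke is a binary search. Real inline autocomplete ranking is more involved; here the lexicographically first match wins, and the class name is hypothetical:

```javascript
// Hypothetical inline-autocomplete index. Everything is loaded into memory up
// front, so complete() never touches the disk and always answers immediately.
class InlineAutocompleteIndex {
  constructor(urls) {
    this.sorted = [...urls].sort();
  }
  // Return the first URL (in sorted order) that starts with `typed`, or null.
  complete(typed) {
    let lo = 0;
    let hi = this.sorted.length;
    while (lo < hi) { // lower-bound binary search
      const mid = (lo + hi) >> 1;
      if (this.sorted[mid] < typed) lo = mid + 1;
      else hi = mid;
    }
    const candidate = this.sorted[lo];
    return candidate !== undefined && candidate.startsWith(typed) ? candidate : null;
  }
}
```

Because `complete()` depends only on what is already in memory, pressing enter immediately or after a pause yields the same result.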

Startup

We measure and optimize another performance stat that is almost entirely unrelated to the above categories: startup time. In my talk I picked on GNOME's calculator (in response they've already fixed it!), but there are plenty of other similar demos, like how I just counted to five after clicking the Ubuntu menu on my laptop before the menu came up. I've written posts before with more technical details about startup work we've done: both through benchmarking and fiddling with low-level system bits, but perhaps it's useful to step back and consider why it matters.

I was skeptical at first, but now I strongly believe that the startup time of an app sets the expectation for the rest of the app. It's something of a placebo effect surely, but taking one step further along the unmeasurability spectrum I think what matters more than even the speed your user experiences is the speed your user thinks they're experiencing. When you start up as quickly as a light-weight app, people feel they're using a light-weight app even when that isn't the case (in reality, any browser that can render the web is kind of enormous, us included). For example, despite switching editors a year ago I find myself still instinctively using vi occasionally just because emacs takes so long to start, and I don't even realize I'm subconsciously avoiding emacs until after I type the wrong commands into vi and I notice I've forgotten how vi works.

For some discussion of app startup, this Mozilla engineer's blog is definitely worth reading; I intend to steal his good ideas at some point. But more fundamentally, fast startup comes from doing less work at startup, which means careful engineering across all the code.

Conclusion

One rule I've learned from working on Chrome (it's been three years now, whoa) is that if you don't measure the performance of something, that performance will regress. It's just a natural consequence of how software development is done, where more time is spent adding things than removing things. To combat this we use buildbots running performance tests to generate charts (warning: enormous, browser-killing page), and our bots go red if performance regresses on those tests. (They frequently do, and then we fix the code.)
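The check that turns a bot red can be as simple as comparing a measurement against a baseline with some tolerance. The function name and the 5% threshold below are hypothetical, not what Chrome's bots use:

```javascript
// Hypothetical regression check: flag a result as a regression when it is
// slower than the recorded baseline by more than `tolerance` (5% by default).
function checkPerf(testName, baselineMs, measuredMs, tolerance = 0.05) {
  const limitMs = baselineMs * (1 + tolerance);
  return {
    testName,
    regressed: measuredMs > limitMs,
    changePercent: ((measuredMs - baselineMs) / baselineMs) * 100,
  };
}
```

Real perf bots also have to cope with run-to-run noise, which a single threshold like this ignores.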

If I could have you remember one thing from this post, it is this: benchmarks are useful to the extent you understand the technical details behind them. If you are not a browser developer, it is my professional opinion* that the best way to evaluate which browser is faster is to just try them out yourself on pages you care about. For example Opera users claim its many features make them able to browse faster, and if that is true for them then I hope they enjoy Opera. And whatever you do, don't repeat that Mozilla JS stat anymore -- it really bugs me. :)

* Professional to the extent that it is my opinion as a professional, not that this is in any way a statement on behalf of my employer.

2010/05/05 23:59

» WOFF support

Adam implemented WOFF support for Chromium last week. He discusses his thoughts on it here, so I'll direct you there for analysis.

To confirm his similarity argument, I note that the WOFF-decoding bits were implemented within the OTS project in such a way that the existing WebKit support for TrueType fonts only needed about eight lines of patch to work.

(Why push everything through OTS, rather than integrating WOFF into WebKit? Because web fonts are scary.)

2010/04/30 11:12

» Caching Nodelists

Bug 33696 changed how some caches behave in WebKit, for a ~20% speed improvement in the Dromaeo benchmark.

For context, Dromaeo is a browser benchmark created by John Resig of jQuery fame. As compared to the V8 and SunSpider benchmarks (which focus on JavaScript performance), Dromaeo touches more of the code where JavaScript interacts with the browser.

But there was a catch: this caching change violated the letter of the HTML5 spec. The spec described a sensible but cache-unfriendly behavior, in that multiple calls to getElementsByTagName would each return a new object. That means that this code:

var x = document.getElementsByTagName('a');
x.foo = 1;
var y = document.getElementsByTagName('a');
alert(y.foo);

would have different behavior before and after this change.
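A minimal model of the caching change: the document keeps one list object per tag name and hands back the same object on every call, so an expando property like foo set through x is visible through y. FakeDocument and makeList are hypothetical stand-ins; WebKit's real cache is C++ and also has to deal with things like object lifetime, which this sketch ignores:

```javascript
// Hypothetical stand-in for building a live list of elements with this tag.
function makeList(tagName) {
  return { tagName };
}

// Models the cached behavior: one list object per tag name, reused on later calls.
class FakeDocument {
  constructor() {
    this.listCache = new Map();
  }
  getElementsByTagName(tagName) {
    let list = this.listCache.get(tagName);
    if (list === undefined) {
      list = makeList(tagName);
      this.listCache.set(tagName, list);
    }
    return list;
  }
}
```

Without the cache, y would be a fresh object and alert(y.foo) would show undefined; with it, y is x and it shows 1.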

Some head-scratching and digging into Mozilla code (browser development: so much easier when the browser vendors can just say, "let's see what {Mozilla,WebKit} does") found that Mozilla had already implemented such a cache, so it seems unlikely any web developers relied on either implementation of this behavior. Additionally, the spec was inconsistent in its wording for related functions.

So the conclusion was to change the spec. I imagine this is exactly the sort of reason HTML5's intended timeline is so far out in the future. (Speaking of which, I recently saw someone dig up this old interview with Hixie which I found pretty interesting.)

The whole bug is worth a read.

2010/04/14 09:55

» URL copy and paste

The URL bar ("omnibox"), despite its appearance, is a weird hybrid of a text box and something else. IP addresses are what computers understand while words (search terms) are what humans understand. The omnibox attempts to span the two simultaneously, transiently switching between searches and URLs, prefetching DNS as you type.

Here's an interesting instance of this tension. Load a URL that contains some Japanese text as a query parameter. (Don't speak Japanese? Here's an example that would come up -- a Google search for some Japanese text.) The URL bar shows the text as you'd expect, despite it heading to the server as a bunch of percent-escaped bytes. So what happens when you try to copy and paste this URL?

Well, there are two use cases for copy and paste. If I select a word out of my URL bar and paste it into this post, I expect the word to come through -- the human interpretation of the URL. But then if I want to share this URL with a friend, I want to select the entire URL and paste it in its machine-readable form, with the escapes -- the machine interpretation of the URL.

So that is what we do: selecting a portion of the URL copies it in the human-readable form, while selecting the entire URL copies its underlying machine representation. This seemed entirely crazy to me when I first learned about it, but I had never noticed it until it was pointed out to me, and I confirmed that Firefox does it too.
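The rule can be sketched in a few lines, assuming the displayed text is simply decodeURI() of the real URL (the omnibox's actual unescaping rules are more careful). The function names are hypothetical:

```javascript
// Hypothetical helper: the text the URL bar shows, with percent-escapes decoded.
// Assumes the escapes are UTF-8; decodeURI throws a URIError otherwise.
function displayText(url) {
  return decodeURI(url);
}

// Clipboard text for a selection [start, end) of the displayed text: the whole
// thing copies the machine form (the escaped URL), a part copies what you see.
function textToCopy(url, start, end) {
  const shown = displayText(url);
  if (start === 0 && end === shown.length) return url;
  return shown.slice(start, end);
}
```

For "http://www.google.com/search?q=%E6%97%A5%E6%9C%AC", selecting the last two displayed characters copies 日本, while selecting everything copies the escaped URL. Because decodeURI() rejects escapes that aren't valid UTF-8, this sketch bakes in the UTF-8 assumption from the Big5 question.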

(Why not just always use the human-readable form everywhere? I dunno. I guess you'd need to mandate UTF-8 only, otherwise it'd be unclear what sequence of bytes a given string of Unicode produces, which I think is probably ok because that's what browsers currently assume; but even then I worry about normalization forms and sites that use non-UTF-8 byte strings in their query parameters. I wonder what we and other browsers do with Big5 in a URL?)

This is the reason I'm not too concerned with the news we're dropping the http:// prefix from URLs using that protocol. It's already what browsers assume on input from an unprefixed URL, since users never type them anyway (though our Strict Transport Security has allowed sites to opt into HTTPS instead). And copy and paste should work as before since we're already doing special behavior for it. (What about X selections? The fact that they currently behave differently is just a bug, nothing more. I'm a little surprised it's not going down the same code path.)

2010/03/25 21:59

» Other Unixes

Various people interested in running Chrome on non-Linux Unixes have sniffed around the project, and eventually someone stepped up: there's a FreeBSD port that seems to work for some people. (I notice they've recently updated their page to ask for money; I'll note it seems pretty legit to me, since I can tell you from firsthand experience that the author has put a lot of time into it.)

But dumping a multi-hundred-kB patch file is not enough to consider a product ported -- all of the code ought to land in the upstream Chrome tree, and soon, since otherwise it will rot quickly. This burden has instead fallen to an enthusiastic OpenBSD developer, who has taken the FreeBSD patch and started on this perhaps larger task: tediously splitting the patch up into bite-sized reviewable chunks, and then modifying each through the code review process (frequently resulting in larger changes than the original port). I feel like I've looked at twenty patches from him so far, and more come every week.

Not to be left behind, recently I've also been seeing and reviewing patches from someone working on getting things to run on Solaris, though my perception is that it is much farther from working. Had I had time to take on multiple interns, perhaps I'd have assigned some to these projects; in retrospect, they also would've been good Summer of Code projects.


Porting from Linux in theory shouldn't be too much harder than just recompiling, but practice is practice because it differs from theory. The bulk of the patches just extend the ifdefs and build rules to add the new platforms. When we only had three platforms to worry about, we didn't go to much effort to make a distinction between Linux-specific bits, marked with OS_LINUX, and "stuff that generally is used on free Unix-like operating systems", such as making dotfiles in your home directory, X11 clipboard behaviors, or GTK.

For the decent amount of code we do share between Linux and the Mac (e.g., the IPC subsystems between Mac and Linux are nearly identical, and implemented in terms of Unix domain sockets) there is a shared OS_POSIX define already in use, but there is also some code that is Linux-specific; it's a waste of time to try to anticipate in advance the right layer for abstractions (like, say, how we currently find our binary path by looking in /proc). The majority of the shared code, however, is in bits like GTK that are going to be literally identical across Linux/*BSD/Solaris and perhaps others.

The initial temptation was to make some sort of OS_UNIX_THAT_ISNT_MACOSX define. But nobody could come up with a good way to describe what that actually means, so instead we now have a larger set of feature-related defines, like TOOLKIT_GTK (that is, the UI is implemented matching the GNOME interface guidelines), USE_X11 (necessary for all of our backing store tricks), and even USE_NSS (the crypto library used for SSL). The last one is especially a victory for the generalization done by porters, because it turns out that for various reasons we're moving towards using NSS on Windows, and bringing that previously Linux-specific code up on Windows mostly involved flipping a few defines.

With that said, there's kind of an endless long tail of configurations that will get progressively harder and harder to support -- even Linux as a whole is something like 1% of our user base, and I imagine the BSDs are 1% of that. I expect that in the longer term these ports will work like the FreeBSD one, where they get a working tree snapshot going every few months without consistently working code in the interim. But on the other hand, free software means that if someone wants to run our code on NetBSD, they just need to step up with their editor.

2010/03/04 10:57

» Latin-1 decoding change

Originally I started this blog with the intent that each post would be about a particular commit. Here's a post back in that vein, about a change I had no involvement in but thought might be of interest to you:

WebKit bug 35233 -- Optimize Latin-1 decoding in TextCodecLatin1::decode()

How many ways do I like this change?

  • Nokia wrote it but we all benefit from it;
  • C++ master Darin proposed a bit of template hackery that makes it type-safe without losing the speed benefit;
  • The precommit bots that Googlers wrote caught that an earlier version would've broken the 32-bit Mac build;
  • The patch itself is a simple but clever low-hanging-fruit optimization.
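For a feel of the optimization, here is a JavaScript sketch of a word-at-a-time ASCII fast path: read four bytes at once, and if none has its high bit set, copy them straight through, since ASCII bytes are already the right UTF-16 code units. Other bytes go through a per-byte mapping, with 0x80 to 0x9F following windows-1252 because that's how browsers treat the Latin-1 label. This is in the spirit of the patch as I understand it, not a translation of it; the real code is C++ working on native machine words:

```javascript
// windows-1252 code points for bytes 0x80..0x9F (browsers decode "Latin-1"
// content as windows-1252). All other bytes map to the same code point.
const WIN1252_80_9F = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Slow path: map one byte to a UTF-16 code unit.
function decodeByte(b) {
  return b >= 0x80 && b <= 0x9f ? WIN1252_80_9F[b - 0x80] : b;
}

function decodeLatin1(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const units = new Uint16Array(bytes.length);
  let i = 0;
  for (; i + 4 <= bytes.length; i += 4) {
    if ((view.getUint32(i) & 0x80808080) === 0) {
      // Fast path: all four bytes are ASCII, so no table lookup is needed.
      units[i] = bytes[i];
      units[i + 1] = bytes[i + 1];
      units[i + 2] = bytes[i + 2];
      units[i + 3] = bytes[i + 3];
    } else {
      for (let j = i; j < i + 4; j++) units[j] = decodeByte(bytes[j]);
    }
  }
  for (; i < bytes.length; i++) units[i] = decodeByte(bytes[i]); // leftover tail bytes
  // Spreading into fromCharCode is fine for short inputs; long ones need chunking.
  return String.fromCharCode(...units);
}
```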

The only piece it's missing is a regression test, but I guess WebKit currently doesn't have a lot in the way of performance regression tests.

PS: Since I'm again deep in a free-software lovefest, I'll just bury here that it's also pretty neat that Mozilla is both using bits of Nitro (one of the many confusing names of the Apple JavaScript engine) for improving JS speed and bits of Chrome for its out-of-process plugins. My mind circles around all of the biological metaphors we use for these things -- source trees, cross-pollination of ideas -- and I feel briefly like I'm living in the future.