ongoing
http://www.tbray.org/ongoing/
rsslogo.jpg
/favicon.ico
2006-04-26T20:10:25-08:00
Tim Bray
ongoing fragmented essay by Tim Bray
All content written by Tim Bray and photos by Tim Bray Copyright Tim Bray, some rights reserved, see /ongoing/misc/Copyright
Generated from XML source code using Perl, Expat, XML::Parser, Emacs, Mysql, and ImageMagick. Industrial strength technology, baby.

Spring in White on White
http://www.tbray.org/ongoing/When/200x/2006/04/26/Spring-in-White-on-White
2006-04-26T13:00:00-08:00
2006-04-26T20:10:16-08:00
Most people would generally prefer a climate where it’s bright and warm most of the time. But for Canadians and others who live where it’s not, there are compensations, and one is the experience of spring. I have a picture.

[Image: Pear blossoms against cherry blossoms]

The blossoms are pear in the foreground, cherry behind.

After all the months of 50° North Latitude winter (icy-sharp in most of Canada, wet and dark here in Vancouver), the soul, the spirit, and the libido all spring to life when the sun comes back. We’ve had a solid year of crappy weather, but this last Saturday through Monday were solidly summery, bright and warm; and in this season the days are already long and each gets longer so fast you can feel it.

On the back porch, our pear tree’s branches were silhouetted against the neighbors’ big wild old cherry; the cherry yields no edible fruit but who cares, it’s a beautiful tree any time of year.

Scott
http://www.tbray.org/ongoing/When/200x/2006/04/26/Scott
2006-04-26T13:00:00-08:00
2006-04-26T20:06:50-08:00

I’ve been watching our internal leadership conference and spending quite a bit of time talking in the virtual hallways, and I’ve been surprised at the intensity of feeling about Mr. McNealy. Yes, there are those here saying “About bloody time, now we can make some progress” but there’s a much bigger group that is genuinely emotional about this transition. Maybe it’s a function of seniority: I never met nor corresponded with Scott, and he hasn’t been much of a presence in the company’s conversation in the time I’ve been here. But there are a lot of smart, seasoned, unsentimental people making it clear that he’s been a major force in their lives, at a more personal level than I’m used to hearing when people speak about executives. I guess also that to a lot of people, Sun’s vision, for which Scott gets some of the credit, was a radical and wonderful thing. I first used Unix in 1979 and quit a nice big-company job to become a VAX-bsd sysadmin in 1983, so I’ve always kind of lived inside that vision. But I’ll tell you one thing, what I’ve been hearing the last couple of days makes me really regret that I didn’t get to know Scott.

Jacobs, Pictures, Spartans
http://www.tbray.org/ongoing/When/200x/2006/04/26/Jane-Jacobs
2006-04-26T13:00:00-08:00
2006-04-26T17:28:59-08:00

Jane Jacobs died; the city I live in, Vancouver, is pretty solidly Jacobsian both in its current shape and its planning dogma. By choosing to live here I’m empirically a fan. Oddly, few have remarked how great Jacobs looked; her face commanded the eye. Which leads me to Alex Waterhouse-Hayward’s wonderful Jane Jacobs & Viveca Lindfors; surprising portraits and thoughts on decoration. W-H’s blog has become one of only two or three that I stab at excitedly whenever I see something new. For example, see Sex Crimes, Homicide and Drugs and yes, that’s what it’s about. Staying with the death-and-betrayal theme, and apparently (but not really) shifting back 2½ millennia, see John Cowan’s The War (after Simonides), being careful to look closely at the links. I’ve written about those same wars.

LAMP and MARS
http://www.tbray.org/ongoing/When/200x/2006/04/25/Scaling-Rails
2006-04-25T13:00:00-08:00
2006-04-26T07:24:06-08:00

At that Rails conference, when I was talking to Obie Fernandez, he asked, more or less “How can Sun love us? We’re not Java” and I said, more or less, “Hey, you’re programmers, you write software and there have to be computers to run it, we sell computers, why wouldn’t we love you?” Anyhow, we touched on parallelism a bit and I talked up the T1; Obie took that ball and ran with it, saying all sorts of positive things about synergy between Rails’ shared-nothing architecture and our multicore systems. Yeah, well, good in theory, but I’m too old to make that kind of prediction without running some tests. Hah, it turns out that Joyent has been doing that, and has 76 PDF slides on the subject. If you care about big-system scaling issues, read the whole thing; a little long, but amusing and with hardly any bullet lists. If you’re a Sun shareholder looking for a pick-me-up, check out slides 40-41, 49, and 52-74. Oh, I gather that the T1, Solaris, and ZFS are OK for Java too. [Update: The title was just “SAMR”, as in LAMP with two new letters. Enough people didn’t get it that I was forced to think about it, and MARS works better anyhow.]

Real-Time Journalism
http://www.tbray.org/ongoing/When/200x/2006/04/25/Talk-With-Berlind
2006-04-25T13:00:00-08:00
2006-04-26T06:40:19-08:00

I got email late yesterday from David Berlind: “Hey, can I call you for a minute?” He wanted commentary on a story he was writing that I think is about the potential for intellectual-property lock-ins on RSS and Atom extensions. I say “I think is about” because the headline is “Will or could RSS get forked?”. After a few minutes’ chat, David asked if he could record for a podcast, and even though I only had a cellphone, the audio came out OK. The conversation was rhythmic: David brought up a succession of potential issues and I answered each along the lines of “Yes, it’s reasonable to worry about that, but in this case I don’t see any particular problems.” Plus I emitted a mercifully-brief rant on the difference between protocols, data, and software. On the one hand, I thought David could have been a little clearer that I was pushing back against the thrust of his story, but on the other hand he included the whole conversation right there in the piece, so anyone who actually cares can listen and find out what I actually said, not what I think I said nor what David reported I said. I find this raw barely-intermediated journalism (we talk on the phone this afternoon, it’s on the Web in hours) a little shocking still. On balance, it’s better than the way we used to do things.

The Transition Explained
http://www.tbray.org/ongoing/When/200x/2006/04/24/CEO-Transition
2006-04-24T13:00:00-08:00
2006-04-24T16:49:05-08:00

It’s not that complicated, really. Bloggers are taking over the world. Resistance is futile; you will be assimilated.

5✭♫: One More Cup of Coffee
http://www.tbray.org/ongoing/When/200x/2006/04/24/One-More-Cup-Of-Coffee
2006-04-24T13:00:00-08:00
2006-04-24T13:00:00-08:00
I’m not really a Bob Dylan fan. A voice like that, and a tunesmithing talent like that, come along only a few times per century, but he’s still kind of irritating. That aside, the song One More Cup of Coffee, from the 1976 album Desire, can’t be ignored; wonderful tune, wonderful orchestration, wonderful performance. (“5✭♫” series introduction here; with an explanation of why the title may look broken.)

[Image: Desire, by Bob Dylan]

The Context

Nothing I can possibly write will add any wisdom to the millions of words, some 90% of them in excess of needs, written on the subject of this particular person.

A personal statement: Bob Dylan has long irritated me for, during the first thirty years or so of his career, never having given a straight answer to a straight question, and for writing songs with dozens of boring verses. But they’ll still be listening to lots of his performances long after I’m dead, and in recent years he’s become a better, more direct, interview.

My taste in Dylan is a little unusual: once you get past One More Cup of Coffee, my favorites would be Baby Let Me Follow You Down (from the Last Waltz soundtrack) and Crash on the Levee (Down in the Flood) from The Basement Tapes.

Desire, the record, is hit and miss. Joey, a glorification of the life of some mafioso, is flawed in concept and unlistenable in execution. Hurricane, whatever you think about Mr. Carter, that song rocks; and Isis hits pretty hard too.

The Music

Is there anything in One More Cup of Coffee that’s not perfect? Well yes, in the verses, the lyrics on occasion drag (“He oversees his kingdom / So no stranger does intrude / His voice it trembles as he calls out / For another plate of food”). But apart from that, the sentiment is compelling, Scarlet Rivera’s violin is beautifully scored and played, the tune is to die for, and the backing vocals are by Emmylou Harris, who you can bet is going to be here in the 5-✭ series one of these days. And while there’s not much middle ground on the subject of Dylan’s singing, if you like it, you’ll really like this song.

Listen to the choruses: Bob and Emmylou veer wildly around the rhythm, then coalesce on the beat when it matters, and they’re making it up as they go along, they’re wholly inhabiting the moment, and it’s quite, quite perfect.

Sampling It

Oh yeah, it’s out there. And there’s a live version too; but the smart thing would be to go buy the un-compressed un-DRM’ed shiny round silver version of Desire; it’s a keeper.

Atomic Monday
http://www.tbray.org/ongoing/When/200x/2006/04/24/Atomic-Monday
2006-04-24T13:00:00-08:00
2006-04-24T00:44:06-08:00

First of all, implementors of anything Atom-related need to spend some time chez Jacques Distler; in particular, the conversation that plays out in the comments. Second, there’s this piece of software called Planet Planet that allows you to make an aggregate web page by reading lots of feeds; for example, see Planet Apache or Planet Sun. Sam Ruby decided that its Atom support needed some work, so he did it. Now, here’s the exciting part: he pinged me over the weekend and said “Hey, look at this” wanting to show me his cleverly-Atomized Planet Intertwingly feed. I looked at it in NetNewsWire and was puzzled for a moment; some but not all of the things in the feed were highlighted as unread, even though this was the first time I’d seen it. Then the light went on. This is Atom doing exactly what we went to all that trouble to make it do. NetNewsWire has good Atom support and, because Atom entries all have unique IDs and timestamps, it can tell that it’s seen lots of those entries before in other feeds that I subscribe to. That’s how I found Jacques’ piece. This is huge; anyone who uses synthetic or aggregated feeds knows that dupes are a big problem, showing up all over the place. No longer, Atom makes that problem go away.
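
To make the dupe-detection idea concrete, here is a minimal Python sketch of how a reader could use each entry's atom:id and atom:updated to recognize entries it has already shown, whichever feed delivers them. This is not NetNewsWire's code; the function name, the `seen` dictionary, and the two sample feeds are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def unseen_entries(feed_xml, seen):
    """Return (id, updated, title) for entries not yet recorded in `seen`.

    `seen` maps an atom:id to the atom:updated value last shown for it,
    so an entry counts as new if its id is unknown or its timestamp changed.
    """
    fresh = []
    for entry in ET.fromstring(feed_xml).iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        updated = entry.findtext(ATOM + "updated")
        title = entry.findtext(ATOM + "title", default="")
        if seen.get(entry_id) == updated:
            continue  # same id, same timestamp: already shown elsewhere
        seen[entry_id] = updated
        fresh.append((entry_id, updated, title))
    return fresh

# Hypothetical data: one source feed, and an aggregate that repeats its entry.
ONE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.org,2006:post-1</id>
    <updated>2006-04-24T00:44:06-08:00</updated>
    <title>First</title>
  </entry>
</feed>"""

PLANET_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.org,2006:post-1</id>
    <updated>2006-04-24T00:44:06-08:00</updated>
    <title>First</title>
  </entry>
  <entry>
    <id>tag:example.org,2006:post-2</id>
    <updated>2006-04-24T09:00:00-08:00</updated>
    <title>Second</title>
  </entry>
</feed>"""

seen = {}
first = unseen_entries(ONE_FEED, seen)
second = unseen_entries(PLANET_FEED, seen)  # only post-2 is reported as new
```

Keying on the id alone would also work for pure duplicates; keeping the timestamp lets a revised entry show up as unread again.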

Hyatt on the High-Res Web
http://www.tbray.org/ongoing/When/200x/2006/04/22/High-Res-Web
2006-04-22T13:00:00-08:00
2006-04-23T17:12:18-08:00

Check out Dave Hyatt’s excellent write-up on designing and rendering Web pages so they take advantage of the higher-resolution screens that may be coming our way. I emphasize “may” because I’ve seen how slowly we’ve picked up pixels over the years. The first really substantial screen I ever worked on was a 1988-vintage Sun workstation with about a million pixels. The Mac on my lap right now, which has 125 times as much memory as that workstation, has only 1.38 million pixels. Anyhow, Hyatt has some smart things to say on the issues, which are trickier than you might think. I suspect that sometime in a couple of years, if I still care about ongoing, I’m going to have to go back and reprocess all the images so that higher-res versions are available for those who have the screens and don’t mind downloading bigger files. Anyhow, Dave’s piece may be slightly misleading in that he talks about SVG as though it’s something coming in the future. Not so, check out this nifty SVG Atom logo, which works fine in all the Mozilla browsers I have here. Load it up, resize the window, and watch what happens. Then do a “view source”. [Update: Jeff Schiller writes to tell me that Opera 9 does SVG (and Opera 8 “SVG Tiny”) too.] [Dave Walker writes: Though the shipping version of Safari doesn’t support SVG, the nightlies do.] [Dave Lemen points to JPEG 2000 as possibly useful in a high-res context.]

Wrong About the Infield Fly Rule
http://www.tbray.org/ongoing/When/200x/2006/04/23/Wrong-About-the-Infield-Fly-Rule
2006-04-23T13:00:00-08:00
2006-04-23T15:02:41-08:00

My brother Rob is really taking to this blogging medium. Check out his recent Credo, and also the only instance I’ve seen of Anglo-Saxon alliterative poetry applied to a mini-van.

Statistics
http://www.tbray.org/ongoing/When/200x/2004/12/12/BMS
2004-12-12T12:00:00-08:00
2006-04-23T10:10:02-08:00
Almost every Sunday I grab the week’s ongoing logfiles and update my numbers. I find it interesting and maybe others will too, so this entry is now the charts’ permanent home. I’ll update it most weeks, probably. [Updated: 2006/04/23.]

[Image: Browser market shares at ‘ongoing’]

Browsers visiting ongoing, percent.

[Image: Browser market shares at ‘ongoing’, visitors via search engines]

Browsers visiting ongoing via search engines, percent.

[Image: Search engine market shares at ‘ongoing’]

Search referrals to ongoing.

[Image: RSS and Atom feed fetches]

Fetches of the RSS 2.0 and Atom 1.0 feeds.

The notes on usage and source code will return in coming weeks when I get the cycles to rewrite this whole article.

What a “Hit” Means

I recently updated the ongoing software (but haven’t updated the Colophon I see, oops). Anyhow, the XMLHttpRequest now issued by each page seems to be a pretty reliable counter of the number of actual browsers with humans behind them reading the pages. I checked against Google Analytics and the numbers agreed to within a dozen or two on days with 5,000 to 10,000 page views; interestingly, Google Analytics was always 10 or 20 views higher.

Anyhow, do not conclude that now I know how many people are reading whatever it is I write here; because I publish lots of short pieces that are all there in my RSS feed, and anyone reading my Atom feed gets the full content of everything. And I have no #&*!$ idea how many people look at my feeds.

By the way, this was the first time in weeks and weeks that I’d looked at the Analytics numbers, and they showed almost exactly zero change from the report linked above. So I’m going to turn them off; they’re a little too intrusive and I think may be slowing page loads.

Anyhow, I ran some detailed statistics on the traffic for Wednesday, February 8th, 2006.

Total connections to the server: 180,428
Total successful GET transactions: 155,507
Total fetches of the RSS and Atom feeds: 88,450
Total GET transactions that actually fetched data (i.e. status code 200 as opposed to 304): 87,271
Total GETs of actual ongoing pages (i.e. not CSS, js, or images): 18,444
Actual human page-views: 6,348

So, there you have it. Doing a bit of rounding, if you take the 180K transactions and subtract the 90K feed fetches and the 6000 actual human page views, you’re left with 84,000 or so “Web overhead” transactions, mostly stylesheets and graphics and so on. For every human who viewed a page, it was fetched almost twice again by various kinds of robots and non-browser automated agents.
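
For anyone who wants to check the arithmetic, here it is spelled out in Python with the unrounded figures from the table. The variable names are mine; the exact overhead comes out a little above the rounded 84,000 because the text rounds each term first.

```python
# Figures from the table for Wednesday, February 8th, 2006.
connections = 180_428
feed_fetches = 88_450
page_gets = 18_444
human_views = 6_348

# Exact version of the rounded "180K - 90K - 6K" estimate.
overhead = connections - feed_fetches - human_views

# Page fetches that were not counted human views, per human view
# (the "almost twice again" in the text).
robot_fetches_per_view = (page_gets - human_views) / human_views

print(overhead)                          # 85630
print(round(robot_fetches_per_view, 2))  # 1.91
```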

It’s amazing that the whole thing works at all.

XML Automaton
http://www.tbray.org/ongoing/When/200x/2006/04/18/XML-Grammar
2006-04-18T13:00:00-08:00
2006-04-23T08:25:56-08:00
In December of 1996 I released a piece of software called Lark, which was the world’s first XML Processor (as the term is defined in the XML Specification). It was successful, but I stopped maintaining it in 1998 because lots of other smart people, and some big companies like Microsoft, were shipping perfectly good processors. I never quite open-sourced it, holding back one clever bit in the moronic idea that I could make money out of Lark somehow. The magic sauce is a finite state machine that can be used to parse XML 1.0. Recently, someone out there needed one of those, so I thought I’d publish it, with some commentary on Lark’s construction and an amusing anecdote about the name. I doubt there are more than twelve people on the planet who care about this kind of parsing arcana. [Rick Jelliffe has upgraded the machine].

Why “Lark”?

Lauren and I went to Australia in late 1996 to visit her mother and to get married, which we did on November 30th. Forty-eight hours later, Lauren twisted her knee badly enough that she was pretty well confined to a sofa for the rest of our Australian vacation.

So I broke out my computer and finished the work I’d already started on my XML processor, and decided to call it Lark for Lauren’s Right Knee.

How Lark Worked

Lark was a pure deterministic finite automaton (DFA) parser, with a little teeny state stack. Some of its transitions were labeled with named “events” that would provoke the parser to do something if, for example, it had just recognized a start tag or whatever.
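
A toy version of that design, in Python rather than Lark's Java, might look like the sketch below: a transition table keyed by state and character class, with some transitions carrying event names that make the driver loop do something. The state names echo the automaton but the table, the `StartGI` event, and the crude name-character test are invented for illustration, and there is no state stack here.

```python
def char_class(ch):
    """Map a character to a table key: 'name' for name-ish characters, else itself."""
    if ch.isalnum() or ch in "_-.:":
        return "name"
    return ch

# (state, character class) -> (next state, events fired on the transition)
TRANSITIONS = {
    ("InDoc", "<"): ("SawLT", []),
    ("SawLT", "name"): ("StagGI", ["StartGI"]),
    ("StagGI", "name"): ("StagGI", []),
    ("StagGI", ">"): ("InDoc", ["EndGI", "ReportSTag"]),
}

def start_tags(text):
    """Run the toy DFA over text and return the start-tag names it reports."""
    state, start, name, tags = "InDoc", 0, "", []
    for pos, ch in enumerate(text):
        key = (state, char_class(ch))
        if key not in TRANSITIONS:
            if state == "InDoc":
                continue  # ordinary character data, nothing to do
            raise ValueError(f"unexpected {ch!r} in state {state}")
        state, events = TRANSITIONS[key]
        for event in events:
            if event == "StartGI":
                start = pos                # remember where the name begins
            elif event == "EndGI":
                name = text[start:pos]     # stash the characters of the name
            elif event == "ReportSTag":
                tags.append(name)          # tell the client about the tag
    return tags

print(start_tags("<doc>hi<item>"))  # ['doc', 'item']
```

The point of the split is that the table knows nothing about what the events do; the `if`/`elif` chain plays the role of the big switch.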

DFA-driven parsers are a common enough design pattern, although I think Lark is the only example in the XML space. There are well-known parser generators such as yacc, GNU bison, and javacc, usually used in combination with lexical scanners such as flex so that you can write your grammar in terms of tokens not characters. Also, yacc and bison handle LALR grammars (javacc is LL(k)), so the parsing technique is quite a bit richer than a pure state machine.

I thought I had a better idea. The grammar of XML is simple enough, and the syntax characters few enough, that I thought I could just write down the state machine by hand. So that’s what I did, inventing a special-purpose DFA-description language for the purpose.

Then I had a file called Lark.jin which was really a Java program that used the state machine to parse XML. The transition “events” in the machine were mapped to case labels in a huge switch construct. Then there was a horrible, horrible Perl program that read the Lark.jin and the automaton, generated the DFA tables in Java syntax, inserted them into the code and produced Lark.java, which you actually compiled to make the parser.
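
The generate-and-splice step is easy to picture with a small sketch. This one is Python, not the original Perl, and everything specific in it is an assumption: the marker comment, the table name `DFA`, and the two-line template are stand-ins, since the real Lark.jin conventions are not described here.

```python
def java_table(name, rows):
    """Render rows of ints as a Java short[][] initializer."""
    lines = [f"static final short[][] {name} = {{"]
    for row in rows:
        for value in row:
            if not -32768 <= value <= 32767:
                raise ValueError(f"{value} does not fit in a Java short")
        lines.append("    {" + ", ".join(str(v) for v in row) + "},")
    lines.append("};")
    return "\n".join(lines)

def generate(template, rows, marker="/*@DFA_TABLES@*/"):
    """Replace the (hypothetical) marker in a .jin-style template with the table."""
    if marker not in template:
        raise ValueError("template has no table marker")
    return template.replace(marker, java_table("DFA", rows))

template = "class Lark {\n/*@DFA_TABLES@*/\n}"
source = generate(template, [[0, 1], [2, 3]])
```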

So while Java doesn’t have a preprocessor, Lark did, which made quite a few things easier.

There were a lot of tricks; some of the state transitions weren’t on characters, they were on XML character classes such as NameChar and so on. This made the automaton easier to write, and in fact, to keep the class files small, the character-class transitions persisted into the Java form, and the real DFA was built at startup time. These days, quick startup might be more important than .class file size.
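
Building the real DFA at startup amounts to expanding each class-keyed transition into one table cell per character. A rough Python sketch of that expansion follows; the `$S` and `$NameC` spellings come from the automaton excerpt, but the compact-table shape is my own and the classes are cut down to their ASCII members.

```python
import string

# ASCII members only; the real XML classes extend well beyond 7 bits.
CLASSES = {
    "$S": " \t\r\n",
    "$NameC": string.ascii_letters + string.digits + "._-:",
}

def expand(compact, width=128):
    """Turn {state: {trigger: target}} into {state: [target or None] * width}.

    A trigger is either a class name from CLASSES or a literal character.
    """
    table = {}
    for state, transitions in compact.items():
        row = [None] * width
        for trigger, target in transitions.items():
            for ch in CLASSES.get(trigger, trigger):
                row[ord(ch)] = target
        table[state] = row
    return table

dfa = expand({"StagGI": {"$NameC": "StagGI", "$S": "InStag", ">": "InDoc"}})
```

Three compact entries become a 128-cell row, which is the size-versus-startup trade the paragraph above describes.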


What Was Good

It was damn fast. James Clark managed to hand-craft a Java-language XML parser called XP that was a little faster than Lark, but he did that by clever I/O buffering, and I was determined to leapfrog him by improving my I/O.

This was before the time of standardized XML APIs, but Lark had a stream API that influenced SAX, and a DOM-like tree API; both worked just fine. Lark is one of very few parsers ever to have survived the billion laughs attack.

Lark was put into production in quite a few deployments, and the flow of bug reports slowed to a trickle. Then in 1998 I noticed that IBM and Microsoft and BEA and everyone else were building XML Processors, so I decided that it wasn’t worthwhile maintaining mine.

What Was Bad

I never got around to teaching it namespaces, which means it wouldn’t be real useful today.

It had one serious bug that would have been real work to fix and since nobody ever encountered it in practice, I kept putting it off and never did. If you had an internal parsed entity reference in an attribute value and the replacement text included the attribute delimiter (' or "), it would scream and claim you had a busted XML document.

That Automaton

What happened was, Rick Jelliffe, who is a Good Person, was looking for a FSM for XML and I eventually noticed, and so I sent him mine.

There’s no reason whatsoever to keep it a secret: here it is. Be warned: it’s ugly.

Fortunately, there were only 227 states and 8732 transitions, so the state number fit into a byte; that and the associated event index pack into a short. To make things even tighter, the transitions were only keyed by characters up to 127, as in 7-bit ASCII. Characters higher than that can’t be XML syntax characters, so we’re only interested in whether they fall into classes like NameChar and NameStartChar and so on. A 64K byte[] array takes care of that, each byte having a class bitmask.

As a result of all this jiggery-pokery, the DFA ends up, believe it or not, constituting a short[227][128].
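
The packing is simple bit arithmetic. The sketch below shows it in Python; which byte of the short holds the state and which holds the event is my guess, the bit assignments in the class table are hypothetical, and `str.isalpha`/`str.isalnum` are only rough stand-ins for the XML 1.0 name-character productions.

```python
def pack(state, event):
    """Pack a state number and an event index into one 16-bit cell."""
    if not (0 <= state < 256 and 0 <= event < 256):
        raise ValueError("state and event must each fit in 8 bits")
    return (event << 8) | state   # assumed layout: event high byte, state low byte

def unpack(cell):
    """Return (state, event) from a packed 16-bit cell."""
    return cell & 0xFF, (cell >> 8) & 0xFF

# One byte of class bits for each of the 64K 16-bit character codes.
NAME_START, NAME_CHAR = 0x01, 0x02    # hypothetical bit assignments
CLASS_BITS = bytearray(0x10000)
for code in range(0x10000):
    ch = chr(code)
    if ch.isalpha() or ch in "_:":
        CLASS_BITS[code] |= NAME_START
    if ch.isalnum() or ch in "_:.-":
        CLASS_BITS[code] |= NAME_CHAR

CELLS = 227 * 128   # 29,056 packed cells in the short[227][128]
```

So a character below 128 indexes the state's row directly, and anything above goes through `CLASS_BITS` first.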

Here’s a typical chunk of the automaton:

1. # in Start tag GI
2. State StagGI BustedMarkup {in element type}
3. T $NameC StagGI
4. T $S InStag !EndGI
5. T > InDoc !EndGI !ReportSTag
6. T / EmptyClose !EndGI

This state, called StagGI, is the state where we’re actually reading the name of a tag; we got here by seeing a < followed by a NameStart character.
Line 1 is a comment.
In line 2 we name the state, and support error reporting, providing the name of another state to fall back into in case of error, and in the curly braces, some text to help build an error message.
Line 3 says that if we see a valid XML Name character, we just stay in this state.
Line 4 says that if we see an XML space character, we move to state InStag and process an EndGI event, which would stash the characters in the start tag.
And so on.
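
Read that way, the description language is easy to load into a table. Here is a small Python reader for it, with the caveat that its grammar is inferred from this one excerpt (comment lines, `State name fallback {message}`, and `T trigger target !event...`) and the real file may well contain constructs it does not handle. The excerpt is repeated without its line numbers.

```python
import re

CHUNK = """\
# in Start tag GI
State StagGI BustedMarkup {in element type}
T $NameC StagGI
T $S InStag !EndGI
T > InDoc !EndGI !ReportSTag
T / EmptyClose !EndGI
"""

def parse_automaton(text):
    """Return {state: {"fallback", "message", "transitions"}} from a description."""
    states, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # blank line or comment
        if line.startswith("State "):
            match = re.match(r"State (\S+) (\S+) \{(.*)\}$", line)
            if match is None:
                raise ValueError(f"bad State line: {line!r}")
            name, fallback, message = match.groups()
            current = {"fallback": fallback, "message": message, "transitions": {}}
            states[name] = current
        elif line.startswith("T "):
            if current is None:
                raise ValueError("transition before any State line")
            _, trigger, target, *events = line.split()
            current["transitions"][trigger] = (target, [e.lstrip("!") for e in events])
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return states

stag = parse_automaton(CHUNK)["StagGI"]
```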


Other Hackery

An early cut of Lark used String and StringBuffer objects to hold all the bits and pieces of the XML. This might be a viable strategy today, but in 1996’s Java it was painfully slow. So the code goes to heroic lengths to live in the land of character arrays at all times, making Strings only when a client program asks for one through the API. The performance difference was mind-boggling.

An Evil Idea

If you look at the automaton, and the Lark code, at least half (I’d bet three quarters) is there to deal with parsing the DTD and then dealing with entity wrangling. A whole bunch more is there to support DOM-building and walking.

I bet if I went through and simply removed support for anything coming out of the <!DOCTYPE>, including all entity processing, then discarded the DOM stuff, then added namespace support and SAX and StAX APIs, it would be less than half its current size. Then if I reworked the I/O, knowing what I know now and stealing some tricks that James Clark uses in expat, I bet it would be the fastest Java XML parser on the planet for XML docs without a DOCTYPE; by a wide margin. It’s hard to beat a DFA.

And it would still be fully XML 1.0 compliant. Because (snicker) this is Java, and your basic core Java now includes an XML parser, so I could simply instrument Larkette to buffer the prologue and if it saw a DOCTYPE with an internal subset, defer to Java’s built-in parser.
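
The dispatch itself is only a few lines. The sketch below does it in Python, with the standard library's ElementTree standing in for "the built-in parser" and a caller-supplied function standing in for the stripped-down fast one. The prologue scan is deliberately naive: it skips whitespace, comments, and processing instructions, and would be fooled by a `[` or `>` inside a quoted DOCTYPE literal.

```python
import re
import xml.etree.ElementTree as ET

_SKIP = re.compile(r"\s+|<\?.*?\?>|<!--.*?-->", re.S)
_DOCTYPE_WITH_SUBSET = re.compile(r"<!DOCTYPE\b[^\[>]*\[")

def has_internal_subset(document):
    """True if the prologue holds a DOCTYPE that opens an internal subset."""
    pos = 0
    while True:
        skipped = _SKIP.match(document, pos)
        if skipped is None:
            break
        pos = skipped.end()   # step over whitespace, a PI, or a comment
    return _DOCTYPE_WITH_SUBSET.match(document, pos) is not None

def parse(document, fast_parser, full_parser=ET.fromstring):
    """Use the full parser only when an internal subset makes it necessary."""
    if has_internal_subset(document):
        return full_parser(document)
    return fast_parser(document)

WITH_SUBSET = '<?xml version="1.0"?>\n<!-- c -->\n<!DOCTYPE d [<!ENTITY e "x">]><d>&e;</d>'
```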

I’ll probably never do it. But the thought brings a smile to my face.

Just A Kid
http://www.tbray.org/ongoing/When/200x/2006/04/22/Just-a-Kid
2006-04-22T13:00:00-08:00
2006-04-22T13:37:58-08:00

Last weekend, Lauren felt like cooking up home-made Easter eggs, so the shopping list included “chocolate chips (large bag)”. I was heading down the bulk-foods aisle and realized one of the vertical acrylic bins was full of them. Someone had been sloppy, and there was a little heap of chocolate chips on the shelf underneath it. For a second, I flashed into pure eight-year-old mode, thinking “Holy cow, there’s a whole bin full of chocolate chips, and more just lying there!” I popped a few in my mouth and they were excellent; semi-sweet, dark, strong, and firm. I was still in the state that Buddhists don’t mean when they say “Child’s Mind”, thinking “I can get as many as I want!” The list did say “large bag” after all, so I put a bag under the spout and gleefully jammed the lever all the way over. At home, Lauren said “You went overboard, a bit, didn’t you?” and now we have a plastic canister-full in the pantry which should last us into 2007. It’s a good feeling.

Goddess
http://www.tbray.org/ongoing/When/200x/2006/04/22/Goddess
2006-04-22T13:00:00-08:00
2006-04-22T12:25:59-08:00

That would be my wife Lauren. After I b0rked our Win2K gamebox, I tried re-installing the OS and eventually reduced it to complete brick-ness; it recognized neither the video adapter nor the network card. So Lauren brushed me aside and started wrestling with the problem, and to make a long story short, it almost completely works again. At one point she seemed nearly infinite in her capabilities, sitting in front of the computer wrangling software updates while knitting baby stuff and looking up words in a German dictionary for the kid’s homework. Some of the German nouns and muttered curses at the Windows install sounded remarkably like each other. Why would anyone not marry a geek? The only problem is that Win2K won’t auto-switch resolutions to play games any more; it gets the frequency wrong and the LCD goes pear-shaped, so you have to hand-select the frequency and switch into the right resolution first. LazyWeb?

Routing Around Spotlight
http://www.tbray.org/ongoing/When/200x/2006/04/21/Routing-Around-Spotlight
2006-04-21T13:00:00-08:00
2006-04-21T23:16:25-08:00

Herewith two hideously ugly little shell scripts for use when Spotlight refuses to search your mail. Spotlight is a flawed v1.0 implementation of a really good idea and will, I’m sure, be debugged in a near-future release. [Update: The LazyWeb is educating me... these are moving targets.]

My problem is that whereas Mail.app will search my To/From/Subject lines (slowly, and with a really irritating GUI), the “Entire Message” option just doesn’t work; it returns instantly with no results. Yes, I’ve read the hints about making Spotlight re-index, but it just flatly refuses to work for me. Mind you, I have a lot of email, but still, it should at least try.

It turns out I had never really figured out the find -print0 and xargs -0 idioms that a lot of the shell-command stalwarts now rely on. Thanks to Malcolm Tredinnick for raising my consciousness.
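A minimal sketch of why that idiom matters, assuming a throwaway directory made with mktemp and two made-up message files, one with a space in its name. With the default whitespace splitting, xargs breaks the spaced name into two bogus arguments; with NUL delimiters each name arrives intact.

```shell
#!/bin/sh
# Throwaway directory and file names are illustrative only.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/plain.emlx"
printf 'hello\n' > "$demo/with space.emlx"

# Whitespace-delimited: "with space.emlx" is split into two names that
# do not exist, so grep only finds the match in plain.emlx.
naive=$(find "$demo" -name '*.emlx' | xargs grep -l hello 2>/dev/null | wc -l | tr -d ' ')

# NUL-delimited: find ends each name with a zero byte and xargs -0
# splits only on that byte, so both files are searched.
safe=$(find "$demo" -name '*.emlx' -print0 | xargs -0 grep -l hello | wc -l | tr -d ' ')

echo "whitespace-delimited matched $naive file(s); NUL-delimited matched $safe"
rm -rf "$demo"
```

Mail folder names often contain spaces, which is presumably why both scripts here use the NUL-delimited form.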

This lives in $HOME/bin under the name mailgrep:

#!/bin/sh
# Search every IMAP message file for a fixed string, ignoring case.
find "$HOME"/Library/Mail/IMAP* -name '*.emlx' -print0 | \
  xargs -0 fgrep -i "$@"

Isn’t xargs a funny command? I’ve discovered that it’s nearly impossible to describe what it does, and then why what it does is necessary, but there are just a whole bunch of places where you’d be lost without it.
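For what it's worth, here is one rough way to show it rather than describe it: xargs reads items from standard input and hands them to a command as arguments. That matters because commands like fgrep take file names as arguments, not on standard input, and there is a limit on how long one argument list can be, so xargs batches the items into as many invocations as needed. The items a, b, and c below are placeholders.

```shell
#!/bin/sh
# Three input lines become the arguments of a single echo invocation.
one_call=$(printf 'a\nb\nc\n' | xargs echo)

# -n 1 caps each invocation at one argument, so echo runs three times
# and prints three lines.
three_calls=$(printf 'a\nb\nc\n' | xargs -n 1 echo | wc -l | tr -d ' ')

echo "$one_call"      # all three items on one line
echo "$three_calls"   # number of separate echo invocations
```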

This lives in $HOME/bin/mailview:

#!/bin/sh
# List the matching files (-l), NUL-terminated (-Z in GNU grep), and open them.
find "$HOME"/Library/Mail/IMAP* -name '*.emlx' -print0 | \
  xargs -0 fgrep -i -l -Z "$@" | \
  xargs -0 open

The first cut of this dodged xargs and used an incredibly-inefficient and slow chain of -exec arguments to open the files one at a time with view (aka vim), to work around a well-known vim misfeature; it complained about the input not being a terminal and left my Terminal.app keystrokes borked.

But Malcolm, confirming my belief in the broken-ness of vim, said “Oh, *that* ‘view’. I thought it was some sexy Mac ‘view my email’ app”. D’oh, of course; the magic OS X open command does just the right thing. Erm, you might want to run mailgrep before you run mailview; I’m not sure what would happen if you asked OS X to open three or four thousand email messages at once.
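One hedged way to guard against that, sketched under assumptions: save the NUL-terminated list of matches to a file, count the entries, and only pass them to the opener when the count is under a limit. The names open_if_few, MAX_OPEN, and OPENER are hypothetical and are not part of the scripts above.

```shell
#!/bin/sh
# Hypothetical guard; MAX_OPEN and OPENER are illustrative names.
MAX_OPEN=${MAX_OPEN:-20}
OPENER=${OPENER:-open}

open_if_few() {
    list=$1    # file holding NUL-terminated file names
    # Keep only the NUL bytes and count them: one per file name.
    count=$(tr -cd '\0' < "$list" | wc -c | tr -d ' ')
    if [ "$count" -gt "$MAX_OPEN" ]; then
        echo "refusing to open $count files (limit $MAX_OPEN)" >&2
        return 1
    fi
    # Nothing to do for an empty list; otherwise hand the names over.
    [ "$count" -eq 0 ] || xargs -0 "$OPENER" < "$list"
}
```

In mailview, the output of the fgrep stage could be redirected to a temporary file and passed to open_if_few in place of the final xargs -0 open.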

FSS: Pink Flowers
http://www.tbray.org/ongoing/When/200x/2006/04/21/Dracon-Help
2006-04-21T13:00:00-08:00
2006-04-21T17:19:27-08:00

Friday Slide Scan #28 is two Eighties florals, one interior, one exterior. With a confession.

First some spring flowers fallen from a tree, just as now in our front yard, at dusk.

[Image: Fallen pink treeflowers on grass at dusk]

I’m not sure what these are, but look at the light in the center. Rewards enlarging.

[Image: Flowers in shadow with light in background]

Here’s the confession. Sometimes on Fridays when I’m feeling kinda burned-out, I knock off work and do these slide scans in the office, because this is where I have the big screen. Blowing these pictures up to mega-huge, picking away at the old-slide crud and scanning artifacts, tinkering with the colour balance, and listening; I never play music while I’m writing or coding seriously, but I play it real loud while photo-editing. It’s all pretty well pure pleasure; you just can’t imagine how good that second one above looks at near-native size. It reconstitutes the part of my mind that I earn my living with; that’s my story and I’m sticking to it.

Images in the Friday Slide Scans are from 35mm slides taken between 1953 and 2003 by (in rough chronological order) Bill Bray, Jean Bray, Tim Bray, Cath Bray, and Lauren Wood; when I know exactly who took one, I’ll say; in this case, at least one is by Cath Bray. Most but not all of the slides were on Kodachrome; they were digitized using a Nikon CoolScan 4000 ED scanner and cleaned up by a combination of the Nikon scanning software and PhotoShop Elements.

Spring Pix
http://www.tbray.org/ongoing/When/200x/2006/04/20/Spring-Pix
2006-04-20T13:00:00-08:00
2006-04-20T23:07:10-08:00

Three pictures around Vancouver; one of a fresh green springtime tree, two of rotten old buildings being torn down.

There’s nothing quite as fresh as just-sprouted deciduous leaves; another few weeks and this tree will be just a tree.

[Image: Sunlit fresh young leaves]

I have a thing about demolition. The first is a rotten dingy old one-story on Main Street near 23rd; the second is an unlovely grey mid-rise being torn down to build still more condos at Homer and Helmcken.

[Image: Demolition site on Main Street, Vancouver]
[Image: Demolition site at Homer and Helmcken, Vancouver]

Totten’s Trip
http://www.tbray.org/ongoing/When/200x/2006/04/20/Totten-on-Iraq
2006-04-20T13:00:00-08:00
2006-04-20T21:05:22-08:00

Michael J. Totten is a journalist and blogger who’s back and forth to the Middle East and writes about it, quite well in my opinion; he supports this by freelancing and with his blog’s tip jar. He gets lots of link love from the right-wing blogosphere, which is puzzling because Totten is balanced and clear-eyed and doesn’t seem to have any particular axe to grind. Recently, he and a friend were having fun in Istanbul and, on a random drive out into the country, decided on impulse to keep going, all the way across Turkey and into Iraq; into the Kurdish mini-state in Iraq’s north, to be precise. It makes a heck of a story, with lots of pictures, in six parts: I, II, III, IV, V, and VI.

The Cost of AJAX
http://www.tbray.org/ongoing/When/200x/2006/04/19/The-Cost-of-AJAX
2006-04-19T13:00:00-08:00
2006-04-20T00:37:46-08:00

James Governor relays a question that sounds important but I think is actively dangerous: do AJAX apps present more of a server-side load? The question is dangerous because it’s meaningless and unanswerable. Your typical Web page will, in the process of loading, call back to the server for a bunch of stylesheets and graphics and scripts and so on: for example, this ongoing page calls out to three different graphics, one stylesheet, and one JavaScript file. It also has one “AJAXy” XMLHttpRequest call. From the server’s point of view, those are all just requests to dereference one URI or another. In the case of ongoing, the AJAX request is for a static file less than 200 bytes in size (i.e. cheap). On the other hand, it could have been for something that required a complex outer join on two ten-million-row tables (i.e. very expensive). And one of the virtues of the Web Architecture is that it hides those differences; the “U” in URI stands for “Uniform”, meaning it’s a Uniform interface to a resource on the Web that could be, well, anything. So saying “AJAX is expensive” (or that it’s cheap) is like saying “A mountain bike is slower than a battle tank” (or that it’s faster). The truth depends on what you’re doing with it. In the case of web sites, it depends on how many fetches you do and where you have to go to get the data to satisfy them. ongoing is a pretty quick web site, even though it runs on a fairly modest server, but that has nothing to do with AJAX-or-not; it’s because of the particular way I’ve set up the Web resources that make the pages here. I’ve argued elsewhere that AJAX can be a performance win, system-wide; but that argument too is contingent on context, lots of context.

Hao Wu and Graham McMynn
http://www.tbray.org/ongoing/When/200x/2006/04/18/Hao-Wu
2006-04-18T13:00:00-08:00
2006-04-18T22:00:40-08:00

Graham McMynn is a teenager who was kidnapped in Vancouver on April 4th and freed, in a large, noisy, and newsworthy police operation, on April 12th. Hao Wu is a Chinese film-maker and blogger who was kidnapped in Beijing on February 22nd in a small, quiet police operation not intended to be newsworthy, and who has not been freed. Read about it here, here, and here. Making noise about it might influence the government of China to moderate its actions against Mr. Wu, and can’t do any harm. Mr. McMynn’s kidnappers were a gaggle of small-time hoodlums, one of whom was out on bail while awaiting trial for another kidnapping (!). Mr. Wu’s were police. In a civilized country, the function of the police force is to deter such people and arrest them. A nation where they are the same people? Nobody could call it “civilized”.
