Friday, January 30, 2009

how many pinyin combinations are there?

Prelude: Yes, I'm back-to-back posting about arcane Chinese language trivia. But with the week off for Chinese New Year, I've been spending a lot of time studying and thinking about these issues.

Shortly after posting earlier today on unique pinyin combinations, I discovered a terrific and related post about the pinyin chart on a site called Laowai Chinese. The comments section is recommended reading in addition to the post.

Albert from Laowai Chinese reckons there are 409 possible pinyin sound combinations. However, the actual number of syllables is somewhat debatable, because different pinyin charts and dictionaries list different possible combinations. Here is a chart that I created that includes 412 combinations:

The three extra combinations that I included and Albert didn't are tei 忒, kei 剋, and rua.

On the other hand, here's a list that includes 416 possible sounds. Although that list omits dia, rua, and tei, it includes these that I don't: diang, shong, yai, nia, sei, lün, and lüan. But I'm not so sure these are valid, since I can't find any characters for these combinations.

Ok, last comment on this today. Here's a pretty cool online pinyin chart that includes audio pronunciation. This chart includes certain obscure combinations like tei, kei, den, eng, dia, rua, and chua, but still omits others like lo, yo, and ei.

unique pinyin combinations

Prelude: This is an off-topic post featuring arcane and obscure information about the Chinese language.

As an engineer / statistician / nerd, I am always classifying, analyzing, and quantifying the world around me. I am fascinated by the underlying structure and logic of all things, even things that are theoretically outside the realm of engineers. I also love tables and graphs.

Therefore, you can imagine my excitement when, on my first day learning Chinese, I discovered the chart of pinyin syllables:

How incredible that all of the possible sounds in an entire language may be represented so simply and logically, and on just one page! Not counting the tones, there are just over 400 possible sounds in Chinese. I've forgotten now how many sounds are possible in English, but it's on the order of tens of thousands.

Very early on in my Chinese study, I became fascinated with the frequency distribution of characters (including tones) within the pinyin chart. Occasionally, the uneven distribution would seem to make linguistic / cultural sense. For example, there is only one common character for the pinyin combination si3, 死, meaning death. Although I'm out of my league in postulating here, one could certainly imagine that as the society developed, the language would be clarified to ensure there wasn't any confusion about death, hence leaving the word isolated phonetically.

Anyway, over the years, I've tracked a lot of unique / unexpected / fascinating things I have discovered about the pinyin chart. And so I thought I'd share some of them here for your curiosity, trivial entertainment, and perhaps aid in your own language pursuit.

First off, here's a list of pinyin combinations for which there is only one commonly used character. (My unscientific definition of "commonly used" here is that the character is included in my cell phone's Chinese input system.)

I'll start easy:

neng 能
gei 给
zhei 这
shei 谁
dei 得
sen 森
nin 您
ri 日
me 么

I really can't say why this is so interesting to me, but it is. I mean, there are dozens, if not hundreds, of characters for the pinyin "yi." How come "neng" only gets one?

Ok, now let's get a little more obscure. Here are a few less common pinyin combinations for which, again, there is only one common character:

fo 佛
dia 嗲
ei 诶
zei 贼
nou 耨
seng 僧
lia 俩
zhuai 拽
lo 咯

Ok, moving on. Here's a list of unusual pinyin combinations that, although they do have more than one character, you might have rarely encountered before:

jiong 窘
miu 谬
pou 剖
cen 岑
beng 泵
pie 撇
weng 翁
zuan 钻
chuai 揣
chuo 戳

Ok, final list for this post. Here are a few pinyin combinations that technically exist, but are so obscure that my cell phone and even many dictionaries include no characters for them. Many pinyin tables don't even include these as possible sound combinations!

tei 忒
rua
chua
den
eng
kei 剋
nun

Ok, enough for today. Anyone out there know of other obscure pinyin combinations? Lastly, I'll conclude with some links to more Chinese character esoterica:

This blog:
How many pinyin combinations are there?

Danwei:
Acceptance comes for obscure characters
Problems with crazy characters
Living with an obscure name

56minus1's excellent Chinese net-speak series:
Chinese net-speak Part 1
Chinese net-speak Part 2
Chinese net-speak Part 3

Laowai Chinese's post on the pinyin chart.

Thursday, January 29, 2009

CNN on climate change - the good and the bad


This morning, I was impressed to discover CNN.com leading with a story about Al Gore's testimony to the Senate Foreign Relations Committee. Gore spoke about the imperative for the United States to negotiate and agree this year to an international treaty to reduce global greenhouse gas emissions.

I was also encouraged by the tone in these two paragraphs in the article:
During the hearing, Republican staffers handed out a statement contending that there are "significant objections" to claims about climate change. The document, which did not name Gore, said there is "a continued international outpouring of skeptical scientists" along with research "to refute warming fears."

The idea that the world's climate is being changed by human activities is supported by studies accepted by the vast majority of scientists with expertise in the field. The Intergovernmental Panel on Climate Change, the U.S. National Academy of Sciences, the American Meteorological Society, the American Geophysical Union and the American Association for the Advancement of Science are among groups that have issued reports backing that position.

I am encouraged here for a couple of reasons. First, the article does not hedge on whether or not climate change is being caused by humans. The second paragraph here is simply strong, direct, factual journalism, which is desperately needed to bring US public opinion on climate change more closely in line with reality.

Second, although the article does describe the actions of the climate skeptics, the claims of those skeptics are presented as claims alone (with quotation marks), not as truths. To me, this is journalistically a step in the right direction. By following the Republicans' "claim" of an "outpouring of skeptical scientists" with a real list of real scientific organizations, the article essentially discredits the claim. (To be fair, a perfect article would have discredited the claim directly, but progress is still progress.)

That said, CNN doesn't deserve only praise today. It is a daily habit of mine to read the CNN International home page followed by the CNN US page. I do this for a variety of reasons, but primarily I am curious about the differing emphasis and priority assigned to different news stories for the two markets.

And sure enough, my elation over a cover story on climate change was immediately quashed when I discovered that the US edition of CNN did not even feature the story at all on the home page:


What's going on here? Gore testifying in front of the Senate on climate change is important enough to make the cover of the international page, but on the US page is usurped by such hard-hitting headlines as "Vegetable ad deemed too hot for TV"?

And so the long struggle to change US public opinion on climate change goes on...

Thursday, January 22, 2009

censoring obama's inaugural address

Several media and blog sources are reporting the Chinese government's censorship of Obama's inaugural address. From the AP:
At one point, Obama said earlier generations "faced down communism and fascism not just with missiles and tanks, but with sturdy alliances and enduring convictions." He later addressed "those who cling to power through corruption and deceit and the silencing of dissent — know that you are on the wrong side of history."

Translations of the speech on China's most popular online portals, Sina and Sohu, were missing the word "communism" in the first sentence. The paragraph with the sentence on dissent had been removed.

Although I trust the AP's reporting, I still wanted to verify this for myself. Sina.com has a special inauguration page here:


(Interesting side note: the header refers to him as "Jr.," which I've never seen in the Western media, and is not used on his official White House page.)

From Sina's inauguration page, clicking 发表演说 takes you to a page featuring both a video of the speech and the supposed complete text (全文). Let's take a closer look at both sensitive instances.

He says "communism" at 10:14 in the video. The video is not edited (as it was during the CCTV live broadcast), but the subtitle omits the word:

Original English: "Recall that earlier generations faced down fascism and communism not just with missiles and tanks"

Subtitle: "他们不仅仅是靠导弹和坦克击败法西斯主义" ("They didn't merely rely on missiles and tanks to defeat fascism")

The text on the site is slightly different from the subtitle ("回想先辈们在抵抗法西斯主义之时,他们不仅依靠手中的导弹或坦克"), but still omits "communism."

Now let's take a look at the second part, about dissent. This portion starts around 12:50 on the video.

Again, the video itself is not edited, although the subtitles appear to use the word "suppress" instead of "silence."

Original English: "To those who cling to power through corruption and deceit and the silencing of dissent"

Subtitles: "对于那些通过腐败、欺骗和镇压异见者来攫取权力的领导人" ("To those leaders who grab power by corruption, deception, and suppression of dissenters")

In any case, although the censorship of the subtitles doesn't appear to be too heavy, that of the text version of the speech is. As indicated in the AP article, an entire paragraph is omitted. From what I can tell, these are the lines of the speech that are omitted from the Chinese text version on Sina.com:
To the Muslim world, we seek a new way forward, based on mutual interest and mutual respect. To those leaders around the globe who seek to sow conflict or blame their society's ills on the West, know that your people will judge you on what you can build, not what you destroy. To those who cling to power through corruption and deceit and the silencing of dissent, know that you are on the wrong side of history, but that we will extend a hand if you are willing to unclench your fist. To the people of poor nations, we pledge to work alongside you to make your farms flourish and let clean waters flow; to nourish starved bodies and feed hungry minds. And to those nations like ours that enjoy relative plenty, we say we can no longer afford indifference to the suffering outside our borders, nor can we consume the world's resources without regard to effect. For the world has changed, and we must change with it.

I think the title of Austin's post at the Time China blog captures very concisely the depressing irony here: "the silencing of 'silencing of dissent'".

bu zheteng

There is an interesting linguistic debate happening here in China regarding something Hu Jintao said during his recent speech commemorating the 30th anniversary of China's reform and opening. While pushing forward with development goals, China should 不动摇,不懈怠,不折腾, which the China Daily translated as "don't sway back and forth, relax our efforts or get sidetracked." Everyone seems content with the translation of the first two terms, but there is a lot of debate about the third - 不折腾 (bu4 zhe1 teng) - both what Mr. Hu meant and how to translate it.

A couple of weeks ago, Danwei posted a great summary of the issue, and I recommend readers start there for background. Yesterday, Austin at the Time China blog posted his (humorous) interpretation and suggestion, and linked to the suggestions recently published by state-run Xinhua:
- bu zheteng
- no trouble-making
- avoid self-inflicted setbacks
- don't flip flop
- don't get sidetracked
- don't sway back and forth
- no dithering
- no major changes
- avoid futile actions
- stop making trouble and wasting time
- no self-consuming political movements

What fascinates me is the first suggestion - "bu zheteng" - which is just the Chinese rendered in the standard romanization system, pinyin. The implication is that if it can't be translated adequately, why try?

Last night, my colleagues, all of whom are Chinese, and I had a discussion about how to translate bu zheteng. They all seem to agree that the best solution is simply for us English-speakers to adopt bu zheteng into our language. What do you think?

Bearing in mind that I'm an engineer, not a linguist, off the top of my head I can think of two categories of Chinese words that have been adopted into the English language:

The first is Chinese words that have been fully integrated and are included in standard English language dictionaries. Examples: tofu, from the Chinese dou4 fu 豆腐, and kung fu, from the Chinese gong1 fu 功夫.

The second is Chinese words that expats living in China routinely use colloquially when speaking to each other, either because no equivalent English word exists, or because it describes perfectly a phenomenon unique to China. Examples:

- chai 拆, meaning to demolish, e.g. "I used to love that restaurant; too bad it got chai'ed last week."
- mafan 麻烦, meaning troublesome / annoying, e.g. "Traveling during Chinese New Year's is too much mafan, I think I'll just stay in Beijing next week."

My prediction is that bu zheteng will be integrated by expats into the unique brand of Chinglish that we use when speaking to other China expats, but that there is little to no chance that bu zheteng will become the next tofu.

--
Update 2/17/09:

Great photo today in The Beijinger:

--

Tuesday, January 20, 2009

1-20-09


Today is a great day for America, and for the world. Let us rejoice in the triumph over struggle that Mr. Obama's election represents, and in the desperately needed hope and vision that he so passionately and eloquently brings.

Mr. Obama: we are ready.

Monday, January 19, 2009

fuel economy improvement and where to get bang for your buck

The NYT has a good op-ed today called "Energy Inefficient." Towards the end, there's this interesting paragraph:
The Union of Concerned Scientists points out that switching from an S.U.V. that gets 14 miles per gallon to one that gets 16 would save the same amount of fuel as swapping a 35-mile-a-gallon car for a 51-m.p.g. new generation gas-sipper. This is not an argument for more S.U.V.’s. It simply shows that we can wring savings from modest efficiency gains in products we already use.

Since this is somewhat counter-intuitive, I thought a brief explanation might be useful.

The confusion stems directly from the units used to express fuel economy in America. "Miles per gallon," while perhaps clearer for consumers, is a difficult framework in which to think about resource consumption (since the resource, a gallon of gas, is the quantity held fixed in the ratio).

On the other hand, consider the alternative -- flip the term and talk about gallons of fuel used per mile traveled. In the example above, an SUV that gets 14 mpg burns 0.0714 gallons of fuel per mile traveled, or, to make things easier, 7.14 gallons of fuel per 100 miles traveled. The car that gets 35 mpg burns 2.86 gallons of fuel per 100 miles traveled.

First of all, from the perspective of limiting absolute energy consumption, the car is clearly a better choice, and that fact should not be lost in this discussion.

But consider the impact stemming from relative improvements to the two vehicles. If I improve the SUV's fuel economy from 14 to 16 mpg, my fuel consumption per 100 miles traveled drops by almost a gallon - from 7.14 to 6.25. In other words, I have saved about 0.89 gallons of fuel per 100 miles as compared with my baseline scenario. To get the same improvement from the car's baseline, I have to decrease my fuel consumption from 2.86 gallons per 100 miles traveled to 1.96, corresponding to a fuel economy improvement to 51 mpg:

The optimal scenario would of course be to upgrade all your vehicles to 51 mpg (or more). However, given that such an ideal is often not practical or reasonable, comparative analysis as described here can be very beneficial from political, business, and personal perspectives.
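For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The function names are my own, made up for illustration; the only inputs are the mpg figures from the op-ed.

```python
def gallons_per_100_miles(mpg: float) -> float:
    """Fuel burned over 100 miles at a given fuel economy (mpg)."""
    return 100.0 / mpg


def equivalent_mpg(baseline_mpg: float, gallons_saved_per_100: float) -> float:
    """Fuel economy needed to save the given gallons per 100 miles from a baseline."""
    target = gallons_per_100_miles(baseline_mpg) - gallons_saved_per_100
    if target <= 0:
        raise ValueError("savings exceed the baseline consumption")
    return 100.0 / target


# SUV improvement from 14 to 16 mpg, expressed as gallons saved per 100 miles.
suv_savings = gallons_per_100_miles(14) - gallons_per_100_miles(16)

# Fuel economy a 35 mpg car would need to save the same amount of fuel.
car_target_mpg = equivalent_mpg(35, suv_savings)

print(f"SUV 14 -> 16 mpg saves {suv_savings:.2f} gallons per 100 miles")
print(f"A 35 mpg car needs about {car_target_mpg:.0f} mpg to save the same")
```

Working in gallons per 100 miles makes the savings additive, which is why the comparison becomes a simple subtraction.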
--
Update 1/20/09:
The Energy Analysis blog has a similar post about this issue, and includes the following relevant graph showing the non-linear relationship between fuel consumption and mpg:

--

Friday, January 16, 2009

olympic pollution reductions confirmed by NASA satellite

A new study from NASA analyzed satellite measurements of air pollution over Beijing to conclude:
The [Olympic] emission restrictions had an unmistakable impact. During the two months when restrictions were in place, the levels of nitrogen dioxide (NO2) -- a noxious gas resulting from fossil fuel combustion (primarily in cars, trucks, and power plants) -- plunged nearly 50 percent. Likewise, levels of carbon monoxide (CO) fell about 20 percent.

The following images show comparative NO2 levels around China during August, 2005-2007 (left) and August, 2008 (right). Note the disappearance of the color red over Beijing in the image on the right.


Much of the air quality discussion on this blog and elsewhere has been about particulate pollution, not NO2 or CO, so it's nice to see the expanded analysis.

Monitoring of air pollution by satellite is just awesome. Last November, I heard a fascinating presentation by Argonne National Lab's Dr. David Streets on recent developments in satellite monitoring. It is getting so exact, he said, that "we are exploring the potential of monitoring the change of power plant emissions in China from space." He then showed an example of pinpointing the opening of new power plants in Inner Mongolia through satellite observation:


Note pixel size of ~12km. Incredible.

Related post: final day of temporary air quality measures


two good summaries of the near-future green vehicles market


If you want to get quickly up to speed on how the US alternative fuel vehicle market is shaping up over the next few years, I recommend the following two posts:

"Everything you could want to know about the plug-in hybrid and electric vehicle announcements at the Detroit auto show" (from Climate Progress via Calcars.org; note the excellent "ways to stay informed" links at the bottom);

"Green Cars of 2008: Mega-Ginormous Summary of the Year" (from Treehugger).

Image: Treehugger


Thursday, January 15, 2009

other databases of chinese air quality data

Sorry about the deluge of posts today. I've recently returned to China after some time away in the States and am playing catch up on everything I missed.

Here I'm posting two additional databases of Chinese APIs outside of MEP's datacenter. Neither database has new data outside of the government-issued API data, however.

First, the Clean Air Initiative for Asian Cities has excellent China air quality data sets available for downloading. For example, Beijing APIs from 2000 to mid-2008 (along with tons of analysis) are available on this page. Note that data for other cities may be downloaded at the bottom of the page. Note also the excellent API to pollutant concentration converter on the right.

Second, in late December, Imagethief posted "all the data you ever wanted on Beijing and Shanghai air pollution." In the post, he links to a 5MB Excel file containing daily API data for Beijing and Shanghai from 2000 through mid-2007. The spreadsheet also contains a lot of analysis.

Some day I'll post my own spreadsheets, but that will require me to clean them up (a lot) and also figure out a way to share files easily. I'm sure there's a way, I just haven't looked into it yet.

In any case, as you cut through the data from the above sources on your own, let me know what you discover!

why you can't average APIs

The Asia Society's Room with a View website I mentioned in my last post is excellent. However, they have one indicator of air quality that is a little problematic, and here I'd like to explain why. The indicator I'm referring to is average API (which they call "average pollution").

Mathematically, it doesn't make sense to average APIs. This is because the conversion from pollutant concentration to API is non-linear, as shown in this graph:


Why this complicates averaging is best explained through an example:

Consider three different days with APIs 25, 100, and 250. The average API is 125, which corresponds to a PM10 concentration of 200 ug/m^3. Unfortunately, though, the actual average PM10 concentration for those three days is not 200.

Our three days with APIs 25, 100, and 250 correspond to PM10 concentrations of 25, 150, and 385 ug/m^3, respectively. The average PM10 is 187 ug/m^3, which corresponds to a real "average" API of 118, several points lower than that estimated using the other method.

Because of the non-linear conversion, to get an accurate "average API," we have to convert to PM10, average those, then convert back to API.
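Here is a rough sketch of that convert, average, convert back procedure in Python. The breakpoint table is an assumption on my part: it is a piecewise-linear table chosen so that it reproduces the example values above (API 25, 100, 125, and 250 map to 25, 150, 200, and 385 ug/m^3), and the entries above API 300 in particular should be checked against the official conversion before being relied on. The function names are made up for illustration.

```python
# Assumed (API, PM10 in ug/m^3) breakpoints; consistent with the example
# values in this post, but not copied from an official source.
BREAKPOINTS = [(0, 0), (50, 50), (100, 150), (200, 350),
               (300, 420), (400, 500), (500, 600)]


def _interpolate(x, points):
    """Piecewise-linear interpolation over a sorted list of (x, y) pairs."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError(f"value {x} is outside the breakpoint table")


def api_to_pm10(api):
    """Convert an API value to a PM10 concentration (ug/m^3)."""
    return _interpolate(api, BREAKPOINTS)


def pm10_to_api(pm10):
    """Convert a PM10 concentration (ug/m^3) back to an API value."""
    return _interpolate(pm10, [(conc, api) for api, conc in BREAKPOINTS])


daily_apis = [25, 100, 250]

# Naive approach: average the index values directly.
naive_average_api = sum(daily_apis) / len(daily_apis)

# Better approach: convert to concentrations, average, then convert back.
daily_pm10 = [api_to_pm10(a) for a in daily_apis]
average_pm10 = sum(daily_pm10) / len(daily_pm10)
true_average_api = pm10_to_api(average_pm10)

print(f"Naive average API: {naive_average_api:.0f}")
print(f"Average PM10: {average_pm10:.0f} ug/m^3")
print(f"API of the average PM10: {true_average_api:.0f}")
```

The gap between the two averages comes entirely from the change in slope between segments of the table; within a single linear segment the two methods agree.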

The difference isn't huge, and I do think that average APIs may occasionally be useful for snapshot, comparative indicators of the air quality of a given time period (as used by the Asia Society). However, it should be understood that this method usually gives a higher (worse) API than the true average, and the averaged API should never be used to convert back to pollutant concentration.

Detailed equations for converting back and forth from API to PM10 may be found at the bottom of this post.

daily photos of beijing's air quality


The Asia Society has an excellent website that features daily photos of Beijing's air, along with API info and a great video on the challenge of parallel economic development and environmental protection.

Wednesday, January 14, 2009

summary of beijing's 2008 air quality

Happy New Year! I thought I would start off the year with a brief look back at Beijing's air quality during 2008.

On December 31st, Xinhua reported that Beijing had achieved 274 "Blue Sky Days" in 2008. This was well in excess of the 2008 goal of 256, and even well above the 2009 goal of 259. But what does it mean in terms of air quality and human health? Let's take a closer look at the data to find out.

First of all, according to my tally, Beijing actually only achieved 272 Blue Sky Days in 2008, with one data point (9/6) missing. Even if we assume the 9/6 sky was blue though, that only amounts to 273, not the reported 274. What's going on here? FYI, I performed my tally by first downloading Beijing's 2008 API data (available by querying MEP's datacenter) then counting the number of days with API 100 or below. Am I doing something wrong here?

--
Update 1/15/09: In reviewing the data, I realized that MEP's 2008 API database is missing two data points - 9/6 and 6/4. Assuming both of these days were Blue Sky Days yields 274. I missed this the first time around because I forgot that I should be looking for 366 total data points (leap year!) not 365.
--
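For what it's worth, here is a small sketch of how the tally and the missing-data check could be automated. The function name is made up, and the API values in the demo are synthetic placeholders arranged only to match the counts discussed above (272 Blue Sky Days, two missing dates); they are not real readings from MEP's datacenter.

```python
import calendar
from datetime import date, timedelta


def tally_blue_sky_days(api_by_date, year, threshold=100):
    """Count days with API at or below the threshold and list dates with no data."""
    days_in_year = 366 if calendar.isleap(year) else 365
    all_dates = [date(year, 1, 1) + timedelta(days=i) for i in range(days_in_year)]
    missing = [d for d in all_dates if d not in api_by_date]
    blue = sum(1 for d in all_dates
               if d in api_by_date and api_by_date[d] <= threshold)
    return blue, missing


# Synthetic placeholder data for 2008: two dates absent, 92 days above 100.
year = 2008
all_days = [date(year, 1, 1) + timedelta(days=i) for i in range(366)]
absent = {date(2008, 6, 4), date(2008, 9, 6)}
present = [d for d in all_days if d not in absent]
synthetic_apis = {d: (150 if i < 92 else 80) for i, d in enumerate(present)}

blue, missing = tally_blue_sky_days(synthetic_apis, year)
print(f"Blue Sky Days counted: {blue}")
print(f"Missing dates: {[d.isoformat() for d in missing]}")
print(f"Upper bound if every missing day was blue: {blue + len(missing)}")
```

Building the full calendar for the year first, rather than just counting rows, is what catches the leap-year slip: the expected number of data points comes from the calendar, not from memory.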

In any case, as I have written about before, the "Blue Sky Day" metric is problematic for several reasons. Perhaps what bothers me most about it is that it tells us nothing about actual air quality; an increasing annual number of Blue Sky Days does not necessarily mean better air quality(1). To evaluate air quality, we need numbers for daily / annual concentrations of air pollutants. Although the Beijing EPB publishes annual pollutant concentrations in the Beijing Environmental Annual Reports, the 2008 report won't be available until this summer. So we need to improvise:

Starting from the database of 2008 API values, I converted back to daily PM10 concentrations using the formulas at the bottom of this post. I assume that the primary pollutant on all days is PM10(2). Averaging over the year I get:

2008 Average PM10 concentration for Beijing: 123 ug/m^3.

The good news? This is a 17% improvement over last year. The bad news? The PM10 concentration is still over six times higher than the WHO annual target of 20 ug/m^3:


During the Olympics, Beijing saw a 50% reduction in air pollution as the city enjoyed its cleanest air in ten years. Clearly, the success of the anti-pollution campaigns was a driving force behind 2008's relative improvement over years past. At the same time, though, we have a long way to go, and the considerable pollution of the city even in a "successful" year like 2008 should not be underestimated.


(1) Here, I'm not referring to data biasing. Rather, I'm simply considering the fact that Blue Sky Days are binary, as opposed to being a concentration value or gradual scale. Consider this extreme situation: if every day in one year had an API of 100, the number of Blue Sky Days would be 365, yet the average annual PM10 concentration (indicating air quality) would be 150 ug/m^3, which is actually worse than it was in 2007 or 2008 in Beijing.

(2) This assumption is slightly problematic because for APIs below 50 the primary pollutant is not listed, although for APIs above 50 the primary pollutant is almost always PM10. To estimate the accuracy of this method, I used it on the 2006 and 2007 daily API databases and calculated an annual average PM10 concentration result for each year that deviated from the Beijing EPB's reported values by well under 1%. Therefore, I think the assumption is pretty reasonable.