SICP問題1.45の回答と説明（SICP1.3.4, ex1.45） : サルノオボエガキ

この記事の対象読者：『計算機プログラムの構造と解釈』を読んだことがあって、問題1.45に納得がいってないひと

1.3.4 値として返される手続き（計算機プログラムの構造と解釈第二版）

SICPの読んでたときにけっこう苦労した練習問題。
n乗根を求めるための平均緩和法の適用回数を「実験して確かめよ」って書いてあるけど、たぶん実験だけでは分からない。回答自体はカンニング（検索）して見つけたけど、なぜそうなのかをちゃんと説明しているのを見つけられなかったので、この記事で説明してみる。

平均緩和法、不動点探索の復習

回答のまえに平均緩和法や不動点探索についておさらいしておきます。

不動点探索（fixed-point）ではx=f(x)となる点をみつけます。y=f(x)を計算してさらにf(y)を計算して、というのを繰り返す。図解すると次の赤線のような軌道を描きます。

ところがこの方法では上手くいかない場合がある。次のようにy=xを軸として対称な曲線だと同じ軌道をグルグルしてしまい不動点に行き着かない。

このようなときに平均緩和法を適用するといい具合に非対称に歪めてくれる。この変形のおかげで不動点探索がうまくいくようになります。

四乗根を平均緩和法で求める

平均緩和の不動点として立方根を見つけることが出来る. 困ったことに, この方法は四乗根には使えない.

問題文にはこう書いてあるけど、実際やってみると四乗根を見つけることはできます。見つけられないのは（上で説明したとおり）y=xの軸に対称な場合ですが、平均緩和法を適用すれば非対称になるはずなので見つけられるということです。
でもこれをコーディングして実行すると三乗根に比べてだいぶ遅いです。

;;; 安全に計算の動きをおうため、計算回数の上限を指定できるように改造した
 (define (fixed-point-limited f first-guess limit)
   (define (close-enough? v1 v2)
     (< (abs (- v1 v2)) tolerance))
   (define (try guess count)
     (let ((next (f guess)))
       ;; 計算過程を表示する
       (display count) (display ": ") (display next) (newline)
       (if (or (= count limit) (close-enough? guess next))
           (list count next) ; 計算回数も一緒に返す
           (try next (+ count 1)))))
   (try first-guess 1))

 ;;; 平均緩和法1回での4乗根計算手続き
 (define (root4-ad1 x limit)
   (fixed-point-limited
     (average-damp (lambda (y) (/ x
                                  (* y y y))))
    1.0
    limit))

 (root4-ad1 16 10)
 ; (10 1.8597959434977196)

 (root4-ad1 16 100)
 ; (100 1.9364452032204889)
 (root4-ad1 16 1000)
 ; (1000 1.9781656271972445)

 ;(time (root4-ad1 81 0))
 ; real 176.408
 ; user 175.530
 ; sys    0.380
 ; (449999982 3.000050001249936)

81の四乗根を求めるのに4億回も計算して176秒も掛かっている。

なぜ四乗根の計算は遅い？

四乗根の不動点探索の様子を見てみましょう。平均緩和法を一回適用したグラフはこのようになります。その上での不動点を求める軌道はこのように螺旋を描きます。

比べて、平均緩和法を2回適用した場合は、このようにまっすぐに不動点に向かう軌道になります。

直感的に上より下のほうが早く不動点に行き着きそうですよね。

不動点探索に向いた曲線とその求め方

不動点探索に向いた曲線、早く探索が収束するための曲線ってどんなのでしょう。上のふたつのグラフを比較してみると、グラフの最下点の位置が重要そうに思えます。
最下点が不動点の位置より右だと、探索軌道が螺旋を描いてしまう。最下点が不動点より左だとまっすぐな探索軌道を描く。
最下点と不動点のx軸上の位置関係が、最下点＜＝不動点となっていると良さそうです。

ということで平均緩和法を複数回適用した場合に、その最下点がどうなるか調べてみれば良さそうですね。
調べやすくするために平均緩和法を適用した数式を変形してみます。

・準備
定数mのｎ乗根を求めるための不動点探索用関数を以下のように定義する。
\[g_{n}(x)=\frac{m}{x^{n-1}}\]

$g_{n}(x)$に平均緩和法をｋ回適用した関数をつぎのように定義する
\[
\begin{eqnarray}
f_{n,k}(x) & = & \frac{1}{2} (x+ \frac{1}{2} (x+\ldots \frac{1}{2} (x+ g_n(x) )\ldots)) \nonumber \\
& = & \frac{1}{2} (x+\frac{1}{2} (x+ \ldots \frac{1}{2} (x+ \frac{m}{x^{n-1}} ) \ldots )) \nonumber \\
& = & (\frac{1}{2} + \frac{1}{2^2}+ \frac{1}{2^3}+ \ldots + \frac{1}{2^k})x + \frac{1}{2^k} \frac{m}{x^{n-1}} \nonumber \\
& = & \frac{2^k-1}{2^k}x + \frac{1}{2^k} \frac{m}{x^{n-1}} \nonumber \\
& = & \frac{1}{2^k}((2^k-1)x+ \frac{m}{x^{n-1}})
\end{eqnarray}
\]

$f_{n,k}$の最下点を調べたい。そのため式（１）を微分して0になる点を調べる。
\[
Df_{n,k}(x)= \frac{1}{2^k}(2^k-1+(1-n) \frac{m}{x^n})
\]
\[
\begin{eqnarray}
Df_{n,k}(x) & = & 0 \nonumber \\
\frac{1}{2^k}(2^k-1+(1-n) \frac{m}{x^n}) & = & 0 \nonumber \\
2^k-1+(1-n) \frac{m}{x^n} & = & 0 \nonumber \\
\frac{m}{x^n} & = & \frac{1-2^k}{1-n}
\end{eqnarray}
\]

さて$f_{n,k}$の不動点は$\sqrt[n]{m}$である。不動点にあるとき式（２）の左辺は１になる。
\[
\begin{eqnarray}
1 & = & \frac{1-2^k}{1-n} \nonumber \\
1-n & = & 1-2^k \nonumber \\
n & = & 2^k \nonumber \\
\log_2 n & = & \log_2 2^k \nonumber \\
& = & k \log_2 2 = k \nonumber \\
\log_2 n & = & k
\end{eqnarray}
\]

回答

ようやく求める平均緩和回数がわかりそうだ。式（３）の k は整数にしないと回数にならないので、左辺を切り上げるか切り捨てるかしないといけない。

切り上げ：式（２）の右辺は１より大きくなる。左辺のxは不動点より小さい値でなくてはならない
切り捨て：式（２）の右辺は１より小さくなる。左辺のxは不動点より大きい値でなくてはならない

不動点と最下点xの位置関係は「最下点ｘ＜＝不動点」が良いってことはグラフを描いて分かってました。なので式（３）は切り上げのほうが良さそうです。
まとめると平均緩和法の適用回数kはつぎの式になります。

\[ k= \lceil \log_{2}{n} \rceil \]

;;; ｎ乗根を探す手続き
 (define (root-n x n)
   (fixed-point
    ((repeated average-damp
                 ;; 回数：2を底とするlog nの小数点以下切上げ
                 (ceiling (/ (log n) (log 2))))
     (lambda (y) (/ x
                    (expt y (- n 1))))) ; exptはべき乗
    1.0))

 ;; test
 (root-n 2 2)
 ; 1.4142135623746899
 (root-n 27 3)
 ; 3.0000234260125787
 (root-n 256 8)
 ; 2.0000000000039666
 (root-n 65536 16)
 ; 2.000000000076957

参考リンク

切り捨てが多いけど理由説明は特にないんですよね。上のグラフの議論から切り上げのほうが良いと私は考えてます。

[SICP 1.3.4] Exercise 1.40-1.46 Memo | Nisei Kimura's Blog

この回答だと$\log_{2}{n}$を切り捨て（floor）している

SICP 孤読書会 - 1.3 高階手続きによる抽象 (1.3.1 〜 1.3.4) - プログラミング的な何か、あと自転車日本一周とか

回答コードだとやっぱりfloorしてる。しかしよくみんな$\log_2{n}$だと気づくなー。

SICP 1.3.4 Procedures as Returned Values - プログラミング再入門

こちらはceiling

SICP Exercises Section 1.3.4 | Ivan Ivanov's Weblog

floor

問題1.45 – SICP（計算機プログラムの構造と解釈）その23 : Serendip - Webデザイン・プログラミング

四捨五入（round）は新しいパターン

問題1.45 - n-oohiraのSICP日記 - sicp

floor、たぶん上リンクのようなのを参考にした結果

SICP 1.45证明？ - 数学 - 知乎

中国語ですが上と同じような議論で証明してあります。結論は$m \geq \log_{2}{n}$で、切り上げです！

3 件のコメント:

color commands ark さんのコメント...: Computer science is a terrible name. First, it's not a science. And it's also not about computers.
3:12 People in the future will recognize that people were really formalizing intuitions about process -- how to do things. Talking precisely about how to knowledge. As opposed to geometry that talks about what is true.
5:36 What's a process? A process is like a magical spirit that lives in the computer and does something. What directs a process is a pattern of rules called a procedure. Procedures are the spells. The programming language is the language for casting the spells.
7:25 Computer science is the business is in formalizing the "how to" imperative knowledge.
10:12 As opposed to the constraints in other kinds of engineering, where the constraints of what you can build are the constraints of physical systems, the constraints imposed in building large software systems are the limitations of our own minds.
10:55 Abstraction. Engineering technique whereby a "black box" can be used without knowing its implementation details. And these "black boxes" can be combined to create even more complex systems.; 12:03 午前
xcom2 cheats さんのコメント...: This is not the 6.001 given to undergraduates. I actually took 6.001 fall semester of '86 and it was given in 10-250 which is a giant auditorium, not this tiny classroom. There were about 200 students in the real 6.001. I don't even know who these people could be, because 001 wasn't offered to graduate students. (Although, the lecture itself is the same one we got as undergraduates -- it was all about managing complexity).
Read more; 2:04 午前
why do chia seeds gel さんのコメント...: This is not the 6.001 given to undergraduates. I actually took 6.001 fall semester of '86 and it was given in 10-250 which is a giant auditorium, not this tiny classroom. There were about 200 students in the real 6.001. I don't even know who these people could be, because 001 wasn't offered to graduate students. (Although, the lecture itself is the same one we got as undergraduates -- it was all about managing complexity).; 5:24 午後

コメントを投稿

2015-08-27