I have tried the following code, but I can't see the difference between np.dot and np.multiply with np.sum.
Here is the np.dot code:
logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
Its output is
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039 ]]
Here is the code for np.multiply with np.sum
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
Its output is
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
I'm unable to understand the difference in type and shape, while the resulting value is the same in both cases.
Even if I squeeze the cost from the former code, its value becomes the same as in the latter, but the type remains an ndarray:
cost = np.squeeze(cost)
print(type(cost))
print(cost)
The output is
<class 'numpy.ndarray'>
0.6930587610394646
What you're doing is calculating the binary cross-entropy loss, which measures how bad the predictions (here: A2) of the model are when compared to the true outputs (here: Y).
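For reference, the quantity both of your snippets compute is the standard binary cross-entropy (with m examples, labels y_i taken from Y, and predictions a_i taken from A2):

$$L = -\frac{1}{m}\sum_{i=1}^{m}\big[\,y_i \log(a_i) + (1 - y_i)\log(1 - a_i)\,\big]$$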
Here is a reproducible example for your case, which should explain why you get a scalar in the second case when using np.sum:
# assuming numpy has been imported as: import numpy as np
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])
# `m` is the number of examples; here m = Y.shape[1] = 8
In [92]: cost = (-1/m) * logprobs
In [93]: cost
Out[93]: array([[ 0.09864328]])
In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
# np.sum returns scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
Note that np.dot sums only along the inner dimensions, which match here: (1x8) and (8x1). So the 8s disappear during the dot product or matrix multiplication, yielding a (1x1) result, which is just a scalar but is returned as a 2D array of shape (1, 1).
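To make the equivalence concrete, here is a small sketch (reusing Y and A2 from the example above): the dot product against a transpose computes exactly the same sum as elementwise multiplication followed by np.sum; only the container around the number differs.
import numpy as np

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

via_dot = np.dot(Y, np.log(A2).T)             # matrix product -> shape (1, 1)
via_sum = np.sum(np.multiply(Y, np.log(A2)))  # full reduction -> shape ()
print(via_dot.shape, via_sum.shape)           # (1, 1) ()
print(np.allclose(via_dot[0, 0], via_sum))    # True: same number, different wrapper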
Also, and most importantly, note that here np.dot is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
In [108]: logprobs
Out[108]: array([[-0.78914626]])
In [109]: logprobs.shape
Out[109]: (1, 1)
Returning the result as a scalar value
np.dot or np.matmul returns whatever the resulting array shape would be, based on the input arrays. Even with the out= argument, it's not possible to return a scalar when the inputs are 2D arrays. However, we can use np.asscalar() on the result to convert it to a scalar, if the result array is of shape (1, 1) (or, more generally, a scalar value wrapped in an nD array):
In [123]: np.asscalar(logprobs)
Out[123]: -0.7891462576187036
In [124]: type(np.asscalar(logprobs))
Out[124]: float
np.asscalar works on any ndarray of size 1, converting it to a scalar value:
In [127]: np.asscalar(np.array([[[23.2]]]))
Out[127]: 23.2
In [128]: np.asscalar(np.array([[[[23.2]]]]))
Out[128]: 23.2
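One caveat: np.asscalar() was deprecated in NumPy 1.16 in favour of ndarray.item(), and has been removed from recent NumPy releases. A minimal sketch of the modern equivalent, reusing logprobs from above:
# ndarray.item() performs the same size-1-array-to-scalar conversion
scalar = logprobs.item()
print(scalar)        # -0.7891462576187036
print(type(scalar))  # <class 'float'>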