Problem description
I have some data stored as a numpy array with dtype=object, and I would like to extract one column of lists and convert it to a numpy array. It seems like a simple problem, but the only way I've found to solve it is to recast the entire thing as a list of lists and then recast that as a numpy array. Is there a more Pythonic approach?
import numpy as np
arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
arr = arr[:, 1]
print(arr)
# [['a', 'b', 'c'] ['a', 'b', 'c']]
type(arr)
# numpy.ndarray
type(arr[0])
# list
arr.shape
# (2,)
Recasting the array as dtype=str raises a ValueError, since it tries to convert each list to a string.
arr.astype(str)
# ValueError: setting an array element with a sequence
It is possible to rebuild the entire array as a list of lists and then cast it as a numpy array, but this seems like a roundabout way.
arr_2 = np.array(list(arr))
type(arr_2)
# numpy.ndarray
type(arr_2[0])
# numpy.ndarray
arr_2.shape
# (2, 3)
Is there a better way to do this?
Recommended answer
Going by way of lists is faster than going by way of vstack:
In [1617]: timeit np.array(arr[:,1].tolist())
...
100000 loops, best of 3: 11.5 μs per loop
In [1618]: timeit np.vstack(arr[:,1])
...
10000 loops, best of 3: 54.1 μs per loop
vstack is effectively doing:
np.concatenate([np.atleast_2d(a) for a in arr[:,1]],axis=0)
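To check that equivalence on the array from the question, here is a minimal sketch. The names col, via_vstack and via_concat are illustrative and not part of the original answer.

```python
import numpy as np

# Same object array as in the question.
arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]  # 1-d object array whose elements are Python lists

# vstack promotes each element to at least 2-d, then joins along axis 0.
via_vstack = np.vstack(col)
via_concat = np.concatenate([np.atleast_2d(a) for a in col], axis=0)

print(via_vstack.shape)
print(np.array_equal(via_vstack, via_concat))
```

Each list becomes a separate (1, 3) array before the join, which is a plausible reason for the extra overhead compared with handing a plain list of lists to np.array.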
Some alternatives:
In [1627]: timeit np.array([a for a in arr[:,1]])
100000 loops, best of 3: 18.6 μs per loop
In [1629]: timeit np.stack(arr[:,1],axis=0)
10000 loops, best of 3: 60.2 μs per loop
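For anyone who wants to rerun the comparison outside IPython, the following is a rough sketch using the standard timeit module. The candidates dictionary and its labels are assumptions made for illustration, and absolute timings will differ by machine and NumPy version, so no figures are quoted here.

```python
import timeit
import numpy as np

arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)

# The four approaches timed above, wrapped as zero-argument callables.
candidates = {
    "tolist": lambda: np.array(arr[:, 1].tolist()),
    "list comprehension": lambda: np.array([a for a in arr[:, 1]]),
    "vstack": lambda: np.vstack(arr[:, 1]),
    "stack": lambda: np.stack(arr[:, 1], axis=0),
}

# Confirm that every approach yields the same 2-d array.
results = {name: fn() for name, fn in candidates.items()}

# Best of 3 runs of 1000 calls each, reported per call.
loops = 1000
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=loops, repeat=3)) / loops
    print(f"{name}: {best * 1e6:.1f} us per loop")
```

All four produce the same (2, 3) string array, so the choice between them comes down to speed and readability.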
Keep in mind that the object array just contains pointers to the lists, which live elsewhere in memory. While the 2d nature of arr makes it easy to select the 2nd column, arr[:,1] is effectively a list of lists, and most operations on it treat it as such. Things like reshape don't cross that object boundary.
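A small sketch of what that boundary means in practice, using the array from the question. The names col, reshaped and inner are illustrative only.

```python
import numpy as np

arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]

# reshape rearranges the two pointers; it does not look inside the lists.
reshaped = col.reshape(2, 1)
print(reshaped.shape, reshaped.dtype)  # (2, 1) object
print(type(reshaped[0, 0]))            # <class 'list'>

# The array holds references, so an in-place change to a list
# is visible through every array that points at it.
inner = col[0]
inner.append('d')
print(arr[0, 1])                       # ['a', 'b', 'c', 'd']
print(reshaped[0, 0] is inner)         # True
```

Getting a real (2, 3) array therefore requires building a new array from the list contents, which is what np.array(col.tolist()) does.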