Problem description
This is a self-answered post. Why? Because many Power BI questions go unanswered for lack of data samples. Also, many people seem to wonder how to edit data tables in Power BI using Python. And, of course, the world needs more widespread use of Python in Power BI. Some think that you have to apply a Python snippet to an existing table loaded elsewhere. My answer to this post will show you how to build a (fairly big) data sample with a few lines of code in an otherwise empty Power BI file.
So, how can you build a data sample and make changes to it using Python in Power BI?
I'll show you how to build a dataset of 10000 rows that contains both categorical and numerical values. I'm using the Python libraries numpy and pandas for the data generation and table operations, respectively. The snippet below simply draws a random element from each of two lists 10000 times to build two columns with a few street and city names, and adds a list of random numbers into the mix. Then I'm using pandas to organize the data in a dataframe. When you use Python in the Power BI Power Query Editor, your input has to be a table, and your output has to be a pandas dataframe.
Python snippet:
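import numpy as np
import pandas as pd
# Fix the seed so every run produces the same sample
np.random.seed(123)
streets = ['Broadway', 'Bowery', 'Houston Street']
cities = ['New York', 'Chicago', 'Baltimore']
# Number of rows in the sample (10000, as described above)
rows = 10000
# Draw one random city, street and number per row
lst_cities = np.random.choice(cities, rows).tolist()
lst_streets = np.random.choice(streets, rows).tolist()
lst_numbers = np.random.randint(low=0, high=100, size=rows).tolist()
# Organize the three lists as columns of a dataframe
df_dataset = pd.DataFrame({'City': lst_cities,
                           'Street': lst_streets,
                           'ID': lst_numbers})
# One-row dataframe holding the shape (rows, columns) of df_dataset
df_metadata = pd.DataFrame([df_dataset.shape])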
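As a quick check outside Power BI, the sketch below wraps the same sampling logic in a helper and confirms that reseeding with the same value yields an identical dataframe, which is what makes the sample reproducible. The name build_sample and its parameters are hypothetical and not part of the original snippet.

```python
import numpy as np
import pandas as pd


def build_sample(rows=10000, seed=123):
    """Rebuild the sample table (illustrative helper, not from the original post)."""
    np.random.seed(seed)  # same seed -> same random draws on every call
    streets = ['Broadway', 'Bowery', 'Houston Street']
    cities = ['New York', 'Chicago', 'Baltimore']
    return pd.DataFrame({
        'City': np.random.choice(cities, rows).tolist(),
        'Street': np.random.choice(streets, rows).tolist(),
        'ID': np.random.randint(low=0, high=100, size=rows).tolist(),
    })


first = build_sample()
second = build_sample()

# True when both runs produced exactly the same values
print(first.equals(second))
print(first.shape)
```

Anyone who pastes the snippet into their own Power BI file with the same seed should therefore be looking at the same rows you are.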
Power BI:
In Power BI Desktop, click Enter Data to go to the Power Query Editor. In the dialog window that follows, do absolutely nothing but click OK. The result is an empty table and two steps under Applied steps.
Now, use Transform > Run Python Script, insert the snippet above, and click OK.
You now have a preliminary table with 2 columns and 3 rows, one row per dataframe. This is a pretty neat detail of how Python is implemented in Power BI: the three rows are three different datasets that are made available to you after running your snippet. The first, dataset, is constructed by default, but is empty since we started out with an empty table. Had we started out with some other data, it would hold that data, as the first line of the Run Python Script editor explains: # 'dataset' holds the input data for this script. It is constructed in the form of a pandas dataframe. The last table, df_metadata, is only a brief description of the dataset we're really interested in, df_dataset, but I've added it to the mix to illustrate that every dataframe you create in your snippet will be available to you. You choose which table to continue working on by clicking Table next to its name.
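To make the idea concrete, here is a minimal sketch that mimics the behaviour described above in plain Python: it scans a set of variables and keeps only the pandas dataframes. It is only an illustration, not how Power BI is actually implemented, and the helper name list_dataframes is hypothetical.

```python
import pandas as pd


def list_dataframes(namespace):
    """Return {name: shape} for every pandas dataframe in a dict of variables."""
    return {name: value.shape
            for name, value in namespace.items()
            if isinstance(value, pd.DataFrame)}


# Stand-ins for the variables that exist once the snippet has run
dataset = pd.DataFrame()  # the empty input table
df_dataset = pd.DataFrame({'City': ['New York', 'Chicago'],
                           'Street': ['Bowery', 'Broadway'],
                           'ID': [7, 42]})
df_metadata = pd.DataFrame([df_dataset.shape])
rows = 2  # a plain integer, so it is not listed as a table

tables = list_dataframes({'dataset': dataset,
                          'df_dataset': df_dataset,
                          'df_metadata': df_metadata,
                          'rows': rows})
for name, shape in tables.items():
    print(name, shape)
```

Only the three dataframes survive the filter, which matches the three rows in the preliminary table.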
And that's it! You now have a table of mixed data types to keep working on, using either Python or Power BI itself.
From here you can:
- Keep working on your table using any menu option
- Insert another Python script
- Duplicate your original dataframe and keep working on another version by right-clicking Table under Queries and creating a Reference
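The second option in the list above, inserting another Python script, could look like the sketch below. Inside Power BI the editor supplies dataset for you; here it is defined by hand (with made-up values) so the example runs on its own. The output name df_summary and the column names Rows and MeanID are assumptions chosen for illustration.

```python
import pandas as pd

# In Power BI, 'dataset' holds the input data for this script.
# It is built by hand here only so the sketch runs standalone.
dataset = pd.DataFrame({'City': ['New York', 'Chicago', 'New York', 'Baltimore'],
                        'Street': ['Bowery', 'Broadway', 'Broadway', 'Bowery'],
                        'ID': [10, 20, 30, 40]})

# Count the rows and average the ID column for each city
df_summary = (dataset.groupby('City')
                     .agg(Rows=('ID', 'size'), MeanID=('ID', 'mean'))
                     .reset_index())

print(df_summary)
```

Because the result is again a pandas dataframe, it should show up as a selectable table in the same way df_dataset did.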