pandas: DataFrame

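Extracting rows that contain a specific string with pandas (exact match, partial match)

Multiple conditions:

&, | (wrap each condition in parentheses)

Negation:

~ (NOT)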
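
As a minimal sketch of the matching and combining operators above, the following uses a small hypothetical DataFrame (the column names and values are made up for illustration). Note that str.contains treats its pattern as a regular expression by default.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "code": ["A01", "B02", "A03", "C04"],
    "name": ["apple", "banana", "apricot", "cherry"],
})

# Exact match
exact = df[df["name"] == "apple"]

# Partial match (na=False treats missing values as "no match")
partial = df[df["name"].str.contains("ap", na=False)]

# AND: both conditions must hold
both = df[df["code"].str.startswith("A") & df["name"].str.contains("ri", na=False)]

# OR: either condition may hold (each comparison needs its own parentheses)
either = df[(df["name"] == "banana") | (df["name"] == "cherry")]

# NOT: invert a boolean mask
negated = df[~df["name"].str.contains("ap", na=False)]
```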

Type conversion

Python: Converting column types with pandas
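
A short sketch of column type conversion with astype, assuming a hypothetical DataFrame whose "code" column holds numeric strings. Either a single column or a column-to-dtype mapping can be converted.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"code": ["1", "2", "3"], "amount": [10, 20, 30]})

# Convert one column
df["code"] = df["code"].astype(int)

# Convert several columns at once with a {column: dtype} mapping
df = df.astype({"amount": "float64"})
```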

Search

Membership in a list of choices: .isin(list)

result = df[df["code"].isin(["0", "1"])]     # only "0" and "1"
result = df[~(df["code"].isin(["0", "1"]))]  # everything except "0" and "1"

Unique

unique:

items = df["code"].unique()

Remove duplicates (drop_duplicates):

result = df[["code", "name"]].drop_duplicates()

Extract duplicates:

# keys is the list of column names that identify a duplicate
size = df.groupby(keys).size()
res = size[size > 1]
if res.shape[0] > 0:
    print(res)
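
The groupby approach reports each duplicated key with its count. If the duplicated rows themselves are needed, DataFrame.duplicated with keep=False is one alternative. The sketch below runs both on hypothetical data.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "code": ["0", "1", "1", "2"],
    "name": ["a", "b", "b", "c"],
})
keys = ["code", "name"]

# Count per key combination, keeping only combinations that occur more than once
size = df.groupby(keys).size()
res = size[size > 1]

# keep=False marks every member of a duplicated group, not just the later ones
dup_rows = df[df.duplicated(subset=keys, keep=False)]
```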

Conversion

List of dicts:

dict_list = df.to_dict(orient="records")
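
For reference, a small sketch of what orient="records" produces, using hypothetical data: one dict per row, keyed by column name.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"code": ["0", "1"], "name": ["a", "b"]})

dict_list = df.to_dict(orient="records")
# dict_list == [{"code": "0", "name": "a"}, {"code": "1", "name": "b"}]
```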

GROUP BY

Unless dropna=False is specified, rows whose key value is NaN (None) are excluded from the groups.

df.groupby(keys, dropna=False)

To get the result as a DataFrame, reset the index:

df = df.groupby(keys, dropna=False)["amount"].aggregate("sum").reset_index()
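
The effect of dropna=False can be seen by comparing both variants on hypothetical data in which some key values are missing. This is a sketch and assumes a pandas version that supports the dropna argument of groupby (1.1 or later).

```python
import pandas as pd

# Hypothetical sample data, for illustration only: two rows have no key
df = pd.DataFrame({
    "code": ["a", "a", None, None],
    "amount": [1, 2, 3, 4],
})
keys = ["code"]

# Default: rows with a missing key are silently left out
default = df.groupby(keys)["amount"].sum().reset_index()

# dropna=False: the missing key forms its own group
with_nan = df.groupby(keys, dropna=False)["amount"].sum().reset_index()
```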

Dropping NaN rows

Drop rows that contain at least one NaN:

df = df.dropna()

Drop rows where the specified column contains NaN:

df = df.dropna(subset=["amount"])
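
A quick sketch contrasting the two forms on hypothetical data, where one row is missing "name" and another is missing "amount":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "code": ["a", "b", "c"],
    "name": ["x", None, "z"],
    "amount": [1.0, 2.0, np.nan],
})

# Drops rows 1 and 2, since each has a missing value somewhere
any_nan = df.dropna()

# Drops only row 2, since only "amount" is checked
amount_only = df.dropna(subset=["amount"])
```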

Replacing NaN with None

import numpy as np
import pandas as pd

excel = pd.read_excel(...)
excel = excel.replace([np.nan], [None])

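Changing values

Updating DataFrame values - pandas

def customer_names(row):
    # Customer appears to be a Django model; returns a (first_name, last_name) tuple, or None if nothing matches
    return Customer.objects.filter(code=row["code"]).values_list("first_name", "last_name").first()

# result_type="expand" spreads the returned tuple across the two columns
# (the older reduce=False argument is no longer accepted by current pandas)
df[['first_name', 'last_name']] = df.apply(customer_names, axis=1, result_type="expand")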

Checking differences (indicator=True)

merged = pd.merge(df_left, df_right, on=["key1", "key2", "key3"], how="outer", indicator=True)

`_merge`	Meaning
`both`	Present in both
`left_only`	Present only in `df_left`
`right_only`	Present only in `df_right`

[Pandas] Checking the differences between two DataFrames
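
A minimal sketch of using the `_merge` column to split the result into additions, removals, and changed rows. The frames, the single key column, and the suffixes are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df_left = pd.DataFrame({"key1": [1, 2, 3], "value": ["a", "b", "c"]})
df_right = pd.DataFrame({"key1": [2, 3, 4], "value": ["b", "x", "d"]})

merged = pd.merge(
    df_left, df_right,
    on=["key1"], how="outer", indicator=True,
    suffixes=("_left", "_right"),  # disambiguate the non-key columns
)

# Rows that exist on one side only
left_only = merged[merged["_merge"] == "left_only"]
right_only = merged[merged["_merge"] == "right_only"]

# Rows present on both sides whose values differ
changed = merged[
    (merged["_merge"] == "both")
    & (merged["value_left"] != merged["value_right"])
]
```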

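Converting to numeric types

Converting string numbers to a numeric type in a column with missing values, using pandas

# Create a column where every value of value_str that can be converted becomes a number (the rest become NaN)
df["value_num"] = pd.to_numeric(df["value_str"], errors="coerce")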
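
To make the behavior of errors="coerce" concrete, here is a small sketch on hypothetical data containing a non-numeric string and a missing value. Both end up as NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"value_str": ["1", "2.5", "abc", None]})

# Unparseable and missing entries become NaN
df["value_num"] = pd.to_numeric(df["value_str"], errors="coerce")
```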

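Rounding

round uses round-half-to-even (banker's rounding), not round-half-up.

Display the current-month gross profit on sales in units of 1,000 yen:

# Strip the thousands separators, convert to thousands, and add a small offset so that exact .5 ties round up
df_epm["sales_profit"] = ((df_epm["当月売上粗利"].str.replace(',', '').astype(int) / 1000) + 0.01).round(0)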
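
The difference between half-to-even rounding and conventional half-up rounding shows up only on exact ties. The sketch below compares Series.round with floor(x + 0.5), a swapped-in alternative to the offset trick, not the method used above. It sends ties toward positive infinity, so it matches half-up rounding for non-negative values only.

```python
import numpy as np
import pandas as pd

# Exact .5 ties, all exactly representable as floats
s = pd.Series([0.5, 1.5, 2.5, 3.5])

# Round half to even: ties go to the nearest even integer
half_even = s.round(0)

# Round half up for non-negative values: shift by 0.5, then floor
half_up = np.floor(s + 0.5)
```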

openpyxl

https://www.soudegesu.com/en/post/python/pandas-with-openpyxl/

wb = self.excel_from_response(response)
ws = wb.worksheets[0]
data = ws.values
columns = next(data)[0:]        # first row is the header
df = pd.DataFrame(data, columns=columns)  # data has already advanced past the header row

Dropping columns

Drop three columns:

mg = mg.drop(["created_at", "updated_at", "md5"], axis=1)  # drop returns a new DataFrame

Excel

Reading Excel files (xlsx, xls) with pandas (read_excel)

data = pd.read_excel(
    "/Users/hdknr/Downloads/処理変更案.xlsx",
    sheet_name="名寄一覧表",
    skiprows=[0],   # skip the first row
)
