Splitting a Column into Chinese Characters and Other Characters in Python

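Requirement: in day-to-day work we often need to split a column of data into two parts, its Chinese characters and everything else (digits, letters, and so on).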

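Import the required library (reading and writing .xlsx files also requires an Excel engine such as openpyxl to be installed):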

import pandas as pd

Define the function extract_characters, which takes three parameters: file_path (path to the Excel file), sheet_name (name of the worksheet), and column_name (name of the column to split).

def extract_characters(file_path, sheet_name, column_name):

Read the specified worksheet of the Excel file into a DataFrame:

    df = pd.read_excel(file_path, sheet_name=sheet_name)

Add two new, empty columns to the DataFrame: '中文' (Chinese characters) and '其他字元' (other characters):

    df['中文'] = ''
    df['其他字元'] = ''

Iterate over each row of the DataFrame:

    for index, row in df.iterrows():

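Get the value of the specified column and convert it to a string. An empty cell is read as NaN, which str() would turn into the text 'nan', so it is treated as an empty string instead:

        value = row[column_name]
        text = '' if pd.isna(value) else str(value)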

Initialize two empty strings, chinese and other, to collect the Chinese characters and the other characters:

        chinese = ''
        other = ''

Loop over each character in the text:

        for char in text:

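Check whether the current character is a Chinese character, i.e. whether its code point lies between \u4e00 and \u9fff (the CJK Unified Ideographs block). Full-width punctuation such as ， and 。, as well as rarer ideographs in the CJK extension blocks, lies outside this range and is therefore counted as an other character:

            if '\u4e00' <= char <= '\u9fff':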

If it is a Chinese character, append it to chinese:

                chinese += char

Otherwise, append it to other:

            else:
                other += char
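The per-character steps above (two accumulators, the range check, and the if/else) can be tried on their own, without an Excel file. The sketch below wraps them in a function; split_text is a hypothetical name used only for illustration and is not part of the original code.

```python
def split_text(text):
    """Split text into (Chinese characters, all other characters), keeping order."""
    chinese = ''
    other = ''
    for char in text:
        # CJK Unified Ideographs block: U+4E00 to U+9FFF
        if '\u4e00' <= char <= '\u9fff':
            chinese += char
        else:
            other += char
    return chinese, other


print(split_text('紅色T恤XL'))        # ('紅色恤', 'TXL')
# The full-width colon and comma are outside the range, so they go to "other"
print(split_text('型號：A-12，黑色'))  # ('型號黑色', '：A-12，')
```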

After the character loop, write the collected Chinese characters to the new '中文' column of the current row:

        df.at[index, '中文'] = chinese

Write the collected other characters to the new '其他字元' column of the current row:

        df.at[index, '其他字元'] = other

Return the processed DataFrame:

    return df
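Looping with iterrows is comparatively slow on large sheets. A different technique, not the one used in this article, is to split the whole column at once with regular expressions through pandas' Series.str.replace: delete every non-Chinese character to get the Chinese part, and delete every Chinese character to get the rest. This is a sketch assuming an in-memory DataFrame; split_column_vectorized and the sample SKU values are made up for illustration.

```python
import pandas as pd

# Same Unicode range as the loop version: U+4E00 to U+9FFF
CHINESE_PATTERN = r'[\u4e00-\u9fff]'
NON_CHINESE_PATTERN = r'[^\u4e00-\u9fff]'


def split_column_vectorized(df, column_name):
    """Return a copy of df with '中文' and '其他字元' columns added (hypothetical helper)."""
    result = df.copy()
    # Empty cells become '' rather than the string 'nan'
    text = result[column_name].fillna('').astype(str)
    # Chinese part: remove everything that is not a Chinese character
    result['中文'] = text.str.replace(NON_CHINESE_PATTERN, '', regex=True)
    # Other part: remove every Chinese character
    result['其他字元'] = text.str.replace(CHINESE_PATTERN, '', regex=True)
    return result


df = pd.DataFrame({'店鋪銷售sku': ['紅色T恤XL', 'AB-123', '藍色', None]})
print(split_column_vectorized(df, '店鋪銷售sku'))
```

Because the helper works on a copy, the input DataFrame is left unchanged; to use it on a real sheet, pass it the DataFrame returned by pd.read_excel.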

Define the file path, worksheet name, and column name for a test run:

file_path = r'測試.xlsx'
sheet_name = 'Sheet1'
column_name = '店鋪銷售sku'

Call extract_characters and store the result in result_df:

result_df = extract_characters(file_path, sheet_name, column_name)

Save the processed DataFrame to an Excel file (index=False keeps the DataFrame's row index out of the output):

result_df.to_excel('result.xlsx', index=False)

Full code:
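The steps above assembled into one script. This is a reconstruction from the individual snippets, not the author's original listing: it includes the else branch and treats empty cells as empty strings, and the os.path.exists check around the test call is an addition so that the script prints a message instead of raising an error when 測試.xlsx is missing.

```python
import os

import pandas as pd


def extract_characters(file_path, sheet_name, column_name):
    """Split column_name into Chinese characters ('中文') and everything else ('其他字元')."""
    # Read the worksheet into a DataFrame
    df = pd.read_excel(file_path, sheet_name=sheet_name)

    # Add two empty result columns
    df['中文'] = ''
    df['其他字元'] = ''

    for index, row in df.iterrows():
        value = row[column_name]
        # Treat empty cells as '' rather than the string 'nan'
        text = '' if pd.isna(value) else str(value)

        chinese = ''
        other = ''
        for char in text:
            # CJK Unified Ideographs block: U+4E00 to U+9FFF
            if '\u4e00' <= char <= '\u9fff':
                chinese += char
            else:
                other += char

        df.at[index, '中文'] = chinese
        df.at[index, '其他字元'] = other

    return df


if __name__ == '__main__':
    file_path = r'測試.xlsx'
    sheet_name = 'Sheet1'
    column_name = '店鋪銷售sku'

    # Added check (not in the original steps): skip gracefully if the sample file is absent
    if os.path.exists(file_path):
        result_df = extract_characters(file_path, sheet_name, column_name)
        result_df.to_excel('result.xlsx', index=False)
    else:
        print(f'File not found: {file_path}')
```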