[Data Analysis and Visualization] Drawing Polished Tables with matplotlib and plottable

2023-07-10 15:00:59

plottable is a Python library for drawing polished, customizable tables in matplotlib. Its official repository is plottable. This article mainly follows the official documentation, plottable-doc. Install plottable with:

pip install plottable

All code for this article is available at: Python-Study-Notes

# Suppress warnings in the Jupyter notebook environment
import warnings
warnings.filterwarnings("ignore")
import plottable
# Print the plottable version
print('plottable version:', plottable.__version__)
# Print the matplotlib version
import matplotlib
print('matplotlib version:', matplotlib.__version__)
plottable version: 0.1.5
matplotlib version: 3.5.3

1 Usage

1.1 Basic usage

The code below shows a simple example of drawing a table. plottable provides the Table class for creating and displaying tables as figures.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plottable import Table

# Build a DataFrame of random data
d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Create and draw the table from the pandas DataFrame
tab = Table(d)

# Save the figure
plt.savefig("table.jpg", dpi=300, bbox_inches='tight')
plt.show()

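The constructor parameters of plottable's Table class are:

  • df: pd.DataFrame, the DataFrame to display as a table
  • ax: mpl.axes.Axes, the axes to draw the table on; defaults to None
  • index_col: str, name of the DataFrame column to use as the index; defaults to None
  • columns: List[str], the columns to plot; None means all columns are used
  • column_definitions: List[ColumnDefinition], style definitions for the columns that need styling; defaults to None
  • textprops: Dict[str, Any], text properties; defaults to an empty dict
  • cell_kw: Dict[str, Any], cell properties; defaults to an empty dict
  • col_label_cell_kw: Dict[str, Any], properties of the column label cells; defaults to an empty dict
  • col_label_divider: bool, whether to draw a divider line below the column labels; defaults to True
  • footer_divider: bool, whether to draw a divider line below the table; defaults to False
  • row_dividers: bool, whether to draw row divider lines; defaults to True
  • row_divider_kw: Dict[str, Any], properties of the row divider lines; defaults to an empty dict
  • col_label_divider_kw: Dict[str, Any], properties of the column label divider line; defaults to an empty dict
  • footer_divider_kw: Dict[str, Any], properties of the footer divider line; defaults to an empty dict
  • column_border_kw: Dict[str, Any], properties of the column borders; defaults to an empty dict
  • even_row_color: str | Tuple, fill color of the cells in even rows; defaults to None
  • odd_row_color: str | Tuple, fill color of the cells in odd rows; defaults to None

Among these, the parameters that control the table's appearance fall into the following groups:

  • column_definitions: per-column style customization
  • textprops: text style customization
  • cell_kw: cell style customization
  • the remaining parameters, which style other parts of the table

The most important of these is column_definitions, because it can control almost every aspect of the rendering. The rest of this section focuses on how to use it.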

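1.2 Customizing column styles

plottable provides the ColumnDefinition class (alias ColDef) for customizing the style of an individual column of the table. Its constructor parameters are:

  • name: str, the name of the column to style
  • title: str = None, a title that overrides the column name when plotted
  • width: float = 1, the column width; by default each column's width is the axes width divided by the total number of columns
  • textprops: Dict[str, Any] = field(default_factory=dict), text properties applied to each text cell
  • formatter: Callable = None, a callable used to format the displayed text
  • cmap: Callable = None, a callable that returns the cell background color based on the cell's value
  • text_cmap: Callable = None, a callable that returns the text color based on the cell's value
  • group: str = None, a group name; each group is shown as a grouped column label above the column labels
  • plot_fn: Callable = None, a callable that takes the cell's value as input and draws onto a subplot created for each cell; pass additional arguments to it through plot_kw
  • plot_kw: Dict[str, Any] = field(default_factory=dict), additional keyword arguments passed to plot_fn
  • border: str | List = None, draws vertical border lines; one of "left" / "l", "right" / "r", or "both"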
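
Since cmap and text_cmap are described as callables that map a cell value to a color, a plain Python function may be enough for simple rules. The sketch below is illustrative only: traffic_light and its thresholds are hypothetical names and values chosen for this example, not part of plottable.

```python
def traffic_light(value: float) -> str:
    """Hypothetical color rule: map a value in [0, 1] to a hex color string."""
    if value < 0.33:
        return "#f4cccc"  # light red for low values
    if value < 0.66:
        return "#fff2cc"  # light yellow for mid-range values
    return "#d9ead3"  # light green for high values


# Show which color each sample value would receive.
for v in (0.1, 0.5, 0.9):
    print(v, traffic_light(v))
```

Assuming plottable accepts any callable that returns a matplotlib-compatible color, as the parameter description suggests, such a function could be passed as ColumnDefinition(name="A", cmap=traffic_light). Treat that as an assumption to verify against your installed version.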

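Passing ColumnDefinition objects to the Table's column_definitions parameter produces different column styles. To style several columns at once, pass a list such as [ColumnDefinition, ColumnDefinition]. Some examples follow.

Setting column titles and widths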

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from plottable import ColumnDefinition, ColDef, Table

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# name selects the column whose style is being set
tab = Table(d, column_definitions=[ColumnDefinition(name="A", title="Title A"),
                                   ColumnDefinition(name="D", width=2)])

plt.show()

Setting a column's text properties and text format

from plottable.formatters import decimal_to_percent

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Center and bold the text of the index column
# Display the values of column A as percentages
tab = Table(d, column_definitions=[ColumnDefinition(name="index", textprops={"ha": "center", "weight": "bold"}),
                                   ColumnDefinition(name="A", formatter=decimal_to_percent)])

plt.show()

Setting cell background color and font color

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# cmap sets the cell background color; text_cmap sets the text color
tab = Table(d, column_definitions=[ColumnDefinition(name="A", cmap=matplotlib.cm.tab20, text_cmap=matplotlib.cm.Reds),
                                   ColumnDefinition(name="B", cmap=matplotlib.cm.tab20b),
                                   ColumnDefinition(name="C", text_cmap=matplotlib.cm.tab20c)])

plt.show()

Setting column group names

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Treat columns B and C as one group named group_name
tab = Table(d, column_definitions=[ColumnDefinition(name="B", group="group_name"), 
                                   ColumnDefinition(name="C", group="group_name")])

plt.show()

Setting column borders

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Draw a left border on column A and borders on both sides of column C
tab = Table(d, column_definitions=[ColumnDefinition(name="A", border="l"), 
                                   ColumnDefinition(name="C",  border="both")])

plt.show()

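Using plotting functions

The plot_fn and plot_kw parameters of ColumnDefinition let you draw cell content with custom functions: plot_fn is the function to call and plot_kw holds the keyword arguments passed to it. plottable.plots also ships some ready-made plotting functions, which can serve as references when writing your own. The built-in functions are used as follows: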

from pathlib import Path
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap
from plottable import ColumnDefinition, Table
# Import the built-in plotting functions
from plottable.plots import image,monochrome_image,circled_image,bar,percentile_bars,percentile_stars,progress_donut

cmap = matplotlib.cm.tab20
# Draw a bar
fig, ax = plt.subplots(figsize=(1, 1))
# 0.7 is the value to plot; lw is the border line width
b = bar(ax, 0.7, plot_bg_bar=True, cmap=cmap, annotate=True, lw=2, height=0.35)
plt.show()

# Percentile stars
fig, ax = plt.subplots(figsize=(2, 1))
stars = percentile_stars(ax, 85, background_color="#f0f0f0")

# Progress donut
fig, ax = plt.subplots(figsize=(1, 1))
donut = progress_donut(ax, 73, textprops={"fontsize": 14})
plt.show()

Use Python's help function to see what the parameters of these functions mean.

help(progress_donut)
Help on function progress_donut in module plottable.plots:

progress_donut(ax: matplotlib.axes._axes.Axes, val: float, radius: float = 0.45, color: str = None, background_color: str = None, width: float = 0.05, is_pct: bool = False, textprops: Dict[str, Any] = {}, formatter: Callable = None, **kwargs) -> List[matplotlib.patches.Wedge]
    Plots a Progress Donut on the axes.
    
    Args:
        ax (matplotlib.axes.Axes): Axes
        val (float): value
        radius (float, optional):
            radius of the progress donut. Defaults to 0.45.
        color (str, optional):
            color of the progress donut. Defaults to None.
        background_color (str, optional):
            background_color of the progress donut where the value is not reached. Defaults to None.
        width (float, optional):
            width of the donut wedge. Defaults to 0.05.
        is_pct (bool, optional):
            whether the value is given not as a decimal, but as a value between 0 and 100.
            Defaults to False.
        textprops (Dict[str, Any], optional):
            textprops passed to ax.text. Defaults to {}.
        formatter (Callable, optional):
            a string formatter.
            Can either be a string format, ie "{:2f}" for 2 decimal places.
            Or a Callable that is applied to the value. Defaults to None.
    
    Returns:
        List[matplotlib.patches.Wedge]
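
The help text says that formatter may be either a format string or a callable. As a rough illustration of that convention, the sketch below dispatches on the argument type. apply_formatter_sketch is a hypothetical helper written for this article, not a function exported by plottable, and the library's internal handling may differ.

```python
from typing import Any, Callable, Union


def apply_formatter_sketch(
    formatter: Union[str, Callable[[Any], Any], None], value: Any
) -> str:
    """Hypothetical helper: format `value` with a format string or a callable."""
    if formatter is None:
        # No formatter given: fall back to the plain string representation.
        return str(value)
    if isinstance(formatter, str):
        # A format string such as "{:.2f}" is applied with str.format.
        return formatter.format(value)
    # Anything else is assumed to be a callable that takes the raw value.
    return str(formatter(value))


print(apply_formatter_sketch("{:.2f}", 0.12345))       # 0.12
print(apply_formatter_sketch("{:.0%}", 0.73))          # 73%
print(apply_formatter_sketch(lambda x: f"{x:+d}", 5))  # +5
```

A format string is convenient for simple number formatting, while a callable leaves room for conditional logic such as replacing zeros with a dash.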

Setting plot_fn and plot_kw to a plotting function and its keyword arguments produces different visual effects, as shown below:

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Set plot_fn and plot_kw for columns B and D
tab = Table(d, textprops={"ha": "center"},
            column_definitions=[ColumnDefinition(name="B", plot_fn=bar,plot_kw={'plot_bg_bar':True,'cmap':cmap, 
                                'annotate':True, 'height':0.5}),
                                ColumnDefinition(name="D", plot_fn=progress_donut,plot_kw={'is_pct':True,})])

plt.show()

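Custom text formats

plottable provides the following three built-in formatter functions:

  • decimal_to_percent: converts a decimal value to a percentage
  • tickcross: formats a value as ✔ or ✖
  • signed_integer: adds a plus or minus sign

The formatter parameter of ColumnDefinition sets the text format, as shown below: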

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from plottable import ColumnDefinition, Table
from plottable.formatters import decimal_to_percent,tickcross,signed_integer

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
tab = Table(d, column_definitions=[ColumnDefinition(name="A", formatter=decimal_to_percent),
                                   ColumnDefinition(name="C", formatter=tickcross),
                                   ColumnDefinition(name="D", formatter=signed_integer)])

plt.show()

You can also write your own function to set the text format, as shown below:

def setformat(x):
    # Format the value in scientific notation with two decimals via str.format
    return "{:.2e}".format(x)

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
tab = Table(d, textprops={"ha": "center"},column_definitions=[ColumnDefinition(name="B", formatter=setformat),
                                   ColumnDefinition(name="D", formatter=lambda x: round(x, 2))])

plt.show()

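1.3 Customizing rows and columns

Accessing row and column cells

plottable lets you access a specific row or column of a Table instance directly, as shown below: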

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Instantiate the Table object
tab = Table(d)

# Get a whole column by column name
tab.columns['A']
Column(cells=[TextCell(xy=(1, 0), content=0.0, row_idx=0, col_idx=1), TextCell(xy=(1, 1), content=0.09, row_idx=1, col_idx=1), TextCell(xy=(1, 2), content=0.95, row_idx=2, col_idx=1), TextCell(xy=(1, 3), content=0.08, row_idx=3, col_idx=1), TextCell(xy=(1, 4), content=0.92, row_idx=4, col_idx=1)], index=1)
# Read the content of the cell at row index 1 of column B
tab.columns['B'].cells[1].content
0.04
# Get a whole row by row index
tab.rows[1]
Row(cells=[TextCell(xy=(0, 1), content=1, row_idx=1, col_idx=0), TextCell(xy=(1, 1), content=0.09, row_idx=1, col_idx=1), TextCell(xy=(2, 1), content=0.04, row_idx=1, col_idx=2), TextCell(xy=(3, 1), content=0.42, row_idx=1, col_idx=3), TextCell(xy=(4, 1), content=0.64, row_idx=1, col_idx=4), TextCell(xy=(5, 1), content=0.26, row_idx=1, col_idx=5)], index=1)
# Get the header row holding the column labels
tab.col_label_row
Row(cells=[TextCell(xy=(0, -1), content=index, row_idx=-1, col_idx=0), TextCell(xy=(1, -1), content=A, row_idx=-1, col_idx=1), TextCell(xy=(2, -1), content=B, row_idx=-1, col_idx=2), TextCell(xy=(3, -1), content=C, row_idx=-1, col_idx=3), TextCell(xy=(4, -1), content=D, row_idx=-1, col_idx=4), TextCell(xy=(5, -1), content=E, row_idx=-1, col_idx=5)], index=-1)

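Setting cell styles

As the example above shows, plottable exposes the table's row and column objects directly, so their appearance and text can be changed by setting properties on these objects. The supported setters are:

  • Cell properties
    • set_alpha: sets the cell's transparency.
    • set_color: sets the cell's color.
    • set_edgecolor: sets the cell's edge color.
    • set_facecolor: sets the cell's fill color.
    • set_fill: sets whether the cell is filled.
    • set_hatch: sets the cell's hatch pattern.
    • set_linestyle: sets the style of the cell's edge line.
    • set_linewidth: sets the width of the cell's edge line.
  • Font properties
    • set_fontcolor: sets the font color.
    • set_fontfamily: sets the font family.
    • set_fontsize: sets the font size.
    • set_ha: sets the horizontal alignment of the text.
    • set_ma: sets the alignment of multi-line text.

Example code: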

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# Instantiate the Table object
tab = Table(d)
# Set the background color of the row with index 1
tab.rows[1].set_facecolor("grey")
# Set the font color of column B
tab.columns['B'].set_fontcolor("red")
Column(cells=[TextCell(xy=(2, 0), content=0.38, row_idx=0, col_idx=2), TextCell(xy=(2, 1), content=0.69, row_idx=1, col_idx=2), TextCell(xy=(2, 2), content=0.15, row_idx=2, col_idx=2), TextCell(xy=(2, 3), content=0.74, row_idx=3, col_idx=2), TextCell(xy=(2, 4), content=0.41, row_idx=4, col_idx=2)], index=2)

2 Plotting examples

2.1 Styling multiple rows

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plottable import Table

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)

fig, ax = plt.subplots(figsize=(6, 3))

# row_dividers controls whether row divider lines are drawn
# odd_row_color sets the color of odd rows
# even_row_color sets the color of even rows
tab = Table(d, row_dividers=False, odd_row_color="#f0f0f0", even_row_color="#e0f6ff")

plt.show()

fig.savefig("table.jpg",dpi=300,bbox_inches='tight')

2.2 Custom cell effects

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap

from plottable import ColumnDefinition, Table
from plottable.formatters import decimal_to_percent
from plottable.plots import bar, percentile_bars, percentile_stars, progress_donut

# Custom colormap
cmap = LinearSegmentedColormap.from_list(
    name="BuYl", colors=["#01a6ff", "#eafedb", "#fffdbb", "#ffc834"], N=256
)

fig, ax = plt.subplots(figsize=(6, 6))

d = pd.DataFrame(np.random.random((5, 4)), columns=["A", "B", "C", "D"]).round(2)

tab = Table(
    d,
    cell_kw={
        "linewidth": 0,
        "edgecolor": "k",
    },
    textprops={"ha": "center"},
    column_definitions=[
        ColumnDefinition("index", textprops={"ha": "left"}),
        ColumnDefinition("A", plot_fn=percentile_bars, plot_kw={"is_pct": True}),
        ColumnDefinition(
            "B", width=1.5, plot_fn=percentile_stars, plot_kw={"is_pct": True}
        ),
        ColumnDefinition(
            "C",
            plot_fn=progress_donut,
            plot_kw={
                "is_pct": True,
                "formatter": "{:.0%}"
                },
            ),
        ColumnDefinition(
            "D",
            width=1.25,
            plot_fn=bar,
            plot_kw={
                "cmap": cmap,
                "plot_bg_bar": True,
                "annotate": True,
                "height": 0.5,
                "lw": 0.5,
                "formatter": decimal_to_percent,
            },
        ),
    ],
)

plt.show()

2.3 Heatmap

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap
# ColDef is an alias for ColumnDefinition
from plottable import ColDef, Table

# Custom colormap
cmap = LinearSegmentedColormap.from_list(
    name="bugw", colors=["#ffffff", "#f2fbd2", "#c9ecb4", "#93d3ab", "#35b0ab"], N=256
)
# Create the data
cities = [
    "TORONTO",
    "VANCOUVER",
    "HALIFAX",
    "CALGARY",
    "OTTAWA",
    "MONTREAL",
    "WINNIPEG",
    "EDMONTON",
    "LONDON",
    "ST. JONES",
]
months = [
    "JAN",
    "FEB",
    "MAR",
    "APR",
    "MAY",
    "JUN",
    "JUL",
    "AUG",
    "SEP",
    "OCT",
    "NOV",
    "DEC",
]

data = np.random.random((10, 12)) + np.abs(np.arange(12) - 5.5)
data = (1 - data / (np.max(data)))
data.shape
(10, 12)
# Plot
d = pd.DataFrame(data, columns=months, index=cities).round(2)
fig, ax = plt.subplots(figsize=(14, 5))

# Define the rendering of each column
column_definitions = [
    ColDef(name, cmap=cmap, formatter=lambda x: "") for name in months
] + [ColDef("index", title="", width=1.5, textprops={"ha": "right"})]

tab = Table(
    d,
    column_definitions=column_definitions,
    row_dividers=False,
    col_label_divider=False,
    textprops={"ha": "center", "fontname": "Roboto"},
    # Style the individual cells
    cell_kw={
        "edgecolor": "black",
        "linewidth": 0,
    },
)


# Set the text and background colors of the column labels
tab.col_label_row.set_facecolor("white")
tab.col_label_row.set_fontcolor("black")
# Set the text and background colors of the row labels
tab.columns["index"].set_facecolor("black")
tab.columns["index"].set_fontcolor("white")
tab.columns["index"].set_linewidth(0)

plt.show()

2.4 Women's World Cup forecast table

step1 Prepare the data

Download the sample data; all sample data is in plottable-example_notebooks

# Download the dataset
# !wget https://raw.githubusercontent.com/znstrider/plottable/master/docs/example_notebooks/data/wwc_forecasts.csv
from pathlib import Path

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap

from plottable import ColumnDefinition, Table
from plottable.cmap import normed_cmap
from plottable.formatters import decimal_to_percent
from plottable.plots import circled_image # image
cols = [
    "team",
    "points",
    "group",
    "spi",
    "global_o",
    "global_d",
    "group_1",
    "group_2",
    "group_3",
    "make_round_of_16",
    "make_quarters",
    "make_semis",
    "make_final",
    "win_league",
]

# Read the data
df = pd.read_csv(
    "data/wwc_forecasts.csv",
    usecols=cols,
)

# Preview the data
df.head()
team group spi global_o global_d group_1 group_2 group_3 make_round_of_16 make_quarters make_semis make_final win_league points
0 USA F 98.32748 5.52561 0.58179 0.82956 0.17044 0.00000 1.0 0.78079 0.47307 0.35076 0.23618 6
1 France A 96.29671 4.31375 0.52137 0.99483 0.00515 0.00002 1.0 0.78367 0.42052 0.30038 0.19428 6
2 Germany B 93.76549 3.96791 0.67818 0.98483 0.01517 0.00000 1.0 0.89280 0.48039 0.27710 0.12256 6
3 Canada E 93.51599 3.67537 0.56980 0.38830 0.61170 0.00000 1.0 0.59192 0.36140 0.20157 0.09031 6
4 England D 91.92311 3.51560 0.63717 0.70570 0.29430 0.00000 1.0 0.68510 0.43053 0.16465 0.08003 6

In addition, we need a flag image for each country; these files are also under plottable-example_notebooks.

# Collect the flag image paths
flag_paths = list(Path("data/country_flags").glob("*.png"))
country_to_flagpath = {p.stem: p for p in flag_paths}
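
The mapping above only works when each flag's file name (without extension) matches the team name exactly. The sketch below, using made-up file names and a made-up team list, shows how one might check for teams without a matching flag before plotting. "Atlantis" is a placeholder, not a team in the dataset.

```python
from pathlib import PurePosixPath

# Hypothetical file names standing in for the contents of data/country_flags.
flag_paths = [
    PurePosixPath("data/country_flags/USA.png"),
    PurePosixPath("data/country_flags/France.png"),
]
# Path.stem is the file name without its extension, e.g. "USA".
country_to_flagpath = {p.stem: p for p in flag_paths}

# "Atlantis" is a placeholder team with no flag file.
teams = ["USA", "France", "Atlantis"]
# dict.get returns None for unknown keys instead of raising KeyError.
missing = [t for t in teams if country_to_flagpath.get(t) is None]
print(missing)  # ['Atlantis']
```

A team that ends up with None in the Flag column is worth catching early, since an image plotting function would otherwise be handed a value that is not a file path.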

step2 Process the data

This step merges the data and converts it into a structure that plottable can use.

# New column names
colnames = [
    "Team",
    "Points",
    "Group",
    "SPI",
    "OFF",
    "DEF",
    "1st Place",
    "2nd Place",
    "3rd Place",
    "Make Rd Of 16",
    "Make Quarters",
    "Make Semis",
    "Make Finals",
    "Win World Cup",
]

col_to_name = dict(zip(cols, colnames))
col_to_name
{'team': 'Team',
 'points': 'Points',
 'group': 'Group',
 'spi': 'SPI',
 'global_o': 'OFF',
 'global_d': 'DEF',
 'group_1': '1st Place',
 'group_2': '2nd Place',
 'group_3': '3rd Place',
 'make_round_of_16': 'Make Rd Of 16',
 'make_quarters': 'Make Quarters',
 'make_semis': 'Make Semis',
 'make_final': 'Make Finals',
 'win_league': 'Win World Cup'}
df[["spi", "global_o", "global_d"]] = df[["spi", "global_o", "global_d"]].round(1)

df = df.rename(col_to_name, axis=1)
# Drop the Points column
df = df.drop("Points", axis=1)
# Insert the Flag column holding each team's flag image path
df.insert(0, "Flag", df["Team"].apply(lambda x: country_to_flagpath.get(x)))
df = df.set_index("Team")
df.head()
Flag Group SPI OFF DEF 1st Place 2nd Place 3rd Place Make Rd Of 16 Make Quarters Make Semis Make Finals Win World Cup
Team
USA data/country_flags/USA.png F 98.3 5.5 0.6 0.82956 0.17044 0.00000 1.0 0.78079 0.47307 0.35076 0.23618
France data/country_flags/France.png A 96.3 4.3 0.5 0.99483 0.00515 0.00002 1.0 0.78367 0.42052 0.30038 0.19428
Germany data/country_flags/Germany.png B 93.8 4.0 0.7 0.98483 0.01517 0.00000 1.0 0.89280 0.48039 0.27710 0.12256
Canada data/country_flags/Canada.png E 93.5 3.7 0.6 0.38830 0.61170 0.00000 1.0 0.59192 0.36140 0.20157 0.09031
England data/country_flags/England.png D 91.9 3.5 0.6 0.70570 0.29430 0.00000 1.0 0.68510 0.43053 0.16465 0.08003

step3 Plot

# Set the colormap
cmap = LinearSegmentedColormap.from_list(
    name="bugw", colors=["#ffffff", "#f2fbd2", "#c9ecb4", "#93d3ab", "#35b0ab"], N=256
)
team_rating_cols = ["SPI", "OFF", "DEF"]
group_stage_cols = ["1st Place", "2nd Place", "3rd Place"]
knockout_stage_cols = list(df.columns[-5:])

# Define the rendering parameters of each column individually
col_defs = (
    # Part 1: flag, team, group, and team rating columns
    [
        ColumnDefinition(
            name="Flag",
            title="",
            textprops={"ha": "center"},
            width=0.5,
            # Use a custom plotting function
            plot_fn=circled_image,
        ),
        ColumnDefinition(
            name="Team",
            textprops={"ha": "left", "weight": "bold"},
            width=1.5,
        ),
        ColumnDefinition(
            name="Group",
            textprops={"ha": "center"},
            width=0.75,
        ),
        ColumnDefinition(
            name="SPI",
            group="Team Rating",
            textprops={"ha": "center"},
            width=0.75,
        ),
        ColumnDefinition(
            name="OFF",
            width=0.75,
            textprops={
                "ha": "center",
                # Draw a circular box around the text
                "bbox": {"boxstyle": "circle", "pad": 0.35},
            },
            cmap=normed_cmap(df["OFF"], cmap=matplotlib.cm.PiYG, num_stds=2.5),
            group="Team Rating",
        ),
        ColumnDefinition(
            name="DEF",
            width=0.75,
            textprops={
                "ha": "center",
                "bbox": {"boxstyle": "circle", "pad": 0.35},
            },
            cmap=normed_cmap(df["DEF"], cmap=matplotlib.cm.PiYG_r, num_stds=2.5),
            group="Team Rating",
        ),
    ]
    # Part 2: group stage columns
    + [
        ColumnDefinition(
            name=group_stage_cols[0],
            title=group_stage_cols[0].replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            group="Group Stage Chances",
            # Draw a border on the left
            border="left",
        )
    ]
    + [
        ColumnDefinition(
            name=col,
            title=col.replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            group="Group Stage Chances",
        )
        for col in group_stage_cols[1:]
    ]
    # Part 3: knockout stage columns
    + [
        ColumnDefinition(
            name=knockout_stage_cols[0],
            title=knockout_stage_cols[0].replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            cmap=cmap,
            group="Knockout Stage Chances",
            border="left",
        )
    ]
    + [
        ColumnDefinition(
            name=col,
            title=col.replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            cmap=cmap,
            group="Knockout Stage Chances",
        )
        for col in knockout_stage_cols[1:]
    ]
)
# Plot
fig, ax = plt.subplots(figsize=(18, 18))

table = Table(
    df,
    column_definitions=col_defs,
    row_dividers=True,
    footer_divider=True,
    ax=ax,
    textprops={"fontsize": 14},
    row_divider_kw={"linewidth": 1, "linestyle": (0, (1, 5))},
    col_label_divider_kw={"linewidth": 1, "linestyle": "-"},
    column_border_kw={"linewidth": 1, "linestyle": "-"},
).autoset_fontcolors(colnames=["OFF", "DEF"])


plt.show()
# Save the figure
fig.savefig("wwc_table.jpg", facecolor=ax.get_facecolor(), dpi=300,bbox_inches='tight')

2.5 Bundesliga standings table

step1 Prepare the data

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plottable import ColDef, Table
from plottable.plots import image
# Download the league data
# !wget https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv
# !wget https://projects.fivethirtyeight.com/soccer-api/club/spi_matches_latest.csv
# Data URLs
FIVETHIRTYEIGHT_URLS = {
    "SPI_MATCHES": "https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv",
    "SPI_MATCHES_LATEST": "https://projects.fivethirtyeight.com/soccer-api/club/spi_matches_latest.csv",
}

# Read the data
# df = pd.read_csv(FIVETHIRTYEIGHT_URLS["SPI_MATCHES_LATEST"])
df = pd.read_csv("data/spi_matches_latest.csv")
df.head()
season date league_id league team1 team2 spi1 spi2 prob1 prob2 ... importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1 adj_score2
0 2019 2019-03-01 1979 Chinese Super League Shandong Luneng Guizhou Renhe 48.22 37.83 0.5755 0.1740 ... 45.9 22.1 1.0 0.0 1.39 0.26 2.05 0.54 1.05 0.00
1 2019 2019-03-01 1979 Chinese Super League Shanghai Greenland Shanghai SIPG 39.81 60.08 0.2387 0.5203 ... 25.6 63.4 0.0 4.0 0.57 2.76 0.80 1.50 0.00 3.26
2 2019 2019-03-01 1979 Chinese Super League Guangzhou Evergrande Tianjin Quanujian 65.59 39.99 0.7832 0.0673 ... 77.1 28.8 3.0 0.0 0.49 0.45 1.05 0.75 3.15 0.00
3 2019 2019-03-01 1979 Chinese Super League Wuhan Zall Beijing Guoan 32.25 54.82 0.2276 0.5226 ... 35.8 58.9 0.0 1.0 1.12 0.97 1.51 0.94 0.00 1.05
4 2019 2019-03-01 1979 Chinese Super League Chongqing Lifan Guangzhou RF 38.24 40.45 0.4403 0.2932 ... 26.2 21.3 2.0 2.0 2.77 3.17 1.05 2.08 2.10 2.10

5 rows × 23 columns

# Keep only Bundesliga matches and drop rows with missing values
bl = df.loc[df.league == "German Bundesliga"].dropna()
bl.head()
season date league_id league team1 team2 spi1 spi2 prob1 prob2 ... importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1 adj_score2
497 2022 2022-08-05 1845 German Bundesliga Eintracht Frankfurt Bayern Munich 68.47 91.75 0.1350 0.6796 ... 32.6 71.9 1.0 6.0 0.83 4.50 0.65 2.72 1.05 5.96
514 2022 2022-08-06 1845 German Bundesliga VfL Bochum Mainz 60.73 68.88 0.3568 0.3629 ... 33.5 34.5 1.0 2.0 1.00 1.62 0.96 0.86 1.05 2.10
515 2022 2022-08-06 1845 German Bundesliga Borussia Monchengladbach TSG Hoffenheim 69.38 66.77 0.4872 0.2742 ... 40.2 33.3 3.0 1.0 1.86 0.10 2.51 0.31 2.36 1.05
516 2022 2022-08-06 1845 German Bundesliga VfL Wolfsburg Werder Bremen 68.18 59.82 0.5319 0.2014 ... 30.2 33.3 2.0 2.0 0.81 0.97 1.07 1.25 2.10 2.10
517 2022 2022-08-06 1845 German Bundesliga 1. FC Union Berlin Hertha Berlin 69.98 59.70 0.5479 0.1860 ... 34.9 33.0 3.0 1.0 1.25 0.40 1.66 0.36 3.15 1.05

5 rows × 23 columns

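step2 Process the data

# Compute the points for each match
def add_points(df: pd.DataFrame) -> pd.DataFrame:
    # Nested np.where acts like a chained conditional expression:
    # if df["score1"] > df["score2"], the home team gets 3 points
    # otherwise, if df["score1"] == df["score2"], it gets 1 point
    # otherwise it gets 0 points (pts_away mirrors this for the away team)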
    df["pts_home"] = np.where(
        df["score1"] > df["score2"], 3, np.where(df["score1"] == df["score2"], 1, 0)
    )
    df["pts_away"] = np.where(
        df["score1"] < df["score2"], 3, np.where(df["score1"] == df["score2"], 1, 0)
    )
    
    return df

# Add the points columns
bl = add_points(bl)
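
To make the scoring rule behind add_points easier to check by hand, here is a plain-Python sketch of the same 3/1/0 logic for a single match. match_points is a hypothetical helper introduced for illustration; the vectorized np.where version above is what the article actually uses.

```python
from typing import Tuple


def match_points(score_home: int, score_away: int) -> Tuple[int, int]:
    """Return (home points, away points) under the 3/1/0 rule."""
    if score_home > score_away:
        return 3, 0  # home win
    if score_home == score_away:
        return 1, 1  # draw
    return 0, 3  # away win


# Try one home win, one draw, and one away win.
for result in [(2, 0), (1, 1), (0, 4)]:
    print(result, "->", match_points(*result))
```
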
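# Total points, goals for, goals against, expected goals for, and expected goals against

# The code below aggregates the stats for team1 (home) and team2 (away) separately, then adds the two results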
perform = (
    bl.groupby("team1")[[
        "pts_home",
        "score1",
        "score2",
        "xg1",
        "xg2",
    ]]
    .sum()
    .set_axis(
        [
            "pts",
            "gf",
            "ga",
            "xgf",
            "xga",
        ],
        axis=1,
    )
    .add(
        bl.groupby("team2")[[
            "pts_away",
            "score2",
            "score1",
            "xg2",
            "xg1",
        ]]
        .sum()
        .set_axis(
            [
                "pts",
                "gf",
                "ga",
                "xgf",
                "xga",
            ],
            axis=1,
        )
    )
)

# If the code above fails because of your Python or pandas version, use the following instead
# t1 = bl.groupby("team1")[["pts_home", "score1", "score2", "xg1", "xg2"]].sum()
# t1 = t1.set_axis(["pts", "gf", "ga", "xgf", "xga"], axis=1)
# t2 = bl.groupby("team2")[["pts_away", "score2", "score1", "xg2", "xg1"]].sum()
# t2 = t2.set_axis(["pts", "gf", "ga", "xgf", "xga"], axis=1)
# perform = t1.add(t2)

perform.shape
(18, 5)
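
The home-plus-away aggregation can be hard to follow on the full dataset. The sketch below repeats the pattern on three made-up matches between hypothetical teams A, B, and C, so the totals can be verified by hand. The fill_value=0 argument is an addition for this sketch: it keeps a team that only appears on one side from turning into NaN, which the full Bundesliga season does not need.

```python
import pandas as pd

# Toy fixtures: three hypothetical matches between made-up teams.
matches = pd.DataFrame(
    {
        "team1": ["A", "B", "C"],
        "team2": ["B", "C", "A"],
        "score1": [2, 1, 0],
        "score2": [0, 1, 3],
    }
)

# Home view: goals for = score1, goals against = score2.
home = (
    matches.groupby("team1")[["score1", "score2"]]
    .sum()
    .set_axis(["gf", "ga"], axis=1)
)
# Away view: the roles of score1 and score2 are swapped.
away = (
    matches.groupby("team2")[["score2", "score1"]]
    .sum()
    .set_axis(["gf", "ga"], axis=1)
)
# DataFrame.add aligns on the team index before summing.
totals = home.add(away, fill_value=0)
print(totals)
```

Because every goal scored is also a goal conceded by the opponent, the gf and ga columns of the combined table should have the same sum, which makes a quick sanity check.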
# Assemble the performance summary
perform.index.name = "team"

perform["gd"] = perform["gf"] - perform["ga"]

perform = perform[
    [
        "pts",
        "gd",
        "gf",
        "ga",
        "xgf",
        "xga",
    ]
]

perform["games"] = bl.groupby("team1").size().add(bl.groupby("team2").size())
perform.head()
pts gd gf ga xgf xga games
team
1. FC Union Berlin 62 13.0 51.0 38.0 35.93 43.06 34
Bayer Leverkusen 50 8.0 57.0 49.0 53.62 48.20 34
Bayern Munich 71 54.0 92.0 38.0 84.93 40.12 34
Borussia Dortmund 71 39.0 83.0 44.0 75.96 42.69 34
Borussia Monchengladbach 43 -3.0 52.0 55.0 53.05 58.88 34
# Count each team's wins, draws, and losses
def get_wins_draws_losses(games: pd.DataFrame) -> pd.DataFrame:
    return (
        games.rename({"pts_home": "pts", "team1": "team"}, axis=1)
        .groupby("team")["pts"]
        .value_counts()
        .add(
            games.rename({"pts_away": "pts", "team2": "team"}, axis=1)
            .groupby("team")["pts"]
            .value_counts(),
            fill_value=0,
        )
        .astype(int)
        .rename("count")
        .reset_index(level=1)
        .pivot(columns="pts", values="count")
        .rename({0: "L", 1: "D", 3: "W"}, axis=1)[["W", "D", "L"]]
    )

wins_draws_losses = get_wins_draws_losses(bl)
wins_draws_losses.head()
pts W D L
team
1. FC Union Berlin 18 8 8
Bayer Leverkusen 14 8 12
Bayern Munich 21 8 5
Borussia Dortmund 22 5 7
Borussia Monchengladbach 11 10 13
# Merge the points data with the win/draw/loss data
perform = pd.concat([perform, wins_draws_losses], axis=1)
perform.head()
pts gd gf ga xgf xga games W D L
team
1. FC Union Berlin 62 13.0 51.0 38.0 35.93 43.06 34 18 8 8
Bayer Leverkusen 50 8.0 57.0 49.0 53.62 48.20 34 14 8 12
Bayern Munich 71 54.0 92.0 38.0 84.93 40.12 34 21 8 5
Borussia Dortmund 71 39.0 83.0 44.0 75.96 42.69 34 22 5 7
Borussia Monchengladbach 43 -3.0 52.0 55.0 53.05 58.88 34 11 10 13

step3 Map the club crest images

The crest images are available at: plottable-example_notebooks

# Map team names to crest image paths
club_logo_path = Path("data/bundesliga_crests_22_23")
club_logo_files = list(club_logo_path.glob("*.png"))
club_logos_paths = {f.stem: f for f in club_logo_files}
perform = perform.reset_index()

# Add the crest column
perform.insert(0, "crest", perform["team"])
perform["crest"] = perform["crest"].replace(club_logos_paths)

# Sort the data
perform = perform.sort_values(by=["pts", "gd", "gf"], ascending=False)

for colname in ["gd", "gf", "ga"]:
    perform[colname] = perform[colname].astype("int32")

perform["goal_difference"] = perform["gf"].astype(str) + ":" + perform["ga"].astype(str)

# Add the rank
perform["rank"] = list(range(1, 19))

perform.head()
crest team pts gd gf ga xgf xga games W D L goal_difference rank
2 data/bundesliga_crests_22_23/Bayern Munich.png Bayern Munich 71 54 92 38 84.93 40.12 34 21 8 5 92:38 1
3 data/bundesliga_crests_22_23/Borussia Dortmund... Borussia Dortmund 71 39 83 44 75.96 42.69 34 22 5 7 83:44 2
10 data/bundesliga_crests_22_23/RB Leipzig.png RB Leipzig 66 23 64 41 67.01 37.48 34 20 6 8 64:41 3
0 data/bundesliga_crests_22_23/1. FC Union Berli... 1. FC Union Berlin 62 13 51 38 35.93 43.06 34 18 8 8 51:38 4
11 data/bundesliga_crests_22_23/SC Freiburg.png SC Freiburg 59 7 51 44 53.11 52.25 34 17 8 9 51:44 5

step4 Configure the plot

# Set the colors
row_colors = {
    "top4": "#2d3636",
    "top6": "#516362",
    "playoffs": "#8d9386",
    "relegation": "#c8ab8d",
    "even": "#627979",
    "odd": "#68817e",
}

bg_color = row_colors["odd"]
text_color = "#e0e8df"
# Choose the columns to plot
table_cols = ["crest", "team", "games", "W", "D", "L", "goal_difference", "gd", "pts"]
perform[table_cols].head()
crest team games W D L goal_difference gd pts
2 data/bundesliga_crests_22_23/Bayern Munich.png Bayern Munich 34 21 8 5 92:38 54 71
3 data/bundesliga_crests_22_23/Borussia Dortmund... Borussia Dortmund 34 22 5 7 83:44 39 71
10 data/bundesliga_crests_22_23/RB Leipzig.png RB Leipzig 34 20 6 8 64:41 23 66
0 data/bundesliga_crests_22_23/1. FC Union Berli... 1. FC Union Berlin 34 18 8 8 51:38 13 62
11 data/bundesliga_crests_22_23/SC Freiburg.png SC Freiburg 34 17 8 9 51:44 7 59
# Define the rendering of each column
table_col_defs = [
    ColDef("rank", width=0.5, title=""),
    ColDef("crest", width=0.35, plot_fn=image, title=""),
    ColDef("team", width=2.5, title="", textprops={"ha": "left"}),
    ColDef("games", width=0.5, title="Games"),
    ColDef("W", width=0.5),
    ColDef("D", width=0.5),
    ColDef("L", width=0.5),
    ColDef("goal_difference", title="Goals"),
    ColDef("gd", width=0.5, title="", formatter="{:+}"),
    ColDef("pts", border="left", title="Points"),
]

step5 Plot

fig, ax = plt.subplots(figsize=(14, 12))

plt.rcParams["text.color"] = text_color
# Set the plot font
# plt.rcParams["font.family"] = "Roboto"

# Set the background color
fig.set_facecolor(bg_color)
ax.set_facecolor(bg_color)

table = Table(
    perform,
    column_definitions=table_col_defs,
    row_dividers=True,
    col_label_divider=False,
    footer_divider=True,
    index_col="rank",
    columns=table_cols,
    even_row_color=row_colors["even"],
    footer_divider_kw={"color": bg_color, "lw": 2},
    row_divider_kw={"color": bg_color, "lw": 2},
    column_border_kw={"color": bg_color, "lw": 2},
    # To set the font, add "fontname": "Roboto"
    textprops={"fontsize": 16, "ha": "center"},
)


# Color different groups of rows
for idx in [0, 1, 2, 3]:
    table.rows[idx].set_facecolor(row_colors["top4"])
    
for idx in [4, 5]:
    table.rows[idx].set_facecolor(row_colors["top6"])
    
table.rows[15].set_facecolor(row_colors["playoffs"])

for idx in [16, 17]:
    table.rows[idx].set_facecolor(row_colors["relegation"])
    table.rows[idx].set_fontcolor(row_colors["top4"])


fig.savefig(
    "bohndesliga_table_recreation.png",
    facecolor=fig.get_facecolor(),
    bbox_inches='tight',
    dpi=300,
)

3 References