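AI has long since worked its way into every corner of daily life; AI Stefanie Sun covers are playing everywhere you turn. But not everyone owns an NVIDIA card, and life without a GPU is never easy. No matter, there is a way out: this time we build a deep-learning environment on Google's free Colab cloud servers, make an AI Trump, and have him belt out "The Internationale".
Colab (short for Colaboratory) is Google's free cloud-based server product. You write and run Python code right in the browser, which is very convenient. Better still, Colab can assign users a free GPU; for anyone without an NVIDIA card this goes well beyond mere goodwill and is practically charity.
Colab is built on Google Drive: the Python scripts, trained models, training sets and other data for a deep-learning job can be stored directly in Drive and then run through Colab.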
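To make the Drive-based workflow concrete, here is a minimal sketch of mounting Drive from inside a notebook. The google.colab module only exists inside Colab, so the import is guarded and the sketch simply reports that it skipped the mount elsewhere. The helper drive_path and the folder name so-vits/models are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Where Colab conventionally exposes the user's Drive after mounting.
MOUNT_POINT = Path("/content/drive")


def mount_drive() -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(str(MOUNT_POINT))
    return True


def drive_path(relative: str) -> Path:
    """Build a path under 'MyDrive' (hypothetical helper for this article)."""
    return MOUNT_POINT / "MyDrive" / relative


if mount_drive():
    print("Drive mounted at", MOUNT_POINT)
else:
    print("Not running in Colab; skipping mount")
print(drive_path("so-vits/models"))
```

Once mounted, anything saved under that path lives in Drive and survives the end of the Colab session.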
First, open Google Drive: drive.google.com
Then click New and choose Connect more apps:
Next, install Colab:
Drive and Colab are now linked. Create a new notebook file named my_sovits.ipynb and type in some code:
print("hello colab")
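Then press Ctrl + Enter to run the code:
Note that Colab runs Python code in the ipynb format of Jupyter Notebook.
Jupyter Notebook opens as a web page: you write and run code directly in the page, and each result appears right below its code cell. If you need explanatory notes while programming, you can write them on the same page, which makes timely comments and explanations easy.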
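An ipynb file is plain JSON underneath. The sketch below builds a one-cell notebook by hand, following the nbformat 4 layout, and reads it back; it is only meant to show what the format looks like and is not something Colab requires you to write yourself.

```python
import json

# A notebook with a single code cell, following the nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("hello colab")'],
        }
    ],
}

text = json.dumps(notebook, indent=1)
# Reading it back shows that the code is stored as plain strings.
loaded = json.loads(text)
first_cell_source = "".join(loaded["cells"][0]["source"])
print(first_cell_source)
```

Explanatory notes live in the same list of cells, with cell_type set to "markdown" instead of "code".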
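Next, set the GPU type (in the Colab menu, Runtime > Change runtime type):
Then run these commands to check the CUDA and GPU versions: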
!/usr/local/cuda/bin/nvcc --version
!nvidia-smi
The output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Tue May 16 04:49:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   65C    P8    13W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
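Tesla T4 is the recommended GPU type here, as it performs well. Note that nvcc reports the installed CUDA toolkit (11.8), while nvidia-smi reports the highest CUDA version the driver supports (12.0), so the two numbers need not match.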
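The same check can be scripted. The sketch below is one possible way to do it, not part of the so-vits project: it pulls the driver and CUDA versions out of the nvidia-smi banner with a regular expression and asks nvidia-smi for the GPU name and memory, returning an empty list on machines without the tool. The function names are made up for this example.

```python
import re
import shutil
import subprocess


def parse_smi_header(text: str) -> dict:
    """Pull driver and CUDA versions out of the nvidia-smi banner line."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return {}
    return {"driver": match.group(1), "cuda": match.group(2)}


def query_gpus() -> list:
    """Return 'name, total memory' rows, or [] when nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


banner = "| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |"
print(parse_smi_header(banner))
print(query_gpus() or "no NVIDIA GPU visible")
```

If the second line reports no GPU inside Colab, the runtime type is probably still set to CPU.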
Colab is now set up.
Next we set up the so-vits environment. Some basic dependencies can be installed with pip:
!pip install pyworld==0.3.2
!pip install numpy==1.23.5
Note that in a Jupyter notebook, a line prefixed with an exclamation mark is run as a shell command.
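Outside a notebook the exclamation-mark shortcut does not exist, so a rough equivalent in plain Python is to start pip as a child process. This is a sketch of that idea; pip_install_command and pip_install are hypothetical helper names, and the actual install call is left commented out.

```python
import subprocess
import sys


def pip_install_command(requirement: str) -> list:
    """Build the argument list for installing one pinned requirement."""
    return [sys.executable, "-m", "pip", "install", requirement]


def pip_install(requirement: str) -> int:
    """Run pip in a child process and return its exit code."""
    return subprocess.run(pip_install_command(requirement), check=False).returncode


cmd = pip_install_command("pyworld==0.3.2")
print(" ".join(cmd[1:]))  # -m pip install pyworld==0.3.2
# pip_install("pyworld==0.3.2")  # uncomment to actually install
```

Calling pip through sys.executable installs into the interpreter that is running the script, which is usually what you want.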
Also note that, because this is not a local environment, Colab will sometimes warn:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting numpy==1.23.5
Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 80.1 MB/s eta 0:00:00
Installing collected packages: numpy
Attempting uninstall: numpy
Found existing installation: numpy 1.22.4
Uninstalling numpy-1.22.4:
Successfully uninstalled numpy-1.22.4
Successfully installed numpy-1.23.5
WARNING: The following packages were previously imported in this runtime:
[numpy]
You must restart the runtime in order to use newly installed versions.
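At this point the runtime has to be restarted before the new numpy can be imported.
After restarting the runtime, run the install command again and confirm that pip reports the requirement as already satisfied: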
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: numpy==1.23.5 in /usr/local/lib/python3.10/dist-packages (1.23.5)
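Instead of reading pip's log, the installed version can also be checked from Python. The sketch below uses the standard importlib.metadata module; is_pinned is a hypothetical helper, and the two pins are the ones installed above.

```python
from importlib import metadata


def is_pinned(package: str, wanted: str) -> bool:
    """True when `package` is installed at exactly version `wanted`."""
    try:
        return metadata.version(package) == wanted
    except metadata.PackageNotFoundError:
        return False


for package, wanted in [("numpy", "1.23.5"), ("pyworld", "0.3.2")]:
    status = "ok" if is_pinned(package, wanted) else "missing or different version"
    print(f"{package}=={wanted}: {status}")
```

This reads what is installed on disk. A module that was already imported before the upgrade stays at the old version in memory, which is why the runtime restart is still needed.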
Next, clone the so-vits project and install its dependencies:
import os
import glob
!git clone https://github.com/effusiveperiscope/so-vits-svc -b eff-4.0
os.chdir('/content/so-vits-svc')
# install requirements one-at-a-time to ignore exceptions
!cat requirements.txt | xargs -n 1 pip install --extra-index-url https://download.pytorch.org/whl/cu117
!pip install praat-parselmouth
!pip install ipywidgets
!pip install huggingface_hub
!pip install pip==23.0.1 # fix pip version for fairseq install
!pip install fairseq==0.12.2
!jupyter nbextension enable --py widgetsnbextension
existing_files = glob.glob('/content/**/*.*', recursive=True)
!pip install --upgrade protobuf==3.9.2
!pip uninstall -y tensorflow
!pip install tensorflow==2.11.0
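The cat | xargs -n 1 line above installs the requirements one at a time so that a single failing package does not abort the rest. A rough Python sketch of the same idea follows; it works line by line (xargs splits on whitespace), the helper names are hypothetical, and the sample requirements text is made up for the demonstration.

```python
import subprocess
import sys


def parse_requirements(text: str) -> list:
    """Return requirement lines, skipping blanks and '#' comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def install_each(requirements: list, extra_index_url: str) -> list:
    """Install requirements one by one; return those that failed."""
    failed = []
    for requirement in requirements:
        cmd = [sys.executable, "-m", "pip", "install",
               "--extra-index-url", extra_index_url, requirement]
        if subprocess.run(cmd, check=False).returncode != 0:
            failed.append(requirement)
    return failed


sample = "# hypothetical requirements file\nnumpy==1.23.5\n\npyworld==0.3.2\n"
print(parse_requirements(sample))
# failed = install_each(parse_requirements(open("requirements.txt").read()),
#                       "https://download.pytorch.org/whl/cu117")
```

Collecting the failures in a list makes it easy to see afterwards which packages need attention.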
With the dependencies installed, define some helper functions:
os.chdir('/content/so-vits-svc') # force working-directory to so-vits-svc - this line is just for safety and is probably not required
import tarfile
import os
from zipfile import ZipFile
# taken from https://github.com/CookiePPP/cookietts/blob/master/CookieTTS/utils/dataset/extract_unknown.py
def extract(path):
    if path.endswith(".zip"):
        with ZipFile(path, 'r') as zipObj:
            zipObj.extractall(os.path.split(path)[0])
    elif path.endswith(".tar.bz2"):
        tar = tarfile.open(path, "r:bz2")
        tar.extractall(os.path.split(path)[0])
        tar.close()
    elif path.endswith(".tar.gz"):
        tar = tarfile.open(path, "r:gz")
        tar.extractall(os.path.split(path)[0])
        tar.close()
    elif path.endswith(".tar"):
        tar = tarfile.open(path, "r:")
        tar.extractall(os.path.split(path)[0])
        tar.close()
    elif path.endswith(".7z"):
        import py7zr
        archive = py7zr.SevenZipFile(path, mode='r')
        archive.extractall(path=os.path.split(path)[0])
        archive.close()
    else:
        raise NotImplementedError(f"{path} extension not implemented.")
# taken from https://github.com/CookiePPP/cookietts/tree/master/CookieTTS/_0_download/scripts
# megatools download urls
win64_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win64.zip"
win32_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win32.zip"
linux_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-linux-x86_64.tar.gz"
# download megatools
from sys import platform
import os
import urllib.request
import subprocess
from time import sleep
if platform == "linux" or platform == "linux2":
    dl_url = linux_url
elif platform == "darwin":
    raise NotImplementedError('MacOS not supported.')
elif platform == "win32":
    dl_url = win64_url
else:
    raise NotImplementedError ('Unknown Operating System.')
dlname = dl_url.split("/")[-1]
if dlname.endswith(".zip"):
    binary_folder = dlname[:-4] # remove .zip
elif dlname.endswith(".tar.gz"):
    binary_folder = dlname[:-7] # remove .tar.gz
else:
    raise NameError('downloaded megatools has unknown archive file extension!')
if not os.path.exists(binary_folder):
    print('"megatools" not found. Downloading...')
    if not os.path.exists(dlname):
        urllib.request.urlretrieve(dl_url, dlname)
    assert os.path.exists(dlname), 'failed to download.'
    extract(dlname)
    sleep(0.10)
    os.unlink(dlname)
    print("Done!")
binary_folder = os.path.abspath(binary_folder)
def megadown(download_link, filename='.', verbose=False):
    """Use megatools binary executable to download files and folders from MEGA.nz ."""
    filename = ' --path "'+os.path.abspath(filename)+'"' if filename else ""
    wd_old = os.getcwd()
    os.chdir(binary_folder)
    try:
        if platform == "linux" or platform == "linux2":
            subprocess.call(f'./megatools dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)
        elif platform == "win32":
            subprocess.call(f'megatools.exe dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)
    except:
        os.chdir(wd_old) # don't let user stop download without going back to correct directory first
        raise
    os.chdir(wd_old)
    return filename
import urllib.request
from tqdm import tqdm
import gdown
from os.path import exists
def request_url_with_progress_bar(url, filename):
    class DownloadProgressBar(tqdm):
        def update_to(self, b=1, bsize=1, tsize=None):
            if tsize is not None:
                self.total = tsize
            self.update(b * bsize - self.n)
    def download_url(url, filename):
        with DownloadProgressBar(unit='B', unit_scale=True,
                                 miniters=1, desc=url.split('/')[-1]) as t:
            filename, headers = urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)
            print("Downloaded to "+filename)
    download_url(url, filename)
def download(urls, dataset='', filenames=None, force_dl=False, username='', password='', auth_needed=False):
    assert filenames is None or len(urls) == len(filenames), f"number of urls does not match filenames. Expected {len(filenames)} urls, containing the files listed below.\n{filenames}"
    assert not auth_needed or (len(username) and len(password)), f"username and password needed for {dataset} Dataset"
    if filenames is None:
        filenames = [None,]*len(urls)
    for i, (url, filename) in enumerate(zip(urls, filenames)):
        print(f"Downloading File from {url}")
        #if filename is None:
        #    filename = url.split("/")[-1]
        if filename and (not force_dl) and exists(filename):
            print(f"{filename} Already Exists, Skipping.")
            continue
        if 'drive.google.com' in url:
            assert 'https://drive.google.com/uc?id=' in url, 'Google Drive links should follow the format "https://drive.google.com/uc?id=1eQAnaoDBGQZldPVk-nzgYzRbcPSmnpv6".\nWhere id=XXXXXXXXXXXXXXXXX is the Google Drive Share ID.'
            gdown.download(url, filename, quiet=False)
        elif 'mega.nz' in url:
            megadown(url, filename)
        else:
            #urllib.request.urlretrieve(url, filename=filename) # no progress bar
            request_url_with_progress_bar(url, filename) # with progress bar
import huggingface_hub
import os
import shutil
class HFModels:
    def __init__(self, repo = "therealvul/so-vits-svc-4.0",
            model_dir = "hf_vul_models"):
        self.model_repo = huggingface_hub.Repository(local_dir=model_dir,
            clone_from=repo, skip_lfs_files=True)
        self.repo = repo
        self.model_dir = model_dir
        self.model_folders = os.listdir(model_dir)
        self.model_folders.remove('.git')
        self.model_folders.remove('.gitattributes')
    def list_models(self):
        return self.model_folders
    # Downloads model;
    # copies config to target_dir and moves model to target_dir
    def download_model(self, model_name, target_dir):
        if not model_name in self.model_folders:
            raise Exception(model_name + " not found")
        model_dir = self.model_dir
        charpath = os.path.join(model_dir,model_name)
        gen_pt = next(x for x in os.listdir(charpath) if x.startswith("G_"))
        cfg = next(x for x in os.listdir(charpath) if x.endswith("json"))
        try:
            clust = next(x for x in os.listdir(charpath) if x.endswith("pt"))
        except StopIteration as e:
            print("Note - no cluster model for "+model_name)
            clust = None
        if not os.path.exists(target_dir):
            os.makedirs(target_dir, exist_ok=True)
        gen_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,
            filename = model_name + "/" + gen_pt) # this is a symlink
        if clust is not None:
            clust_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,
                filename = model_name + "/" + clust) # this is a symlink
            shutil.move(os.path.realpath(clust_dir), os.path.join(target_dir, clust))
            clust_out = os.path.join(target_dir, clust)
        else:
            clust_out = None
        shutil.copy(os.path.join(charpath,cfg),os.path.join(target_dir, cfg))
        shutil.move(os.path.realpath(gen_dir), os.path.join(target_dir, gen_pt))
        return {"config_path": os.path.join(target_dir,cfg),
            "generator_path": os.path.join(target_dir,gen_pt),
            "cluster_path": clust_out}
# Example usage
# vul_models = HFModels()
# print(vul_models.list_models())
# print("Applejack (singing)" in vul_models.list_models())
# vul_models.download_model("Applejack (singing)","models/Applejack (singing)")
print("Finished!")
These functions help us download, unpack and load models.
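For the common archive types, the standard library can do the unpacking on its own. The sketch below is an alternative to the extract helper, not a copy of it: shutil.unpack_archive picks the format from the file extension (zip and the tar variants, but not 7z). It builds a tiny zip in a temporary folder so it can run anywhere; extract_next_to is a hypothetical name.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_next_to(archive: Path) -> Path:
    """Unpack `archive` into its own folder and return that folder."""
    shutil.unpack_archive(str(archive), str(archive.parent))
    return archive.parent


# Build a tiny zip so the sketch can run anywhere.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("config.json", "{}")

target = extract_next_to(archive)
extracted = sorted(p.name for p in target.iterdir())
print(extracted)
```

Like the helper above, it unpacks into the folder that holds the archive, so the archive and its contents end up side by side.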
Next, download the Trump voice model and its config file from:
https://huggingface.co/Nardicality/so-vits-svc-4.0-models/tree/main/Trump18.5k
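Then put the model file (G_*.pth) and its config file (*.json) together in a subfolder of the project's models directory, for example models/Trump18.5k/; the inference code in this article scans each subfolder of models for both files.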
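The download can also be scripted with huggingface_hub, which the helper code already uses. The sketch below is an assumption-laden outline: the file names inside the Trump18.5k folder are not stated in this article, so the code lists the repository and picks the first G_*.pth and *.json it finds. The listing used in the demonstration is hypothetical, and fetch_trump_model needs network access, so it is defined but not called.

```python
from fnmatch import fnmatch


def pick_model_files(repo_files: list, folder: str) -> dict:
    """Select the generator checkpoint and config inside `folder`."""
    inside = [f for f in repo_files if f.startswith(folder + "/")]
    names = [f.split("/", 1)[1] for f in inside]
    generators = [n for n in names if fnmatch(n, "G_*.pth")]
    configs = [n for n in names if n.endswith(".json")]
    return {
        "generator": generators[0] if generators else None,
        "config": configs[0] if configs else None,
    }


def fetch_trump_model(folder: str = "Trump18.5k") -> dict:
    """Download sketch; needs network access and huggingface_hub installed."""
    import huggingface_hub

    repo = "Nardicality/so-vits-svc-4.0-models"
    picked = pick_model_files(huggingface_hub.list_repo_files(repo), folder)
    return {
        kind: huggingface_hub.hf_hub_download(repo_id=repo, filename=f"{folder}/{name}")
        for kind, name in picked.items() if name
    }


# Hypothetical listing, only to show the selection logic.
listing = ["README.md", "Trump18.5k/G_example.pth", "Trump18.5k/config.json"]
print(pick_model_files(listing, "Trump18.5k"))
```

hf_hub_download returns the path of the cached file, which still has to be copied into the project's models folder.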
Then upload the song you want to convert to the directory that contains the project folder, i.e. /content.
Run the code:
import os
import glob
import json
import copy
import logging
import io
from ipywidgets import widgets
from pathlib import Path
from IPython.display import Audio, display
os.chdir('/content/so-vits-svc')
import torch
from inference import infer_tool
from inference import slicer
from inference.infer_tool import Svc
import soundfile
import numpy as np
MODELS_DIR = "models"
def get_speakers():
    speakers = []
    for _,dirs,_ in os.walk(MODELS_DIR):
        for folder in dirs:
            cur_speaker = {}
            # Look for G_****.pth
            g = glob.glob(os.path.join(MODELS_DIR,folder,'G_*.pth'))
            if not len(g):
                print("Skipping "+folder+", no G_*.pth")
                continue
            cur_speaker["model_path"] = g[0]
            cur_speaker["model_folder"] = folder
            # Look for *.pt (clustering model)
            clst = glob.glob(os.path.join(MODELS_DIR,folder,'*.pt'))
            if not len(clst):
                print("Note: No clustering model found for "+folder)
                cur_speaker["cluster_path"] = ""
            else:
                cur_speaker["cluster_path"] = clst[0]
            # Look for config.json
            cfg = glob.glob(os.path.join(MODELS_DIR,folder,'*.json'))
            if not len(cfg):
                print("Skipping "+folder+", no config json")
                continue
            cur_speaker["cfg_path"] = cfg[0]
            with open(cur_speaker["cfg_path"]) as f:
                try:
                    cfg_json = json.loads(f.read())
                except Exception as e:
                    print("Malformed config json in "+folder)
                    continue # skip this folder; cfg_json is undefined here
                for name, i in cfg_json["spk"].items():
                    cur_speaker["name"] = name
                    cur_speaker["id"] = i
                    if not name.startswith('.'):
                        speakers.append(copy.copy(cur_speaker))
    return sorted(speakers, key=lambda x:x["name"].lower())
logging.getLogger('numba').setLevel(logging.WARNING)
chunks_dict = infer_tool.read_temp("inference/chunks_temp.json")
existing_files = []
slice_db = -40
wav_format = 'wav'
class InferenceGui():
    def __init__(self):
        self.speakers = get_speakers()
        self.speaker_list = [x["name"] for x in self.speakers]
        self.speaker_box = widgets.Dropdown(
            options = self.speaker_list
        )
        display(self.speaker_box)
        def convert_cb(btn):
            self.convert()
        def clean_cb(btn):
            self.clean()
        self.convert_btn = widgets.Button(description="Convert")
        self.convert_btn.on_click(convert_cb)
        self.clean_btn = widgets.Button(description="Delete all audio files")
        self.clean_btn.on_click(clean_cb)
        self.trans_tx = widgets.IntText(value=0, description='Transpose')
        self.cluster_ratio_tx = widgets.FloatText(value=0.0,
            description='Clustering Ratio')
        self.noise_scale_tx = widgets.FloatText(value=0.4,
            description='Noise Scale')
        self.auto_pitch_ck = widgets.Checkbox(value=False, description=
            'Auto pitch f0 (do not use for singing)')
        display(self.trans_tx)
        display(self.cluster_ratio_tx)
        display(self.noise_scale_tx)
        display(self.auto_pitch_ck)
        display(self.convert_btn)
        display(self.clean_btn)
    def convert(self):
        trans = int(self.trans_tx.value)
        speaker = next(x for x in self.speakers if x["name"] ==
            self.speaker_box.value)
        spkpth2 = os.path.join(os.getcwd(),speaker["model_path"])
        print(spkpth2)
        print(os.path.exists(spkpth2))
        svc_model = Svc(speaker["model_path"], speaker["cfg_path"],
            cluster_model_path=speaker["cluster_path"])
        input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)
            if f not in existing_files and
            any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]
        for name in input_filepaths:
            print("Converting "+os.path.split(name)[-1])
            infer_tool.format_wav(name)
            wav_path = str(Path(name).with_suffix('.wav'))
            wav_name = Path(name).stem
            chunks = slicer.cut(wav_path, db_thresh=slice_db)
            audio_data, audio_sr = slicer.chunks2audio(wav_path, chunks)
            audio = []
            for (slice_tag, data) in audio_data:
                print(f'#=====segment start, '
                    f'{round(len(data)/audio_sr, 3)}s======')
                length = int(np.ceil(len(data) / audio_sr *
                    svc_model.target_sample))
                if slice_tag:
                    print('jump empty segment')
                    _audio = np.zeros(length)
                else:
                    # Padding "fix" for noise
                    pad_len = int(audio_sr * 0.5)
                    data = np.concatenate([np.zeros([pad_len]),
                        data, np.zeros([pad_len])])
                    raw_path = io.BytesIO()
                    soundfile.write(raw_path, data, audio_sr, format="wav")
                    raw_path.seek(0)
                    _cluster_ratio = 0.0
                    if speaker["cluster_path"] != "":
                        _cluster_ratio = float(self.cluster_ratio_tx.value)
                    out_audio, out_sr = svc_model.infer(
                        speaker["name"], trans, raw_path,
                        cluster_infer_ratio = _cluster_ratio,
                        auto_predict_f0 = bool(self.auto_pitch_ck.value),
                        noice_scale = float(self.noise_scale_tx.value))
                    _audio = out_audio.cpu().numpy()
                    pad_len = int(svc_model.target_sample * 0.5)
                    _audio = _audio[pad_len:-pad_len]
                audio.extend(list(infer_tool.pad_array(_audio, length)))
            res_path = os.path.join('/content/',
                f'{wav_name}_{trans}_key_'
                f'{speaker["name"]}.{wav_format}')
            soundfile.write(res_path, audio, svc_model.target_sample,
                format=wav_format)
            display(Audio(res_path, autoplay=True)) # display audio file
            pass
    def clean(self):
        input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)
            if f not in existing_files and
            any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]
        for f in input_filepaths:
            os.remove(f)
inference_gui = InferenceGui()
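At this point the script automatically searches the root directory, /content, for audio files (.wav, .flac, .mp3, .ogg and .opus) and runs inference on them with the downloaded model. Before inference, the code shown here converts each file to WAV and slices it at silent passages (threshold slice_db = -40).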
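The file search is the part most worth understanding, because every matching file under the root gets converted. The sketch below restates that search with pathlib on a temporary folder standing in for /content; find_audio and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".flac", ".mp3", ".ogg", ".opus"}


def find_audio(root: Path, skip: frozenset = frozenset()) -> list:
    """Recursively list audio files under `root`, excluding paths in `skip`."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in AUDIO_SUFFIXES and p not in skip
    )


# Temporary folder standing in for /content.
root = Path(tempfile.mkdtemp())
(root / "song.mp3").write_bytes(b"")
(root / "notes.txt").write_text("not audio")
(root / "sub").mkdir()
(root / "sub" / "take.wav").write_bytes(b"")

found = [p.relative_to(root).as_posix() for p in find_audio(root)]
print(found)
```

Because the search is recursive, audio files in any subfolder are picked up too, so it is worth keeping only the songs you want converted under the root.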
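When inference finishes, the converted song plays automatically.
When you first start using Colab, the GPU you are assigned has about 15 GB of video memory, which is enough for most training and inference tasks. If you keep it busy with long-running jobs, however, the GPU resources you are allocated gradually get worse; for long and relatively stable GPU access you need a paid Colab Pro subscription. Google Drive's free storage is also 15 GB. If too many downloaded models fill it up, running code will fail as well, so clean up Drive regularly to keep your deep-learning jobs running.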
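A quick free-space check before downloading another model can save a failed run. The sketch below uses shutil.disk_usage on the local filesystem; free_gib and has_room are hypothetical helpers, and whether the numbers reported for a mounted Drive folder reflect the Drive quota is not something verified here.

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(required_gib: float, path: str = "/") -> bool:
    """True when at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


print(f"free: {free_gib():.1f} GiB")
print("room for a 1 GiB model:", has_room(1.0))
```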