[.NET 6] Having fun with ML.NET + pretrained ONNX models on the Bilibili classic "Huaqiang Buys a Melon"

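Recently I have been looking at how ML.NET, Microsoft's open-source machine learning framework, can use other people's pretrained models (in the Open Neural Network Exchange format, .onnx) to recognize images. While browsing GitHub I came across a fun repo, so I decided to write a just-for-fun blog post about it.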

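First, a little background on machine learning in .NET. ML.NET was open-sourced fairly early, and even before the official ML.NET there were third-party community implementations such as the early AForge.NET. Later came TensorFlow.NET, a port of the well-known Python neural network framework TensorFlow, and TorchSharp, a port of PyTorch, both of which bring deep learning to C#. Still, C# is not a natural fit for machine learning or deep learning, and AI has always been Python's home turf. Not being a natural fit does not mean there are no options, though. Now that AI is becoming widespread, ordinary developers can still take models that others have trained and build real applications with them.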

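Today we will use some pretrained models to get the job done. The following environment and tools are needed:

1. A Windows development environment with .NET 5 or .NET 6 installed

2. netron, for inspecting the model's parameters. Download: https://github.com/lutzroeder/netron/releases/tag/v5.8.9

3. ffmpeg, for video processing. Download: https://ffmpeg.org/download.html

4. The pretrained ONNX models udnie and super-resolution

    udnie model download: https://github.com/onnx/models/blob/main/vision/style_transfer/fast_neural_style/model/udnie-9.onnx

    super-resolution model download: https://github.com/onnx/models/blob/main/vision/super_resolution/sub_pixel_cnn_2016/model/super-resolution-10.tar.gz (extract the archive and take the .onnx file inside)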

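The workflow is as follows:

1. Use ffmpeg to split the target video (here I use the classic Bilibili short video "Huaqiang Buys a Melon") into individual frame images.

2. Load the neural style transfer pretrained model through ML.NET and transfer every original frame to a new style (art style: udnie, abstract).

3. Step 2 can only produce images at a fixed 224*224 size, so we also load the super-resolution pretrained model through ML.NET and upscale every frame to a 672*672 image.

4. Use ffmpeg to merge the new images into a new video.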
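Step 1 of the workflow above is not shown as a command in this post, so here is a minimal sketch of how the two ffmpeg invocations could be assembled from C#. The file names `input.mp4`, `img` and `1.aac` and the helper class `FfmpegCommands` are assumptions made for illustration. The audio command copies the stream unchanged, which assumes the source audio is already AAC, and the frame folder has to exist before ffmpeg runs.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical file and folder names, used only for illustration.
var extractFrames = FfmpegCommands.ExtractFrames("input.mp4", "img");
var extractAudio = FfmpegCommands.ExtractAudio("input.mp4", "1.aac");

Console.WriteLine("ffmpeg " + string.Join(" ", extractFrames.ArgumentList));
Console.WriteLine("ffmpeg " + string.Join(" ", extractAudio.ArgumentList));

// To actually run a command (requires ffmpeg on PATH and an existing "img" folder):
// using var process = Process.Start(extractFrames);
// process?.WaitForExit();

static class FfmpegCommands
{
    // Builds "ffmpeg -i <video> <frameDir>/%d.jpeg": one numbered JPEG per frame, starting at 1.
    public static ProcessStartInfo ExtractFrames(string videoPath, string frameDir)
    {
        var info = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        info.ArgumentList.Add("-i");
        info.ArgumentList.Add(videoPath);
        info.ArgumentList.Add(Path.Combine(frameDir, "%d.jpeg"));
        return info;
    }

    // Builds "ffmpeg -i <video> -vn -acodec copy <audio>": drops the video and copies the audio stream as-is.
    public static ProcessStartInfo ExtractAudio(string videoPath, string audioPath)
    {
        var info = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        info.ArgumentList.Add("-i");
        info.ArgumentList.Add(videoPath);
        info.ArgumentList.Add("-vn");
        info.ArgumentList.Add("-acodec");
        info.ArgumentList.Add("copy");
        info.ArgumentList.Add(audioPath);
        return info;
    }
}
```

Using `ArgumentList` instead of a single argument string avoids manual quoting if a path contains spaces.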

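First, here is the finished result (converted to a GIF here for easier demonstration):

Next, open the project created in VS and add the two ONNX models to it. Then write the following code:

First, define a session (an InferenceSession from the Microsoft.ML.OnnxRuntime package) to load the ONNX model: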

static InferenceSession styleTransferSession = new InferenceSession("model/udnie-9.onnx");

Next, create a method that calls the model:

public static Bitmap ProcessStyleTransfer(Bitmap originBmp)
{
    // Build the input tensor to match the input shape reported by netron.
    var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
    // Resize the bitmap to 224*224 and copy its RGB values into the tensor.
    Tool.BitmapToTensor(originBmp, 224, 224, ref input, true);
    // Run the model to get the style-transferred output tensor.
    using var results = styleTransferSession.Run(new[] { NamedOnnxValue.CreateFromTensor("input1", input) });
    if (results.FirstOrDefault()?.Value is not Tensor<float> output)
        throw new ApplicationException("Unable to process the image");
    // The model outputs a 3*224*224 tensor, so the resulting image can only be 224*224.
    return Tool.TensorToBitmap(output, 224, 224);
}

At this point the neural style transfer is done: the returned bitmap is the new, style-transferred image, and calling bitmap.Save writes it to disk.
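The tensors in this post are indexed as [batch, channel, y, x], so each color channel is stored as its own full image plane rather than interleaved per pixel. As a rough sketch of what that means in memory, assuming the default row-major layout, the flat offset of an element can be computed as below. The class `Nchw` and the pixel values are made up for illustration and are not part of ONNX Runtime.

```csharp
using System;

const int channels = 3, height = 224, width = 224;
var buffer = new float[1 * channels * height * width];

// Write one pixel at x = 10, y = 20 with hypothetical RGB values.
buffer[Nchw.Offset(0, 0, 20, 10, channels, height, width)] = 255f; // R plane
buffer[Nchw.Offset(0, 1, 20, 10, channels, height, width)] = 128f; // G plane
buffer[Nchw.Offset(0, 2, 20, 10, channels, height, width)] = 0f;   // B plane

// The G plane starts right after the 224 * 224 values of the R plane.
Console.WriteLine(Nchw.Offset(0, 1, 0, 0, channels, height, width)); // 50176

static class Nchw
{
    // Flat offset of element [n, c, y, x] in a row-major tensor of shape [N, C, H, W].
    public static int Offset(int n, int c, int y, int x, int channels, int height, int width)
        => ((n * channels + c) * height + y) * width + x;
}
```

This is why the conversion helpers write `input[0, 0, y, x]`, `input[0, 1, y, x]` and `input[0, 2, y, x]` separately for R, G and B.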

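Next, create the method for the super-resolution model. The code is very similar to the call above.

The one thing to watch out for is that super-resolution does not upscale the RGB values directly. It works in YCbCr, so a conversion is needed. The original description is here: https://github.com/onnx/models/tree/main/vision/super_resolution/sub_pixel_cnn_2016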

static InferenceSession superResolutionSession = new InferenceSession("model/super_resolution.onnx");

public static Bitmap ProcessSuperResolution(Bitmap originBmp)
{
    // Build the input tensor to match the input shape reported by netron. This model works on
    // YCbCr rather than RGB, so some conversion is involved, but the overall flow matches the previous method.
    var input = new DenseTensor<float>(new[] { 1, 1, 224, 224 });
    // Convert the bitmap to the input tensor. toRGB is false so that only the Y channel is written.
    Tool.BitmapToTensor(originBmp, 224, 224, ref input, false);
    // The model only processes Y. Cb and Cr are upscaled separately with the
    // bicubic interpolation in System.Drawing.Common; the result has shape [2, 672, 672].
    var inputCbCr = Tool.ResizeGetCbCr(originBmp, 672, 672);
    // Run the model to get the super-resolved output tensor.
    using var results = superResolutionSession.Run(new[] { NamedOnnxValue.CreateFromTensor("input", input) });
    if (results.FirstOrDefault()?.Value is not Tensor<float> output)
        throw new ApplicationException("Unable to process the image");
    // Create the 672*672 bitmap, converting Y + CbCr back to RGB for each pixel.
    return Tool.TensorToBitmap(output, 672, 672, false, inputCbCr);
}
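To make the Y/CbCr split used above more concrete, here is a small standalone sketch of the round trip using the same coefficients as the Tool helpers in this post. It avoids System.Drawing so that it runs anywhere. The class name `YCbCrSketch` and the sample color are made up for illustration, and unlike the Tool helpers this sketch rounds to the nearest byte instead of truncating.

```csharp
using System;

// Convert a hypothetical color to YCbCr and back again.
var (y, cb, cr) = YCbCrSketch.FromRgb(200, 120, 40);
var (r, g, b) = YCbCrSketch.ToRgb(y, cb, cr);
Console.WriteLine($"Y={y:F4} Cb={cb:F4} Cr={cr:F4} -> R={r} G={g} B={b}");

static class YCbCrSketch
{
    // RGB bytes -> normalized YCbCr: Y lies in [0, 1], Cb and Cr are centered on 0.
    public static (float Y, float Cb, float Cr) FromRgb(byte r, byte g, byte b)
    {
        float fr = r / 255f, fg = g / 255f, fb = b / 255f;
        return (0.2989f * fr + 0.5866f * fg + 0.1145f * fb,
                -0.1687f * fr - 0.3313f * fg + 0.5000f * fb,
                0.5000f * fr - 0.4184f * fg - 0.0816f * fb);
    }

    // Normalized YCbCr -> RGB bytes, clamped to the valid range.
    public static (byte R, byte G, byte B) ToRgb(float y, float cb, float cr)
    {
        return (ToByte(y + 1.4022f * cr),
                ToByte(y - 0.3456f * cb - 0.7145f * cr),
                ToByte(y + 1.7710f * cb));
    }

    private static byte ToByte(float v) => (byte)MathF.Round(Math.Clamp(v, 0f, 1f) * 255f);
}
```

Only Y goes through the model. Cb and Cr carry the color information, which tolerates plain bicubic upscaling much better than the brightness detail does.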

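With these two steps the core code is essentially complete. What remains is just image-handling code. Next, call the two methods from Program.cs to produce the transferred images: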

Directory.CreateDirectory("new img path");
foreach (var path in Directory.GetFiles("old img path"))
{
    // ffmpeg names the extracted frames in frame order starting from 1, so changing the parent
    // folder name is enough to get the output path, e.g. D://img/1.jpeg -> D://newimg/1.jpeg
    var newpath = path.Replace("old img path", "new img path");
    // Load the frame directly so that no intermediate Image is left undisposed.
    using var originBitmap = new Bitmap(path);
    using var transferBitmap = OnnxModelManager.ProcessStyleTransfer(originBitmap);
    using var reSizeBitmap = OnnxModelManager.ProcessSuperResolution(transferBitmap);
    reSizeBitmap.Save(newpath);
}
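The loop above derives the output path with string.Replace on the whole path, which works as long as the folder name does not also appear elsewhere in the path. A slightly more defensive sketch, under the assumption that frames are named with plain numbers as ffmpeg produces them, builds the output path from the file name instead. The `FramePaths` class and the folder names are hypothetical, and the ordering helper is optional since the file names carry the frame order anyway.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical frame paths, deliberately out of order.
var frames = new[] { "img/10.jpeg", "img/2.jpeg", "img/1.jpeg" };
foreach (var frame in FramePaths.OrderByFrameNumber(frames))
    Console.WriteLine($"{frame} -> {FramePaths.MapToOutput(frame, "newimg")}");

static class FramePaths
{
    // Keeps the frame's file name and places it under the output folder.
    public static string MapToOutput(string inputPath, string outputDir)
        => Path.Combine(outputDir, Path.GetFileName(inputPath));

    // Sorts frames numerically (1, 2, 10) instead of alphabetically (1, 10, 2).
    // Assumes every file name without its extension is an integer.
    public static string[] OrderByFrameNumber(IEnumerable<string> paths)
        => paths.OrderBy(p => int.Parse(Path.GetFileNameWithoutExtension(p))).ToArray();
}
```

Processing frames in numeric order does not change the result, but it makes progress easier to follow during a run that takes many minutes.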

Then press F5 to run and wait. The conversion generally takes about 20 minutes on an i5 CPU. Finally, use a tool to assemble the new video (or GIF):

./ffmpeg -f image2 -i newimg/%d.jpeg -i 1.aac -map 0:0 -map 1:a -r 25 -shortest output.mp4

That basically completes the code. For reference, here is the image conversion code in Tool:

using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using Microsoft.ML.OnnxRuntime.Tensors;

internal class Tool
{
    /// <summary>
    /// Resizes the bitmap and copies its pixels into the input tensor.
    /// </summary>
    /// <param name="originBmp">Source image.</param>
    /// <param name="input">Tensor to fill: 3 channels (RGB) when toRGB is true, otherwise 1 channel (Y).</param>
    public static void BitmapToTensor(Bitmap originBmp, int resizeWidth, int resizeHeight, ref DenseTensor<float> input, bool toRGB)
    {
        using var inputBmp = new Bitmap(resizeWidth, resizeHeight);
        using Graphics g = Graphics.FromImage(inputBmp);
        g.DrawImage(originBmp, 0, 0, resizeWidth, resizeHeight);
        for (var y = 0; y < inputBmp.Height; y++)
        {
            for (var x = 0; x < inputBmp.Width; x++)
            {
                var color = inputBmp.GetPixel(x, y);
                if (toRGB)
                {
                    input[0, 0, y, x] = color.R;
                    input[0, 1, y, x] = color.G;
                    input[0, 2, y, x] = color.B;
                }
                else
                {
                    // Convert RGB to YCbCr and keep only Y, which is what the super-resolution model upscales.
                    var ycbcr = RGBToYCbCr(color);
                    input[0, 0, y, x] = ycbcr.Y;
                }
            }
        }
    }

    /// <summary>
    /// Converts an output tensor back into a bitmap.
    /// </summary>
    /// <param name="output">Model output: RGB planes when toRGB is true, otherwise a single Y plane.</param>
    /// <param name="inputCbCr">Upscaled Cb and Cr planes, required when toRGB is false.</param>
    public static Bitmap TensorToBitmap(Tensor<float> output, int width, int height, bool toRGB = true, Tensor<float> inputCbCr = null)
    {
        // Create a new bitmap to hold the converted pixels.
        var newBmp = new Bitmap(width, height);
        for (var y = 0; y < newBmp.Height; y++)
        {
            for (var x = 0; x < newBmp.Width; x++)
            {
                if (toRGB)
                {
                    // Style transfer can produce out-of-range values, so clamp each channel to 0-255 to avoid an exception.
                    var color = Color.FromArgb((byte)Math.Clamp(output[0, 0, y, x], 0, 255), (byte)Math.Clamp(output[0, 1, y, x], 0, 255), (byte)Math.Clamp(output[0, 2, y, x], 0, 255));
                    newBmp.SetPixel(x, y, color);
                }
                else
                {
                    // Combine the Y value inferred by the model with the bicubic-interpolated Cb and Cr values to get the RGB color.
                    var color = YCbCrToRGB(output[0, 0, y, x], inputCbCr[0, y, x], inputCbCr[1, y, x]);
                    newBmp.SetPixel(x, y, color);
                }
            }
        }
        return newBmp;
    }

    /// <summary>
    /// RGB to YCbCr. Y is in [0, 1]; Cb and Cr are centered on 0.
    /// </summary>
    public static (float Y, float Cb, float Cr) RGBToYCbCr(Color color)
    {
        float fr = (float)color.R / 255;
        float fg = (float)color.G / 255;
        float fb = (float)color.B / 255;
        return ((float)(0.2989 * fr + 0.5866 * fg + 0.1145 * fb), (float)(-0.1687 * fr - 0.3313 * fg + 0.5000 * fb), (float)(0.5000 * fr - 0.4184 * fg - 0.0816 * fb));
    }

    /// <summary>
    /// YCbCr to RGB. Each channel is clamped to [0, 1] before scaling to 0-255.
    /// </summary>
    public static Color YCbCrToRGB(float Y, float Cb, float Cr)
    {
        return Color.FromArgb(
            (byte)(Math.Clamp(Y + 1.4022f * Cr, 0f, 1f) * 255),
            (byte)(Math.Clamp(Y - 0.3456f * Cb - 0.7145f * Cr, 0f, 1f) * 255),
            (byte)(Math.Clamp(Y + 1.7710f * Cb, 0f, 1f) * 255));
    }

    /// <summary>
    /// Upscales the image with bicubic interpolation and extracts the Cb and Cr planes.
    /// </summary>
    public static DenseTensor<float> ResizeGetCbCr(Bitmap original, int newWidth, int newHeight)
    {
        // Shape is [channel, y, x], so height comes before width.
        var cbcr = new DenseTensor<float>(new[] { 2, newHeight, newWidth });
        using var bitmap = new Bitmap(newWidth, newHeight);
        using var g = Graphics.FromImage(bitmap);
        g.InterpolationMode = InterpolationMode.HighQualityBicubic;
        g.SmoothingMode = SmoothingMode.HighQuality;
        g.DrawImage(original, new Rectangle(0, 0, newWidth, newHeight),
            new Rectangle(0, 0, original.Width, original.Height), GraphicsUnit.Pixel);
        for (var y = 0; y < bitmap.Height; y++)
        {
            for (var x = 0; x < bitmap.Width; x++)
            {
                var color = bitmap.GetPixel(x, y);
                var ycbcr = RGBToYCbCr(color);
                cbcr[0, y, x] = ycbcr.Cb;
                cbcr[1, y, x] = ycbcr.Cr;
            }
        }
        return cbcr;
    }
}


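That wraps up this just-for-fun post. We only called two small models to play around, but if you can get hold of the mainstream open-source pretrained models, they can solve many real business scenarios. One good example is the image object detection we have recently been putting into production with YOLOv6, the model open-sourced by Meituan, which I won't go into here. Microsoft has also promised that the ML.NET roadmap will include transfer learning on pretrained models. That would let us take a general-purpose pretrained model and, with only a small dataset from our own custom scenario, fine-tune it to handle that scenario's problems better. That's all for today. See you next time.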