Calling R and Python from Java (with a front-end/back-end demo)

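1. Background

R and Python are used here for data analysis and data processing, and they generate the corresponding histograms and scatter plots.
I needed to build a display platform whose Java back end calls R and Python and returns the resulting data and figures to the front end.
The platform's main functions are feature selection on multi-dimensional data and correction of covariate shift in datasets.
In essence it is a demo of Java calling R and Java calling Python. It is very simple, so experts, please go easy on it.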

2. Tech stack

Java: Spring Boot
R
Python
Front end: Vue + ElementUI (I only know the basics of front-end work)
MySQL

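3. Calling R from Java

3.1 Installing the Rserve server in R

Both Java and R need some preparation first. The first step is to install the Rserve server in R.
Rserve must be running whenever Java calls R; it can be started from the CMD command line or from RStudio.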

# Install Rserve
install.packages("Rserve")
# Load Rserve
library(Rserve)
# Start Rserve
Rserve()

Here I start Rserve from the CMD command line. That completes the first step of calling R from Java.
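Before wiring up the Java side, it can help to confirm that Rserve is actually reachable. The sketch below only checks that something accepts TCP connections on the port; it assumes Rserve's default port 6311 on the local machine, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port (e.g. Rserve's) accepts connections. */
final class RservePortCheck {
    // Rserve listens on 6311 unless configured otherwise (assumed default setup).
    static final int DEFAULT_RSERVE_PORT = 6311;

    private RservePortCheck() {}

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("127.0.0.1", DEFAULT_RSERVE_PORT, 1000);
        System.out.println(up ? "Rserve port is open" : "Nothing is listening on the Rserve port");
    }
}
```

A successful connection does not prove that the listener is Rserve, only that the port is open, so treat it as a quick sanity check.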

3.2 Adding the Rsession dependency to Spring Boot

Once the Rsession dependency is added, its classes can be used directly.

<dependency>
   <groupId>com.github.yannrichet</groupId>
   <artifactId>Rsession</artifactId>
   <version>1.8.3</version>
</dependency>

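3.3 Common commands for calling R from Java

This section shows some of the ways Java calls R in my project, including the more commonly used methods:
the basic commands for calling R from Java, how to save and return a plot produced by R, and how to retrieve and filter R's results.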

/**
 * Java calls R: R runs feature selection on multi-dimensional data,
 * and the selected features are returned and written to MySQL.
 */
public List<Map<String, String>> featureSelection() {
    RConnection c = null; // RConnection is the connection to Rserve
    try {
        c = new RConnection(); // open the connection

        String RPath = "../featureSelection.R"; // path to the R file

        c.assign("path", RPath); // assign() copies RPath into R as the variable `path`

        c.eval("source(path)"); // eval() runs an R command; here source() loads the R file from that path

        String Dpath = fileMapper.selectFilePath("train", 1); // dataset path, looked up in MySQL

        String str = "rfProfile <- fsFunction('" + Dpath + "')"; // R command that calls the function defined in my R file

        c.eval(str); // run it

        // Plotting: since this is a demo, the image is saved locally and named after the dataset
        String fileName = fileMapper.selectFileName("train", 1); // file name

        String imgPath = "D:/fileAndData/imgs/" + fileName + ".png"; // where the image is saved

        c.assign("imgPath", imgPath);
        c.eval("png(imgPath)"); // open a PNG graphics device with R's png()
        c.assign("mainName", fileName);
        c.eval("print(plot(rfProfile,type='b',main=mainName))"); // the plot must be wrapped in print(), otherwise the image is blank
        c.eval("dev.off()"); // also required: closes the graphics device so the file is written

        // Fetch the feature-selection result as a String; a regular expression then filters out the parts we need
        c.eval("features <- rfProfile$optVariables");
        // paste() plus capture.output() captures everything R would have printed
        String feature = c.eval("paste(capture.output(features),collapse='\\n')").asString();
        // Fetch the importance scores
        c.eval("impt <- varImp(rfProfile)");
        String imptScores = c.eval("paste(capture.output(impt$Overall),collapse='\\n')").asString();

        // A utility class I wrote to filter R's output; define yours to match your own output
        HandlerRresults handlerRresults = new HandlerRresults();
        List<Map<String, String>> stringStringMapList = handlerRresults.catchAndHandlerR(feature, imptScores);
        fileMapper.deleteFileInfo(-1, "train"); // -1 marks the file as used
        String featsStr = handlerRresults.getFeatsStr(feature);
        featMapper.insertFeat(featsStr);
        return stringStringMapList;
    } catch (RserveException | REXPMismatchException e) {
        e.printStackTrace();
    } finally {
        if (c != null) {
            c.close(); // this line is essential: always close the connection when done
        }
    }
    return null;
}
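The HandlerRresults utility class is not shown in this post. As a rough idea of what such filtering can look like, here is a minimal sketch that parses the text R prints for plain character and numeric vectors (lines such as `[1] "V1" "V5"`). The class name ROutputParser and its methods are hypothetical, and the sketch assumes the captured output contains nothing but the printed vector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses text captured from R's printed vectors. */
final class ROutputParser {
    // A double-quoted element such as "V12" in a printed character vector
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");
    // The index prefix R prints at the start of each line, e.g. "[1]" or " [13]"
    private static final Pattern INDEX_PREFIX = Pattern.compile("^\\s*\\[\\d+\\]");

    private ROutputParser() {}

    /** Extracts every quoted element, in order of appearance. */
    static List<String> parseCharacterVector(String captured) {
        List<String> values = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(captured);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    /**
     * Strips the "[n]" prefix from each line and parses the remaining tokens.
     * Tokens such as NA or Inf are not handled and raise NumberFormatException.
     */
    static List<Double> parseNumericVector(String captured) {
        List<Double> values = new ArrayList<>();
        for (String line : captured.split("\\R")) {
            String body = INDEX_PREFIX.matcher(line).replaceFirst("").trim();
            if (body.isEmpty()) {
                continue;
            }
            for (String token : body.split("\\s+")) {
                values.add(Double.parseDouble(token));
            }
        }
        return values;
    }
}
```

Rserve's REXP type also provides asStrings() and asDoubles(), so evaluating the vector itself (rather than its captured printout) may avoid text parsing altogether; whether that fits depends on the shape of the R object being returned.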

To sum up, here is a simple template for calling R from Java. R code is executed one statement at a time, so just keep calling eval().

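public void JavaCallRDemo() {
    RConnection c = null;
    try {
        c = new RConnection();

        c.assign("name", "value"); // pass a variable from Java into R (placeholder name and value)

        String result = c.eval("paste('received', name)").asString(); // run an R command from Java (placeholder command)

    } catch (RserveException | REXPMismatchException e) {
        e.printStackTrace();
    } finally {
        if (c != null) {
            c.close();
        }
    }
}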

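3.4 Front-end demo of feature selection via R

My dataset has 30 dimensions, and 5 of its features were selected (the best trade-off).
The features and their importance scores are shown in a table.
The plot is sent to the front end as a Base64-encoded string.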
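As a rough illustration of the Base64 step, the sketch below turns PNG bytes into a data URI that an img tag can use directly as its src. It uses only the JDK's java.util.Base64 rather than the Base64 helper called in section 4.1, and the class and method names are invented for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical helper: builds a data URI for a PNG image. */
final class PngDataUri {
    private PngDataUri() {}

    /** Encodes raw PNG bytes and prepends the data-URI header the browser expects. */
    static String fromBytes(byte[] pngBytes) {
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(pngBytes);
    }

    /** Reads the whole file into memory, then encodes it; suitable for small plot images. */
    static String fromFile(Path pngFile) throws IOException {
        return fromBytes(Files.readAllBytes(pngFile));
    }
}
```

The returned string can be placed in the JSON response as it is. For large images, serving the file from a URL would likely be lighter than embedding it.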

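4. Calling Python from Java

4.1 The code for calling Python from Java

To call Python from Java, I use the Process class and start the external process through Runtime.
Runtime can launch cmd, shell, and other programs; here is a short demonstration based on my project.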

/**
 * Java calls Python through Runtime
 */
public String callPy() {
    StringBuilder arr = new StringBuilder(); // collects the script's output
    String basePath = "d://fileAndData/process/"; // the demo stores files locally for convenience

    // Arguments passed to the Python script
    String featName = featMapper.getFeat();
    String trainPath = fileMapper.selectFilePath("train", -2);
    String ptrainPath = basePath + fileMapper.selectFileName("train", -2);
    String ptestPath = basePath + fileMapper.selectFileName("test", -2);

    Process proc; // the external process
    try {
        // Command as a string array: 1. the python3 interpreter 2. the .py file to run 3-6. the script arguments
        String[] args = new String[]{"python3", "../kmm.py", featName, trainPath, ptrainPath, ptestPath};

        proc = Runtime.getRuntime().exec(args); // start the process, much like typing the command in a terminal

        // Read the script's standard output (an input stream from Java's point of view)
        try (BufferedReader in = new BufferedReader(new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                arr.append(line).append("\n"); // append each line
            }
        }
        proc.waitFor();
    } catch (IOException | InterruptedException e) {
        e.printStackTrace();
    }
    return arr.toString();
}
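One thing the method above does not cover is the script's standard error, its exit code, or a script that hangs. The following is a minimal sketch of one way to handle those with ProcessBuilder; it is not the code used in this project, and ScriptRunner and its methods are hypothetical names. Output is redirected to a temporary file so that the timeout applies to the whole run.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs an external script and returns its combined output. */
final class ScriptRunner {
    private ScriptRunner() {}

    /** Builds the argument list: interpreter, script path, then the script's own arguments. */
    static List<String> buildCommand(String interpreter, String script, String... scriptArgs) {
        List<String> command = new ArrayList<>();
        command.add(interpreter);
        command.add(script);
        command.addAll(Arrays.asList(scriptArgs));
        return command;
    }

    /** Runs the command; throws if it cannot start, times out, or exits with a non-zero code. */
    static String run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path log = Files.createTempFile("script-", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)   // merge stderr into stdout
                    .redirectOutput(log.toFile()) // write output to a file so no pipe can fill up
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("Timed out after " + timeoutSeconds + "s: " + command);
            }
            String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
            if (process.exitValue() != 0) {
                throw new IOException("Exit code " + process.exitValue() + ":\n" + output);
            }
            return output;
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

With this helper, the call in callPy() would look roughly like `ScriptRunner.run(ScriptRunner.buildCommand("python3", "../kmm.py", featName, trainPath, ptrainPath, ptestPath), 300)`, where the 300-second limit is an arbitrary example value.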

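In this demo, everything the Python script produces is a scatter plot.
My approach was to have Python save the figures locally; once the script finished, an endpoint was called to send them to the front end in Base64 format.
I later realized that the Base64-encoded images returned by the script could simply be passed straight to the front end, which would be less work.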

/**
 * Reads every file in a folder and converts each one to Base64.
 * The folder only ever contains images, so the demo does nothing beyond a null check and gets straight to work.
 * Controller layer
 */
public List<String> getPyFigsListBase64(HttpServletResponse response) {
    String pyFilePath = "d://fileAndData/kmmImgs"; // local image folder

    List<String> res = new ArrayList<>();

    HandlerPyresults handlerPyresults = new HandlerPyresults(); // a utility class I wrote

    List<File> pyFiles = handlerPyresults.getAllFile(pyFilePath); // get all files

    for (File file : pyFiles) {
        byte[] fig = handlerPyresults.file2Byte(file); // File -> byte[]
        String base64str = Base64.encodeBase64String(fig); // byte[] -> Base64
        String img = "data:image/png;base64," + base64str; // the data-URI header tells the front end this is a PNG image
        res.add(img);
    }
    return res;
}

/**
 * File -> byte[]
 */
public byte[] file2Byte(File file) {
    if (file == null) {
        return null;
    }
    // try-with-resources closes both streams, even if opening or reading the file fails
    try (FileInputStream fileInputStream = new FileInputStream(file);
         ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
        byte[] b = new byte[1024];
        int n;
        while ((n = fileInputStream.read(b)) != -1) {
            byteArrayOutputStream.write(b, 0, n);
        }
        return byteArrayOutputStream.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
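The getAllFile helper called in the controller is not shown in this post. A minimal sketch of what such a helper might do is given below; it lists only the PNG files directly inside a folder. ImageFolder and listPngFiles are made-up names, and the real helper may well behave differently (for example, by returning every file or by recursing into subfolders).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists the PNG files directly inside a folder. */
final class ImageFolder {
    private ImageFolder() {}

    static List<Path> listPngFiles(Path dir) throws IOException {
        // Files.list keeps a directory handle open, so the stream must be closed
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile) // skip subdirectories
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".png"))
                    .sorted() // stable order for the front end
                    .collect(Collectors.toList());
        }
    }
}
```

Filtering by extension guards against stray non-image files ending up in the response, which the demo otherwise assumes never happens.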

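4.2 Demo of the Python results

My Python script mainly applies the KMM (kernel mean matching) algorithm to the dataset, a method for correcting covariate shift.
Scatter plots show the distributions of the test and training sets and how they differ; the details are omitted here...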
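For readers unfamiliar with KMM, its usual formulation is sketched below. This is the standard kernel mean matching objective from the literature, not something taken from kmm.py (which is not shown here), and the symbols are notation introduced for this note: training points x_i^tr, test points x_j^te, a kernel feature map phi, per-sample weights beta_i, an upper bound B, and a tolerance epsilon.

```latex
\min_{\beta}\;
\left\|
  \frac{1}{n_{tr}} \sum_{i=1}^{n_{tr}} \beta_i \,\phi\!\left(x_i^{tr}\right)
  - \frac{1}{n_{te}} \sum_{j=1}^{n_{te}} \phi\!\left(x_j^{te}\right)
\right\|_{\mathcal{H}}^{2}
\quad \text{s.t.} \quad
\beta_i \in [0, B], \qquad
\left| \sum_{i=1}^{n_{tr}} \beta_i - n_{tr} \right| \le n_{tr}\,\epsilon
```

The weights beta_i reweight the training samples so that their mean in kernel feature space matches that of the test samples; training points that resemble the test set receive larger weights.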

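5. Summary

This project was a requirement my supervisor handed me during my master's studies. Here is why Java calls both R and Python.

First, I had a binary classification task on gamma-ray data, implemented in R with several traditional machine-learning models.
Before that, I had already used R for data preprocessing and feature selection on multi-dimensional datasets; plotting is convenient in R and the code is simple.
Python handles the covariate shift correction, and the resulting dataset is then fed to the models for classification.
By calling R and Python, the platform brings together dataset preprocessing and covariate shift correction, and the analysis results can be inspected visually through several plots. The platform also supports dataset upload, download, and so on...
This post is mainly a record of calling R and Python from Java. In practice the platform overlooks many details, so treat it as a set of study notes.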