接上篇 通過一個範例形象地理解C# async await 非並行非同步、並行非同步、並行非同步的並行量控制
前些天寫了兩篇關於C# async await非同步的部落格,
第一篇部落格看的人多,點贊評論也多,我想應該都看懂了,比較簡單。
第二篇部落格看的人少,點讚的也少,沒有評論。
我很納悶,第二篇部落格才是重點,如此吊炸天的程式碼,居然沒人評論。
這個程式碼,就是.NET圈的頂級大佬也沒有寫過,為什麼這麼說,這就要說到C# async await的語法糖:
沒有語法糖,程式碼一樣寫,java8沒有語法糖,一樣寫出很厲害的程式碼。但有了C# async await語法糖,普通的水平一般的業務程式設計師,哪怕是菜B,也能寫出高吞吐高效能的程式碼,這就是意義!
所以我說頂級大佬沒寫過,因為他們水平高,腦力好,手段多,自然不需要這麼寫。但普通程式設計師要那樣寫程式碼,麻煩不說,BUG頻出。
標題我用了"探索"這個詞,有沒有更好的實踐,讓小白們都會寫的並行非同步的實踐?
程式碼的實用價值,是查詢es。
最近發現es的效能非常好!先給大家看個控制檯輸出的截圖。服務我是部署在伺服器上的,真實環境,不是自己電腦。
379次es查詢,僅需0.185秒,當然耗時會有波動,零點幾秒都是正常的,超過1秒也有可能。
es最怕的是什麼,是慢查詢,是條件複雜的查詢,是範圍查詢。
我的策略是多次精確查詢,這樣可以利用es極高的吞吐能力。
既然查詢次數多,單執行緒或者說同步肯定是不行的,必須並行。
並行程式碼,python能寫嗎?java能寫嗎?肯定能啊!
但我前同事寫的python多次查詢es寫的就是同步程式碼,為什麼不併行呢?並行肯定可以寫,但是能不寫就不寫,為什麼?因為寫起來複雜,不好寫。你以為自己技術好,腦力好沒問題,但別人呢?
重點是什麼?不僅要寫並行程式碼,還要寫的簡單,不破壞程式碼原有邏輯結構。
大家都會寫的,用async await就行了,很簡單,放個我寫的,程式碼主要是在雙迴圈中多次非同步請求(大致看一下先跳過):
/// <summary>
/// xxx查詢
/// </summary>
public async Task<List<AccompanyInfo>> Query2(string strStartTime, string strEndTime, int kpCountThreshold, int countThreshold, int distanceThreshold, int timeThreshold, List<PeopleCluster> peopleClusterList)
{
List<AccompanyInfo> resultList = new List<AccompanyInfo>();
Stopwatch sw = Stopwatch.StartNew();
//建立字典
Dictionary<string, PeopleCluster> clusterIdPeopleDict = new Dictionary<string, PeopleCluster>();
foreach (PeopleCluster peopleCluster in peopleClusterList)
{
foreach (string clusterId in peopleCluster.ClusterIds)
{
if (!clusterIdPeopleDict.ContainsKey(clusterId))
{
clusterIdPeopleDict.Add(clusterId, peopleCluster);
}
}
}
int queryCount = 0;
Dictionary<string, AccompanyInfo> dict = new Dictionary<string, AccompanyInfo>();
foreach (PeopleCluster people1 in peopleClusterList)
{
List<PeopleFeatureInfo> peopleFeatureList = await ServiceFactory.Get<PeopleFeatureQueryService>().Query(strStartTime, strEndTime, people1);
queryCount++;
foreach (PeopleFeatureInfo peopleFeatureInfo1 in peopleFeatureList)
{
DateTime capturedTime = DateTime.ParseExact(peopleFeatureInfo1.captured_time, "yyyyMMddHHmmss", CultureInfo.InvariantCulture);
string strStartTime2 = capturedTime.AddSeconds(-timeThreshold).ToString("yyyyMMddHHmmss");
string strEndTime2 = capturedTime.AddSeconds(timeThreshold).ToString("yyyyMMddHHmmss");
List<PeopleFeatureInfo> peopleFeatureList2 = await ServiceFactory.Get<PeopleFeatureQueryService>().QueryExcludeSelf(strStartTime2, strEndTime2, people1);
queryCount++;
if (peopleFeatureList2.Count > 0)
{
foreach (PeopleFeatureInfo peopleFeatureInfo2 in peopleFeatureList2)
{
string key = null;
PeopleCluster people2 = null;
string people2ClusterId = null;
if (clusterIdPeopleDict.ContainsKey(peopleFeatureInfo2.cluster_id.ToString()))
{
people2 = clusterIdPeopleDict[peopleFeatureInfo2.cluster_id.ToString()];
key = $"{string.Join(",", people1.ClusterIds)}_{string.Join(",", people2.ClusterIds)}";
}
else
{
people2ClusterId = peopleFeatureInfo2.cluster_id.ToString();
key = $"{string.Join(",", people1.ClusterIds)}_{string.Join(",", people2ClusterId)}";
}
double distance = LngLatUtil.CalcDistance(peopleFeatureInfo1.Longitude, peopleFeatureInfo1.Latitude, peopleFeatureInfo2.Longitude, peopleFeatureInfo2.Latitude);
if (distance > distanceThreshold) continue;
AccompanyInfo accompanyInfo;
if (dict.ContainsKey(key))
{
accompanyInfo = dict[key];
}
else
{
accompanyInfo = new AccompanyInfo();
dict.Add(key, accompanyInfo);
}
accompanyInfo.People1 = people1;
if (people2 != null)
{
accompanyInfo.People2 = people2;
}
else
{
accompanyInfo.ClusterId2 = people2ClusterId;
}
AccompanyItem accompanyItem = new AccompanyItem();
accompanyItem.Info1 = peopleFeatureInfo1;
accompanyItem.Info2 = peopleFeatureInfo2;
accompanyInfo.List.Add(accompanyItem);
accompanyInfo.Count++;
resultList.Add(accompanyInfo);
}
}
}
}
resultList = resultList.FindAll(a => (a.People2 != null && a.Count >= kpCountThreshold) || a.Count >= countThreshold);
//去重
int beforeDistinctCount = resultList.Count;
resultList = resultList.DistinctBy(a =>
{
string str1 = string.Join(",", a.People1.ClusterIds);
string str2 = a.People2 != null ? string.Join(",", a.People2.ClusterIds) : string.Empty;
string str3 = a.ClusterId2 ?? string.Empty;
StringBuilder sb = new StringBuilder();
foreach (AccompanyItem item in a.List)
{
var info2 = item.Info2;
sb.Append($"{info2.camera_id},{info2.captured_time},{info2.cluster_id}");
}
return $"{str1}_{str2}_{str3}_{sb}";
}).ToList();
sw.Stop();
string msg = $"xxx查詢,耗時:{sw.Elapsed.TotalSeconds:0.000} 秒,查詢次數:{queryCount},去重:{beforeDistinctCount}-->{resultList.Count}";
Console.WriteLine(msg);
LogUtil.Info(msg);
return resultList;
}
上述程式碼是沒有問題的,但也有問題。就是在雙迴圈中多次請求,雖然用了async await,但不是並行,耗時會很長,如何優化?請看如下程式碼:
/// <summary>
/// xxx查詢
/// </summary>
public async Task<List<AccompanyInfo>> Query(string strStartTime, string strEndTime, int kpCountThreshold, int countThreshold, int distanceThreshold, int timeThreshold, List<PeopleCluster> peopleClusterList)
{
List<AccompanyInfo> resultList = new List<AccompanyInfo>();
Stopwatch sw = Stopwatch.StartNew();
//建立字典
Dictionary<string, PeopleCluster> clusterIdPeopleDict = new Dictionary<string, PeopleCluster>();
foreach (PeopleCluster peopleCluster in peopleClusterList)
{
foreach (string clusterId in peopleCluster.ClusterIds)
{
if (!clusterIdPeopleDict.ContainsKey(clusterId))
{
clusterIdPeopleDict.Add(clusterId, peopleCluster);
}
}
}
//組織第一層迴圈task
Dictionary<PeopleCluster, Task<List<PeopleFeatureInfo>>> tasks1 = new Dictionary<PeopleCluster, Task<List<PeopleFeatureInfo>>>();
foreach (PeopleCluster people1 in peopleClusterList)
{
var task1 = ServiceFactory.Get<PeopleFeatureQueryService>().Query(strStartTime, strEndTime, people1);
tasks1.Add(people1, task1);
}
//計算第一層迴圈task並快取結果,組織第二層迴圈task
Dictionary<string, Task<List<PeopleFeatureInfo>>> tasks2 = new Dictionary<string, Task<List<PeopleFeatureInfo>>>();
Dictionary<PeopleCluster, List<PeopleFeatureInfo>> cache1 = new Dictionary<PeopleCluster, List<PeopleFeatureInfo>>();
foreach (PeopleCluster people1 in peopleClusterList)
{
List<PeopleFeatureInfo> peopleFeatureList = await tasks1[people1];
cache1.Add(people1, peopleFeatureList);
foreach (PeopleFeatureInfo peopleFeatureInfo1 in peopleFeatureList)
{
DateTime capturedTime = DateTime.ParseExact(peopleFeatureInfo1.captured_time, "yyyyMMddHHmmss", CultureInfo.InvariantCulture);
string strStartTime2 = capturedTime.AddSeconds(-timeThreshold).ToString("yyyyMMddHHmmss");
string strEndTime2 = capturedTime.AddSeconds(timeThreshold).ToString("yyyyMMddHHmmss");
var task2 = ServiceFactory.Get<PeopleFeatureQueryService>().QueryExcludeSelf(strStartTime2, strEndTime2, people1);
string task2Key = $"{strStartTime2}_{strEndTime2}_{string.Join(",", people1.ClusterIds)}";
tasks2.TryAdd(task2Key, task2);
}
}
//讀取第一層迴圈task快取結果,計算第二層迴圈task
Dictionary<string, AccompanyInfo> dict = new Dictionary<string, AccompanyInfo>();
foreach (PeopleCluster people1 in peopleClusterList)
{
List<PeopleFeatureInfo> peopleFeatureList = cache1[people1];
foreach (PeopleFeatureInfo peopleFeatureInfo1 in peopleFeatureList)
{
DateTime capturedTime = DateTime.ParseExact(peopleFeatureInfo1.captured_time, "yyyyMMddHHmmss", CultureInfo.InvariantCulture);
string strStartTime2 = capturedTime.AddSeconds(-timeThreshold).ToString("yyyyMMddHHmmss");
string strEndTime2 = capturedTime.AddSeconds(timeThreshold).ToString("yyyyMMddHHmmss");
string task2Key = $"{strStartTime2}_{strEndTime2}_{string.Join(",", people1.ClusterIds)}";
List<PeopleFeatureInfo> peopleFeatureList2 = await tasks2[task2Key];
if (peopleFeatureList2.Count > 0)
{
foreach (PeopleFeatureInfo peopleFeatureInfo2 in peopleFeatureList2)
{
string key = null;
PeopleCluster people2 = null;
string people2ClusterId = null;
if (clusterIdPeopleDict.ContainsKey(peopleFeatureInfo2.cluster_id.ToString()))
{
people2 = clusterIdPeopleDict[peopleFeatureInfo2.cluster_id.ToString()];
key = $"{string.Join(",", people1.ClusterIds)}_{string.Join(",", people2.ClusterIds)}";
}
else
{
people2ClusterId = peopleFeatureInfo2.cluster_id.ToString();
key = $"{string.Join(",", people1.ClusterIds)}_{string.Join(",", people2ClusterId)}";
}
double distance = LngLatUtil.CalcDistance(peopleFeatureInfo1.Longitude, peopleFeatureInfo1.Latitude, peopleFeatureInfo2.Longitude, peopleFeatureInfo2.Latitude);
if (distance > distanceThreshold) continue;
AccompanyInfo accompanyInfo;
if (dict.ContainsKey(key))
{
accompanyInfo = dict[key];
}
else
{
accompanyInfo = new AccompanyInfo();
dict.Add(key, accompanyInfo);
}
accompanyInfo.People1 = people1;
if (people2 != null)
{
accompanyInfo.People2 = people2;
}
else
{
accompanyInfo.ClusterId2 = people2ClusterId;
}
AccompanyItem accompanyItem = new AccompanyItem();
accompanyItem.Info1 = peopleFeatureInfo1;
accompanyItem.Info2 = peopleFeatureInfo2;
accompanyInfo.List.Add(accompanyItem);
accompanyInfo.Count++;
resultList.Add(accompanyInfo);
}
}
}
}
resultList = resultList.FindAll(a => (a.People2 != null && a.Count >= kpCountThreshold) || a.Count >= countThreshold);
//去重
int beforeDistinctCount = resultList.Count;
resultList = resultList.DistinctBy(a =>
{
string str1 = string.Join(",", a.People1.ClusterIds);
string str2 = a.People2 != null ? string.Join(",", a.People2.ClusterIds) : string.Empty;
string str3 = a.ClusterId2 ?? string.Empty;
StringBuilder sb = new StringBuilder();
foreach (AccompanyItem item in a.List)
{
var info2 = item.Info2;
sb.Append($"{info2.camera_id},{info2.captured_time},{info2.cluster_id}");
}
return $"{str1}_{str2}_{str3}_{sb}";
}).ToList();
//排序
foreach (AccompanyInfo item in resultList)
{
item.List.Sort((a, b) => -string.Compare(a.Info1.captured_time, b.Info1.captured_time));
}
sw.Stop();
string msg = $"xxx查詢,耗時:{sw.Elapsed.TotalSeconds:0.000} 秒,查詢次數:{tasks1.Count + tasks2.Count},去重:{beforeDistinctCount}-->{resultList.Count}";
Console.WriteLine(msg);
LogUtil.Info(msg);
return resultList;
}
List<PeopleFeatureInfo>[] listArray = await Task.WhenAll(tasks2.Values);
在雙迴圈體中,怎麼拿結果?肯定能寫,但又要思考怎麼寫了不是?
而我的寫法,在雙迴圈體中是可以直接拿結果的:
List<PeopleFeatureInfo> list = await tasks2[task2Key];
def get_es_multiprocess(index_list, people_list, core_percent, rev_clusterid_idcard_dict):
'''
多程序讀取es資料,轉為整個資料框,按時間排序
:return: 規模較大的資料框
'''
col_list = ["cluster_id", "camera_id", "captured_time"]
pool = Pool(processes=int(mp.cpu_count() * core_percent))
input_list = [(i, people_list, col_list) for i in index_list]
res = pool.map(get_es, input_list)
if not res:
return None
pool.close()
pool.join()
df_all = pd.DataFrame(columns=col_list+['longitude', 'latitude'])
for df in res:
df_all = pd.concat([df_all, df])
# 這裡強制轉換為字串!
df_all['cluster_id_'] = df_all['cluster_id'].apply(lambda x: rev_clusterid_idcard_dict[str(x)])
del df_all['cluster_id']
df_all.rename(columns={'cluster_id_': 'cluster_id'}, inplace=True)
df_all.sort_values(by='captured_time', inplace=True)
print('=' * 100)
print('整個資料(聚類前):')
print(df_all.info())
cluster_id_list = [(i, df) for i, df in df_all.groupby(['cluster_id'])]
cluster_id_list_split = [j for j in func(cluster_id_list, 1000000)]
# todo 縮小資料集,用於偵錯!
data_all = df_all.iloc[:, :]
return data_all, cluster_id_list_split
res = pool.map(get_es, input_list)
pool.join()
...省略
其中get_es是查詢es的方法,他寫的應該不是非同步方法,不過這個不是重點
2. res是查詢結果,通過並行的方式一把查出來,放到res中,然後把結果再解出來
3. 注意,這只是單迴圈,想想雙層迴圈怎麼寫
4. pool.join()是阻塞當前執行緒的,這個不好
5. 同事註釋中寫的是"多程序",是寫錯了嗎?實際是多執行緒?還是就是多程序?
6. 當然,python是有async await非同步寫法的,應該不比C#差,只是同事沒有用,可能是因為他用的python版本不夠新
7. python程式碼,字串太多,字串是最不好維護的。C#中的字串裡面都是強型別。