Data Mining Query Language (DMQL)

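The Data Mining Query Language (DMQL) was proposed by Han, Fu, Wang, et al. for the DBMiner data mining system. DMQL is based on the Structured Query Language (SQL). It is designed to support ad hoc and interactive data mining, and it provides commands for specifying data mining primitives. DMQL works with both databases and data warehouses, and it can be used to define data mining tasks. In particular, we look at how DMQL is used with data warehouses and data marts.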

Syntax for Task-Relevant Data Specification

Here is the DMQL syntax for specifying task-relevant data:

use database database_name
or 
use data warehouse data_warehouse_name
in relevance to att_or_dim_list
from relation(s)/cube(s) [where condition]
order by order_list
group by grouping_list
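
Because DMQL is based on SQL, the task-relevant data clauses can be read much like the parts of a SELECT statement. The following is a minimal Python sketch using the standard-library sqlite3 module. The table `purchase` and its columns are hypothetical and exist only for illustration, and the mapping from DMQL clauses to SQL is an approximation, not a description of how DBMiner processes the query.

```python
import sqlite3

# Hypothetical in-memory table standing in for the relation in the "from" clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (cust_age INTEGER, item_type TEXT, price REAL)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(25, "tv", 300.0), (25, "tv", 450.0), (40, "phone", 120.0), (40, "cable", 15.0)],
)

# Rough SQL counterpart of:
#   in relevance to cust_age, item_type
#   from purchase where price >= 100
#   order by cust_age
#   group by cust_age, item_type
rows = conn.execute(
    """
    SELECT cust_age, item_type, COUNT(*) AS n
    FROM purchase
    WHERE price >= 100
    GROUP BY cust_age, item_type
    ORDER BY cust_age
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The attribute list plays the role of the SELECT columns, and the where, order by, and group by clauses carry over almost unchanged.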

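Syntax for Specifying the Kind of Knowledge

Here we discuss the syntax for characterization, discrimination, association, classification, and prediction.

Characterization

The syntax for characterization is: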

mine characteristics [as pattern_name]
  analyze  {measure(s) }
The analyze clause specifies aggregate measures, such as count, sum, or count%.
For example, a description of customer purchasing habits can be mined with:
mine characteristics as customerPurchasing
analyze count%
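
To make the count% measure concrete, here is a small Python sketch that computes, for each distinct tuple in the task-relevant data, the percentage of all tuples it accounts for. The sample data and the helper name `count_percent` are assumptions made for illustration; DMQL itself does not define them.

```python
from collections import Counter

# Hypothetical task-relevant tuples: (age_group, item_type) per purchase.
purchases = [
    ("young", "tv"),
    ("young", "tv"),
    ("young", "phone"),
    ("senior", "tv"),
]

def count_percent(tuples):
    """Return {tuple: percentage of all tuples}, a stand-in for count%."""
    counts = Counter(tuples)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}

customer_purchasing = count_percent(purchases)
for key, pct in customer_purchasing.items():
    print(key, f"{pct:.1f}%")
```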

Discrimination

The syntax for discrimination is:

mine comparison [as pattern_name]
for {target_class} where {target_condition}
{versus {contrast_class_i}
where {contrast_condition_i}}
analyze {measure(s)}

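For example, a user may define bigSpenders as customers who purchase items that cost $100 or more on average, and budgetSpenders as customers who purchase items that cost less than $100 on average. Mining discriminant descriptions for the customers in each of these classes can be specified in DMQL as: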

mine comparison as purchaseGroups
for bigSpenders where avg(I.price) ≥ $100
versus budgetSpenders where avg(I.price) < $100
analyze count
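
The comparison above can be sketched in Python as a split of customers into a target class and a contrast class, followed by a count for each. The purchase records and the function name `purchase_groups` are hypothetical; this only illustrates the class conditions and the count measure, not the discriminant rules a mining system would derive.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical purchase records: (customer_id, item_price).
purchases = [("c1", 150.0), ("c1", 250.0), ("c2", 20.0), ("c2", 60.0), ("c3", 100.0)]

def purchase_groups(records, threshold=100.0):
    """Split customers into target and contrast classes by average price paid."""
    prices = defaultdict(list)
    for cust_id, price in records:
        prices[cust_id].append(price)
    groups = {"bigSpenders": [], "budgetSpenders": []}
    for cust_id, values in prices.items():
        label = "bigSpenders" if mean(values) >= threshold else "budgetSpenders"
        groups[label].append(cust_id)
    return groups

groups = purchase_groups(purchases)
# "analyze count": number of customers in each class.
counts = {label: len(members) for label, members in groups.items()}
print(counts)
```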

Association

The syntax for association is:

mine associations [ as {pattern_name} ]
{matching {metapattern} }

Example:

mine associations as buyingHabits
matching P(X:customer,W) ^ Q(X,Y) ⇒ buys(X,Z)

Note: X is the key of the customer relation; P and Q are predicate variables; and W, Y, and Z are object variables.

Classification

The syntax for classification is:

mine classification [as pattern_name]
analyze classifying_attribute_or_dimension

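For example, to mine patterns that classify customer credit rating, where the classes are determined by the attribute credit_rating and the pattern is named classifyCustomerCreditRating:

mine classification as classifyCustomerCreditRating
analyze credit_rating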

Prediction

The syntax for prediction is:

mine prediction [as pattern_name]
analyze prediction_attribute_or_dimension
{set {attribute_or_dimension_i= value_i}}

Syntax for Concept Hierarchy Specification

To specify which concept hierarchy to use:

use hierarchy <hierarchy> for <attribute_or_dimension>

We use different syntax to define different types of hierarchies, such as:

-schema hierarchies
define hierarchy time_hierarchy on date as [date, month, quarter, year]
-set-grouping hierarchies
define hierarchy age_hierarchy for age on customer as
level1: {young, middle_aged, senior} < level0: all
level2: {20, ..., 39} < level1: young
level2: {40, ..., 59} < level1: middle_aged
level2: {60, ..., 89} < level1: senior
-operation-derived hierarchies
define hierarchy age_hierarchy  for age  on customer  as
{age_category(1), ..., age_category(5)} 
:= cluster(default, age, 5) < all(age)
-rule-based hierarchies
define hierarchy profit_margin_hierarchy  on item  as
level_1: low_profit_margin < level_0: all
   if (price - cost) < $50
level_1: medium_profit_margin < level_0: all
   if ((price - cost) > $50) and ((price - cost) ≤ $250)
level_1: high_profit_margin < level_0: all
   if (price - cost) > $250
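
As an illustration of how such hierarchies generalize raw values, the Python sketch below applies the set-grouping hierarchy for age and the rule-based hierarchy for profit margin. The function names are hypothetical, and the high tier is assumed to mean a margin above $250. A margin of exactly $50 matches none of the rules as written, so the sketch leaves it at the top level.

```python
# Set-grouping hierarchy for age: ranges of raw ages roll up to level1 concepts.
AGE_LEVEL1 = {
    "young": range(20, 40),        # {20, ..., 39}
    "middle_aged": range(40, 60),  # {40, ..., 59}
    "senior": range(60, 90),       # {60, ..., 89}
}

def generalize_age(age):
    """Map a raw age to its level1 concept, or to 'all' if no range matches."""
    for concept, ages in AGE_LEVEL1.items():
        if age in ages:
            return concept
    return "all"

def profit_margin_level(price, cost):
    """Rule-based hierarchy: classify an item by (price - cost)."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:  # assumed condition for the high tier
        return "high_profit_margin"
    return "all"  # a margin of exactly 50 is not covered by the rules as written

print(generalize_age(25), profit_margin_level(300, 100))
```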

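Syntax for Interestingness Measure Specification

The user can specify interestingness measures and their thresholds with the statement: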

with <interest_measure_name>  threshold = threshold_value

Example:

with support threshold = 0.05
with confidence threshold = 0.7
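
The sketch below shows how thresholds like these could filter association rules. It assumes the usual definitions (support as the fraction of transactions containing all items of the rule, confidence as the rule's support divided by the antecedent's support). The transactions and function names are made up for illustration.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """Support of the whole rule divided by support of the antecedent."""
    return support(antecedent | consequent, txns) / support(antecedent, txns)

def is_interesting(antecedent, consequent, txns, min_support=0.05, min_confidence=0.7):
    """Keep a rule only if it meets both thresholds."""
    rule_support = support(antecedent | consequent, txns)
    if rule_support < min_support:
        return False  # also avoids dividing by a zero antecedent support
    return confidence(antecedent, consequent, txns) >= min_confidence

print(is_interesting({"software"}, {"computer"}, transactions))
print(is_interesting({"computer"}, {"software"}, transactions))
```

With this data, the rule software ⇒ computer passes both thresholds, while computer ⇒ software has enough support but a confidence of about 0.67, below the 0.7 threshold.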

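Syntax for Pattern Presentation and Visualization Specification

The following syntax lets users specify that discovered patterns be displayed in one or more forms.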

display as <result_form>

Example:

display as table

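Full Specification of DMQL

As the market manager of a company, you would like to characterize the buying habits of customers who purchase items priced at no less than $100, with respect to the customer's age, the type of item purchased, and the place where the item was made. You would like to know the percentage of customers having that characteristic. In particular, you are only interested in purchases made in Canada and paid for with an American Express ("AmEx") credit card. You would like to view the resulting descriptions in the form of a table.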

use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age,I.type,I.place_made
from customer C, item I, purchase P, items_sold S,  branch B
where I.item_ID = S.item_ID and P.cust_ID = C.cust_ID and
P.method_paid = "AmEx" and B.address = "Canada" and I.price ≥ 100
with noise threshold = 5%
display as table

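Standardization of Data Mining Languages

Standardizing data mining languages will serve the following purposes:

Help the systematic development of data mining solutions.
Improve interoperability among multiple data mining systems and functions.
Promote education.
Promote the use of data mining systems in industry and society.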