Open Science Research Excellence

Haitao Yang

Publications

2

2
7432
TSM: A Design Pattern to Make Ad-hoc BPMs Easy and Inexpensive in Workflow-aware MISs
Abstract:

Despite many years of development, mainstream workflow solutions from the IT industry have not made ad-hoc workflow support easy or inexpensive in MISs (Management Information Systems). Moreover, most academic approaches tend to make the resulting BPM (Business Process Management) more complex and clumsy, since they typically require workflow modeling. To cope with various ad-hoc or casual workflow requirements while keeping things simple and inexpensive, the author first puts forth the TSM design pattern, which provides flexible workflow control while minimizing the need for predefinitions and workflow modeling. The pattern introduces a generic approach for building BPM in workflow-aware MISs with low development and running costs.

Keywords:
Ad-hoc workflow, BPM, Design pattern, TSM
1
10007684
Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform
Abstract:

Given a fixed fund, building a Hadoop big data platform forces a practical trade-off: purchasing fewer hosts of higher capability, or more hosts of lower capability. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing consists of SQL queries with aggregation, joins, and space-time selection conditions, executed on massive data covering more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of candidate host clusters on the intended typical big data computing, and it was fitted via a regression approach. With this empirical formula, an optimal cluster configuration can easily be suggested. The investigation was based on a typical Hadoop computing ecosystem, HDFS+Hive+Spark. A metric was proposed to measure the performance of Hadoop clusters in HBDP; on three kinds of typical SQL query tasks, the measured performance was tested and compared with its predicted counterpart. Tests were conducted with respect to the factors of CPU benchmark, memory size, virtual host division, and the number of physical hosts in the cluster. The research has been applied to practical cluster procurement for housing big data computing.

Keywords:
Hadoop platform planning, optimal cluster scheme at fixed fund, performance empirical formula, typical SQL query tasks
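The abstract above does not give the closed form of the empirical formula, so the following is only a minimal sketch of the general "fit by regression, then choose within a fixed fund" workflow under stated assumptions. It assumes a linear model, performance ≈ w0 + w1·cpu_score + w2·mem_gb + w3·vms_per_host + w4·n_hosts, fitted by ordinary least squares via the normal equations, followed by a brute-force search over candidate configurations whose total cost fits the budget. The feature names, the `Candidate` type, prices, coefficients, and all numbers are hypothetical, chosen only to exercise the code; they are not HBDP measurements or the paper's actual formula.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Fit targets ~ w0 + w . features via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + [float(v) for v in f] for f in features]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(p)]
    return solve_linear(xtx, xty)

def predict(w: Sequence[float], f: Sequence[float]) -> float:
    """Evaluate the (hypothetical) linear empirical formula for one configuration."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))

@dataclass
class Candidate:
    """A hypothetical purchasable cluster configuration."""
    name: str
    cpu_score: float      # per-host CPU benchmark score (assumed feature)
    mem_gb: float         # per-host memory size (assumed feature)
    vms_per_host: int     # virtual host division (assumed feature)
    n_hosts: int          # number of physical hosts
    price_per_host: float

    def features(self) -> tuple:
        return (self.cpu_score, self.mem_gb, self.vms_per_host, self.n_hosts)

    def cost(self) -> float:
        return self.n_hosts * self.price_per_host

def best_within_budget(w, candidates: List[Candidate], budget: float) -> Optional[Candidate]:
    """Pick the affordable candidate with the highest predicted performance."""
    affordable = [c for c in candidates if c.cost() <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda c: predict(w, c.features()))

# --- Illustrative run on synthetic data (NOT measurements from HBDP) ---
TRUE_W = [2.0, 0.5, 0.1, -1.0, 3.0]  # hypothetical coefficients used only to generate data
observed = [(10, 64, 1, 3), (12, 128, 2, 4), (8, 32, 1, 5),
            (15, 256, 4, 3), (10, 128, 2, 6), (20, 64, 1, 4)]
scores = [predict(TRUE_W, f) for f in observed]
weights = fit_ols(observed, scores)

candidates = [
    Candidate("A", 20, 64, 1, 4, 10_000),
    Candidate("B", 10, 128, 2, 8, 5_000),
    Candidate("C", 15, 256, 4, 6, 9_000),
]
choice = best_within_budget(weights, candidates, budget=50_000)
```

Because the synthetic scores are generated without noise, the fit recovers the generating coefficients; with real benchmark data one would instead check residuals against held-out measurements, as the abstract describes comparing measured and predicted performance. A linear form is used here only for simplicity; the actual empirical formula may include interaction or nonlinear terms.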