This research paper presents a framework on how to build up malware dataset.Many researchers took longer time to clean the dataset from any noise or to transform the dataset into a format that can be used straight away for testing. Therefore, this research is proposing a framework to help researchers to speed up the malware dataset cleaningprocesses which later can be used for testing. It is believed, an efficient malware dataset cleaning processes, can improved the quality of the data, thus help to improve the accuracy and the efficiency of the subsequent analysis. Apart from that, an in-depth understanding of the malware taxonomy is also important prior and during the dataset cleaning processes. A new Trojan classification has been proposed to complement this framework.This experiment has been conducted in a controlled lab environment and using the dataset from VxHeavens dataset. This framework is built based on the integration of static and dynamic analyses, incident response method and knowledge database discovery (KDD) processes.This framework can be used as the basis guideline for malware researchers in building malware dataset.