42
10011847
Image Ranking to Assist Object Labeling for Training Detection Models
Abstract: Training a machine learning model for object detection
that generalizes well is known to benefit from a training dataset
with diverse examples. However, training datasets usually contain
many repeats of common examples of a class and lack rarely seen
examples. This is due to the process commonly used during human
annotation where a person would proceed sequentially through a
list of images labeling a sufficiently high total number of examples.
Instead, the method presented involves an active process where, after
the initial labeling of several images is completed, the next subset
of images for labeling is selected by an algorithm. This process of
algorithmic image selection and manual labeling continues in an
iterative fashion. The algorithm used for the image selection is a
deep learning algorithm, based on the U-shaped architecture, which
quantifies the presence of unseen data in each image in order to find
images that contain the most novel examples. Moreover, the location
of the unseen data in each image is highlighted, aiding the labeler in
spotting these examples. Experiments performed using semiconductor
wafer data show that labeling a subset of the data, curated by this
algorithm, resulted in a model with better performance than a
model produced by sequentially labeling the same amount of data.
In addition, the resulting model performs similarly to a model trained
on exhaustive labeling of the whole dataset. Overall, the proposed
approach results in a dataset that has a diverse set of examples per
class as well as more balanced classes, which proves beneficial when
training a deep learning model.
Digital Article Identifier (DOI):
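
As an illustrative aside, the iterative select-and-label loop described in the abstract above can be sketched as follows; the novelty_score placeholder stands in for the U-shaped network's per-image novelty estimate, and the file names, batch size and number of rounds are assumptions made only for this example.

```python
import numpy as np

# Schematic of the iterative selection loop: score unlabeled images by novelty,
# hand the most novel batch to annotators, repeat. `novelty_score` is a
# placeholder for the U-shaped network described in the abstract above.
def novelty_score(image_path):
    return float(np.random.rand())           # stand-in for the model's novelty output

def select_next_batch(unlabeled, k=10):
    # Rank unlabeled images by novelty and return the k most novel ones.
    return sorted(unlabeled, key=novelty_score, reverse=True)[:k]

labeled = []
unlabeled = [f"img_{i:04d}.png" for i in range(500)]   # hypothetical image pool
for labeling_round in range(5):               # iterative select-and-label rounds
    batch = select_next_batch(unlabeled)
    # ... annotators label `batch` here and the novelty model is retrained ...
    labeled.extend(batch)
    unlabeled = [im for im in unlabeled if im not in batch]
```

In practice the novelty model would be retrained on the growing labeled set after each round, which is what drives the dataset toward the diverse, balanced classes described above.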
41
10011862
Bayesian Deep Learning Algorithms for Classifying COVID-19 Images
Abstract: This study investigates the accuracy and loss of deep learning algorithms on a set of coronavirus (COVID-19) images by comparing a Bayesian convolutional neural network with a traditional convolutional neural network on a low-dimensional dataset. Of the 50 X-ray image sets, 25 were COVID-19 and the remaining 20 were normal; twenty images were used for training and five for validation to ascertain the accuracy of the model. The study found that the Bayesian convolutional neural network outperformed the conventional neural network on the low-dimensional dataset, where the latter could have exhibited underfitting. The study therefore recommends the Bayesian Convolutional Neural Network (BCNN) for Android computer vision applications in image detection.
Digital Article Identifier (DOI):
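
For illustration only, one common way to approximate a Bayesian CNN is Monte Carlo dropout, i.e. keeping dropout active at prediction time and averaging several stochastic forward passes; the abstract does not state that this is the formulation used, and the architecture, input size and class count below are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Minimal MC-dropout sketch: dropout layers are called with training=True so
# they stay stochastic at inference, and repeated forward passes are averaged.
def build_mc_dropout_cnn(input_shape=(128, 128, 1), n_classes=2, p_drop=0.3):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(p_drop)(x, training=True)   # dropout stays on at inference
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(p_drop)(x, training=True)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_mc_dropout_cnn()
x = np.random.rand(1, 128, 128, 1).astype("float32")   # placeholder X-ray input
# The mean over repeated stochastic passes approximates the predictive
# distribution; its spread can be read as an uncertainty estimate.
probs = np.mean([model.predict(x, verbose=0) for _ in range(20)], axis=0)
print(probs)
```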
40
10011865
A Context-Centric Chatbot for Cryptocurrency Using the Bidirectional Encoder Representations from Transformers Neural Networks
Abstract: Inspired by the recent movement of digital currency,
we are building a question answering system concerning the subject
of cryptocurrency using Bidirectional Encoder Representations from
Transformers (BERT). The motivation behind this work is to
properly assist digital currency investors by directing them to
the corresponding knowledge bases that can offer them help and
increase the querying speed. BERT, one of the newest language models
in natural language processing, was investigated to improve the
quality of generated responses. We studied different combinations of
hyperparameters of the BERT model to obtain the best fit responses.
Further, we created an intelligent chatbot for cryptocurrency using
BERT. A chatbot using BERT shows great potential for the further
advancement of cryptocurrency market tools. We show that the
BERT neural network generalizes well to other tasks by applying
it successfully to the cryptocurrency domain.
Digital Article Identifier (DOI):
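
As a hedged illustration of the kind of BERT-based question answering described above, the snippet below uses the Hugging Face transformers pipeline with an off-the-shelf SQuAD-tuned model; the model name, question and context are assumptions, whereas the paper fine-tunes BERT hyperparameters specifically for the cryptocurrency domain.

```python
from transformers import pipeline

# Extractive question answering with a general-purpose BERT-family model.
# The model name and the toy context below are illustrative assumptions.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = (
    "Bitcoin is a decentralized digital currency that can be transferred "
    "on the peer-to-peer bitcoin network without intermediaries."
)
result = qa(question="What is Bitcoin?", context=context)
print(result["answer"], result["score"])
```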
39
10011884
Malaria Parasite Detection Using Deep Learning Methods
Abstract: Malaria is a serious disease which affects hundreds of
millions of people around the world, each year. If not treated in time,
it can be fatal. Despite recent developments in malaria diagnostics,
the microscopy method to detect malaria remains the most common.
Unfortunately, the accuracy of microscopic diagnostics is dependent
on the skill of the microscopist and limits the throughput of malaria
diagnosis. With the development of Artificial Intelligence tools and
Deep Learning techniques in particular, it is possible to lower the cost,
while achieving an overall higher accuracy. In this paper, we present a
VGG-based model and compare it with previously developed models
for identifying infected cells. Our model surpasses most previously
developed models across a range of accuracy metrics. The model has
the advantage of being constructed from a relatively small number of
layers, which reduces the required computing resources and computational time.
Moreover, we test our model on two types of datasets and argue
that the currently developed deep-learning-based methods cannot
efficiently distinguish between infected and contaminated cells. A
more precise study of suspicious regions is required.
Digital Article Identifier (DOI):
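
A minimal sketch of a small VGG-style classifier for infected versus uninfected cells is shown below; the block depths, filter counts and input resolution are assumptions for illustration and do not reproduce the paper's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Small VGG-style stack: repeated 3x3 convolutions followed by max pooling,
# ending in a sigmoid for binary infected/uninfected classification.
def vgg_block(x, filters, convs=2):
    for _ in range(convs):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D()(x)

inputs = tf.keras.Input(shape=(64, 64, 3))     # assumed cell-patch resolution
x = vgg_block(inputs, 32)
x = vgg_block(x, 64)
x = vgg_block(x, 128)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```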
38
10011740
Improved Rare Species Identification Using Focal Loss Based Deep Learning Models
Abstract: The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies that surpass manual human classification. The high class imbalance of camera trap datasets, however, results in poor accuracy for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it increased the F1-score by 0.06 and improved the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as improving the overall accuracy by 2.96%.
Digital Article Identifier (DOI):
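
The focal loss compared against cross entropy above can be sketched in a few lines; the gamma value and the toy predictions below are illustrative choices, not the paper's settings.

```python
import numpy as np

# Categorical focal loss, which down-weights easy (well-classified) examples:
# FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the true-class probability.
def categorical_focal_loss(y_true_onehot, y_pred_probs, gamma=2.0, eps=1e-7):
    p = np.clip(y_pred_probs, eps, 1.0)
    p_t = np.sum(y_true_onehot * p, axis=-1)       # probability of the true class
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))

def cross_entropy(y_true_onehot, y_pred_probs, eps=1e-7):
    p_t = np.sum(y_true_onehot * np.clip(y_pred_probs, eps, 1.0), axis=-1)
    return np.mean(-np.log(p_t))

# Two toy samples: one confidently correct (easy), one uncertain (hard).
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]])
print(cross_entropy(y_true, y_pred), categorical_focal_loss(y_true, y_pred))
```

The printout shows the focal term shrinking the contribution of the confidently correct example far more than that of the uncertain one, which is what shifts training emphasis toward poorly classified (minority) species.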
37
10011779
Deep Learning Based 6D Pose Estimation for Bin-Picking Using 3D Point Clouds
Abstract: Estimating the 6D pose of objects is a core step in robot bin-picking tasks. The challenge is that, in real applications, various objects are usually randomly stacked with heavy occlusion. In this work, we propose a method to regress 6D poses by predicting three points for each object in the 3D point cloud through deep learning. To resolve the ambiguity of symmetric poses, we propose a labeling method that helps the network converge better. Based on the predicted pose, an iterative method is employed for pose optimization. In real-world experiments, our method outperforms the classical approach in both precision and recall.
Digital Article Identifier (DOI):
36
10011791
Facial Emotion Recognition with Convolutional Neural Network Based Architecture
Abstract: Neural networks are appealing for many applications since they are able to learn complex non-linear relationships between input and output data. As the number of neurons and layers in a neural network increases, it becomes possible to represent more complex relationships with automatically extracted features. Nowadays, Deep Neural Networks (DNNs) are widely used in computer vision problems such as classification, object detection, segmentation, and image editing. In this work, the facial emotion recognition task is performed by a proposed Convolutional Neural Network (CNN)-based DNN architecture using the FER2013 dataset. Moreover, the effects of different hyperparameters (activation function, kernel size, initializer, batch size and network size) are investigated, and ablation study results for the pooling layer, dropout and batch normalization are presented.
Digital Article Identifier (DOI):
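
A minimal CNN baseline for FER2013 (48x48 grayscale faces, 7 emotion classes) might look like the sketch below; the kernel sizes, filter counts and regularization choices are assumptions, and the paper itself studies these hyperparameters through ablations.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Small CNN for 48x48 grayscale face crops with 7 emotion classes (FER2013).
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", padding="same",
                  input_shape=(48, 48, 1)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),   # 7 FER2013 emotion classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```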
35
10011629
A Survey of Response Generation of Dialogue Systems
Abstract: An essential task in the field of artificial intelligence is
to allow computers to interact with people through natural language.
Therefore, researches such as virtual assistants and dialogue systems
have received widespread attention from industry and academia. The
response generation plays a crucial role in dialogue systems, so to
push forward the research on this topic, this paper surveys various
methods for response generation. We sort out these methods into
three categories. First one includes finite state machine methods,
framework methods, and instance methods. The second contains
full-text indexing methods, ontology methods, vast knowledge base
method, and some other methods. The third covers retrieval methods
and generative methods. We also discuss some hybrid methods based
knowledge and deep learning. We compare their disadvantages and
advantages and point out in which ways these studies can be improved
further. Our discussion covers some studies published in leading
conferences such as IJCAI and AAAI in recent years.
Digital Article Identifier (DOI):
34
10011630
A Survey of Sentiment Analysis Based on Deep Learning
Abstract: Sentiment analysis is a very active research topic.
Every day, Facebook, Twitter, Weibo, and other social media,
as well as major e-commerce websites, generate a massive
amount of comments, which can be used to analyse people's
opinions or emotions. The existing methods for sentiment analysis
are based mainly on sentiment dictionaries, machine learning, and
deep learning. The first two kinds of methods rely heavily on
sentiment dictionaries or large amounts of labelled data. The third
one overcomes these two problems. So, in this paper, we focus
on the third one. Specifically, we survey various sentiment analysis
methods based on convolutional neural network, recurrent neural
network, long short-term memory, deep neural network, deep belief
network, and memory network. We compare their features, advantages,
and disadvantages. Also, we point out the main problems of
these methods, which may be worthy of careful studies in the
future. Finally, we also examine the application of deep learning in
multimodal sentiment analysis and aspect-level sentiment analysis.
Digital Article Identifier (DOI):
33
10011653
On Dialogue Systems Based on Deep Learning
Abstract: Nowadays, dialogue systems are increasingly becoming the
way for humans to access many computer systems, allowing humans
to interact with computers in natural language. A dialogue
system consists of three parts: understanding what humans say in
natural language, managing dialogue, and generating responses in
natural language. In this paper, we survey deep learning based
methods for dialogue management, response generation and dialogue
evaluation. Specifically, these methods are based on neural networks,
long short-term memory networks, deep reinforcement learning,
pre-training and generative adversarial networks. We compare these
methods and point out further research directions.
Digital Article Identifier (DOI):
32
10011686
A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators
Abstract: With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in various practical applications. Due to their high accuracy and good performance, Convolutional Neural Networks (CNNs) in particular have become a research hotspot in the past few years. However, the networks become increasingly large in scale due to the demands of practical applications, which poses a significant challenge to constructing high-performance implementations of deep learning neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and power consumption of hardware devices. Therefore, it is particularly critical to choose a suitable computing platform for hardware acceleration of CNNs. This article surveys recent advances in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of FPGA-based accelerators under different devices and network models are reviewed, and counterpart implementations on Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we discuss different perspectives on these acceleration and optimization methods on FPGA platforms to further explore the opportunities and challenges for future research, and we give an outlook on the future development of FPGA-based accelerators.
Digital Article Identifier (DOI):
31
10011546
SNR Classification Using Multiple CNNs
Abstract: Noise estimation is essential in today's wireless systems
for power control, adaptive modulation, interference suppression and
quality of service. Deep learning (DL) has already been applied in the
physical layer for modulation and signal classification. An unacceptably
low accuracy of less than 50% is found to undermine the traditional
application of DL classification to SNR prediction. In this paper,
we use a divide-and-conquer algorithm and a classifier fusion method
to simplify SNR classification and thereby enhance DL learning
and prediction. Specifically, multiple CNNs are used for classification
rather than a single CNN. Each CNN performs a binary classification
of a single SNR with two labels: less than, greater than or equal.
Together, multiple CNNs are combined to effectively classify over a
range of SNR values (−20 dB ≤ SNR ≤ 32 dB). We use pre-trained
CNNs to predict SNR over a wide range of joint channel parameters
including multiple Doppler shifts (0, 60, 120 Hz), power-delay
profiles, and signal-modulation types (QPSK, 16-QAM, 64-QAM). The
approach achieves individual SNR prediction accuracy of 92%,
composite accuracy of 70% and prediction convergence one order
of magnitude faster than that of traditional estimation.
Digital Article Identifier (DOI):
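
The fusion of per-SNR binary classifiers described above can be illustrated with a simple decision rule; the probabilities below are stand-ins for the outputs of the individual CNNs, and the 2 dB threshold spacing and 0.5 cutoff are assumptions made for the example.

```python
import numpy as np

# Each binary CNN k answers whether SNR >= thresholds[k]; the fused estimate
# is the highest threshold that still receives a "greater than or equal" vote.
thresholds = np.arange(-20, 34, 2)                   # candidate SNR levels in dB

def fuse_binary_decisions(ge_probs, thresholds, cutoff=0.5):
    """ge_probs[k] approximates P(SNR >= thresholds[k]) from the k-th CNN."""
    decisions = np.asarray(ge_probs) >= cutoff
    if not decisions.any():
        return float(thresholds[0]) - 2.0            # below the lowest threshold
    return float(thresholds[np.where(decisions)[0].max()])

# Example: a signal whose true SNR is around 10 dB.
ge_probs = [0.95] * 16 + [0.30] * (len(thresholds) - 16)
print(fuse_binary_decisions(ge_probs, thresholds))   # -> 10.0
```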
30
10011557
A Deep-Learning Based Prediction of Pancreatic Adenocarcinoma with Electronic Health Records from the State of Maine
Authors: Xiaodong Li,
Peng Gao,
Chao-Jung Huang,
Shiying Hao,
Xuefeng B. Ling,
Yongxia Han,
Yaqi Zhang,
Le Zheng,
Chengyin Ye,
Modi Liu,
Minjie Xia,
Changlin Fu,
Bo Jin,
Karl G. Sylvester,
Eric Widen
Abstract: Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before the onset of PA in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead time of up to 3 months and a true alert rate of 67.62%. The risk assessment tool has attained an improved discriminative ability. It can be immediately deployed in the health system to provide automatic early warnings to adults at risk of PA. It also has the potential to identify personalized risk factors to facilitate customized PA interventions.
Digital Article Identifier (DOI):
29
10011571
Churn Prediction for Telecommunication Industry Using Artificial Neural Networks
Abstract: Telecommunication service providers demand accurate
and precise prediction of customer churn probabilities to increase the
effectiveness of their customer relation services. The large amount of
customer data owned by the service providers is suitable for analysis
by machine learning methods. In this study, expenditure data of
customers are analyzed by using an artificial neural network (ANN).
The ANN model is applied to the data of customers with different
billing durations. The proposed model successfully predicts the churn
probabilities with 83% accuracy using only three months of expenditure data,
and the prediction accuracy increases up to 89% when nine months of
data are used. The experiments also show that the accuracy of the ANN
model increases on an extended feature set with information about the
changes in the bill amounts.
Digital Article Identifier (DOI):
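
A toy sketch of an ANN churn classifier over monthly expenditure features is given below; the synthetic data, feature layout and network size are assumptions and are unrelated to the operator data used in the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: one column per month of expenditure, binary churn label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))                       # e.g. nine months of data
y = (X[:, -3:].mean(axis=1) < -0.2).astype(int)      # synthetic churn labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

# Small feed-forward ANN; hidden-layer sizes are an arbitrary illustrative choice.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
```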
28
10011599
Personal Information Classification Based on Deep Learning in Automatic Form Filling System
Abstract: Recently, the rapid development of deep learning has made
artificial intelligence (AI) penetrate into many fields, replacing
manual work there. In particular, AI systems have also become a research
focus in the field of office automation. To meet real needs in office
automation, in this paper we develop an automatic form filling system.
Specifically, it uses two classical neural network models and several
word embedding models to classify various relevant information
elicited from the Internet. When training the neural network models,
we use less noisy and balanced data for training. We conduct a series
of experiments to test our system, and the results show that our
system can achieve better classification results.
Digital Article Identifier (DOI):
27
10011440
Deep Learning Based, End-to-End Metaphor Detection in Greek with Recurrent and Convolutional Neural Networks
Abstract: This paper presents and benchmarks a number of
end-to-end Deep Learning based models for metaphor detection in
Greek. We combine Convolutional Neural Networks and Recurrent
Neural Networks with representation learning to bear on the metaphor
detection problem for the Greek language. The models presented
achieve exceptional accuracy scores, significantly improving the
previous state-of-the-art results, which had already achieved an accuracy
of 0.82. Furthermore, no special preprocessing, feature engineering or
linguistic knowledge is used in this work. The methods presented
achieve an accuracy of 0.92 and an F-score of 0.92 with Convolutional
Neural Networks (CNNs) and bidirectional Long Short Term Memory
networks (LSTMs). Comparable results of 0.91 accuracy and 0.91
F-score are also achieved with bidirectional Gated Recurrent Units
(GRUs) and Convolutional Recurrent Neural Nets (CRNNs). The
models are trained and evaluated only on the basis of training tuples,
the related sentences and their labels. The outcome is a state-of-the-art
collection of metaphor detection models, trained on limited labelled
resources, which can be extended to other languages and similar
tasks.
Digital Article Identifier (DOI):
26
10011384
Deep Learning Application for Object Image Recognition and Robot Automatic Grasping
Abstract: Since vision systems are intensely required for autonomous purposes in industrial environments, image recognition has become an important research topic. Here, a deep learning algorithm is employed in a vision system to recognize industrial objects and is integrated with a 7A6 Series Manipulator for an automatic object-gripping task. A PC and a Graphics Processing Unit (GPU) are chosen to construct the 3D vision recognition system. A depth camera (Intel RealSense SR300) is employed to extract images for object recognition and coordinate derivation. The YOLOv2 scheme is adopted in the Convolutional Neural Network (CNN) structure for object classification and center point prediction. Additionally, an image processing strategy is used to find the object contour for calculating the object orientation angle. Then, the specified object location and orientation information are sent to the robotic controller. Finally, a six-axis manipulator can grasp the specific object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 has been successfully employed to detect the object location and category with confidence near 0.9 and a 3D position error of less than 0.4 mm. This is useful for future intelligent robotic applications in Industry 4.0 environments.
Digital Article Identifier (DOI):
25
10011399
NANCY: Combining Adversarial Networks with Cycle-Consistency for Robust Multi-Modal Image Registration
Abstract: Multimodal image registration is a profoundly complex
task which is why deep learning has been used widely to address it in
recent years. However, two main challenges remain: Firstly, the lack
of ground truth data calls for an unsupervised learning approach,
which leads to the second challenge of defining a feasible loss
function that can compare two images of different modalities to judge
their level of alignment. To avoid this issue altogether, we implement a
generative adversarial network consisting of two registration networks,
G_AB and G_BA, and two discrimination networks, D_A and D_B, connected by
spatial transformation layers. G_AB learns to generate a deformation
field which registers an image of modality B to an image of
modality A. To do that, it uses the feedback of the discriminator D_B,
which is learning to judge the quality of alignment of the registered
image B. G_BA and D_A learn a mapping from modality A to modality
B. Additionally, a cycle-consistency loss is implemented. For this,
both registration networks are employed twice, resulting in
images Â, B̂ which were registered to B̃, Ã, which in turn were registered
to the initial image pair A, B. Thus the resulting and initial images
of the same modality can be easily compared. A dataset of liver
CT and MRI was used to evaluate the quality of our approach and
to compare it against learning and non-learning based registration
algorithms. Our approach leads to Dice scores of up to 0.80 ± 0.01
and is therefore comparable to and slightly more successful than
algorithms like SimpleElastix and VoxelMorph.
Digital Article Identifier (DOI):
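
For illustration, the generic cycle-consistency idea referenced above can be written as an L1 penalty on round-trip reconstructions; the paper adapts this to registration networks connected by spatial transformation layers, whereas the stand-in mappings and image sizes below are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

# Generic cycle-consistency: mapping an image to the other domain and back
# should approximately recover the original, so the L1 distance between the
# reconstruction and the input is penalized for both directions.
def cycle_consistency_loss(img_a, img_b, map_a_to_b, map_b_to_a):
    rec_a = map_b_to_a(map_a_to_b(img_a))   # A -> other domain -> back to A
    rec_b = map_a_to_b(map_b_to_a(img_b))   # B -> other domain -> back to B
    return F.l1_loss(rec_a, img_a) + F.l1_loss(rec_b, img_b)

map_a_to_b = torch.nn.Conv2d(1, 1, 3, padding=1)   # placeholder "networks"
map_b_to_a = torch.nn.Conv2d(1, 1, 3, padding=1)
a = torch.rand(1, 1, 64, 64)                       # placeholder modality-A image
b = torch.rand(1, 1, 64, 64)                       # placeholder modality-B image
print(cycle_consistency_loss(a, b, map_a_to_b, map_b_to_a))
```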
24
10011141
Automatic Number Plate Recognition System Based on Deep Learning
Abstract: In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used in safety, security, and commercial applications. Accordingly, several methods and techniques have been developed to achieve better levels of accuracy and real-time execution. This paper proposes a computer vision algorithm for Number Plate Localization (NPL) and Character Segmentation (CS). In addition, it proposes an improved method for Optical Character Recognition (OCR) based on Deep Learning (DL) techniques. In order to identify the number on the detected plate after the NPL and CS steps, a Convolutional Neural Network (CNN) algorithm is proposed. A DL model is developed using four convolutional layers, two max-pooling layers, and six fully connected layers. The model was trained on a number-image database on the NVIDIA Jetson TX2 target. The achieved accuracy is 95.84%.
Digital Article Identifier (DOI):
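
A sketch matching the layer counts described above (four convolutional layers, two max-pooling layers and six fully connected layers) is shown below; the input size, filter widths and the 10-class digit output are assumptions made for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Four conv layers, two max-pooling layers, six fully connected layers,
# classifying single character crops into the 10 digit classes (assumed).
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", padding="same",
                  input_shape=(32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),   # digit classes 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```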
23
10011084
A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
Abstract: Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets representing the genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features relative to the number of samples is high, and the data suffer from being imbalanced. A feature selection method is used to select features with more ability to distinguish classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to find optimized hyper-parameters of the CNN. In order to make the classification process by the CNN faster, a Graphics Processing Unit (GPU) is recommended for performing the mathematical computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Digital Article Identifier (DOI):
22
10011026
Author Profiling: Prediction of Learners’ Gender on a MOOC Platform Based on Learners’ Comments
Abstract: The more an educational system knows about a learner, the more personalised interaction it can provide, which leads to better learning. However, asking a learner directly is potentially disruptive, and often ignored by learners. Especially in the booming realm of Massive Open Online Course (MOOC) platforms, only a very low percentage of users disclose demographic information about themselves. Thus, in this paper, we aim to predict learners' demographic characteristics by proposing an approach using linguistically motivated Deep Learning architectures for learner profiling, particularly targeting gender prediction on a FutureLearn MOOC platform. Additionally, we tackle here the difficult problem of predicting the gender of learners based on their comments only, which are often available across MOOCs. The most common current approaches to text classification use the Long Short-Term Memory (LSTM) model, considering sentences as sequences. However, human language also has structure. In this research, rather than considering sentences as plain sequences, we hypothesise that higher semantic- and syntactic-level sentence processing based on linguistics will render a richer representation. We thus evaluate the traditional LSTM against other bleeding-edge models which take syntactic structure into account, such as the tree-structured LSTM, the Stack-augmented Parser-Interpreter Neural Network (SPINN) and the Structure-Aware Tag Augmented model (SATA). Additionally, we explore using different word-level encoding functions. We have implemented these methods on our MOOC dataset, on which they are the most performant, compared with a public sentiment analysis dataset that is further used to cross-examine the models' results.
Digital Article Identifier (DOI):
21
10011088
The Layout Analysis of Handwriting Characters and the Fusion of Multi-style Ancient Books’ Background
Abstract: Ancient books are significant carriers of culture, and their background textures convey potential historical information. However, multi-style texture recovery of ancient books has received little attention. Restricted by insufficient ancient textures and a complex handling process, the generation of ancient textures confronts new challenges. For instance, training without sufficient data usually brings about overfitting or mode collapse, so some of the outputs are prone to be fake. Recently, image generation and style transfer based on deep learning have been widely applied in computer vision. Breakthroughs within the field make it possible to conduct research on multi-style texture recovery of ancient books. Under these circumstances, we propose a layout analysis and image fusion system. Firstly, we trained models using Deep Convolutional Generative Adversarial Networks (DCGAN) to synthesize multi-style ancient textures; then, we analyzed layouts based on the Position Rearrangement (PR) algorithm that we propose, to adjust the layout structure of the foreground content; at last, we realized our goal by fusing the rearranged foreground texts with the generated background. In the experiments, diversified samples such as ancient Yi, Jurchen, and Seal script were selected as our training sets. Then, the performance of different fine-tuned models was gradually improved by adjusting the parameters as well as the structure of the DCGAN model. In order to evaluate the results scientifically, the cross-entropy loss function and the Fréchet Inception Distance (FID) were selected as our assessment criteria. Eventually, we obtained model M8 with the lowest FID score. Compared with the DCGAN model proposed by Radford et al., the FID score of M8 improved by 19.26%, profoundly enhancing the quality of the synthetic images.
Digital Article Identifier (DOI):
20
10010671
Performance Evaluation of Distributed Deep Learning Frameworks in Cloud Environment
Abstract: 2016 became the year of the Artificial Intelligence explosion. AI technologies are becoming more and more mature, and most well-known tech giants are making large investments to increase their capabilities in AI. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural networks to train a machine to learn features directly from data. Deep learning realizes many machine learning applications that expand the field of AI. At present, deep learning frameworks have been widely deployed on servers for deep learning applications in both academia and industry. In training deep neural networks, there are many standard processes and algorithms, but the performance of different frameworks might differ. In this paper we evaluate the running performance of two state-of-the-art distributed deep learning frameworks that run training calculations in parallel over multiple GPUs and multiple nodes in our cloud environment. We evaluate the training performance of the frameworks with the ResNet-50 convolutional neural network, and we analyze the factors that account for the performance differences between the two distributed frameworks. Through the experimental analysis, we identify the overheads which could be further optimized. The main contribution is that the evaluation results provide further optimization directions in both performance tuning and algorithmic design.
Digital Article Identifier (DOI):
19
10010586
Foot Recognition Using Deep Learning for Knee Rehabilitation
Abstract: Foot recognition can be applied in many medical fields, such as gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system is intended to capture a patient image in a controlled room and background to recognize the foot in limited views. However, such a system can be inconvenient for monitoring knee exercises at home. In order to overcome these problems, this paper proposes to use a deep learning method based on Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with the traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method provides better accuracy, but with higher complexity, in recognizing foot images from online databases than the traditional classification method.
Digital Article Identifier (DOI):
18
10010605
Classification Based on Deep Neural Cellular Automata Model
Abstract: Deep learning is a branch of machine learning science that has achieved great success in research and applications. Cellular neural networks are regarded as arrays of nonlinear analog processors, called cells, connected in a way that allows parallel computation. This paper discusses how to use a deep learning structure to represent a neural cellular automata model. The proposed learning technique in the cellular automata model is examined from the perspective of deep learning structure. A deep neural cellular automata system modifies each neuron based on the behavior of the individual cell and its decisions, as a result of multi-level deep structure learning. The paper presents the architecture of the model and gives the results of simulating the approach. Results from the implementation enrich the deep neural cellular automata system and shed light on the concept formulation of the model and the learning in it.
Digital Article Identifier (DOI):
17
10010623
Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion
Abstract: Tracking sports players is a highly challenging
scenario, especially in single-feed videos recorded in tight courts,
where cluttering and occlusions cannot be avoided. This paper
presents an analysis of several geometric and semantic visual features
to detect and track basketball players. An ablation study is carried
out and then used to remark that a robust tracker can be built with
Deep Learning features, without the need of extracting contextual
ones, such as proximity or color similarity, nor applying camera
stabilization techniques. The presented tracker consists of: (1) a
detection step, which uses a pretrained deep learning model to
estimate the players' pose, followed by (2) a tracking step, which
leverages pose and semantic information from the output of a
convolutional layer in a VGG network. Its performance is analyzed
in terms of MOTA over a basketball dataset with more than 10k
instances.
Digital Article Identifier (DOI):
16
10010350
Deep Learning Based Fall Detection Using Simplified Human Posture
Abstract: Falls are one of the major causes of injury and death
among elderly people aged 65 and above. A support system to
identify such kinds of abnormal activities has become extremely
important with the increase in the ageing population. Pose estimation
is a challenging task, and it becomes even more
challenging when performed on the difficult
poses that may occur during a fall. The location of the body provides a
clue to where the person is at the time of the fall. This paper presents
a vision-based tracking strategy where the available joints are grouped
into three different feature points depending upon the section of the body
in which they are located. The three feature points derived from different
joint combinations represent the upper region or head region, the
mid-region or torso, and the lower region or leg region. Tracking is always
challenging when motion is involved. Hence, the idea is to locate
these regions of the body in every frame and use this as the tracking
strategy. Grouping these joints can be beneficial for achieving a stable
region for tracking. The location of the body parts provides crucial
information for distinguishing normal activities from falls.
Digital Article Identifier (DOI):
15
10010226
Vision-Based Collision Avoidance for Unmanned Aerial Vehicles by Recurrent Neural Networks
Abstract: Owing to advances in sensor technology, video surveillance has become the main means of security control in every big city in the world. Surveillance is usually used by governments for intelligence gathering, the prevention of crime, the protection of a process, person, group or object, or the investigation of crime. Many surveillance systems based on computer vision technology have been developed in recent years. Moving-target tracking is the most common task for an Unmanned Aerial Vehicle (UAV) in finding and tracking objects of interest in mobile aerial surveillance for civilian applications. This paper focuses on vision-based collision avoidance for UAVs using recurrent neural networks. First, images from cameras on the UAV were fused based on a deep convolutional neural network. Then, a recurrent neural network was constructed to obtain high-level image features for object tracking and to extract low-level image features for noise reduction. The system distributes the computation across local and cloud platforms to efficiently perform object detection, tracking and collision avoidance based on multiple UAVs. Experiments on several challenging datasets showed that the proposed algorithm outperforms state-of-the-art methods.
Digital Article Identifier (DOI):
14
10009593
Classification of Computer Generated Images from Photographic Images Using Convolutional Neural Networks
Abstract: This paper presents a deep learning mechanism for classifying computer-generated images and photographic images. The proposed method incorporates a convolutional layer capable of automatically learning correlations between neighbouring pixels. In its standard form, a Convolutional Neural Network (CNN) will learn features based on an image's content rather than the structural features of the image. The proposed layer is particularly designed to subdue an image's content and robustly learn the sensor pattern noise features (usually inherited from image processing in a camera) as well as the statistical properties of images. The method was assessed on recent natural and computer-generated images, and it was concluded that it performs better than the current state-of-the-art methods.
Digital Article Identifier (DOI):
13
10009710
MITOS-RCNN: Mitotic Figure Detection in Breast Cancer Histopathology Images Using Region Based Convolutional Neural Networks
Abstract: Studies estimate that there will be 266,120 new cases
of invasive breast cancer and 40,920 breast cancer induced deaths
in 2018 alone. Despite the pervasiveness of this
affliction, the current process to obtain an accurate breast cancer
prognosis is tedious and time consuming. It usually requires a
trained pathologist to manually examine histopathological images and
identify the features that characterize various cancer severity levels.
We propose MITOS-RCNN: a region based convolutional neural
network (RCNN) geared for small object detection to accurately
grade one of the three factors that characterize tumor aggressiveness
described by the Nottingham Grading System: mitotic count. Other
computational approaches to mitotic figure counting and detection
do not demonstrate ample recall or precision to be clinically viable.
Our models outperformed all previous participants in the ICPR 2012
challenge, the AMIDA 2013 challenge and the MITOS-ATYPIA-14
challenge along with recently published works. Our model achieved
an F-measure score of 0.955, a 6.11% improvement in accuracy over
the most accurate of the previously proposed models.
Digital Article Identifier (DOI):