8

8

436

A Systems Approach to Gene Ranking from DNA Microarray Data of Cervical Cancer

In this paper we present a method for gene ranking
from DNA microarray data. More precisely, we calculate the correlation
networks, which are unweighted and undirected graphs, from
microarray data of cervical cancer whereas each network represents
a tissue of a certain tumor stage and each node in the network
represents a gene. From these networks we extract one tree for
each gene by a local decomposition of the correlation network. The
interpretation of a tree is that it represents the n-nearest neighbor
genes on the n-th level of a tree, measured by the Dijkstra distance,
and, hence, gives the local embedding of a gene within the correlation
network. For the obtained trees we measure the pairwise similarity
between trees rooted by the same gene from normal to cancerous
tissues. This evaluates the modification of the tree topology due to
progression of the tumor. Finally, we rank the obtained similarity
values from all tissue comparisons and select the top ranked genes.
For these genes the local neighborhood in the correlation networks
changes most between normal and cancerous tissues. As a result
we find that the top ranked genes are candidates suspected to be
involved in tumor growth and, hence, indicates that our method
captures essential information from the underlying DNA microarray
data of cervical cancer.

Graph similarity, DNA microarray data, cancer.

7

4875

Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As criterion to select genes we suggest to measure the local changes in the correlation graph of each gene and to select those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

6

6825

Influence of Noise on the Inference of Dynamic Bayesian Networks from Short Time Series

In this paper we investigate the influence of external
noise on the inference of network structures. The purpose of our
simulations is to gain insights in the experimental design of microarray
experiments to infer, e.g., transcription regulatory networks
from microarray experiments. Here external noise means, that the
dynamics of the system under investigation, e.g., temporal changes of
mRNA concentration, is affected by measurement errors. Additionally
to external noise another problem occurs in the context of microarray
experiments. Practically, it is not possible to monitor the mRNA
concentration over an arbitrary long time period as demanded by the
statistical methods used to learn the underlying network structure. For
this reason, we use only short time series to make our simulations
more biologically plausible.

Dynamic Bayesian networks, structure learning, gene networks, Markov chain Monte Carlo, microarray data.

5

10653

Towards Clustering of Web-based Document Structures

Methods for organizing web data into groups in order
to analyze web-based hypertext data and facilitate data availability
are very important in terms of the number of documents available
online. Thereby, the task of clustering web-based document structures
has many applications, e.g., improving information retrieval on the
web, better understanding of user navigation behavior, improving web
users requests servicing, and increasing web information accessibility.
In this paper we investigate a new approach for clustering web-based
hypertexts on the basis of their graph structures. The hypertexts will
be represented as so called generalized trees which are more general
than usual directed rooted trees, e.g., DOM-Trees. As a important
preprocessing step we measure the structural similarity between the
generalized trees on the basis of a similarity measure d. Then,
we apply agglomerative clustering to the obtained similarity matrix
in order to create clusters of hypertext graph patterns representing
navigation structures. In the present paper we will run our approach
on a data set of hypertext structures and obtain good results in
Web Structure Mining. Furthermore we outline the application of
our approach in Web Usage Mining as future work.

Clustering methods, graph-based patterns, graph similarity,
hypertext structures, web structure mining

4

11557

Protein Graph Partitioning by Mutually Maximization of cycle-distributions

The classification of the protein structure is commonly
not performed for the whole protein but for structural domains, i.e.,
compact functional units preserved during evolution. Hence, a first
step to a protein structure classification is the separation of the
protein into its domains. We approach the problem of protein domain
identification by proposing a novel graph theoretical algorithm. We
represent the protein structure as an undirected, unweighted and
unlabeled graph which nodes correspond the secondary structure
elements of the protein. This graph is call the protein graph. The
domains are then identified as partitions of the graph corresponding
to vertices sets obtained by the maximization of an objective function,
which mutually maximizes the cycle distributions found in the
partitions of the graph. Our algorithm does not utilize any other kind
of information besides the cycle-distribution to find the partitions. If
a partition is found, the algorithm is iteratively applied to each of
the resulting subgraphs. As stop criterion, we calculate numerically
a significance level which indicates the stability of the predicted
partition against a random rewiring of the protein graph. Hence,
our algorithm terminates automatically its iterative application. We
present results for one and two domain proteins and compare our
results with the manually assigned domains by the SCOP database
and differences are discussed.

Graph partitioning, unweighted graph, protein domains.

3

15299

Application of a Similarity Measure for Graphs to Web-based Document Structures

Due to the tremendous amount of information provided
by the World Wide Web (WWW) developing methods for mining
the structure of web-based documents is of considerable interest. In
this paper we present a similarity measure for graphs representing
web-based hypertext structures. Our similarity measure is mainly
based on a novel representation of a graph as linear integer strings,
whose components represent structural properties of the graph. The
similarity of two graphs is then defined as the optimal alignment of
the underlying property strings. In this paper we apply the well known
technique of sequence alignments for solving a novel and challenging
problem: Measuring the structural similarity of generalized trees.
In other words: We first transform our graphs considered as high
dimensional objects in linear structures. Then we derive similarity
values from the alignments of the property strings in order to
measure the structural similarity of generalized trees. Hence, we
transform a graph similarity problem to a string similarity problem for
developing a efficient graph similarity measure. We demonstrate that
our similarity measure captures important structural information by
applying it to two different test sets consisting of graphs representing
web-based document structures.

Graph similarity, hierarchical and directed graphs,hypertext, generalized trees, web structure mining.

2

15602

First Studies of the Influence of Single Gene Perturbations on the Inference of Genetic Networks

Inferring the network structure from time series data
is a hard problem, especially if the time series is short and noisy.
DNA microarray is a technology allowing to monitor the mRNA
concentration of thousands of genes simultaneously that produces
data of these characteristics. In this study we try to investigate the
influence of the experimental design on the quality of the result.
More precisely, we investigate the influence of two different types of
random single gene perturbations on the inference of genetic networks
from time series data. To obtain an objective quality measure for
this influence we simulate gene expression values with a biologically
plausible model of a known network structure. Within this framework
we study the influence of single gene knock-outs in opposite to
linearly controlled expression for single genes on the quality of the
infered network structure.

Dynamic Bayesian networks, microarray data, structure learning, Markov chain Monte Carlo.

1

15928

Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.

Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.