The Computational Systems Biology Lab is
interested in developing statistical machine learning
techniques and multimedia/multimodal human/computer interfaces
to advance biological and biomedical research. One of
Dr. Hong's main research activities is the development
of new computational methods for dissecting signal transduction
networks by integrating heterogeneous biological data such as
cellular images, transcriptional profiles, and bioliterature.
He also recognizes that the quality of computational results
can be far from ideal and may require nontrivial manual
examination. However, biologists often
feel overwhelmed by the huge amount, and the great diversity,
of biological information. Therefore, Dr. Hong's laboratory
is developing novel human/computer interfaces to help
biologists effectively and efficiently navigate through
the complicated landscape of biomedical information and
manipulate various computational tools.
In his talk, Dr. Hong demonstrated his lab's
recent progress on a visual data exploration interface
for large cellular image databases and a technique for
bioliterature categorization. First, using image-processing
techniques his lab developed, they extract quantitative
information from high-content screening images that are
rich in cellular phenotypic information. The extracted
information is analyzed by unsupervised pattern-discovering
methods. The results can then be visualized in their visual
data exploration interfaces. This allows biologists to
be directly involved in the data-mining process, combining
the flexibility, creativity, and general knowledge of
human experts with the enormous storage capacity and
computational power of computers. This process is
especially useful when little is known about the data
and the exploration goals are vague. Second, they applied
Bayesian networks to automatically associate PubMed abstracts
with Gene Ontology terms so that the annotated abstracts
can be searched semantically. This research is particularly
useful for interpreting data generated by high-throughput
technologies such as DNA microarrays.
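
As a rough illustration of the literature-categorization idea, the sketch below uses naive Bayes, the simplest form of Bayesian network, to assign a Gene Ontology term to an abstract. It is not Dr. Hong's method: the structure of his lab's networks is not described here, and the toy abstracts, the two-term label set, and the function names (tokenize, train, predict) are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (real abstracts need more care)."""
    return text.lower().split()


def train(labeled_abstracts):
    """Count words and documents per term from (abstract, term) pairs."""
    word_counts = defaultdict(Counter)  # term -> word -> count
    doc_counts = Counter()              # term -> number of abstracts
    vocab = set()
    for text, term in labeled_abstracts:
        doc_counts[term] += 1
        for word in tokenize(text):
            word_counts[term][word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the term with the highest posterior log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_term, best_score = None, -math.inf
    for term in doc_counts:
        # Log prior: fraction of training abstracts carrying this term.
        score = math.log(doc_counts[term] / total_docs)
        total_words = sum(word_counts[term].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[term][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_term, best_score = term, score
    return best_term


# Hypothetical training data: invented one-line "abstracts" with a term each.
TOY_DATA = [
    ("caspase activation triggers programmed cell death", "apoptotic process"),
    ("mitochondrial release of cytochrome c induces cell death", "apoptotic process"),
    ("receptor kinase phosphorylation relays the signal downstream", "signal transduction"),
    ("ligand binding activates a kinase cascade", "signal transduction"),
]

model = train(TOY_DATA)
print(predict("caspase dependent cell death", model))
print(predict("kinase cascade downstream of the receptor", model))
```

Once each abstract carries a predicted term, a search can match on the term itself rather than on exact keywords, which is the sense in which annotated abstracts become semantically searchable. A realistic system would need many terms per abstract, a far larger training set, and a richer model than this single-label toy.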