Data mining algorithms have been applied by astronomers in most areas of astronomy, and long-term research programs and several dedicated mining projects have been undertaken, because astronomy has produced numerous large datasets that are amenable to the approach, as have related fields such as medicine and high-energy physics. Examples of such projects include SKICAT, the Sky Image Cataloging and Analysis System, for catalog production and analysis of catalogs from digitized sky surveys, in particular the scans of the second Palomar Observatory Sky Survey; JARTool, the Jet Propulsion Laboratory Adaptive Recognition Tool, used for the recognition of volcanoes in the over 30,000 images of Venus returned by the Magellan mission; the subsequent and more general Diamond Eye; and the Lawrence Livermore National Laboratory Sapphire project.
Object classification

Classification is a crucial preliminary step in the scientific method, as it provides a way of arranging information that can be used to make hypotheses and to compare with models. The two most useful concepts in object classification are completeness and efficiency, also known as recall and precision. They are generally defined in terms of true and false positives (TP and FP) and true and false negatives (TN and FN). The completeness is the fraction of objects truly of a given type that are classified as that type,

completeness = TP / (TP + FN),

and the efficiency is the fraction of objects classified as a given type that are truly of that type,

efficiency = TP / (TP + FP).

These two quantities are astrophysically interesting because, while one would like both high completeness and high efficiency, there is generally a tradeoff between them. The relative importance of each depends on the application: for instance, a search for rare objects generally requires high completeness while tolerating some contamination (lower efficiency), whereas statistical clustering of cosmological objects requires high efficiency even at the cost of completeness.
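To make these definitions concrete, here is a minimal Python sketch that computes completeness and efficiency for one object type from true and predicted labels; the function name and toy labels are illustrative, not taken from any survey pipeline.

```python
import numpy as np

def completeness_efficiency(true_type, pred_type, target):
    """Completeness (recall) and efficiency (precision) for one object type."""
    true_type = np.asarray(true_type)
    pred_type = np.asarray(pred_type)
    tp = np.sum((true_type == target) & (pred_type == target))  # true positives
    fn = np.sum((true_type == target) & (pred_type != target))  # false negatives
    fp = np.sum((true_type != target) & (pred_type == target))  # false positives
    return tp / (tp + fn), tp / (tp + fp)

# Toy example with 'star' vs 'galaxy' labels.
comp, eff = completeness_efficiency(
    ["star", "star", "galaxy", "star"],
    ["star", "galaxy", "galaxy", "star"],
    target="star",
)
print(comp, eff)  # 2/3 completeness, 1.0 efficiency
```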
Star-Galaxy Separation

Due to their physical size in comparison to their distance from us, almost all stars are unresolved in photometric datasets and therefore appear as point sources. Galaxies, despite being further away, generally subtend a larger angle and appear as extended sources. However, other astrophysical objects, such as quasars and supernovae, also appear as point sources. Thus, the separation of a photometric catalog into stars and galaxies, or, more generally, into stars, galaxies, and other objects, is an important problem. The number of galaxies and stars in typical surveys (of order 10^8 or more) requires that such separation be automated.
This problem is well studied, and automated approaches were employed even before current data mining algorithms became popular, for instance during the digitization of photographic plates by scanning machines such as the APM and DPOSS. Several data mining algorithms have been applied, including ANN, DT, mixture modelling, and SOM, with most achieving efficiencies above 95%. Typically, the classification is performed using a set of measured morphological parameters derived from the survey photometry, perhaps supplemented by colors or other information, such as the seeing. The advantage of the data mining
approach is that all such information about each object is easily incorporated.
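As an illustration of the kind of automated separation described above, the following sketch trains a decision tree (DT, one of the algorithm families cited) on a mock catalog; the feature set and the synthetic star/galaxy distributions are assumptions made for demonstration, not measurements from any real survey.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Mock catalog: each row holds hypothetical per-object measurements, e.g. a
# concentration index, a PSF-minus-model magnitude, a color, and the seeing.
# A real survey pipeline would supply such parameters from the photometry.
n = 5000
stars = rng.normal(loc=[2.2, 0.0, 0.6, 1.1], scale=0.3, size=(n, 4))
galaxies = rng.normal(loc=[3.0, 0.5, 0.9, 1.1], scale=0.3, size=(n, 4))
X = np.vstack([stars, galaxies])
y = np.array([0] * n + [1] * n)  # 0 = star, 1 = galaxy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=8, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```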
Galaxy Morphology

Galaxies come in a range of sizes and shapes, or, collectively, morphologies. The
most well-known system for the morphological classification of galaxies is the Hubble Sequence
of elliptical, spiral, barred
spiral, and irregular, along with various subclasses. This
system correlates with many physical properties known to be important in the formation and evolution of galaxies. Because galaxy morphology is a complex phenomenon that correlates with the underlying physics, but is not unique to any one given process, the Hubble sequence has endured, despite being rather subjective and based on visible-light morphology originally derived from blue-biased photographic plates. The Hubble sequence has been extended in various ways, and for data mining purposes the T system has been widely used. This system maps the categorical Hubble types E, S0, Sa, Sb, Sc, Sd, and Irr onto the numerical values -5 to 10. One can train a supervised algorithm to assign T types to images for which measured parameters are available. Such parameters can be purely morphological, or can include other information such as color. A
series of papers by Lahav and collaborators does exactly this, applying ANNs to predict the T types of galaxies at low redshift and finding accuracy comparable to that of human experts. ANNs have also been applied to higher-redshift data to distinguish between normal and peculiar galaxies, and the fundamentally topological and unsupervised SOM ANN has been used to classify galaxies from Hubble Space Telescope images, where the initial distribution of classes is unknown. Likewise, ANNs have been used to obtain morphological types from galaxy spectra.
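A minimal sketch of the supervised T-type prediction described above, using scikit-learn's MLPRegressor as a stand-in for the ANNs used in the literature; the morphological parameters and the toy parameter-to-T-type relation are invented purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Mock training set: hypothetical morphological parameters (e.g. concentration,
# asymmetry, bulge-to-total ratio) plus a color, with expert-assigned T types
# in the range -5 (E) to 10 (Irr) as training labels.
n = 3000
params = rng.uniform(0.0, 1.0, size=(n, 4))
t_type = -5 + 15 * params[:, 0] + rng.normal(0, 1.0, n)  # toy relation

X_train, X_test, t_train, t_test = train_test_split(
    params, t_type, test_size=0.3, random_state=0)

ann = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
ann.fit(X_train, t_train)
rms = np.sqrt(np.mean((ann.predict(X_test) - t_test) ** 2))
print("RMS T-type error:", rms)  # in practice compared to human classifier scatter
```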
Photometric redshifts

An area of astrophysics that has greatly increased in popularity in the last few years is the estimation of redshifts from photometric data (photo-zs). This is because, although the distances are less accurate than those obtained with spectra, the sheer number of objects with photometric measurements can often make up for the reduction in individual accuracy by suppressing the statistical noise of an ensemble calculation. The two most common approaches to photo-zs are the template method and the empirical training-set method. The template approach has a number of difficult issues, including calibration, zero-points, priors, multi-wavelength performance (e.g., poor in the mid-infrared), and difficulty handling missing or incomplete training data. We focus in this review on the empirical approach, as it is an implementation of supervised learning.

3.2.1. Galaxies

At low redshifts, the calculation
of photometric redshifts for normal galaxies is quite straightforward due to the break in the typical galaxy spectrum at 4000 Å. Thus, as a galaxy is redshifted with increasing distance, its color (measured as a difference in magnitudes) changes relatively smoothly. As a result, both template and empirical photo-z approaches obtain similar results: a root-mean-square deviation of ~0.02 in redshift, which is close to the best possible result given the intrinsic spread in the properties. This has been shown with ANNs, SVM, DT, kNN, empirical polynomial relations, numerous template-based studies, and several other methods.
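The following sketch illustrates the empirical (training-set) photo-z approach with kNN regression on colors, one of the methods listed above; the smooth synthetic color-redshift relation stands in for a real spectroscopic training sample.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)

# Mock sample: spectroscopic redshifts plus broadband colors that vary
# smoothly with redshift, mimicking the 4000 A break moving through filters.
n = 10000
z_spec = rng.uniform(0.0, 0.4, n)
colors = np.column_stack([2.5 * z_spec, 1.5 * z_spec, 0.8 * z_spec])
colors += rng.normal(0, 0.03, colors.shape)  # photometric noise

X_train, X_test, z_train, z_test = train_test_split(
    colors, z_spec, test_size=0.3, random_state=0)

knn = KNeighborsRegressor(n_neighbors=10)  # kNN, one of the methods cited above
knn.fit(X_train, z_train)
z_phot = knn.predict(X_test)
print("sigma_z:", np.std(z_phot - z_test))  # compare with the ~0.02 quoted above
```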
At higher redshifts, achieving accurate results becomes more difficult because the 4000 Å break is shifted redward of the optical, galaxies are fainter and thus spectral data are sparser, and galaxies intrinsically evolve over time. While supervised learning has been successfully used, beyond the spectral regime the obvious limitation arises that, in order to reach the limiting magnitude of the photometric portions of surveys, extrapolation would be required. In this regime, or where only small training sets are available, template-based results can be used; but without spectral information, the templates themselves are being extrapolated, although that extrapolation can be done in a more physically motivated manner. It is likely that a more general hybrid method of using empirical data to iteratively improve the templates, or a semi-supervised approach, will ultimately provide a more elegant solution. Another issue at higher redshift is that the available number of objects can become quite small (in the hundreds or fewer), thus reintroducing the curse of dimensionality through a simple lack of objects compared to the number of measured wavebands. Methods of dimension reduction can help to mitigate this effect.
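As a sketch of such dimension reduction, PCA can compress many correlated wavebands into a few components before training an estimator; the mock fluxes and the number of latent degrees of freedom here are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Mock photometry: 20 wavebands whose fluxes are driven by a few underlying
# spectral degrees of freedom plus noise (illustrative only).
n_obj, n_bands, n_latent = 300, 20, 3
latent = rng.normal(size=(n_obj, n_latent))
mixing = rng.normal(size=(n_latent, n_bands))
fluxes = latent @ mixing + rng.normal(0, 0.05, size=(n_obj, n_bands))

# Project onto a handful of principal components; a photo-z estimator can then
# be trained on these instead of all 20 bands, easing the small-sample problem.
pca = PCA(n_components=3)
compressed = pca.fit_transform(fluxes)
print(compressed.shape, pca.explained_variance_ratio_.round(3))
```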