Analyzing Exoplanet Data


One of the newest and most interesting areas of astrophysics is the search for and study of extrasolar planets. However, the idea of planets beyond our own solar system is not a new one. As early as the sixteenth century, several scientists and philosophers suspected that such planets existed, but there was no possible way for them to prove it at the time. For example, the Italian philosopher Giordano Bruno suggested that the fixed stars in the night sky are similar to our Sun and also host planets. This view was later supported by Sir Isaac Newton in his Philosophiæ Naturalis Principia Mathematica, first published in 1687, in which he set out his law of universal gravitation.

The current official working definition¹ of an exoplanet, as defined by the International Astronomical Union (IAU), is an object with a mass below the threshold for the fusion of deuterium (hydrogen with one extra neutron) that orbits a star, brown dwarf, or stellar remnant. For reference, the mass limit for deuterium fusion is approximately 13 Jupiter masses for objects of solar metallicity. Additionally, the minimum mass requirement for an exoplanet should be the same as that used to define planets in our Solar System. Interestingly enough, the requirement for a planet to clear its orbital path is not in the current official working definition of exoplanets. This is the same requirement that was added to the definition of a planet in our own solar system and that led to the reclassification of Pluto as a dwarf planet in 2006².

The first solid evidence of the existence of an exoplanet came in 1917 from Dr. Walter Sydney Adams, who recorded a peculiar spectrum of van Maanen’s Star³. Today it is thought that the spectrum he measured could have been caused by the debris of a nearby exoplanet that fell into the star after being torn apart by the star’s gravity. The first confirmed exoplanet detection⁴ was in 1992, when it was discovered that several terrestrial-sized planets orbiting the pulsar PSR B1257+12 were disrupting its pulsation period every 25 and 67 days. Pulsars are neutron stars that give off regular radio-frequency pulses. This discovery was surprising to the astronomical community because it was expected that exoplanets would only be found around main-sequence stars similar to our own Sun. An exoplanet around a main-sequence star wasn’t confirmed until 1995, when Michel Mayor and Didier Queloz⁵ found a giant planet in a very short four-day orbit around 51 Pegasi, a Sun-like star only 50 light-years from Earth.

The most popular methods of detecting exoplanets are transit photometry and Doppler spectroscopy. Transit photometry discovers planets by analyzing the light curve of the host star. If a planet passes in front of the star during its orbit, the star dims for the duration of the transit, and the dip repeats each time the planet completes an orbit. Doppler spectroscopy detects planets by using radial-velocity measurements of the Doppler shifts in the spectrum of the planet’s host star. The spectrum of the star will exhibit slight Doppler shifts toward and away from us at a regular interval as a planet orbits its parent star. A caveat of these two methods is that they are biased: they are only reliable at detecting planets close to the host star.

Some other methods of detecting exoplanets are direct imaging, microlensing, and timing. The first method is self-explanatory: it involves directly imaging an exoplanet in the optical or infrared. This method has mostly been used on systems that are very close to us and that have very large planets. The second method is microlensing, which detects exoplanets by looking for the tiny distortions in spacetime (gravitational lensing) that a planetary system produces in the light of a background star. Lastly, timing is a method used to find variations in the timing of a transit. In systems with a high density of planets, the gravitational pull among the planets can cause them to accelerate and decelerate along their orbital paths. All three of these methods are new and require very sensitive measurements. As such, there are only a few detections of exoplanets using them. Only 3% of all known exoplanets were discovered using these techniques.

According to the Extrasolar Planets Encyclopaedia⁶, there are currently 4,414 confirmed exoplanets in 3,257 star systems. Of those star systems, 722 have more than one planet. Every year the number of exoplanet discoveries increases, and it is exciting to see what we might find next on the cutting edge of science, perhaps even life outside of our own solar system.

In this data story I plan to explore the currently available exoplanet data to answer the following questions posed in order of complexity:

  1. How are the planets distributed on the sky?
  2. Can we observe Kepler’s Third Law? Is it different for different discovery methods?
  3. Why do we observe so many Hot-Jupiters?
  4. Of all the exoplanets discovered to date, what type of stars do the planets typically orbit?
  5. How many planets are located in the habitable zone? Are any of them like Earth?

The Data

The data that I will be working with is from the Open Exoplanet Catalogue, which contains information on all discovered extra-solar planets and is updated very regularly. The data is available on this GitHub Repository and is redistributable under the MIT license. This means that the data is open for use and redistribution provided that proper credit is given. That dataset is maintained by Hanno Rein from the Institute for Advanced Study at Princeton.

The dataset contains information on the following variables:

  • name — Primary identifier of the planet.
  • binaryflag — Binary flag [0=no known stellar binary companion; 1=P-type binary (circumbinary); 2=S-type binary; 3=orphan planet (no star)].
  • mass — Planetary mass in units of Jupiter masses.
  • radius — Planetary radius in units of Jupiter radii.
  • period — Orbital period in days.
  • semimajoraxis — Semi-major axis of the planet’s orbit in Astronomical Units (AU).
  • eccentricity — Orbital eccentricity.
  • periastron — Periastron (degrees).
  • longitude — Longitude (degrees).
  • ascendingnode — Ascending node (degrees).
  • inclination — Inclination (degrees).
  • temperature — Surface or equilibrium temperature of the planet in kelvin.
  • age — The estimated age of the planet in Gyr.
  • discoverymethod — Discovery method.
  • discoveryyear — Discovery year (yyyy).
  • lastupdate — Last updated (yy/mm/dd).
  • system_rightascension — Right ascension in the (hh-mm-ss) format.
  • system_declination — Declination in the (+/-dd mm ss) format.
  • system_distance — Distance from Sun in parsecs.
  • hoststar_mass — Host star mass in units of solar masses.
  • hoststar_radius — Host star radius in units of solar radii.
  • hoststar_metallicity — Host star metallicity (log relative to solar).
  • hoststar_temperature — Host star temperature in Kelvin.
  • hoststar_age — Host star age in Gyrs.
  • list — A list of lists the planet is on.

The dataset was already very clean, so little processing was required. I did, however, have to convert the provided coordinates into degrees using Python’s astropy package. I also did some other unit conversions where relevant.
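As a rough illustration of what that coordinate conversion involves, here is a minimal sketch of the underlying arithmetic in plain Python. It is not the astropy code used for the analysis. The function names are hypothetical, and it assumes the three fields of each coordinate are separated by whitespace.

```python
def hms_to_degrees(text):
    """Convert a right ascension string 'hh mm ss' to decimal degrees.

    Assumption: fields are whitespace-separated. One hour of right
    ascension corresponds to 15 degrees.
    """
    hours, minutes, seconds = (float(part) for part in text.split())
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(text):
    """Convert a declination string '+dd mm ss' or '-dd mm ss' to decimal degrees.

    The sign is read from the string itself so that values such as
    '-00 30 00' stay negative.
    """
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (float(part) for part in text.lstrip("+-").split())
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


# Illustrative input values, not taken from the dataset.
ra_deg = hms_to_degrees("19 20 54")
dec_deg = dms_to_degrees("+37 46 00")
print(ra_deg, dec_deg)
```

Reading the sign from the string rather than from the parsed number matters for declinations between 0 and -1 degrees, where the degrees field alone parses as zero.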

How are the planets distributed on the sky?

I first wanted to start off my exploratory analysis with a general overview of the data. I was curious whether there was any particular region of the sky in which most of the planets were being discovered, and whether any of the planets were discovered close to star-forming regions.

I used astropy to parse the given ICRS coordinates and then used matplotlib to plot the host stars by right ascension and declination on a Mollweide projection, to visualize how those systems would appear on the sky.
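One detail that is easy to miss when making this kind of plot is that matplotlib’s Mollweide axes expect longitude and latitude in radians, with longitude running from -π to π. The helper below is a hypothetical sketch of that preparation step, not the code used for the figure.

```python
import math


def to_mollweide(ra_deg, dec_deg):
    """Return (longitude, latitude) in radians for a Mollweide plot.

    Right ascension in [0, 360) degrees is wrapped to [-180, 180)
    before conversion, so that the map is centered on RA = 0.
    """
    lon_deg = ((ra_deg + 180.0) % 360.0) - 180.0
    return math.radians(lon_deg), math.radians(dec_deg)


# With matplotlib installed, the result could be passed to axes created
# with plt.subplot(projection="mollweide") and drawn with ax.scatter.
lon, lat = to_mollweide(270.0, 45.0)
print(lon, lat)
```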

The distribution of the exoplanet host stars as viewed on the night sky.

In general, the stars are distributed isotropically across the sky. However, there are a couple of interesting clusters. One is located around 19h and +45 degrees. At first I thought this might be a globular cluster, but globular clusters are far too dense with stars, and therefore too bright, to be a good place to hunt for planets. This may be the open cluster NGC 6791, located at 19h 20.9m and +37° 46′. There are also several images of the cluster from the Kepler Space Telescope, whose original field of view was centered in this part of the sky. The Kepler Space Telescope is one of several telescopes that have been used to hunt for exoplanets.

An image of the great circles projected on the celestial sphere. Image Source.

We also see clusters of stars that follow an S-shaped curve that stretches from the left side of the plot all the way over to the right side. This seems to follow the ecliptic, the line formed by projecting Earth’s orbit onto the celestial sphere. The maximum and minimum of the curve correspond to the solstices, and the points where it crosses the celestial equator correspond to the vernal and autumnal equinoxes. The extended Kepler mission (K2) observed fields along the ecliptic, so it makes sense that we would find many exoplanets located along this line.

Histogram of the number of exoplanets discovered each year.

Above is a histogram of the number of exoplanets found each year. We can see that very few planets were found before 2010, and it wasn’t until 2014, 5 years after the launch of the Kepler Space Telescope in 2009, that we start to see a huge increase in the number of exoplanets discovered. The two spikes we see at 2014 and 2016 are due to the release of data collected by the Kepler Space Telescope.

Distance distribution of the exoplanets discovered to date.

I also examined how many planets were discovered at various distances from Earth. Unsurprisingly, most of the planets found so far are very close to our own solar system, as they are easier to detect.

A histogram of the orbital period of exoplanets in Earth days.

Most of the planets that have been discovered (almost 50%) have very short orbital periods and, by extension, orbits with very small semi-major axes. We will return to the reason why we find so many planets of this kind later.

Kepler’s Third Law

My second topic of investigation examines Kepler’s Third Law. German astronomer Johannes Kepler derived three laws of planetary motion in the early 1600s from his analysis of observations of the known planets made by Danish astronomer Tycho Brahe. Kepler’s Third Law states that the square of a planet’s orbital period is directly proportional to the cube of the semi-major axis of its orbit around the Sun.

Kepler’s Laws were written long before we knew of the existence of exoplanets. So, a natural question to ask is whether Kepler’s Laws hold in other systems, or whether our solar system is somehow unique in this way. I investigated this by plotting the period of each planet’s orbit in days against the semi-major axis of its orbit in astronomical units (AU). An astronomical unit is equal to the average distance from the Earth to the Sun. The resulting graph, created in Tableau, is shown below.

A plot demonstrating Kepler’s Third Law. The x axis is the semi-major axis of the planet’s orbit in astronomical units and the y axis is the orbital period of the planet. Colors and shapes denote the method used to discover the planet. The plot is on a log-log scale.
Orbital period of the planet plotted as a function of the semi-major axis of its orbit. The data is grouped by color and shape to distinguish the method by which each planet was discovered.

The plot itself isn’t too surprising. Kepler’s Third Law is a power law, which appears as a straight line in log space, and as anticipated we find that Kepler’s law does indeed hold. However, what’s more interesting in this plot is that there is a noticeable divide in where most of the currently known exoplanets sit on the graph. Most of the planets nearest to their host star were discovered using the transit method of detection, while planets that are farther away from their parent star were detected by the radial velocity (Doppler spectroscopy) method. This is an interesting topic that I will return to in the next section.
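To see why a power law becomes a straight line on log-log axes, here is a minimal sketch that uses Kepler’s Third Law in solar units, P² = a³ / M, with P in years, a in AU, and M in solar masses. The function name is hypothetical, and the sketch assumes the planet’s mass is negligible compared with the star’s.

```python
import math


def orbital_period_days(a_au, star_mass_msun=1.0):
    """Orbital period in days from Kepler's Third Law in solar units.

    Assumes the planet's mass is negligible next to the star's.
    """
    period_years = math.sqrt(a_au ** 3 / star_mass_msun)
    return 365.25 * period_years


# Three illustrative semi-major axes around a one-solar-mass star.
for a in (0.05, 1.0, 5.2):
    print(f"a = {a:5.2f} AU -> P = {orbital_period_days(a):9.2f} days")

# Slope of log10(P) against log10(a) between the two extreme points.
a_lo, a_hi = 0.05, 5.2
slope = (
    math.log10(orbital_period_days(a_hi)) - math.log10(orbital_period_days(a_lo))
) / (math.log10(a_hi) - math.log10(a_lo))
print(f"log-log slope = {slope:.3f}")
```

For a fixed stellar mass the slope is 3/2 regardless of which two points are chosen. Stars of different mass shift the line up or down without changing its slope, which is one source of scatter in a plot that mixes many host stars.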

This plot demonstrating Kepler’s Third Law is important because Kepler’s Third Law is the primary method used to determine the distance between an exoplanet and its host star. Moreover, this can also be used to determine if the planet is inside the habitable zone. I will also analyze this point later.
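In practice the law is usually applied in the other direction: the period is measured and the orbital distance is inferred. A minimal sketch of that inversion, under the same assumption of a negligible planet mass and with a hypothetical function name, is:

```python
def semi_major_axis_au(period_days, star_mass_msun=1.0):
    """Semi-major axis in AU from the orbital period and stellar mass.

    Inverts P^2 = a^3 / M (P in years, a in AU, M in solar masses).
    """
    period_years = period_days / 365.25
    return (star_mass_msun * period_years ** 2) ** (1.0 / 3.0)


# A planet on a roughly four-day orbit around a one-solar-mass star.
print(f"{semi_major_axis_au(4.23):.3f} AU")
```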

Bias of Detecting Hot-Jupiters

An interesting type of exoplanet is what is known as a Hot Jupiter. Hot Jupiters are gas giant exoplanets that are similar in size and mass to Jupiter but orbit very close to their parent star, with very short orbital periods of around ten days or less. Being so close to their host star, they receive a lot of the star’s thermal radiation, making them hot. For context, these planets are closer to their star than Mercury is to our own Sun, usually at about 1/6 of Mercury’s orbital distance. These types of exoplanets have the highest number of discoveries to date. This is perplexing because it runs contrary to what we find in our own solar system, where small terrestrial planets orbit closer to the Sun and the gas giants orbit much further out. This calls into question how common each type of planetary arrangement is in the universe. I wanted to find out whether there is a bias in the type of planets we find based on the methods we use to discover exoplanets. If there is indeed a bias, then we should be able to see different clustering of the planets when they are grouped by their method of detection, and possibly other differences in their properties.

To investigate this question, I created another scatter plot in Tableau of the mass of the planet against the semi-major axis of its orbit. I then created a new field in Tableau that is the mass of the planet multiplied by the sine of the planet’s orbital inclination. The plot places both axes on a logarithmic scale, and I use different shapes and colours to distinguish between the different discovery methods.
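The calculated field described above can be sketched in Python as follows. The function name is hypothetical. The one point worth noting is that the dataset stores inclination in degrees, while sine functions such as Python’s math.sin expect radians.

```python
import math


def mass_times_sin_i(mass_mjup, inclination_deg):
    """Planet mass (Jupiter masses) multiplied by sin(inclination).

    The inclination is given in degrees, as in the dataset, and is
    converted to radians before the sine is taken.
    """
    return mass_mjup * math.sin(math.radians(inclination_deg))


# An edge-on orbit (90 degrees) leaves the mass unchanged.
print(mass_times_sin_i(1.0, 90.0))
```

This quantity is useful because the radial velocity method only measures the component of the star’s motion along our line of sight, so it constrains the product of mass and sin(i) rather than the mass itself.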

A plot of the mass of the planet compared to the semi-major axis of the planet’s orbit for different detection methods.

In the plot, we can see that there are two main clusters of planets. The green diamonds represent planets that were discovered using orbital transits, and the red crosses represent planets detected using the radial velocity method. The green diamonds are clustered around the mass of Jupiter at a very small semi-major axis of around 0.05 AU. This is where most of the Hot Jupiters are being found. In the red cross cluster, we also see a number of Jupiter-mass objects, but they are located much further out, where we would typically expect gas giants to be found (5–10 AU). Additionally, we see some smaller planets detected using both methods. Despite this, there is a clear disparity between the planets that each method detects.

I wanted to explore the reasoning behind this further. So, I made a similar plot to the one above, but this time I scaled each point by the eccentricity of the planet’s orbit. Eccentricity describes how far the orbit deviates from a perfect circle. An eccentricity of 0 is a perfect circle, while an eccentricity approaching 1 is a very elongated ellipse.
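To make the definition concrete, the closest and farthest points of an elliptical orbit follow directly from the semi-major axis a and the eccentricity e as a(1 - e) and a(1 + e). A small sketch with a hypothetical function name:

```python
def apsides_au(a_au, eccentricity):
    """Return (periapsis, apoapsis) distances in AU for a bound orbit.

    Valid for 0 <= eccentricity < 1.
    """
    periapsis = a_au * (1.0 - eccentricity)
    apoapsis = a_au * (1.0 + eccentricity)
    return periapsis, apoapsis


# A circular orbit keeps a constant distance; an eccentric one does not.
print(apsides_au(2.0, 0.0))
print(apsides_au(2.0, 0.5))
```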

A plot of the mass of the planet compared to the semi-major axis of the planet’s orbit. The size of the circle represents the eccentricity of the planet’s orbit.

In the new plot we can see that most planets discovered using the radial velocity (Doppler spectroscopy) method have highly eccentric orbits, while the orbits of planets discovered using the transit method are usually circular. Therefore, it seems that each of the current detection methods is biased towards the kind of planet it ends up finding.

Exoplanet Host Stars

For my next question of investigation, I wanted to know what type of stars the currently detected exoplanets orbit. To answer this question, it is useful to know a bit of background on the introductory properties of stars.

A blackbody is an idealized object that is a perfect absorber and emitter of radiation at all wavelengths. The radiation emitted by such an object is known as blackbody radiation. Similarly, stars are very dense objects consisting of ionized plasma that is opaque to most forms of radiation and emits radiation isotropically at all wavelengths. The spectrum of a blackbody at a given temperature is defined by the Planck function, and each spectrum will have its own color depending on its temperature. Therefore, stars are often modelled as approximate blackbodies, with some notable exceptions (e.g. neutron stars, brown dwarfs, etc.).

An example of the blackbody color temperature scale (in kelvin). Source: Wikimedia Commons.
Blackbody spectra for various temperatures. The dashed lines represent the portion of the electromagnetic spectrum that corresponds to visible light spectrum.

I approximated the spectrum of each star using a blackbody model with the star’s reported surface temperature as input. From the resulting spectrum I then computed the approximate total luminosity of the star by multiplying the integrated area under the spectrum, expressed as the flux emitted per unit surface area, by the star’s surface area, which is 4π times the square of its radius. I did this with astropy’s blackbody modelling function.
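Integrating a blackbody spectrum over all wavelengths gives the Stefan-Boltzmann law, so the same luminosity can be written in closed form as L = 4πR²σT⁴. The sketch below uses that shortcut rather than the numerical integration described above. The function name is hypothetical, and the solar constants are the IAU nominal values.

```python
import math

SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
R_SUN_M = 6.957e8          # nominal solar radius, m
L_SUN_W = 3.828e26         # nominal solar luminosity, W


def luminosity_lsun(radius_rsun, temperature_k):
    """Blackbody luminosity in solar units from radius and temperature.

    Uses L = 4 * pi * R^2 * sigma * T^4, the closed-form result of
    integrating the Planck function over wavelength and solid angle.
    """
    radius_m = radius_rsun * R_SUN_M
    luminosity_w = 4.0 * math.pi * radius_m ** 2 * SIGMA_SB * temperature_k ** 4
    return luminosity_w / L_SUN_W


# A star with the Sun's radius and effective temperature (5772 K).
print(f"{luminosity_lsun(1.0, 5772.0):.3f} L_sun")
```

Because the temperature enters to the fourth power, modest errors in a catalogued surface temperature translate into much larger errors in the derived luminosity.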

Using the resulting total luminosity and the star’s reported surface temperature from the dataset, I can create a Hertzsprung–Russell (HR) diagram of the stars, as shown below. The HR diagram shows the relationship between a star’s luminosity and its surface temperature, also known as its effective temperature. From the HR diagram, we can infer many qualities of a star, such as its size, mass, type, and where it is on its evolutionary path. From the definition of luminosity, we know that for a given radius the luminosity of a star is proportional to its surface temperature to the power of 4. So, I overlay dashed lines that represent constant stellar radius to help provide a sense of scale.
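The reason lines of constant radius are straight on a log-log HR diagram can be seen by writing the blackbody luminosity relative to the Sun and taking logarithms. This is a short worked derivation under the blackbody assumption, with the symbols R, T and L standing for the stellar radius, effective temperature and luminosity:

```latex
\begin{align}
  L &= 4\pi R^{2}\sigma T^{4}
  \quad\Longrightarrow\quad
  \frac{L}{L_\odot} = \left(\frac{R}{R_\odot}\right)^{2}
                      \left(\frac{T}{T_\odot}\right)^{4}, \\
  \log_{10}\frac{L}{L_\odot}
    &= 2\,\log_{10}\frac{R}{R_\odot} + 4\,\log_{10}\frac{T}{T_\odot}.
\end{align}
```

At fixed R the first term is a constant, so each dashed line has a slope of 4 in log L against log T, and doubling the radius raises the line by log₁₀ 4 ≈ 0.6.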

A Hertzsprung–Russell diagram of the exoplanet parent stars (red dots). The black star represents where our Sun is on the HR diagram. Dashed lines represent constant size/radius.

In the graph above we can see that most stars approximately follow the main sequence line. This comes as no surprise for several reasons. First of all, stars spend most of their lives in the main sequence phase, which can last from 5 to 100 billion years depending on the star. A star that remains stable on the main sequence for longer would be ideal for the conditions of life to emerge and evolve. In addition, stars with masses less than or equal to that of our Sun account for about 89% of all the stars in the galaxy, whereas stars with masses greater than 8 solar masses account for less than 1% of them.

We also see a fair number of sub-giants and giants in the top middle to upper right corner of the plot. These stars likely account for some of the Jupiter sized planets with fast orbital periods we saw earlier. Moving downwards we can see some red dwarf stars towards the tail of the main sequence stars. If you look carefully, you will also see one data point in the lower middle of the plot that might be a white dwarf star.

The model that I used to compute the stellar luminosity of the host star is relatively simple, but it gives a good approximation of what kind of stars we are dealing with in the dataset. More accurate models would include corrections for interstellar extinction and other factors. This modelling would require more computational time that I unfortunately do not have access to.

There are a couple of outliers to the far left not shown on the plot that might be explained away by the limitations of this model. Another possible explanation for this is that those are exotic stars such as neutron stars that don’t follow a blackbody spectrum.

Planet Habitability

For my final and most exciting analysis question, I wanted to know approximately how many exoplanets fall into the habitable zone. The habitable zone was first defined in 1953 by Richard Hugget and is defined to be the range of distances from a star where conditions allow for the formation of life, particularly the presence of liquid water. It represents the range where the radiation received by a planet from the host star is neither so intense that water boils and evaporates out into space, nor so weak that water freezes.

To answer this question, I had to rely on the stellar luminosity calculations I had computed in the previous section using the simple blackbody model. To compute the inner and outer bounds of the habitable zone, I then referred to a 1993 paper by James Kasting, Daniel Whitmire, and Ray Reynolds⁷. I approximated the inner and outer radii of each star’s habitable zone using the following equations.

Equation to compute the inner boundary of the habitable zone.


Equation to compute the outer boundary of the habitable zone

where L is the luminosity of the star.
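For readers who want to experiment, a commonly quoted approximation associated with the papers cited here places each boundary at r = √(L / S), with L in solar luminosities, r in AU, and S the stellar flux at that boundary relative to the flux Earth receives. The flux constants 1.1 and 0.53 used below are an assumption and not necessarily the values used for this analysis. The function name is hypothetical.

```python
import math


def habitable_zone_au(luminosity_lsun, inner_flux=1.1, outer_flux=0.53):
    """Approximate inner and outer habitable-zone radii in AU.

    Assumption: boundaries sit where the stellar flux equals
    inner_flux and outer_flux times the flux at Earth, so each
    radius scales as the square root of the luminosity.
    """
    r_inner = math.sqrt(luminosity_lsun / inner_flux)
    r_outer = math.sqrt(luminosity_lsun / outer_flux)
    return r_inner, r_outer


# A Sun-like star and a star four times as luminous.
for lum in (1.0, 4.0):
    r_in, r_out = habitable_zone_au(lum)
    print(f"L = {lum:.1f} L_sun -> {r_in:.2f} to {r_out:.2f} AU")
```

With these constants a one-solar-luminosity star has a zone that brackets 1 AU, and quadrupling the luminosity doubles both radii.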

I then computed the distance of each planet from the center of the habitable zone around its star. All of this was done in Python, and I created a plot in matplotlib of the mass of the planet versus the distance from the center of the habitable zone in AU. The results are shown below.

A plot of the mass of the planet (log scale) in relation to the distance from the center of the habitable zone. The horizontal dashed lines represent the mass of Earth and Jupiter. The dashed vertical lines indicate a distance of 1 AU from the center of the habitable zone. The red circle indicates the area where possible Earth-sized planets reside in the habitable zone.

From the resulting plot, we can see that most of the planets are of Jupiter mass and lie at varying distances from the center of the habitable zone. However, we also see a smaller group of lower-mass planets near the habitable zone. Planets that are between 0.75 and 10 times the mass of Earth and that lie inside the habitable zone are widely considered to be ideal for life. I found that 188 planets fit these criteria, but only 73 of those lie at the center of the habitable zone. For comparison, the literature⁹ currently lists 24 top candidates. My count of 73 is about three times larger, and the broader count of 188 is nearly an order of magnitude larger, so these estimates are subject to considerable error, as the equations above are only an approximation.
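The selection described above can be sketched as two small functions: one for the offset from the center of the habitable zone, and one for the mass and distance test. The names are hypothetical, the zone boundaries are passed in rather than computed, and the conversion of roughly 317.8 Earth masses per Jupiter mass is needed because the dataset stores planetary mass in Jupiter masses.

```python
M_JUP_IN_M_EARTH = 317.8  # approximate Jupiter mass in Earth masses


def offset_from_hz_center(a_au, r_inner, r_outer):
    """Signed distance in AU from the midpoint of the habitable zone.

    Negative values are closer to the star than the midpoint.
    """
    return a_au - 0.5 * (r_inner + r_outer)


def is_candidate(mass_mjup, a_au, r_inner, r_outer,
                 min_mearth=0.75, max_mearth=10.0):
    """True if the planet is 0.75 to 10 Earth masses and inside the zone."""
    mass_mearth = mass_mjup * M_JUP_IN_M_EARTH
    in_mass_range = min_mearth <= mass_mearth <= max_mearth
    in_zone = r_inner <= a_au <= r_outer
    return in_mass_range and in_zone


# Illustrative values: an Earth-mass planet at 1 AU in a 0.95-1.37 AU zone.
print(offset_from_hz_center(1.0, 0.95, 1.37))
print(is_candidate(1.0 / M_JUP_IN_M_EARTH, 1.0, 0.95, 1.37))
```

Using the semi-major axis alone ignores eccentricity, so a planet on a strongly elliptical orbit could pass this test while spending part of each orbit outside the zone.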


In conclusion, we have developed a sense of the big picture of the dataset and what we currently know about exoplanets: where they are found, how often they are discovered, and how far away they are. I found that the laws of planetary motion that were derived centuries ago still hold up very well today. I also found that there appears to be a bias in the planets that each of our detection methods finds. With our present technology, Jupiter-sized planets are the easiest for us to detect, and they are usually found close to their host stars using the transit method. The radial velocity method seems to excel at detecting Jupiter-sized planets farther away from their host stars. Most of the stars with exoplanets are main-sequence stars that are similar to our own Sun. We also identified a few candidate Earth-sized planets that could lie inside the habitable zone of their star, with the possibility that they could support liquid water, one of the building blocks of life.

Ideally, I would use more complete and thorough models to determine both the properties of the exoplanets’ host stars and the habitable zones around each of those stars. The approximations I used are only good for main-sequence stars and thus likely have an uncertainty that is larger than we would desire as scientists. Other effects, such as interstellar extinction, stellar winds, and the conditions of the stellar atmosphere, need to be accounted for as well. I would also like to have spent time modeling various planetary atmospheres to simulate what conditions might exist on those candidates that may be similar to Earth.


[1] “Official Working Definition of an Exoplanet”. IAU position statement.

[2] “IAU 2006 General Assembly: Result of the IAU Resolution votes”. 2006.

[3] Landau, Elizabeth (2017). “Overlooked Treasure: The First Evidence of Exoplanets”. NASA.

[4] Wolszczan, A.; Frail, D. A. (1992). “A planetary system around the millisecond pulsar PSR1257 + 12”. Nature. 355 (6356): 145–147.

[5] Mayor, Michel; Queloz, Didier (1995). “A Jupiter-mass companion to a solar-type star”. Nature. 378 (6555): 355–359.

[6] Schneider, J. “Interactive Extra-solar Planets Catalog”. The Extrasolar Planets Encyclopedia.

[7] Kasting, James; Whitmire, Daniel; and Reynolds, Ray (1993). Habitable zones around main sequence stars. Icarus 101: 108–128.

[8] Whitmire, Daniel; Reynolds, Ray, (1996). Circumstellar habitable zones: astronomical considerations. In: Doyle, Laurence (ed.). Circumstellar Habitable Zones, 117–142. Travis House Publications, Menlo Park.

[9] Schulze-Makuch, Dirk; Heller, Rene; Guinan, Edward (18 September 2020). “In Search for a Planet Better than Earth: Top Contenders for a Superhabitable World”. Astrobiology.

BSc Astrophysics
