Sunday, June 16, 2019

Airport Distances

From Python Programmer's (Giles's) YouTube video titled "Can you LEARN DATA SCIENCE for FREE? YES! I'll show you HOW!", I discovered an online book about Python by Christian Hill.
In Chapter 6, there was a coding problem on the topic of airport distances. The problem gives a data set of airports with their locations, latitudes, and longitudes. I had to write code to find the distance between two airports based on their latitudes and longitudes.

The Importance of Reading the Question
Here is the problem, word for word:


The file busiest_airports.txt provides details of the 30 busiest airports in the world in 2014. The tab-delimited fields are: three-letter IATA code, airport name, airport location, latitude and longitude (both in degrees).
Write a program to determine the distance between two airports identified by their three-letter IATA code, using the Haversine formula (see, for example, Exercise 4.4.2) and assuming a spherical Earth of radius 6378.1 km.

I spent a lot of time parsing the data set because I did not read the second sentence of the question carefully. It says that the fields in the data set are "tab-delimited", which I now understand means that the columns are separated by tabs. At first, I thought the columns were separated by commas, so I had a hard time even reading the file with pandas.
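As a minimal sketch of reading tab-delimited data with pandas, the snippet below passes sep="\t" to read_csv. The two sample rows, the column names, and the assumption that the file has no header row are all illustrative and not taken from busiest_airports.txt; with the real file, the file name would go in place of the in-memory text.

```python
import io

import pandas as pd

# Two illustrative rows in the tab-delimited layout the problem describes.
# The coordinates are approximate and typed in by hand, not copied from the file.
sample = (
    "JFK\tJohn F. Kennedy International Airport\tNew York, USA\t40.6398\t-73.7789\n"
    "AMS\tAmsterdam Airport Schiphol\tAmsterdam, Netherlands\t52.3086\t4.7639\n"
)

# Hypothetical column names; header=None assumes the data has no header row.
columns = ["IATA", "Name", "Location", "Latitude", "Longitude"]

# sep="\t" tells pandas that fields are separated by tabs, not commas.
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=columns)

print(data)
```

Note that the location field contains a comma, which is one reason a comma-based read of this kind of file goes wrong.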

Importing modules
The first step is to import pandas and NumPy. Pandas is used to read the data set. I usually import NumPy together with pandas, but I am not very familiar with NumPy yet. Later on, I also imported some functions from the math module: sqrt, sin, asin, and cos. That is because I use a form of the Haversine formula to calculate distances given the latitudes and longitudes of the airports.
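Derived from the Haversine formula, the distance d between two points on a sphere of radius r, given their latitudes and longitudes, is
$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$
where
  • φ1, φ2: latitude of point 1 and latitude of point 2 (in radians),
  • λ1, λ2: longitude of point 1 and longitude of point 2 (in radians). (Wikipedia)

Using iloc to choose certain columns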
The next step was to convert the latitudes and longitudes from degrees to radians, since the Haversine formula works on radian measurements. I knew that to convert degrees to radians, we multiply the degree measurement by pi and divide by 180. I estimated pi to be 3.14.
The harder part was knowing how to isolate the column of latitudes and then the column of longitudes in order to apply the conversion.
I used iloc to choose certain columns from my data set. iloc selects by integer position. The code I used to convert the latitudes from degrees to radians is as follows:

data['Latitude in Radians'] = ((data.iloc[:, 3:4]*3.14)/180)

This line of code creates a new column named "Latitude in Radians", in which each element of the 4th column of the dataframe is multiplied by 3.14 and divided by 180. The 4th column of the dataframe is the latitude in degrees of the airports. The iloc[:, 3:4] picks out all the rows ( : ) for the columns from index 3 up to, but not including, index 4 (3:4), which is just the 4th column. Since iloc uses indexes and we count starting from index 0, the 4th column is denoted by 3.
Similar code was used to convert the longitudes in degrees to radians.
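A possible variation on the conversion, sketched under assumptions: it uses math.pi instead of 3.14 (3.14 is about 0.05% smaller than pi, and that error carries through to the radian values) and a single integer in iloc so that a series, not a one-column dataframe, is selected. The small dataframe, its column names, and the approximate coordinates are made up for illustration.

```python
import math

import pandas as pd

# Illustrative two-row frame with the five fields in the order the problem
# lists them; the column names and approximate coordinates are assumptions.
data = pd.DataFrame({
    "IATA": ["JFK", "AMS"],
    "Name": ["John F. Kennedy International", "Amsterdam Schiphol"],
    "Location": ["New York", "Amsterdam"],
    "Latitude": [40.6398, 52.3086],
    "Longitude": [-73.7789, 4.7639],
})

# A single integer in iloc returns a Series, which assigns cleanly to a new column.
data["Latitude in Radians"] = data.iloc[:, 3] * math.pi / 180
data["Longitude in Radians"] = data.iloc[:, 4] * math.pi / 180

print(data[["IATA", "Latitude in Radians", "Longitude in Radians"]])
```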

Dictionary to map airports to their respective latitudes and longitudes
Given two three-letter IATA codes that represent two airports, I had to pull out their respective latitudes and longitudes to apply a formula to find the distance between the two airports.
This sounded like a job for a dictionary, which maps each key to a value. I wanted to create a dictionary that mapped the IATA code to the airport's latitude and another dictionary that mapped the IATA code to the airport's longitude.
I thought that I could create a list of IATA codes, a list of latitudes and a list of longitudes and create two dictionaries that way.
But first, I needed to create those lists.

List of lists to a single list
To create a list of IATA codes, a list of latitudes, and a list of longitudes from three columns of the dataframe, I had to use the values attribute and the tolist() method. The code I used to create a list of individual IATA codes is as follows:

dftolistIATA = data.iloc[:, 0:1].values.tolist()

The IATA code is the 1st column. The values attribute returns the values of the selection as an array, and the tolist() method converts that array into a list. Because iloc[:, 0:1] selects a one-column dataframe rather than a single series, the result was actually a LIST OF LISTS: each individual IATA code was a separate list. Lists are denoted with brackets, so a list of letters would be something like [a, b, c, d, e], but a list of lists would have brackets inside brackets, like so: [[a], [b], [c], [d], [e]].
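A small sketch of where the nesting comes from, using a made-up two-row dataframe (the column names and values are assumptions). Slicing with 0:1 keeps the selection two-dimensional, while a single integer index gives a one-dimensional series.

```python
import pandas as pd

# Made-up frame for illustration only.
data = pd.DataFrame({"IATA": ["JFK", "AMS"], "Latitude": [40.6398, 52.3086]})

# A slice (0:1) selects a one-column DataFrame, so .values is a 2-D array
# and tolist() yields one inner list per row.
nested = data.iloc[:, 0:1].values.tolist()

# A single integer (0) selects a Series, so tolist() yields a flat list.
flat = data.iloc[:, 0].tolist()

print(nested)
print(flat)
```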
I really just needed one list, so I needed a flattened list. How was I supposed to do that? I used Stack Overflow to find the code to create a flattened list. It is as follows (following the code from above):

# For each inner list x in dftolistIATA, take each element y inside x
# and collect every y into one flat list.
flattened_list = [y for x in dftolistIATA for y in x]

This code takes the elements of the elements in dftolistIATA (the list of lists), which are the individual IATA codes, and collects them into a single list. I still have to read more about flattening lists to really understand how this code works, though.
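One way to read the comprehension is as two nested for loops written on one line, in the same left-to-right order. The sketch below uses a made-up list of lists and checks that the loop version, the comprehension, and the standard library's itertools.chain.from_iterable all produce the same flat list.

```python
from itertools import chain

# Made-up list of lists, shaped like the output of .values.tolist().
nested = [["JFK"], ["AMS"], ["LHR"]]

# The comprehension [y for x in nested for y in x] unrolled into loops:
flattened_loop = []
for x in nested:        # x is one inner list, e.g. ["JFK"]
    for y in x:         # y is one element of that inner list, e.g. "JFK"
        flattened_loop.append(y)

# The same thing as a comprehension and with the standard library.
flattened_comprehension = [y for x in nested for y in x]
flattened_chain = list(chain.from_iterable(nested))

print(flattened_loop)
```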

So, after I created my flattened lists, I was able to create my dictionaries using the following code:

mydictionary = dict(zip(flattened_list, flattened_list2))
mydictionary2 = dict(zip(flattened_list, flattened_list3))

Essentially, the first line of code created a dictionary called mydictionary which mapped the flattened_list (list of IATA codes) to flattened_list2 (list of latitudes). 
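To make the zip step concrete, here is a minimal sketch with two short made-up lists (the names and approximate values are assumptions). zip pairs the items position by position, and dict turns each pair into a key and its value.

```python
# Made-up parallel lists: the i-th latitude belongs to the i-th code.
codes = ["JFK", "AMS"]
latitudes = [40.6398, 52.3086]

# zip pairs items by position; dict makes each (code, latitude) pair an entry.
pairs = list(zip(codes, latitudes))
latitude_by_code = dict(pairs)

print(pairs)
print(latitude_by_code["AMS"])
```

This only works if both lists are in the same row order, which holds here because they come from the same dataframe.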

Writing code for a form of the Haversine Formula
The last step was to create a function where I input two IATA codes and it spits out the distance in kilometers between the two airports. This was not that difficult, as I had the form of the Haversine Formula that explicitly told me the distance between two points given their latitudes and longitudes. I just had to write it out in code.

The skeleton for a function in Python is as follows:
def functionname(variable(s)):
    function code if necessary
    return (something)
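Filled in for this problem, the skeleton might look like the sketch below. This is not the code from the GitHub file: the function names, the two small lookup dictionaries, and the approximate coordinates are assumptions for illustration, and math.radians stands in for the manual conversion. The radius is the one given in the problem statement.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6378.1  # radius given in the problem statement


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1_deg), radians(lat2_deg)
    lam1, lam2 = radians(lon1_deg), radians(lon2_deg)
    # Term under the square root in the Haversine formula.
    a = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin((lam2 - lam1) / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


# Hypothetical lookup tables keyed by IATA code (approximate coordinates in degrees).
latitudes = {"JFK": 40.6398, "AMS": 52.3086}
longitudes = {"JFK": -73.7789, "AMS": 4.7639}


def distance(code1, code2):
    """Distance in km between two airports identified by IATA code."""
    return haversine_km(latitudes[code1], longitudes[code1],
                        latitudes[code2], longitudes[code2])


print(distance("JFK", "AMS"))
```

Passing the codes as strings such as "JFK" lets them serve directly as dictionary keys.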

Testing out my function
Lastly, I had to test out my function. I printed out distance(JFK, AMS). My function takes two arguments, one per airport, and returns the distance between them. distance(JFK, AMS) returned 5854.025421753566. On Google, the distance is 5850 km. Close enough!

Final Thoughts
I found this problem to be very interesting and worthwhile. I love to travel, so the topic resonated with me. I learned about the Haversine formula and gained a deeper understanding of iloc, parsing data sets, and dictionaries. I still don't fully understand the code for flattening lists, but that will come with time.
This problem took me a while, probably 5 hours, which could have been cut short if I had read the question carefully and understood more Python. But now that I have more of a taste of what Python and math are capable of doing, I am excited to keep on learning and experimenting.

GitHub code - Airport_distances.py
