Homework 10 - More with Files
Due Date: November 29, 2023
Put #%% on the first line of your solution. These three characters create a code block in Spyder. The assignment is due by 11:59 pm on the due date.
When opening a text file with the open() function, it is best to include the encoding='utf-8' option, for example:
with open('us-city-coords.txt', 'r', encoding='utf-8') as us_cities_file:
    # do something with us_cities_file
Download the files us-city-coords.txt and us-city-populations.txt, and save them in the same directory where you save the Python script for this homework assignment. The files contain population data on cities across the USA. Below is a description of the contents of each file.
Structure of file us-city-coords.txt
Each line of the file has the following format:
Each line contains the name of a city, the state the city is in, and the \((x,y)\) latitude and longitude coordinates of the city. Notice that the data on each line is separated by a colon.
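As a rough illustration of splitting a colon-separated line, the sketch below parses one made-up line. The layout name:state:(x,y), the sample values, and the helper name parse_coords_line are assumptions for illustration only; inspect the actual file before relying on them.

```python
def parse_coords_line(line):
    """Split one colon-separated line into name, state, and coordinates.

    ASSUMPTION: the line has exactly three colon-separated fields,
    in the order name, state, coordinates. Check the real file.
    """
    name, state, coords = line.strip().split(':')
    return name, state, coords


# Made-up sample line, not taken from the data file.
sample = 'Springfield:Illinois:(39.78,-89.65)\n'
name, state, coords = parse_coords_line(sample)
print(name, state, coords)
```

The coordinates are kept as a string here because they are only needed later as a lookup key, not for arithmetic.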
Structure of file us-city-populations.txt
Each line of the file has the following format:
Each line contains the 2019 population (number1), 2010 population (number2), land area in square miles (number3), and population density (number4) of the city with coordinates \((x,y)\). Each city's coordinates appear once in each file. Notice that number1 and number2 contain commas, and number3 and number4 contain units.
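Because the numbers contain commas and units, they cannot be passed directly to int() or float(). One possible way to clean them is sketched below. The helper names and the sample strings (including the unit spellings 'sq mi' and '/sq mi') are hypothetical; the actual unit text in the file may differ.

```python
import re


def leading_number(text):
    """Return the leading run of digits, commas, and dots, with commas removed."""
    match = re.match(r'[\d,.]+', text.strip())
    if match is None:
        raise ValueError(f'no leading number in {text!r}')
    return match.group().replace(',', '')


def to_int(text):
    """Convert a string such as '1,234,567' to an int."""
    return int(leading_number(text))


def to_float(text):
    """Convert a string such as '302.6 sq mi' to a float, dropping the unit."""
    return float(leading_number(text))


# Made-up sample values; the unit strings are assumptions.
print(to_int('8,336,817'))
print(to_float('301.5 sq mi'))
print(to_float('27,012/sq mi'))
```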
Problems
- Using the open() function in Python, open and read the data in the files and create a list called us_cities where each element of the list is a dictionary containing all the data of one city. The key-value pairs of each dictionary should be:

  Key         Value
  name        Name of the city
  state       State the city is in
  population  City's 2019 population (as an int)
  census      City's 2010 census population (as an int)
  area        City's land area (as a float)
  density     City's population density (as a float)

  IMPORTANT: Each file contains the same number of lines (and thus the same number of cities), but the city information on line \(k\) of the first file is not, in general, the information for the same city as on line \(k\) of the second file. Thus, part of the problem is to match the coordinate data of the two files in order to create each dictionary.
- Using the us_cities list, find the total population and average population of all the cities. Find these values for both the 2019 population and the 2010 census population.
- Using the us_cities list, find the total land area and average population density of all the cities.
- How many of the cities in the data are in Florida? To find out, create a new dictionary, called by_state, whose keys are the states in the data and whose values are the number of cities in each state. Then, write code that creates a file called by_state.txt where each line of the file contains the state name and number of cities in the state separated by a colon, that is, each line is of the form

  state_name:number_of_cities

  The lines should be in descending order by number_of_cities, that is, the first line should list the state with the most cities in the data and the last line should list the state with the fewest. Your script should create the file by_state.txt as described above; however, upload only your Python script.
- Find the 2019 population distribution of all the cities using the following bins:
  \begin{align*}
  I_1&=[100000,199999]\\
  I_2&=[200000,299999]\\
  I_3&=[300000,499999]\\
  I_4&=[500000,999999]\\
  I_5&=[1000000,10000000].
  \end{align*}
  Save the distribution in a list \(d = [d_1,d_2,d_3,d_4,d_5]\), that is, \(d_k\) is the percentage of cities in the data whose 2019 population is in the interval \(I_k\).
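The matching step in the first problem can be approached by indexing one file's records by their coordinates and then looking each city up by that key. The sketch below is one possible approach, not a required one. It uses made-up, already-parsed records in place of the two files, and it also shows how a total and an average can be computed from the resulting list.

```python
# Made-up, already-parsed records standing in for the two files.
# Note that the two lists are deliberately in different orders.
coords_records = [
    ('Alpha City', 'Florida', '(1.0,2.0)'),
    ('Beta Town', 'Texas', '(3.0,4.0)'),
]
population_records = [
    ('(3.0,4.0)', 250000, 200000, 100.0, 2500.0),
    ('(1.0,2.0)', 150000, 120000, 50.0, 3000.0),
]

# Index the population data by its coordinate string for direct lookup.
stats_by_coords = {coords: stats for coords, *stats in population_records}

us_cities = []
for name, state, coords in coords_records:
    population, census, area, density = stats_by_coords[coords]
    us_cities.append({
        'name': name,
        'state': state,
        'population': population,
        'census': census,
        'area': area,
        'density': density,
    })

# Totals and averages follow directly from the list of dictionaries.
total_population = sum(city['population'] for city in us_cities)
average_population = total_population / len(us_cities)
print(total_population, average_population)
```

Matching on the coordinate string only works if both files write the coordinates identically. If they do not, converting each pair to a tuple of floats before using it as a key is one alternative.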
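For the by_state problem, a minimal sketch of counting, sorting in descending order, and writing state_name:number_of_cities lines might look like the following. The city records are made up, and the output goes to a temporary directory in this sketch, whereas the assignment asks for by_state.txt to be created by your script.

```python
import os
import tempfile

# Made-up records; only the 'state' key matters for this step.
us_cities = [
    {'name': 'Alpha City', 'state': 'Florida'},
    {'name': 'Beta Town', 'state': 'Texas'},
    {'name': 'Gamma Beach', 'state': 'Florida'},
    {'name': 'Delta Falls', 'state': 'Ohio'},
]

# Count the cities in each state.
by_state = {}
for city in us_cities:
    by_state[city['state']] = by_state.get(city['state'], 0) + 1

# Sort (state, count) pairs by count, largest first.
ranked = sorted(by_state.items(), key=lambda pair: pair[1], reverse=True)

# This sketch writes into a temporary directory so that it leaves the
# working directory alone; the assignment uses 'by_state.txt' directly.
output_path = os.path.join(tempfile.mkdtemp(), 'by_state.txt')
with open(output_path, 'w', encoding='utf-8') as out_file:
    for state, count in ranked:
        out_file.write(f'{state}:{count}\n')

print(by_state['Florida'])
```

States with equal counts keep their original relative order here, since sorted() is stable; the assignment does not say how ties should be ordered.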
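The distribution over the intervals \(I_1,\ldots,I_5\) can be computed by counting how many populations fall in each interval and dividing by the number of cities. The sketch below uses a made-up list of populations, and the variable names other than d are illustrative.

```python
# The five closed intervals I_1, ..., I_5 from the problem statement.
bins = [
    (100000, 199999),
    (200000, 299999),
    (300000, 499999),
    (500000, 999999),
    (1000000, 10000000),
]

# Made-up 2019 populations, not taken from the data file.
populations = [150000, 180000, 250000, 600000, 2000000]

# Count how many populations fall in each interval.
counts = [0] * len(bins)
for population in populations:
    for k, (low, high) in enumerate(bins):
        if low <= population <= high:
            counts[k] += 1
            break

# Convert the counts to percentages of all cities in the data.
d = [100 * count / len(populations) for count in counts]
print(d)
```

The denominator is the total number of cities, so any city whose population falls outside every interval still counts toward the total.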