Category: Python


    Assignment: Halloween Candy
    Name:
    Overview
    In this assignment, your job will be to clean and wrangle data from a survey of Halloween candy to prepare it for a machine learning project.
    Once you have completed all the tasks in the notebook, save your notebook as halloween_candy, and verify that all the tests pass in CodeGrade.
For those of you who are more comfortable with using Pandas, I have also provided extra optional data analysis questions at the end of the assignment for more practice.
    Data Set
    The data set that we will be using is the 2017 Halloween Candy Hierarchy data set as discussed in this boingboing article. You can also read more about the data in the Science Creative Quarterly.
    The following are the rating instructions from the survey:
Basically, consider that feeling you get when you receive this item in your Halloween haul. Does it make you really happy (JOY)? Or is it something that you automatically place in the junk pile (DESPAIR)? MEH for indifference, and you can leave blank if you have no idea what the item is.
Note that the original data set has been slightly altered from its original state, and if you wanted to perform any analysis for future projects, you would need to download the data directly from the links above.
This data is a great example of a messy data set, especially since respondents were allowed to enter free text for a number of the fields. Also, note that some of the comments in the file might be considered inappropriate by some readers, but cleaning this type of data is normal in a lot of data science projects.
    Note
    Show Work
Remember that you must show your work. Student submissions are spot checked manually throughout the term to verify that answers are not hard coded from looking only at the file or at CodeGrade's expected output. If this is seen, the student's answer will be manually marked wrong and their grade will be changed to reflect this.
    For example, if the answer to Q1, the mean of a specific column, is 22:
    # correct way
    Q1 = df['column_name'].mean()
    # incorrect way
    Q1 = 22
    Our End Goal
Our end goal for this project is to clean the data so that we could then create a machine learning model. We want to see if we can predict a person's gender based purely on their candy preferences. You will not be creating a model for this assignment, however, only cleaning the data. The results of the models that I used after cleaning the data are provided at the end of this notebook.
    Initial Import & Exploration
    In [18]:
# initial imports
import pandas as pd
import numpy as np

# Do not change this option; this allows the CodeGrade auto grading to function correctly
pd.set_option('display.max_columns', 20)
Let's start by importing our data and creating a DataFrame called candy. We need to include encoding='iso-8859-1' during the import because there are special characters in the data that Pandas doesn't recognize. This happens a lot when importing data where the public is able to type in answers, especially if there are foreign language characters included. The default encoding for Pandas is utf-8, so changing the encoding allows Pandas to recognize those special characters.
    Run the following code, with the encoding argument, and it should import correctly.
    In [19]:
# read_csv with iso-8859-1 encoding; using latin-1 would also work here
candy_full = pd.read_csv('candy.csv', encoding='iso-8859-1')
# copy to a new DataFrame so that we have a copy of the original import if needed
candy = candy_full.copy()
Let's take a brief look at the data by using head().
    In [20]:
# first five rows
candy.head()
Out[20]:
[head() output truncated: the first five rows, running from Internal ID and the demographic questions (Q1-Q5) through the Q6 candy-rating columns to the Q12 media columns and Click Coordinates (x, y)]
    5 rows × 120 columns
    Next, run the following code to see information about the DataFrame.
    In [21]:
# check info about the DataFrame
candy.info()
    RangeIndex: 2479 entries, 0 to 2478
    Columns: 120 entries, Internal ID to Click Coordinates (x, y)
    dtypes: float64(4), int64(1), object(115)
    memory usage: 2.3+ MB
    Notice that this did not print the columns as you might be used to seeing. According to the Pandas documentation: “If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.”
    We can make the columns display by setting the max_cols argument equal to the number of columns in the data set.
    In [22]:
# check info, set max_cols
candy.info(max_cols=120)
    RangeIndex: 2479 entries, 0 to 2478
    Data columns (total 120 columns):
 #    Column                                                                                 Non-Null Count  Dtype
---   ------                                                                                 --------------  -----
 0    Internal ID                                                                            2479 non-null   int64
 1    Q1: GOING OUT?                                                                         2368 non-null   object
 2    Q2: GENDER                                                                             2437 non-null   object
 3    Q3: AGE                                                                                2394 non-null   object
 4    Q4: COUNTRY                                                                            2414 non-null   object
 5    Q5: STATE, PROVINCE, COUNTY, ETC                                                       2377 non-null   object
 6    Q6 | 100 Grand Bar                                                                     1728 non-null   object
 7    Q6 | Anonymous brown globs that come in black and orange wrappers (a.k.a. Mary Janes)  1741 non-null   object
 8    Q6 | Any full-sized candy bar                                                          1803 non-null   object
 9    Q6 | Black Jacks                                                                       1517 non-null   object
 10   Q6 | Bonkers (the candy)                                                               1482 non-null   object
 11   Q6 | Bonkers (the board game)                                                          1469 non-null   object
 12   Q6 | Bottle Caps                                                                       1709 non-null   object
 13   Q6 | Box’o’Raisins                                                                     1787 non-null   object
 14   Q6 | Broken glow stick                                                                 1767 non-null   object
 15   Q6 | Butterfinger                                                                      1793 non-null   object
 16   Q6 | Cadbury Creme Eggs                                                                1792 non-null   object
 17   Q6 | Candy Corn                                                                        1796 non-null   object
 18   Q6 | Candy that is clearly just the stuff given out for free at restaurants            1784 non-null   object
 19   Q6 | Caramellos                                                                        1723 non-null   object
 20   Q6 | Cash, or other forms of legal tender                                              1795 non-null   object
 21   Q6 | Chardonnay                                                                        1732 non-null   object
 22   Q6 | Chick-o-Sticks (we donÕt know what that is)                                       1528 non-null   object
 23   Q6 | Chiclets                                                                          1764 non-null   object
 24   Q6 | Coffee Crisp                                                                      1621 non-null   object
 25   Q6 | Creepy Religious comics/Chick Tracts                                              1771 non-null   object
 26   Q6 | Dental paraphenalia                                                               1783 non-null   object
 27   Q6 | Dots                                                                              1746 non-null   object
 28   Q6 | Dove Bars                                                                         1773 non-null   object
 29   Q6 | Fuzzy Peaches                                                                     1652 non-null   object
 30   Q6 | Generic Brand Acetaminophen                                                       1744 non-null   object
 31   Q6 | Glow sticks                                                                       1778 non-null   object
 32   Q6 | Goo Goo Clusters                                                                  1595 non-null   object
 33   Q6 | Good N’ Plenty                                                                    1741 non-null   object
 34   Q6 | Gum from baseball cards                                                           1759 non-null   object
 35   Q6 | Gummy Bears straight up                                                           1778 non-null   object
 36   Q6 | Hard Candy                                                                        1780 non-null   object
 37   Q6 | Healthy Fruit                                                                     1781 non-null   object
 38   Q6 | Heath Bar                                                                         1763 non-null   object
 39   Q6 | Hershey’s Dark Chocolate                                                          1802 non-null   object
 40   Q6 | HersheyÕs Milk Chocolate                                                          1803 non-null   object
 41   Q6 | Hershey’s Kisses                                                                  1797 non-null   object
 42   Q6 | Hugs (actual physical hugs)                                                       1762 non-null   object
 43   Q6 | Jolly Rancher (bad flavor)                                                        1781 non-null   object
 44   Q6 | Jolly Ranchers (good flavor)                                                      1780 non-null   object
 45   Q6 | JoyJoy (Mit Iodine!)                                                              1449 non-null   object
 46   Q6 | Junior Mints                                                                      1777 non-null   object
 47   Q6 | Senior Mints                                                                      1532 non-null   object
 48   Q6 | Kale smoothie                                                                     1731 non-null   object
 49   Q6 | Kinder Happy Hippo                                                                1528 non-null   object
 50   Q6 | Kit Kat                                                                           1801 non-null   object
 51   Q6 | LaffyTaffy                                                                        1739 non-null   object
 52   Q6 | LemonHeads                                                                        1745 non-null   object
 53   Q6 | Licorice (not black)                                                              1789 non-null   object
 54   Q6 | Licorice (yes black)                                                              1790 non-null   object
 55   Q6 | Lindt Truffle                                                                     1757 non-null   object
 56   Q6 | Lollipops                                                                         1784 non-null   object
 57   Q6 | Mars                                                                              1750 non-null   object
 58   Q6 | Maynards                                                                          1450 non-null   object
 59   Q6 | Mike and Ike                                                                      1746 non-null   object
 60   Q6 | Milk Duds                                                                         1782 non-null   object
 61   Q6 | Milky Way                                                                         1787 non-null   object
 62   Q6 | Regular M&Ms                                                                      1800 non-null   object
 63   Q6 | Peanut M&MÕs                                                                      1804 non-null   object
 64   Q6 | Blue M&M’s                                                                        1748 non-null   object
 65   Q6 | Red M&M’s                                                                         1746 non-null   object
 66   Q6 | Green Party M&M’s                                                                 1711 non-null   object
 67   Q6 | Independent M&M’s                                                                 1662 non-null   object
 68   Q6 | Abstained from M&M’ing.                                                           1532 non-null   object
 69   Q6 | Minibags of chips                                                                 1751 non-null   object
 70   Q6 | Mint Kisses                                                                       1699 non-null   object
 71   Q6 | Mint Juleps                                                                       1664 non-null   object
 72   Q6 | Mr. Goodbar                                                                       1735 non-null   object
 73   Q6 | Necco Wafers                                                                      1731 non-null   object
 74   Q6 | Nerds                                                                             1752 non-null   object
 75   Q6 | Nestle Crunch                                                                     1777 non-null   object
 76   Q6 | Now’n’Laters                                                                      1657 non-null   object
 77   Q6 | Peeps                                                                             1765 non-null   object
 78   Q6 | Pencils                                                                           1766 non-null   object
 79   Q6 | Pixy Stix                                                                         1753 non-null   object
 80   Q6 | Real Housewives of Orange County Season 9 Blue-Ray                                1722 non-null   object
 81   Q6 | ReeseÕs Peanut Butter Cups                                                        1796 non-null   object
 82   Q6 | Reese’s Pieces                                                                    1784 non-null   object
 83   Q6 | Reggie Jackson Bar                                                                1460 non-null   object
 84   Q6 | Rolos                                                                             1761 non-null   object
 85   Q6 | Sandwich-sized bags filled with BooBerry Crunch                                   1699 non-null   object
 86   Q6 | Skittles                                                                          1769 non-null   object
 87   Q6 | Smarties (American)                                                               1750 non-null   object
 88   Q6 | Smarties (Commonwealth)                                                           1573 non-null   object
 89   Q6 | Snickers                                                                          1785 non-null   object
 90   Q6 | Sourpatch Kids (i.e. abominations of nature)                                      1737 non-null   object
 91   Q6 | Spotted Dick                                                                      1593 non-null   object
 92   Q6 | Starburst                                                                         1782 non-null   object
 93   Q6 | Sweet Tarts                                                                       1767 non-null   object
 94   Q6 | Swedish Fish                                                                      1760 non-null   object
 95   Q6 | Sweetums (a friend to diabetes)                                                   1472 non-null   object
 96   Q6 | Take 5                                                                            1557 non-null   object
 97   Q6 | Tic Tacs                                                                          1761 non-null   object
 98   Q6 | Those odd marshmallow circus peanut things                                        1739 non-null   object
 99   Q6 | Three Musketeers                                                                  1767 non-null   object
 100  Q6 | Tolberone something or other                                                      1769 non-null   object
 101  Q6 | Trail Mix                                                                         1767 non-null   object
 102  Q6 | Twix                                                                              1785 non-null   object
 103  Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein            1683 non-null   object
 104  Q6 | Vicodin                                                                           1686 non-null   object
 105  Q6 | Whatchamacallit Bars                                                              1652 non-null   object
 106  Q6 | White Bread                                                                       1718 non-null   object
 107  Q6 | Whole Wheat anything                                                              1728 non-null   object
 108  Q6 | York Peppermint Patties                                                           1770 non-null   object
 109  Q7: JOY OTHER                                                                          917 non-null    object
 110  Q8: DESPAIR OTHER                                                                      723 non-null    object
 111  Q9: OTHER COMMENTS                                                                     389 non-null    object
 112  Q10: DRESS                                                                             1730 non-null   object
 113  Unnamed: 113                                                                           9 non-null      object
 114  Q11: DAY                                                                               1750 non-null   object
 115  Q12: MEDIA [Daily Dish]                                                                85 non-null     float64
 116  Q12: MEDIA [Science]                                                                   1376 non-null   float64
 117  Q12: MEDIA [ESPN]                                                                      99 non-null     float64
 118  Q12: MEDIA [Yahoo]                                                                     67 non-null     float64
 119  Click Coordinates (x, y)                                                               1619 non-null   object
dtypes: float64(4), int64(1), object(115)
memory usage: 2.3+ MB
Of course, if you are just looking for the column names, you can use a simple for loop.
    In [23]:
# print a list of column names
for col in candy.columns:
    print(col)
Internal ID
    Q1: GOING OUT?
    Q2: GENDER
    Q3: AGE
    Q4: COUNTRY
    Q5: STATE, PROVINCE, COUNTY, ETC
    Q6 | 100 Grand Bar
    Q6 | Anonymous brown globs that come in black and orange wrappers (a.k.a. Mary Janes)
    Q6 | Any full-sized candy bar
    Q6 | Black Jacks
    Q6 | Bonkers (the candy)
    Q6 | Bonkers (the board game)
    Q6 | Bottle Caps
    Q6 | Box’o’Raisins
    Q6 | Broken glow stick
    Q6 | Butterfinger
    Q6 | Cadbury Creme Eggs
    Q6 | Candy Corn
    Q6 | Candy that is clearly just the stuff given out for free at restaurants
    Q6 | Caramellos
    Q6 | Cash, or other forms of legal tender
    Q6 | Chardonnay
    Q6 | Chick-o-Sticks (we donÕt know what that is)
    Q6 | Chiclets
    Q6 | Coffee Crisp
    Q6 | Creepy Religious comics/Chick Tracts
    Q6 | Dental paraphenalia
    Q6 | Dots
    Q6 | Dove Bars
    Q6 | Fuzzy Peaches
    Q6 | Generic Brand Acetaminophen
    Q6 | Glow sticks
    Q6 | Goo Goo Clusters
    Q6 | Good N’ Plenty
    Q6 | Gum from baseball cards
    Q6 | Gummy Bears straight up
    Q6 | Hard Candy
    Q6 | Healthy Fruit
    Q6 | Heath Bar
    Q6 | Hershey’s Dark Chocolate
    Q6 | HersheyÕs Milk Chocolate
    Q6 | Hershey’s Kisses
    Q6 | Hugs (actual physical hugs)
    Q6 | Jolly Rancher (bad flavor)
    Q6 | Jolly Ranchers (good flavor)
    Q6 | JoyJoy (Mit Iodine!)
    Q6 | Junior Mints
    Q6 | Senior Mints
    Q6 | Kale smoothie
    Q6 | Kinder Happy Hippo
    Q6 | Kit Kat
    Q6 | LaffyTaffy
    Q6 | LemonHeads
    Q6 | Licorice (not black)
    Q6 | Licorice (yes black)
    Q6 | Lindt Truffle
    Q6 | Lollipops
    Q6 | Mars
    Q6 | Maynards
    Q6 | Mike and Ike
    Q6 | Milk Duds
    Q6 | Milky Way
    Q6 | Regular M&Ms
    Q6 | Peanut M&MÕs
    Q6 | Blue M&M’s
    Q6 | Red M&M’s
    Q6 | Green Party M&M’s
    Q6 | Independent M&M’s
    Q6 | Abstained from M&M’ing.
    Q6 | Minibags of chips
    Q6 | Mint Kisses
    Q6 | Mint Juleps
    Q6 | Mr. Goodbar
    Q6 | Necco Wafers
    Q6 | Nerds
    Q6 | Nestle Crunch
    Q6 | Now’n’Laters
    Q6 | Peeps
    Q6 | Pencils
    Q6 | Pixy Stix
    Q6 | Real Housewives of Orange County Season 9 Blue-Ray
    Q6 | ReeseÕs Peanut Butter Cups
    Q6 | Reese’s Pieces
    Q6 | Reggie Jackson Bar
    Q6 | Rolos
    Q6 | Sandwich-sized bags filled with BooBerry Crunch
    Q6 | Skittles
    Q6 | Smarties (American)
    Q6 | Smarties (Commonwealth)
    Q6 | Snickers
    Q6 | Sourpatch Kids (i.e. abominations of nature)
    Q6 | Spotted Dick
    Q6 | Starburst
    Q6 | Sweet Tarts
    Q6 | Swedish Fish
    Q6 | Sweetums (a friend to diabetes)
    Q6 | Take 5
    Q6 | Tic Tacs
    Q6 | Those odd marshmallow circus peanut things
    Q6 | Three Musketeers
    Q6 | Tolberone something or other
    Q6 | Trail Mix
    Q6 | Twix
    Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein
    Q6 | Vicodin
    Q6 | Whatchamacallit Bars
    Q6 | White Bread
    Q6 | Whole Wheat anything
    Q6 | York Peppermint Patties
    Q7: JOY OTHER
    Q8: DESPAIR OTHER
    Q9: OTHER COMMENTS
    Q10: DRESS
    Unnamed: 113
    Q11: DAY
    Q12: MEDIA [Daily Dish]
    Q12: MEDIA [Science]
    Q12: MEDIA [ESPN]
    Q12: MEDIA [Yahoo]
    Click Coordinates (x, y)
    This data set is pretty messy. Your goal is now to perform the following actions to get it to the point where it can be passed to a machine learning model.
    Note: Unless the instructions ask you to do something different, please always update the original candy DataFrame for the exercises below. The automatic grading in CodeGrade will check your final DataFrame and ensure that you have performed all required data manipulations. Also, feel free to add additional cells as needed.
    Data Cleaning
Exercise1: Taking a look at the column names, you may notice that some include the character Õ. This should instead be an apostrophe ('). Rename the column names that include the Õ character, replacing it with an apostrophe.
Remember that you should be updating the candy DataFrame for the tasks listed unless told otherwise.
    In [24]:
# replace the Õ character in the column names with an apostrophe
candy.columns = candy.columns.str.replace('Õ', "'")
Q1: How many duplicated rows are there in the file? Assume that a duplicate is any row that is exactly the same as another one. Save this number as Q1.
    In [25]:
Q1 = candy.duplicated(keep=False).sum()
Q1
Out[25]:
34
Q2: How many duplicated rows are there in the file if we were to assume that a duplicate is any row with the same Internal ID number as another? In other words, even if the other values are different, a row would count as a duplicate if it had the same Internal ID as another. Save this number as Q2.
    In [26]:
Q2 = candy['Internal ID'].duplicated(keep=False).sum()
Q2
Out[26]:
38
Exercise2: Drop any duplicates from the candy DataFrame. Duplicates are to be defined as any row with the same Internal ID as another. Use the default setting that keeps the first record from the duplicates.
    In [27]:
candy = candy.drop_duplicates(subset='Internal ID')
candy
Out[27]:
[DataFrame output truncated: the first and last five rows of the deduplicated data]
    2460 rows × 120 columns
    Exercise3: Your next task is to remove the following columns from the candy DataFrame as we will not use these columns for this project. You are welcome to do further analysis on these columns but do not save your analysis in this notebook.
    Remove the following columns: Internal ID, Q5: STATE, PROVINCE, COUNTY, ETC, Q7: JOY OTHER, Q8: DESPAIR OTHER, Q9: OTHER COMMENTS, Unnamed: 113, Click Coordinates (x, y).
    In [28]:
candy_columns_remove = ['Internal ID', 'Q5: STATE, PROVINCE, COUNTY, ETC', 'Q7: JOY OTHER',
                        'Q8: DESPAIR OTHER', 'Q9: OTHER COMMENTS', 'Unnamed: 113',
                        'Click Coordinates (x, y)']
candy.drop(columns=candy_columns_remove, inplace=True)
Code Check: As a check for the above exercises, the shape of your data should now be: (2460, 113)
    In [30]:
candy.shape
Out[30]:
(2460, 113)
Exercise4: Let's now take a look at the Q2: GENDER column, since this will be what we are trying to predict. Take a look at the value counts for this column.
    In [31]:
# note: do not assign this back to candy, or you will overwrite the DataFrame with a Series
candy['Q2: GENDER'].value_counts()
Out[31]:
    Q2: GENDER
    Male 1466
    Female 839
    I’d rather not say 83
    Other 30
Name: count, dtype: int64
Q3: How many missing values are in the Q2: GENDER column? Save this as Q3.
    In [37]:
Q3 = candy['Q2: GENDER'].isnull().sum()
Q3
Out[37]:
42
    Exercise5: Using the candy DataFrame, remove all rows with a missing value in the Q2: GENDER column. (This should overwrite and be saved as candy like you have been doing for the previous exercises.)
    In [ ]:
### ENTER CODE HERE ###
Exercise6: For this project, we want to use binary classification, which predicts one of two classes. We want to predict between Male and Female. Because of this, select only the rows that contain either Male or Female in the Q2: GENDER column.
    In [ ]:
### ENTER CODE HERE ###
Code Check: As a check for the above exercises, the shape of your data should now be: (2305, 113)
    In [ ]:
### ENTER CODE HERE ###
Now, let's work on filling in some of the missing data. There are easier ways to do this with the sklearn library, which you will learn more about in the machine learning classes, but for now, let's practice our Pandas skills.
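As a warm-up for the exercises below, here is a minimal sketch of counting and filling missing values with fillna(). The column names and values here are made up for illustration; they are not the assignment answer.

```python
import pandas as pd
import numpy as np

# small hypothetical survey frame with gaps
toy = pd.DataFrame({
    'going_out': ['Yes', np.nan, 'No', np.nan],
    'candy_a': ['JOY', np.nan, 'MEH', 'DESPAIR'],
})

# count the gaps, then fill each column with a chosen default
missing = toy['going_out'].isnull().sum()         # 2 missing answers
toy['going_out'] = toy['going_out'].fillna('No')  # treat non-response as not going out
toy['candy_a'] = toy['candy_a'].fillna('NO_ANSWER')

print(missing)                                    # 2
print(toy['going_out'].value_counts().to_dict())  # {'No': 3, 'Yes': 1}
```

The same isnull().sum() / fillna() pattern scales to the real candy columns.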
    Q4: How many missing values are in the Q1: GOING OUT? column? Save this number as Q4.
    In [ ]:
### ENTER CODE HERE ###
Exercise7: For a future analysis question, we are interested in those that we know will definitely go out for Halloween. Because of this, fill all missing values in the Q1: GOING OUT? column with a No value.
    In [ ]:
### ENTER CODE HERE ###
Code Check: Double check your above work and look at the value counts for the Q1: GOING OUT? column. Make sure that you only have "Yes" and "No" values and that they add up to 2305, which is the number of rows you should have at this step in the assignment.
    In [ ]:
### ENTER CODE HERE ###
Exercise8: To get ready for the next step, let's practice selecting a range of columns, going from Q6 | 100 Grand Bar to Q11: DAY. Save this slice as candy_slice.
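If you want a feel for label-based column slicing first, here is a toy sketch with made-up column names (not the assignment data). Note that .loc slicing is inclusive of both endpoints, unlike positional slicing.

```python
import pandas as pd

# toy frame with made-up column names
toy = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'c', 'd'])

# label-based slicing with .loc includes BOTH endpoints
toy_slice = toy.loc[:, 'b':'c']
print(list(toy_slice.columns))  # ['b', 'c']
```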
    In [ ]:
### ENTER CODE HERE ###
Exercise9: Now that you know how to slice the data, fill any missing values in the candy DataFrame for those columns (going from Q6 | 100 Grand Bar to Q11: DAY) with the string NO_ANSWER. Make sure you are working with the candy DataFrame and not the candy_slice DataFrame.
    In [ ]:
### ENTER CODE HERE ###
Exercise10: For all four Q12: MEDIA columns in the candy DataFrame, fill the missing values with 0.0.
    In [ ]:
### ENTER CODE HERE ###
Code Check: As a check for the above code, make sure that there are no missing values left for the Q6 to Q12 columns.
    In [ ]:
### ENTER CODE HERE ###
Now, let's look at the very messy Q4: COUNTRY column and see what we can do about it. First, run the code below to look at the different unique values in the data.
    In [ ]:
# check unique values
candy['Q4: COUNTRY'].unique()
Code Check: As a check for the Country column, check how many unique values are in the data. You should have 115 different unique values for the Q4: COUNTRY column. If you have more or fewer than this number, double check your work above.
    In [ ]:
# check the number of unique values in Q4: COUNTRY
candy['Q4: COUNTRY'].nunique()
We want to clean up this data to only include four areas: USA, Canada, Europe (the continent, not necessarily the European Union), and Other.
    There are different ways to do this, but I would suggest that you look at the way we handled the property_type column in the vienna data set and the code in the amenities_to_columns() function in the module notebook. These might be a little harder than those examples but they should give you a good baseline approach.
    You could use replace() for this step, and it is fine if you ultimately decide to do this, but I would suggest that you come up with a solution similar to what was shown in the vienna data cleaning notebook. This method would be much more robust if you had many more values in your data.
    I suggest the following order for this section to make it easier:
    Fill in all missing values with Other
Code Australia as Other (doing this step first helps because "australia" contains the substring "us", which can cause problems in the next step if you use string methods)
    Combine all USA entries together as USA
    Combine Canadian entries as CA
    Combine European entries as EU
    Everything else gets coded as Other
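The mask-based approach those steps describe can be sketched on made-up values (these are not the real survey entries, and the matching patterns are illustrative, not the graded answer): normalize the text once, then assign labels through boolean masks instead of listing every raw spelling in replace().

```python
import pandas as pd

# hypothetical raw answers
country = pd.Series(['USA ', 'usa', 'United States', 'Canada`', 'canada ', 'UK', None])

# 1) fill gaps first so the string methods below never hit NaN
country = country.fillna('Other')

# 2) normalize once, then build boolean masks on the cleaned text
#    (in the real data you would code Australia to Other before this,
#     since 'australia' also contains the substring 'us')
clean = country.str.strip().str.lower()
country[clean.str.contains('us|states|america')] = 'USA'
country[clean.str.startswith('can')] = 'CA'
country[clean.isin(['uk', 'united kingdom', 'england'])] = 'EU'

print(sorted(country.unique()))  # ['CA', 'EU', 'Other', 'USA']
```

Because the masks work on normalized text, this style keeps working even if new misspellings show up in the data.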
    Exercise11: Fill the missing values in the Q4: COUNTRY column with Other.
    In [ ]:
### ENTER CODE HERE ###
Code Check: Double check that there are no missing values in the Q4: COUNTRY column. Also, double check the unique values to make sure that "Other" was added. This should mean that you now have 116 unique values for this column.
    In [ ]:
# check missing Q4 values
### ENTER CODE HERE ###
In [ ]:
# check unique values
### ENTER CODE HERE ###
Exercise12: Combine all Australia entries into Other. Watch out for capitalization issues. You should have 114 unique values after this step.
    In [ ]:
### ENTER CODE HERE ###
In [ ]:
# check number of unique values
### ENTER CODE HERE ###
Exercise13: Combine all United States entries together into USA. These would include the following:
    'USA ', 'USA', 'us', 'usa', 'Us', 'US', 'Murica', 'United States', 'united states', 'Usa', 'United States ', 'United staes', 'United States of America', 'United states', 'u.s.a.', 'United States of America ', 'america', 'U.S.A.', 'unhinged states', 'united states of america', 'US of A', 'The United States', 'North Carolina ', 'Unied States', 'U S', 'u.s.', 'The United States of America', 'unite states','U.S.', 'USA? Hard to tell anymore..', "'merica", 'United State', 'United Sates', 'California', 'Unites States', 'USa', 'I pretend to be from Canada, but I am really from the United States.', 'Usa ', 'United Stated', 'New Jersey', 'United ststes', 'America', 'United Statss', 'murrika', 'USA! USA! USA!', 'USAA', 'united States ', 'N. America', 'USSA', 'U.S. ', 'u s a', 'United Statea', 'united ststes', 'USA USA USA!!!!'
    In [ ]:
### ENTER CODE HERE ###
Code Check: You should be merging the above values together into 1 (USA) and be left with 61 unique values after this step (including the USA value).
    In [ ]:
# check unique values
### ENTER CODE HERE ###
Exercise14: Combine the Canadian entries (both upper and lower case) and label them as CA. Be careful, as there are extra spaces, characters, and misspellings (Can, Canae).
    These values include:
    'canada', 'Canada', 'canada ', 'Canada ', 'Can', 'Canae', 'Canada`', 'CANADA'
    In [ ]:
### ENTER CODE HERE ###
Code Check: You should be merging 8 values together into 1 (CA) and be left with 54 unique values after this step (including the CA value).
    In [ ]:
# check unique values
### ENTER CODE HERE ###
Exercise15: Combine the European entries and label them as EU. Again, we are looking at the continent of Europe and not necessarily the countries that are part of the European Union.
    These values include:
    'uk', 'United Kingdom', 'England', 'UK', 'france', 'finland', 'Netherlands', 'germany', 'Europe', 'U.K. ', 'Greece', 'France', 'Ireland', 'Uk', 'Germany', 'Scotland', 'UK ', 'Denmark', 'France ', 'Switzerland', 'Scotland ', 'The Netherlands', 'Ireland ', 'spain', 'Sweden', 'United kingdom'
    In [ ]:
### ENTER CODE HERE ###
Code Check: You should be merging 26 entries together and be left with 29 unique values after this step (including the EU value).
    In [ ]:
# check unique values
### ENTER CODE HERE ###
Exercise16: Finally, combine the other entries and label them as Other.
    In [ ]:
### ENTER CODE HERE ###
Code Check: Double check that you only have four unique values in the Q4: COUNTRY column: USA, Other, CA, and EU.
    In [ ]:
# check values
### ENTER CODE HERE ###
Q5: To double check that everything was coded correctly, save the value counts of the Q4: COUNTRY column as Q5. You can check this once you run your CodeGrade check.
    In [ ]:
### ENTER CODE HERE ###
We now want to look at the Q3: AGE column. Let's look at all the unique values.
    In [ ]:
# check unique age values
### ENTER CODE HERE ###
Again, this is a pretty messy column of data. This is a good example of why those who create online surveys shouldn't allow respondents to put just any value into a field. But it is now our job to clean this up.
    Exercise17: Your task is to put these values into the following categorical bins: unknown, 17 and under, 18-25, 26-35, 36-45, 46-55, and 56+.
    The category labels should exactly match the above.
    Missing values should be replaced with the unknown category
    To make things easier and avoid ambiguity, let’s say that any value with text, even if we could determine the age, will be binned with the unknown category. For example: sixty-nine should be coded as unknown, 45-55 should be coded as unknown, 59 on the day after Halloween should be coded as unknown, etc.
Ensure that the category labels are unordered, but reorder the categories so that unknown is listed in the first position. This is not strictly needed but will help us grade your assignment. The categories should be listed as follows: Index(['unknown', '17 and under', '18-25', '26-35', '36-45', '46-55', '56+'], dtype='object')
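The binning mechanics can be sketched on made-up ages (a minimal illustration, not the assignment answer): pd.cut assigns numeric values to labeled intervals, and the .cat accessor manages the category list.

```python
import pandas as pd
import numpy as np

# made-up ages after the non-numeric values have been set to NaN
ages = pd.Series([12.0, 19.0, 30.0, 44.0, 50.0, 70.0, np.nan])

# right-inclusive bins: (0, 17], (17, 25], (25, 35], (35, 45], (45, 55], (55, inf)
labels = ['17 and under', '18-25', '26-35', '36-45', '46-55', '56+']
binned = pd.cut(ages, bins=[0, 17, 25, 35, 45, 55, np.inf],
                labels=labels, ordered=False)

# add an 'unknown' category, use it for the NaNs, then move it to the front
binned = binned.cat.add_categories(['unknown']).fillna('unknown')
binned = binned.cat.reorder_categories(['unknown'] + labels)
print(list(binned.cat.categories))
```

Because the bins are right-inclusive, an age of exactly 17 lands in "17 and under" and 25 lands in "18-25".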
    First, we will replace any non-numeric value (those with text as mentioned above) with a missing value. This will allow you to turn the other values into floats so that you can bin them. Just don’t forget to code the missing values as unknown when you are done. To replace the non-numeric values, run the following code:
    In [ ]:
# create True/False index
age_index = candy['Q3: AGE'].str.isnumeric()
# for the index, fill missing values with False
age_index = age_index.fillna(False)
# select the Age column for only the False values from the index and code them as missing
candy.loc[~age_index, 'Q3: AGE'] = np.nan
In [ ]:
### ENTER REST OF CODE HERE ###
Exercise18: Double check yourself by checking the categories for the Q3: AGE column. It should output: Index(['unknown', '17 and under', '18-25', '26-35', '36-45', '46-55', '56+'], dtype='object')
    In [ ]:
# double check categories
### ENTER CODE HERE ###
Code Check: To double check that your above binning worked correctly, your value counts (sorted by the index) should be as follows:
    unknown: 60
    17 and under: 49
    18-25: 85
    26-35: 520
    36-45: 768
    46-55: 525
    56+: 298
    In [ ]:
### ENTER CODE HERE ###
You can also double check some of your work up to this point by making sure that there are no missing values in the data set anymore.
    Code Check: Check to see if there are any missing values in the data set. Your output should show 0.
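One common idiom for this check is to chain two sums, which totals the missing values across every column; a sketch on a toy frame:

```python
import pandas as pd

# toy frame with no missing values
df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

# isna() marks missing cells, the first sum() totals per column,
# the second sum() totals across all columns
print(df.isna().sum().sum())  # 0 when nothing is missing
```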
    In [ ]:
### ENTER CODE HERE ###
Exercise 19: Before you move on to the next section, reset the index for candy, ensuring that it goes from 0 to n-1.
    In [ ]:
### ENTER CODE HERE ###
Feature Engineering
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models. In this section, we will create a new column called "net_feelies" (calculated by the survey authors as the total JOY count minus the total DESPAIR count).
    First, let’s narrow down our data to make working with it easier.
    Exercise20: Select only the Q6 candy columns (Q6 | 100 Grand Bar through Q6 | York Peppermint Patties) in the data set and save this as a new DataFrame called candy_reduced.
    In [ ]:
### ENTER CODE HERE ###
Next, we will create two Series, one with JOY counts and one with DESPAIR counts, to add to our candy_reduced data.
Exercise 21: Create a Series called joy_count that lists the total counts of JOY for each column, keeping it in the same order as the columns in the candy_reduced DataFrame. Hint: A simple way to do this is to filter the entire DataFrame for JOY values and then use count(). See this stackoverflow question and answers.
    In [ ]:
### ENTER CODE HERE ###
Exercise 22: Same as above, except you will create a Series called despair_count that lists the total counts of DESPAIR for each column.
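The filter-then-count pattern from the hint can be sketched on a made-up two-column frame (the column names are illustrative, not the real Q6 columns):

```python
import pandas as pd

# toy stand-in for two Q6 candy columns
toy = pd.DataFrame({
    'Q6 | Candy A': ['JOY', 'DESPAIR', 'JOY', None],
    'Q6 | Candy B': ['MEH', 'JOY', 'DESPAIR', 'DESPAIR'],
})

# filtering keeps matching cells and turns everything else into NaN;
# count() then tallies the non-missing cells per column
joy_count = toy[toy == 'JOY'].count()
despair_count = toy[toy == 'DESPAIR'].count()
print(joy_count.tolist())      # [2, 1]
print(despair_count.tolist())  # [1, 2]
```

An equivalent one-liner is (toy == 'JOY').sum(), which sums the True values of the boolean comparison directly.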
    In [ ]:
### ENTER CODE HERE ###
Exercise 23: Take the transpose of the candy_reduced DataFrame and save the transposed data as candy_reduced_transpose.
    In [ ]:
### ENTER CODE HERE ###
Exercise 24: Add a new column called "joy_count" using the joy_count Series above, and a new column called "despair_count" using the despair_count Series above, to the candy_reduced_transpose DataFrame.
    In [ ]:
### ENTER CODE HERE ###
Exercise 25: Add a new column to the candy_reduced_transpose DataFrame called "net_feelies" that takes the joy_count column and subtracts the despair_count column.
    In [ ]:
### ENTER CODE HERE ###
Exercise 26: Select only the joy_count, despair_count, and net_feelies columns from the candy_reduced_transpose DataFrame. Sort this DataFrame in descending order by net_feelies and save it as candy_net_sorted.
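Exercises 23 through 26 chain together; here is a minimal sketch of the whole chain on made-up data. The key point is that a Series assigned as a new column aligns by index, so the candy-name index of the count Series lines up with the row labels of the transposed frame:

```python
import pandas as pd

# toy stand-in for candy_reduced and its per-column counts
toy = pd.DataFrame({
    'Q6 | Candy A': ['JOY', 'JOY', 'DESPAIR'],
    'Q6 | Candy B': ['DESPAIR', 'DESPAIR', 'JOY'],
})
joy_count = toy[toy == 'JOY'].count()
despair_count = toy[toy == 'DESPAIR'].count()

# transpose so candies become rows, then attach the counts as columns;
# the Series index (candy names) aligns with the transposed row labels
toy_t = toy.T
toy_t['joy_count'] = joy_count
toy_t['despair_count'] = despair_count
toy_t['net_feelies'] = toy_t['joy_count'] - toy_t['despair_count']

# keep only the count columns, sorted by net_feelies, largest first
net_sorted = toy_t[['joy_count', 'despair_count', 'net_feelies']].sort_values(
    'net_feelies', ascending=False)
print(net_sorted['net_feelies'].tolist())  # [1, -1]
```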
    In [ ]:
### ENTER CODE HERE ###
Encoding
We now want to get the candy DataFrame ready to run a machine learning algorithm that determines whether we can predict a person's gender based on the candy they prefer.
    You will learn more about this in the machine learning classes, but some algorithms work exclusively with numeric values. We will now turn all of our values into numeric values. There are easier ways to do this with sklearn, which you will study in later courses, but we will use Pandas to perform these exercises for further practice.
Exercise 27: For grading purposes, we want to leave the candy DataFrame as is. Make a copy of the candy DataFrame and save this new DataFrame as candy_encode.
    In [ ]:
### ENTER CODE HERE ###
Exercise 28: For the candy_encode DataFrame, replace any Female values with 0 and any Male values with 1.
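One way to do this kind of substitution is replace() with a mapping dict; a sketch on a made-up gender Series (the real column is Q2: GENDER in candy_encode):

```python
import pandas as pd

# toy gender column; replace() maps the two labels to numbers and
# leaves any other value untouched
gender = pd.Series(['Female', 'Male', 'Male', 'Female'])
encoded = gender.replace({'Female': 0, 'Male': 1})
print(encoded.tolist())  # [0, 1, 1, 0]
```

Calling replace() on the whole DataFrame with the same dict works too, since only cells that match a key are changed.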
    In [ ]:
### ENTER CODE HERE ###
Exercise 29: Again, you will learn more about this later, but we need to separate the column we want to predict (called the response) from the columns we will use to make the predictions (called the features). For both of the items below, make sure that the index is reset and goes from 0 to n-1.
    Select only the Q2: GENDER column from candy_encode and save this as candy_response. Note: This should be a Series.
Drop the Q2: GENDER column from candy_encode and save the remaining columns as your features DataFrame.
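A sketch of the response/features split on a toy frame; the variable names response and features are illustrative stand-ins for whatever names the assignment asks for:

```python
import pandas as pd

# toy stand-in for candy_encode after encoding; a non-default
# index shows why the reset matters
toy = pd.DataFrame({'Q2: GENDER': [0, 1, 1],
                    'Q6 | Candy A': ['JOY', 'MEH', 'DESPAIR']},
                   index=[5, 7, 9])

# response: just the column to predict, as a Series with a fresh 0..n-1 index
response = toy['Q2: GENDER'].reset_index(drop=True)

# features: everything else, with the same fresh index
features = toy.drop(columns='Q2: GENDER').reset_index(drop=True)
print(response.index.tolist())  # [0, 1, 2]
```

drop=True discards the old index instead of inserting it back as a column, which keeps the features frame purely numeric-ready.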