# Data Structures and Algorithms

Data Structures and Algorithms

topic : Data Structures and Algorithms

Data Structures and Algorithms

22:544:613:SEC03 & 26:198:685SEC03 Data Structures and Algorithms

Homework 4: Data Structures and Algorithms

Instructor: Farid Alizadeh Due Date: Sunday December 12 , 2021, at 11:50PM

Note: This is a strict dealine for Data Structures and Algorithms assignment

last updated on December 4, 2021

Please answer the following questions in electronic form and upload and submit your files to the Canvas Data Structures and Algorithms Assignment site before the due date. Make sure to click on the submit button.

You have two choices for the format.

First Choice: Submit in pdf format all theoretical questions. You may ei- ther typeset or hand-write your answers. In case of hand-write transform your work into pdf using apps such as CamScan. You should have only a single pdf file. For Python scripts, include the text of the script as a separate *.py file. You should have a single file for Questions 2 & 7. For other questions, if you use Python to compute your answers include it with the pdf text above.

Second Choice: You can use a single Python Notebook *.ipynb file. You should use the text format of the notebook for the theoretical results, and the Python script for the programming parts.

1

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

1. Consider the following set of symbols, along with their frequency of oc- currence in a document:

‘a’ ‘b’ ‘c’ ‘d’ ‘e’ 5 10 3 17 24

Step by step show how the Huffman coding will work in this case. Show the status of the binary tree at each step. Show the final tree and the final binary encoding of each symbol.

2. The knapsack problem is an optimization problem that attempts to find the best combination of items. To motivate the problem, assume that a vendor-hiker can carry a number of items with him for a steep hike. He plans to sell these items (for example, bottles of water and soda, cookies, snacks, etc.) The items are numbered 1 through n. Each item i has selling price pi and a weight of wi. The hiker can only carry with him a total weight of W kilograms. Also for each item he has only one, so he has to decide whether to take the item or not. Therefore, the problem is for item i whether to take it with him (xi = 1) or not (xi = 0.) so that he maximizes his revenue, without the total weights of items he carries with hom exceeding his total weight of W. We assume that demand is high and the hiker can sell all of his items.

This problem can be formulated as an optimization problem as follows:

max p1x1 + · · ·+ pnxn s.t. w1x1 + · · ·+ wnxn ≤ W

xi ∈ {0, 1}

For example, let us assume that the items are as in the following table:

item: water bottle bottle of soda pack of chips pack of trail mix value: $4 $3 $2.5 $1.2

weight: 4 3 2 1

And let’s assume that the hiker can carry a total of 5kg. Then we want to solve how many of each item the hiker wishes to take with him to get the maximum revenue.

2a) Propose a greedy algorithm for solving this problem. What is the complexity of your algorithm? Give an example that shows that your greedy algorithm actually does not always find the optimal solution.

2

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

2b) Now develop a dynamic programming approach. To do so, since our goal is to find the optimal values of x1, x2, . . . , xn, define

V(k, w) = the best solution if we can take only

items 1…k and can have maximum value weight of w

Here by “items 1 … k” we mean some arbitrary order which is set at the beginning and is fixed throughout the execution of the algorithm. First, find what is the value V(1, w). In other words, if we could only carry item 1 with weight w1 and price p1, then how may of this item would we carry and what would be our revenue?

2c) Now show that the following recurrence relation is correct:

V(k, w) = max { V(k − 1, w), V(k, w − wk) + pk

} In words: The optimal value of using the first k items and weight limit of w is

• either equal to the optimal value using the first k−1 items (that is the new kth item is not used at all,)

• or it uses the kth item at least once, so the remainder now is using the items 1 through k but with only weight limit of w−wk. This remainder must also be optimal for these parameters (the dynamic programming principle.)

2d) The recurrence above gives only the optimal value of revenue. To find the actual value of how many of each item we need to use, that is to determine the variables xi, we first need to set up an- other variable x(k, w) indicating whether item k is taken or not, when maximum weight capacity is w and only the first k items are used. Then x(k, w) = 1 if item k is used with maximum capacity w, and x(k, w) = 0 when item k is not used. The decision whether to use item k or not is made when building the value table V(k, w).

From the values of x(k, w) it is possible to determine x1, x2, . . . , xn.

2e) Using the recurrence for V(k, w), and the definition of x(k, w) devise a dynamic programming algorithm as follows and apply your algorithm to the example given above along with the table.

• First, set up a values table to record V(k, w) with rows indexed from 1 to k. In row i we can only use items 1 through i. The columns are from 1 to w (or actually from v = mini wi, because if the hiker’s maximum capacity is less than the lightest item, then he cannot take anything with him.)

• Then explain how you would fill this table row by row, starting from row 1. In the table in cell [k, w] you should write both x(k, w) and the optimal value V(k, w).

3

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

p k ↓ w → 1 2 3 4 5 3 1 x, v x, v x, v x, v x, v 3 2 x, v x, v x, v x, v x, v 2.5 3 x, v x, v x, v x, v x, v 1.2 4 x, v x, v x, v x, v x, v

Here, in each cell k, w, the red item is x(k, w) and the black item is V(k, w)

2f) Explain how from the values of x(k, w) in the table you can recover the actual values of xi the amount of each item to take. (Hint: Start from the last cell in the table and work backwards.)

2g) Give a time complexity analysis of the dynamic programming algo- rithm. Is your algorithm polynomial time?

2h) Write a Python function knapsack(weights, prices, max weight) which takes a list of weights, a list of prices, and a maximum capacity and returns an integer list amount which indicates the amount xi of each item taken, the total revenue rev, the vTble and xTbl the tables for the x(k, w) and V(k, w). Inside the function you should set up the tables for V(k, w) and x(k, w) and carry out the computations as explained above.

3. Consider the following graph with weights on the edges.

4

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

s

a

b

c

d

e

f

g

h

1

2

1

3

5

2

8 3

1 6

2

9

3

6

3a) show the adjacency list representation of the graph.

3b) Run Kruskal’s algorithm to find the smallest spanning tree. Assume the set of edges are sorted in increasing order of weight (so, no heap is needed.) At each iteration show the status of the forest as repre- sented by the data structure for union-find operations. Show the tree representation of sets only in tree format. Also show the status of this tree representation after every union-find operation.

3c) Now run Prim’s “grow-a-tree” algorithm starting at node ‘s’. At each iteration make sure that at each stage to show the partially constructed tree, and the frontier set.

3d) Now apply Dijkstra’s algorithm find all the shortest paths from vertex ‘s’ to all other vertices. Again, at each stage show the shortest path to the nodes discovered, and highlight the frontier set.

4. Run the Bellman-Ford algorithm to find the shortest path from vertex ‘a’ in the following graph to all other nodes, or show that there is a negative- length cycle.

At each iteration show the ordered set of edges, and the relabeling and the predecessor of each vertex. Use the lexical ordering of the edges, for

5

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

instance edge a-b comes before b-c etc. Determine if the graph has a negative cycle or not, and if so, how does the Bellman-Ford algorithm finds it.

a b c

def

2 -1

1

-3

1

-1 8

3

1

5. Consider the following 2-3 tree:

na

k

mh

g

b

d

f

c e i j l

5a) Insert the letter ‘i’ into this tree, and show the status of the tree step by step until the final result. At each step also show the corresponding red-black tree with red links leaning left. Show all the rotations and change of color on the links which are needed.

5b) Apply DeleteMin operation on the resulting tree. Again show the status of the 2-3 tree step by step. (No need to show the corresponding

6

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

red-black tree operations.) Make sure to show how you backtrack to eliminate all the illegal “4-nodes” that may have arisen in the process.

6. Consider a hash table of length m = 11 where data are stored using chaining. Using the multiplicative method, that is the hash function

h(k) = bm(kA mod 1)c with A = √ 5 − 1

2

6a) Insert the keys 1, 2, . . . , 20 into the hash table. Show the final status of table. (You may write a very short script to find the hash values of 1,2,. . . ,20. You may also run this homework using a Python script. However, this is optional. If you do, include your Pythion code as well.)

6b) Search for number 21, and show the cells you have to search to realize it is not in the table.

6c) Delete 1 from the table and show the resulting table.

7. Mini-Python project: In this assignment you will run an experiment to compare several hash functions. Suppose the set of items we are going to store in the table is the set of integers [1…N]; of course with possible repetitions. Suppose we allocate a table of length m for this purpose.

7a) The hash functions we are considering are:

h1(k) = bm(kA1 mod 1)c with A1 = √ 5 − 1

2

h2(k) = bm(kA2 mod 1)c with A2 = ln(2) h3(k) = bm(kA3 mod 1)c with A3 = 0.7 h4(k) = k mod m

7b) Generate N random integers from the range [1, . . . , N] (with repeti- tion) and put them in a list called L. Experiment with N = 1000, and m = 11.

7c) Map the numbers in L to an index using the hash functions hi(k) for i = 1, 2, 3, 4 as above. Put these values in lists called Hi for i = 1, 2, 3, 4.

7d) Compute the histogram of each array, which produces the frequency count of each index using the numpy.histogram function.

7

22:544:613SEC03 & 26:711:685SEC03, Fall 2021 Homework 4 Due date: 12/12/2021 at 11:50PM

7e) Plot each of the four histograms using matplotlib.pyplot function hist.

7f) Compare these four histograms and see if they all “look” reasonably uniform. Does any of them behave poorly?

Do you have a similar assignment that you would wish our expert writers to handle for you? Place an order with us on assignmentsproficient.com and expect the best grades and help from qualified writers