## Description

CS 341: Algorithms

Homework 4

You are allowed to discuss with others but are not allowed to use any references other than the course

notes and the three reference books. Please list your collaborators for each question. You must write your

own solutions. See the course outline for the homework policy.

There are totally 52 marks. The full mark is 50. This homework is counted 10% of the course grade.

1. Programming Problem: Keyboard Edit Distance (20 marks)

In this problem, we are given two strings A and B, and we want to write a program to measure the

“keyboard edit distance” between them. Specifically, starting from string A, we want to calculate

the minimum cost of obtaining string B through a series of insertions, deletions, and single-character

substitutions, where each operation has a prescribed cost. (Strictly speaking, it is not a “distance”,

because starting from string A and starting from string B could result in different costs.)

A and B can only contain uppercase English letters, commas (,), periods (.), and spaces. The cost

of inserting any character is I. The cost of deleting any character is D. The cost of changing one

character to another depends on the distance between the two characters on the keyboard – the closer

the two characters are, the more likely it is to mistype one as the other.

Figure 1: The QWERTY keyboard and the overlaying triangular grid. Note that some of the characters will

not appear in the input strings, so they are not covered by the grid.

Refer to Figure 1. For this problem, we use the QWERTY keybaord layout. We draw a triangular

grid over it, where each vertex corresponds to a character (except the space which spans six vertices).

The cost of changing one character to another is then given by the shortest distance between them on

the grid. For example, the cost of changing from ‘A’ to ‘S’ is 1, the cost of changing from ‘Z’ to ‘P’

is 9, and the cost of changing from ‘ ’ to ‘.’ is 2.

Find the minimum cost to obtain string B from string A through a series of insertions, deletions, and

single-character substitutions.

Input: The first line of input contains four integers: |A| (the length of A), |B|, I, and D. The second

line of input contains the string A. The third and final line of input contains the string B.

A and B can only contain uppercase English letters, commas (,), periods (.), and spaces.

It is guaranteed that 1 ≤ |A|, |B| ≤ 2000 and 1 ≤ I, D ≤ 1000.

1

Output: Output one single integer, the minimum cost of obtaining string B from string A.

Sample Input 1:

5 6 1000 1

ALICE

SPICES

Sample Output 1:

1002

Explanation: ALICE → SLICE → SPICE → SPICES. The cost is 1 + 1 + 1000 = 1002.

Sample Input 2:

12 12 4 3

ASASASASASAS

SASASASASASA

Sample Output 2:

7

Explanation: ASASASASASAS → SASASASASAS → SASASASASASA. The cost is 3 + 4 = 7.

2. Written Problem: Sketching Data with Outliers (12 marks)

We now examine how allowing outliers changes the data sketching problem from HW3. Recall that

our input in this problem is a list of data points (xi

, yi) with increasing xi and an error tolerance ?. In

the original problem, we were required to partition the x-coordinates into intervals from sj to fj and

to compute a value hj for each interval so that

max

i:sj≤i≤fj

|hj − yi

| ≤ ?.

In the variant we now consider, we can proclaim some subset of the data points to be outliers. We

then remove the outliers from the data set, and solve the original problem on the remaining points. If

A ⊂ {1, . . . , n} is the set of outliers, this is equivalent to requiring that

max

i:sj≤i≤fj and i6∈A

|hj − yi

| ≤ ?. (1)

The problem would be too easy if we were allowed to proclaim every point an outlier. Rather, we are

told that we can label at most k points outliers. Our goal is to find a set A of size at most k and a

partition of the input into intervals that satisfy (1) while using as few intervals as possible.

Design the fastest algorithm that you can to solve this problem. Prove the correctness and analyze the

time complexity.

(Hint: You may begin by considering the following simpler variant of this problem: how can we find

the smallest set A and value h so that

max

i:1≤i≤n,i6∈A

|h − yi

| ≤ ?.

Once you can solve this, you should be able to solve the whole problem by dynamic programming.)

2

3. Written Problem: Covering Set (10 marks)

Given an undirected graph G = (V, E), a subset S ⊆ V is a covering set if for every vertex v ∈ V − S

there exists u ∈ S such that uv ∈ E (i.e. every vertex v ∈ V − S has a neighbor in S). We are

interested in finding a covering set of minimum total weight, but this problem is NP-hard in general

graphs. Show that this problem is easy in trees.

Input: A tree T = (V, E) in the adjacency list representation, a weight wv for each vertex v ∈ V .

Output: A covering set S ⊆ V that minimizes the total weight P

v∈S wv.

Figure 2: The black vertices form a minimum covering set assuming all weights are one.

Design the fastest algorithm that you can to solve this problem. Prove the correctness and analyze the

time complexity.

4. Written Problem: Running to Meetings (10 marks)

A professor has many meetings to go to every day. In fact, it is impossible to get to all of them. We

will figure out how to get our professor to as many meetings as possible.

Our problem has two inputs: a list of meetings with their locations and a map of campus. We will

model the campus as a weighted complete graph G = (V, E, l), where V is the set of possible locations

of meetings, E consists of all possible pairs a, b ∈ V , and l gives the (shortest path) distance between

vertices. We will assume that the time required to get from vertex a to vertex b equals the distance

between them, l(a, b). Meeting i is specified by a vertex, ai

, the time that the meeting starts, si

, and

the time that meeting ends, fi

.

Design an algorithm that takes as input the graph G and the list of meetings, and returns a subset of

meetings for the professor to attend. The professor must be able to get from one meeting to the next.

That is, if the professor goes from meeting i to meeting j, it must be the case that fi + l(ai

, aj ) ≤ sj .

You should assume that the professor’s first meeting is with a coffee shop at vertex a0, that the professor

can always get to this meeting, and that it ends at time f0. Design the fastest algorithm that you can

that gets the professor to as many meetings as possible. Prove the correctness and analyze the time

complexity.

3