AMAT 587: Algorithms for Data Science
Spring 2020, Class #10186
MW 11:40-1:00, Zoom
Instructor: Michael Lesnick
mlesnick [at] albany [dot] [the usual thing]
Office Hours: Office Hours: T, Th 1:00-2:00, and by appointment.
Send Anonymous Feedback
About this Course:
This course is intended primarily for students in Albany's Data Science
MS program in their second year. It may also be suitable for some
first-year students with a strong background in math, and for math
Ph.D. students interested in computation. There are no formal
prerequisites, but mathematical maturity and comfort with proofs is required.
The course will be taught in an online-synchronous format via Zoom.
Students are expected to attend and partipate, to the extent
possible. All lectures will be recorded and made available. Course
materials, including Zoom links for class and office hours, will be hosted on Blackboard.
The first part of the course will cover general background on
algorithms, with an emphasis on topics of interest to data
scientists. For some of the material, I will follow parts of the text "Introduction to Algorithms" by Cormen,
Leiserson, Rivest, Stein.
In the second part of the course, we will take a close look at a few
algorithms of particular interest in data science. This part will
center around individual student
projects, where students learn independently about a fundamental
algorithm and its applications in data
science. Each project will culminate in a presentation to the class
(30 minutes) and also an expository paper.
Possible topics (either for my lectures or for student projects) include:
- asymptotic complexity
- basic data structures and algorithms (e.g. sorting, binary search trees, priority queues)
- graph algorithms (e.g., breadth/depth-first search, shortest
paths, MST, matching)
- NP-completeness
- floating point arithmetic and condition numbers
- Union-find, applications to clustering
- Delaunay triangulations, applications to TDA
- Assignment problems
- Dynamic programming, applications to sequence alignment
- Spectral clustering
- Auctions
- approximation algorithms
- randomized algorithms
- Decidability
The course will emphasize the mathematical thinking behind algorthms,
and not implementations/coding. There will be no required coding
assignments, though final projects can potentially involve implementation work and computational experiments.
The material covered in this course will be challenging (more so than
my TDA I and II courses) and students
should expect to put in at least 8-10 hours of focused, independent
work on the class outside of lecture.
Resources:
Some of the following resources might be useful, e.g., for your project:
- "Algorithm Design," by Klienberg and Tardos is a popular
algortihms book. Here are (a revision of) the publisher's official
slides accompanying the book.
- "Foundations of Data Science" by Blum,
Hopcroft, and Kannan. Video Lectures by Kannan.
- Jelani Nelson's coure "Sketching Algorithms" (including video
lectures and notes)
- David Woodruffs's course "Algorithms for Big Data" (including video
lectures and notes)
- A list of online courses on sketching and algorithms
for big data
- Ankur Moitra's text "Algortihmic Aspects of Machine Learning"
- Sipser's text "Introduction to the Theory of Computation"
is a classic text on Turing machines, formal notions of
computation, etc.
Homework and Quizzes (tentative):
Homework will be assigned semi-regularly. In addition, we may
occasionally have quizzes. Homeworks and quizzes will be weighted
equally. The lowest two scores among the homework
and quizzes will be dropped. Homework is to be handed in at the beginning
of class on the day it is due (you will have a 5 minute grace period), and this rule will be enforced
strictly. Homework handed in at most one day late may be accepted with a 30%
penalaty, or at most two days late with a 50% penalty. You may
discuss homework with your classmates, but homework must be written up
on your own.
Grading:
The class will use the university's A-E grading scheme.
35%: Homework and quizzes
20%: Midterm
20%: Project Presentation
20%: Project Writeup
5%: Class participation and engagement
NOTE: The midterm may be curved, but not downward.
Pandemic-Related Challenges:
The pandemic--and the move to online instruction in
particular--creates a complex set of potential difficulties for
students. I intend to hold this class to a high standard of effort,
but at the same time, I am mindful of the unique challenges our
situation presents, and I intend to conduct the class accordingly. If
you are dealing with issues created or exacerbated by the pandemic
that risk getting in the way of your being a focused, active
participant in this class, please let me know.
Academic Regulations:
Naturally, the University's Standards of Academic Integrity apply to
this course, and students are expected to be familiar with these.