AMAT 587: Algorithms for Data Science
Spring 2022, Class #9460, Massry B014
T, Th 1:30-2:50
Instructor: Michael Lesnick
mlesnick [at] albany [dot] [the usual thing]
Office Hours: T, Th 3:00-4:00, and by appointment.
Send Anonymous Feedback
About this Course:
This course is intended primarily for students in Albany's Data Science
MS program in their second year. It may also be suitable for some
first-year students with a strong background in math, and for math
Ph.D. students interested in computation. There are no formal
prerequisites, but mathematical maturity and comfort with proofs is required.
Students are expected to attend and participate. Course
materials will be hosted on Blackboard.
The course will cover general background on
algorithms, with an emphasis on topics of interest to data
scientists. For some of the material, I will follow parts of the text "Introduction to Algorithms" by Cormen,
Leiserson, Rivest, Stein.
Towards the end of the course, we will take a close look at a few
algorithms of particular interest in data science. This part will
center around individual student
projects, where students learn independently about a fundamental
algorithm and its applications in data
science. Each project will culminate in a presentation to the class
(20-30 minutes).
Possible topics (either for my lectures or for student projects) include:
- asymptotic complexity
- basic data structures and algorithms (e.g. sorting, binary search trees, priority queues)
- graph algorithms (e.g., breadth/depth-first search, shortest
paths, MST, matching)
- NP-completeness
- floating point arithmetic and condition numbers
- Union-find, applications to clustering
- Delaunay triangulations, applications to TDA
- Assignment problems
- Dynamic programming, applications to sequence alignment
- Spectral clustering
- Auctions
- approximation algorithms
- randomized algorithms
- Decidability
The course will emphasize the mathematical thinking behind algorithms,
and not implementations/coding. There will be no required coding
assignments, though final projects can potentially involve implementation work and computational experiments.
The material covered in this course will be challenging (more so than
my TDA I and II courses) and students
should expect to put in substantial effort outside of lecture.
Resources:
Some of the following resources might be useful, e.g., for your project:
- "Algorithm Design," by Klienberg and Tardos is a popular
algortihms book. Here are (a revision of) the publisher's official
slides accompanying the book.
- "Foundations of Data Science" by Blum,
Hopcroft, and Kannan. Video Lectures by Kannan.
- Jelani Nelson's coure "Sketching Algorithms" (including video
lectures and notes)
- David Woodruffs's course "Algorithms for Big Data" (including video
lectures and notes)
- A list of online courses on sketching and algorithms
for big data
- Ankur Moitra's text "Algortihmic Aspects of Machine Learning"
- Sipser's text "Introduction to the Theory of Computation"
is a classic text on Turing machines, formal notions of
computation, etc.
Midterm:
We'll have an in-class (Zoom) midterm, tentatively on Thursday, March 24.
Homework and Quizzes (tentative):
Homework will be assigned semi-regularly. The lowest homework
score will be dropped. Homework is to be handed in at the beginning
of class on the day it is due (you will have a 5 minute grace period), and this rule will be enforced
strictly. Homework handed in at most one day late may be accepted with a 30%
penalty, or at most two days late with a 50% penalty. You may
discuss homework with your classmates, but homework must be written up
on your own.
Grading:
The class will use the university's A-E grading scheme.
30%: Homework
30%: Midterm
30%: Final Presentation
10%: Class participation and engagement
NOTE: The midterm may be curved, but not downward.
Pandemic-Related Challenges:
The pandemic creates a complex set of potential difficulties for
students. I intend to hold this class to a high standard of effort,
but at the same time, I am mindful of the unique challenges our
situation presents, and I intend to conduct the class accordingly. If
you are dealing with issues created or exacerbated by the pandemic
that risk getting in the way of your being a focused, active
participant in this class, please let me know.
Academic Regulations:
Naturally, the University's Standards of Academic Integrity apply to
this course, and students are expected to be familiar with these.